public_docs/recover_innodb_cluster.md

Assume the following units, where we are trying to recover mysql-innodb-cluster/0:

    mysql-innodb-cluster/0   active    idle   0/lxd/6  10.0.1.137                     Unit is ready: Mode: R/O, Cluster is ONLINE and can tolerate up to ONE failure.
    mysql-innodb-cluster/1   active    idle   1/lxd/6  10.0.1.114                     Unit is ready: Mode: R/W, Cluster is ONLINE and can tolerate up to ONE failure.
    mysql-innodb-cluster/2*  active    idle   2/lxd/6  10.0.1.156                     Unit is ready: Mode: R/O, Cluster is ONLINE and can tolerate up to ONE failure.

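
Step 7 may require the unit that is currently in Mode: R/W rather than the Juju leader. As a rough sketch, that unit can be picked out of status lines like the ones above. The function name find_rw_unit is hypothetical, and the sketch assumes the unit name is the first whitespace-separated field, with an optional trailing "*" marking the Juju leader.

```shell
#!/bin/sh
# Hypothetical helper (not part of Juju or the charm).
# Reads unit status lines on stdin and prints the name of every unit whose
# status message contains "Mode: R/W".
# Assumption: the unit name is field 1, optionally suffixed with "*".
find_rw_unit() {
    awk '/Mode: R\/W/ { sub(/\*$/, "", $1); print $1 }'
}
```

For example, piping the output of juju status mysql-innodb-cluster into find_rw_unit would print mysql-innodb-cluster/1 for the units listed above.
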
  1. Take a backup of the MySQL database

    juju run-action --wait mysql-innodb-cluster/leader mysqldump
    
  2. Stop MySQL on the unit mysql-innodb-cluster/0 (run this on the unit itself)

    sudo systemctl stop mysql
    
  3. Remove the member from the cluster

    juju run-action --wait mysql-innodb-cluster/leader remove-instance address=10.0.1.137
    

    If the above command doesn't work, run it again with the parameter force=true

    juju run-action --wait mysql-innodb-cluster/leader remove-instance address=10.0.1.137 force=true
    

    Confirm that it worked by checking that the IP no longer appears in the output of:

    juju run-action --wait mysql-innodb-cluster/leader cluster-status
    
  4. Re-initialise the database locally on the problematic node, i.e. mysql-innodb-cluster/0

    juju ssh mysql-innodb-cluster/0
    
    sudo -i
    cd /var/lib
    mv mysql mysql.old.$(date +%s)
    mkdir mysql
    chown mysql:mysql mysql
    chmod 700 mysql
    mysqld --initialize --user=mysql
    systemctl start mysql
    

    Check the mysql status: sudo systemctl status mysql

  5. The --initialize run above generates a temporary root password, which needs to be changed. The temporary password is written to /var/log/mysql/error.log, on a line similar to the one below

    [Note] [MY-010454] [Server] A temporary password is generated for root@localhost: wyPd_?kEd03p
    

    It can easily be found by running the following command

    grep temporary /var/log/mysql/error.log
    

    First, retrieve the root password that the charm has stored

    juju run --unit mysql-innodb-cluster/leader 'leader-get mysql.passwd'
    

    Now log in using the temporary password from error.log, i.e. wyPd_?kEd03p in this example

    mysql -p -u root
    

    Once logged in, set the password to the value that the mysql-innodb-cluster charm knows about, i.e. the leader-get output above (the value below is an example)

    ALTER USER 'root'@'localhost' IDENTIFIED BY 'zn8K73dmqnkZd99JZxXwcFmxWqTxPYgw3Hjx5sk';
    
  6. Remove the flags using Juju:

    Clear the flags to force the charm to re-create the cluster users, then run the update-status hook

    juju run --unit mysql-innodb-cluster/0 -- charms.reactive clear_flag local.cluster.user-created
    juju run --unit mysql-innodb-cluster/0 -- charms.reactive clear_flag local.cluster.all-users-created
    juju run --unit mysql-innodb-cluster/0 -- ./hooks/update-status
    

    After that, you can confirm it worked by getting the password:

    juju run --unit mysql-innodb-cluster/leader leader-get cluster-password
    

    Connect to the unit mysql-innodb-cluster/0 and use the password above:

    mysql -u clusteruser -p -e 'SELECT user,host FROM mysql.user'
    
  7. Re-add the instance to the cluster (you may need to replace leader with the unit that is in Mode: R/W):

    juju run-action --wait mysql-innodb-cluster/leader add-instance address=10.0.1.137
    juju run-action --wait mysql-innodb-cluster/leader cluster-status
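
The temporary-password lookup in step 5 can be wrapped in a small helper. This is a minimal sketch rather than part of the charm: the name temp_root_password is hypothetical, and it assumes the log line has the format shown in step 5. If the data directory has been initialised more than once, the log may contain several such lines, so the sketch takes the last one.

```shell
#!/bin/sh
# Hypothetical helper: print the most recent temporary root password
# recorded in a MySQL error log.
# $1: path to the log (defaults to the path used in step 5).
temp_root_password() {
    log_file="${1:-/var/log/mysql/error.log}"
    # Keep only the newest matching line, then strip everything up to
    # and including "root@localhost: ".
    grep 'A temporary password is generated for root@localhost' "$log_file" \
        | tail -n 1 \
        | sed 's/.*root@localhost: //'
}
```

Run on the recovered unit as root, temp_root_password prints the value to enter at the mysql -p -u root prompt.
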
    

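Note: If the instance is not added to the cluster, add it manually with mysqlsh as shown below. mysqlsh is started in Python mode (--py) because the snake_case calls used here are the Python form of the AdminAPI:

    juju ssh mysql-innodb-cluster/2

    mysqlsh --py clusteruser@10.0.1.156 --password=<clusterpassword> --cluster
    cluster.add_instance("10.0.1.137:3306")

Choose the option "[C]lone YES" when prompted.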

If the output contains the following message, the instance needs to be configured first:

    NOTE: Please use the dba.configure_instance() command to repair these issues.

In that case, run the command below in the mysqlsh CLI:

    dba.configure_instance("clusteruser@10.0.1.137:3306")

Note: You will be asked for the password of the user clusteruser. After this step, you can add the instance back to the cluster:

    cluster.add_instance("10.0.1.137:3306")

Choose the option "[C]lone YES" when prompted.

After that, check the cluster status:

    juju run-action --wait mysql-innodb-cluster/leader cluster-status
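
Checking whether 10.0.1.137 is listed in the cluster-status output (absent after step 3, present again after step 7) can be done by eye, or with a small filter such as the sketch below. The name member_listed is hypothetical, and the sketch assumes only that the member address appears verbatim somewhere in the output. It does not inspect the member's state, so the status text should still be read to confirm the member is ONLINE.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 if the address in $1 appears in the text on stdin.
# -F treats the address as a fixed string, so the dots are literal.
# -w requires a whole-word match, so 10.0.1.13 does not match 10.0.1.137.
# -q suppresses output; only the exit status is used.
member_listed() {
    grep -qwF -- "$1"
}
```

For example, piping the cluster-status command above into member_listed 10.0.1.137 returns success once the instance has rejoined the cluster.
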