
Recover Percona cluster

  1. Identify the new master:

Identify the unit which has safe_to_bootstrap=1

juju status mysql

Note #1: If all units show '0', SSH to the one with the highest seqno. If all have the same seqno, SSH to any one of them, and then:

sudo vi /var/lib/percona-xtradb-cluster/grastate.dat

Change safe_to_bootstrap from '0' to '1'

To get the seqnos:

juju run --application=mysql cat /var/lib/percona-xtradb-cluster/grastate.dat

Note #2: If all seqnos are '-1', run mysqld_safe --wsrep-recover on all three nodes and compare: the log will contain a line reading "Recovered position: <uuid>:<seqno>". Take note of that seqno and bootstrap the node with the highest one.
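
The selection logic from the notes above can be sketched as a small shell helper that inspects locally collected copies of grastate.dat (the function name and file names are hypothetical; this is a sketch, not part of the charm):

```shell
# Sketch: pick the bootstrap candidate from collected grastate.dat copies.
pick_bootstrap_node() {
    best="" best_seqno=-2
    for f in "$@"; do
        safe=$(awk -F': *' '/^safe_to_bootstrap/ {print $2}' "$f")
        seqno=$(awk -F': *' '/^seqno/ {print $2}' "$f")
        # A node already marked safe_to_bootstrap: 1 wins outright
        if [ "$safe" = "1" ]; then echo "$f"; return 0; fi
        # Otherwise keep the highest seqno seen so far (-1 means "unknown")
        if [ "$seqno" -gt "$best_seqno" ]; then best_seqno=$seqno; best="$f"; fi
    done
    echo "$best"
}

# Example (hypothetical file names):
# pick_bootstrap_node node0.dat node1.dat node2.dat
```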

For the rest of the example, we assume the master is mysql/0

  2. Before bootstrapping the master, it's a good idea to move the VIP there and to prevent Juju from trying to run any operations on the slaves. The following steps stop MySQL and move the VIP away from the slaves:
juju run-action hacluster-mysql/1 --wait pause
juju run-action mysql/1 --wait pause
juju run-action hacluster-mysql/2 --wait pause
juju run-action mysql/2 --wait pause

Note: the unit numbers may differ in your deployment

Confirm mysql is stopped in those units and kill any mysqld processes if necessary. Also confirm that the VIP is now placed in the master unit.
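
The "kill any mysqld processes if necessary" check can be scripted as a small wait-then-escalate helper run on each paused unit (the function name and the 30-second default timeout are assumptions, not part of the charm):

```shell
# Sketch: wait up to $2 seconds for all processes named $1 to exit.
# Returns 0 once none remain, 1 if the timeout expires.
wait_gone() {
    name=$1 timeout=${2:-30} waited=0
    while pgrep -x "$name" >/dev/null; do
        waited=$((waited + 1))
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
    done
    return 0
}

# On a paused unit, e.g.:
# sudo systemctl stop mysql
# wait_gone mysqld 30 || sudo pkill -9 -x mysqld   # escalate only if it lingers
```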

  3. Bootstrap the master:

Confirm everything is down and kill if necessary

sudo systemctl stop mysql.service
sudo systemctl start mysql@bootstrap.service # Bionic
sudo /etc/init.d/mysql bootstrap-pxc && sudo systemctl start mysql # Xenial
  4. Run SHOW GLOBAL STATUS and confirm that the master is the Primary unit with a cluster size of 1 (check wsrep_cluster_size and wsrep_cluster_status):
MYSQLPWD=$(juju run --unit mysql/0 leader-get mysql.passwd)
juju run --unit mysql/0 "mysql -uroot -p${MYSQLPWD} -e \"SHOW global status;\"" | grep -Ei "wsrep_cluster"
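
The check in this step can be automated with a small filter over the wsrep rows, assuming the tab-separated output mysql produces when run non-interactively (the function name and expected-size argument are assumptions):

```shell
# Sketch: read SHOW GLOBAL STATUS rows on stdin and verify the cluster
# is Primary with the expected size (passed as $1).
check_wsrep() {
    awk -v want="$1" '
        $1 == "wsrep_cluster_size"   { size = $2 }
        $1 == "wsrep_cluster_status" { status = $2 }
        END {
            if (status == "Primary" && size == want) { print "OK"; exit 0 }
            print "NOT OK: status=" status " size=" size
            exit 1
        }'
}

# Example, piping the juju command from this step:
# juju run --unit mysql/0 "mysql ... -e \"SHOW global status;\"" \
#     | grep -Ei "wsrep_cluster" | check_wsrep 1
```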
  5. Run the update-status hooks and confirm in juju status that the master is now active:
juju run --application mysql "hooks/update-status" && juju run --application hacluster-mysql "hooks/update-status" && juju status mysql

Your cluster should now be operational; the slaves still have to be rejoined

  6. Start the first slave:
juju run-action mysql/1 --wait resume
juju run-action hacluster-mysql/1 --wait resume
  7. Run SHOW GLOBAL STATUS again and confirm that the master is still the Primary unit and that the cluster size has increased by 1 (wsrep_cluster_size and wsrep_cluster_status):
MYSQLPWD=$(juju run --unit mysql/0 leader-get mysql.passwd)
juju run --unit mysql/0 "mysql -uroot -p${MYSQLPWD} -e \"SHOW global status;\"" | grep -Ei "wsrep_cluster"
  8. Confirm the first slave is now active in Juju:

    juju run --application mysql "hooks/update-status" && juju run --application hacluster-mysql "hooks/update-status" && juju status mysql
    
  9. If the state is still not active: sometimes the systemctl status mysql output on a slave shows failed or timed out even though everything is fine, because the systemd unit times out before MySQL finishes resyncing. Restart the service if that's the case:

    juju run --unit=mysql/1 "sudo systemctl restart mysql.service"
    
  10. Confirm it is now active:

    juju run --application mysql "hooks/update-status" && juju run --application hacluster-mysql "hooks/update-status" && juju status mysql
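
To confirm the state without eyeballing the whole table, the unit's workload status can be pulled out of the tabular juju status output with a small filter (the helper name is hypothetical, and the column layout is assumed to be the standard tabular format, where the leader unit carries a trailing `*`):

```shell
# Sketch: print the workload-status column for one unit from
# `juju status` tabular output read on stdin.
unit_status() {
    awk -v u="$1" -v ul="$1*" '$1 == u || $1 == ul {print $2}'
}

# Example:
# juju status mysql | unit_status mysql/1    # e.g. "active" or "blocked"
```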
    
  11. Repeat steps 6-10 for mysql/2

  12. Final check:

    juju status mysql
    

Note #1: If any of the units shows a "seeded file missing" error at the end of the procedure, you can fix it like this:

   juju run --unit=mysql/X 'echo "done" | sudo tee -a /var/lib/percona-xtradb-cluster/seeded && sudo chown mysql:mysql /var/lib/percona-xtradb-cluster/seeded'

Note #2: If one of the slaves doesn't start at all, showing something along the lines of "MySQL PID not found, pid_file detected/guessed: /var/run/mysqld/mysqld.pid", try this:

juju ssh into the affected unit, then:

sudo systemctl stop mysql

Stop/kill any pending mysqld processes, then:

sudo rm -rf /var/run/mysqld.*
sudo systemctl start mysql
juju run --application mysql "hooks/update-status" && juju run --application hacluster-mysql "hooks/update-status" && juju status mysql

Note #3: As a last resort, if one of the slaves doesn't start at all, you might have to rebuild its DB from scratch, using the following procedure:

juju run-action hacluster-mysql/X --wait pause
juju run-action mysql/X --wait pause
juju ssh mysql/X

Stop/kill pending mysqld processes

sudo mv /var/lib/percona-xtradb-cluster /var/lib/percona-xtradb-cluster.bak
sudo mkdir /var/lib/percona-xtradb-cluster
sudo chown mysql:mysql /var/lib/percona-xtradb-cluster
sudo chmod 700 /var/lib/percona-xtradb-cluster
juju run-action mysql/X --wait resume
sudo du -sh /var/lib/percona-xtradb-cluster # run repeatedly to watch replication (SST) progress

Once it's done, check that the mysqld processes are running (sudo ps -ef | grep mysqld) and that the service shows as up (sudo systemctl status mysql). If the service shows as timed out (sometimes systemd times out before the sync finishes), restart it: sudo systemctl restart mysql

juju run-action hacluster-mysql/X --wait resume
juju run --application mysql "hooks/update-status" && juju run --application hacluster-mysql "hooks/update-status" && juju status mysql
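
The "watch replication progress" step in Note #3 can be wrapped in a small growth check that compares two datadir size samples (the function name and interval are assumptions; run it on the rebuilding unit):

```shell
# Sketch: return 0 if the directory grew between two `du -s` samples taken
# $2 seconds apart, 1 otherwise (i.e. SST has likely finished or stalled).
still_growing() {
    dir=$1 interval=${2:-10}
    a=$(du -s "$dir" | awk '{print $1}')
    sleep "$interval"
    b=$(du -s "$dir" | awk '{print $1}')
    [ "$b" -gt "$a" ]
}

# Example, on the unit being rebuilt:
# while still_growing /var/lib/percona-xtradb-cluster 10; do echo "still syncing"; done
```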