# Recover Percona XtraDB Cluster
1. Identify the new master:
Identify the unit which has `safe_to_bootstrap: 1` in its `grastate.dat`:
```
juju status mysql
```
To get the seqnos and the `safe_to_bootstrap` flags on all units:
```
juju run --application=mysql "cat /var/lib/percona-xtradb-cluster/grastate.dat"
```
**Note #1**: If all units have '0', SSH to the one with the highest seqno. If all have the same seqno, SSH to any one of them, and then:
```
sudo vi /var/lib/percona-xtradb-cluster/grastate.dat
```
Change `safe_to_bootstrap` from '0' to '1'.
**Note #2**: If all seqnos are '-1', run `mysqld_safe --wsrep-recover` on all 3 units and compare: the log will contain a line reading `Recovered position: <uuid>:<seqno>`. Take note of each seqno and pick the unit with the highest one.
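For reference, a healthy `grastate.dat` looks roughly like the sketch below; the UUID and seqno values here are illustrative:
```
# GALERA saved state
version: 2.1
uuid:    1a2b3c4d-0000-11ec-8000-000000000000
seqno:   1234
safe_to_bootstrap: 1
```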
**Note**: For the rest of this example, we assume the master is `mysql/0`.
2. Before bootstrapping the master, it's a good idea to move the VIP there and to prevent Juju from performing any operations on the slaves. The following steps will stop MySQL and move the VIP away from the slaves:
```
juju run-action hacluster-mysql/1 --wait pause
juju run-action mysql/1 --wait pause
juju run-action hacluster-mysql/2 --wait pause
juju run-action mysql/2 --wait pause
```
**Note**: the unit numbers may differ in your environment.
Confirm MySQL is stopped on those units and kill any leftover mysqld processes if necessary. Also confirm that the VIP is now placed on the master unit.
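To double-check both conditions from the Juju client, a minimal sketch (replace `<VIP>` with your actual virtual IP; `systemctl is-active` reports `inactive` for a stopped service):
```
juju run --unit mysql/1 "systemctl is-active mysql || true; pgrep -a mysqld || true"
juju run --unit mysql/2 "systemctl is-active mysql || true; pgrep -a mysqld || true"
juju run --unit mysql/0 "ip addr | grep <VIP>"
```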
3. Bootstrap the master:
SSH to the master (`juju ssh mysql/0`), confirm everything is down (kill any leftover mysqld if necessary), and bootstrap:
```
sudo systemctl stop mysql.service
sudo systemctl start mysql@bootstrap.service # Bionic
sudo /etc/init.d/mysql bootstrap-pxc && sudo systemctl start mysql # Xenial
```
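If mysqld won't stop cleanly before the bootstrap, a minimal sketch for finding and terminating leftovers, escalating to SIGKILL only as a last resort:
```
sudo ps -ef | grep [m]ysqld    # list leftover processes without matching the grep itself
sudo pkill mysqld              # polite SIGTERM first
sleep 10
sudo pkill -9 mysqld || true   # force-kill anything still hanging around
```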
4. Run `SHOW GLOBAL STATUS` and confirm that the master is in the Primary component with a cluster size of 1 (`wsrep_cluster_status` and `wsrep_cluster_size`):
```
MYSQLPWD=$(juju run --unit mysql/0 leader-get mysql.passwd)
juju run --unit mysql/0 "mysql -uroot -p${MYSQLPWD} -e \"SHOW global status;\"" | grep -Ei "wsrep_cluster"
```
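On a freshly bootstrapped master, the grep should return something along these lines (the `conf_id` and `state_uuid` values are illustrative):
```
wsrep_cluster_conf_id     1
wsrep_cluster_size        1
wsrep_cluster_state_uuid  1a2b3c4d-0000-11ec-8000-000000000000
wsrep_cluster_status      Primary
```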
5. Run the update-status hooks and confirm in `juju status mysql` that the master is now active:
```
juju run --application mysql "hooks/update-status" && juju run --application hacluster-mysql "hooks/update-status" && juju status mysql
```
Your cluster should be operational by now; the slaves still have to be added back.
6. Start the first slave:
```
juju run-action mysql/1 --wait resume
juju run-action hacluster-mysql/1 --wait resume
```
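While the slave resyncs, you can optionally watch it transition through the Galera states (Joining → Joined → Synced); a sketch, reusing `MYSQLPWD` from step 4. Note that during a full SST the joiner may refuse connections until the transfer finishes:
```
juju run --unit mysql/1 "mysql -uroot -p${MYSQLPWD} -e \"SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';\""
```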
7. Run `SHOW GLOBAL STATUS` again and confirm that the master is still in the Primary component and that the cluster size has increased by 1 (`wsrep_cluster_status` and `wsrep_cluster_size`):
```
MYSQLPWD=$(juju run --unit mysql/0 leader-get mysql.passwd)
juju run --unit mysql/0 "mysql -uroot -p${MYSQLPWD} -e \"SHOW global status;\"" | grep -Ei "wsrep_cluster"
```
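With the first slave back in, the same two key rows should now read (again, illustrative):
```
wsrep_cluster_size        2
wsrep_cluster_status      Primary
```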
8. Confirm the started slave is now active in Juju:
```
juju run --application mysql "hooks/update-status" && juju run --application hacluster-mysql "hooks/update-status" && juju status mysql
```
9. If the state is still not active, note that on the slaves the `systemctl status mysql` output sometimes shows `failed` or `timed out` even when everything is actually fine; this happens because the systemd unit times out before MySQL finishes resyncing. Restart the service if that's the case:
```
juju run --unit=mysql/1 "sudo systemctl restart mysql.service"
```
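To see what systemd actually recorded before declaring the unit failed, a hedged one-liner for inspecting the service status and recent log lines on the slave:
```
juju run --unit mysql/1 "sudo systemctl status mysql --no-pager || true; sudo journalctl -u mysql -n 50 --no-pager"
```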
10. Confirm it is now active:
```
juju run --application mysql "hooks/update-status" && juju run --application hacluster-mysql "hooks/update-status" && juju status mysql
```
11. Repeat steps 6-10 for `mysql/2`.
12. Final check:
```
juju status mysql
```
**Note #1**: If any of the units shows `seeded file missing` at the end of the procedure, you can fix it like this:
```
juju run --unit=mysql/X 'echo "done" | sudo tee -a /var/lib/percona-xtradb-cluster/seeded && sudo chown mysql:mysql /var/lib/percona-xtradb-cluster/seeded'
```
**Note #2**: If one of the slaves doesn't start at all, logging something along the lines of "MySQL PID not found, pid_file detected/guessed: `/var/run/mysqld/mysqld.pid`", try this:
```
juju ssh mysql/X
sudo systemctl stop mysql
# stop/kill any pending mysqld processes, then clear stale pid/socket files:
sudo rm -rf /var/run/mysqld.*
sudo systemctl start mysql
```
Then, back on the Juju client:
```
juju run --application mysql "hooks/update-status" && juju run --application hacluster-mysql "hooks/update-status" && juju status mysql
```
**Note #3**: As a last resort, if one of the slaves doesn't start at all, you might have to rebuild its DB from scratch, using the following procedure:
```
juju run-action hacluster-mysql/X --wait pause
juju run-action mysql/X --wait pause
juju ssh mysql/X
```
Stop/kill any pending mysqld processes, then rebuild the data directory:
```
# on the unit:
sudo mv /var/lib/percona-xtradb-cluster /var/lib/percona-xtradb-cluster.bak
sudo mkdir /var/lib/percona-xtradb-cluster
sudo chown mysql:mysql /var/lib/percona-xtradb-cluster
sudo chmod 700 /var/lib/percona-xtradb-cluster
# from the Juju client, resume the unit so it rebuilds via a full SST:
juju run-action mysql/X --wait resume
# back on the unit, watch the data directory grow to follow replication progress:
sudo du -sh /var/lib/percona-xtradb-cluster
```
Once it's done, check that the mysqld processes are running (`sudo ps -ef | grep mysqld`) and that the service shows as up (`sudo systemctl status mysql`). If the service shows as timed out (sometimes systemd gives up before the sync finishes), restart it: `sudo systemctl restart mysql`. Once the service is up, resume hacluster and refresh the Juju status:
```
juju run-action hacluster-mysql/X --wait resume
juju run --application mysql "hooks/update-status" && juju run --application hacluster-mysql "hooks/update-status" && juju status mysql
```
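As a final sanity check once everything is resumed, confirm the full cluster size and Primary status one more time, reusing the commands from step 4; `wsrep_cluster_size` should now be 3 and `wsrep_cluster_status` should be `Primary`:
```
MYSQLPWD=$(juju run --unit mysql/0 leader-get mysql.passwd)
juju run --unit mysql/0 "mysql -uroot -p${MYSQLPWD} -e \"SHOW global status;\"" | grep -Ei "wsrep_cluster"
```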