Background:
At around 5:05, the single threaded radius boxes (HX1, HX2, HX3, MK3 and
MK5) seemed to have stopped responding. It looked like they were doing nothing at all or no radius requests were coming in for the whole 25 minutes. This is very unusual as all our radius boxes are busy all the time. Please can you investigate what happened.
The situation now is that mk-xx-1 master server has recovered and working as normal, however to enable replication on the slave it requires the bin…14 file, where the master has moved on to 15. Many attempts/tricks have been tried to get this to work , but we keep getting the following error :
Last_error: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
We have decided that the only way to make sure the 2 servers are sync'd is to rebuild the slave from a static backup of the master mk-xx-1.
During the day
1) Stop slave
on mk-xx-2.server
Connect to mysql
mysql -uroot --p***
mysql > show slave status \G
mysql> stop slave;
Query OK, 0 rows affected (0.00 sec)
mysql> show slave status \G
*************************** 1. row ***************************
Master_Host: mk-xx-1.server
Master_User: repl
Master_Port: 3306
Connect_retry: 60
Master_Log_File: mk-xx-1-bin.015
Read_Master_Log_Pos: 98
Relay_Log_File: mk-xx-2-relay-bin.001
Relay_Log_Pos: 46
Relay_Master_Log_File: mk-xx-1-bin.015
Slave_IO_Running: No
Slave_SQL_Running: No
Replicate_do_db:
Replicate_ignore_db:
Last_errno: 0
Last_error:
Skip_counter: 0
Exec_master_log_pos: 98
Relay_log_space: 46
1 row in set (0.00 sec)
3) take backup of slave just radius database – mk-xx-2.
cd /home/backup/
./backup_radius.sh
[FOR INFO:
bash-2.05# cat backup_radius.sh
#!/bin/sh
EMAIL_LIST="dbateam@xx.com"
date=`/bin/date +"%d-%m-%y-%H:%M"`
umask 066
( /usr/local/bin/mysqldump -uroot --p*** --single-transaction --add-drop-table --database radius | gzip ) > /home/backup/radius-$date-`hostname -s`.sql.gz 2> /home/backup/radius-$date-`hostname -s`.sql.err.log
ls -ltr /home/backup/radius-$date-`hostname -s`.sql.gz > /home/backup/filelist.out
ls -ltr /home/backup/radius-$date-`hostname -s`.sql.err.log >> /home/backup/filelist.out
mail -s "mySQL Backup on `hostname -s`" $EMAIL_LIST < /home/backup/filelist.out
END]
3) Check Change Request is approved
4) Stop the application (took 1hr 20)
on mk-xx-1
(NB home is /var/db/mysql)
5) Take note of replication log information on master
Connect to mysql
mysql -uroot --p***
type in
show master status \G
"Keep information safe somewhere"
5) Kick off backup on prod
cd /home/backup/
./backup_radius.sh
Takes about 12 min
5A) restart App (took 1hr). We can carry on with step 6.
6) copy file over to mk-xx-2
rename file first to get rid of colon
mv MYSQL-BACKUP-FILENAME.gz restore.gz
scp restore.gz mk-xx-2.server:/home/backup
root MK.. pwd
7) do the restore on slave
on mk-xx-2.server
cd /home/backup
nohup gunzip < restore.gz | mysql -uroot --p*** &
v.slow box, will take about 3 days to run. Have turned off monitoring and the backup.
ps aux |grep nohup
8) change master log and file position
Connect to mysql
mysql -uroot --p***
type in
--- use Exec_master_log_pos from above
CHANGE MASTER TO MASTER_LOG_FILE='mk-xx-1-bin.0??';
CHANGE MASTER TO MASTER_LOG_POS=???;
start slave;
show slave status\G
flush privileges; à make sure new settings from restore are used
exit
mysql -uroot --p*** à check you can login.
9. turn on backup and monitoring again.
Turn on monitoring and the backup.
No comments:
Post a Comment