主库异常恢复场景
恢复被MGR踢出集群的主库分两种情况,一种情况是被踢出集群的主库的数据文件没有损坏,数据库可以正常启动,这种情况,数据库启动之后,以备库的角色重新加入集群,继续同步新主库产生的日志。另外一种情况是被踢出集群的主库的数据文件损坏或者由于其他原因,数据库无法正常启动,针对这种情况,需重启初始化数据库,重新加入MGR集群,重新开始同步数据。
主库正常启动
- 主库重新加入MGR。
- 查看原先库主从信息。
登录mysql-1数据库:
1
kubectl exec -it mysql-1 -n ns-mysql-test -- mysql -uroot -p
登录MySQL后,查看MGR主从库信息:
SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+ | CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | +---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+ | group_replication_applier | e44f902b-bf50-11ea-89d7-000000536bb6 | mysql-1 | 3306 | ONLINE | PRIMARY | 8.0.19 | | group_replication_applier | e51ea27b-bf50-11ea-b256-000000536bb7 | mysql-2 | 3306 | ONLINE | SECONDARY | 8.0.19 | | group_replication_applier | e5b471d9-bf50-11ea-8639-000000536bb8 | mysql-3 | 3306 | ONLINE | SECONDARY | 8.0.19 | +---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------
确认原先主库为mysql-1,从库为mysql-2、mysql-3。
退出数据库。
exit;
- 删除主库mysql-1 Pod(模拟主库异常)。
1
kubectl delete pod mysql-1 -n ns-mysql-test
pod "mysql-1" deleted
1
kubectl get pod -A
NAMESPACE NAME READY STATUS RESTARTS AGE kube-system coredns-66bff467f8-9zmdb 1/1 Running 0 2d kube-system coredns-66bff467f8-f5ghl 1/1 Running 0 2d kube-system etcd-centos-10 1/1 Running 0 2d kube-system kube-apiserver-centos-10 1/1 Running 0 2d kube-system kube-controller-manager-centos-10 1/1 Running 0 2d kube-system kube-ovn-cni-97f48 1/1 Running 0 2d kube-system kube-ovn-controller-749cfd66d7-7n64n 1/1 Running 0 2d kube-system kube-ovn-pinger-c756s 1/1 Running 0 2d kube-system kube-proxy-gmfmg 1/1 Running 0 2d kube-system kube-scheduler-centos-10 1/1 Running 0 2d kube-system ovn-central-6458555b48-lkhp2 1/1 Running 0 2d kube-system ovs-ovn-hsr2f 1/1 Running 0 2d ns-mysql-test mysql-2 1/1 Running 0 46m ns-mysql-test mysql-3 1/1 Running 0 25m
- 再次查看主从库情况。
登录mysql-2数据库:
1
kubectl exec -it mysql-2 -n ns-mysql-test -- mysql -uroot -p
查看当前MGR的主从库信息:
SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+ | CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | +---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+ | group_replication_applier | e51ea27b-bf50-11ea-b256-000000536bb7 | mysql-2 | 3306 | ONLINE | PRIMARY | 8.0.19 | | group_replication_applier | e5b471d9-bf50-11ea-8639-000000536bb8 | mysql-3 | 3306 | ONLINE | SECONDARY | 8.0.19 | +---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
可见mysql-2已自动升级为主库。
退出mysql-2数据库。
exit;
- 恢复失效的Pod(上文中使用delete模拟失效的mysql-1)。
1
kubectl apply -f ./mysql_deployment.yaml
namespace/ns-mysql-test unchanged subnet.kubeovn.io/ns-mysql-test unchanged pod/mysql-1 created pod/mysql-2 configured pod/mysql-3 configured
1
kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE kube-system coredns-66bff467f8-9zmdb 1/1 Running 0 2d1h kube-system coredns-66bff467f8-f5ghl 1/1 Running 0 2d1h kube-system etcd-centos-10 1/1 Running 0 2d1h kube-system kube-apiserver-centos-10 1/1 Running 0 2d1h kube-system kube-controller-manager-centos-10 1/1 Running 0 2d1h kube-system kube-ovn-cni-97f48 1/1 Running 0 2d1h kube-system kube-ovn-controller-749cfd66d7-7n64n 1/1 Running 0 2d1h kube-system kube-ovn-pinger-c756s 1/1 Running 0 2d1h kube-system kube-proxy-gmfmg 1/1 Running 0 2d1h kube-system kube-scheduler-centos-10 1/1 Running 0 2d1h kube-system ovn-central-6458555b48-lkhp2 1/1 Running 0 2d1h kube-system ovs-ovn-hsr2f 1/1 Running 0 2d1h ns-mysql-test mysql-1 1/1 Running 0 22s ns-mysql-test mysql-2 1/1 Running 0 61m ns-mysql-test mysql-3 1/1 Running 0 40m
可见mysql-1已重新运行。
此处mysql-1是在原先物理机上恢复运行,所以数据库及配置文件中MGR配置信息未丢失,则只需在mysql-1中执行启动MGR命令即可加入群组。
- 在mysql-1所在物理机上执行如下命令,根据提示正确输入MySQL的root密码后,登录mysql-1 Pod中的MySQL。
1
kubectl exec -it mysql-1 -n ns-mysql-test -- mysql -uroot -p
- 登录后执行如下命令启动mysql-1的MGR功能。
START GROUP_REPLICATION;
- 查看主从库信息。
SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+ | CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | +---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+ | group_replication_applier | e44f902b-bf50-11ea-89d7-000000536bb6 | mysql-1 | 3306 | ONLINE | SECONDARY | 8.0.19 | | group_replication_applier | e51ea27b-bf50-11ea-b256-000000536bb7 | mysql-2 | 3306 | ONLINE | PRIMARY | 8.0.19 | | group_replication_applier | e5b471d9-bf50-11ea-8639-000000536bb8 | mysql-3 | 3306 | ONLINE | SECONDARY | 8.0.19 | +---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+ 3 rows in set (0.00 sec)
可见mysql-1已重新加入群组,角色为从库。
- 手动切换主库到指定目标库。
如上文,mysql-1异常后,mysql-2自动升级为了主库,mysql-1恢复并重新加入了群组,但是mysql-1在群组中的角色是从库,主库仍然是mysql-2。下文以将主库切换回mysql-1为例,说明手动切换主库的操作步骤。
- 登录mysql-1 Pod中的MySQL数据库,并根据提示正确输入MySQL root用户的密码。
kubectl exec -it mysql-1 -n ns-mysql-test -- mysql -uroot -p
- 登录MySQL成功后,使用show variables like '%server_%';语句查看本MySQL的server_uuid信息(即MGR的MEMBER_ID)。
show variables like '%server_%';
+---------------------------------------------------+--------------------------------------+ | Variable_name | Value | +---------------------------------------------------+--------------------------------------+ | group_replication_recovery_ssl_verify_server_cert | OFF | | immediate_server_version | 999999 | | innodb_ft_server_stopword_table | | | original_server_version | 999999 | | server_id | 1 | | server_id_bits | 32 | | server_uuid | e44f902b-bf50-11ea-89d7-000000536bb6 | +---------------------------------------------------+--------------------------------------+ 7 rows in set (0.01 sec)
- 使用查询到的server_uuid作为参数,执行语句切换主库的命令。
SELECT group_replication_set_as_primary('换成目标库的server_uuid');
若切换时间较长,期间可使用如下语句查看切换进度:
SELECT event_name, work_completed, work_estimated FROM performance_schema.events_stages_current WHERE event_name LIKE "%stage/group_rpl%";
本文切换主库到mysql-1示例如下:
SELECT group_replication_set_as_primary('e44f902b-bf50-11ea-89d7-000000536bb6');
+--------------------------------------------------------------------------+ | group_replication_set_as_primary('e44f902b-bf50-11ea-89d7-000000536bb6') | +--------------------------------------------------------------------------+ | Primary server switched to: e44f902b-bf50-11ea-89d7-000000536bb6 | +--------------------------------------------------------------------------+ 1 row in set (1.00 sec)
SELECT event_name, work_completed, work_estimated FROM performance_schema.events_stages_current WHERE event_name LIKE "%stage/group_rpl%";
Empty set (0.00 sec)
SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+ | CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | +---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+ | group_replication_applier | e44f902b-bf50-11ea-89d7-000000536bb6 | mysql-1 | 3306 | ONLINE | PRIMARY | 8.0.19 | | group_replication_applier | e51ea27b-bf50-11ea-b256-000000536bb7 | mysql-2 | 3306 | ONLINE | SECONDARY | 8.0.19 | | group_replication_applier | e5b471d9-bf50-11ea-8639-000000536bb8 | mysql-3 | 3306 | ONLINE | SECONDARY | 8.0.19 | +---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+ 3 rows in set (0.00 sec)
mysql-1角色已切换为主库,切换成功。
主库无法正常启动
数据文件损坏导致的原主库不能重新启动,若想再次加入MGR集群,将被作为一个新节点重新加入。详细操作步骤请参见MGR添加新成员,节点被加入到group后,重新进行数据同步。
父主题: MySQL MGR维护