ZFS + GlusterFS Cluster Failure Recovery

zze
2021-01-13 / 0 comments / 0 likes / 1449 views / 10972 characters

Replacing a Disk on a Single Node

Check the disk devices that make up the storage pool:

$ zpool status
  pool: storage
 state: DEGRADED
status: One or more devices has been removed by the administrator.
	Sufficient replicas exist for the pool to continue functioning in a
	degraded state.
action: Online the device using 'zpool online' or replace the device with
	'zpool replace'.
  scan: resilvered 235K in 0h0m with 0 errors on Fri Jan  8 01:49:23 2021
config:

	NAME        STATE     READ WRITE CKSUM
	storage     DEGRADED     0     0     0
	  raidz1-0  DEGRADED     0     0     0
	    sdb     ONLINE       0     0     0
	    sdc     ONLINE       0     0     0
	    sdd     REMOVED      0     0     0

As the output shows, sdd is unavailable. Insert a new disk, then verify that the device appears under /dev/:

$ ls /dev/sde
/dev/sde

Replace the unavailable disk sdd with the newly inserted disk sde:

$ zpool replace storage sdd /dev/sde

Check the pool status:

$ zpool status
  pool: storage
 state: ONLINE
  scan: resilvered 1.07M in 0h0m with 0 errors on Fri Jan  8 02:04:29 2021
config:

	NAME        STATE     READ WRITE CKSUM
	storage     ONLINE       0     0     0
	  raidz1-0  ONLINE       0     0     0
	    sdb     ONLINE       0     0     0
	    sdc     ONLINE       0     0     0
	    sde     ONLINE       0     0     0
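
Reading the status table by eye works for one pool, but the same check can be scripted. The following is a minimal sketch, not part of the original procedure: the function name list_unhealthy_devices is hypothetical, and it only assumes the table layout shown in the outputs above (a NAME/STATE header row, ended by a blank line or an "errors:" line).

```shell
# list_unhealthy_devices (hypothetical helper): read `zpool status` text on
# stdin and print "<name> <state>" for every row of the config table whose
# STATE column is not ONLINE. The pool and vdev rows are included as well.
list_unhealthy_devices() {
    awk '
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }  # header row starts the table
        /^errors:/ || NF == 0         { in_config = 0 }        # blank line or "errors:" ends it
        in_config && $2 != "ONLINE"   { print $1, $2 }
    '
}
```

On a real node it would be fed from the live command, for example `zpool status storage | list_unhealthy_devices`.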

Whole-Node Failure

A whole-node failure comes in two forms:

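  • 1. The machine itself is healthy, but two or more disks in the ZFS pool are unavailable. That is more than raidz1 can tolerate, so the node is effectively unavailable. Replace the failed disks with new ones, reformat the remaining original disks, and rebuild the ZFS pool.
  • 2. The machine is down. Bring in a new machine, add new disks to it and build a storage pool, then make the new machine a Gluster node to replace the failed one.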
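
For the first case, the pool's own "state:" line already tells the two recovery paths apart. The sketch below is only an illustration: the function names pool_state and recovery_hint are hypothetical, and the mapping from state to action is an assumption drawn from the two examples in this article (DEGRADED handled with zpool replace, UNAVAIL handled by rebuilding the pool). The second case, a machine that is down, cannot be detected this way because there is no node left to run zpool on.

```shell
# pool_state (hypothetical helper): print the value of the first "state:"
# line found in `zpool status` text read from stdin.
pool_state() {
    awk '$1 == "state:" { print $2; exit }'
}

# recovery_hint (hypothetical helper): map a pool state to the recovery path
# used in this article. The mapping is an assumption, not a ZFS rule.
recovery_hint() {
    case "$1" in
        ONLINE)                    echo "healthy: nothing to do" ;;
        DEGRADED)                  echo "replace the failed disk with zpool replace" ;;
        UNAVAIL|FAULTED|SUSPENDED) echo "rebuild the pool, then replace the gluster brick" ;;
        *)                         echo "unknown state: $1" ;;
    esac
}
```

A possible use is `recovery_hint "$(zpool status storage | pool_state)"`.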

Recovery by Replacing Disks in the Original Machine

My gluster cluster consists of 10.0.1.111, 10.0.1.112, and 10.0.1.113. Below I simulate two disks failing on 10.0.1.113, then recover by replacing those two failed disks on 10.0.1.113.

Checking the ZFS pool shows that two disks have failed:

$ zpool status
  pool: storage
 state: UNAVAIL
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://zfsonlinux.org/msg/ZFS-8000-HC
  scan: none requested
config:

	NAME        STATE     READ WRITE CKSUM
	storage     UNAVAIL      0     0     0  insufficient replicas
	  raidz1-0  UNAVAIL      0     0     0  insufficient replicas
	    sdb     FAULTED     12     0     0  too many errors
	    sdc     FAULTED      9     0     0  too many errors
	    sdd     ONLINE       0     0     0
	    sde     ONLINE       0     0     0
	    sdf     ONLINE       0     0     0
errors: List of errors unavailable: pool I/O is currently suspended

At this point the pool built from these five disks is unavailable, and the storage pool cannot be destroyed in this state.

Pull out the two failed disks, sdb and sdc, then list the block devices:

$ lsblk
NAME                      MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                         8:0    0   30G  0 disk 
├─sda1                      8:1    0    1M  0 part 
├─sda2                      8:2    0    1G  0 part /boot
└─sda3                      8:3    0   29G  0 part 
  └─ubuntu--vg-ubuntu--lv 253:0    0   20G  0 lvm  /
sdd                         8:48   0   20G  0 disk 
├─sdd1                      8:49   0   20G  0 part 
└─sdd9                      8:57   0    8M  0 part 
sde                         8:64   0   20G  0 disk 
├─sde1                      8:65   0   20G  0 part 
└─sde9                      8:73   0    8M  0 part 
sdf                         8:80   0   20G  0 disk 
├─sdf1                      8:81   0   20G  0 part 
└─sdf9                      8:89   0    8M  0 part 
sr0                        11:0    1 1024M  0 rom  

Insert the new disks, reboot, and list the block devices again:

$ lsblk
NAME                      MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                         8:0    0   30G  0 disk 
├─sda1                      8:1    0    1M  0 part 
├─sda2                      8:2    0    1G  0 part /boot
└─sda3                      8:3    0   29G  0 part 
  └─ubuntu--vg-ubuntu--lv 253:0    0   20G  0 lvm  /
sdb                         8:16   0   20G  0 disk 
├─sdb1                      8:17   0   20G  0 part 
└─sdb9                      8:25   0    8M  0 part 
sdc                         8:32   0   20G  0 disk 
├─sdc1                      8:33   0   20G  0 part 
└─sdc9                      8:41   0    8M  0 part 
sdd                         8:48   0   20G  0 disk 
├─sdd1                      8:49   0   20G  0 part 
└─sdd9                      8:57   0    8M  0 part 
sde                         8:64   0   20G  0 disk 
├─sde1                      8:65   0   20G  0 part 
└─sde9                      8:73   0    8M  0 part 
sdf                         8:80   0   20G  0 disk 
├─sdf1                      8:81   0   20G  0 part 
└─sdf9                      8:89   0    8M  0 part 
sr0                        11:0    1 1024M  0 rom 

After the reboot, the original storage pool no longer exists:

$ zpool status
no pools available

Recreate the pool:

$ zpool create -f storage raidz1 sdb sdc sdd sde sdf

Check the pool status:

$ zpool status
  pool: storage
 state: ONLINE
  scan: none requested
config:

	NAME        STATE     READ WRITE CKSUM
	storage     ONLINE       0     0     0
	  raidz1-0  ONLINE       0     0     0
	    sdb     ONLINE       0     0     0
	    sdc     ONLINE       0     0     0
	    sdd     ONLINE       0     0     0
	    sde     ONLINE       0     0     0
	    sdf     ONLINE       0     0     0

errors: No known data errors

Mount the pool at a new directory:

$ mkdir -p /data/gluster_brick_new
$ zfs set mountpoint=/data/gluster_brick_new storage

Restart the Gluster service:

$ systemctl restart glusterd

Check the connection status of each brick in the volume:

$ gluster volume heal gv info
Brick 10.0.1.111:/data/gluster_brick
/8 
/ 
Status: Connected
Number of entries: 2

Brick 10.0.1.112:/data/gluster_brick
/ 
/8 
Status: Connected
Number of entries: 2

Brick 10.0.1.113:/data/gluster_brick
Status: Transport endpoint is not connected
Number of entries: -

As the output shows, 10.0.1.113 is not connected, and the volume still includes the old, damaged brick on 10.0.1.113:

$ gluster volume info gv
 
Volume Name: gv
Type: Disperse
Volume ID: 3efefbf1-d29b-4708-8ca2-f0241b030152
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: 10.0.1.111:/data/gluster_brick
Brick2: 10.0.1.112:/data/gluster_brick
Brick3: 10.0.1.113:/data/gluster_brick
Options Reconfigured:
transport.address-family: inet
storage.fips-mode-rchecksum: on
nfs.disable: on

Replace the old brick with the new brick on 10.0.1.113:

$ gluster volume replace-brick gv 10.0.1.113:/data/gluster_brick 10.0.1.113:/data/gluster_brick_new commit force
volume replace-brick: success: replace-brick commit force operation successful

Check the connection status of each brick in the volume again:

$ gluster volume heal gv info 
Brick 10.0.1.111:/data/gluster_brick
Status: Connected
Number of entries: 0

Brick 10.0.1.112:/data/gluster_brick
Status: Connected
Number of entries: 0

Brick 10.0.1.113:/data/gluster_brick_new
Status: Connected
Number of entries: 0
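
The brick status checks used in this section can also be scripted. Here is a minimal sketch, assuming the "Brick ..." / "Status: ..." layout shown in the heal info outputs above; the function name list_disconnected_bricks is hypothetical.

```shell
# list_disconnected_bricks (hypothetical helper): read
# `gluster volume heal <vol> info` text on stdin and print every brick whose
# Status line is anything other than "Connected".
list_disconnected_bricks() {
    awk '
        $1 == "Brick"                        { brick = $2 }   # remember the current brick
        $1 == "Status:" && $2 != "Connected" { print brick }  # e.g. "Transport endpoint is not connected"
    '
}
```

For example, `gluster volume heal gv info | list_disconnected_bricks` prints nothing once every brick is connected.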

Recovery with a New Machine

My gluster cluster consists of 10.0.1.111, 10.0.1.112, and 10.0.1.113. Below I simulate 10.0.1.113 going down entirely, then add a new machine, 10.0.1.114, to the gluster cluster to replace 10.0.1.113.

Checking the node pool shows that 10.0.1.113 is down:

$ gluster pool list
UUID					Hostname 	State
363e49d5-c2d6-4d76-8eb5-2b1bc5e8c3c9	10.0.1.112    	Connected 
196374b7-6c4b-4dfc-86ad-35166eccafb5	10.0.1.113    	Disconnected 
892d9c99-79af-4cfb-99a2-c79aeefc5d36	localhost		Connected 
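
The disconnected peer can also be picked out of this listing with a small filter. This is a sketch under the assumption that the output keeps the three-column UUID / Hostname / State layout shown above; the function name list_disconnected_peers is hypothetical.

```shell
# list_disconnected_peers (hypothetical helper): read `gluster pool list`
# text on stdin and print the hostname of every peer whose State column
# is "Disconnected". The first line is the column header and is skipped.
list_disconnected_peers() {
    awk 'NR > 1 && $3 == "Disconnected" { print $2 }'
}
```

It would be used as `gluster pool list | list_disconnected_peers`.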

Take a new machine, 10.0.1.114, and install the gluster service and the zfs package on it:

# For installing the gluster service, see the GlusterFS deployment documentation
# Install the zfs package
$ apt-get install zfsutils-linux -y

Create the ZFS pool on 10.0.1.114:

$ zpool create storage raidz1 sdb sdc sdd sde sdf 

Start the gluster service on 10.0.1.114:

$ systemctl start glusterd

On 10.0.1.114, mount the ZFS pool so it can serve as the gluster brick:

$ mkdir -p /data/gluster_brick
$ zfs set mountpoint=/data/gluster_brick storage
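
The steps run on the new machine so far can be collected into one function. The sketch below simply replays the commands from this section in the same order; the names run, provision_brick, and DRY_RUN are hypothetical and not part of any tool. Setting DRY_RUN=1 prints the commands instead of executing them, which is a convenient way to review them before touching real disks.

```shell
# run (hypothetical helper): execute a command, or only print it with a
# leading "+ " when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# provision_brick POOL MOUNTPOINT DISK... (hypothetical helper): install the
# zfs package, build a raidz1 pool from the given disks, start glusterd, and
# mount the pool at MOUNTPOINT, mirroring the manual steps above.
provision_brick() {
    pool=$1
    mountpoint=$2
    shift 2
    run apt-get install -y zfsutils-linux
    run zpool create "$pool" raidz1 "$@"
    run systemctl start glusterd
    run mkdir -p "$mountpoint"
    run zfs set "mountpoint=$mountpoint" "$pool"
}
```

With DRY_RUN=1 set, `provision_brick storage /data/gluster_brick sdb sdc sdd sde sdf` only lists the five commands.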

On 10.0.1.111 or 10.0.1.112, add 10.0.1.114 to the gluster cluster:

$ gluster peer probe 10.0.1.114
peer probe: success. 

Check the gluster cluster's peer list:

$ gluster pool list
UUID					Hostname 	State
363e49d5-c2d6-4d76-8eb5-2b1bc5e8c3c9	10.0.1.112    	Connected 
196374b7-6c4b-4dfc-86ad-35166eccafb5	10.0.1.113    	Disconnected 
9b4a2dc2-86e7-4346-b536-21fbdb6b371c	10.0.1.114    	Connected 
892d9c99-79af-4cfb-99a2-c79aeefc5d36	localhost		Connected 

View the gluster volume information:

$ gluster volume info gv
 
Volume Name: gv
Type: Disperse
Volume ID: 3efefbf1-d29b-4708-8ca2-f0241b030152
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: 10.0.1.111:/data/gluster_brick
Brick2: 10.0.1.112:/data/gluster_brick
Brick3: 10.0.1.113:/data/gluster_brick_new
Options Reconfigured:
transport.address-family: inet
storage.fips-mode-rchecksum: on
nfs.disable: on

$ gluster volume heal gv info
Brick 10.0.1.111:/data/gluster_brick
Status: Connected
Number of entries: 0

Brick 10.0.1.112:/data/gluster_brick
Status: Connected
Number of entries: 0

Brick 10.0.1.113:/data/gluster_brick_new
Status: Transport endpoint is not connected
Number of entries: -

As the output shows, volume gv still uses the brick on 10.0.1.113, but 10.0.1.113 is unreachable. Replace the brick on 10.0.1.113 with the brick on 10.0.1.114:

$ gluster volume replace-brick gv 10.0.1.113:/data/gluster_brick_new 10.0.1.114:/data/gluster_brick commit force
volume replace-brick: success: replace-brick commit force operation successful

Check the connection status of each brick in the volume:

$ gluster volume heal gv info
Brick 10.0.1.111:/data/gluster_brick
Status: Connected
Number of entries: 0

Brick 10.0.1.112:/data/gluster_brick
Status: Connected
Number of entries: 0

Brick 10.0.1.114:/data/gluster_brick
Status: Connected
Number of entries: 0
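
Before detaching the old peer, it may be worth confirming that no heal entries remain on any brick. The following is a rough sketch that sums the "Number of entries:" values from heal info output; the function name pending_heal_entries is hypothetical. A brick that reports "-" because it is not connected is skipped, so a result of 0 is only meaningful when every brick also shows "Status: Connected".

```shell
# pending_heal_entries (hypothetical helper): read
# `gluster volume heal <vol> info` text on stdin and print the sum of all
# numeric "Number of entries:" values. Non-numeric values such as "-" are
# ignored, and the result is 0 when no numeric value is found.
pending_heal_entries() {
    awk '
        $1 == "Number" && $2 == "of" && $3 == "entries:" && $4 ~ /^[0-9]+$/ { sum += $4 }
        END { print sum + 0 }
    '
}
```

For example, `gluster volume heal gv info | pending_heal_entries` can be polled until it prints 0.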

Remove 10.0.1.113 from the trusted storage pool:

$ gluster peer detach 10.0.1.113 
All clients mounted through the peer which is getting detached need to be remounted using one of the other active peers in the trusted storage pool to ensure client gets notification on any changes done on the gluster configuration and if the same has been done do you want to proceed? (y/n) y
peer detach: success