Series links
- https://www.jianshu.com/p/f18a1b3a4920 How to deploy a containerized Ceph cluster with kolla
- https://www.jianshu.com/p/a39f226d5dfb Fixing some problems encountered during deployment
- https://www.jianshu.com/p/d520fed237c0 Introducing the device classes feature in kolla ceph
- https://www.jianshu.com/p/d6e047e1ad06 Supporting bcache devices, iSCSI, and multipath devices
This article describes how to add support for bcache devices and iSCSI/multipath devices to kolla-ceph.
Commit URLs
kolla and kolla-ansible do not support bcache or multipath disks out of the box, so I submitted the two commits below to add that support. The key difference from the original implementation is how the OSD partition symlinks are built: the original code uses the partition name, while my commits use the partuuid, following the approach taken by ceph-disk.
kolla: https://review.opendev.org/#/c/599961/
kolla-ansible: https://review.opendev.org/#/c/599962/
Using bcache disks with Kolla Ceph
About bcache
Bcache is a block-layer cache in the Linux kernel that lets multiple HDDs share a single SSD or NVMe device as a cache. Since SSDs are expensive and small while HDDs are cheap and large, pairing an SSD cache with HDD data disks addresses both the SSD's limited capacity and the HDD's limited speed.
Why use bcache disks with bluestore
Bluestore bypasses the local filesystem and manages the raw device directly. Because the kernel's AIO only supports direct I/O, writes to the block device go straight to disk; compared with filestore this skips the journal write, turning two writes into one, so write throughput should in theory improve. Bluestore was thus designed with fast disks in mind, but budget constraints mean we mostly have to run it on ordinary HDDs, whose I/O bottleneck caps performance. To raise that cap we put a cache layer in front of each disk, which is exactly what bcache provides.
Building bcache disks
Everything below was tested on my virtual machines, running CentOS 7.
Node | SSD disk | HDD disks |
---|---|---|
ceph-node1 | sdb | sdc,sdd |
ceph-node2 | sdb | sdc,sdd |
ceph-node3 | sdb | sdc,sdd |
- First, partition the SSD; the goal is one SSD (sdb) serving as cache for two HDDs (sdc, sdd)
sudo sgdisk --zap-all -- /dev/sdb
parted /dev/sdb -s -- mklabel gpt mkpart bcache0 1 25000
parted /dev/sdb -s mkpart bcache1 25001 100%
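A quick check that the two cache partitions came out as intended:
parted /dev/sdb -s print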
- Install bcache
# my environment was missing these two packages (blkid and uuid);
# install whatever the build errors point to
yum install libblkid-devel uuid -y
# build and install bcache-tools
git clone https://evilpiepirate.org/git/bcache-tools.git
cd bcache-tools
make
make install
# load the bcache kernel module
modprobe bcache
- Wipe stale bcache data from the cache partitions
dd if=/dev/zero of=/dev/sdb1 bs=512k count=200
dd if=/dev/zero of=/dev/sdb2 bs=512k count=200
ps: bcache suggests clearing the signature with wipefs -a /dev/sdb1 instead, but that command has a bug in my environment: when I wiped an old bcache cache in order to re-partition, the cache device showed up again under /sys/fs/bcache after wipefs ran, and every subsequent operation then failed with "Device or resource busy".
- Wipe the bcache backing-device partitions
sudo sgdisk --zap-all -- /dev/sdc
sudo sgdisk --zap-all -- /dev/sdd
- Create the bcache devices, attaching each cache partition to its backing disk
make-bcache -C /dev/sdb1 -B /dev/sdc --writeback
make-bcache -C /dev/sdb2 -B /dev/sdd --writeback
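The cache mode can be inspected and changed at runtime through sysfs, for example:
# show the active cache mode for bcache0 (the current one is shown in brackets)
cat /sys/block/bcache0/bcache/cache_mode
# switch back to the safer writethrough mode if desired
echo writethrough > /sys/block/bcache0/bcache/cache_mode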
- Check the result
[root@ceph-node1 bcache]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sdd 8:48 0 50G 0 disk
└─bcache1 252:128 0 50G 0 disk
sdb 8:16 0 50G 0 disk
├─sdb2 8:18 0 26.7G 0 part
│ └─bcache1 252:128 0 50G 0 disk
└─sdb1 8:17 0 23.3G 0 part
└─bcache0 252:0 0 50G 0 disk
sdc 8:32 0 50G 0 disk
└─bcache0 252:0 0 50G 0 disk
[root@ceph-node1 bcache]# fdisk -l
Disk /dev/bcache0: 53.7 GB, 53687083008 bytes, 104857584 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk /dev/bcache1: 53.7 GB, 53687083008 bytes, 104857584 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
The bcache devices can now be used like ordinary disks.
Deploying Kolla Ceph on bcache disks
- Prepare the Kolla Ceph disks
sudo sgdisk --zap-all -- /dev/bcache0
sudo sgdisk --zap-all -- /dev/bcache1
sudo /sbin/parted /dev/bcache0 -s -- mklabel gpt mkpart KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO1 1 -1
sudo /sbin/parted /dev/bcache1 -s -- mklabel gpt mkpart KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO2 1 -1
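Before deploying, it is worth confirming that the bootstrap labels landed on the right partitions, e.g.:
lsblk -o NAME,PARTLABEL /dev/bcache0 /dev/bcache1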
With my commits applied, the deployment succeeds.
- Deploying with the original kolla and kolla-ansible code instead fails with this error:
"+ sudo -E kolla_set_configs\n
INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json\n
INFO:__main__:Validating config file\n
INFO:__main__:Kolla config strategy set to: COPY_ALWAYS\n
INFO:__main__:Copying service configuration files\n
INFO:__main__:Copying /var/lib/kolla/config_files/ceph.conf to /etc/ceph/ceph.conf\n
INFO:__main__:Setting permission for /etc/ceph/ceph.conf\n
INFO:__main__:Copying /var/lib/kolla/config_files/ceph.client.admin.keyring to /etc/ceph/ceph.client.admin.keyring\n
INFO:__main__:Setting permission for /etc/ceph/ceph.client.admin.keyring\n
INFO:__main__:Writing out command to execute\n
++ cat /run_command\n
+ CMD='/usr/bin/ceph-osd -f --public-addr 10.34.135.160 --cluster-addr 10.34.135.160'\n
+ ARGS=\n
+ [[ ! -n '' ]]\n
+ . kolla_extend_start\n
++ [[ ! -d /var/log/kolla/ceph ]]\n
+++ stat -c %a /var/log/kolla/ceph\n
++ [[ 2755 != \\7\\5\\5 ]]\n
++ chmod 755 /var/log/kolla/ceph\n
++ [[ -n 0 ]]\n
++ CEPH_JOURNAL_TYPE_CODE=45B0969E-9B03-4F30-B4C6-B4B80CEFF106\n
++ CEPH_OSD_TYPE_CODE=4FBD7E29-9D25-41B8-AFD0-062C0CEFF05D\n
++ CEPH_OSD_BS_WAL_TYPE_CODE=0FC63DAF-8483-4772-8E79-3D69D8477DE4\n
++ CEPH_OSD_BS_DB_TYPE_CODE=CE8DF73C-B89D-45B0-AD98-D45332906d90\n
++ ceph quorum_status\n
++ [[ False == \\F\\a\\l\\s\\e ]]\n
++ [[ bluestore == \\b\\l\\u\\e\\s\\t\\o\\r\\e ]]\n
++ [[ /dev/bcache0 =~ /dev/loop ]]\n
++ sgdisk --zap-all -- /dev/bcache01\n
Problem opening /dev/bcache01 for reading! Error is 2.\n
The specified file does not exist!\n
Problem opening '' for writing! Program will now terminate.\n
Warning! MBR not overwritten! Error is 2!\n",
From the log, the disk as kolla discovered it looks like this:
{
"bs_blk_device": "",
"bs_blk_label": "",
"bs_blk_partition_num": "",
"bs_db_device": "",
"bs_db_label": "",
"bs_db_partition_num": "",
"bs_wal_device": "",
"bs_wal_label": "",
"bs_wal_partition_num": "",
"device": "/dev/bcache0",
"external_journal": false,
"fs_label": "",
"fs_uuid": "",
"journal": "",
"journal_device": "",
"journal_num": 0,
"partition": "/dev/bcache0",
"partition_label": "KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO1",
"partition_num": "1"
}
The failure comes from this code (kolla/docker/ceph/ceph-osd/extend_start.sh):
if [[ "${OSD_BS_DEV}" =~ "/dev/loop" ]]; then
sgdisk --zap-all -- "${OSD_BS_DEV}""p${OSD_BS_PARTNUM}"
else
sgdisk --zap-all -- "${OSD_BS_DEV}""${OSD_BS_PARTNUM}"
fi
kolla prepends p to the partition number only when the device is a /dev/loop device; the first partition of bcache0 is bcache0p1, but kolla builds the name bcache01, hence the error.
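A naive fix is to widen the special case, adding the p separator whenever the kernel would (a sketch only, not the code from my commits; it stays fragile, which is exactly why my commits resolve partitions through /dev/disk/by-partuuid symlinks instead):
# sketch: the kernel inserts "p" when the parent device name ends in a
# digit (loop0, bcache0, nvme0n1); device-mapper maps under /dev/mapper
# (e.g. mpatha -> mpathap1) also use the "p" separator
if [[ "${OSD_BS_DEV}" =~ [0-9]$ || "${OSD_BS_DEV}" =~ ^/dev/mapper/ ]]; then
    sgdisk --zap-all -- "${OSD_BS_DEV}p${OSD_BS_PARTNUM}"
else
    sgdisk --zap-all -- "${OSD_BS_DEV}${OSD_BS_PARTNUM}"
fi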
Tearing down bcache disks
# stop the backing device first
echo 1 > /sys/block/bcache<N>/bcache/stop
# then unregister the cache device
echo 1 > /sys/fs/bcache/<uuid>/unregister
ps: mind the order: if the cache device is unregistered first, without stopping the backing devices attached to it, the cache device re-registers itself automatically.
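If the cache-set uuid is unknown, bcache-super-show (from bcache-tools) can recover it from the cache partition; a concrete teardown of bcache0 might look like:
# stop the backing device first
echo 1 > /sys/block/bcache0/bcache/stop
# recover the cache-set uuid from the cache partition, then unregister it
cset=$(bcache-super-show /dev/sdb1 | awk '/cset.uuid/ {print $2}')
echo 1 > /sys/fs/bcache/${cset}/unregister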
Multipath disks
Node | Disks | Role | IP |
---|---|---|---|
ceph-node1 | sdb,sdc,sdd | iSCSI initiator | 192.168.10.11 |
ceph-node2 | sdb,sdc,sdd | iSCSI initiator | 192.168.10.12 |
ceph-node3 | sdb,sdc | iSCSI initiator | 192.168.10.13 |
ceph-node4 | sdb,sdc,sdd | iSCSI target (exports the disks), dual NICs | 192.168.10.14/192.168.11.14 |
Initializing the iSCSI target node (ceph-node4)
- Install the required packages
yum install targetd targetcli -y
systemctl enable target && systemctl start target
- Prepare the logical volumes, one per exported disk
sudo sgdisk --zap-all -- /dev/sdb
sudo sgdisk --zap-all -- /dev/sdc
sudo sgdisk --zap-all -- /dev/sdd
pvcreate /dev/sdb
vgcreate vg00 /dev/sdb
lvcreate -l 100%FREE -n lv00 vg00
pvcreate /dev/sdc
vgcreate vg01 /dev/sdc
lvcreate -l 100%FREE -n lv01 vg01
pvcreate /dev/sdd
vgcreate vg02 /dev/sdd
lvcreate -l 100%FREE -n lv02 vg02
- Check the logical volumes
[root@ceph-node3 irteamsu]# lvs
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
root centos -wi-ao---- 45.99g
swap centos -wi-ao---- 3.00g
lv00 vg00 -wi-a----- <50.00g
lv01 vg01 -wi-a----- <50.00g
lv02 vg02 -wi-a----- <50.00g
- Enter targetcli
[root@ceph-node3 irteamsu]# targetcli
targetcli shell version 2.1.fb46
Copyright 2011-2013 by Datera, Inc and others.
For help on commands, type 'help'.
/>
- Create the block backstores and targets, with one ACL per initiator node
/backstores/block create disk0 /dev/vg00/lv00
iscsi/ create iqn.2017-05.con.benet:disk0
/iscsi/iqn.2017-05.con.benet:disk0/tpg1/acls create iqn.2017-05.com.benet:192.168.10.11
/iscsi/iqn.2017-05.con.benet:disk0/tpg1/luns create /backstores/block/disk0
/backstores/block create disk1 /dev/vg01/lv01
iscsi/ create iqn.2017-05.con.benet:disk1
/iscsi/iqn.2017-05.con.benet:disk1/tpg1/acls create iqn.2017-05.com.benet:192.168.10.12
/iscsi/iqn.2017-05.con.benet:disk1/tpg1/luns create /backstores/block/disk1
/backstores/block create disk2 /dev/vg02/lv02
iscsi/ create iqn.2017-05.con.benet:disk2
/iscsi/iqn.2017-05.con.benet:disk2/tpg1/acls create iqn.2017-05.com.benet:192.168.10.13
/iscsi/iqn.2017-05.con.benet:disk2/tpg1/luns create /backstores/block/disk2
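Note that each target's default portal binds 0.0.0.0:3260 (visible in the ls output below), so every LUN is reachable through both of ceph-node4's NICs; that is what gives each initiator two paths. Persist the configuration before exiting:
/> saveconfig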
- Verify
/> ls
o- / ......................................................................................................................... [...]
o- backstores .............................................................................................................. [...]
| o- block .................................................................................................. [Storage Objects: 3]
| | o- disk0 ..................................................................... [/dev/vg00/lv00 (50.0GiB) write-thru activated]
| | | o- alua ................................................................................................... [ALUA Groups: 1]
| | | o- default_tg_pt_gp ....................................................................... [ALUA state: Active/optimized]
| | o- disk1 ..................................................................... [/dev/vg01/lv01 (50.0GiB) write-thru activated]
| | | o- alua ................................................................................................... [ALUA Groups: 1]
| | | o- default_tg_pt_gp ....................................................................... [ALUA state: Active/optimized]
| | o- disk2 ..................................................................... [/dev/vg02/lv02 (50.0GiB) write-thru activated]
| | o- alua ................................................................................................... [ALUA Groups: 1]
| | o- default_tg_pt_gp ....................................................................... [ALUA state: Active/optimized]
| o- fileio ................................................................................................. [Storage Objects: 0]
| o- pscsi .................................................................................................. [Storage Objects: 0]
| o- ramdisk ................................................................................................ [Storage Objects: 0]
o- iscsi ............................................................................................................ [Targets: 3]
| o- iqn.2017-05.con.benet:disk0 ....................................................................................... [TPGs: 1]
| | o- tpg1 ............................................................................................... [no-gen-acls, no-auth]
| | o- acls .......................................................................................................... [ACLs: 1]
| | | o- iqn.2017-05.com.benet:192.168.10.11 .................................................................. [Mapped LUNs: 1]
| | | o- mapped_lun0 ................................................................................. [lun0 block/disk0 (rw)]
| | o- luns .......................................................................................................... [LUNs: 1]
| | | o- lun0 ................................................................ [block/disk0 (/dev/vg00/lv00) (default_tg_pt_gp)]
| | o- portals .................................................................................................... [Portals: 1]
| | o- 0.0.0.0:3260 ..................................................................................................... [OK]
| o- iqn.2017-05.con.benet:disk1 ....................................................................................... [TPGs: 1]
| | o- tpg1 ............................................................................................... [no-gen-acls, no-auth]
| | o- acls .......................................................................................................... [ACLs: 1]
| | | o- iqn.2017-05.com.benet:192.168.10.12 .................................................................. [Mapped LUNs: 1]
| | | o- mapped_lun0 ................................................................................. [lun0 block/disk1 (rw)]
| | o- luns .......................................................................................................... [LUNs: 1]
| | | o- lun0 ................................................................ [block/disk1 (/dev/vg01/lv01) (default_tg_pt_gp)]
| | o- portals .................................................................................................... [Portals: 1]
| | o- 0.0.0.0:3260 ..................................................................................................... [OK]
| o- iqn.2017-05.con.benet:disk2 ....................................................................................... [TPGs: 1]
| o- tpg1 ............................................................................................... [no-gen-acls, no-auth]
| o- acls .......................................................................................................... [ACLs: 1]
| | o- iqn.2017-05.com.benet:192.168.10.13 ................................................................. [Mapped LUNs: 1]
| | o- mapped_lun0 ................................................................................. [lun0 block/disk2 (rw)]
| o- luns .......................................................................................................... [LUNs: 1]
| | o- lun0 ................................................................ [block/disk2 (/dev/vg02/lv02) (default_tg_pt_gp)]
| o- portals .................................................................................................... [Portals: 1]
| o- 0.0.0.0:3260 ..................................................................................................... [OK]
o- loopback ......................................................................................................... [Targets: 0]
Setting up multipath on the initiator nodes
- Install and configure the initiator
yum -y install iscsi-initiator-utils
# set the InitiatorName; ceph-node1 shown as the example
vi /etc/iscsi/initiatorname.iscsi
InitiatorName=iqn.2017-05.com.benet:192.168.10.11
systemctl enable iscsi && systemctl start iscsi
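If iscsid was already running before the InitiatorName was edited, restart it so the new name is picked up; a stale name is one likely cause of the discovery error shown further below:
systemctl restart iscsid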
- Discover the targets through both portals
[root@ceph-node1 irteamsu]# iscsiadm -m discovery -t st -p 192.168.10.14
192.168.10.14:3260,1 iqn.2017-05.con.benet:disk0
192.168.10.14:3260,1 iqn.2017-05.con.benet:disk1
192.168.10.14:3260,1 iqn.2017-05.con.benet:disk2
[root@ceph-node1 irteamsu]# iscsiadm -m discovery -t st -p 192.168.11.14
192.168.11.14:3260,1 iqn.2017-05.con.benet:disk0
192.168.11.14:3260,1 iqn.2017-05.con.benet:disk1
192.168.11.14:3260,1 iqn.2017-05.con.benet:disk2
- A problem I ran into:
# on a node with kernel 3.10.0-327.el7.x86_64, discovery failed after the configuration above
[root@ceph-node3 ~]# iscsiadm -m discovery -t st -p 192.168.10.14
iscsiadm: Cannot perform discovery. Invalid Initiatorname.
iscsiadm: Could not perform SendTargets discovery: invalid parameter
A reboot fixed it.
- Connect to the targets
# ceph-node1 shown as the example; mark both paths for automatic login on boot
iscsiadm -m node -T iqn.2017-05.con.benet:disk0 -p 192.168.10.14 --op update -n node.startup -v automatic
iscsiadm -m node -T iqn.2017-05.con.benet:disk0 -p 192.168.11.14 --op update -n node.startup -v automatic
# log in to both portals now
iscsiadm -m node -T iqn.2017-05.con.benet:disk0 -p 192.168.10.14 --login
iscsiadm -m node -T iqn.2017-05.con.benet:disk0 -p 192.168.11.14 --login
- The network disks as seen on the initiator (fdisk -l)
Disk /dev/sde: 53.7 GB, 53682896896 bytes, 104849408 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 4194304 bytes
Disk /dev/sdf: 53.7 GB, 53682896896 bytes, 104849408 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 4194304 bytes
- Configure multipath
yum install device-mapper-multipath -y
systemctl enable multipathd.service && systemctl restart multipathd.service
vi /etc/multipath.conf
blacklist {
devnode "^sda"
}
defaults {
user_friendly_names yes
path_grouping_policy multibus
failback immediate
no_path_retry fail
}
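After writing the config, reload the multipath maps and confirm that each LUN shows two paths (mpath names depend on user_friendly_names):
multipath -r
multipath -ll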
- Problem: multipath maps are not created automatically (seen on kernel 4.20.2-1.el7.elrepo.x86_64)
[root@ceph-node2 ~]# multipath -v3
...
Apr 28 11:11:16 | mpathc: pgfailback = -2 (config file default)
Apr 28 11:11:16 | mpathc: pgpolicy = multibus (config file default)
Apr 28 11:11:16 | mpathc: selector = service-time 0 (internal default)
Apr 28 11:11:16 | mpathc: features = 0 (config file default)
Apr 28 11:11:16 | mpathc: hwhandler = 0 (internal default)
Apr 28 11:11:16 | mpathc: rr_weight = 1 (internal default)
Apr 28 11:11:16 | mpathc: minio = 1 rq (config file default)
Apr 28 11:11:16 | mpathc: no_path_retry = -1 (config file default)
Apr 28 11:11:16 | mpathc: pg_timeout = NONE (internal default)
Apr 28 11:11:16 | mpathc: fast_io_fail_tmo = 5 (config file default)
Apr 28 11:11:16 | mpathc: retain_attached_hw_handler = 1 (config file default)
Apr 28 11:11:16 | mpathc: deferred_remove = 1 (config file default)
Apr 28 11:11:16 | delay_watch_checks = DISABLED (internal default)
Apr 28 11:11:16 | delay_wait_checks = DISABLED (internal default)
Apr 28 11:11:16 | skip_kpartx = 1 (config file default)
Apr 28 11:11:16 | unpriv_sgio = 1 (config file default)
Apr 28 11:11:16 | mpathc: remove queue_if_no_path from '0'
Apr 28 11:11:16 | mpathc: assembled map [0 0 1 1 service-time 0 2 1 8:64 1 8:80 1]
Apr 28 11:11:16 | mpathc: set ACT_CREATE (map does not exist)
Apr 28 11:11:16 | ghost_delay = -1 (config file default)
Apr 28 11:11:16 | mpathc: domap (0) failure for create/reload map
Apr 28 11:11:16 | mpathc: ignoring map
Apr 28 11:11:16 | const prioritizer refcount 2
Apr 28 11:11:16 | directio checker refcount 2
Apr 28 11:11:16 | const prioritizer refcount 1
Apr 28 11:11:16 | directio checker refcount 1
Apr 28 11:11:16 | unloading const prioritizer
Apr 28 11:11:16 | unloading directio checker
Some digging showed the main cause: the newer multipath stack requires scsi-mq to be enabled:
# to use scsi-mq, add scsi_mod.use_blk_mq=y dm_mod.use_blk_mq=y to the
# kernel boot parameters; it also improves disk I/O performance
# find the matching kernel entry in grub.cfg and append the two parameters
vi /boot/grub2/grub.cfg
### BEGIN /etc/grub.d/10_linux ###
menuentry 'CentOS Linux (4.20.2-1.el7.elrepo.x86_64) 7 (Core)' --class centos --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-3.10.0-693.el7.x86_64-advanced-be679149-35c2-4143-b8c4-34a594f1b15f' {
load_video
set gfxpayload=keep
insmod gzio
insmod part_msdos
insmod xfs
set root='hd0,msdos1'
if [ x$feature_platform_search_hint = xy ]; then
search --no-floppy --fs-uuid --set=root --hint-bios=hd0,msdos1 --hint-efi=hd0,msdos1 --hint-baremetal=ahci0,msdos1 --hint='hd0,msdos1' 9f370650-6e47-4d78-b54d-420c0068cf6b
else
search --no-floppy --fs-uuid --set=root 9f370650-6e47-4d78-b54d-420c0068cf6b
fi
linux16 /vmlinuz-4.20.2-1.el7.elrepo.x86_64 root=/dev/mapper/centos-root ro crashkernel=auto rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet LANG=en_US.UTF-8 scsi_mod.use_blk_mq=y dm_mod.use_blk_mq=y
initrd16 /initramfs-4.20.2-1.el7.elrepo.x86_64.img
}
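Hand-edits to grub.cfg are lost whenever grub2-mkconfig regenerates it; as an alternative sketch, grubby can persist the same parameters:
grubby --update-kernel=/boot/vmlinuz-4.20.2-1.el7.elrepo.x86_64 \
       --args="scsi_mod.use_blk_mq=y dm_mod.use_blk_mq=y"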
# then reboot the machine
reboot
# check that the parameters took effect
[root@ceph-node2 ~]# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-4.20.2-1.el7.elrepo.x86_64 root=/dev/mapper/centos-root ro crashkernel=auto rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet LANG=en_US.UTF-8 scsi_mod.use_blk_mq=y dm_mod.use_blk_mq=y
[root@ceph-node2 ~]# cat /sys/module/scsi_mod/parameters/use_blk_mq
Y
Rerunning multipath -v3 now brings up the multipath disks.
- Initialize the multipath disks and they are ready for ceph deployment (with my commits applied)
sudo sgdisk --zap-all -- /dev/mapper/mpatha
sudo /sbin/parted /dev/mapper/mpatha -s -- mklabel gpt mkpart KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO1 1 -1
ps: both multipath and bcache disks name their partitions with the p + number suffix, which the stock kolla code does not handle; in addition, kolla/docker/kolla-toolbox/find_disks.py has no dedicated logic for discovering multipath disks.
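For reference, one way discovery logic such as find_disks.py could recognize a multipath map from the shell (a hypothetical sketch, not code from my commits):
# a device-mapper map whose table target type is "multipath" is a multipath disk
sudo dmsetup table mpatha | grep -q multipath && echo "mpatha is a multipath map"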