Demo environment
Subnet Manager node: Node IO
Nodes with no explicit partition: H3, H4
Nodes in partition pz: H1, H2
The topology is shown below:
+----------------------------------+
|                                  |
|     +----------------------+     |
|     |       Node IO        |     |
|     |    Subnet Manager    +------------------------+-------------------------+
|     |                      |     |                  |                         |
|     +----+------------+----+     |     +------------+-------------------------+-------------+
|          |            |          |     |            |                         |             |
|     +----+----+  +----+----+     |     |  +---------+----------+    +---------+----------+  |
|     | Node H3 |  | Node H4 |     |     |  |      Node H1       |    |      Node H2       |  |
|     +---------+  +---------+     |     |  | 0xe41d2d0300cac9f6 |    |                    |  |
|                                  |     |  | 0xe41d2d0300cac9f7 |    | 0xe41d2d0300cac9fa |  |
|                                  |     |  +--------------------+    +--------------------+  |
|                                  |     |                                                    |
|  Default partition: pkey 0x7fff  |     |  Partition pz: pkey 0x1234                         |
|                                  |     |                                                    |
+----------------------------------+     +----------------------------------------------------+
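Partition configuration on the OpenSM node
After editing partitions.conf, send a HUP signal to the opensm process, as described in
https://community.mellanox.com/docs/DOC-2901
so that the new configuration takes effect.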
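The reload step could be scripted roughly as follows. This is only a sketch: the helper names opensm_pids and reload_opensm are made up for illustration, it assumes the daemon's process name is exactly "opensm" and that pgrep is installed, and it must run as root on the Subnet Manager node.

```python
import os
import signal
import subprocess


def opensm_pids():
    """Return the PIDs of running opensm processes (assumes pgrep is available)."""
    result = subprocess.run(["pgrep", "-x", "opensm"],
                            capture_output=True, text=True)
    return [int(pid) for pid in result.stdout.split()]


def reload_opensm(pids, send=os.kill):
    """Send SIGHUP to each PID so opensm re-reads its configuration.

    `send` is injectable so the function can be exercised without
    signalling a real process.
    """
    for pid in pids:
        send(pid, signal.SIGHUP)
    return len(pids)


if __name__ == "__main__":
    count = reload_opensm(opensm_pids())
    print(f"sent SIGHUP to {count} opensm process(es)")
```

On the IO node the same effect is obtained with a plain kill -HUP on the opensm PID; the script only wraps that step.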
The partitions.conf file
Syntax 1
[root@IO ~]# cat /etc/opensm/partitions.conf
pz=0x1234,indx0,ipoib,defmember=full:0xe41d2d0300cac9f6,0xe41d2d0300cac9f7,0xe41d2d0300cac9fa ;
Syntax 2
[root@IO ~]# cat /etc/opensm/partitions.conf
pz=0x1234,indx0,ipoib:0xe41d2d0300cac9f6=full,0xe41d2d0300cac9f7=full,0xe41d2d0300cac9fa=full ;
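To see that the two syntaxes describe the same partition, the entries can be run through a small parser. The sketch below is not OpenSM's own parser and handles only the subset of the grammar used in this article; parse_partition is a hypothetical helper, and it assumes that a member without an explicit membership falls back to defmember, or to limited when defmember is absent.

```python
def parse_partition(line):
    """Parse one partitions.conf entry into (name, pkey, {guid: membership}).

    Handles only the forms used in this article:
      name=pkey,flag,...,defmember=full:guid,guid ;
      name=pkey,flag,...:guid=full,guid=full ;
    """
    line = line.strip().rstrip(";").strip()
    head, _, member_part = line.partition(":")
    fields = [field.strip() for field in head.split(",")]
    name, _, pkey_text = fields[0].partition("=")

    # Membership applied to members that do not specify one themselves.
    default_membership = "limited"
    for flag in fields[1:]:
        if flag.startswith("defmember="):
            default_membership = flag.split("=", 1)[1]

    members = {}
    for item in member_part.split(","):
        item = item.strip()
        if not item:
            continue
        guid, _, membership = item.partition("=")
        members[guid.strip().lower()] = membership.strip() or default_membership
    return name.strip(), int(pkey_text, 16), members


syntax1 = ("pz=0x1234,indx0,ipoib,defmember=full:"
           "0xe41d2d0300cac9f6,0xe41d2d0300cac9f7,0xe41d2d0300cac9fa ;")
syntax2 = ("pz=0x1234,indx0,ipoib:0xe41d2d0300cac9f6=full,"
           "0xe41d2d0300cac9f7=full,0xe41d2d0300cac9fa=full ;")

if __name__ == "__main__":
    print(parse_partition(syntax1) == parse_partition(syntax2))
```

Both entries yield partition pz with pkey 0x1234 and the same three full members.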
Connectivity between nodes
Default partition
When partitions.conf exists, the following default partition entry is implied:
Default=0x7fff,indx0,ipoib:ALL=limited,SELF=full ;
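- Default partition: pkey=0x7fff
- The OpenSM node has full membership by default (SELF=full)
- When partitions.conf exists, all other members of the default partition are limited (ALL=limited)
As a result:
- H3 and H4 cannot communicate with each other, because two limited members of a partition cannot talk to each other
- IO can communicate with both H3 and H4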
PZ partition
It is configured explicitly with the following entry:
pz=0x1234,indx0,ipoib,defmember=full:0xe41d2d0300cac9f6,0xe41d2d0300cac9f7,0xe41d2d0300cac9fa ;
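- pz partition: pkey=0x1234
- Because indx0 is set, the pz pkey 0x1234 is placed at index 0 of the pkey table and becomes the default pkey
- Three full members are defined: the GUIDs of H1's two ports and of H2's port
As a result:
- H1 and H2 can communicate with each other
- H1/H2 cannot communicate with H3/H4
- IO cannot communicate with H1/H2 by default, because H1/H2 now default to 0x1234 while IO uses 0x7fff (SM SELF)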
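The connectivity results for both partitions follow from the InfiniBand pkey matching rule: the low 15 bits are the partition number, the top bit marks full membership, and two ports can exchange traffic only if the partition numbers match and at least one side is a full member. A minimal sketch of that rule (can_communicate is a name chosen here for illustration, and the pkey values are the ones this article's configuration implies):

```python
FULL_MEMBER_BIT = 0x8000   # top bit set means full membership
BASE_MASK = 0x7FFF         # low 15 bits identify the partition


def can_communicate(pkey_a, pkey_b):
    """Apply the pkey matching rule to the pkeys two ports are using."""
    same_partition = (pkey_a & BASE_MASK) == (pkey_b & BASE_MASK)
    one_side_full = bool((pkey_a | pkey_b) & FULL_MEMBER_BIT)
    return same_partition and one_side_full


DEFAULT_LIMITED = 0x7FFF                 # H1..H4 in the default partition
DEFAULT_FULL = 0x7FFF | FULL_MEMBER_BIT  # IO (SELF=full), i.e. 0xffff
PZ_FULL = 0x1234 | FULL_MEMBER_BIT       # H1/H2 in pz, i.e. 0x9234

if __name__ == "__main__":
    print("H3 <-> H4:", can_communicate(DEFAULT_LIMITED, DEFAULT_LIMITED))
    print("IO <-> H3:", can_communicate(DEFAULT_FULL, DEFAULT_LIMITED))
    print("H1 <-> H2:", can_communicate(PZ_FULL, PZ_FULL))
    print("H1 <-> H3:", can_communicate(PZ_FULL, DEFAULT_LIMITED))
    print("IO <-> H1 (default pkeys):", can_communicate(DEFAULT_FULL, PZ_FULL))
```

The last case fails only because each side uses its own default pkey; if H1 or H2 selects its limited 0x7fff entry instead, the rule allows traffic to IO.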
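Communication between the PZ partition and the OpenSM node via PKEY 0x7fff
Members of the PZ partition can communicate with the Subnet Manager, i.e. the IO node, by using the default partition's pkey.
Looking up the PKEY index
Look up the default partition's pkey 0x7fff on a PZ partition member; it is stored at index 19 of the pkey table: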
[root@h2 ~]# grep 7fff /sys/class/infiniband/mlx5_0/ports/1/pkeys/ -r
/sys/class/infiniband/mlx5_0/ports/1/pkeys/19:0x7fff
[root@h2 ~]# smpquery pkeytable 10
0: 0x9888 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
8: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
16: 0x0000 0x0000 0x0000 0x7fff 0x0000 0x0000 0x0000 0x0000
24: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
32: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
40: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
48: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
56: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
64: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
72: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
80: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
88: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
96: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
104: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
112: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
120: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
128 pkeys capacity for this port
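The same lookup can be done programmatically by walking the sysfs pkeys directory, which holds one file per table index. The following is a sketch under that assumption; find_pkey_index is a hypothetical helper, and it compares only the low 15 bits so that both the limited (0x7fff) and full (0xffff) forms of a pkey match.

```python
from pathlib import Path


def find_pkey_index(pkeys_dir, pkey):
    """Return the first pkey table index holding `pkey`, or None.

    `pkeys_dir` is a directory such as
    /sys/class/infiniband/mlx5_0/ports/1/pkeys, where each file is
    named after a table index and contains the pkey in hex.
    """
    wanted = pkey & 0x7FFF
    entries = sorted(Path(pkeys_dir).iterdir(), key=lambda p: int(p.name))
    for entry in entries:
        value = int(entry.read_text().strip(), 16)
        # 0x0000 marks an empty slot and never matches.
        if value != 0 and (value & 0x7FFF) == wanted:
            return int(entry.name)
    return None


if __name__ == "__main__":
    index = find_pkey_index("/sys/class/infiniband/mlx5_0/ports/1/pkeys", 0x7FFF)
    print("pkey 0x7fff index:", index)
```

The returned index is the value to pass to tools that take a pkey index rather than a pkey value.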
Connecting with PKEY 0x7fff and testing with ib_write_bw
Subnet Manager node
[root@IO ~]# ib_write_bw
************************************
* Waiting for client to connect... *
************************************
---------------------------------------------------------------------------------------
RDMA_Write BW Test
Dual-port : OFF Device : mlx4_0
Number of qps : 1 Transport type : IB
Connection type : RC Using SRQ : OFF
CQ Moderation : 100
Mtu : 2048[B]
Link type : IB
Max inline data : 0[B]
rdma_cm QPs : OFF
Data ex. method : Ethernet
---------------------------------------------------------------------------------------
local address: LID 0x01 QPN 0x02ad PSN 0x686c48 RKey 0x30010200 VAddr 0x007fd305410000
remote address: LID 0x0a QPN 0x00b5 PSN 0xecb0d6 RKey 0x0137a9 VAddr 0x007ff0caf50000
---------------------------------------------------------------------------------------
#bytes #iterations BW peak[MB/sec] BW average[MB/sec] MsgRate[Mpps]
65536 5000 6152.22 6152.04 0.098433
---------------------------------------------------------------------------------------
On node H2, add the --pkey_index=19 option to ib_write_bw to select the pkey
[root@h2 ~]# ib_write_bw -d mlx5_0 io --pkey_index=19
---------------------------------------------------------------------------------------
RDMA_Write BW Test
Dual-port : OFF Device : mlx5_0
Number of qps : 1 Transport type : IB
Connection type : RC Using SRQ : OFF
TX depth : 128
CQ Moderation : 100
Mtu : 2048[B]
Link type : IB
Max inline data : 0[B]
rdma_cm QPs : OFF
Data ex. method : Ethernet
---------------------------------------------------------------------------------------
local address: LID 0x0a QPN 0x00b5 PSN 0xecb0d6 RKey 0x0137a9 VAddr 0x007ff0caf50000
remote address: LID 0x01 QPN 0x02ad PSN 0x686c48 RKey 0x30010200 VAddr 0x007fd305410000
---------------------------------------------------------------------------------------
#bytes #iterations BW peak[MB/sec] BW average[MB/sec] MsgRate[Mpps]
Conflicting CPU frequency values detected: 1200.000000 != 2299.951000. CPU Frequency is not max.
65536 5000 6152.22 6152.04 0.098433
---------------------------------------------------------------------------------------
Bonding InfiniBand ports in the PZ partition
See https://community.mellanox.com/docs/DOC-2160
Configuration files
[root@h1 ~]# cat /etc/sysconfig/network-scripts/ifcfg-ib0
DEVICE=ib0
TYPE=InfiniBand
NM_CONTROLLED=no
ONBOOT=yes
MASTER=bbbbond0
SLAVE=yes
BOOTPROTO=none
PRIMARY=no
[root@h1 ~]# cat /etc/sysconfig/network-scripts/ifcfg-ib1
DEVICE=ib1
TYPE=InfiniBand
NM_CONTROLLED=no
ONBOOT=yes
MASTER=bbbbond0
SLAVE=yes
BOOTPROTO=none
PRIMARY=yes
[root@h1 ~]# cat /etc/sysconfig/network-scripts/ifcfg-bbbbond0
DEVICE=bbbbond0
IPADDR=66.66.66.80
NETMASK=255.255.255.0
USERCTL=no
BOOTPROTO=none
ONBOOT=yes
NM_CONTROLLED=no
BONDING_OPTS="mode=active-backup primary=ib0 miimon=100 updelay=100 downdelay=100"
MTU=2044
Verifying the configuration
[root@h1 ~]# cat /proc/net/bonding/bbbbond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
Bonding Mode: fault-tolerance (active-backup) (fail_over_mac active)
Primary Slave: None
Currently Active Slave: ib1
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 100
Down Delay (ms): 100
Slave Interface: ib1
MII Status: up
Speed: 56000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 20:00:18:8d:fe:80:00:00:00:00:00:00:e4:1d:2d:03:00:ca:c9:f7
Slave queue ID: 0
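The "Permanent HW addr" shown for the slave is a 20-byte IPoIB hardware address, and its last 8 bytes are the port GUID, so the bonding output can be cross-checked against the GUIDs listed in partitions.conf. A small sketch (guid_from_ipoib_hwaddr is a name introduced here for illustration):

```python
def guid_from_ipoib_hwaddr(hwaddr):
    """Extract the port GUID from a 20-byte IPoIB hardware address.

    The address is 4 bytes of flags/QPN followed by the 16-byte port GID;
    the low 8 bytes of the GID are the port GUID.
    """
    octets = hwaddr.strip().lower().split(":")
    if len(octets) != 20:
        raise ValueError(f"expected 20 octets, got {len(octets)}")
    return "0x" + "".join(octets[-8:])


# GUIDs configured as members of partition pz in partitions.conf.
PZ_MEMBERS = {
    "0xe41d2d0300cac9f6",
    "0xe41d2d0300cac9f7",
    "0xe41d2d0300cac9fa",
}

if __name__ == "__main__":
    addr = "20:00:18:8d:fe:80:00:00:00:00:00:00:e4:1d:2d:03:00:ca:c9:f7"
    guid = guid_from_ipoib_hwaddr(addr)
    print(guid, "in pz:", guid in PZ_MEMBERS)
```

For the output above, the active slave ib1 maps to the second of H1's two GUIDs in the pz entry.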