
Troubleshooting Ceph User Creation for OpenStack


Author: Maxwell Li
Date: 2017/01/13
Reproduction of any part of this article without the author's permission is prohibited. Please leave a comment if you would like to repost it.


Discovering the Problem

Around noon on January 10 (Beijing time), the nosdn-nofeature scenario could no longer boot an instance, and both Yardstick and Functest were failing on CI.

Locating the Problem

Step 1: Identify which patch introduced the bug

According to the CI build history, the last successful Yardstick run was on January 1 (CI time). Between December 31 and January 5 (Beijing time) the upstream compass-core project had mistakenly merged branch code into master, so no deployment was possible during that period. It follows that the bug was introduced by a patch merged between December 31 and January 10. Two patches were merged in that window:

Reverting Huang Xiangyu's patch "Fix instance can't get key bug" did not help; the problem remained. So the bug was almost certainly introduced by the "Yamllint test" patch. In other words, it was my own patch after all. The narrowing-down step can be sketched with plain git, as shown below.
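A sketch of that narrowing-down (the repository name, compass4nfv, and the revert target are assumptions for illustration, not taken from the original CI setup):

# List what was merged into master during the suspect window.
git -C compass4nfv log --oneline --since=2016-12-31 --until=2017-01-10
# Revert one candidate locally, redeploy, and check whether the failure persists.
git -C compass4nfv revert --no-edit <sha-of-"Fix instance can't get key bug">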

Step 2: Read the logs and narrow down the problem

Reading the compute node's nova-compute.log from the bottom up, the first ERROR is the following:

2017-01-12 17:41:34.659 7903 DEBUG oslo_messaging._drivers.amqpdriver [req-b3dbb34a-7a17-4079-9c49-5a8d50cf6c85 d0df0826ab3643be8ca6f89dff4ca8f7 5a347ea194a9424bb4eded4c2f72b404 - - -] CAST unique_id: 50e816393e10466f8cc170219703e204 NOTIFY exchange 'nova' topic 'notifications.error' _send /usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:432
2017-01-12 17:41:34.664 7903 ERROR nova.compute.manager [req-b3dbb34a-7a17-4079-9c49-5a8d50cf6c85 d0df0826ab3643be8ca6f89dff4ca8f7 5a347ea194a9424bb4eded4c2f72b404 - - -] [instance: 035a4604-ff63-455c-adc5-86ce6e440a41] Build of instance 035a4604-ff63-455c-adc5-86ce6e440a41 aborted: error opening image 035a4604-ff63-455c-adc5-86ce6e440a41_disk at snapshot None
2017-01-12 17:41:34.664 7903 ERROR nova.compute.manager [instance: 035a4604-ff63-455c-adc5-86ce6e440a41] Traceback (most recent call last):
2017-01-12 17:41:34.664 7903 ERROR nova.compute.manager [instance: 035a4604-ff63-455c-adc5-86ce6e440a41]   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1779, in _do_build_and_run_instance
2017-01-12 17:41:34.664 7903 ERROR nova.compute.manager [instance: 035a4604-ff63-455c-adc5-86ce6e440a41]     filter_properties)
2017-01-12 17:41:34.664 7903 ERROR nova.compute.manager [instance: 035a4604-ff63-455c-adc5-86ce6e440a41]   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1939, in _build_and_run_instance
2017-01-12 17:41:34.664 7903 ERROR nova.compute.manager [instance: 035a4604-ff63-455c-adc5-86ce6e440a41]     'create.error', fault=e)
2017-01-12 17:41:34.664 7903 ERROR nova.compute.manager [instance: 035a4604-ff63-455c-adc5-86ce6e440a41]   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
2017-01-12 17:41:34.664 7903 ERROR nova.compute.manager [instance: 035a4604-ff63-455c-adc5-86ce6e440a41]     self.force_reraise()
2017-01-12 17:41:34.664 7903 ERROR nova.compute.manager [instance: 035a4604-ff63-455c-adc5-86ce6e440a41]   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
2017-01-12 17:41:34.664 7903 ERROR nova.compute.manager [instance: 035a4604-ff63-455c-adc5-86ce6e440a41]     six.reraise(self.type_, self.value, self.tb)
2017-01-12 17:41:34.664 7903 ERROR nova.compute.manager [instance: 035a4604-ff63-455c-adc5-86ce6e440a41]   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1923, in _build_and_run_instance
2017-01-12 17:41:34.664 7903 ERROR nova.compute.manager [instance: 035a4604-ff63-455c-adc5-86ce6e440a41]     instance=instance)
2017-01-12 17:41:34.664 7903 ERROR nova.compute.manager [instance: 035a4604-ff63-455c-adc5-86ce6e440a41]   File "/usr/lib/python2.7/contextlib.py", line 35, in __exit__
2017-01-12 17:41:34.664 7903 ERROR nova.compute.manager [instance: 035a4604-ff63-455c-adc5-86ce6e440a41]     self.gen.throw(type, value, traceback)
2017-01-12 17:41:34.664 7903 ERROR nova.compute.manager [instance: 035a4604-ff63-455c-adc5-86ce6e440a41]   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2105, in _build_resources
2017-01-12 17:41:34.664 7903 ERROR nova.compute.manager [instance: 035a4604-ff63-455c-adc5-86ce6e440a41]     reason=six.text_type(exc))
2017-01-12 17:41:34.664 7903 ERROR nova.compute.manager [instance: 035a4604-ff63-455c-adc5-86ce6e440a41] BuildAbortException: Build of instance 035a4604-ff63-455c-adc5-86ce6e440a41 aborted: error opening image 035a4604-ff63-455c-adc5-86ce6e440a41_disk at snapshot None
2017-01-12 17:41:34.664 7903 ERROR nova.compute.manager [instance: 035a4604-ff63-455c-adc5-86ce6e440a41]
2017-01-12 17:41:34.666 7903 DEBUG nova.compute.manager [req-b3dbb34a-7a17-4079-9c49-5a8d50cf6c85 d0df0826ab3643be8ca6f89dff4ca8f7 5a347ea194a9424bb4eded4c2f72b404 - - -] [instance: 035a4604-ff63-455c-adc5-86ce6e440a41] Deallocating network for instance _deallocate_network /usr/lib/python2.7/dist-packages/nova/compute/manager.py:1659

Look at the function around line 2105 of /usr/lib/python2.7/dist-packages/nova/compute/manager.py:

2013     def _build_resources(self, context, instance, requested_networks,
2014                          security_groups, image_meta, block_device_mapping):
2015         resources = {}
2016         network_info = None
2017         try:
2018             LOG.debug('Start building networks asynchronously for instance.',
2019                       instance=instance)
2020             network_info = self._build_networks_for_instance(context, instance,
2021                     requested_networks, security_groups)
2022             resources['network_info'] = network_info
2023         except (exception.InstanceNotFound,
2024                 exception.UnexpectedDeletingTaskStateError):
2025             raise
2026         except exception.UnexpectedTaskStateError as e:
2027             raise exception.BuildAbortException(instance_uuid=instance.uuid,
2028                     reason=e.format_message())
2029         except Exception:
2030             # Because this allocation is async any failures are likely to occur
2031             # when the driver accesses network_info during spawn().
2032             LOG.exception(_LE('Failed to allocate network(s)'),
2033                           instance=instance)
2034             msg = _('Failed to allocate the network(s), not rescheduling.')
2035             raise exception.BuildAbortException(instance_uuid=instance.uuid,
2036                     reason=msg)
2037 
2038         try:
2039             # Verify that all the BDMs have a device_name set and assign a
2040             # default to the ones missing it with the help of the driver.
2041             self._default_block_device_names(instance, image_meta,
2042                                              block_device_mapping)
2043 
2044             LOG.debug('Start building block device mappings for instance.',
2045                       instance=instance)
2046             instance.vm_state = vm_states.BUILDING
2047             instance.task_state = task_states.BLOCK_DEVICE_MAPPING
2048             instance.save()
2049 
2050             block_device_info = self._prep_block_device(context, instance,
2051                     block_device_mapping)
2052             resources['block_device_info'] = block_device_info
2053         except (exception.InstanceNotFound,
2054                 exception.UnexpectedDeletingTaskStateError):
2055             with excutils.save_and_reraise_exception():
2056                 # Make sure the async call finishes
2057                 if network_info is not None:
2058                     network_info.wait(do_raise=False)
2059         except (exception.UnexpectedTaskStateError,
2060                 exception.VolumeLimitExceeded,
2061                 exception.InvalidBDM) as e:
2062             # Make sure the async call finishes
2063             if network_info is not None:
2064                 network_info.wait(do_raise=False)
2065             raise exception.BuildAbortException(instance_uuid=instance.uuid,
2066                     reason=e.format_message())
2067         except Exception:
2068             LOG.exception(_LE('Failure prepping block device'),
2069                     instance=instance)
2070             # Make sure the async call finishes
2071             if network_info is not None:
2072                 network_info.wait(do_raise=False)
2073             msg = _('Failure prepping block device.')
2074             raise exception.BuildAbortException(instance_uuid=instance.uuid,
2075                     reason=msg)
2076 
2077         try:
2078             yield resources
2079         except Exception as exc:
2080             with excutils.save_and_reraise_exception() as ctxt:
2081                 if not isinstance(exc, (
2082                         exception.InstanceNotFound,
2083                         exception.UnexpectedDeletingTaskStateError)):
2084                     LOG.exception(_LE('Instance failed to spawn'),
2085                                   instance=instance)
2086                 # Make sure the async call finishes
2087                 if network_info is not None:
2088                     network_info.wait(do_raise=False)
2089                 # if network_info is empty we're likely here because of
2090                 # network allocation failure. Since nothing can be reused on
2091                 # rescheduling it's better to deallocate network to eliminate
2092                 # the chance of orphaned ports in neutron
2093                 deallocate_networks = False if network_info else True
2094                 try:
2095                     self._shutdown_instance(context, instance,
2096                             block_device_mapping, requested_networks,
2097                             try_deallocate_networks=deallocate_networks)
2098                 except Exception as exc2:
2099                     ctxt.reraise = False
2100                     LOG.warning(_LW('Could not clean up failed build,'
2101                                     ' not rescheduling. Error: %s'),
2102                                 six.text_type(exc2))
2103                     raise exception.BuildAbortException(
2104                             instance_uuid=instance.uuid,
2105                             reason=six.text_type(exc))
2106

The exception surfaces at the yield resources statement, which means it was actually raised by the code that consumes these resources (the spawn path) and only propagated back into _build_resources. There must therefore be an earlier error in nova-compute.log, and indeed, scrolling further up, the following ERROR appears:

2017-01-12 17:41:32.926 7903 INFO nova.virt.libvirt.driver [req-b3dbb34a-7a17-4079-9c49-5a8d50cf6c85 d0df0826ab3643be8ca6f89dff4ca8f7 5a347ea194a9424bb4eded4c2f72b404 - - -] [instance: 035a4604-ff63-455c-adc5-86ce6e440a41] Creating image
2017-01-12 17:41:32.940 7903 ERROR nova.virt.libvirt.storage.rbd_utils [req-b3dbb34a-7a17-4079-9c49-5a8d50cf6c85 d0df0826ab3643be8ca6f89dff4ca8f7 5a347ea194a9424bb4eded4c2f72b404 - - -] error opening rbd image 035a4604-ff63-455c-adc5-86ce6e440a41_disk
2017-01-12 17:41:32.940 7903 ERROR nova.virt.libvirt.storage.rbd_utils Traceback (most recent call last):
2017-01-12 17:41:32.940 7903 ERROR nova.virt.libvirt.storage.rbd_utils   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/storage/rbd_utils.py", line 75, in __init__
2017-01-12 17:41:32.940 7903 ERROR nova.virt.libvirt.storage.rbd_utils     read_only=read_only))
2017-01-12 17:41:32.940 7903 ERROR nova.virt.libvirt.storage.rbd_utils   File "rbd.pyx", line 1042, in rbd.Image.__init__ (/build/ceph-XmVvyr/ceph-10.2.2/src/build/rbd.c:9862)
2017-01-12 17:41:32.940 7903 ERROR nova.virt.libvirt.storage.rbd_utils PermissionError: error opening image 035a4604-ff63-455c-adc5-86ce6e440a41_disk at snapshot None
2017-01-12 17:41:32.940 7903 ERROR nova.virt.libvirt.storage.rbd_utils
2017-01-12 17:41:32.943 7903 ERROR nova.compute.manager [req-b3dbb34a-7a17-4079-9c49-5a8d50cf6c85 d0df0826ab3643be8ca6f89dff4ca8f7 5a347ea194a9424bb4eded4c2f72b404 - - -] [instance: 035a4604-ff63-455c-adc5-86ce6e440a41] Instance failed to spawn
2017-01-12 17:41:32.943 7903 ERROR nova.compute.manager [instance: 035a4604-ff63-455c-adc5-86ce6e440a41] Traceback (most recent call last):
2017-01-12 17:41:32.943 7903 ERROR nova.compute.manager [instance: 035a4604-ff63-455c-adc5-86ce6e440a41]   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2078, in _build_resources
2017-01-12 17:41:32.943 7903 ERROR nova.compute.manager [instance: 035a4604-ff63-455c-adc5-86ce6e440a41]     yield resources
2017-01-12 17:41:32.943 7903 ERROR nova.compute.manager [instance: 035a4604-ff63-455c-adc5-86ce6e440a41]   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1920, in _build_and_run_instance
2017-01-12 17:41:32.943 7903 ERROR nova.compute.manager [instance: 035a4604-ff63-455c-adc5-86ce6e440a41]     block_device_info=block_device_info)
2017-01-12 17:41:32.943 7903 ERROR nova.compute.manager [instance: 035a4604-ff63-455c-adc5-86ce6e440a41]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 2571, in spawn
2017-01-12 17:41:32.943 7903 ERROR nova.compute.manager [instance: 035a4604-ff63-455c-adc5-86ce6e440a41]     admin_pass=admin_password)
2017-01-12 17:41:32.943 7903 ERROR nova.compute.manager [instance: 035a4604-ff63-455c-adc5-86ce6e440a41]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 2975, in _create_image
2017-01-12 17:41:32.943 7903 ERROR nova.compute.manager [instance: 035a4604-ff63-455c-adc5-86ce6e440a41]     fallback_from_host)
2017-01-12 17:41:32.943 7903 ERROR nova.compute.manager [instance: 035a4604-ff63-455c-adc5-86ce6e440a41]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 3075, in _create_and_inject_local_root
2017-01-12 17:41:32.943 7903 ERROR nova.compute.manager [instance: 035a4604-ff63-455c-adc5-86ce6e440a41]     instance, size, fallback_from_host)
2017-01-12 17:41:32.943 7903 ERROR nova.compute.manager [instance: 035a4604-ff63-455c-adc5-86ce6e440a41]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 6547, in _try_fetch_image_cache
2017-01-12 17:41:32.943 7903 ERROR nova.compute.manager [instance: 035a4604-ff63-455c-adc5-86ce6e440a41]     size=size)
2017-01-12 17:41:32.943 7903 ERROR nova.compute.manager [instance: 035a4604-ff63-455c-adc5-86ce6e440a41]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/imagebackend.py", line 216, in cache
2017-01-12 17:41:32.943 7903 ERROR nova.compute.manager [instance: 035a4604-ff63-455c-adc5-86ce6e440a41]     if not self.exists() or not os.path.exists(base):
2017-01-12 17:41:32.943 7903 ERROR nova.compute.manager [instance: 035a4604-ff63-455c-adc5-86ce6e440a41]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/imagebackend.py", line 835, in exists
2017-01-12 17:41:32.943 7903 ERROR nova.compute.manager [instance: 035a4604-ff63-455c-adc5-86ce6e440a41]     return self.driver.exists(self.rbd_name)
2017-01-12 17:41:32.943 7903 ERROR nova.compute.manager [instance: 035a4604-ff63-455c-adc5-86ce6e440a41]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/storage/rbd_utils.py", line 291, in exists
2017-01-12 17:41:32.943 7903 ERROR nova.compute.manager [instance: 035a4604-ff63-455c-adc5-86ce6e440a41]     read_only=True):
2017-01-12 17:41:32.943 7903 ERROR nova.compute.manager [instance: 035a4604-ff63-455c-adc5-86ce6e440a41]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/storage/rbd_utils.py", line 83, in __init__
2017-01-12 17:41:32.943 7903 ERROR nova.compute.manager [instance: 035a4604-ff63-455c-adc5-86ce6e440a41]     driver._disconnect_from_rados(client, ioctx)
2017-01-12 17:41:32.943 7903 ERROR nova.compute.manager [instance: 035a4604-ff63-455c-adc5-86ce6e440a41]   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
2017-01-12 17:41:32.943 7903 ERROR nova.compute.manager [instance: 035a4604-ff63-455c-adc5-86ce6e440a41]     self.force_reraise()
2017-01-12 17:41:32.943 7903 ERROR nova.compute.manager [instance: 035a4604-ff63-455c-adc5-86ce6e440a41]   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
2017-01-12 17:41:32.943 7903 ERROR nova.compute.manager [instance: 035a4604-ff63-455c-adc5-86ce6e440a41]     six.reraise(self.type_, self.value, self.tb)
2017-01-12 17:41:32.943 7903 ERROR nova.compute.manager [instance: 035a4604-ff63-455c-adc5-86ce6e440a41]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/storage/rbd_utils.py", line 75, in __init__
2017-01-12 17:41:32.943 7903 ERROR nova.compute.manager [instance: 035a4604-ff63-455c-adc5-86ce6e440a41]     read_only=read_only))
2017-01-12 17:41:32.943 7903 ERROR nova.compute.manager [instance: 035a4604-ff63-455c-adc5-86ce6e440a41]   File "rbd.pyx", line 1042, in rbd.Image.__init__ (/build/ceph-XmVvyr/ceph-10.2.2/src/build/rbd.c:9862)
2017-01-12 17:41:32.943 7903 ERROR nova.compute.manager [instance: 035a4604-ff63-455c-adc5-86ce6e440a41] PermissionError: error opening image 035a4604-ff63-455c-adc5-86ce6e440a41_disk at snapshot None
2017-01-12 17:41:32.943 7903 ERROR nova.compute.manager [instance: 035a4604-ff63-455c-adc5-86ce6e440a41]
2017-01-12 17:41:33.130 7903 DEBUG oslo_messaging._drivers.amqpdriver [-] CALL msg_id: 6036c3f61c304fb69832d3c9e1f5949f exchange 'nova' topic 'conductor' _send /usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:448

It is safe to conclude that the ERROR is caused by a misconfiguration of one of the components involved: glance, cinder, ceph, or nova.

Step 3: Verify the configuration files of each component

The Yamllint test patch changed a lot of the ceph-related code. Since the ceph configuration options are almost all applied with sed commands, several long sed one-liners had been split up in order to pass the Yamllint test. A small part of that change:

diff --git a/deploy/adapters/ansible/roles/ceph-openstack/tasks/ceph_openstack_conf.yml b/deploy/adapters/ansible/roles/ceph-openstack/tasks/ceph_openstack_conf.yml
index 0496ba9..8451526 100755
--- a/deploy/adapters/ansible/roles/ceph-openstack/tasks/ceph_openstack_conf.yml
+++ b/deploy/adapters/ansible/roles/ceph-openstack/tasks/ceph_openstack_conf.yml
@@ -12,29 +12,113 @@
   when: inventory_hostname in groups['controller']
   tags:
     - ceph_conf_glance
-  ignore_errors: True
+  ignore_errors: "True"
 
 - name: modify glance-api.conf for ceph
-  shell: sed -i 's/^\(default_store\).*/\1 = rbd/g' /etc/glance/glance-api.conf && sed -i '/^\[glance
+  shell: |
+    sed -i 's/^\(default_store\).*/\1 = rbd/g' /etc/glance/glance-api.conf;
+    sed -i '/^\[glance_store/a rbd_store_pool = images' \
+        /etc/glance/glance-api.conf;
+    sed -i '/^\[glance_store/a rbd_store_user = glance' \
+        /etc/glance/glance-api.conf;
+    sed -i '/^\[glance_store/a rbd_store_ceph_conf = /etc/ceph/ceph.conf' \
+        /etc/glance/glance-api.conf;
+    sed -i '/^\[glance_store/a rbd_store_chunk_size = 8' \
+        /etc/glance/glance-api.conf;
+    sed -i '/^\[glance_store/a show_image_direct_url=True' \
+        /etc/glance/glance-api.conf;
   when: inventory_hostname in groups['controller']
   tags:
     - ceph_conf_glance
 
-- name: restart glance
-  shell: rm -f /var/log/glance/api.log && chown -R glance:glance /var/log/glance && service {{ glance
+- name: remove glance-api log
+  shell: |
+    rm -f /var/log/glance/api.log;
+    chown -R glance:glance /var/log/glance;
+  when: inventory_hostname in groups['controller']
+  tags:
+    - ceph_conf_glance
+  ignore_errors: "True"
+
+- name: restart glance service
+  shell: service {{ glance_service }} restart
+  register: result
+  until: result.rc == 0
+  retries: 10
+  delay: 3
   when: inventory_hostname in groups['controller']
   tags:
     - ceph_conf_glance
-  ignore_errors: True

To check this, I pulled every configuration file of the nova, glance, cinder, and ceph components from the controller and compute nodes and compared them with those of a pre-Yamllint-test deployment that had deployed correctly and could boot instances. No differences were found. One way to script such a comparison is sketched below.
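A sketch of the comparison (the directory holding the known-good copies, /root/good-configs, is hypothetical):

# Compare the rbd/ceph related options of the live configs against copies
# saved from a known-good deployment under /root/good-configs.
for f in /etc/nova/nova.conf /etc/glance/glance-api.conf /etc/cinder/cinder.conf; do
    diff <(grep -E 'rbd|ceph' "$f") <(grep -E 'rbd|ceph' "/root/good-configs$f")
done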

Step 4: Check whether glance works correctly

Upload the cirros image and download it again, then compare the MD5 checksums. Glance is working correctly:

root@host1:~# wget http://download.cirros-cloud.net/0.3.4/cirros-0.3.4-x86_64-disk.img
root@host1:~# glance image-create --name "cirros" --file cirros-0.3.4-x86_64-disk.img --disk-format qcow2 --container-format bare
root@host1:~# glance image-download 547abb03-3dca-4c31-815c-3f2063b50ebb --file cirros.img
root@host1:~# md5sum cirros-0.3.4-x86_64-disk.img 
ee1eca47dc88f4879d8a229cc70a07c6  cirros-0.3.4-x86_64-disk.img
root@host1:~# md5sum cirros.img 
ee1eca47dc88f4879d8a229cc70a07c6  cirros.img

Step 5: Analyze the errors in nova-compute.log again and read the code

This step was completed with Liang's help.
Look at /usr/lib/python2.7/dist-packages/nova/virt/libvirt/imagebackend.py:


from nova.virt.libvirt.storage import rbd_utils

class Rbd(Image):

    SUPPORTS_CLONE = True

    def __init__(self, instance=None, disk_name=None, path=None, **kwargs):
        if not CONF.libvirt.images_rbd_pool:
            raise RuntimeError(_('You should specify'
                                 ' images_rbd_pool'
                                 ' flag to use rbd images.'))

        if path:
            try:
                self.rbd_name = path.split('/')[1]
            except IndexError:
                raise exception.InvalidDevicePath(path=path)
        else:
            self.rbd_name = '%s_%s' % (instance.uuid, disk_name)

        self.pool = CONF.libvirt.images_rbd_pool
        self.rbd_user = CONF.libvirt.rbd_user
        self.ceph_conf = CONF.libvirt.images_rbd_ceph_conf

        path = 'rbd:%s/%s' % (self.pool, self.rbd_name)
        if self.rbd_user:
            path += ':id=' + self.rbd_user
        if self.ceph_conf:
            path += ':conf=' + self.ceph_conf

        super(Rbd, self).__init__(path, "block", "rbd", is_block_dev=False)

        self.driver = rbd_utils.RBDDriver(
            pool=self.pool,
            ceph_conf=self.ceph_conf,
            rbd_user=self.rbd_user)

    def exists(self):
        return self.driver.exists(self.rbd_name)

The concrete values of the configuration options used in the code above can be found in /etc/nova/nova-compute.conf:

[libvirt]
images_rbd_pool = vms
images_rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_user = cinder
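These values mean nova opens images in the vms pool as the cinder ceph user. The same access can also be probed directly with the rbd CLI (a sketch, not taken from the original session):

# List the vms pool with the same identity nova uses; a broken client.cinder
# would be expected to fail here with a permission error as well.
rbd ls -p vms --id cinder --conf /etc/ceph/ceph.conf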

Reproduce the error by hand:

root@host4:~# python
>>> from nova.virt.libvirt.storage import rbd_utils
>>> driver = rbd_utils.RBDDriver(pool="vms", ceph_conf="/etc/ceph/ceph.conf", rbd_user="cinder")
>>> driver.exists("c05cf0ee-6f7f-4b6e-8009-a59ed9a4564a_disk")
ERROR:nova.virt.libvirt.storage.rbd_utils:error opening rbd image c05cf0ee-6f7f-4b6e-8009-a59ed9a4564a_disk
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/storage/rbd_utils.py", line 75, in __init__
    read_only=read_only))
  File "rbd.pyx", line 1042, in rbd.Image.__init__ (/build/ceph-XmVvyr/ceph-10.2.2/src/build/rbd.c:9862)
PermissionError: error opening image c05cf0ee-6f7f-4b6e-8009-a59ed9a4564a_disk at snapshot None
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/storage/rbd_utils.py", line 291, in exists
    read_only=True):
  File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/storage/rbd_utils.py", line 83, in __init__
    driver._disconnect_from_rados(client, ioctx)
  File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
    self.force_reraise()
  File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
    six.reraise(self.type_, self.value, self.tb)
  File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/storage/rbd_utils.py", line 75, in __init__
    read_only=read_only))
  File "rbd.pyx", line 1042, in rbd.Image.__init__ (/build/ceph-XmVvyr/ceph-10.2.2/src/build/rbd.c:9862)
rbd.PermissionError: error opening image c05cf0ee-6f7f-4b6e-8009-a59ed9a4564a_disk at snapshot None

On host3, try the same RBD access as the glance user:

root@host3:~# python
>>> from nova.virt.libvirt.storage import rbd_utils

>>> driver = rbd_utils.RBDDriver(pool="images", ceph_conf="/etc/ceph/ceph.conf", rbd_user="glance")
>>> driver.exists("c05cf0ee-6f7f-4b6e-8009-a59ed9a4564a_disk")
False

Analyze deploy/adapters/ansible/roles/ceph-openstack/tasks/ceph_openstack_pre.yml, then set up a test user by hand:

ceph osd pool create testc 50
ceph auth get-or-create client.testc mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=images, allow rwx pool=volumes, allow rwx pool=vms'
ceph auth get-or-create client.testc | tee /etc/ceph/ceph.client.testc.keyring && chown glance:glance /etc/ceph/ceph.client.testc.keyring

root@host3:~# python
>>> from nova.virt.libvirt.storage import rbd_utils
>>> driver = rbd_utils.RBDDriver(pool="vms", ceph_conf="/etc/ceph/ceph.conf", rbd_user="testc")
>>> driver.exists("c05cf0ee-6f7f-4b6e-8009-a59ed9a4564a_disk")
False
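So a user created by hand with the intended capabilities can reach the vms pool, while client.cinder cannot. A natural cross-check at this point is to look at the capabilities Ceph actually stored for the OpenStack users (a sketch; it assumes the admin keyring is available on the node):

# Dump the key and capability strings for the two users created by the
# playbook; the osd caps of client.cinder are the ones to scrutinise.
ceph auth get client.cinder
ceph auth get client.glance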

Step 6: Review the ceph-related code again

Going through the change history, I found that one line wrap had been handled incorrectly:

diff --git a/deploy/adapters/ansible/roles/ceph-openstack/tasks/ceph_openstack_pre.yml b/deploy/adapters/ansible/roles/ceph-openstack/tasks/ceph_openstack_pre.yml
index ece4154..3ff9df4 100755
--- a/deploy/adapters/ansible/roles/ceph-openstack/tasks/ceph_openstack_pre.yml
+++ b/deploy/adapters/ansible/roles/ceph-openstack/tasks/ceph_openstack_pre.yml
@@ -62,15 +62,26 @@
   when: inventory_hostname in groups['ceph_adm']
 
 - name: create ceph users for openstack
-  shell: ceph auth get-or-create client.cinder mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rwx pool=vms, allow rx pool=images' && ceph auth get-or-create client.glance mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=images'
+  shell: |
+    ceph auth get-or-create client.cinder mon 'allow r' osd \
+        'allow class-read object_prefix rbd_children, allow rwx pool=volumes, \
+        allow rwx pool=vms, allow rx pool=images';
+    ceph auth get-or-create client.glance mon 'allow r' osd \
+        'allow class-read object_prefix rbd_children, allow rwx pool=images';
   when: inventory_hostname in groups['ceph_adm']

The line break sits inside the quoted capability string, so the command that reaches ceph is no longer the intended one.
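Inside single quotes the shell does not treat a backslash-newline as a line continuation: the backslash, the newline, and the following indentation all become part of the string, so client.cinder is created with a mangled osd capability string instead of one granting rwx on the vms pool. A minimal sketch of the quoting behaviour (not from the original session):

# Outside quotes, backslash-newline is a continuation; inside single quotes
# it is kept literally, indentation included.
printf '%s\n' 'allow rwx pool=volumes, \
    allow rwx pool=vms'
# Prints:
# allow rwx pool=volumes, \
#     allow rwx pool=vms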

Step 7: Submit a patch to fix the bug

To still pass the Yamllint test, the only option was to exempt this one line from the line-length rule, with the following change:

diff --git a/deploy/adapters/ansible/roles/ceph-openstack/tasks/ceph_openstack_pre.yml b/deploy/adapters/ansible/roles/ceph-openstack/tasks/ceph_openstack_pre.yml
index 3ff9df4..a9eb81a 100755
--- a/deploy/adapters/ansible/roles/ceph-openstack/tasks/ceph_openstack_pre.yml
+++ b/deploy/adapters/ansible/roles/ceph-openstack/tasks/ceph_openstack_pre.yml
@@ -61,14 +61,15 @@
     - vms
   when: inventory_hostname in groups['ceph_adm']
 
+# yamllint disable rule:line-length
 - name: create ceph users for openstack
   shell: |
     ceph auth get-or-create client.cinder mon 'allow r' osd \
-        'allow class-read object_prefix rbd_children, allow rwx pool=volumes, \
-        allow rwx pool=vms, allow rx pool=images';
+        'allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rwx pool=vms, allow rx pool=images';
     ceph auth get-or-create client.glance mon 'allow r' osd \
         'allow class-read object_prefix rbd_children, allow rwx pool=images';
   when: inventory_hostname in groups['ceph_adm']
+# yamllint enable rule:line-length

After testing and verifying the change locally, I submitted the patch.
Patch URL:

Why the Bug Was Introduced

  • CI verify had dropped the Functest run, so the deployed OpenStack was not validated at verify time.
  • The Yamllint test patch changed a large amount of code, in particular the line wrapping of ansible playbooks, and wrapping lines in ansible is very easy to get wrong (see the sketch after this list).
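For long shell one-liners in an ansible playbook, a YAML folded block scalar is one way to keep every source line short without backslash continuations: YAML joins the folded lines with single spaces before the shell ever sees the command, so quoting is unaffected. A sketch, not the patch that was actually submitted:

# The folded scalar (>-) turns the four wrapped lines into one shell command.
- name: create ceph users for openstack
  shell: >-
    ceph auth get-or-create client.cinder mon 'allow r'
    osd 'allow class-read object_prefix rbd_children,
    allow rwx pool=volumes, allow rwx pool=vms, allow rx pool=images'
  when: inventory_hostname in groups['ceph_adm']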

Lessons Learned

  1. CI verify must add the relevant tests as soon as possible, for example the Functest vping test case.
  2. I need a deeper understanding of how ceph is wired into OpenStack.
  3. I need to improve my ability to read logs and locate problems.
  4. For patches that change a large amount of code, local end-to-end testing is not enough; a full Yardstick and Functest run should be performed as well.
