Troubleshooting: creating a volume from a volume snapshot fails on NetApp (NFS): Volum

Author: 余亚飞 | Published: 2020-03-08 12:02

    I. Background

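    Cinder is connected to a NetApp (NFS) backend. Creating volumes and snapshots works, but creating a volume from a volume snapshot fails with the error: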

    Volume xxx could not be created on shares.
    

    II. Troubleshooting process

    1. The log for creating a volume from the volume snapshot is as follows:
    2020-03-06 12:28:57.875 2290124 INFO cinder.volume.flows.manager.create_volume [req-fd7cb0cc-5000-4629-a040-8353c9d780a7 6d612f674ec84cb28ed4a6a25b1e5d8a 92c2b6ef30d4446bbc0b6f4c1ef91775 - - -] Volume eadb07ab-869f-4abb-b617-06162cd71ae6: being created as snap with specification: {'status': u'creating', 'volume_size': 40, 'volume_name': 'volume-eadb07ab-869f-4abb-b617-06162cd71ae6', 'snapshot_id': '4a0f0f68-9998-4271-9fff-cfcb4646de33'}
    2020-03-06 12:29:44.365 2290124 WARNING cinder.volume.drivers.netapp.dataontap.nfs_base [req-fd7cb0cc-5000-4629-a040-8353c9d780a7 6d612f674ec84cb28ed4a6a25b1e5d8a 92c2b6ef30d4446bbc0b6f4c1ef91775 - - -] Discover file retries exhausted.
    2020-03-06 12:29:44.367 2290124 ERROR cinder.volume.drivers.netapp.dataontap.nfs_base [req-fd7cb0cc-5000-4629-a040-8353c9d780a7 6d612f674ec84cb28ed4a6a25b1e5d8a 92c2b6ef30d4446bbc0b6f4c1ef91775 - - -] Exception creating volume eadb07ab-869f-4abb-b617-06162cd71ae6 from source snapshot-4a0f0f68-9998-4271-9fff-cfcb4646de33 on share 172.190.68.60:/DEV_R1_8200_C01_SVM_SAS_vol1.
    
    2. The third log line is an error raised by the NetApp driver (drivers.netapp.dataontap.nfs_base). The Traceback of the failure shows the following error message:
    VolumeBackendAPIException: Bad or unexpected response from the storage volume backend API: Volume eadb07ab-869f-4abb-b617-06162cd71ae6 could not be created on shares.
    
    3. The Traceback points to File "/usr/lib/python2.7/site-packages/cinder/volume/drivers/netapp/dataontap/nfs_base.py", line 175.
      In the NetApp driver code, the error occurs because self._discover_file_till_timeout(path) returns False:
        def _clone_with_extension_check(self, source, destination_volume):
            source_size = source['size']
            source_id = source['id']
            source_name = source['name']
            destination_volume_size = destination_volume['size']
            self._clone_backing_file_for_volume(source_name,
                                                destination_volume['name'],
                                                source_id)
            path = self.local_path(destination_volume)
            if self._discover_file_till_timeout(path):
                self._set_rw_permissions(path)
                if destination_volume_size != source_size:
                    try:
                        self.extend_volume(destination_volume,
                                           destination_volume_size)
                    except Exception:
                        LOG.error(_LE("Resizing %s failed. Cleaning "
                                      "volume."), destination_volume['name'])
                        self._cleanup_volume_on_failure(destination_volume)
                        raise exception.CinderException(
                            _("Resizing clone %s failed.")
                            % destination_volume['name'])
            else:
                raise exception.CinderException(_("NFS file %s not discovered.")
                                                % destination_volume['name'])
    
    4. Looking further at _discover_file_till_timeout, the function fails because it cannot find the path of the newly created volume. The log message (Discover file retries exhausted.) confirms this.
    def _discover_file_till_timeout(self, path, timeout=45):
        """Checks if file size at path is equal to size."""
        # Sometimes nfs takes time to discover file
        # Retrying in case any unexpected situation occurs
        retry_seconds = timeout
        sleep_interval = 2
        while True:
            if os.path.exists(path):
                return True
            else:
                if retry_seconds <= 0:
                    LOG.warning(_LW('Discover file retries exhausted.'))
                    return False
                else:
                    time.sleep(sleep_interval)
                    retry_seconds -= sleep_interval
    
    5. However, after logging in to the environment, the volume file does exist under the corresponding path.


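    6. After adding debug logging, it turned out that although the volume path exists in the environment, os.path.exists returns False when called from the driver code but True when run manually. The official documentation for os.path.exists(path) says: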

    os.path.exists(path)
    Return True if path refers to an existing path. Returns False for broken symbolic links. On some platforms, this function may return False if permission is not granted to execute os.stat() on the requested file, even if the path physically exists.
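Because os.path.exists collapses every failure of os.stat() into False, it can help to call os.stat() directly and look at the error code. The following is a minimal sketch in Python 3 (the article's environment runs Python 2.7); the helper name explain_exists is hypothetical and is not part of Cinder or the NetApp driver.

```python
import errno
import os


def explain_exists(path):
    """Classify why os.path.exists(path) would return True or False.

    Hypothetical diagnostic helper, not part of Cinder.
    """
    try:
        os.stat(path)
    except OSError as exc:
        if exc.errno == errno.ENOENT:
            # The path really is absent.
            return "missing"
        if exc.errno == errno.EACCES:
            # The path may exist, but the current user may not stat it.
            return "permission denied"
        return "error: " + os.strerror(exc.errno)
    return "exists"
```

Running such a check as the same user as the service should show whether a False result means "missing" or "permission denied".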
    

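    7. The mount directory had permissions 666, so the suspicion was that the cinder user running the code lacked sufficient permission. Mode 666 has no execute (x) bit, so a non-root user cannot traverse the directory and os.stat() on the files inside it fails.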
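One way to confirm this kind of suspicion is to walk the ancestors of the volume path and report the first directory the current user cannot search. This is a sketch under assumptions: the helper name first_unsearchable_dir is hypothetical, and since os.access checks the real user ID of the process, it has to be run as the service user (cinder in this article) to be meaningful.

```python
import os


def first_unsearchable_dir(path):
    """Return the first ancestor directory of path that the current user
    cannot search (missing x permission, or the directory does not exist).
    Return None if every ancestor is searchable.

    Hypothetical diagnostic helper, not part of Cinder.
    """
    parent = os.path.dirname(os.path.abspath(path))
    ancestors = []
    while True:
        ancestors.append(parent)
        up = os.path.dirname(parent)
        if up == parent:  # reached the filesystem root
            break
        parent = up
    # Check from the root downwards so the topmost blocker is reported.
    for directory in reversed(ancestors):
        if not os.access(directory, os.X_OK):
            return directory
    return None
```

For the failure described here, the expectation would be that the NFS mount directory is the one returned.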


    8. After adding the execute permission (x) to the NetApp (NFS) mount directory, creating a volume from a snapshot worked again.
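The article does not show the exact command or the final mode that was used. As an illustration only, the sketch below adds the x bit for user, group and other, roughly what chmod a+x does; the helper name add_search_permission is hypothetical. On an NFS export the effective permissions may also be governed on the storage side, and granting x to everyone may be broader than a given deployment needs.

```python
import os
import stat


def add_search_permission(directory):
    """Add the x (search) bit for user, group and other; return the new mode.

    Hypothetical helper; the mode actually used in the article is not stated.
    """
    current = stat.S_IMODE(os.stat(directory).st_mode)
    wanted = current | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    if wanted != current:
        os.chmod(directory, wanted)
    return wanted
```

Applied to a directory with mode 666, this yields mode 777.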

    III. Linux permissions

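    r (Read, value 4): for a file, permission to read its contents; for a directory, permission to list its entries.
    
    w (Write, value 2): for a file, permission to add to or modify its contents; for a directory, permission to create, delete, and rename files inside it (this also requires x).
    
    x (eXecute, value 1): for a file, permission to execute it; for a directory, permission to enter it and access the files inside.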
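To connect these values with what ls -l prints, the octal modes used in this article can be rendered with the standard stat module. A small sketch in Python 3; describe_mode is a hypothetical helper name.

```python
import stat


def describe_mode(octal_mode, is_dir=True):
    """Render an octal permission value the way `ls -l` displays it."""
    file_type = stat.S_IFDIR if is_dir else stat.S_IFREG
    return stat.filemode(file_type | octal_mode)


print(describe_mode(0o666))                # drw-rw-rw-
print(describe_mode(0o751))                # drwxr-x--x
print(describe_mode(0o666, is_dir=False))  # -rw-rw-rw-
```

Each octal digit is the sum of r (4), w (2) and x (1) for user, group and other, so 666 carries no x bit at all, while 751 gives x to all three classes.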
    

    A simple permission test

    1. Switch to the root user, create a test directory, and set its permissions to 666 so that it has no execute permission.
    root@HP-Laptop:/home/root# mkdir test
    root@HP-Laptop:/home/root# ll
    total 12
    drwxr-xr-x 3 root root 4096 Mar  8 11:28 ./
    drwxr-xr-x 4 root root 4096 Mar  8 11:27 ../
    drwxr-xr-x 2 root root 4096 Mar  8 11:28 test/
    root@HP-Laptop:/home/root# chmod 666 test/
    root@HP-Laptop:/home/root# ll
    total 12
    drwxr-xr-x 3 root root 4096 Mar  8 11:28 ./
    drwxr-xr-x 4 root root 4096 Mar  8 11:27 ../
    drw-rw-rw- 2 root root 4096 Mar  8 11:28 test/
    root@HP-Laptop:/home/root# cd test
    root@HP-Laptop:/home/root/test# touch test.py
    root@HP-Laptop:/home/root/test# chmod 666 test.py
    root@HP-Laptop:/home/root/test# ll
    total 8
    drwxr-xr-x 2 root root 4096 Mar  8 11:28 ./
    drwxr-xr-x 3 root root 4096 Mar  8 11:28 ../
    -rw-rw-rw- 1 root root    0 Mar  8 11:28 test.py
    
    2. Switch to a regular user. os.path.exists returns False because of insufficient permission.
    yu@HP-Laptop:~$ python
    Python 2.7.17 (default, Nov  7 2019, 10:07:09) 
    [GCC 7.4.0] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import os
    >>> print(os.path.exists("/home/root/test/test.py"))
    False
    >>> 
    
    3. Switch back to the root user and set the test directory to 751, which adds the execute permission.
    root@HP-Laptop:/home/root# chmod 751 test
    root@HP-Laptop:/home/root# ll
    total 12
    drwxr-xr-x 3 root root 4096 Mar  8 11:32 ./
    drwxr-xr-x 4 root root 4096 Mar  8 11:27 ../
    drwxr-x--x 2 root root 4096 Mar  8 11:28 test/
    -rw-r--r-- 1 root root    0 Mar  8 11:32 test.py
    
    4. Switch to the regular user again. os.path.exists now returns True.
    yu@HP-Laptop:~$ python
    Python 2.7.17 (default, Nov  7 2019, 10:07:09) 
    [GCC 7.4.0] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import os
    >>> print(os.path.exists("/home/root/test/test.py"))
    True
    >>> 
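The same experiment can be approximated in one script without switching users. This is a sketch under assumptions that differ from the session above: the directory is a temporary one owned by the current user, so the owner bits apply instead of the "other" bits, and it targets Python 3 on Linux. The helper name exists_under_mode is hypothetical. When run as root the permission check is bypassed, so both calls are expected to report True.

```python
import os
import shutil
import tempfile


def exists_under_mode(dir_mode):
    """Create <tmpdir>/test.py, apply dir_mode to the directory, and
    report what os.path.exists returns for the file."""
    base = tempfile.mkdtemp()
    target = os.path.join(base, "test.py")
    open(target, "w").close()
    try:
        os.chmod(base, dir_mode)
        return os.path.exists(target)
    finally:
        # Restore owner access so the temporary directory can be removed.
        os.chmod(base, 0o700)
        shutil.rmtree(base)


if __name__ == "__main__":
    print("666:", exists_under_mode(0o666))
    print("751:", exists_under_mode(0o751))
```

As a non-root user, the 666 case is expected to report False because the owner has no x bit on the directory, and the 751 case True.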
    

    Article link: https://www.haomeiwen.com/subject/gulddhtx.html