美文网首页Troubleshooting
访问符号链接异常总结

访问符号链接异常总结

作者: dumphex | 来源:发表于2018-01-31 21:17 被阅读160次

    前段时间调试代码查看log的时候,偶然发现有虚拟机的异常报错信息.当时怀着好奇心尝试复现和分析了下.现在看来,这是一个既奇怪又很有意思的问题.(前半部分初步分析和实验由我完成,后半部分深入kernel分析和root cause由leon.wang同学完成, 在此表示感谢)

    问题现象

    • 手机

      • 64bit芯片平台
      • Android O
      • Linux version 4.4.83-01055-g0f06e44
    • 复现步骤

      • 恢复初厂设置或烧完版本首次开机
      • 低概率出现开机黑屏时间略长,但手机仍能正常使用.

    定位分析

    初步分析

    通过复现的多份logcat, 发现ART虚拟机有类似如下的错误打印信息

    12-21 09:03:03.909  3296  3296 E zygote  : Could not create image space with image file '/system/framework/boot.art'. Attempting to fall back to imageless running. Error was: Failed to open oat file '/data/dalvik-cache/arm/system@framework@boot-conscrypt.oat' referenced from image /data/dalvik-cache/arm/system@framework@boot-conscrypt.art: File /data/dalvik-cache/arm/system@framework@boot-conscrypt.oat does not exist.Cannot relocate image /system/framework/boot-conscrypt.art to /data/dalvik-cache/arm/system@framework@boot-conscrypt.art: Cannot patch a secondary image.
    12-21 09:03:03.909  3296  3296 E zygote  : Attempted image: /system/framework/boot-conscrypt.art
    
    01-01 04:18:22.538  3034  3034 E zygote64: Could not create image space with image file '/system/framework/boot.art'. Attempting to fall back to imageless running. Error was: Failed to open oat file '/data/dalvik-cache/arm64/system@framework@boot-org.apache.http.legacy.boot.oat' referenced from image /data/dalvik-cache/arm64/system@framework@boot-org.apache.http.legacy.boot.art: File /data/dalvik-cache/arm64/system@framework@boot-org.apache.http.legacy.boot.vdex does not exist.Cannot relocate image /system/framework/boot-org.apache.http.legacy.boot.art to /data/dalvik-cache/arm64/system@framework@boot-org.apache.http.legacy.boot.art: Cannot patch a secondary image.
    01-01 04:18:22.538  3034  3034 E zygote64: Attempted image: /system/framework/boot-org.apache.http.legacy.boot.art
    

    由上述错误分析可知,问题原因是由于找不到指定的oat/vdex文件造成的.

    出现以上类似错误后,adb shell到手机到/data/dalvik-cache/<isa>目录下,会发现目录内容为空.

    通过log,再次尝试去复现.zygote64正在操作/data/dalvik-cache/arm64/system@framework@boot-legacy-test.art

    01-01 05:59:38.790  3033  3033 I zygote64: image_location = /system/framework/boot-legacy-test.art, found_image = 1, has_system = 1, has_cache = 1 
    01-01 05:59:38.791  3033  3033 I zygote64: ImageSpace::Init entering image_filename=/data/dalvik-cache/arm64/system@framework@boot-legacy-test.art
    01-01 05:59:38.791  3033  3033 I zygote64: relocate = 1 
    

    但没有找到相应的oat文件(只有art/vdex文件)

    -rw-r--r-- 1 root root   24576 2012-01-01 05:59 system@framework@boot-legacy-test.art
    lrwxrwxrwx 1 root root      50 2012-01-01 05:59 system@framework@boot-legacy-test.vdex -> /system/framework/arm64/boot-legacy-test.vdex
    

    正常流程应是在/data/dalvik-cache/<isa>目录下生成如下三个文件

    -rw-r--r-- 1 root root   20480 2012-01-01 05:59 system@framework@boot-legacy-test.art
    lrwxrwxrwx 1 root root      46 2012-01-01 05:59 system@framework@boot-legacy-test.oat -> /system/framework/arm/boot-legacy-test.oat
    lrwxrwxrwx 1 root root      46 2012-01-01 05:59 system@framework@boot-legacy-test.vdex -> /system/framework/arm/boot-legacy-test.vdex
    

    由于找不到oat/vdex文件,此后zygote会进入异常分支处理,将/data/dalvik-cache/<isa>/目录清空,并打印相关的异常错误log.

    通过进一步的流程梳理, 最后定位到patchoat模块的symlink系统调用可能存在问题.

      // Create a symlink from the old oat to the new oat
      if (symlink(input_oat_filename.c_str(), output_oat_filename.c_str()) < 0) {
        int err = errno;
        LOG(ERROR) << "Failed to create symlink at " << output_oat_filename
                   << " error(" << err << "): " << strerror(err);
        return false;
      }
    

    从现场来看

    • 创建符号链接可能出现了问题,导致用户态程序误以为文件生成成功.从而后续虚拟机在映射oat/vdex文件到内存时出现找不到文件的异常情况.
      • 从log输出来看,这里没有打印出报错信息,因此,symlink()应是创建成功的.
    • 或者创建符号链接成功,但后续访问符号链接期间出现问题.

    实验

    由于问题复现概率较低,因此第一要务是先提高复现概率.

    把低概率问题巧妙地转化为高概率问题甚至必现问题的技巧如下:

    编写测试程序,针对symlink系统调用进行循环压力测试.

    思路如下:

    1. 检测符号链接是否存在.若存在,则unlink删除
    2. 调用symlink创建符号链接
    3. 检查符号链接是否存在
      • 若不存在,则打印错误log并sleep保留现场
      • 若存在,则回到1继续循环.

    经测试验证,测试程序在手机上均能稳定复现(5分钟内).

    ... ...
    438: we will unlink /data/dalvik-cache/arm64/tim.target
    we will link /data/dalvik-cache/arm64/tim.target
    
    439: we will unlink /data/dalvik-cache/arm64/tim.target
    we will link /data/dalvik-cache/arm64/tim.target
    
    symlink didn't succeed. /data/dalvik-cache/arm64/tim.target doesn't exist.
    

    查看一下符号链接

    # ls -l /data/dalvik-cache/arm64/tim.target
    ls: /data/dalvik-cache/arm64/tim.target: No such file or directory
    

    删除符号链接所在子目录

    # rm -rf /data/dalvik-cache/arm64/
    rm: /data/dalvik-cache/arm64//tim.target: No such file or directory
    rm: /data/dalvik-cache/arm64/: Directory not empty
    

    深入分析

    • 由于工作繁忙,kernel部分的工作转给稳定性同学leon.wang继续跟踪处理.

    • source

      • kernel/fs/f2fs/namei.c

    创建符号链接

    由于配置了"CONFIG_F2FS_FS_ENCRYPTION=y", 创建符号链接时,符号链接名会进行加密处理.

    357static int f2fs_symlink(struct inode *dir, struct dentry *dentry,
    358                 const char *symname)
    359{
    360 struct f2fs_sb_info *sbi = F2FS_I_SB(dir);
    361 struct inode *inode;
    362 size_t len = strlen(symname);
    363 size_t p_len;
    364 char *p_str;
    365 struct f2fs_str disk_link = FSTR_INIT(NULL, 0);
    366 struct f2fs_encrypted_symlink_data *sd = NULL;
    367 int err;
    368
    369 if (len > dir->i_sb->s_blocksize)
    370     return -ENAMETOOLONG;
    371
    372 inode = f2fs_new_inode(dir, S_IFLNK | S_IRWXUGO);
    373 if (IS_ERR(inode))
    374     return PTR_ERR(inode);
    375
    376 if (f2fs_encrypted_inode(inode))
    377     inode->i_op = &f2fs_encrypted_symlink_inode_operations;
    378 else
    379     inode->i_op = &f2fs_symlink_inode_operations;
    380 inode->i_mapping->a_ops = &f2fs_dblock_aops;
    381
    382 f2fs_balance_fs(sbi, true);
    383
    384 f2fs_lock_op(sbi);
    385 err = f2fs_add_link(dentry, inode);
    386 if (err)
    387     goto out;
    388 f2fs_unlock_op(sbi);
    389 alloc_nid_done(sbi, inode->i_ino);
    390
    391 if (f2fs_encrypted_inode(dir)) {
    392     struct qstr istr = QSTR_INIT(symname, len);
    393
    394     err = f2fs_get_encryption_info(inode);
    395     if (err)
    396         goto err_out;
    397
    398     err = f2fs_fname_crypto_alloc_buffer(inode, len, &disk_link);
    399     if (err)
    400         goto err_out;
    401
    402     err = f2fs_fname_usr_to_disk(inode, &istr, &disk_link);
    403     if (err < 0)
    404         goto err_out;
    405
    406     p_len = encrypted_symlink_data_len(disk_link.len) + 1;
    407
    408     if (p_len > dir->i_sb->s_blocksize) {
    409         err = -ENAMETOOLONG;
    410         goto err_out;
    411     }
    412
    413     sd = kzalloc(p_len, GFP_NOFS);
    414     if (!sd) {
    415         err = -ENOMEM;
    416         goto err_out;
    417     }
    418     memcpy(sd->encrypted_path, disk_link.name, disk_link.len);
    419     sd->len = cpu_to_le16(disk_link.len);
    420     p_str = (char *)sd;
    421 } else {
    422     p_len = len + 1;
    423     p_str = (char *)symname;
    424 }
    425
    426 err = page_symlink(inode, p_str, p_len);
    427
    428err_out:
    429 d_instantiate(dentry, inode);
    430 unlock_new_inode(inode);
    431
    432 /*
    433  * Let's flush symlink data in order to avoid broken symlink as much as
    434  * possible. Nevertheless, fsyncing is the best way, but there is no
    435  * way to get a file descriptor in order to flush that.
    436  *
    437  * Note that, it needs to do dir->fsync to make this recoverable.
    438  * If the symlink path is stored into inline_data, there is no
    439  * performance regression.
    440  */
    441 if (!err) {
    442     filemap_write_and_wait_range(inode->i_mapping, 0, p_len - 1);
    443
    444     if (IS_DIRSYNC(dir))
    445         f2fs_sync_fs(sbi->sb, 1);
    446 } else {
    447     f2fs_unlink(dir, dentry);
    448 }
    449
    450 kfree(sd);
    451 f2fs_fname_crypto_free_buffer(&disk_link);
    452 return err;
    453out:
    454 handle_failed_inode(inode);
    455 return err;
    456}
    

    访问符号链接

    当访问符号链接时(access()等), 需要先解密.

    975#ifdef CONFIG_F2FS_FS_ENCRYPTION
    976static const char *f2fs_encrypted_follow_link(struct dentry *dentry, void **cookie)
    977{
    978 struct page *cpage = NULL;
    979 char *caddr, *paddr = NULL;
    980 struct f2fs_str cstr = FSTR_INIT(NULL, 0);
    981 struct f2fs_str pstr = FSTR_INIT(NULL, 0);
    982 struct inode *inode = d_inode(dentry);
    983 struct f2fs_encrypted_symlink_data *sd;
    984 loff_t size = min_t(loff_t, i_size_read(inode), PAGE_SIZE - 1);
    985 u32 max_size = inode->i_sb->s_blocksize;
    986 int res;
    987
    988 res = f2fs_get_encryption_info(inode);
    989 if (res)
    990     return ERR_PTR(res);
    991
    992 cpage = read_mapping_page(inode->i_mapping, 0, NULL);
    993 if (IS_ERR(cpage))
    994     return ERR_CAST(cpage);
    995 caddr = kmap(cpage);
    996 caddr[size] = 0;
    997
    998 /* Symlink is encrypted */
    999 sd = (struct f2fs_encrypted_symlink_data *)caddr;
    1000    cstr.len = le16_to_cpu(sd->len);
    1001
    1002    /* this is broken symlink case */
    1003    if (unlikely(cstr.len == 0)) {
    1004        res = -ENOENT;
    1005        goto errout;
    1006    }
    1007    cstr.name = kmalloc(cstr.len, GFP_NOFS);
    1008    if (!cstr.name) {
    1009        res = -ENOMEM;
    1010        goto errout;
    1011    }
    1012    memcpy(cstr.name, sd->encrypted_path, cstr.len);
    1013
    1014    /* this is broken symlink case */
    1015    if (unlikely(cstr.name[0] == 0)) {
    1016        res = -ENOENT;
    1017        goto errout;
    1018    }
    1019
    1020    if ((cstr.len + sizeof(struct f2fs_encrypted_symlink_data) - 1) >
    1021                                max_size) {
    1022        /* Symlink data on the disk is corrupted */
    1023        res = -EIO;
    1024        goto errout;
    1025    }
    1026    res = f2fs_fname_crypto_alloc_buffer(inode, cstr.len, &pstr);
    1027    if (res)
    1028        goto errout;
    1029
    1030    res = f2fs_fname_disk_to_usr(inode, NULL, &cstr, &pstr);
    1031    if (res < 0)
    1032        goto errout;
    1033
    1034    kfree(cstr.name);
    1035
    1036    paddr = pstr.name;
    1037
    1038    /* Null-terminate the name */
    1039    paddr[res] = '\0';
    1040
    1041    kunmap(cpage);
    1042    page_cache_release(cpage);
    1043    return *cookie = paddr;
    1044errout:
    1045    kfree(cstr.name);
    1046    f2fs_fname_crypto_free_buffer(&pstr);
    1047    kunmap(cpage);
    1048    page_cache_release(cpage);
    1049    return ERR_PTR(res);
    1050}
    

    review这里的代码,会发现这里逻辑上有误. 加密后的密文的每个byte是可以为'\0'的,但解密后的明文首字节就不可以(真正的目标文件,不可为空).

    以复现的加密数据为例

    [f2fs_encrypted_follow_link] symlink cstr.name=[0 99 1a 2c a1 6f 72 c5 1c b5 | f7 16 a3 5 7e fb dd 18 5c 46 | 79 33 49 6c c6 9d cd a3 6b] 
    

    如果遇到上面这种数据,因为1015行的判断,导致错误认为'\0'开头的加密数据是broken的,就会报错-ENOENT.

    Root Cause

    • 此问题是由于文件系统加密,在读取symlink相关的文件时遇到kernel原生bug引起的.问题根因不在symlink()系统调用本身,而是symlink加密后,若密文首字节为'\0',则下次访问符号链接时会触发kernel隐藏的原生bug, 从而造成访问符号链接返回"No such file or directory"错误.

    • 问题低概率的原因在于加密后密文首字节是否为'\0'. 通过循环压力测试,则能大大提高此类问题复现的命中率,从而加速此问题得以圆满解决.

    解决方案

    • patch

    • 验证

      • 使用以上patch进行验证,测试程序长时间都能正常运行.

    相关文章

      网友评论

        本文标题:访问符号链接异常总结

        本文链接:https://www.haomeiwen.com/subject/tfalzxtx.html