Reconstructing immune repertoire data from bulk RNA-seq with TRUST4

Author: Yayamia | Published: 2022-05-07 13:14

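    Yesterday my advisor sent me a paper from the bioinformatics star Shirley Liu, and reading it got me quite excited: without any dedicated immune repertoire sequencing, it reconstructs immune repertoire information directly from bulk RNA-seq or scRNA-seq data.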


    Key points from the paper

    1. Although less sensitive than TCR-seq and BCR-seq, TRUST is able to identify the abundantly expressed and potentially more clonally expanded TCRs/BCRs in the RNA-seq data that are more likely to be involved in antigen binding
    2. Recent years have also seen other computational methods introduced for immune repertoire construction from RNA-seq data, including V’DJer, MiXCR, CATT and ImRep. These methods focus on reconstruction of complementary-determining region 3 (CDR3), with limited ability to assemble full-length V(D)J receptor sequences, although CDR1 and CDR2 on the V sequence still contribute considerably to antigen recognition and binding.

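    Compared with other reconstruction algorithms, TRUST4 has these features:

    1. It accepts either FASTQ or BAM files as input
    2. It can reconstruct longer, even full-length, TCR or BCR sequences
    3. It is faster and more sensitive

    TRUST4 can also reconstruct repertoires from single-cell data, but today I mainly want to try it on bulk data.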

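    1. Installation

    git clone https://github.com/liulab-dfci/TRUST4.git
    cd TRUST4
    make
    # I wanted to add TRUST4 to my environment variables (PATH), but for some reason that kept failing,
    # so I created a symlink to run-trust4 in the target folder instead
    ln -s /home/user/myh/install/TRUST4/run-trust4 /home/user/myh/**/TRUST4_outs
    cd /home/user/myh/**/TRUST4_outs
    ./run-trust4
    # this works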
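
    For the PATH attempt mentioned in the comments above, one common approach is sketched here. It is not a diagnosis of why the original attempt failed. `TRUST4_DIR` is a hypothetical install location and `add_to_path` is a helper name made up for this sketch.

```shell
# Hypothetical install location; replace it with the directory where TRUST4 was cloned and built.
TRUST4_DIR="$HOME/install/TRUST4"

# Prepend a directory to PATH unless it is already listed there.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                  # already on PATH, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}

add_to_path "$TRUST4_DIR"
# To make the change permanent, the lines above can be appended to ~/.bashrc.
# Check afterwards with:  command -v run-trust4
```

    The change only affects the current shell session unless it is saved in a startup file, which is one thing worth checking when a PATH edit appears not to stick.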
    

    2. Usage

    Official usage

    Usage: ./run-trust4 [OPTIONS]
        Required:
            -b STRING: path to bam file
            -1 STRING -2 STRING: path to paired-end read files
            -u STRING: path to single-end read file
            -f STRING: path to the fasta file coordinate and sequence of V/D/J/C genes
        Optional:
            --ref STRING: path to detailed V/D/J/C gene reference file, such as from IMGT database. (default: not used). (recommended) 
            -o STRING: prefix of output files. (default: inferred from file prefix)
            --od STRING: the directory for output files. (default: ./)
            -t INT: number of threads (default: 1)
            --barcode STRING: if -b, bam field for barcode; if -1 -2/-u, file containing barcodes (default: not used)
            --barcodeRange INT INT CHAR: start, end(-1 for length-1), strand in a barcode is the true barcode (default: 0 -1 +)
            --barcodeWhitelist STRING: path to the barcode whitelist (default: not used)
            --read1Range INT INT: start, end(-1 for length-1) in -1/-u files for genomic sequence (default: 0 -1)
            --read2Range INT INT: start, end(-1 for length-1) in -2 files for genomic sequence (default: 0 -1)
            --UMI STRING: if -b, bam field for UMI; if -1 -2/-u, file containing UMIs (default: not used)
            --umiRange INT INT CHAR: start, end(-1 for length-1), strand in a UMI is the true UMI (default: 0 -1 +)
            --mateIdSuffixLen INT: the suffix length in read id for mate. (default: not used)
            --skipMateExtension: do not extend assemblies with mate information, useful for SMART-seq (default: not used)
            --abnormalUnmapFlag: the flag in BAM for the unmapped read-pair is nonconcordant (default: not set)
            --noExtraction: directly use the files from provided -1 -2/-u to assemble (default: extraction first)
            --repseq: the data is from TCR-seq or BCR-seq (default: not set)
            --outputReadAssignment: output read assignment results to the prefix_assign.out file (default: no output)
            --stage INT: start TRUST4 on specified stage (default: 0)
                0: start from beginning (candidate read extraction)
                1: start from assembly
                2: start from annotation
                3: start from generating the report table
    

    My data are from mouse, so I first tried a single sample's paired FASTQ files:

    ./run-trust4 \
        -f /home/user/myh/install/TRUST4/mouse/GRCm38_bcrtcr.fa \
        --ref /home/user/myh/install/TRUST4/mouse/mouse_IMGT+C.fa \
        -1 /home/user/myh/raw_data/AEKIBULK/inputs/clean_data/KI_T/KIT11_1.clean.fq.gz \
        -2 /home/user/myh/raw_data/AEKIBULK/inputs/clean_data/KI_T/KIT11_2.clean.fq.gz \
        -o KIT11
    
    

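    The number of threads can be set with -t.

    When the terminal output reaches this point, the run has finished.

    My samples were sorted into T cells and B cells, yet running TRUST4 on the T-cell data still reconstructs some BCR results. Hmm.

    Note that with the command above TRUST4 does not put its results into a folder of their own; all output files end up scattered in the current directory unless --od is given.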
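
    Both points above, the thread count and the scattered output files, correspond to options listed in the official usage: `-t` and `--od`. The following is a minimal sketch, assuming TRUST4 was built under a hypothetical `TRUST4_DIR`. The names `run_sample` and `DRY_RUN` are invented for this sketch, the FASTQ names are shortened placeholders, and by default the command is only printed rather than executed.

```shell
# Hypothetical install location; adjust to your own setup.
TRUST4_DIR="${TRUST4_DIR:-$HOME/install/TRUST4}"

# Build the run-trust4 command for one paired-end mouse sample.
# Arguments: sample prefix, read 1, read 2, output directory, threads (default 4).
# -o sets the output prefix, --od the output directory, -t the thread count.
run_sample() {
    sample=$1; fq1=$2; fq2=$3; outdir=$4; threads=${5:-4}
    set -- "$TRUST4_DIR/run-trust4" \
        -f "$TRUST4_DIR/mouse/GRCm38_bcrtcr.fa" \
        --ref "$TRUST4_DIR/mouse/mouse_IMGT+C.fa" \
        -1 "$fq1" -2 "$fq2" \
        -o "$sample" --od "$outdir" -t "$threads"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"                       # only show the command
    else
        mkdir -p "$outdir" && "$@"      # create the directory, then run
    fi
}

# Dry run for one sample; set DRY_RUN=0 to actually execute.
run_sample KIT11 KIT11_1.clean.fq.gz KIT11_2.clean.fq.gz TRUST4_outs/KIT11 8
```

    Giving each sample its own `--od` directory keeps the per-sample files together, which makes it easier to collect the report files later.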

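    XX_report.tsv contains the following columns: count, frequency, CDR3nt, CDR3aa, V, D, J, C, cid and cid_full_length.

    It can be used directly with immunarch.

    An AIRR-format file is generated as well, and it can also be analyzed with immunarch.

    For the T-cell results, I removed the BCR chains and then did the downstream analysis with immunarch.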
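
    Removing the BCR chains can be done with a one-line filter. This is a sketch, not necessarily how it was done for this post. It assumes the report is tab-separated with a single header line and with V, J and C in columns 5, 7 and 8, and that TCR genes start with `TR` while BCR genes start with `IG`. `keep_tcr` is a made-up helper name.

```shell
# Keep the header line plus every row whose V, J or C gene name starts with "TR".
# Assumed column layout: count, frequency, CDR3nt, CDR3aa, V, D, J, C, cid, cid_full_length.
keep_tcr() {
    awk -F'\t' 'NR == 1 || $5 ~ /^TR/ || $7 ~ /^TR/ || $8 ~ /^TR/' "$1"
}

# Usage (file names are placeholders):
#   keep_tcr KIT11_report.tsv > KIT11_report.TCR.tsv
```

    Checking V, J and C together keeps clonotypes in which one of the gene calls is missing.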

    A few additional notes on analyzing the results with VDJtools.
    After downloading VDJtools, I ran the following.

    1. Basic analysis
    1.1 CalcBasicStats

    java -jar /home/user/myh/install/VDJtools/vdjtools-1.2.1/vdjtools-1.2.1.jar CalcBasicStats \
        -m /home/user/myh/raw_data/AEKIBULK/vdjtools/inputs/metadata.txt \
        /home/user/myh/raw_data/AEKIBULK/vdjtools/outs
    # the directory holding vdjtools-1.2.1.jar is the VDJtools install path
    # the last argument is output_prefix, the output path
    

    The VDJtools format
    Note that rows whose CDR3aa value is out_of_frame must be removed, otherwise VDJtools cannot parse the file.
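
    The conversion and the out_of_frame cleanup can be combined in one step. The sketch below assumes the TRUST4 report layout count, frequency, CDR3nt, CDR3aa, V, D, J, C, ... and the basic seven-column VDJtools header count, freq, cdr3nt, cdr3aa, v, d, j. It also drops rows marked `partial`, which is an extra assumption beyond what the note above says, and it recomputes the frequencies over the rows that remain. `trust4_to_vdjtools` is a made-up helper name.

```shell
# Convert a TRUST4 report into a basic VDJtools-style table.
# The file is read twice: pass 1 sums the counts of the kept rows,
# pass 2 prints them with frequencies recomputed from that sum.
trust4_to_vdjtools() {
    awk -F'\t' -v OFS='\t' '
        # Rows to drop: the header and CDR3aa values that are not amino acid sequences.
        function skip() { return (FNR == 1 || $4 == "out_of_frame" || $4 == "partial") }
        NR == FNR { if (!skip()) total += $1; next }
        FNR == 1  { print "count", "freq", "cdr3nt", "cdr3aa", "v", "d", "j"; next }
        !skip()   { print $1, $1 / total, $3, $4, $5, $6, $7 }
    ' "$1" "$1"
}

# Usage (file names are placeholders):
#   trust4_to_vdjtools KIT11_report.tsv > KIT11.txt
```

    Recomputing the frequency is a design choice: once rows are removed, the original frequency column no longer sums to one over the remaining clonotypes.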

    1.2 CalcSegmentUsage

    java -jar /home/user/myh/install/VDJtools/vdjtools-1.2.1/vdjtools-1.2.1.jar CalcSegmentUsage -p -f "group" \
        -m /home/user/myh/raw_data/AEKIBULK/vdjtools/inputs/metadata.txt \
        /home/user/myh/raw_data/AEKIBULK/vdjtools/outs

    # -p: draw plots; depends on R packages
    # -f: the factor to group by; the grouping information comes from the metadata file
    # --plot-type png: write the plots as PNG images
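
    Since `-f "group"` refers to a column of the metadata file, a small generator for such a file may help. This sketch assumes the VDJtools metadata layout of a tab-separated table whose header starts with `#file.name` and `sample.id`, followed by free annotation columns. The rule that derives the group from the file name (everything before the last underscore) is a hypothetical convention, and `make_metadata` is a made-up helper name.

```shell
# Print a metadata table for the given sample files.
# Hypothetical convention: sample id = file name without ".txt",
# group = sample id with its last "_suffix" removed.
make_metadata() {
    printf '#file.name\tsample.id\tgroup\n'
    for f in "$@"; do
        id=$(basename "$f" .txt)
        printf '%s\t%s\t%s\n' "$f" "$id" "${id%_*}"
    done
}

# Usage (paths are placeholders):
#   make_metadata inputs/*.txt > inputs/metadata.txt
```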
    

    1.3 CalcSpectratype
    Calculates spectratype, that is, histogram of read counts by CDR3 nucleotide length.

    java -jar /home/user/myh/install/VDJtools/vdjtools-1.2.1/vdjtools-1.2.1.jar CalcSpectratype -a \
        -m /home/user/myh/raw_data/AEKIBULK/vdjtools/inputs/metadata.txt \
        /home/user/myh/raw_data/AEKIBULK/vdjtools/outs
    # -a: use CDR3 amino acid sequences for the calculation instead of nucleotide ones
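
    To make the definition concrete, a spectratype can be computed by hand from a VDJtools-style table. The sketch below illustrates the definition only and is not VDJtools' own implementation. It assumes a tab-separated file with one header line, count in column 1, cdr3nt in column 3 and cdr3aa in column 4. `spectratype` is a made-up helper name.

```shell
# Sum read counts per CDR3 length.
# Arguments: table file, sequence column (3 = nucleotide, the default; 4 = amino acid).
# Output: one "length<TAB>reads" line per length, sorted by length.
spectratype() {
    awk -F'\t' -v col="${2:-3}" '
        NR > 1 { reads[length($col)] += $1 }
        END    { for (len in reads) print len "\t" reads[len] }
    ' "$1" | sort -n
}

# Usage (file name is a placeholder):
#   spectratype AE_T_5.txt       # by nucleotide length
#   spectratype AE_T_5.txt 4     # by amino acid length, similar in spirit to -a
```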
    

    1.4 PlotFancySpectratype
    Plots a spectratype that also displays CDR3 lengths for top N clonotypes in a given sample. This plot allows to detect the highly-expanded clonotypes.

    java -jar /home/user/myh/install/VDJtools/vdjtools-1.2.1/vdjtools-1.2.1.jar PlotFancySpectratype -t 5 \
        /home/user/myh/raw_data/AEKIBULK/vdjtools/inputs/AE_T_5.txt \
        /home/user/myh/raw_data/AEKIBULK/vdjtools/outs
    # -t: number of top clonotypes to visualize; should not exceed 20, default is 10
    # this command takes a single sample
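
    Before plotting, it can be useful to see which clonotypes count as the top N. The sketch below lists them with their CDR3 nucleotide length, assuming the same VDJtools-style table as above (one header line, count in column 1, cdr3nt in column 3, cdr3aa in column 4). `top_clonotypes` is a made-up helper name, and this is only a quick check, not what PlotFancySpectratype does internally.

```shell
# List the N most abundant clonotypes as "count<TAB>CDR3 nt length<TAB>CDR3aa".
# Arguments: table file, N (default 10).
# Steps: drop the header, sort by count in descending order, keep the first N rows.
top_clonotypes() {
    tail -n +2 "$1" |
        sort -t "$(printf '\t')" -k1,1nr |
        head -n "${2:-10}" |
        awk -F'\t' '{ print $1 "\t" length($3) "\t" $4 }'
}

# Usage (file name is a placeholder):
#   top_clonotypes AE_T_5.txt 5
```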
    
    

    For some reason I could not get the following command to produce results:

    java -jar /home/user/myh/install/VDJtools/vdjtools-1.2.1/vdjtools-1.2.1.jar CalcPairwiseDistances -p \
        -m /home/user/myh/raw_data/AEKIBULK/vdjtools/inputs/metadata.txt \
        /home/user/myh/raw_data/AEKIBULK/vdjtools/outs
    # -p: plot
    
