Immune repertoire analysis with vdjtools

Author: 不会生信 | Source: published 2021-12-22 20:07

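    mixcr and vdjtools are Java-based immune repertoire analysis tools that take large immune repertoire datasets from raw sequences to quantified clonotypes. Make sure a working Java environment is available before using them.
    Download the Java Runtime Environment (JRE), the runtime that Java programs need, from the official website.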

    java -version # check that the Java environment works

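    Download and install vdjtools (latest release).
    vdjtools relies on several R packages for plotting, so the required R packages must be installed.

    Install them with the command bundled with vdjtools

    java -jar /path/to/vdjtools/vdjtools-1.2.1.jar Rinstall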
    

    They can also be installed manually in R


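    Convert the analysed data into a format that vdjtools can read. For the upstream analysis, see the article on building an immune repertoire with mixcr and downstream analysis (使用mixcr构建免疫组库及下游分析).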

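    Build the metadata (grouping) file
    The metadata file should list every sample name together with the location of each sample file.

    metadata.txt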
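
    As a rough sketch, such a file can be written from the shell. The layout used here (tab-separated, a header whose first two columns are file.name and sample.id, further columns as grouping factors) is an assumption about the VDJtools metadata format and should be checked against the official documentation. The sample paths and the disease_state and Sex values are hypothetical placeholders.

```shell
# Write a tab-separated metadata table.
# Assumed layout: column 1 = path to the sample file, column 2 = sample name,
# remaining columns = grouping factors (used later with -f).
# All paths and factor values below are hypothetical placeholders.
{
  printf '#file.name\tsample.id\tdisease_state\tSex\n'
  printf 'data/sample1.txt\tsample1\tpatient\tF\n'
  printf 'data/sample2.txt\tsample2\tpatient\tM\n'
  printf 'data/sample3.txt\tsample3\thealthy\tF\n'
} > metadata.txt

# Quick sanity check: every row should have the same number of columns,
# so this should print a single number.
awk -F '\t' '{ print NF }' metadata.txt | sort -u
```

    If the check prints more than one number, some row has a missing or extra tab.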
    # convert
    java -jar /path/to/vdjtools/vdjtools-1.2.1.jar Convert -S mixcr -m metadata.txt output_prefix
    # or
    java -jar /path/to/vdjtools/vdjtools-1.2.1.jar Convert -S mixcr sample1.txt sample2.txt ... output_prefix
    # /path/to/vdjtools/: vdjtools installation directory
    # output_prefix: output path
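
    Because every command repeats the full jar path, a small shell helper can keep it in one place. This is only a convenience sketch: the jar location, the DRY_RUN switch and the function name vdjtools are my own assumptions, not something shipped with VDJtools. With DRY_RUN=1 (the default here) the command is printed instead of executed, which makes it easy to check the arguments first.

```shell
# Hypothetical install location; point this at the real jar file.
VDJTOOLS_JAR="${VDJTOOLS_JAR:-$HOME/tools/vdjtools-1.2.1/vdjtools-1.2.1.jar}"

# DRY_RUN=1 only prints the command; set DRY_RUN=0 to actually launch Java.
DRY_RUN="${DRY_RUN:-1}"

# Helper so the jar path is written (and quoted) in one place.
vdjtools() {
  if [ "$DRY_RUN" = "1" ]; then
    printf 'java -jar %s %s\n' "$VDJTOOLS_JAR" "$*"
  else
    java -jar "$VDJTOOLS_JAR" "$@"
  fi
}

# Same call as the Convert example, routed through the helper.
vdjtools Convert -S mixcr -m metadata.txt output_prefix
```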
    

    Table after conversion

    Conversion result

    1.Basic analysis

    1.1 CalcBasicStats

    This routine computes a set of basic sample statistics, such as read counts, number of clonotypes, etc.

    java -jar /path/to/vdjtools/vdjtools-1.2.1.jar CalcBasicStats sample1.txt sample2.txt ... output_prefix
    # or
    java -jar /path/to/vdjtools/vdjtools-1.2.1.jar CalcBasicStats -m metadata.txt output_prefix
    # /path/to/vdjtools/: vdjtools installation directory
    # output_prefix: output path
    
    all.basicstats.txt

    Tabular output

    The following table with .basicstats.txt suffix is generated,

    Column              Description
    sample_id           Sample unique identifier
    ...                 Metadata columns. See Metadata section
    count               Number of reads in a given sample
    diversity           Number of clonotypes in a given sample
    mean_frequency      Mean clonotype frequency
    geomean_frequency   Geometric mean of clonotype frequency
    nc_diversity        Number of non-coding clonotypes
    nc_frequency        Frequency of reads that belong to non-coding clonotypes
    mean_cdr3nt_length  Mean length of CDR3 nucleotide sequence. Weighted by clonotype frequency
    mean_insert_size    Mean number of inserted random nucleotides in CDR3 sequence. Characterizes V-J insert for receptor chains without D segment, or a sum of V-D and D-J insert sizes
    mean_ndn_size       Mean number of nucleotides that lie between V and J segment sequences in CDR3
    convergence         Mean number of unique CDR3 nucleotide sequences that code for the same CDR3 amino acid sequence
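
    To make the convergence column concrete, here is a small sketch that counts, for a toy clonotype table, how many distinct CDR3 nucleotide sequences encode each CDR3 amino acid sequence and then averages that number. The table values are invented and the column order (count, freq, cdr3nt, cdr3aa) is an assumption for illustration; VDJtools' own calculation may differ in details such as weighting.

```shell
# Toy clonotype table (invented values). Assumed columns:
# count, freq, cdr3nt, cdr3aa
{
  printf 'count\tfreq\tcdr3nt\tcdr3aa\n'
  printf '6\t0.6\tTGTGCCAGC\tCAS\n'
  printf '3\t0.3\tTGTGCTAGT\tCAS\n'
  printf '1\t0.1\tTGTGCCACC\tCAT\n'
} > toy_convergence.txt

# Mean number of distinct cdr3nt sequences per cdr3aa sequence.
convergence() {
  awk -F '\t' '
    NR == 1 { next }                                   # skip the header
    !(($3, $4) in seen) { seen[$3, $4] = 1; nt_per_aa[$4]++ }
    END {
      for (aa in nt_per_aa) { total += nt_per_aa[aa]; n++ }
      if (n > 0) printf "%.2f\n", total / n
    }
  ' "$1"
}

convergence toy_convergence.txt
```

    In the toy table CAS is encoded by two nucleotide sequences and CAT by one, so the function prints 1.50.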

    1.2 CalcSegmentUsage

    This routine computes Variable (V) and Joining (J) segment usage vectors.

    java -jar /path/to/vdjtools/vdjtools-1.2.1.jar CalcSegmentUsage -p -f "disease_state" -m metadata.txt ./results/disease_state
    java -jar /path/to/vdjtools/vdjtools-1.2.1.jar CalcSegmentUsage -p -f "Sex" -m metadata.txt ./results/Sex
    # -p: plot (requires the R packages)
    # -f: factor to group by; the grouping information comes from the metadata file
    # --plot-type png: write the plots as PNG images
    
    output
    disease_state.segments.wt.V

    1.3 CalcSpectratype

    Calculates spectratype, that is, histogram of read counts by CDR3 nucleotide length.

    java -jar /path/to/vdjtools/vdjtools-1.2.1.jar CalcSpectratype -a -m metadata.txt output_prefix
    # -a: will use CDR3 amino acid sequences for calculation instead of nucleotide ones
    
    output
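    aa: frequency distribution of CDR3 amino acid sequence length
    insert: frequency distribution of the length of the nucleotides inserted at the V-J, V-D and D-J junctions in the CDR3 sequence
    ndn: frequency distribution of the length of the nucleotide stretch between the V and J segments in the CDR3 sequence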
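
    The idea behind the nucleotide spectratype can be sketched in a few lines: sum the read counts of all clonotypes that share the same CDR3 nucleotide length. The toy table and its column order (count, freq, cdr3nt) are assumptions for illustration, and this is not the code that VDJtools runs.

```shell
# Toy clonotype table (invented values). Assumed columns: count, freq, cdr3nt
{
  printf 'count\tfreq\tcdr3nt\n'
  printf '10\t0.56\tTGTGCCAGC\n'
  printf '5\t0.28\tTGTGCTAGT\n'
  printf '3\t0.17\tTGTGCCAGCAGC\n'
} > toy_spectratype.txt

# Print "CDR3 length <tab> summed read count", sorted by length.
spectratype_nt() {
  awk -F '\t' '
    NR == 1 { next }                       # skip the header
    { reads[length($3)] += $1 }            # add reads to the bin for this length
    END { for (len in reads) printf "%d\t%d\n", len, reads[len] }
  ' "$1" | sort -n
}

spectratype_nt toy_spectratype.txt
```

    For the toy table the two 9-nucleotide clonotypes fall into one bin with 15 reads, and the 12-nucleotide clonotype forms a second bin with 3 reads.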

    1.4 PlotFancySpectratype

    Plots a spectratype that also displays CDR3 lengths for the top N clonotypes in a given sample. This plot makes it possible to detect highly expanded clonotypes.

    java -jar /path/to/vdjtools/vdjtools-1.2.1.jar PlotFancySpectratype -t 5 sample1.txt output_prefix
    # -t: number of top clonotypes to visualize. Should not exceed 20, default is 10
    # takes a single sample
    
    fancyspectra

    1.5 PlotFancyVJUsage

    Plots a circos-style V-J usage plot displaying the frequency of various V-J junctions.

    java -jar /path/to/vdjtools/vdjtools-1.2.1.jar PlotFancyVJUsage sample.txt output_prefix
    # -u: instead of counting read frequency, will count the number of unique clonotypes
    
    fancyvj.wt

    1.6 PlotSpectratypeV

    Plots a detailed spectratype that additionally shows the CDR3 length distribution for clonotypes from the top N Variable segment families. This plot is useful for detecting type 1 and type 2 repertoire biases, which can arise under pathological conditions.

    java -jar /path/to/vdjtools/vdjtools-1.2.1.jar PlotSpectratypeV sample.txt output_prefix
    # -u: instead of counting read frequency, will count the number of unique clonotypes
    # -t: number of top (by frequency) V segments to visualize. Should not exceed 12, default is 12
    
    spectraV.wt

    2.Diversity estimation

    2.1 PlotQuantileStats

    Plots a three-layer donut chart to visualize the repertoire clonality.

    • First layer (“set”) includes the frequency of singleton (“1”, met once), doubleton (“2”, met twice) and high-order (“3+”, met three or more times) clonotypes.
    • The second layer (“quantile”) displays the abundance of top 20% (“Q1”), next 20% (“Q2”), ... (up to “Q5”) clonotypes for clonotypes from the “3+” set.
    • The last layer (“top”) displays the individual abundances of top N clonotypes.
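
    As an illustration of the first layer, the sketch below groups the clonotypes of a toy sample into the singleton, doubleton and 3+ sets and reports, for each set, the number of clonotypes and the share of reads. The read counts are invented and the assumption that the first column holds the read count is mine; the exact quantity that VDJtools plots may differ.

```shell
# Toy sample (invented read counts). Assumed columns: count, cdr3aa
{
  printf 'count\tcdr3aa\n'
  for c in 5 3 2 2 1 1 1; do printf '%s\tNA\n' "$c"; done
} > toy_sets.txt

# Print "set <tab> number of clonotypes <tab> share of reads".
clonotype_sets() {
  awk -F '\t' '
    NR == 1 { next }                               # skip the header
    {
      set = ($1 >= 3) ? "3+" : $1                  # "1", "2" or "3+"
      clones[set]++; reads[set] += $1; total += $1
    }
    END {
      n = split("1 2 3+", order, " ")
      if (total > 0)
        for (i = 1; i <= n; i++)
          printf "%s\t%d\t%.2f\n", order[i], clones[order[i]], reads[order[i]] / total
    }
  ' "$1"
}

clonotype_sets toy_sets.txt
```

    With these toy counts the three singletons carry 20% of the reads, the two doubletons 27% and the two 3+ clonotypes 53%.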

    java -jar /path/to/vdjtools/vdjtools-1.2.1.jar PlotQuantileStats -t 10 sample.txt output_prefix
    # -t: number of top clonotypes to visualize. Should not exceed 10, default is 5
    
    qstat

    2.2 RarefactionPlot

    Plots rarefaction curves for specified list of samples, that is, the dependencies between sample diversity and sample size.

    java -jar /path/to/vdjtools/vdjtools-1.2.1.jar RarefactionPlot -m metadata.txt output_prefix
    #-f: factor
    
    rarefaction.strict
    Solid and dashed lines mark interpolated and extrapolated regions of rarefaction curves respectively, points mark exact sample size and diversity. Shaded areas mark 95% confidence intervals.

    2.3 CalcDiversityStats

    Estimates diversity and writes two tables: one computed on the original data (exact) and one computed on resampled data (resampled).

    java -jar /path/to/vdjtools/vdjtools-1.2.1.jar CalcDiversityStats -m metadata.txt output_prefix
    
    all.diversity.strict.resampled
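
    For readers who want to see what a diversity index looks like in practice, the sketch below computes the Shannon index, H = -sum(p * ln p), from the read counts of a toy sample. This is a generic textbook calculation chosen by me as an example; it is not taken from the VDJtools source, and the toy counts and the column layout are assumptions.

```shell
# Toy sample (invented read counts). Assumed columns: count, cdr3aa
{
  printf 'count\tcdr3aa\n'
  printf '2\tNA\n'
  printf '1\tNA\n'
  printf '1\tNA\n'
} > toy_diversity.txt

# Shannon index over clonotype read counts, natural logarithm.
shannon() {
  awk -F '\t' '
    NR == 1 { next }                       # skip the header
    { n[NR] = $1; total += $1 }
    END {
      if (total > 0) {
        for (i in n) { p = n[i] / total; if (p > 0) h -= p * log(p) }
        printf "%.4f\n", h
      }
    }
  ' "$1"
}

shannon toy_diversity.txt
```

    The toy frequencies are 0.5, 0.25 and 0.25, which gives 1.0397.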

    3.Repertoire overlap analysis

    Clonotype sharing between samples

    3.1 OverlapPair

    Performs a comprehensive analysis of clonotype sharing for a pair of samples.

    java -jar /path/to/vdjtools/vdjtools-1.2.1.jar OverlapPair -p --plot-area-v2 sample1.txt sample2.txt output_prefix
    # -p: plot
    # --plot-area-v2: alternative plotting mode, clonotype CDR3 sequences are shown at plot sides and connected to corresponding areas with lines.
    

    Overlap type

    Shorthand  Rule                               Note
    strict     CDR3nt (AND) V (AND) J (AND) SHMs  Require full match for receptor nucleotide sequence
    nt         CDR3nt
    ntV        CDR3nt (AND) V
    ntVJ       CDR3nt (AND) V (AND) J
    aa         CDR3aa
    aaV        CDR3aa (AND) V
    aaVJ       CDR3aa (AND) V (AND) J
    aa!nt      CDR3aa (AND) ((NOT) CDR3nt)        Removes nearly all contamination bias from overlap results. Should not be used for samples from the same donor/tracking experiments
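
    The effect of the matching rule is easy to see on a toy example. The sketch below counts the clonotypes shared between two invented samples under three of the rules above (nt, ntVJ and aa). The column order (count, freq, cdr3nt, cdr3aa, v, d, j), the segment names and the helper function are all assumptions for illustration; the real overlap is computed by VDJtools itself.

```shell
# Two toy samples (invented). Assumed columns:
# count, freq, cdr3nt, cdr3aa, v, d, j
{
  printf 'count\tfreq\tcdr3nt\tcdr3aa\tv\td\tj\n'
  printf '10\t0.5\tTGTGCCAGC\tCAS\tTRBV1\t.\tTRBJ1\n'
  printf '10\t0.5\tTGTGCTAGT\tCAS\tTRBV2\t.\tTRBJ1\n'
} > toy_a.txt
{
  printf 'count\tfreq\tcdr3nt\tcdr3aa\tv\td\tj\n'
  printf '5\t0.5\tTGTGCCAGC\tCAS\tTRBV1\t.\tTRBJ2\n'
  printf '5\t0.5\tTGTGCAAGC\tCAS\tTRBV3\t.\tTRBJ1\n'
} > toy_b.txt

# Usage: shared_clonotypes RULE FILE_A FILE_B
# Only the rules nt, ntVJ and aa are handled in this sketch.
shared_clonotypes() {
  awk -F '\t' -v rule="$1" '
    function key() {
      if (rule == "nt")   return $3
      if (rule == "ntVJ") return $3 SUBSEP $5 SUBSEP $7
      if (rule == "aa")   return $4
    }
    FNR == 1 { next }                                  # skip both headers
    NR == FNR { in_a[key()] = 1; next }                # first file: remember keys
    (key() in in_a) && !counted[key()]++ { shared++ }  # second file: count matches once
    END { print shared + 0 }
  ' "$2" "$3"
}

for rule in nt ntVJ aa; do
  printf '%s\t%s\n' "$rule" "$(shared_clonotypes "$rule" toy_a.txt toy_b.txt)"
done
```

    Here one nucleotide sequence is present in both samples but with a different J segment, so the nt and aa rules report one shared clonotype while the stricter ntVJ rule reports none.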
    strict.paired.scatter
    paired.strict.table.collapsed

    Clonotype scatterplot. Main frame contains a scatterplot of clonotype abundances (overlapping clonotypes only) and a linear regression. Point size is scaled to the geometric mean of clonotype frequency in both samples. Scatterplot axes represent log10 clonotype frequencies in each sample. Two marginal histograms show the overlapping (red) and total clonotype (grey) abundance distributions in corresponding sample. Histograms are weighted by clonotype abundance, i.e. they display read distribution by clonotype size.
    Shared clonotype abundance plot. Plot shows details for top 20 clonotypes shared between samples, as well as collapsed (“NotShown”) and non-overlapping (“NonOverlapping”) clonotypes. Clonotype CDR3 amino acid sequence is plotted against the sample where the clonotype reaches maximum abundance.

    CalcPairwiseDistances

    Performs an all-versus-all pairwise overlap for a list of samples and computes a set of repertoire similarity measures. At least 3 samples should be provided.

    java -jar /path/to/vdjtools/vdjtools-1.2.1.jar CalcPairwiseDistances -p [sample1.txt sample2.txt sample3.txt or -m metadata.txt] output_prefix
    #-p: plot
    
    intersect.batch.aa

    Pairwise overlap circos plot. Count, frequency and diversity panels correspond to the read count, frequency (both non-symmetric) and the total number of clonotypes that are shared between samples. Pairwise overlaps are stacked, i.e. segment arc length is not equal to sample size.

    ClusterSamples

    Takes the text output of CalcPairwiseDistances as input and performs cluster analysis.

    java -jar /path/to/vdjtools/vdjtools-1.2.1.jar ClusterSamples -p input_prefix output_prefix
    # input_prefix: the output_prefix used for CalcPairwiseDistances (without any suffix)
    # -p: plot
    # -f: factor
    # -n: specifies that the plotting factor is continuous
    

    For example:
    java -jar /path/to/vdjtools/vdjtools-1.2.1.jar CalcPairwiseDistances -p -m metadata.txt e:/results/all
    java -jar /path/to/vdjtools/vdjtools-1.2.1.jar ClusterSamples -p -f "Sex" e:/results/all e:/results/Sex

    Reference figure from the official documentation

    TestClusters

    This routine tests whether a given factor influences repertoire clustering. It assesses compactness of samples that have the same factor level and separation between samples with distinct factor levels for the factor specified in ClusterSamples.
    (This routine can only be used if -f was specified in ClusterSamples; it checks how the factor affects the clustering.)

    java -jar /path/to/vdjtools/vdjtools-1.2.1.jar TestClusters input_prefix output_prefix
    

    Figure from the official documentation

    TrackClonotypes

    This routine performs an all-vs-all intersection between an ordered list of samples for clonotype tracking purposes. The user can specify the sample whose clonotypes will be traced, e.g. the pre-therapy sample.

    java -jar /path/to/vdjtools/vdjtools-1.2.1.jar TrackClonotypes [options] [sample1.txt sample2.txt sample3.txt ... if -m is not specified] output_prefix
    # -m: metadata
    # -f: factor
    # -p: plot
    


        Article link: https://www.haomeiwen.com/subject/yfvgxrtx.html