featureCounts

作者: 一只小脑斧 | 来源:发表于2022-03-07 10:27 被阅读0次

RNA-seq上游分析的最后一站：featureCounts
featureCounts帮助文档
featureCounts
RBP AS生信流程（五）RBP定量和AS事件相关性
RNA-seq上游分析（3）-使用featureCounts对b
snakemake 转录组上游pipeline
Merge counts data of run-feature
R脚本--FeatureCounts计算reads counts
从原始的FASTQ生成TPM
1.8featureCounts

参考：

https://www.jianshu.com/p/16dbbb824e3c

https://www.jianshu.com/p/fbcf00a46574

####安装

cd /home/yifan/software

wget --no-check-certificate https://sourceforge.net/projects/subread/files/subread-2.0.2/subread-2.0.2-Linux-x86_64.tar.gz

tar -xzvf subread-2.0.2-Linux-x86_64.tar.gz

cd subread-2.0.2-Linux-x86_64/bin/

ls -h

运行

/public/vip/biosoft/subread-2.0.2-Linux-x86_64/bin/featureCounts

####输入文件

1）reads的比对情况，这种信息通常都用BAM/ SAM文件来存储

2）区间注释文件，支持两种格式

在定量的时候，支持对单个feature 定量（对外显子定量），也支持对meta-feature 进行定量（对基因进行定量）， meta-feature 的定量是属于同一meta-features 下的所有features 的总和；

当reads 比对到2个或者以上的features 时，默认情况下，featureCounts在统计时会忽略到这部分reads, 如果你想要统计上这部分reads, 可以添加-O 参数，此时一条reads 比对到多个feature，每个feature 定量时，都会加1，对于meta-features 来说，如果比对到多个features 属于同一个 meta-features(比如一条reads比对到了exon, 但这些exon 属于同一个gene), 则对于这个gene 而言，只会计数1次；

总之，不管对于feature 还是meta-feature, 只有比对多个不同的区间时，才会分别计数；

######使用

##单个样本定量

featureCounts -T 5 -t exon -g gene_id -a annotation.gtf -o counts.txt mapping.sam

###多个样本归一化

featureCounts -t exon -g gene_id -a annotation.gtf -o counts.txt library1.bam library2.bam library3.bam

# 运行featureCounts对所有样本进行基因水平定量

gtf=/teach/database/gtf/gencode.v25.annotation.gtf.gz

featureCounts -T 1 -p \

-t exon -g gene_id \

-a $gtf -o all.id.txt *.sort.bam

#解释：-T 1：线程数为1；-p：表示数据为paired-end，双末端测序数据；-t exon表示 feature名称为exon；-g gene_id表示meta-feature名称为gene_id(ensembl名称)；-a $gtf 表示输入的GTF基因组注释文件；-o all.id.txt 设置输出文件类型；*.sort.bam是被分析的bam文件。featureCounts支持通配符*

# 只保留all.id.txt文件的【基因名】和【样本counts】

下面对几个常用的选项详细解释一下：

-F # 指定区间注释文件的格式，默认是GTF

-o # 输出文件:可输出raw counts的txt文本及raw counts的summary文本

-p # 针对paired-end数据

-g # 指定统计的meta-feature名称，默认是gene_id

-t # 指定统计的feature名称，默认是exon。可有CDS,exon,five_prime_utr,gene,Selenocysteine,start_codon, stop_codon,three_prime_utr,transcript

-T # 指定线程数

-a : 指定的区间注释文件，默认是gtf格式

-T : 线程数，默认是1

-t : 想要统计的feature 的名称，取值范围是gtf 文件中的第3列的值，默认是exon

-g : 想要统计的meta-feature 的名称，取值范围参考gtf 第9列注释信息，gtf 的第9列为 key=value 的格式， -g 参数可能的取值就是所有的key, 默认值是gene_id。准备gtf 文件时，要确保-g 参数指定的值都是唯一标识符，才能达到预期的效果；

####输出结果解读

Geneid：基因的ensemble基因号；

Chr：多个feature所在的染色体编号;

Start：多个feature起始位点,与前面一一对应

End：多个feature终止位点,与前面一一对应

Strand：正负链

Length：基因长度

sampleID：一列代表一个样本，数值表示比对到该基因上的read数目

###MultiQC分析featureCounts结果

multiqc all.id.txt.summary #对featureCounts的结果进行整合，html文件可视化

###########own

cd /home/yifan/project/CZM/exo.m6A/Data/CleanData/fastp.results/4.bowtie2.results

gtf=/home/yifan/data/ref/mouse/Mus_musculus.GRCm38.90.gtf

for i in LR22A21FY{19..24}

do

featureCounts -T 20 -p \

-t exon -g gene_id \

-a $gtf -o input.count.txt ${i}.sorted.bam

done

# 只保留all.id.txt文件的【基因名】和【样本counts】

cat all.id.txt | cut -f1,7- > counts.txt

###########帮助文档

Usage: featureCounts [options] -a <annotation_file> -o <output_file> input_file1 [input_file2] ...

## Mandatory arguments:

-a <string> Name of an annotation file. GTF/GFF format by default. See

-F option for more format information. Inbuilt annotations

(SAF format) is available in 'annotation' directory of the

package. Gzipped file is also accepted.

-o <string> Name of output file including read counts. A separate file

including summary statistics of counting results is also

included in the output ('<string>.summary'). Both files

are in tab delimited format.

input_file1 [input_file2] ... A list of SAM or BAM format files. They can be

either name or location sorted. If no files provided,

<stdin> input is expected. Location-sorted paired-end reads

are automatically sorted by read names.

## Optional arguments:

# Annotation

-F <string> Specify format of the provided annotation file. Acceptable

formats include 'GTF' (or compatible GFF format) and

'SAF'. 'GTF' by default. For SAF format, please refer to

Users Guide.

-t <string> Specify feature type(s) in a GTF annotation. If multiple

types are provided, they should be separated by ',' with

no space in between. 'exon' by default. Rows in the

annotation with a matched feature will be extracted and

used for read mapping.

-g <string> Specify attribute type in GTF annotation. 'gene_id' by

default. Meta-features used for read counting will be

extracted from annotation using the provided value.

--extraAttributes Extract extra attribute types from the provided GTF

annotation and include them in the counting output. These

attribute types will not be used to group features. If

more than one attribute type is provided they should be

separated by comma.

-A <string> Provide a chromosome name alias file to match chr names in

annotation with those in the reads. This should be a two-

column comma-delimited text file. Its first column should

include chr names in the annotation and its second column

should include chr names in the reads. Chr names are case

sensitive. No column header should be included in the

file.

# Level of summarization

-f Perform read counting at feature level (eg. counting

reads for exons rather than genes).

# Overlap between reads and features

-O Assign reads to all their overlapping meta-features (or

features if -f is specified).

--minOverlap <int> Minimum number of overlapping bases in a read that is

required for read assignment. 1 by default. Number of

overlapping bases is counted from both reads if paired

end. If a negative value is provided, then a gap of up

to specified size will be allowed between read and the

feature that the read is assigned to.

--fracOverlap <float> Minimum fraction of overlapping bases in a read that is

required for read assignment. Value should be within range

[0,1]. 0 by default. Number of overlapping bases is

counted from both reads if paired end. Both this option

and '--minOverlap' option need to be satisfied for read

assignment.

--fracOverlapFeature <float> Minimum fraction of overlapping bases in a

feature that is required for read assignment. Value

should be within range [0,1]. 0 by default.

--largestOverlap Assign reads to a meta-feature/feature that has the

largest number of overlapping bases.

--nonOverlap <int> Maximum number of non-overlapping bases in a read (or a

read pair) that is allowed when being assigned to a

feature. No limit is set by default.

--nonOverlapFeature <int> Maximum number of non-overlapping bases in a feature

that is allowed in read assignment. No limit is set by

default.

--readExtension5 <int> Reads are extended upstream by <int> bases from their

5' end.

--readExtension3 <int> Reads are extended upstream by <int> bases from their

3' end.

--read2pos <5:3> Reduce reads to their 5' most base or 3' most base. Read

counting is then performed based on the single base the

read is reduced to.

# Multi-mapping reads

-M Multi-mapping reads will also be counted. For a multi-

mapping read, all its reported alignments will be

counted. The 'NH' tag in BAM/SAM input is used to detect

multi-mapping reads.

# Fractional counting

--fraction Assign fractional counts to features. This option must

be used together with '-M' or '-O' or both. When '-M' is

specified, each reported alignment from a multi-mapping

read (identified via 'NH' tag) will carry a fractional

count of 1/x, instead of 1 (one), where x is the total

number of alignments reported for the same read. When '-O'

is specified, each overlapping feature will receive a

fractional count of 1/y, where y is the total number of

features overlapping with the read. When both '-M' and

'-O' are specified, each alignment will carry a fractional

count of 1/(x*y).

# Read filtering

-Q <int> The minimum mapping quality score a read must satisfy in

order to be counted. For paired-end reads, at least one

end should satisfy this criteria. 0 by default.

--splitOnly Count split alignments only (ie. alignments with CIGAR

string containing 'N'). An example of split alignments is

exon-spanning reads in RNA-seq data.

--nonSplitOnly If specified, only non-split alignments (CIGAR strings do

not contain letter 'N') will be counted. All the other

alignments will be ignored.

--primary Count primary alignments only. Primary alignments are

identified using bit 0x100 in SAM/BAM FLAG field.

--ignoreDup Ignore duplicate reads in read counting. Duplicate reads

are identified using bit Ox400 in BAM/SAM FLAG field. The

whole read pair is ignored if one of the reads is a

duplicate read for paired end data.

# Strandness

-s <int or string> Perform strand-specific read counting. A single integer

value (applied to all input files) or a string of comma-

separated values (applied to each corresponding input

file) should be provided. Possible values include:

0 (unstranded), 1 (stranded) and 2 (reversely stranded).

Default value is 0 (ie. unstranded read counting carried

out for all input files).

# Exon-exon junctions

-J Count number of reads supporting each exon-exon junction.

Junctions were identified from those exon-spanning reads

in the input (containing 'N' in CIGAR string). Counting

results are saved to a file named '<output_file>.jcounts'

-G <string> Provide the name of a FASTA-format file that contains the

reference sequences used in read mapping that produced the

provided SAM/BAM files. This optional argument can be used

with '-J' option to improve read counting for junctions.

# Parameters specific to paired end reads

-p If specified, libraries are assumed to contain paired-end

reads. For any library that contains paired-end reads, the

'countReadPairs' parameter controls if read pairs or reads

should be counted.

--countReadPairs If specified, fragments (or templates) will be counted

instead of reads. This option is only applicable for

paired-end reads. For single-end data, it is ignored.

-B Only count read pairs that have both ends aligned.

-P Check validity of paired-end distance when counting read

pairs. Use -d and -D to set thresholds.

-d <int> Minimum fragment/template length, 50 by default.

-D <int> Maximum fragment/template length, 600 by default.

-C Do not count read pairs that have their two ends mapping

to different chromosomes or mapping to same chromosome

but on different strands.

--donotsort Do not sort reads in BAM/SAM input. Note that reads from

the same pair are required to be located next to each

other in the input.

# Number of CPU threads

-T <int> Number of the threads. 1 by default.

# Read groups

--byReadGroup Assign reads by read group. "RG" tag is required to be

present in the input BAM/SAM files.

# Long reads

-L Count long reads such as Nanopore and PacBio reads. Long

read counting can only run in one thread and only reads

(not read-pairs) can be counted. There is no limitation on

the number of 'M' operations allowed in a CIGAR string in

long read counting.

# Assignment results for each read

-R <format> Output detailed assignment results for each read or read-

pair. Results are saved to a file that is in one of the

following formats: CORE, SAM and BAM. See Users Guide for

more info about these formats.

--Rpath <string> Specify a directory to save the detailed assignment

results. If unspecified, the directory where counting

results are saved is used.

# Miscellaneous

--tmpDir <string> Directory under which intermediate files are saved (later

removed). By default, intermediate files will be saved to

the directory specified in '-o' argument.

--maxMOp <int> Maximum number of 'M' operations allowed in a CIGAR

string. 10 by default. Both 'X' and '=' are treated as 'M'

and adjacent 'M' operations are merged in the CIGAR

string.

--verbose Output verbose information for debugging, such as un-

matched chromosome/contig names.

-v Output version of the program.

网友评论

走进转录组

本文标题：featureCounts

本文链接：https://www.haomeiwen.com/subject/zpgmrrtx.html

延伸阅读

深度阅读

您也可以注册成为美文阅读网的作者，发表您的原创作品、分享您的心情！

featureCounts

参考：

####安装

####输入文件

######使用

####输出结果解读

###MultiQC分析featureCounts结果

###########own

###########帮助文档

相关文章

RNA-seq上游分析的最后一站：featureCounts

featureCounts帮助文档

featureCounts

RBP AS生信流程（五）RBP定量和AS事件相关性

RNA-seq上游分析（3）-使用featureCounts对b

snakemake 转录组上游pipeline

Merge counts data of run-feature

R脚本--FeatureCounts计算reads counts

从原始的FASTQ生成TPM

1.8featureCounts

网友评论

延伸阅读

深度阅读

栏目导航

热点阅读

走进转录组