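Picking up where the previous post left off: we have already learned how to build a DESeqDataSet from a count matrix. Today we look at a second way of supplying the input data, htseq-count input.
htseq-count input
First, a quick introduction to HTSeq. It is a Python package for analyzing high-throughput sequencing data, and htseq-count is the script it ships for counting how many reads map to each feature. Typical uses of HTSeq include: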
1. Getting statistical summaries about the base-call quality scores to study the data quality.
2. Calculating a coverage vector and exporting it for visualization in a genome browser.
3. Reading in annotation data from a GFF file.
4. Assigning aligned reads from an RNA-Seq experiment to exons and genes.
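# In your own analysis, point this at the folder holding the htseq-count output files:
directory <- "/path/to/your/files/"
# For this demonstration, use the example htseq-count files shipped with the pasilla package instead:
directory <- system.file("extdata", package = "pasilla",
                         mustWork = TRUE)
# Keep only the files whose names contain "treated" (this also matches "untreated")
sampleFiles <- grep("treated", list.files(directory), value = TRUE)
# Derive the condition ("treated" or "untreated") from each file name
sampleCondition <- sub("(.*treated).*", "\\1", sampleFiles)
# First column: sample name; second column: file name; remaining columns: sample metadata
sampleTable <- data.frame(sampleName = sampleFiles,
                          fileName = sampleFiles,
                          condition = sampleCondition)
sampleTable$condition <- factor(sampleTable$condition)
library("DESeq2")
ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
                                       directory = directory,
                                       design = ~ condition)
ddsHTSeq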
## class: DESeqDataSet
## dim: 70463 7
## metadata(1): version
## assays(1): counts
## rownames(70463): FBgn0000003:001 FBgn0000008:001 ... FBgn0261575:001
## FBgn0261575:002
## rowData names(0):
## colnames(7): treated1fb.txt treated2fb.txt ... untreated3fb.txt
## untreated4fb.txt
## colData names(1): condition
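The two string operations used to build sampleTable are easy to check in isolation. Below is a minimal sketch on a hypothetical vector of file names; the names mimic those in the output above but are typed in by hand rather than read from the pasilla directory, and the README.txt entry is an assumption added only to show what grep() filters out.

```r
# Hypothetical file listing, typed by hand for illustration;
# "README.txt" is an invented entry that should be filtered out.
files <- c("README.txt", "treated1fb.txt", "treated2fb.txt",
           "untreated1fb.txt", "untreated2fb.txt")

# value = TRUE returns the matching names themselves rather than their indices;
# the pattern "treated" also matches inside "untreated"
sampleFiles <- grep("treated", files, value = TRUE)

# The capture group keeps everything up to and including "treated",
# so the replicate number and file suffix are dropped
sampleCondition <- sub("(.*treated).*", "\\1", sampleFiles)

print(data.frame(fileName = sampleFiles, condition = sampleCondition))
```

If your own files follow a different naming scheme, the pattern passed to grep() and sub() has to be adapted accordingly.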
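So what does the raw count data look like? The counts() accessor is used here rather than reaching into the object's slots directly:
> head(counts(ddsHTSeq))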
treated1fb.txt treated2fb.txt treated3fb.txt untreated1fb.txt untreated2fb.txt untreated3fb.txt untreated4fb.txt
FBgn0000003:001 0 0 1 0 0 0 0
FBgn0000008:001 0 0 0 0 0 0 0
FBgn0000008:002 0 0 0 0 0 1 0
FBgn0000008:003 0 1 0 1 1 1 0
FBgn0000008:004 1 0 1 0 1 0 1
FBgn0000008:005 4 1 1 2 2 0 1
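Each htseq-count output file is a tab-separated table with a feature ID and a count per line, followed by summary rows whose names start with two underscores in current HTSeq versions (for example __no_feature and __ambiguous). As a rough illustration of how such per-sample files turn into a matrix like the one above, here is a base-R sketch that combines two made-up files. The file names, feature IDs, and counts are all invented for the example, and this is a simplified stand-in, not DESeq2's actual implementation.

```r
# Write two tiny htseq-count-style files into a temporary directory.
# All IDs and counts below are invented for illustration.
dir <- tempfile("htseq_demo")
dir.create(dir)
writeLines(c("geneA\t5", "geneB\t0", "__no_feature\t12", "__ambiguous\t3"),
           file.path(dir, "sample1.txt"))
writeLines(c("geneA\t7", "geneB\t2", "__no_feature\t9", "__ambiguous\t1"),
           file.path(dir, "sample2.txt"))

sampleFiles <- list.files(dir)

# Read every file as a two-column table: V1 = feature ID, V2 = count
tables <- lapply(file.path(dir, sampleFiles), read.table,
                 sep = "\t", stringsAsFactors = FALSE)

# All files must list the same features in the same order
stopifnot(all(sapply(tables, function(x) identical(x$V1, tables[[1]]$V1))))

# One column of counts per sample
countMatrix <- sapply(tables, function(x) x$V2)
rownames(countMatrix) <- tables[[1]]$V1
colnames(countMatrix) <- sampleFiles

# Drop the summary rows, which are not features
countMatrix <- countMatrix[!startsWith(rownames(countMatrix), "__"), , drop = FALSE]
print(countMatrix)
```

In a real analysis DESeqDataSetFromHTSeqCount does this reading and combining for you; the sketch is only meant to make the relationship between the per-sample files and the count matrix concrete.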
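That covers the two most commonly used input formats. The next step is to start processing the data.
By the way, if you have the time, the HTSeq Python package itself is worth learning; it looks very powerful.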