Building a Bioinformatics Pipeline (14): Downloading the Chicken Reference Genome and Annotation File


Author: Geekero | Published 2020-07-28 13:15

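The procedure is similar to "Building a Bioinformatics Pipeline (13): Downloading the Arabidopsis Reference Genome and Configuring the Annotation File".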

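Look up the Latin name of the chicken:

From a reference table of Latin and Chinese names for common species, we learn that the Latin name of the domestic chicken is Gallus gallus.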

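Download from the Ensembl database

Animal reference genomes: http://asia.ensembl.org/index.html
Plant reference genomes: http://plants.ensembl.org/index.html
Other reference genomes (fungi, bacteria, etc.): http://ensemblgenomes.org/

Then find the chicken entry in the species list. It turns out not to be complicated: it is simply listed as chicken.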


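Click it to open the species page.

Next, choose which version of the genome file to download.
In general, choose the toplevel file.
Then download it with Xunlei (迅雷); generally speaking, a paid membership makes the download faster.

Not necessarily, though: I did not pay for a membership.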


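Download the GTF annotation file

Change the download address slightly so that it points to the GTF directory instead of the FASTA directory:


Download the file highlighted in the red box, Gallus_gallus.GRCg6a.98.gtf.gz (the file decompressed in the next step).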
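The address change described above can be sketched in code. This is only an illustration: the screenshots from the original post are missing, so the FTP layout below (`release-N/fasta/<species>/dna/` and `release-N/gtf/<species>/`) is an assumption based on how the Ensembl FTP site is usually organized, and the release number 98 is inferred from the GTF file name. The function name `ensembl_urls` is hypothetical. Check the resulting URLs in a browser before relying on them.

```python
ENSEMBL_FTP = "ftp://ftp.ensembl.org/pub"  # assumed FTP root, verify before use


def ensembl_urls(species: str, assembly: str, release: int) -> dict:
    """Build FASTA and GTF download URLs following the assumed Ensembl FTP layout."""
    directory = species.lower()        # directory names are lower-case: gallus_gallus
    prefix = f"{species}.{assembly}"   # file names keep the capital: Gallus_gallus.GRCg6a
    base = f"{ENSEMBL_FTP}/release-{release}"
    return {
        # Only the middle of the path differs between the two downloads.
        "fasta": f"{base}/fasta/{directory}/dna/{prefix}.dna.toplevel.fa.gz",
        "gtf": f"{base}/gtf/{directory}/{prefix}.{release}.gtf.gz",
    }


urls = ensembl_urls("Gallus_gallus", "GRCg6a", 98)
for kind, url in urls.items():
    print(kind, url)
```

The printed addresses can be pasted into any download tool, so the FASTA and GTF always come from the same release.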

Transfer the files to the server with Xftp

Decompress

gzip -d Gallus_gallus.GRCg6a.98.gtf.gz
gzip -d Gallus_gallus.GRCg6a.dna.toplevel.fa.gz

Take a look at the contents of the GTF file
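Since the screenshot of the file is not available here, a small sketch of how the nine tab-separated GTF columns can be inspected may help. The helper names `parse_gtf_line` and `head_gtf` are hypothetical, the example record is made up for illustration (it is not taken from the chicken annotation), and the attribute parsing is simplified: it assumes no semicolons appear inside quoted values.

```python
import itertools

GTF_COLUMNS = ["seqname", "source", "feature", "start", "end",
               "score", "strand", "frame", "attribute"]


def parse_gtf_line(line):
    """Split one GTF record into a dict; the attribute column becomes a nested dict."""
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(GTF_COLUMNS, fields))
    attrs = {}
    for item in record["attribute"].split(";"):
        item = item.strip()
        if item:
            key, _, value = item.partition(" ")
            attrs[key] = value.strip('"')
    record["attribute"] = attrs
    return record


def head_gtf(path, n=5):
    """Return the first n records of a GTF file, skipping '#' header lines."""
    with open(path) as handle:
        data = (line for line in handle if not line.startswith("#"))
        return [parse_gtf_line(line) for line in itertools.islice(data, n)]


# Made-up record, only to show the shape of the parsed result.
example = '1\tensembl\tgene\t100\t200\t.\t+\t.\tgene_id "G1"; gene_name "ABC";'
print(parse_gtf_line(example))
```

Calling `head_gtf("Gallus_gallus.GRCg6a.98.gtf")` on the server gives roughly the same view as `less -S`, but with the attributes already split out.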

=======================================================================

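The following describes how to build the reference required by the 10x single-cell pipeline. If you are doing bulk sequencing, you can skip the rest.

Use cellranger to check the GTF and generate the GTF file for the 10x pipeline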

$cellranger mkgtf Gallus_gallus.GRCg6a.98.gtf Gallus_gallus.GRCg6a.98_new.gtf
/opt/biosoft/cellranger-expression/cellranger-cs/3.1.0/bin
cellranger mkgtf (3.1.0)
Copyright (c) 2019 10x Genomics, Inc.  All rights reserved.
-------------------------------------------------------------------------------

Writing new genes GTF file (may take 10 minutes for a 1GB input GTF file)...
...done

For the downstream analysis, add an "Mt" marker to the mitochondrial genes

You need to write a small Perl or Python script yourself for this

python ../add_mt_marker.py Gallus_gallus.GRCg6a.98_new.gtf Gallus_gallus.GRCg6a.98_new2.gtf
mv Gallus_gallus.GRCg6a.98_new2.gtf Gallus_gallus.GRCg6a.98.gtf
less -S Gallus_gallus.GRCg6a.98.gtf
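The post does not show add_mt_marker.py, so the following is only a guess at what such a script might look like, not the author's version. It assumes that the mitochondrial sequence is named "MT" in the GTF (check the first column of your file) and that the marker is an "Mt-" prefix on `gene_name`; records without a `gene_name` get one built from `gene_id`. Adjust both constants to whatever your downstream pipeline expects.

```python
import re
import sys

MITO_SEQNAME = "MT"  # assumption: name of the mitochondrial sequence in column 1
MARKER = "Mt-"       # assumption: the marker is a prefix added to gene_name

GENE_NAME = re.compile(r'\bgene_name "([^"]*)"')
GENE_ID = re.compile(r'\bgene_id "([^"]*)"')


def mark_line(line):
    """Return the GTF line with the marker added if it lies on the mitochondrial sequence."""
    if line.startswith("#"):
        return line
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9 or fields[0] != MITO_SEQNAME:
        return line
    attrs = fields[8].rstrip()
    name = GENE_NAME.search(attrs)
    if name:
        # Prefix the existing gene_name unless it is already marked.
        if not name.group(1).startswith(MARKER):
            attrs = GENE_NAME.sub(
                lambda m: f'gene_name "{MARKER}{m.group(1)}"', attrs, count=1)
    else:
        # No gene_name: fall back to gene_id so the gene is still recognizable.
        gene_id = GENE_ID.search(attrs)
        if gene_id:
            if not attrs.endswith(";"):
                attrs += ";"
            attrs += f' gene_name "{MARKER}{gene_id.group(1)}";'
    fields[8] = attrs
    return "\t".join(fields) + "\n"


def main(src, dst):
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            fout.write(mark_line(line))


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Saved as add_mt_marker.py, this takes the input and output GTF paths as its two arguments, matching the command shown above. Running it twice on the same file does not add the prefix a second time.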

Use cellranger to check the inputs and generate the reference for the 10x pipeline

$cellranger mkref --genome=chicken --fasta=Gallus_gallus.GRCg6a.dna.toplevel.fa --genes=Gallus_gallus.GRCg6a.98.gtf
/opt/biosoft/cellranger-expression/cellranger-cs/3.1.0/bin
cellranger mkref (3.1.0)
Copyright (c) 2019 10x Genomics, Inc.  All rights reserved.
-------------------------------------------------------------------------------

Creating new reference folder at /share/nas1/Data/Users/luohb/Data/Reference/chicken/chicken
...done

Writing genome FASTA file into reference folder...
...done

Computing hash of genome FASTA file...
...done

Indexing genome FASTA file...
...done

Writing genes GTF file into reference folder...
...done

Computing hash of genes GTF file...
...done

Writing genes index file into reference folder (may take over 10 minutes for a 3Gb genome)...
...done

Writing genome metadata JSON file into reference folder...
...done

Generating STAR genome index (may take over 8 core hours for a 3Gb genome)...
Jan 15 18:01:55 ..... Started STAR run
Jan 15 18:01:55 ... Starting to generate Genome files
Jan 15 18:02:55 ... starting to sort  Suffix Array. This may take a long time...
Jan 15 18:02:59 ... sorting Suffix Array chunks and saving them to disk...
Jan 15 18:42:16 ... loading chunks from disk, packing SA...
Jan 15 18:42:51 ... Finished generating suffix array
Jan 15 18:42:51 ... Generating Suffix Array index
Jan 15 18:45:49 ... Completed Suffix Array index
Jan 15 18:45:49 ..... Processing annotations GTF
Jan 15 18:45:55 ..... Inserting junctions into the genome indices
Jan 15 18:52:03 ... writing Genome to disk ...
Jan 15 18:52:04 ... writing Suffix Array to disk ...
Jan 15 18:52:13 ... writing SAindex to disk
Jan 15 18:52:14 ..... Finished successfully
...done.

>>> Reference successfully created! <<<

You can now specify this reference on the command line:
cellranger --transcriptome=/share/nas1/Data/Users/luohb/Data/Reference/chicken/chicken ...

This step takes a while (the STAR index alone took about 50 minutes in the log above).

The newly generated directory

$cd chicken/
$tree
.
├── fasta
│   ├── genome.fa
│   └── genome.fa.fai
├── genes
│   └── genes.gtf
├── pickle
│   └── genes.pickle
├── reference.json
└── star
    ├── chrLength.txt
    ├── chrNameLength.txt
    ├── chrName.txt
    ├── chrStart.txt
    ├── exonGeTrInfo.tab
    ├── exonInfo.tab
    ├── geneInfo.tab
    ├── Genome
    ├── genomeParameters.txt
    ├── SA
    ├── SAindex
    ├── sjdbInfo.txt
    ├── sjdbList.fromGTF.out.tab
    ├── sjdbList.out.tab
    └── transcriptInfo.tab

4 directories, 20 files
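A quick way to confirm that the build finished cleanly is to check the folder against the tree above. The sketch below does that for a handful of the files; the list is taken from the cellranger 3.1.0 output shown in this post and may differ in other versions, and the function name `missing_reference_files` is hypothetical.

```python
from pathlib import Path

# Subset of the files listed in the tree output above (cellranger 3.1.0).
EXPECTED = [
    "reference.json",
    "fasta/genome.fa",
    "fasta/genome.fa.fai",
    "genes/genes.gtf",
    "pickle/genes.pickle",
    "star/Genome",
    "star/SA",
    "star/SAindex",
]


def missing_reference_files(ref_dir):
    """Return the expected files that are absent from the reference folder."""
    root = Path(ref_dir)
    return [rel for rel in EXPECTED if not (root / rel).is_file()]
```

For example, `missing_reference_files("chicken")` returning an empty list means every file in the list is present.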

Keep the original compressed files together with a README that records where the files came from

cd ..
mkdir source
cd source/
vi README.txt
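The post does not show what goes into README.txt. One possible approach, sketched below, is to record each archive's source URL together with a checksum so the download can be verified later. The helper names `md5sum` and `build_readme` and the README layout are assumptions made for illustration, not the author's format.

```python
import hashlib
from pathlib import Path


def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 checksum of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_readme(entries, downloaded_on):
    """Build README text from (file path, source URL) pairs."""
    lines = [f"Downloaded on: {downloaded_on}", ""]
    for path, url in entries:
        lines.append(Path(path).name)
        lines.append(f"  source: {url}")
        lines.append(f"  md5:    {md5sum(path)}")
    return "\n".join(lines) + "\n"
```

Writing the result with `Path("README.txt").write_text(build_readme(entries, "2020-01-15"))` (the date here is a placeholder) gives the same outcome as editing the file by hand in vi, but the checksums come for free.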

Done~
