Add ./bin to $PATH, then makedir a new folder, then run:
genDB.sh db_list.txt [animal|plant|bacteria].json
1. Modify plant.json
First find the ftp links of ensemble plant for the gtf files, such as:
http://ftp.ebi.ac.uk/ensemblgenomes/pub/release-52/plants/gtf/zea_mays/Zea_mays.Zm-B73-REFERENCE-NAM-5.0.52.chr.gtf.gz
Then modify the <plant.json> file to as follows:
The provided ftp links in the file should be modified before use, cause the FTP links of ensemble has changed a little. But the provided link was old version.
Only modified the FTP links, keep other fields unchanged.
ftp://ftp.ebi.ac.uk/ensemblgenomes/pub/release-*VERSION*/plants/gtf/*SPECIES*/*SPECIES_NAME*.*GENOME*.*VERSION*.chr.gtf.gz
2. Modify the db_list_plant.txt (tab sperated)
B73 Zm-B73-REFERENCE-NAM-5.0 zea_mays 52
#B73 is the genome name used for data analysis.
#zay_mays is *SPECIES*/*SPECIES_NAME* in json file
#52 is *VERSION* in json file
3. Create the genome library
genDB.sh ./data/db_list_plant.txt ./data/json/plant.json
### following was the the running code generated by above command:
B73
Processing ensembl annotation...
/public/home/qtxu/Practice/ngsplot_database/ngsplotdb-develop/bin/ensembl.db.gen.sh B73 Zm-B73-REFERENCE-NAM-5.0 zea_mays 52 ftp://ftp.ebi.ac.uk/ensemblgenomes/pub/release-*VERSION*/plants/gtf/*SPECIES*/*SPECIES_NAME*.*GENOME*.*VERSION*.chr.gtf.gz ./data/json/plant.json 0 ./data/db_list_plant.txt /public/home/qtxu/Practice/ngsplot_database/ngsplotdb-develop/bin ./output
Zea_mays
wget -q -O ./output/tmp/Zea_mays.Zm-B73-REFERENCE-NAM-5.0.52.gtf.gz ftp://ftp.ebi.ac.uk/ensemblgenomes/pub/release-52/plants/gtf/zea_mays/Zea_mays.Zm-B73-REFERENCE-NAM-5.0.52.chr.gtf.gz
perl -I /public/home/qtxu/Practice/ngsplot_database/ngsplotdb-develop/bin /public/home/qtxu/Practice/ngsplot_database/ngsplotdb-develop/bin/gtf2txt_plot.pl ensembl ./output/tmp/B73.ensembl.gtf ./output/tmp/B73.ensembl.txt
/public/home/qtxu/Practice/ngsplot_database/ngsplotdb-develop/bin/filter_ensembl.py ./output/tmp/B73.ensembl.gtf ./output/tmp/B73.ensembl.txt ./output/tmp/B73.ensembl.biotype.txt B73
Rscript /public/home/qtxu/Practice/ngsplot_database/ngsplotdb-develop/bin/genDB.R B73 ensembl ./output/tmp/B73.ensembl.biotype.txt
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: ?.iocGenerics?
The following objects are masked from ?.ackage:parallel?.
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from ?.ackage:stats?.
IQR, mad, sd, var, xtabs
The following objects are masked from ?.ackage:base?.
anyDuplicated, append, as.data.frame, basename, cbind, colMeans,
colnames, colSums, dirname, do.call, duplicated, eval, evalq,
Filter, Find, get, grep, grepl, intersect, is.unsorted, lapply,
lengths, Map, mapply, match, mget, order, paste, pmax, pmax.int,
pmin, pmin.int, Position, rank, rbind, Reduce, rowMeans, rownames,
rowSums, sapply, setdiff, sort, table, tapply, union, unique,
unsplit, which, which.max, which.min
Loading required package: S4Vectors
Loading required package: stats4
Attaching package: ?.4Vectors?
The following object is masked from ?.ackage:base?.
expand.grid
Loading required package: foreach
Processing refseq annotation...
mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -N -e 'select * from refGene;' B73 > ./output/tmp/B73.refseq.gp
/public/home/qtxu/Practice/ngsplot_database/ngsplotdb-develop/bin/refseq.db.gen.sh: line 8: mysql: command not found
No B73 genome annotation in UCSC!
mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -N -e select chrom, chromStart as start, chromEnd as end, name from cpgIslandExt; B73 > ./output/tmp/B73.cgi.gp
bash: line 34: mysql: command not found
No CGI annotation of B73 genome in UCSC! Skip the CGI annotation step!
B73/
B73/B73.ensembl.exon.variant.misc.RData
B73/B73.ensembl.exon.altBoth.misc.RData
B73/B73.ensembl.exon.altBoth.protein_coding.RData
B73/B73.ensembl.exon.variant.protein_coding.RData
B73/B73.ensembl.exon.promoter.misc.RData
B73/B73.ensembl.exon.polyA.protein_coding.RData
B73/B73.ensembl.exon.canonical.misc.RData
B73/B73.ensembl.exon.promoter.protein_coding.RData
B73/B73.ensembl.exon.altAcceptor.misc.RData
B73/.chrnames.ensembl
B73/B73.ensembl.exonmodel.RData
B73/B73.ensembl.exon.altDonor.protein_coding.RData
B73/B73.ensembl.genebody.misc.RData
B73/B73.ensembl.genebody.protein_coding.RData
B73/B73.ensembl.exon.polyA.misc.RData
B73/B73.ensembl.exon.altDonor.misc.RData
B73/B73.ensembl.exon.canonical.protein_coding.RData
B73/B73.ensembl.exon.altAcceptor.protein_coding.RData
B73/.metainfo
Install the genomes
The generated genome was in
./output/database
.
ngsplotdb.py install ngsplotdb_B73_52_3.00.tar.gz
list the installed genomes
ngsplotdb.py list
ID Assembly Species EnsVer NPVer InstalledFeatures
AGPv3 AGPv3 zea_mays 21.0 3.0 exon,genebody,tss,tes
B73 Zm-B73-REFERENCE-NAM-5.0 zea_mays 52.0 3.0 exon,genebody,tss,tes
hg19 GRCh37 homo_sapiens 75.0 3.0 cgi,exon,genebody,tss,tes
IRGSP-1 IRGSP-1.0 oryza_sativa 21.0 3.0 exon,genebody,tss,tes
MSUv7 MSUv7 oryza_sativa 0.0 3.0 exon,genebody,tss,tes
Tair10 TAIR10 arabidopsis_thaliana 21.0 3.0 exon,genebody,tss,tes
If you want to remove a genome just use the following command:
ngsplotdb.py remove hg19
示例:安装RAPDB 基因组
- modify the db_list_plant.txt as follows:
Rapdb IRGSP-1.0 oryza_sativa 52
- Create the genome files
genDB.sh ./data/db_list_plant.txt ./data/json/plant.json
# This command should be run in non-conda environment, as the conda condition reportes errors.
- Install the genome files
ngsplotdb.py install ngsplotdb_Rapdb_52_3.00.tar.gz
- Running example
#BSUB -J blast
#BSUB -n 10
#BSUB -R span[hosts=1]
#BSUB -o %J.out
#BSUB -e %J.err
#BSUB -q normal
cd /public/home/qtxu/Data/Chenxiaoyang/Ribo_seq_Redox/rRNA_tRNA_removed_clean_data/align/Metaplot_by_NGSplot
ngs.plot.r -G Rapdb -R genebody -C config.example.txt -O H4_K5_rep1_dup_rm_genebody -T H4K5 -FL 300
- Results:
This is a test example using ribo-seq data.
网友评论