美文网首页
NGSplot database 构建

NGSplot database 构建

作者: 余绕 | 来源:发表于2022-03-23 20:20 被阅读0次

    Add ./bin to $PATH, then makedir a new folder, then run:

    genDB.sh db_list.txt [animal|plant|bacteria].json

    1. Modify plant.json

    First find the ftp links of ensemble plant for the gtf files, such as:

    http://ftp.ebi.ac.uk/ensemblgenomes/pub/release-52/plants/gtf/zea_mays/Zea_mays.Zm-B73-REFERENCE-NAM-5.0.52.chr.gtf.gz
    

    Then modify the <plant.json> file to as follows:

    The provided ftp links in the file should be modified before use, cause the FTP links of ensemble has changed a little. But the provided link was old version.
    Only modified the FTP links, keep other fields unchanged.

    ftp://ftp.ebi.ac.uk/ensemblgenomes/pub/release-*VERSION*/plants/gtf/*SPECIES*/*SPECIES_NAME*.*GENOME*.*VERSION*.chr.gtf.gz
    

    2. Modify the db_list_plant.txt (tab sperated)

    B73   Zm-B73-REFERENCE-NAM-5.0     zea_mays    52
    
    #B73 is the genome name used for data analysis. 
    #zay_mays is  *SPECIES*/*SPECIES_NAME* in json file
    #52 is *VERSION* in json file
    

    3. Create the genome library

    genDB.sh ./data/db_list_plant.txt   ./data/json/plant.json
    
    ### following was the the running code generated by above command:
    B73
    Processing ensembl annotation...
    /public/home/qtxu/Practice/ngsplot_database/ngsplotdb-develop/bin/ensembl.db.gen.sh B73 Zm-B73-REFERENCE-NAM-5.0 zea_mays       52 ftp://ftp.ebi.ac.uk/ensemblgenomes/pub/release-*VERSION*/plants/gtf/*SPECIES*/*SPECIES_NAME*.*GENOME*.*VERSION*.chr.gtf.gz ./data/json/plant.json 0 ./data/db_list_plant.txt /public/home/qtxu/Practice/ngsplot_database/ngsplotdb-develop/bin ./output
    Zea_mays
    wget -q -O ./output/tmp/Zea_mays.Zm-B73-REFERENCE-NAM-5.0.52.gtf.gz ftp://ftp.ebi.ac.uk/ensemblgenomes/pub/release-52/plants/gtf/zea_mays/Zea_mays.Zm-B73-REFERENCE-NAM-5.0.52.chr.gtf.gz
    perl -I /public/home/qtxu/Practice/ngsplot_database/ngsplotdb-develop/bin /public/home/qtxu/Practice/ngsplot_database/ngsplotdb-develop/bin/gtf2txt_plot.pl ensembl     ./output/tmp/B73.ensembl.gtf    ./output/tmp/B73.ensembl.txt
    /public/home/qtxu/Practice/ngsplot_database/ngsplotdb-develop/bin/filter_ensembl.py ./output/tmp/B73.ensembl.gtf    ./output/tmp/B73.ensembl.txt  ./output/tmp/B73.ensembl.biotype.txt B73
    Rscript /public/home/qtxu/Practice/ngsplot_database/ngsplotdb-develop/bin/genDB.R B73 ensembl   ./output/tmp/B73.ensembl.biotype.txt
    Loading required package: BiocGenerics
    Loading required package: parallel
    
    Attaching package: ?.iocGenerics?
    
    The following objects are masked from ?.ackage:parallel?.
    
        clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
        clusterExport, clusterMap, parApply, parCapply, parLapply,
        parLapplyLB, parRapply, parSapply, parSapplyLB
    
    The following objects are masked from ?.ackage:stats?.
    
        IQR, mad, sd, var, xtabs
    
    The following objects are masked from ?.ackage:base?.
    
        anyDuplicated, append, as.data.frame, basename, cbind, colMeans,
        colnames, colSums, dirname, do.call, duplicated, eval, evalq,
        Filter, Find, get, grep, grepl, intersect, is.unsorted, lapply,
        lengths, Map, mapply, match, mget, order, paste, pmax, pmax.int,
        pmin, pmin.int, Position, rank, rbind, Reduce, rowMeans, rownames,
        rowSums, sapply, setdiff, sort, table, tapply, union, unique,
        unsplit, which, which.max, which.min
    
    Loading required package: S4Vectors
    Loading required package: stats4
    
    Attaching package: ?.4Vectors?
    
    The following object is masked from ?.ackage:base?.
    
        expand.grid
    
    Loading required package: foreach
    Processing refseq annotation...
    mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -N -e 'select * from refGene;' B73  > ./output/tmp/B73.refseq.gp
    /public/home/qtxu/Practice/ngsplot_database/ngsplotdb-develop/bin/refseq.db.gen.sh: line 8: mysql: command not found
    No B73 genome annotation in UCSC!
    mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -N  -e select chrom, chromStart as start, chromEnd as end, name from cpgIslandExt; B73 > ./output/tmp/B73.cgi.gp
    bash: line 34: mysql: command not found
    No CGI annotation of B73 genome in UCSC! Skip the CGI annotation step!
    B73/
    B73/B73.ensembl.exon.variant.misc.RData
    B73/B73.ensembl.exon.altBoth.misc.RData
    B73/B73.ensembl.exon.altBoth.protein_coding.RData
    B73/B73.ensembl.exon.variant.protein_coding.RData
    B73/B73.ensembl.exon.promoter.misc.RData
    B73/B73.ensembl.exon.polyA.protein_coding.RData
    B73/B73.ensembl.exon.canonical.misc.RData
    B73/B73.ensembl.exon.promoter.protein_coding.RData
    B73/B73.ensembl.exon.altAcceptor.misc.RData
    B73/.chrnames.ensembl
    B73/B73.ensembl.exonmodel.RData
    B73/B73.ensembl.exon.altDonor.protein_coding.RData
    B73/B73.ensembl.genebody.misc.RData
    B73/B73.ensembl.genebody.protein_coding.RData
    B73/B73.ensembl.exon.polyA.misc.RData
    B73/B73.ensembl.exon.altDonor.misc.RData
    B73/B73.ensembl.exon.canonical.protein_coding.RData
    B73/B73.ensembl.exon.altAcceptor.protein_coding.RData
    B73/.metainfo
    

    Install the genomes

    The generated genome was in ./output/database.

    ngsplotdb.py install  ngsplotdb_B73_52_3.00.tar.gz
    
    list the installed genomes
    ngsplotdb.py list 
    
    
    ID       Assembly                 Species              EnsVer   NPVer    InstalledFeatures            
    AGPv3    AGPv3                    zea_mays             21.0     3.0      exon,genebody,tss,tes        
    B73      Zm-B73-REFERENCE-NAM-5.0 zea_mays             52.0     3.0      exon,genebody,tss,tes        
    hg19     GRCh37                   homo_sapiens         75.0     3.0      cgi,exon,genebody,tss,tes    
    IRGSP-1  IRGSP-1.0                oryza_sativa         21.0     3.0      exon,genebody,tss,tes        
    MSUv7    MSUv7                    oryza_sativa         0.0      3.0      exon,genebody,tss,tes        
    Tair10   TAIR10                   arabidopsis_thaliana 21.0     3.0      exon,genebody,tss,tes 
    

    If you want to remove a genome just use the following command:

    ngsplotdb.py    remove   hg19     
    
    示例:安装RAPDB 基因组
    1. modify the db_list_plant.txt as follows:
    Rapdb   IRGSP-1.0   oryza_sativa    52
    
    1. Create the genome files
    genDB.sh ./data/db_list_plant.txt   ./data/json/plant.json
    # This command should be run in non-conda environment, as the conda condition reportes errors.
    
    1. Install the genome files
    ngsplotdb.py install  ngsplotdb_Rapdb_52_3.00.tar.gz
    
    1. Running example
    #BSUB -J blast
    #BSUB -n 10
    #BSUB -R span[hosts=1]
    #BSUB -o %J.out
    #BSUB -e %J.err
    #BSUB -q normal
    
    cd /public/home/qtxu/Data/Chenxiaoyang/Ribo_seq_Redox/rRNA_tRNA_removed_clean_data/align/Metaplot_by_NGSplot
    
    ngs.plot.r -G  Rapdb   -R genebody -C config.example.txt  -O  H4_K5_rep1_dup_rm_genebody  -T H4K5  -FL 300
    
    1. Results:
      This is a test example using ribo-seq data.
    image.png

    Ref to :GitHub - shenlab-sinai/ngsplotdb: Genome databases generation pipeline for ngs.plot and region_analysis.

    相关文章

      网友评论

          本文标题:NGSplot database 构建

          本文链接:https://www.haomeiwen.com/subject/pvkhjrtx.html