Extracting the Accession Number and Nucleotide Sequence from a GenBank File

Author: lizg | Published: 2019-01-23 13:53

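    1. Search the NCBI Nucleotide database (part of GenBank) for the gene IL10, download the record in GenBank format, and save it as IL10_Genebank.gb, the filename the script in step 2 opens: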

    LOCUS       DQ977084                1925 bp    DNA     linear   PRI 14-JUL-2016
    DEFINITION  Macaca nemestrina IL10 (IL10) gene, partial cds.
    ACCESSION   DQ977084
    VERSION     DQ977084.1
    KEYWORDS    .
    SOURCE      Macaca nemestrina (pig-tailed macaque)
      ORGANISM  Macaca nemestrina
                Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
                Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
                Catarrhini; Cercopithecidae; Cercopithecinae; Macaca.
    REFERENCE   1  (bases 1 to 1925)
      AUTHORS   Nickel,G.C., Tefft,D.L., Goglin,K. and Adams,M.D.
      TITLE     An empirical test for branch-specific positive selection
      JOURNAL   Genetics 179 (4), 2183-2193 (2008)
       PUBMED   18689901
    REFERENCE   2  (bases 1 to 1925)
      AUTHORS   Nickel,G.C., Tefft,D.L., Trevarthen,K., Funt,J. and Adams,M.D.
      TITLE     Positive Selection in Transcription Factor Genes on the Human
                Lineage
      JOURNAL   Unpublished
    REFERENCE   3  (bases 1 to 1925)
      AUTHORS   Nickel,G.C., Tefft,D.L., Trevarthen,K., Funt,J. and Adams,M.D.
      TITLE     Direct Submission
      JOURNAL   Submitted (31-AUG-2006) Dept. of Genetics, Case Western Reserve
                University, 10900 Euclid Ave, Cleveland, OH 44106, USA
    FEATURES             Location/Qualifiers
         source          1..1925
                         /organism="Macaca nemestrina"
                         /mol_type="genomic DNA"
                         /db_xref="taxon:9545"
         gene            <347..>1831
                         /gene="IL10"
         mRNA            <347..>511
                         /gene="IL10"
                         /product="IL10"
         CDS             347..>511
                         /gene="IL10"
                         /codon_start=1
                         /product="IL10"
                         /protein_id="ABM88029.1"
                         /translation="MHSSALLCCLVLLTGVRASPGQGTQSENSCTRFPGNLPHMLRDL
                         RDAFSRVKTFF"
         exon            <347..511
                         /gene="IL10"
                         /number=1
         gap             628..727
                         /estimated_length=unknown
         mRNA            join(<955..1020,1739..>1831)
                         /gene="IL10"
                         /product="IL10"
         CDS             join(<955..1020,1739..1831)
                         /gene="IL10"
                         /codon_start=1
                         /product="IL10"
                         /protein_id="ABM88030.1"
                         /translation="HRFLPCENKSKAVEQVKNAFSKLQEKGVYKAMSEFDIFINYIEA
                         YMTMKIQN"
         exon            955..1020
                         /gene="IL10"
                         /number=4
         gap             1293..1392
                         /estimated_length=unknown
         exon            1739..>1831
                         /gene="IL10"
                         /number=5
    ORIGIN      
            1 catgagctgt tctccccagg aaatcaactt tttttaattg agaagctaaa aaattattct
           61 aagagaggta gcccatccta aaaatagctg tgcagaagtt catgttcaac caatcctttt
          121 tgcttacgat gcaaaatttg aaaactaagt ttattagaga ggttagagaa ggaggagctc
          181 taagcagaaa aaatcctgtg ccgggaaacc tgtgattgtg gctttttatg aatgaagagg
          241 cctccctgag cttacaatat aaaaggggga cagagaggtg aaggtctaca catcaggggc
          301 ttgctcttgc aaaaccaaac cacaagacag acttgcaaaa gaaggcatgc acagctcagc
          361 actgctctgt tgcctagtcc tcctgactgg ggtgagggcc agcccaggcc agggcaccca
          421 gtctgagaac agctgcaccc gcttcccagg caacctgcct cacatgcttc gagacctccg
          481 agatgccttc agcagagtga agactttctt tgtgagtatg attccctcct gtgctttctc
          541 tcttcctggg actgcctgaa ctaggcattt tcctggagct ataagaagaa ccctcctcct
          601 gtgcctccac ttccatcccc aacacctnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn
          661 nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn
          721 nnnnnnntcg gagtgggtcc tggagaaata cattttatct cccagggccg tggttcttct
          781 ctgacctttg gatagttagt aagggtgaag cagggctcag ttctctctgg gagctgtgag
          841 gcgaggcatt tggataaatc tagcaccctc atgatgccac cagcttgtcc cccaagtgtg
          901 atggacatgg agctgggagc cgggatcacc aacactttct cttttcttcc acagcatcga
          961 tttcttccct gtgaaaacaa aagcaaggcc gtggagcagg tgaagaatgc ctttagtaag
         1021 gtgagcttgg atggtggcag agagggtctg cagagcacag cccatgccca ctccccaacc
         1081 ccaaagcgtg gaaggtggtg aggactcagt aggccccatc cttcattgga aggagtgtgg
         1141 gaacctgaca gatggtatga cctgctcagc cagtgaggag ctgccgcctt gattgtattt
         1201 gttttctgtt aagtgtcttt gggggtttct aaatgactgc tcgctgcctt tgcaggcttg
         1261 cgggttaggc tggccggcca gcctgtgaac acnnnnnnnn nnnnnnnnnn nnnnnnnnnn
         1321 nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn
         1381 nnnnnnnnnn nngctttcaa agtgcttcct ctaatgtctt ttcatcacac tctgcataat
         1441 catcatgtga atacgtgacc tttaaaattg ttgaaaaggc atcattttga agacagcgct
         1501 ttgcaaaatg aatgctccct ttgctaggca gtagccgtac ttcaggcctg gaggagatga
         1561 aggtcaatgc actgcctttc ccaaggcagc tgggcctatc ctctggttca cttcccagcg
         1621 tgagggagaa taagcagcct ctgcactcaa ggtcatgccc atccatgagc atgggaaagg
         1681 ggagcctatt tcgtccccag aagggattta actgaatgtt tcttatctct ctgcacagct
         1741 ccaagagaaa ggcgtctaca aagccatgag tgagtttgac atcttcatca actacataga
         1801 agcctacatg acaatgaaga tacaaaactg agacatcagg gtggcgactc tatagactct
         1861 aggacataaa ttggaggtct ccaaaatcag atccagggtt ctgggatacc tgacccagcc
         1921 ccttg
    //
    

    2. Python script:

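    # Extract the accession number and nucleotide sequence from a GenBank file
    with open('IL10_Genebank.gb', 'r') as input_file, open('IL10.fasta', 'w') as output_file:
        in_sequence = False  # True while reading the lines between ORIGIN and //
        for line in input_file:
            if line.startswith('ACCESSION'):
                accession = line.split()[1]  # first token after the keyword
                output_file.write('>' + accession + '\n')  # FASTA header line
            elif line.startswith('ORIGIN'):
                in_sequence = True
            elif line.startswith('//'):
                in_sequence = False  # end of record; avoids writing an empty line
            elif in_sequence:
                fields = line.split()  # split on whitespace into a list
                if fields:
                    # fields[0] is the position number; join the remaining 10-base blocks
                    seq = ''.join(fields[1:])
                    output_file.write(seq.upper() + '\n')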
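
The same logic can be wrapped in a function that accepts any iterable of lines, which makes it easier to test and to run on a file holding more than one record. This is only a minimal sketch, not the author's code: the name `genbank_to_fasta` and the two-record sample are made up for illustration, and it assumes every record has an ACCESSION line before ORIGIN and ends with `//`.

```python
import io


def genbank_to_fasta(lines):
    """Yield FASTA lines (header, then sequence lines) for each GenBank record.

    `genbank_to_fasta` is a hypothetical helper name, not part of any library.
    Assumes ACCESSION precedes ORIGIN and each record ends with '//'.
    """
    in_sequence = False
    for line in lines:
        if line.startswith('ACCESSION'):
            # The first token after the keyword is the primary accession.
            yield '>' + line.split()[1]
        elif line.startswith('ORIGIN'):
            in_sequence = True
        elif line.startswith('//'):
            in_sequence = False  # record terminator; reset for the next record
        elif in_sequence:
            fields = line.split()
            if fields:
                # fields[0] is the position number; the rest are 10-base blocks.
                yield ''.join(fields[1:]).upper()


# Made-up two-record sample, used only to illustrate the call.
sample = io.StringIO(
    "LOCUS       TEST1\n"
    "ACCESSION   TEST1\n"
    "ORIGIN      \n"
    "        1 catgagctgt tctcc\n"
    "//\n"
    "LOCUS       TEST2\n"
    "ACCESSION   TEST2\n"
    "ORIGIN      \n"
    "        1 acgtn\n"
    "//\n"
)
fasta_lines = list(genbank_to_fasta(sample))
print('\n'.join(fasta_lines))
```

To process the downloaded file instead of the sample, pass an open file handle, for example `genbank_to_fasta(open('IL10_Genebank.gb'))`, and write each yielded line followed by a newline.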
    

    3. Output:

    >DQ977084
    CATGAGCTGTTCTCCCCAGGAAATCAACTTTTTTTAATTGAGAAGCTAAAAAATTATTCT
    AAGAGAGGTAGCCCATCCTAAAAATAGCTGTGCAGAAGTTCATGTTCAACCAATCCTTTT
    TGCTTACGATGCAAAATTTGAAAACTAAGTTTATTAGAGAGGTTAGAGAAGGAGGAGCTC
    TAAGCAGAAAAAATCCTGTGCCGGGAAACCTGTGATTGTGGCTTTTTATGAATGAAGAGG
    CCTCCCTGAGCTTACAATATAAAAGGGGGACAGAGAGGTGAAGGTCTACACATCAGGGGC
    TTGCTCTTGCAAAACCAAACCACAAGACAGACTTGCAAAAGAAGGCATGCACAGCTCAGC
    ACTGCTCTGTTGCCTAGTCCTCCTGACTGGGGTGAGGGCCAGCCCAGGCCAGGGCACCCA
    GTCTGAGAACAGCTGCACCCGCTTCCCAGGCAACCTGCCTCACATGCTTCGAGACCTCCG
    AGATGCCTTCAGCAGAGTGAAGACTTTCTTTGTGAGTATGATTCCCTCCTGTGCTTTCTC
    TCTTCCTGGGACTGCCTGAACTAGGCATTTTCCTGGAGCTATAAGAAGAACCCTCCTCCT
    GTGCCTCCACTTCCATCCCCAACACCTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNTCGGAGTGGGTCCTGGAGAAATACATTTTATCTCCCAGGGCCGTGGTTCTTCT
    CTGACCTTTGGATAGTTAGTAAGGGTGAAGCAGGGCTCAGTTCTCTCTGGGAGCTGTGAG
    GCGAGGCATTTGGATAAATCTAGCACCCTCATGATGCCACCAGCTTGTCCCCCAAGTGTG
    ATGGACATGGAGCTGGGAGCCGGGATCACCAACACTTTCTCTTTTCTTCCACAGCATCGA
    TTTCTTCCCTGTGAAAACAAAAGCAAGGCCGTGGAGCAGGTGAAGAATGCCTTTAGTAAG
    GTGAGCTTGGATGGTGGCAGAGAGGGTCTGCAGAGCACAGCCCATGCCCACTCCCCAACC
    CCAAAGCGTGGAAGGTGGTGAGGACTCAGTAGGCCCCATCCTTCATTGGAAGGAGTGTGG
    GAACCTGACAGATGGTATGACCTGCTCAGCCAGTGAGGAGCTGCCGCCTTGATTGTATTT
    GTTTTCTGTTAAGTGTCTTTGGGGGTTTCTAAATGACTGCTCGCTGCCTTTGCAGGCTTG
    CGGGTTAGGCTGGCCGGCCAGCCTGTGAACACNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNGCTTTCAAAGTGCTTCCTCTAATGTCTTTTCATCACACTCTGCATAAT
    CATCATGTGAATACGTGACCTTTAAAATTGTTGAAAAGGCATCATTTTGAAGACAGCGCT
    TTGCAAAATGAATGCTCCCTTTGCTAGGCAGTAGCCGTACTTCAGGCCTGGAGGAGATGA
    AGGTCAATGCACTGCCTTTCCCAAGGCAGCTGGGCCTATCCTCTGGTTCACTTCCCAGCG
    TGAGGGAGAATAAGCAGCCTCTGCACTCAAGGTCATGCCCATCCATGAGCATGGGAAAGG
    GGAGCCTATTTCGTCCCCAGAAGGGATTTAACTGAATGTTTCTTATCTCTCTGCACAGCT
    CCAAGAGAAAGGCGTCTACAAAGCCATGAGTGAGTTTGACATCTTCATCAACTACATAGA
    AGCCTACATGACAATGAAGATACAAAACTGAGACATCAGGGTGGCGACTCTATAGACTCT
    AGGACATAAATTGGAGGTCTCCAAAATCAGATCCAGGGTTCTGGGATACCTGACCCAGCC
    CCTTG
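
A quick way to check the result is to compare the number of bases in the FASTA output with the length declared on the LOCUS line (1925 bp for DQ977084). The sketch below is one possible check, not part of the original script: the helper names `locus_length` and `fasta_lengths` are hypothetical, and it assumes a nucleotide record whose length is the token immediately before `bp`.

```python
def locus_length(locus_line):
    """Return the declared sequence length from a GenBank LOCUS line.

    Assumes a nucleotide record, where the length is the token before 'bp'.
    """
    tokens = locus_line.split()
    return int(tokens[tokens.index('bp') - 1])


def fasta_lengths(lines):
    """Return {header: total number of bases} for an iterable of FASTA lines."""
    lengths = {}
    header = None
    for line in lines:
        line = line.strip()
        if line.startswith('>'):
            header = line[1:]
            lengths[header] = 0
        elif line and header is not None:
            lengths[header] += len(line)
    return lengths


locus = "LOCUS       DQ977084                1925 bp    DNA     linear   PRI 14-JUL-2016"
declared = locus_length(locus)

# Tiny made-up FASTA lines for illustration; for the real check,
# pass open('IL10.fasta') instead.
demo = [">demo\n", "ACGTACGTAC\n", "NNNNN\n"]
print(declared)             # 1925
print(fasta_lengths(demo))  # {'demo': 15}
```

For the output shown above, 32 lines of 60 bases plus a final line of 5 bases add up to 1925, which matches the LOCUS line.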
    
