Study notes on Prof. Jianhong Ou's (欧剑虹) BOOK: Chapter 1, R/Bioconductor

Author: 热衷组培的二货潜 | Published 2019-05-04 14:28

    First, the most important reference:

    Chapter 1: Getting Started with R/Bioconductor

    If anything here infringes copyright, please send me a private message and I will delete it as soon as I see it!

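    These notes mainly record points I may need later. They are mostly copied and pasted, so I recommend going straight to the original link.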

    Study notes on Prof. Jianhong Ou's BOOK: Chapter 1, Getting Started with R/Bioconductor (1)

    Biological strings

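    • Common operations on biological strings include taking the complement, reverse and reverse complement of a sequence; translation, transcription and reverse transcription; counting base frequencies; and sequence alignment.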
    # if (!requireNamespace("BiocManager", quietly = TRUE))
    #    install.packages("BiocManager")
    # BiocManager::install("Biostrings")
    library(Biostrings) ## load the package
    dna<-DNAString("TCTCCCAACCCTTGTACCAGTATAAATCGT")
    
    # Reverse of the DNA sequence
    > reverse(dna) 
      30-letter "DNAString" instance
    seq: TGCTAAATATGACCATGTTCCCAACCCTCT
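For intuition, here is a minimal base-R sketch of what `reverse()` does to the letters. It assumes the sequence is a plain character string rather than a `DNAString`. The helper name `reverse_seq` is hypothetical and is not part of Biostrings.

```r
# Hypothetical helper (not Biostrings): reverse the order of the letters in a string
reverse_seq <- function(s) {
  chars <- strsplit(s, "")[[1]]      # split into single characters
  paste(rev(chars), collapse = "")   # reverse them and glue back together
}

dna <- "TCTCCCAACCCTTGTACCAGTATAAATCGT"
reverse_seq(dna)
# [1] "TGCTAAATATGACCATGTTCCCAACCCTCT"
```

Reversing alone does not complement the bases, so the result is not the opposite strand.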
    
    # Convert DNA to RNA (T -> U)
    > RNAString(dna)
      30-letter "RNAString" instance
    seq: UCUCCCAACCCUUGUACCAGUAUAAAUCGU
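`RNAString(dna)` only rewrites the alphabet: every T becomes U, and nothing is complemented or reversed. The base-R sketch below performs the same letter swap. It assumes a plain string containing only A/C/G/T, and `dna_to_rna` is a hypothetical name.

```r
# Hypothetical helper (not Biostrings): DNA -> RNA by swapping every T for U
dna_to_rna <- function(s) chartr("T", "U", s)

dna_to_rna("TCTCCCAACCCTTGTACCAGTATAAATCGT")
# [1] "UCUCCCAACCCUUGUACCAGUAUAAAUCGU"
```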
    
    # Complement of the DNA sequence
    > complement(dna)
      30-letter "DNAString" instance
    seq: AGAGGGTTGGGAACATGGTCATATTTAGCA
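The complement pairs each base with its partner (A with T, C with G) while keeping the original left-to-right order. A minimal base-R sketch follows; the hypothetical `complement_dna` handles only A/C/G/T and passes any other letter (e.g. N) through unchanged.

```r
# Hypothetical helper (not Biostrings): A<->T, C<->G; other letters are left as they are
complement_dna <- function(s) chartr("ACGT", "TGCA", s)

complement_dna("TCTCCCAACCCTTGTACCAGTATAAATCGT")
# [1] "AGAGGGTTGGGAACATGGTCATATTTAGCA"
```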
    
    # Reverse complement of the DNA sequence
    > reverseComplement(dna)
      30-letter "DNAString" instance
    seq: ACGATTTATACTGGTACAAGGGTTGGGAGA
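The reverse complement combines the two previous steps: complement every base, then reverse the order. The result is the opposite strand read 5'→3'. Here is a base-R sketch on a plain A/C/G/T string (`reverse_complement` is a hypothetical name):

```r
# Hypothetical helper (not Biostrings): complement, then reverse
reverse_complement <- function(s) {
  comp <- chartr("ACGT", "TGCA", s)                 # complement each base
  paste(rev(strsplit(comp, "")[[1]]), collapse = "") # reverse the order
}

dna <- "TCTCCCAACCCTTGTACCAGTATAAATCGT"
reverse_complement(dna)
# [1] "ACGATTTATACTGGTACAAGGGTTGGGAGA"
```

Applying it twice returns the original sequence, which makes a quick sanity check.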
    
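    # DNA transcription
    > rna<-RNAString(complement(dna)) ## transcription: complement every base of dna; note how this differs from dna2rna, which only swaps T for U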
    > rna
      30-letter "RNAString" instance
    seq: AGAGGGUUGGGAACAUGGUCAUAUUUAGCA
       # TCTCCCAACCCTTGTACCAGTATAAATCGT (the original DNA, repeated here for comparison)
       # UCUCCCAACCCUUGUACCAGUAUAAAUCGU  (dna2rna)
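Which command counts as "transcription" depends on which strand `dna` is taken to be. The sketch below assumes `dna` is the template strand written 5'→3'. Under that assumption, the base-by-base complement (what `RNAString(complement(dna))` prints) lines up with the template but reads 3'→5'. Reversing it gives the transcript in the usual 5'→3' orientation. If `dna` is instead the coding strand, the mRNA is simply the T→U swap shown in the `dna2rna` comment. The helper names are hypothetical.

```r
# Hypothetical helpers (not Biostrings). Assumption: `template` is the template strand, written 5'->3'.

# Complement each base into RNA letters, keeping the template's left-to-right order
# (so the result reads 3'->5'); this matches RNAString(complement(dna)) above.
transcribe_aligned <- function(template) chartr("ACGT", "UGCA", template)

# The same transcript written 5'->3': the reverse complement, with U instead of T
transcribe_5to3 <- function(template) {
  paste(rev(strsplit(transcribe_aligned(template), "")[[1]]), collapse = "")
}

dna <- "TCTCCCAACCCTTGTACCAGTATAAATCGT"
transcribe_aligned(dna)  # [1] "AGAGGGUUGGGAACAUGGUCAUAUUUAGCA"
transcribe_5to3(dna)     # [1] "ACGAUUUAUACUGGUACAAGGGUUGGGAGA"
```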
    
    # Convert RNA back to DNA (U -> T)
    > DNAString(rna)
      30-letter "DNAString" instance
    seq: AGAGGGTTGGGAACATGGTCATATTTAGCA
    
    # Reverse transcription to cDNA
    > DNAString(complement(rna)) 
      30-letter "DNAString" instance
    seq: TCTCCCAACCCTTGTACCAGTATAAATCGT
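Both conversions above are letter substitutions as well. `DNAString(rna)` corresponds to swapping U for T. The reverse-transcription step complements each base into DNA letters, which recovers the original `dna` because the earlier transcription step was a plain complement. Here is a base-R sketch, assuming plain strings over A/C/G/U (`rna_to_dna` and `reverse_transcribe` are hypothetical names):

```r
# Hypothetical helpers (not Biostrings)
rna_to_dna <- function(s) chartr("U", "T", s)                    # U -> T only
reverse_transcribe <- function(rna) chartr("ACGU", "TGCA", rna)  # A->T, C->G, G->C, U->A

rna <- "AGAGGGUUGGGAACAUGGUCAUAUUUAGCA"
rna_to_dna(rna)          # [1] "AGAGGGTTGGGAACATGGTCATATTTAGCA"
reverse_transcribe(rna)  # [1] "TCTCCCAACCCTTGTACCAGTATAAATCGT"
```

As with the transcription step, this keeps the left-to-right order. A cDNA written 5'→3' would also need to be reversed.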
    
    # Split the RNA into codons
    > codons(rna) 
      Views on a 30-letter RNAString subject
    subject: AGAGGGUUGGGAACAUGGUCAUAUUUAGCA
    views:
         start end width
     [1]     1   3     3 [AGA]
     [2]     4   6     3 [GGG]
     [3]     7   9     3 [UUG]
     [4]    10  12     3 [GGA]
     [5]    13  15     3 [ACA]
     [6]    16  18     3 [UGG]
     [7]    19  21     3 [UCA]
     [8]    22  24     3 [UAU]
     [9]    25  27     3 [UUA]
    [10]    28  30     3 [GCA]
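`codons()` returns views of consecutive, non-overlapping triplets starting at the first base, as the start/end columns above show. The base-R sketch below returns the triplets as a character vector. It assumes reading frame 1 and drops any one or two leftover bases at the end; that is a choice made here, and `split_codons` is a hypothetical name.

```r
# Hypothetical helper (not Biostrings): consecutive triplets in reading frame 1
split_codons <- function(s) {
  n <- nchar(s) %/% 3                 # number of complete codons
  if (n == 0) return(character(0))
  starts <- 3 * seq_len(n) - 2        # 1, 4, 7, ...
  substring(s, starts, starts + 2)
}

split_codons("AGAGGGUUGGGAACAUGGUCAUAUUUAGCA")
# [1] "AGA" "GGG" "UUG" "GGA" "ACA" "UGG" "UCA" "UAU" "UUA" "GCA"
```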
    
    # Translate the codons into amino acids
    > translate(rna) 
      10-letter "AAString" instance
    seq: RGLGTWSYLA
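`translate()` looks each codon up in the standard genetic code. The sketch below builds that table in base R from the usual 64-letter string, with codons ordered U, C, A, G at each position. It assumes frame 1, complete codons only and the letters A/C/G/U; stop codons come out as `*`. `codon_table` and `translate_rna` are hypothetical names.

```r
# Standard genetic code; codons in the order UUU, UUC, UUA, UUG, UCU, ..., GGG
bases <- c("U", "C", "A", "G")
codon_table <- strsplit(
  "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", "")[[1]]
names(codon_table) <- paste0(rep(bases, each = 16),              # 1st base varies slowest
                             rep(rep(bases, each = 4), times = 4),
                             rep(bases, times = 16))             # 3rd base varies fastest

# Hypothetical helper (not Biostrings): translate reading frame 1 of an RNA string
translate_rna <- function(rna) {
  n <- nchar(rna) %/% 3
  if (n == 0) return("")
  starts <- 3 * seq_len(n) - 2
  codons <- substring(rna, starts, starts + 2)
  paste(codon_table[codons], collapse = "")   # look up each codon by name
}

translate_rna("AGAGGGUUGGGAACAUGGUCAUAUUUAGCA")
# [1] "RGLGTWSYLA"
```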
    
    # Base frequency counts
    > alphabetFrequency(dna)
     A  C  G  T  M  R  W  S  Y  K  V  H  D  B  N  -  +  . 
     8 10  3  9  0  0  0  0  0  0  0  0  0  0  0  0  0  0 
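`alphabetFrequency()` counts every letter of the full DNA alphabet, which is why most columns are 0. That alphabet includes the IUPAC ambiguity codes plus the `-`, `+` and `.` symbols. The base-R sketch below restricts the count to the four bases and assumes a plain string. `count_bases` is hypothetical, and letters outside `alphabet` are dropped from the count.

```r
# Hypothetical helper (not Biostrings): count each base, keeping zero counts
count_bases <- function(s, alphabet = c("A", "C", "G", "T")) {
  chars <- strsplit(s, "")[[1]]
  # factor() maps letters outside `alphabet` to NA, and table() drops NA by default
  table(factor(chars, levels = alphabet))
}

count_bases("TCTCCCAACCCTTGTACCAGTATAAATCGT")
# counts: A = 8, C = 10, G = 3, T = 9
```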
    
    # Check whether the sequence contains only the four bases A/C/G/T
    > hasOnlyBaseLetters(dna)
    [1] TRUE
    
    # Distinct letters present in the sequence
    > uniqueLetters(dna)
    [1] "A" "C" "G" "T"
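The two checks above can be mimicked in base R on a plain string. These are hypothetical helpers, and they assume A/C/G/T are the only valid bases.

```r
# Hypothetical helpers (not Biostrings)
has_only_bases <- function(s) grepl("^[ACGT]*$", s)               # TRUE if no letter other than A/C/G/T
unique_letters <- function(s) sort(unique(strsplit(s, "")[[1]]))  # distinct letters, sorted

dna <- "TCTCCCAACCCTTGTACCAGTATAAATCGT"
has_only_bases(dna)      # [1] TRUE
has_only_bases("ACGTN")  # [1] FALSE
unique_letters(dna)      # [1] "A" "C" "G" "T"
```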
    
    # Count a custom set of letters, e.g. the combined C+G count
    > letterFrequency(dna, letters="CG")
    C|G 
     13 
    
    # Compute the GC content
    > GC_content <- letterFrequency(dna, letters="CG")/letterFrequency(dna, letters="ACGT"); GC_content
          C|G 
    0.4333333 
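The ratio above is (C + G) / (A + C + G + T), i.e. 13 / 30 ≈ 0.433 for this sequence. The denominator is the A+C+G+T total rather than the raw length, so ambiguous letters such as N do not dilute the ratio. Here is a base-R sketch of the same calculation (`gc_content` is a hypothetical name):

```r
# Hypothetical helper (not Biostrings): (C + G) / (A + C + G + T)
gc_content <- function(s) {
  chars <- strsplit(s, "")[[1]]
  sum(chars %in% c("C", "G")) / sum(chars %in% c("A", "C", "G", "T"))
}

gc_content("TCTCCCAACCCTTGTACCAGTATAAATCGT")  # 13 / 30 = 0.4333333
gc_content("GGCCNN")                          # 1: the N's are ignored
```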
    
    # Search for a user-defined base pattern
    > TATA <- "TATA"
    > (mT <- matchPattern(TATA, dna))
      Views on a 30-letter DNAString subject
    subject: TCTCCCAACCCTTGTACCAGTATAAATCGT
    views:
        start end width
    [1]    21  24     4 [TATA]
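`matchPattern()` reports the start and end of every exact occurrence; here TATA occurs once, at positions 21-24. Below is a base-R sketch of an exact search on a plain string. It assumes the pattern contains only plain letters (no IUPAC codes or regex metacharacters), and `find_pattern` is hypothetical. It also reports overlapping hits.

```r
# Hypothetical helper (not Biostrings): exact-match search that also reports
# overlapping hits. The zero-width lookahead (?=...) lets the regex engine
# resume one position after each hit instead of after the whole match.
find_pattern <- function(subject, pattern) {
  hits <- as.integer(gregexpr(paste0("(?=", pattern, ")"), subject, perl = TRUE)[[1]])
  if (hits[1] == -1) hits <- integer(0)   # gregexpr returns -1 when nothing matches
  data.frame(start = hits, end = hits + nchar(pattern) - 1L)
}

find_pattern("TCTCCCAACCCTTGTACCAGTATAAATCGT", "TATA")  # start 21, end 24
find_pattern("AAA", "AA")                              # starts 1 and 2 (overlapping)
```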
    
    # Sequence alignment
    ## Protein sequence alignment
    > aa1<-AAString("HXBLVYMGCHFDCXVBEHIKQZ")
    > aa2<-AAString("QRNYMYCFQCISGNEYKQN")
    # Global alignment
    > pairwiseAlignment(aa1, aa2, substitutionMatrix="BLOSUM62", type="global") 
    Global PairwiseAlignmentsSingleSubject (1 of 1)
    pattern: HXBLVYMGCHFDCXVBEHIKQZ
    subject: QRN--YMYC-FQCISGNEYKQN
    score: 9 
    # Local alignment
    > pairwiseAlignment(aa1, aa2, substitutionMatrix="BLOSUM62", type="local")
    Local PairwiseAlignmentsSingleSubject (1 of 1)
    pattern: [6] YMGCHFDCXVBEHIKQ
    subject: [4] YMYC-FQCISGNEYKQ
    score: 24 
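The difference between `type = "global"` and `type = "local"` comes down to the dynamic-programming recurrence. The sketch below computes only the optimal score (no traceback), using a simple linear gap penalty and a user-supplied scoring function. It is a simplified illustration, not Biostrings' implementation, which uses an affine gap model and substitution matrices such as BLOSUM62. `align_score`, `score_fun` and `gap` are hypothetical names.

```r
# Hypothetical sketch (not Biostrings): optimal alignment score with a linear gap penalty.
# Global (Needleman-Wunsch): both sequences are aligned end to end.
# Local (Smith-Waterman): cell scores are floored at 0, so the best-scoring segment pair wins.
align_score <- function(a, b, score_fun, gap, type = c("global", "local")) {
  type <- match.arg(type)
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  n <- length(x)
  m <- length(y)
  dp <- matrix(0, nrow = n + 1, ncol = m + 1)  # dp[i + 1, j + 1]: best score for x[1..i] vs y[1..j]
  if (type == "global") {
    dp[, 1] <- -gap * (0:n)   # leading gaps are penalised in global mode
    dp[1, ] <- -gap * (0:m)
  }
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      best <- max(dp[i, j] + score_fun(x[i], y[j]),  # x[i] aligned with y[j]
                  dp[i, j + 1] - gap,                # x[i] aligned with a gap
                  dp[i + 1, j] - gap)                # y[j] aligned with a gap
      dp[i + 1, j + 1] <- if (type == "local") max(0, best) else best
    }
  }
  if (type == "global") dp[n + 1, m + 1] else max(dp)
}

match_mismatch <- function(p, q) if (p == q) 1 else -1
align_score("ACGT", "AGT", match_mismatch, gap = 2, type = "global")  # 1 (ACGT / A-GT)
align_score("ACGT", "AGT", match_mismatch, gap = 2, type = "local")   # 2 (GT / GT)
```

Because the Biostrings results above use BLOSUM62 and affine gaps, this toy scorer will not reproduce the scores 9 and 24.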
    
    ## DNA sequence alignment
    > s1 <- 
    +     DNAString("ACTTCACCAGCTCCCTGGCGGTAAGTTGATCAAAGGAAACGCAAAGTTTTCAAG")
    > s2 <-
    +     DNAString("GTTTCACTACTTCCTTTCGGGTAAGTAAATATATAAATATATAAAAATATAATTTTCATC")
    ## Define the match/mismatch scoring matrix
    > mat <- nucleotideSubstitutionMatrix(match = 1, mismatch = -3, baseOnly = TRUE)
    > mat
       A  C  G  T
    A  1 -3 -3 -3
    C -3  1 -3 -3
    G -3 -3  1 -3
    T -3 -3 -3  1
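`nucleotideSubstitutionMatrix(match = 1, mismatch = -3, baseOnly = TRUE)` yields the plain 4 × 4 table printed above: +1 on the diagonal and -3 everywhere else. With `baseOnly = TRUE`, only A, C, G and T are included. Here is a base-R sketch that builds the same table (`make_sub_matrix` is a hypothetical name):

```r
# Hypothetical helper (not Biostrings): match score on the diagonal, mismatch score elsewhere
make_sub_matrix <- function(match, mismatch, alphabet = c("A", "C", "G", "T")) {
  k <- length(alphabet)
  mat <- matrix(mismatch, nrow = k, ncol = k, dimnames = list(alphabet, alphabet))
  diag(mat) <- match
  mat
}

mat <- make_sub_matrix(match = 1, mismatch = -3)
mat["A", "A"]  # 1
mat["A", "G"]  # -3
```

Scores are looked up by letter pair, so `mat["G", "T"]` is the score for aligning G against T.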
    > pairwiseAlignment(s1, s2, substitutionMatrix = mat,
    +                       gapOpening = -5, gapExtension = -2)
    Global PairwiseAlignmentsSingleSubject (1 of 1)
    pattern: ACTTCACCAGCTCCCTGGCGGTAAGTTGATC---AAAGG---AAACGCAAAGTTTTCAAG
    subject: GTTTCACTACTTCCTTTCGGGTAAGTAAATATATAAATATATAAAAATATAATTTTCATC
    score: -52 
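The global score can be checked by hand:

- Of the 54 columns without a gap, 33 are matches (+1) and 21 are mismatches (-3), giving -30.
- The pattern has two gaps of length 3. Charging each one gapOpening + length × gapExtension = 5 + 3 × 2 = 11 gives -22.
- The total is -30 - 22 = -52.

This only works when 5 and 2 are treated as costs, so the minus signs in the call evidently do not turn the penalties into bonuses. As far as I know, recent Biostrings documentation writes them as positive values (`gapOpening = 5, gapExtension = 2`).

Below is a base-R sketch of that bookkeeping. The gap formula is inferred from the printed score rather than taken from the documentation, and `score_alignment` is a hypothetical name.

```r
# Hypothetical helper (not Biostrings): score an alignment that is already written out with gaps.
# Assumption (inferred from the score printed above): a run of k gap characters
# costs gap_open + k * gap_ext.
score_alignment <- function(pattern, subject, mat, gap_open, gap_ext) {
  p <- strsplit(pattern, "")[[1]]
  s <- strsplit(subject, "")[[1]]
  stopifnot(length(p) == length(s))
  aligned <- p != "-" & s != "-"
  sub_total <- sum(mat[cbind(p[aligned], s[aligned])])   # look up each letter pair by name
  gap_total <- 0
  for (row in list(p, s)) {
    runs <- rle(row == "-")
    gap_lengths <- runs$lengths[runs$values]             # lengths of the gap runs in this row
    gap_total <- gap_total + sum(gap_open + gap_ext * gap_lengths)
  }
  sub_total - gap_total
}

bases <- c("A", "C", "G", "T")
mat <- matrix(-3, 4, 4, dimnames = list(bases, bases))
diag(mat) <- 1

score_alignment("ACTTCACCAGCTCCCTGGCGGTAAGTTGATC---AAAGG---AAACGCAAAGTTTTCAAG",
                "GTTTCACTACTTCCTTTCGGGTAAGTAAATATATAAATATATAAAAATATAATTTTCATC",
                mat, gap_open = 5, gap_ext = 2)
# 33 matches - 21 * 3 (mismatches) - 2 * (5 + 3 * 2) (gaps) = -52
```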
    
    > pairwiseAlignment(s1, s2, type = "local", substitutionMatrix = mat,
    +                       gapOpening = -5, gapExtension = -2)
    Local PairwiseAlignmentsSingleSubject (1 of 1)
    pattern: [20] GGTAAGT
    subject: [20] GGTAAGT
    score: 7
    
    > pairwiseAlignment(s1, s2, type = "overlap", substitutionMatrix = mat,
    +                       gapOpening = -5, gapExtension = -2)
    Overlap PairwiseAlignmentsSingleSubject (1 of 1)
    pattern: [54] G
    subject:  [1] G
    score: 1 
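In an overlap alignment, gaps at the ends of either sequence are free. That is why the best score here is just the single G at the end of `s1` paired with the G at the start of `s2`. The sketch below shows the dynamic-programming change under that reading, using a linear gap penalty instead of Biostrings' affine one:

- Leave the first row and column at 0, so leading gaps are free.
- Take the maximum over the last row and column, so trailing gaps are free.

`overlap_score` and `score_fun` are hypothetical names.

```r
# Hypothetical sketch (not Biostrings): overlap alignment score, linear gap penalty
overlap_score <- function(a, b, score_fun, gap) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  n <- length(x)
  m <- length(y)
  dp <- matrix(0, nrow = n + 1, ncol = m + 1)  # row 1 and column 1 stay 0: leading end gaps are free
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      dp[i + 1, j + 1] <- max(dp[i, j] + score_fun(x[i], y[j]),
                              dp[i, j + 1] - gap,
                              dp[i + 1, j] - gap)
    }
  }
  max(dp[n + 1, ], dp[, m + 1])  # ending on the last row or column: trailing end gaps are free
}

match_mismatch <- function(p, q) if (p == q) 1 else -3
overlap_score("ACGT", "GTTT", match_mismatch, gap = 2)     # 2: suffix "GT" of a on prefix "GT" of b
overlap_score("ACGTACGT", "GTA", match_mismatch, gap = 2)  # 3: b lies entirely inside a
```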
    

        Article link: https://www.haomeiwen.com/subject/dquxoqtx.html