miRNA靶基因预测

作者: MYS_bio_man | 来源:发表于2021-06-08 17:08 被阅读0次

有人问miRNA靶基因如何预测的问题,鉴于自己也做过相关的分析,稍有点经验,总结记录一下。

一、miranda预测

  1. 软件安装
$ wget http://cbio.mskcc.org/microrna_data/miRanda-aug2010.tar.gz
$ tar -xvf miRanda-aug2010.tar.gz
$ cd miRanda-3.3a/
$ ./configure --prefix=/Lustre01/user/software/miRanda-3.3a/
$ make && make install
$ cd ***/miRanda-3.3a/bin #你自己的软件路径
$ ./miranda --help #产看帮助信息和参数等等,具体如下

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
miranda v3.3a    microRNA Target Scanning Algorithm
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
(c) 2003 Memorial Sloan-Kettering Cancer Center, New York

Authors: Anton Enright, Bino John, Chris Sander and Debora Marks
(mirnatargets (at) cbio.mskcc.org - reaches all authors)

Software written by: Anton Enright
Distributed for anyone to use under the GNU Public License (GPL),
See the files 'COPYING' and 'LICENSE' for details

If you use this software please cite:
Enright AJ, John B, Gaul U, Tuschl T, Sander C and Marks DS;
(2003) Genome Biology; 5(1):R1.

   miranda comes with ABSOLUTELY NO WARRANTY;
   This is free software, and you are welcome to redistribute it
   under certain conditions; type `miranda --license' for details.

miRanda is an miRNA target scanner which aims to predict mRNA
targets for microRNAs using dynamic-programming alignment and
thermodynamics.

Usage:  miranda file1 file2 [options..]

Where:
    'file1' is a FASTA file with a microRNA query
    'file2' is a FASTA file containing the sequence(s)
    to be scanned.

OPTIONS

 --help -h  Display this message
 --version -v   Display version information
 --license  Display license information

Core algorithm parameters:
 -sc S      Set score threshold to S        [DEFAULT: 140.0]
 -en -E     Set energy threshold to -E kcal/mol [DEFAULT: 1.0]
 -scale Z   Set scaling parameter to Z      [DEFAULT: 4.0]
 -strict    Demand strict 5' seed pairing       [DEFAULT: off]

Alignment parameters:
 -go -X     Set gap-open penalty to -X      [DEFAULT: -4.0]
 -ge -Y     Set gap-extend penalty to -Y        [DEFAULT: -9.0]

General Options:
 -out file  Output results to file          [DEFAULT: off]
 -quiet     Output fewer event notifications    [DEFAULT: off]
 -trim T    Trim reference sequences to T nt    [DEFAULT: off]
 -noenergy  Do not perform thermodynamics       [DEFAULT: off]

 -restrict file Restricts scans to those between
                specific miRNAs and UTRs
                provided in a pairwise
                tab-separated file          [DEFAULT: off]

Other Options:
 -keyval    Key value pairs ??          [DEFAULT:]


This software will be further developed under the open source model,
coordinated by Anton Enright and Chris Sander (miranda (at) cbio.mskcc.org).

Please send bug reports to: miranda (at) cbio.mskcc.org.
  1. 测试example
$ cd ***/miRanda-3.3a/examples #你自己的软件路径
$ ls #查看路径下有两个fastq序列文件
bantam_stRNA.fasta  hid_UTR.fasta
# 我们挨个看一下:[如下,前者是miRNA成熟序列文件(可以是下载的miRNA序列,自己鉴定的miRNA序列等等),后者是待匹配/查找的基因文件(circRNA、lncRNA、PCG等序列文件)。]
$ less -SN bantam_stRNA.fasta
      1 >gi|29565487|emb|AJ550546.1| Drosophila melanogaster microRNA miR-bantam
      2 GTGAGATCATTTTGAAAGCTG
$ less -SN hid_UTR.fasta
      1 >gi|945100|gb|U31226.1|DMU31226 Drosophila melanogaster head involution defective protein (hid) mRNA, complete cds (3'UTR only)
      2 TGACAAAAAATAAAAAACGAAATCCATCGTGAACAGTTTTGTGTTTTTAAATCAGTTCTAAACACGAAAA
      3 GGGTTGATGAAAAACGCAGAAGAATCCGAAAAACTAACTAACCGAGCAAAAACTTGACTTGAGTGTTGTT
      4 TGACAAATCAGGAAAGATAAAAAACAAATCATAAGAAAAAACTGCACGAAAAATGAAAAAGTTTCTAATA
      5 TTCAAAATCTTGCACAAGAAATACAAAATCAATTAAAGTGAACTCTAACCAAAAGTTGTACACAAAATAA
      6 AAAGCAAAACAAAGCAGCGAAGAACAATCACAAGAAGAGCAAAGTGCCAACAAAGTGCAGGAAGGAAGGA
      7 AGCGGATAAGGACAAAAAGGAAGCCAGCACACACACACACACCCACACAATGGCCGTGCCCTTTTATTTG
      8 CCCGAGGGCGGCGCCGATGACGTAGCGTCGAGTTCATCGGGAGCCTCGGGCAACTCCTCCCCCCACAACC
      9 ACCCACTTCCCTCGAGCGCATCCTCGTCCGTCTCCTCCTCGGGCGTGTCCTCGGCCTCCGCCTCCTCGGC
     10 CTCATCTTCGTCATCCGCATCGTCGGACGGCGCCAGCAGCGCCGCCTCGCAATCGCCGAACACCACCACC
     11 TCGTCGGCCACGCAGACGCCGATGCAGTCTCCACTGCCCACCGACCAAGTGCTATACGCCCTCTACGAGT
     12 GGGTCAGGATGTACCAGAGCCAGCAGAGTGCCCCGCAAATCTTCCAGTATCCGCCGCCAAGCCCCTCTTG
     13 CAATTTCACTGGCGGCGATGTGTTCTTTCCGCACGGCCATCCGAATCCGAACTCGAATCCCCATCCGCGC
     14 ACCCCCCGAACCAGCGTGAGCTTCTCCTCCGGCGAGGAGTACAACTTCTTCCGGCAGCAGCAGCCGCAAC
     15 CACATCCGTCATATCCGGCGCCATCAACACCGCAGCCAATGCCACCGCAGTCAGCGCCGCCGATGCACTG
     16 CAGCCACAGCTACCCGCAGCAGTCGGCGCACATGATGCCACACCATTCCGCTCCCTTCGGAATGGGCGGT
     17 ACCTACTACGCCGGCTACACGCCACCACCCACTCCGAACACGGCCAGTGCGGGCACCTCCAGCTCATCGG
     18 CGGCCTTCGGCTGGCACGGCCACCCCCACAGCCCCTTCACGTCGACCTCCACGCCGTTATCGGCGCCAGT
     19 GGCGCCCAAGATGCGCCTGCAGCGCAGCCAGTCGGATGCGGCCAGACGCAAGCGATTGACCTCGACGGGC
     20 GAGGATGAGCGCGAGTACCAGAGCGATCATGAGGCCACTTGGGACGAGTTTGGCGATCGCTACGACAACT
运行测试预测靶基因代码(提前添加bin到.bashrc):
$ miranda bantam_stRNA.fasta  hid_UTR.fasta > test.out.txt # 这里只设置输入与输出,其他参数默认,或者如下添加其他参数
$ miranda bantam_stRNA.fasta  hid_UTR.fasta -sc 140 -en -1 > test.out.txt
$ cat test.out.txt | grep ">" | less -SN
      1 >gi|29565487|emb|AJ550546.1|    gi|945100|gb|U31226.1|DMU31226  167.00  -24.54  2 20    3340 3360       18      83.33%  94.44%
      2 >gi|29565487|emb|AJ550546.1|    gi|945100|gb|U31226.1|DMU31226  156.00  -20.03  2 17    2505 2525       15      86.67%  93.33%
      3 >gi|29565487|emb|AJ550546.1|    gi|945100|gb|U31226.1|DMU31226  155.00  -14.57  2 16    2852 2872       14      78.57%  85.71%
      4 >gi|29565487|emb|AJ550546.1|    gi|945100|gb|U31226.1|DMU31226  152.00  -14.18  2 18    3820 3841       17      76.47%  76.47%
      5 >>gi|29565487|emb|AJ550546.1|   gi|945100|gb|U31226.1|DMU31226  630.00  -73.32  167.00  -24.54  1       21      3902     3340 2505 2852 3820
# 结果解读:
1)">"开头的几行,是miRNA(gi|29565487|emb|AJ550546.1|)靶标到基因(gi|945100|gb|U31226.1|DMU31226)上的不同位置
2)">"开头的行对应的信息依次是:miRNA id,基因id,打分,自由能,miRNA起始位置,miRNA终止位置,基因起始位置,基因终止位置,靶标结合miRNA长度,miRNA结合长度与miRNA总长占比,基因上结合长度(大于等于前者,可能是一个跨度)对miRNA总长占比
3)">>"的行是综合信息,依次代表:miRNA id,基因id,总打分,总自由能,最大打分,最大自由能,链信息,miRNA长度,基因长度,靶标到基因上的位置(1个或多个,这里是4个)
4)综上,我们可以抓取“>>”的行,来获取靶向的基因,如果需要其他信息也可以自己按需提取,so easy
  1. 再举一例(个性化分析)

miRNA鉴定等分析这里就不说了(赘述),(1)我们测miRNA数据,鉴定并得到差异miRNA(known or novel均可),是否希望看一下他们靶向与那些基因呢?(2)文献、试验等等,发现某些miRNA(可获取序列或名字的)有着重要的生物学功能,机理尚不明确,是否也想看看其靶向关系呢?可以运行miranda。按照(2)我们造几个miRNA进行举例:

3.1 下载miRBase成熟序列,提取human的miRNA进行举例
$ wget ftp://mirbase.org/pub/mirbase/CURRENT/mature.fa.gz
$ gunzip mature.fa.gz
$ less -SN mature.fa # 格式如下
      1 >cel-let-7-5p MIMAT0000001 Caenorhabditis elegans let-7-5p
      2 UGAGGUAGUAGGUUGUAUAGUU
      3 >cel-let-7-3p MIMAT0015091 Caenorhabditis elegans let-7-3p
      4 CUAUGCAAUUUUCUACCUUACC
      5 >cel-lin-4-5p MIMAT0000002 Caenorhabditis elegans lin-4-5p
      6 UCCCUGAGACCUCAAGUGUGA
      7 >cel-lin-4-3p MIMAT0015092 Caenorhabditis elegans lin-4-3p
      8 ACACCUGGGCUCUCCGGGUACC
我通过python编程取出10个human的miRNA,存为hsa_mature10.fa
$ cat hsa_mature10.fa
>hsa-let-7a-5p
UGAGGUAGUAGGUUGUAUAGUU
>hsa-let-7a-3p
CUAUACAAUCUACUGUCUUUC
>hsa-let-7a-2-3p
CUGUACAGCCUCCUAGCUUUCC
>hsa-let-7b-5p
UGAGGUAGUAGGUUGUGUGGUU
>hsa-let-7b-3p
CUAUACAACCUACUGCCUUCCC
>hsa-let-7c-5p
UGAGGUAGUAGGUUGUAUGGUU
>hsa-let-7c-3p
CUGUACAACCUUCUAGCUUUCC
>hsa-let-7d-5p
AGAGGUAGUAGGUUGCAUAGUU
>hsa-let-7d-3p
CUAUACGACCUGCUGCCUUUCU
>hsa-let-7e-5p
UGAGGUAGGAGGUUGUAUAGUU
接着准备靶向的基因集合,我是通过gtf位置提取了human基因组所有的转录本,取了其中17个来测试(17.fa),当然你也可以去ensembl下载CDS序列,其他方法得到的基因序列均可。
$ grep ">" 17.fa
>ENST00000000233.9  +
>ENST00000000412.7  -
>ENST00000000442.10 +
>ENST00000001008.5  +
>ENST00000001146.6  -
>ENST00000002125.8  +
>ENST00000002165.10 -
>ENST00000002501.10 -
>ENST00000002596.5  -
>ENST00000002829.7  +
>ENST00000003084.10 +
>ENST00000003100.12 -
>ENST00000003302.8  -
>ENST00000003583.12 -
>ENST00000003912.7  +
>ENST00000004103.7  +
>ENST00000004531.14 +
然后预测二者的靶标关系(通过管道输出):
$ miranda hsa_mature10.fa 17.fa -sc 140 -en -1 | grep ">>" > hsa_mature10_targets.txt
我们大概看看几行结果:
$ less -SN hsa_mature10_targets.txt
      1 >>hsa-let-7a-5p ENST00000000412.7       146.00  -11.97  146.00  -11.97  2       22      2756     960
      2 >>hsa-let-7a-5p ENST00000001008.5       143.00  -17.75  143.00  -17.75  4       22      3732     2611
      3 >>hsa-let-7a-5p ENST00000002125.8       156.00  -21.39  156.00  -21.39  6       22      2176     1248
      4 >>hsa-let-7a-5p ENST00000002829.7       146.00  -16.38  146.00  -16.38  10      22      3802     3480
      5 >>hsa-let-7a-5p ENST00000003084.10      299.00  -39.65  152.00  -20.76  11      22      6132     3077 5087
      6 >>hsa-let-7a-5p ENST00000003100.12      143.00  -18.40  143.00  -18.40  12      22      3210     2586
      7 >>hsa-let-7a-5p ENST00000003302.8       438.00  -47.15  152.00  -16.09  13      22      4669     1132 2011 4340
      8 >>hsa-let-7a-5p ENST00000004531.14      163.00  -20.40  163.00  -20.40  17      22      7560     1101
      9 >>hsa-let-7a-3p ENST00000002125.8       161.00  -12.52  161.00  -12.52  23      21      2176     1476
     10 >>hsa-let-7a-3p ENST00000002165.10      158.00  -14.96  158.00  -14.96  24      21      2356     1993
     11 >>hsa-let-7a-3p ENST00000003084.10      148.00  -6.47   148.00  -6.47   28      21      6132     5758
     12 >>hsa-let-7a-3p ENST00000003302.8       149.00  -16.95  149.00  -16.95  30      21      4669     3959
     13 >>hsa-let-7a-3p ENST00000003912.7       146.00  -7.39   146.00  -7.39   32      21      5481     3640
     14 >>hsa-let-7a-3p ENST00000004531.14      292.00  -16.76  146.00  -13.42  34      21      7560     3954 6657

tips:我们看到结果的格式如上面例子所描述的一致,此外,也看到同一miRNA靶向与不同的转录本(基因)上面,关于怎么去解释结果和行文、试验等等,就要看你的阅历和水平了。可以提取所有结果进行网络图的构建、可以提取最好(根据打分和自由能)的基因进行验证(相关性计算、定量、湿试验等等)、单纯查找资料讨论。【转录本与基因的对应关系需要自行转换】。下面也附上miRBase下载成熟序列的截图,也很简单。【本教程再提取序列时需要一定的编程基础,如果你只有两者的序列,也是可以进行靶向预测的】


http://www.mirbase.org
download

二、

三、

相关文章

网友评论

    本文标题:miRNA靶基因预测

    本文链接:https://www.haomeiwen.com/subject/bxmasltx.html