2021-02-12

Author: byejya | Published 2021-04-25 19:03


LaBranchoR (LSTM Branchpoint Retriever)

LaBranchoR uses an LSTM network built with Keras to predict the position of RNA splicing branchpoints relative to a 3' splice site (3'ss). Precisely evaluating LaBranchoR was challenging because of pervasive noise in the experimental data, but, as we show in our paper, we estimate that LaBranchoR correctly predicts a branchpoint for over 90% of 3'ss.

Paggi, J.M., Bejerano, G. A sequence-based, deep learning model accurately predicts RNA splicing branchpoints. bioRxiv 185868 (2017). doi:10.1101/185868

Download existing branchpoint annotations

See our website linked above to download branchpoint predictions for introns in GENCODE v19 (hg19), or to view LaBranchoR-predicted branchpoints in the UCSC Genome Browser.

Running LaBranchoR

If having to run the model yourself would keep you from using LaBranchoR, please open an issue requesting the predictions you need, or contact the authors by email.

All of the code and model weights needed to run LaBranchoR are available in the 'labranchor' directory. Running LaBranchoR requires keras and numpy to be installed.
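Before running either script, a standard-library check such as this minimal sketch can confirm that both packages are importable. The function name `missing_dependencies` is hypothetical and not part of LaBranchoR; `keras` and `numpy` are the usual top-level import names of the two packages.

```python
import importlib.util

# Top-level import names of the two packages the README lists as requirements.
REQUIRED = ("keras", "numpy")


def missing_dependencies(modules=REQUIRED):
    """Return the names in `modules` that cannot be found on the import path."""
    # find_spec returns None for a top-level module that is not installed,
    # without actually importing it.
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_dependencies()
    print(("Missing: " + ", ".join(missing)) if missing else "All requirements found.")
```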

Predicting branchpoints

The following explains what each of the three files in the 'labranchor' directory does.

The script labranchor.py makes predictions for a fasta file of sequences upstream of 3'ss. It can be invoked with:

python labranchor.py weights 'top-bed'/'top'/'all' fasta_file output

weights: The path to the h5 weights file (labranchor/2layer.h5)

'top-bed'/'top'/'all': the output mode, one of:

  top-bed: produces a BED file of predicted branchpoints. Assumes fasta names have the form chrom:3'ss_coord:strand (e.g. chr1:1000:+).
  top: for each fasta entry, reports the shift of the top-scoring branchpoint from the associated 3'ss.
  all: reports a comma-separated list of branchpoint probabilities corresponding to positions -70 to -1 relative to each 3'ss.

fasta_file: Path to a fasta file of sequences upstream of 3'ss. Input sequences must be exactly 70 base pairs long and should contain only the characters 'A', 'C', 'G', 'T', or 'N'. Any Ns are treated as A's during prediction.

output: Path to the output file. See the above options for formatting.
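To make the input rules and the 'all' output layout concrete, here is a minimal stdlib-only sketch. The helpers `one_hot` and `top_offset` are hypothetical, not LaBranchoR's own code. The A/C/G/T channel order is an assumption (labranchor.py may encode differently), and `top_offset` assumes each 'all' line holds only the 70 probabilities; if the real output prefixes a sequence name, strip it first. The negative offsets are this sketch's convention and may not match the sign used by the 'top' mode.

```python
# Hypothetical helpers mirroring the input rules and the 'all' output layout.
SEQ_LEN = 70
ALPHABET = "ACGT"  # assumed channel order for the one-hot encoding


def one_hot(seq):
    """One-hot encode a 70-nt sequence upstream of a 3'ss.

    Lowercase input is upper-cased for convenience. Raises ValueError on a
    wrong length or unexpected characters. N is encoded as A, matching the
    README's statement that Ns are treated as A's.
    """
    seq = seq.upper()
    if len(seq) != SEQ_LEN:
        raise ValueError(f"expected {SEQ_LEN} nt, got {len(seq)}")
    bad = set(seq) - set("ACGTN")
    if bad:
        raise ValueError(f"unexpected characters: {sorted(bad)}")
    rows = []
    for base in seq.replace("N", "A"):
        row = [0] * len(ALPHABET)
        row[ALPHABET.index(base)] = 1
        rows.append(row)
    return rows


def top_offset(all_line):
    """Return the position (-70..-1) of the highest-probability branchpoint.

    The i-th comma-separated value is taken as the probability for position
    -70 + i relative to the 3'ss; ties resolve to the most upstream position.
    """
    probs = [float(x) for x in all_line.strip().split(",")]
    if len(probs) != SEQ_LEN:
        raise ValueError(f"expected {SEQ_LEN} values, got {len(probs)}")
    best = max(range(SEQ_LEN), key=lambda i: probs[i])
    return best - SEQ_LEN
```

For example, a line whose largest value is the 46th entry gives `top_offset(line) == -25`, i.e. a branchpoint 25 nt upstream of the 3'ss under this sketch's convention.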

Creating 3'ss sequence fasta files

The script create_fasta.py can be used to create fasta files suitable for branchpoint prediction for all introns in a given GTF file. It can be invoked (i.e., run) with:

python create_fasta.py genome gtf output

genome: A path to a genome fasta file consistent with the gtf file.

gtf: The path to the gtf file you wish to predict branchpoints in.

output: The path to the output fasta file.
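The internals of create_fasta.py are not described here, so the following is only a rough, hypothetical sketch of the idea (all names are assumptions): group GTF exon records by transcript_id, treat the gaps between consecutive exons as introns, cut the 70 nt ending at each intron's 3' end, and reverse-complement minus-strand windows. Names follow the chrom:3'ss_coord:strand form expected by 'top-bed', but using the 1-based position of the last intron base as the coordinate is an assumption that should be checked against labranchor.py. The genome is assumed to be already loaded into a dict of chromosome name to sequence.

```python
import re
from collections import defaultdict

WINDOW = 70  # LaBranchoR expects 70 nt upstream of each 3'ss
_COMP = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq):
    return seq.translate(_COMP)[::-1]


def three_prime_windows(genome, gtf_lines):
    """Yield (name, seq) for the 70 nt ending at each intron's 3' end.

    genome: dict mapping chromosome name -> sequence string (assumed input).
    gtf_lines: iterable of GTF text lines; only 'exon' features are used.
    """
    exons = defaultdict(list)
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match is None:
            continue
        key = (fields[0], fields[6], match.group(1))  # chrom, strand, transcript
        exons[key].append((int(fields[3]), int(fields[4])))

    seen = set()  # introns shared by several transcripts are emitted once
    for (chrom, strand, _tx), blocks in exons.items():
        chrom_seq = genome.get(chrom)
        if chrom_seq is None:
            continue
        blocks.sort()
        for (_, prev_end), (next_start, _) in zip(blocks, blocks[1:]):
            intron_start, intron_end = prev_end + 1, next_start - 1  # 1-based, inclusive
            if intron_end < intron_start:
                continue
            if strand == "+":
                coord = intron_end  # 3' end is the highest coordinate
                lo, hi = intron_end - WINDOW, intron_end
            else:
                coord = intron_start  # 3' end is the lowest coordinate
                lo, hi = intron_start - 1, intron_start - 1 + WINDOW
            if lo < 0 or hi > len(chrom_seq) or (chrom, coord, strand) in seen:
                continue
            seen.add((chrom, coord, strand))
            seq = chrom_seq[lo:hi].upper()
            if strand == "-":
                seq = revcomp(seq)
            yield f"{chrom}:{coord}:{strand}", seq


def write_fasta(records, path):
    with open(path, "w") as handle:
        for name, seq in records:
            handle.write(f">{name}\n{seq}\n")
```

Holding a whole genome such as hg19 in a dict takes several gigabytes of memory, so a real tool would more likely use an indexed FASTA reader; the dict keeps the sketch dependency-free.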

Analysis included in paper

Contents of the notebooks directory:

Model training: notebooks/train_model.ipynb

Opening train_model.ipynb shows the entire training process.

Model performance: notebooks/performance_*

Cases where LaBranchoR disagrees with experimental data: notebooks/disagreement_*


Genome-wide properties and overlap with pathogenic variants: notebooks/landscape_*


Properties of C and no -2 U branchpoints: notebooks/landscape_C_and_noT.ipynb

Enrichments of ExAC variants: notebooks/ExAC_variant_enrichments.ipynb

Generation of ISM supplementary data: notebooks/supp_data.ipynb

Analysis not included in paper

Exploration of nucleotide importances: notebooks/importance.ipynb

Analysis of secondary structure near branchpoints: notebooks/secondary_*


Article link: https://www.haomeiwen.com/subject/vqgdxltx.html
