Microsatellite repeat quantification from long-read sequencing data.
Introduction
Microsatellite expansion, such as trinucleotide repeat expansion (TRE), is known to cause a number of genetic diseases. Sanger sequencing and next-generation short-read sequencing are unable to interrogate TRE reliably. We developed a novel algorithm called RepeatHMM to estimate repeat counts from long-read sequencing data. Evaluation on simulated data, real amplicon sequencing data from two repeat expansion disorders, and whole-genome sequencing data generated by PacBio and Oxford Nanopore technologies showed superior performance over competing approaches. We concluded that long-read sequencing coupled with RepeatHMM can estimate repeat counts on microsatellites and can interrogate the “unsequenceable” genomic trinucleotide repeat disorders.
Features
Accurate and efficient estimation of repeat counts from long-read sequencing data
Analysis of all types of simple repeats
Predefined models are included for more than 10 well-known trinucleotide repeats: AFF2, AR, ATN1, ATXN1, ATXN2, ATXN3, ATXN7, ATXN8OS, CACNA1A, DMPK, FMR1, FXN, HTT, PPP2R2B, TBP
Easy to install and use
RepeatHMM consists of several steps, as shown in the figure below. We use trinucleotide repeats as an example to illustrate the procedure, but RepeatHMM can be used for microsatellites of any size.
![](https://img.haomeiwen.com/i18064756/ba3ddfebff27f75c.png)
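To make the idea of a "repeat count" concrete, here is a minimal Python sketch that counts the longest run of exact, back-to-back copies of a motif in a single read. It is not RepeatHMM's algorithm and does not call RepeatHMM: the function name `longest_perfect_repeat_count`, the example read, and the `CAG` motif are all assumptions made for illustration. Exact matching like this breaks on the insertions, deletions, and substitutions common in long reads, which is the kind of problem a dedicated tool has to address.

```python
import re


def longest_perfect_repeat_count(sequence: str, motif: str) -> int:
    """Return the largest number of consecutive exact copies of `motif`.

    Hypothetical helper for illustration only; it tolerates no sequencing
    errors, so a single mismatch splits a repeat tract into two shorter runs.
    """
    if not motif:
        raise ValueError("motif must be non-empty")
    # One or more uninterrupted copies of the motif, case-insensitive via upper().
    pattern = re.compile(f"(?:{re.escape(motif.upper())})+")
    runs = pattern.findall(sequence.upper())
    # Each run is a whole number of motif copies; report the longest one.
    return max((len(run) // len(motif) for run in runs), default=0)


# Made-up read: flanking sequence around five copies of CAG.
read = "ACGT" + "CAG" * 5 + "TTGA"
print(longest_perfect_repeat_count(read, "CAG"))
```

A single interrupting base shows the limitation: for `"CAGCAGTCAGCAGCAG"` the sketch reports the longer of the two perfect runs rather than the full tract.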