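As of last month, it has been a full two years since we launched the column “bioRxiv生信好文速览” (a monthly digest of notable bioinformatics preprints on bioRxiv). When I first raised the idea of a preprint series with the editor-in-chief, the main goal was to fill the gap in preprint coverage in Chinese-language media at the time. Back then, the promotion and adoption of preprints was already taking shape internationally. Many of the bioinformatics faculty I had met had tried posting preprints, and a number of institutions had stated that they would not strike publications marked as bioRxiv from a CV during hiring or promotion.
Our column has two parts: an introduction and the paper recommendations. At first I tried to write some commentary on each paper, but limited ability and time gradually led me to give that up and use the English abstracts instead. At the same time, I shifted more effort into the introduction, commenting on the previous month's events in the preprint world, or discussing my own impressions of the preprint publishing model and the problems preprints face. This structure has become our hallmark and sets us apart from similar columns. Because of the limits of my background and field, the selected papers mostly center on genomics, which differs somewhat from what the 生信人 public account mostly pushes since its business shift in recent years, namely medically oriented bioinformatics research. Yet this is exactly what makes the digest an important supplement, covering the basic research that the account has somewhat drifted away from.
To better promote preprints, starting with this issue we will also broaden the types of articles we recommend, for example to include other kinds of good non-peer-reviewed writing (such as blog posts) and preprints from outside bioinformatics. Self-nominations and nominations of others' articles are welcome. For instance, last month the well-known bioinformatician Heng Li (李恒) posted several interesting articles on his blog (https://lh3.github.io/). Let's take a look.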
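1. What ambitions of the Vertebrate Genomes Project lie behind 16 newly sequenced genomes? (The project says it aims for high-quality genomes of all ~70,000 extant vertebrate species)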
Towards complete and error-free genome assemblies of all vertebrate species
High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are only available for a few non-microbial species. To address this issue, the international Genome 10K (G10K) consortium has worked over a five-year period to evaluate and develop cost-effective methods for assembling the most accurate and complete reference genomes to date. Here we summarize these developments, introduce a set of quality standards, and present lessons learned from sequencing and assembling 16 species representing major vertebrate lineages (mammals, birds, reptiles, amphibians, teleost fishes and cartilaginous fishes). We confirm that long-read sequencing technologies are essential for maximizing genome quality and that unresolved complex repeats and haplotype heterozygosity are major sources of error in assemblies. Our new assemblies identify and correct substantial errors in some of the best historical reference genomes. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an effort to generate high-quality, complete reference genomes for all ~70,000 extant vertebrate species and help enable a new era of discovery across the life sciences.
2. An atlas of transcription factor binding sites across 27 tissues
Atlas of Transcription Factor Binding Sites from ENCODE DNase Hypersensitivity Data Across 27 Tissue Types
There is intense interest in mapping the tissue-specific binding sites of transcription factors in the human genome to reconstruct gene regulatory networks and predict functions for non-coding genetic variation. DNase-seq footprinting provides a means to predict genome-wide binding sites for hundreds of transcription factors (TFs) simultaneously. However, despite the public availability of DNase-seq data for hundreds of samples, there is neither a unified analytical workflow nor a publicly accessible database providing the locations of footprints across all available samples. Here, we implemented a workflow for uniform processing of footprints using two state-of-the-art footprinting algorithms: Wellington and HINT. Our workflow scans the footprints generated by these algorithms for 1,530 sequence motifs to predict binding sites for 1,515 human transcription factors. We applied our workflow to detect footprints in 192 DNase-seq experiments from ENCODE spanning 27 human tissues. This collection of footprints describes an expansive landscape of potential TF occupancy. At thresholds optimized through machine learning, we report high-quality footprints covering 9.8% of the human genome. These footprints were enriched for true positive TF binding sites as defined by ChIP-seq peaks, as well as for genetic variants associated with changes in gene expression. Integrating our footprint atlas with summary statistics from genome-wide association studies revealed that risk for neuropsychiatric traits was enriched specifically at highly-scoring footprints in human brain, while risk for immune traits was enriched specifically at highly-scoring footprints in human lymphoblasts. Our cloud-based workflow is available at github.com/globusgenomics/genomics-footprint and a database with all footprints and TF binding site predictions are publicly available at http://data.nemoarchive.org/other/grant/sament/sament/footprint_atlas.
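3. University of British Columbia, Canada: single-cell whole-genome sequencing shows the role of copy number variation in cancer evolution and drug resistance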
Single cell fitness landscapes induced by genetic and pharmacologic perturbations in cancer
Tumour fitness landscapes underpin selection in cancer, impacting etiology, evolution and response to treatment. Progress in defining fitness landscapes has been impeded by a lack of timeseries perturbation experiments over realistic intervals at single cell resolution. We studied the nature of clonal dynamics induced by genetic and pharmacologic perturbation with a quantitative fitness model developed to ascribe quantitative selective coefficients to individual cancer clones, enable prediction of clone-specific growth potential, and forecast competitive clonal dynamics over time. We applied the model to serial single cell genome (>60,000 cells) and transcriptome (>58,000 cells) experiments ranging from 10 months to 2.5 years in duration. We found that genetic perturbation of TP53 in epithelial cell lines induces multiple forms of copy number alteration that confer increased fitness to clonal populations with measurable consequences on gene expression. In patient derived xenografts, predicted selective coefficients accurately forecasted clonal competition dynamics, that were validated with timeseries sampling of experimentally engineered mixtures of low and high fitness clones. In cisplatin-treated patient derived xenografts, the fitness landscape was inverted in a time-dependent manner, whereby a drug resistant clone emerged from a phylogenetic lineage of low fitness clones, and high fitness clones were eradicated. Moreover, clonal selection mediated reversible drug response early in the selection process, whereas late dynamics in genomically fixed clones were associated with transcriptional plasticity on a fixed clonal genotype. Together, our findings outline causal mechanisms with implication for interpreting how mutations and multi-faceted drug resistance mechanisms shape the etiology and cellular fitness of human cancers.
4. Prox-seq: a single-cell sequencing technique that simultaneously quantifies mRNA, proteins, and protein complexes (Savaş Tay lab, University of Chicago)
Quantification of proteins, protein complexes and mRNA in single cells by proximity-sequencing
Multiplexed analysis of single-cells enables accurate modeling of cellular behaviors, classification of new cell types, and characterization of their functional states. Here we present proximity-sequencing (Prox-seq), a method for simultaneous measurement of an individual cell’s proteins, protein complexes and mRNA. Prox-seq utilizes deep sequencing and barcoded proximity assays to measure proteins and their complexes from all pairwise combinations of targeted proteins, in thousands of single-cells. The number of measured protein complexes scales quadratically with the number of targeted proteins, providing unparalleled multiplexing capacity. We developed a high-throughput experimental and computational pipeline and demonstrated the potential of Prox-Seq for multi-omic analysis with a panel of 13 barcoded proximity probes, enabling the measurement of 91 protein complexes, along with thousands of mRNA molecules in single T-cells and B-cells. Prox-seq provides access to an untapped yet powerful measurement modality for single-cell phenotyping and can discover new protein interactions in signaling and drug studies.
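The abstract does not spell out how 13 probes lead to 91 measurable complexes. One reading that matches the numbers, offered here as an assumption and not as the authors' stated method, is that every unordered pair of targets is counted, including a target paired with itself, which gives n(n+1)/2 pairs. The function names below are hypothetical and exist only for this sketch.

```python
from itertools import combinations_with_replacement


def count_pairwise_complexes(n_targets: int) -> int:
    """Count unordered target pairs, self-pairs included: n(n+1)/2.

    Assumption: a "complex" is any unordered pair of targeted proteins,
    and a protein paired with itself counts as one complex.
    """
    return n_targets * (n_targets + 1) // 2


def enumerate_pairs(targets):
    """List every unordered pair (with self-pairs) for a set of targets."""
    return list(combinations_with_replacement(targets, 2))


if __name__ == "__main__":
    # The count grows quadratically with the number of targets.
    for n in (5, 13, 50):
        print(n, count_pairwise_complexes(n))
```

Under this counting, 13 targets give 13 × 14 / 2 = 91 pairs, and doubling the panel size roughly quadruples the number of pairs, which is the quadratic scaling the abstract describes.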
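5. University of Edinburgh: a large-scale study of structural variation in salmon genomes helps reveal how domestication has shaped them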
The structural variation landscape in 492 Atlantic salmon genomes
Structural variants (SVs) are a major source of genetic and phenotypic variation, but remain challenging to accurately type and are hence poorly characterized in most species. We present an approach for reliable SV discovery in non-model species using whole genome sequencing and report 15,483 high-confidence SVs in 492 Atlantic salmon (Salmo salar L.) sampled from a broad phylogeographic distribution. These SVs recover population genetic structure with high resolution, include an active DNA transposon, widely affect functional features, and overlap more duplicated genes retained from an ancestral salmonid autotetraploidization event than expected. Changes in SV allele frequency between wild and farmed fish indicate polygenic selection on behavioural traits during domestication, targeting brain-expressed synaptic networks linked to neurological disorders in humans. This study offers novel insights into the role of SVs in genome evolution and the genetic architecture of domestication traits, along with resources supporting reliable SV discovery in non-model species.
6. Evolutionary analysis suggests that the amino acid sequence of the p53 protein determines animal lifespan
The changes in the p53 protein across the animal kingdom pointing to its involvement in longevity
Recently, the quest for the mythical fountain of youth has turned into specific research programs aiming to extend the healthy lifespan of humans. Despite advances in our understanding of the molecular processes underlying aging, the surprisingly extended lifespan of some animals remains unexplained. In this respect, the p53 protein plays a crucial role not only in tumor suppression but also in tissue homeostasis and healthy aging. However, the mechanism through which p53 maintains the function as a gatekeeper of healthy aging is not fully understood. Thus, we inspected TP53 gene sequences in individual species of phylogenetically related organisms that show different aging patterns. We discovered novel correlations between specific amino acid variations in p53 and lifespan across different animal species. In particular, we found that species with extended lifespan have characteristic amino acid substitutions mainly in the p53 DNA binding domain that change its function. These findings lead us to propose a theory of longevity based on alterations in TP53 that might be responsible for determining extended organismal lifespan.
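7. A decade in the making: Aalborg University, Denmark, releases 1083 metagenome-assembled genomes with full-length rRNA genes, revealing the role of microorganisms in wastewater treatment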
Connecting structure to function with the recovery of over 1000 high-quality activated sludge metagenome-assembled genomes encoding full-length rRNA genes using long-read sequencing
Microorganisms are critical to water recycling, pollution removal and resource recovery processes in the wastewater industry. While the structure of this complex community is increasingly understood based on 16S rRNA gene studies, this structure cannot currently be linked to functional potential due to the absence of high-quality metagenome-assembled genomes (MAGs) with full-length rRNA genes for nearly all species. Here, we sequence 23 Danish full-scale wastewater treatment plant metagenomes, producing >1 Tbp of long-read and >0.9 Tbp of short-read data. We recovered 1083 high-quality MAGs, including 57 closed circular genomes. The MAGs accounted for ~30% of the community, and meet the stringent MIMAG high-quality draft requirements including full-length rRNA genes. We show how novel high-quality MAGs in combination with >13 years of amplicon data, Raman microspectroscopy and fluorescence in situ hybridisation can be used to uncover abundant undescribed lineages belonging to important functional groups.
8. [COVID-19] Lior Pachter (Caltech): single-cell sequencing analyses of SARS-CoV-2 call for extra caution
log(x+1)* and log(1+x)*
Single-cell RNA-seq technologies have been successfully employed over the past decade to generate many high resolution cell atlases. These have proved invaluable in recent efforts aimed at understanding the cell type specificity of host genes involved in SARS-CoV-2 infections. While single-cell atlases are based on well-sampled highly-expressed genes, many of the genes of interest for understanding SARS-CoV-2 can be expressed at very low levels. Common assumptions underlying standard single-cell analyses don’t hold when examining low-expressed genes, with the result that standard workflows can produce misleading results.
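As a toy illustration of how a log(x+1) transform can mislead at low counts (this is a general property of the transform with made-up numbers, not a summary of the post's own analysis), the sketch below compares two hypothetical genes across four cells. Averaging raw counts and averaging log(x+1) values rank the two genes in opposite order.

```python
import math


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def mean_log1p(counts):
    """Mean of log(1 + x) over per-cell counts."""
    return mean([math.log1p(c) for c in counts])


# Made-up counts for two hypothetical genes in four cells.
gene_a = [0, 0, 0, 8]  # detected in one cell only, but at a high level
gene_b = [1, 1, 1, 1]  # detected at a low level in every cell

if __name__ == "__main__":
    print("mean counts:", mean(gene_a), mean(gene_b))
    print("mean log1p :", mean_log1p(gene_a), mean_log1p(gene_b))
```

Gene A has the higher mean count (2 versus 1), yet gene B has the higher mean log(x+1) value, because the transform compresses the single large count while the many zeros stay at zero. For sparsely detected genes, the log-scale average therefore tracks how many cells express the gene more than how much they express.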
9. [COVID-19] Natural selection on SARS-CoV-2 in bats
Evidence of significant natural selection in the evolution of SARS-CoV-2 in bats, not humans
RNA viruses are proficient at switching to novel host species due to their fast mutation rates. Implicit in this assumption is the need to evolve adaptations in the new host species to exploit their cells efficiently. However, SARS-CoV-2 has required no significant adaptation to humans since the pandemic began, with no observed selective sweeps to date. Here we contrast the role of positive selection and recombination in the Sarbecoviruses in horseshoe bats to SARS-CoV-2 evolution in humans. While methods can detect some evidence for positive selection in SARS-CoV-2, we demonstrate these are mostly due to recombination and sequencing artefacts. Purifying selection is also substantially weaker in SARS-CoV-2 than in the related bat Sarbecoviruses. In comparison, our results show evidence for positive, specifically episodic selection, acting on the bat virus lineage SARS-CoV-2 emerged from. This signature of selection can also be observed among synonymous substitutions, for example, linked to ancestral CpG depletion on this bat lineage. We show the bat virus RmYN02 has recombinant CpG content in Spike pointing to coinfection and evolution in bats without involvement of other species. Our results suggest the non-human progenitor of SARS-CoV-2 was capable of human-human transmission as a consequence of its natural evolution in bats.
10. [Bioinformatics blog] Heng Li: A blog post on comparing fast high-level languages including Julia, Nim and Crystal on FASTQ parsing and interval query
11. [Beyond bioinformatics] Does the controversial pollen magnetofection method actually work?
No evidence for transient transformation via pollen magnetofection in several monocot species
The development of rapid and efficient transformation methods for many plant species remains an obstacle in both the basic and applied plant sciences. A novel method described by Zhao et al. (2017) used magnetic nanoparticles to deliver DNA into pollen grains of several dicot species, and one monocot (lily), to achieve transformation (“pollen magnetofection”). Using the published protocol, extensive trials by two independent research groups showed no indication of transient transformation success with pollen from two monocots, maize and sorghum. To further address the feasibility of magnetofection, lily pollen was used for side-by-side trials of magnetofection with a proven methodology for transient transformation, biolistics. Using a Green Fluorescent Protein reporter plasmid, transformation efficiency with the biolistic approach averaged 0.7% over three trials. However, the same plasmid produced no recognizable transformants via magnetofection, despite screening >3500 individual pollen grains. We conclude that pollen magnetofection is not effective for transient transformation of pollen for at least three species of monocots, and suggest that efforts to replicate the magnetofection protocol in dicot species would be useful to fully assess its potential.
See also the iplants public account: “bioRxiv | No evidence that pollen magnetofection works in monocots”
Original work by the author, first published on 生信人.