From: montreal 生信人 ("Montreal bioinformatician"), November 9, 2019
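On September 30, Cold Spring Harbor Laboratory (CSHL) formally announced a transparent review service for preprint manuscripts posted on bioRxiv. The service officially went live last month.
The service is called Transparent Review in Preprints (TRiP). Normally, peer review reports are not made public, although a growing number of journals are experimenting with transparent review. The early adopters of TRiP are eLife, four EMBO journals, and two independent pre-submission peer review services, Peerage of Science and Review Commons. A number of other journals, reportedly including the PLOS titles and the journals of the American Society of Plant Biologists, are also keen to join.
One important goal of bioRxiv is to make research more transparent. That covers a paper's results and data, and also its review and revision history. Under TRiP, a bioRxiv preprint submitted to one of these journals or review services can, at the authors' request, be posted on bioRxiv together with its reviews. Authors may decline to publish the reviews. Once they opt in, however, all reviews appear on bioRxiv, including negative ones. If authors use either of the two pre-submission review services, those reviews may also be adopted by peer-reviewed journals participating in TRiP. This shortens the submit-review-reject-resubmit cycle, which wastes time for authors, reviewers, and editors alike. For more on TRiP, see: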
https://www.cshl.edu/transparent-review-in-preprints/
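That's it for the news. Now let's look at this month's picks of October's best bioinformatics preprints on bioRxiv.
1. Akita: software that predicts 3D genome folding directly from DNA sequence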
Predicting 3D genome folding from DNA sequence(CC-BY 4.0)
In interphase, the human genome sequence folds in three dimensions into a rich variety of locus-specific contact patterns. Here we present a deep convolutional neural network, Akita, that accurately predicts genome folding from DNA sequence alone. Representations learned by Akita underscore the importance of CTCF and reveal a complex grammar underlying genome folding. Akita enables rapid in silico predictions for sequence mutagenesis, genome folding across species, and genetic variants.
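The abstract does not describe Akita's input layer. As a generic illustration, sequence-based CNNs commonly one-hot encode DNA and scan it with filters. The sketch below assumes that setup and is not Akita's code. The helpers `one_hot` and `conv1d` are hypothetical names, and the filter is a toy example, not a learned CTCF motif.

```python
from typing import List

BASES = "ACGT"


def one_hot(seq: str) -> List[List[int]]:
    """Encode DNA as an L x 4 matrix with columns A, C, G, T.
    Ambiguous bases such as N become an all-zero row."""
    rows = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        if base in BASES:
            row[BASES.index(base)] = 1
        rows.append(row)
    return rows


def conv1d(x: List[List[int]], kernel: List[List[float]]) -> List[float]:
    """'Valid' 1D cross-correlation of an L x 4 input with a k x 4 filter,
    i.e. the score a single first-layer filter assigns at each position."""
    k = len(kernel)
    scores = []
    for i in range(len(x) - k + 1):
        s = 0.0
        for j in range(k):
            for c in range(4):
                s += x[i + j][c] * kernel[j][c]
        scores.append(s)
    return scores


# Toy filter that rewards the 3-mer "CCG" (hypothetical, not a learned motif).
toy_filter = [[float(v) for v in row] for row in one_hot("CCG")]
print(conv1d(one_hot("ACCGT"), toy_filter))  # [1.0, 3.0, 1.0]
```

A real model stacks many such filters with nonlinearities and, in Akita's case, predicts a 2D contact map. This block only shows the first step.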
2. Cumulus: a cloud-based tool for large-scale single-cell RNA-seq analysis, from Aviv Regev's lab at the Broad Institute
Cumulus: a cloud-based data analysis framework for large-scale single-cell and single-nucleus RNA-seq(CC-BY-NC-ND 4.0)
Massively parallel single-cell and single-nucleus RNA-seq (sc/snRNA-seq) have opened the way to systematic tissue atlases in health and disease, but as the scale of data generation is growing, so does the need for computational pipelines for scaled analysis. Here, we developed Cumulus, a cloud-based framework for analyzing large scale sc/snRNA-seq datasets. Cumulus combines the power of cloud computing with improvements in algorithm implementations to achieve high scalability, low cost, user-friendliness, and integrated support for a comprehensive set of features. We benchmark Cumulus on the Human Cell Atlas Census of Immune Cells dataset of bone marrow cells and show that it substantially improves efficiency over conventional frameworks, while maintaining or improving the quality of results, enabling large-scale studies.
3. Privacy risks in genetic genealogy databases
Attacks on genetic privacy via uploads to genealogical databases(CC-BY 4.0)
Direct-to-consumer (DTC) genetics services are increasingly popular for genetic genealogy, with tens of millions of customers as of 2019. Several DTC genealogy services allow users to upload their own genetic datasets in order to search for genetic relatives. A user and a target person in the database are identified as genetic relatives if the user’s uploaded genome shares one or more sufficiently long segments in common with that of the target person—that is, if the two genomes share one or more long regions identical by state (IBS). IBS matches reveal some information about the genotypes of the target person, particularly if the chromosomal locations of IBS matches are shared with the uploader. Here, we describe several methods by which an adversary who wants to learn the genotypes of people in the database can do so by uploading multiple datasets. Depending on the methods used for IBS matching and the information about IBS segments returned to the user, substantial information about users’ genotypes can be revealed with a few hundred uploaded datasets. For example, using a method we call IBS tiling, we estimate that an adversary who uploads approximately 900 publicly available genomes could recover at least one allele at SNP sites across up to 82% of the genome of a median person of European ancestries. In databases that detect IBS segments using unphased genotypes, approximately 100 uploads of falsified datasets can reveal enough genetic information to allow accurate genome-wide imputation of every person in the database. We provide simple-to-implement suggestions that will prevent the exploits we describe and discuss our results in light of recent trends in genetic privacy, including the recent use of uploads to DTC genetic genealogy services by law enforcement.
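Here is a toy version of unphased IBS matching that shows how the leak works. It assumes genotypes are coded as alt-allele counts (0/1/2) and defines a segment as a run of sites with no opposite homozygotes. This is one simple detection rule, not necessarily what any DTC service uses, and `ibs_segments` and `revealed_alleles` are hypothetical names. The second function shows why a reported segment leaks information: wherever the uploader is homozygous inside it, the target must carry that allele.

```python
from typing import Dict, List, Tuple


def ibs_segments(a: List[int], b: List[int], min_sites: int) -> List[Tuple[int, int]]:
    """Half-open [start, end) runs of >= min_sites consecutive SNPs where genotypes
    a and b (alt-allele counts 0/1/2) are never opposite homozygotes (0 vs 2)."""
    segments = []
    start = 0
    for i in range(len(a) + 1):
        # The end of the data also closes the current run.
        if i == len(a) or abs(a[i] - b[i]) == 2:
            if i - start >= min_sites:
                segments.append((start, i))
            start = i + 1
    return segments


def revealed_alleles(uploader: List[int], segments: List[Tuple[int, int]]) -> Dict[int, str]:
    """Inside a reported segment, a homozygous uploader site reveals one target allele:
    uploader 0 -> target carries a ref allele, uploader 2 -> target carries an alt allele."""
    known = {}
    for start, end in segments:
        for i in range(start, end):
            if uploader[i] == 0:
                known[i] = "ref"
            elif uploader[i] == 2:
                known[i] = "alt"
    return known


uploader = [0, 1, 2, 2, 0, 1, 1]
target = [0, 2, 2, 1, 2, 1, 1]
segs = ibs_segments(uploader, target, min_sites=3)
print(segs)                              # [(0, 4)]
print(revealed_alleles(uploader, segs))  # {0: 'ref', 2: 'alt', 3: 'alt'}
```

Each upload reveals alleles only where its segments fall. Covering more of the target's genome with many uploads is the intuition behind the "IBS tiling" attack in the abstract.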
4. GraphAligner, a tool for aligning long reads to genome graphs, claims to be 12x faster than comparable tools
GraphAligner: Rapid and Versatile Sequence-to-Graph Alignment(CC-BY-ND 4.0)
Genome graphs can represent genetic variation and sequence uncertainty. Aligning sequences to genome graphs is key to many applications, including error correction, genome assembly, and genotyping of variants in a pan-genome graph. Yet, so far this step is often prohibitively slow. We present GraphAligner, a tool for aligning long reads to genome graphs. Compared to state-of-the-art tools, GraphAligner is 12x faster and uses 5x less memory, making it as efficient as aligning reads to linear reference genomes. When employing GraphAligner for error correction, we find it to be almost 3x more accurate and over 15x faster than extant tools. Availability: package manager: https://anaconda.org/bioconda/graphaligner and source code: https://github.com/maickrau/GraphAligner
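For readers new to sequence-to-graph alignment, the core idea can be sketched as edit-distance dynamic programming over a DAG in which each node holds one base. This is a plain O(|V|·m) textbook formulation for acyclic graphs, written for illustration only. GraphAligner itself uses much faster techniques and also handles cyclic graphs, and none of the names below come from its code.

```python
from typing import Dict, List


def align_to_graph(read: str, labels: List[str], preds: Dict[int, List[int]]) -> int:
    """Minimum edit distance between `read` and any path through a DAG.

    Nodes 0..n-1 must already be in topological order; node v carries the base
    labels[v] and its incoming edges come from preds[v]. The whole read must be
    aligned, but the graph path may start and end at any node.
    """
    m = len(read)
    # D[v][j]: best cost of aligning read[:j] to a path ending at node v.
    D = [[0] * (m + 1) for _ in labels]

    def best_prev(v: int, j: int) -> int:
        # Either start a new path at v (inserting read[:j] first) or extend a predecessor.
        return min([j] + [D[u][j] for u in preds.get(v, [])])

    for v, base in enumerate(labels):
        for j in range(m + 1):
            cost = best_prev(v, j) + 1  # delete this node's base
            if j > 0:
                cost = min(
                    cost,
                    best_prev(v, j - 1) + (base != read[j - 1]),  # match or mismatch
                    D[v][j - 1] + 1,  # insert read[j - 1]
                )
            D[v][j] = cost
    # An empty path means inserting the entire read.
    return min([m] + [D[v][m] for v in range(len(labels))])


# Toy graph spelling "AC", then a G/T bubble, then a final "A".
labels = ["A", "C", "G", "T", "A"]
preds = {1: [0], 2: [1], 3: [1], 4: [2, 3]}
print(align_to_graph("ACTA", labels, preds))  # 0
print(align_to_graph("ACCA", labels, preds))  # 1
```

Both alleles of the bubble are available to the read, so "ACTA" and "ACGA" align perfectly. Against a linear reference, one of them would pay a mismatch.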
5. A perfect match: histopathology meets deep learning
Pan-cancer computational histopathology reveals mutations, tumor composition and prognosis(CC-BY-NC 4.0)
· Pan-cancer computational histopathology analysis with deep learning extracts histopathological patterns and accurately discriminates 28 cancer and 14 normal tissue types
· Computational histopathology predicts whole genome duplications, focal amplifications and deletions, as well as driver gene mutations
· Wide-spread correlations with gene expression indicative of immune infiltration and proliferation
· Prognostic information augments conventional grading and histopathology subtyping in the majority of cancers
6. New insights into genome-wide de novo DNA methylation in Arabidopsis
Common alleles of CMT2 and NRPE1 are major determinants of de novo DNA methylation variation in Arabidopsis thaliana(CC-BY 4.0)
Author Summary: DNA methylation is a major component of transposon silencing, and essential for genomic integrity. Recent studies revealed large-scale geographic variation as well as the existence of major trans-acting polymorphisms that partly explained this variation. In this study, we re-analyze previously published data (The 1001 Epigenomes), focusing on de novo DNA methylation patterns of individual TEs and TE families rather than on genome-wide averages (as was done in previous studies). GWAS of the patterns reveals the underlying regulatory networks, and allowed us to comprehensively characterize trans-regulation of de novo DNA methylation and its role in the striking geographic pattern for this phenotype.
7. Jianquan Liu's group at Sichuan University: diverse forms of natural selection during speciation in Populus (poplars)
Evidence for widespread selection in shaping the genomic landscape during speciation of Populus(CC-BY-NC-ND 4.0)
Increasing our understanding of how various evolutionary processes drive the genomic landscape of variation is fundamental to a better understanding of the genomic consequences of speciation. However, the genome-wide patterns of within- and between-species variation have not been fully investigated in most forest tree species despite their global ecological and economic importance. Here, we use whole-genome resequencing data from four Populus species spanning the speciation continuum to reconstruct their demographic histories, investigate patterns of diversity and divergence, infer their genealogical relationships and estimate the extent of ancient introgression across the genome. Our results show substantial variation in these patterns along the genomes although this variation is not randomly distributed but is strongly predicted by the local recombination rates and the density of functional elements. This implies that the interaction between recurrent selection and intrinsic genomic features has dramatically sculpted the genomic landscape over long periods of time. In addition, our findings provide evidence that, apart from background selection, recent positive selection and long-term balancing selection are also crucial components in shaping patterns of genome-wide variation during the speciation process.
8. New work on plant-microbe interactions from Detlef Weigel, a leading scientist at the Max Planck Institute for Developmental Biology
Combining whole genome shotgun sequencing and rDNA amplicon analyses to improve detection of microbe-microbe interaction networks in plant leaves(CC-BY 4.0)
Microorganisms from all domains of life establish associations with plants. Although some harm the plant, others antagonize pathogens or prime the plant immune system, acquire nutrients, tune plant hormone levels, or perform additional services. Most culture-independent plant microbiome research has focused on amplicon sequencing of 16S rDNA and/or the internal transcribed spacer (ITS) of rDNA loci, but the decreasing cost of high-throughput sequencing has made shotgun metagenome sequencing increasingly accessible. Here, we describe shotgun sequencing of 275 wild Arabidopsis thaliana leaf microbiomes from southwest Germany, with additional bacterial 16S rDNA and eukaryotic ITS1 amplicon data from 176 of these samples. The shotgun data were dominated by bacterial sequences, with eukaryotes contributing only a minority of reads. For shotgun and amplicon data, microbial membership showed weak associations with both site of origin and plant genotype, both of which were highly confounded in this dataset. There was large variation among microbiomes, with one extreme comprising samples of low complexity and a high load of microorganisms typical of infected plants, and the other extreme being samples of high complexity and a low microbial load. We use the metagenome data, which captures the ratio of bacterial to plant DNA in leaves of wild plants, to scale the 16S rDNA amplicon data such that they reflect absolute bacterial abundance. We show that this cost-effective hybrid strategy overcomes compositionality problems in amplicon data and leads to fundamentally different conclusions about microbiome community assembly.
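The abstract does not give the exact formula. Still, the idea of using the shotgun bacteria-to-plant ratio to turn 16S proportions into absolute abundances can be sketched as below. The assumptions are ours: read counts are taken as proportional to DNA amounts, the function name is hypothetical, and the taxa and numbers are made up. In the example, two samples have identical 16S proportions and look the same in relative terms. After rescaling, however, one carries ten times more of every taxon, which is the compositionality problem the authors describe.

```python
from typing import Dict


def absolute_abundance(amplicon_counts: Dict[str, int],
                       bacterial_reads: int, plant_reads: int) -> Dict[str, float]:
    """Rescale 16S relative abundances by the bacterial load estimated from shotgun
    data (bacterial reads per plant read), giving abundance per unit of host DNA."""
    load = bacterial_reads / plant_reads
    total = sum(amplicon_counts.values())
    return {taxon: count / total * load for taxon, count in amplicon_counts.items()}


# Two hypothetical samples: identical 16S composition, 10-fold difference in load.
counts = {"taxon_A": 30, "taxon_B": 70}
low_load = absolute_abundance(counts, bacterial_reads=500, plant_reads=10000)
high_load = absolute_abundance(counts, bacterial_reads=5000, plant_reads=10000)
print(low_load)
print(high_load)
```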
9. Good news for metagenomics researchers: Genome Constellation
A new method for rapid genome classification, clustering, visualization, and novel taxa discovery from metagenome(CC0)
Classifying taxa, including those that have not previously been identified, is a key task in characterizing the microbial communities of under-described habitats, including permanently ice-covered lakes in the dry valleys of the Antarctic. Current supervised phylogeny-based methods fall short on recognizing species assembled from metagenomic datasets from such habitats, as they are often incomplete or lack closely known relatives. Here, we report an efficient software suite, “Genome Constellation”, that is capable of rapidly characterizing a large number of metagenome-assembled genomes. Genome Constellation estimates similarities between genomes based on their k-mer matches, and subsequently uses these similarities for classification, clustering, and visualization. The clusters of reference genomes formed by Genome Constellation closely resemble known phylogenetic relationships while simultaneously revealing unexpected connections. In a dataset containing 1,693 draft genomes assembled from the Antarctic lake communities where only 40% could be placed in a phylogenetic tree, Genome Constellation improves taxa assignment to 61%. The clustering-based analysis revealed several novel taxa groups, including six clusters that may represent new bacterial phyla. Remarkably, we discovered 63 new giant viruses, 3 of which could not be found by using the traditional marker-based approach. In summary, we demonstrate that Genome Constellation provides an unbiased option to rapidly analyze a large number of microbial genomes and visually explore their relatedness. The software is available under BSD license at: https://bitbucket.org/berkeleylab/jgi-genomeconstellation/.
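The abstract does not give Genome Constellation's exact similarity score. As a hedged illustration of k-mer-based similarity followed by clustering, the sketch below uses the Jaccard index of k-mer sets and single-linkage grouping at a threshold. Both choices, the helper names, and the toy sequences are our assumptions, not the tool's implementation.

```python
from itertools import combinations
from typing import Dict, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared k-mers divided by all k-mers seen in either genome."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def single_linkage(genomes: Dict[str, str], k: int, threshold: float) -> List[Set[str]]:
    """Group genomes linked (directly or transitively) by Jaccard >= threshold."""
    sets = {name: kmers(seq, k) for name, seq in genomes.items()}
    parent = {name: name for name in genomes}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(genomes, 2):
        if jaccard(sets[a], sets[b]) >= threshold:
            parent[find(a)] = find(b)
    clusters: Dict[str, Set[str]] = {}
    for name in genomes:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())


toy = {"g1": "ACGTACGTAC", "g2": "ACGTACGTAA", "g3": "TTTTGGGGCC"}
print(jaccard(kmers(toy["g1"], 3), kmers(toy["g2"], 3)))  # 0.8
print(single_linkage(toy, k=3, threshold=0.5))
```

K-mer comparison needs neither marker genes nor a reference tree. That is why approaches of this kind can still relate incomplete genomes that have no close known relatives.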
10. David Zeevi's lab at Rockefeller University: the makeup of the genetic code may be shaped by the environment
Resource conservation manifests in the genetic code
Ocean microbes are responsible for about 50% of primary production on Earth, and are strongly affected by environmental resource availability. However, selective forces resulting from environmental conditions are not well understood. We studied selection by examining single-nucleotide variants in the marine environment, and discovered strong purifying selective forces exerted across marine microbial genes. We present evidence indicating that this selection is driven by the environment, and especially by nitrogen availability. We further corroborate that nutrient availability drives this ‘resource-driven’ selection by showing stronger selection on highly expressed and extracellular genes, that are more resource-consuming. Finally, we show that the standard genetic code, along with amino acid abundances, facilitates nutrient conservation by providing robustness to mutations that increase nitrogen and carbon consumption. Notably, this robustness generalizes to multiple taxa across all domains of life, including the Human genome, and manifests in the code structure itself. Overall, we uncover overwhelmingly strong purifying selective pressure across marine microbial life that may have contributed to the structure of our genetic code.
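The idea of "robustness to mutations that increase nitrogen" can be explored with a toy calculation. For each sense codon of the standard genetic code, we enumerate all single-nucleotide substitutions and check whether the encoded amino acid gains or loses nitrogen atoms. This is our own illustration, not the authors' analysis. Nitrogen counts are per free amino acid, and substitutions to stop codons are skipped.

```python
from itertools import product
from typing import Dict, Tuple

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE: Dict[str, str] = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Nitrogen atoms per free amino acid.
NITROGEN = {"A": 1, "R": 4, "N": 2, "D": 1, "C": 1, "Q": 2, "E": 1, "G": 1, "H": 3, "I": 1,
            "L": 1, "K": 2, "M": 1, "F": 1, "P": 1, "S": 1, "T": 1, "W": 2, "Y": 1, "V": 1}


def nitrogen_changes(codon: str) -> Tuple[int, int, int]:
    """Count single-nucleotide substitutions of a sense codon that increase, decrease,
    or keep the nitrogen content of the encoded amino acid (stop codons skipped)."""
    aa = CODE[codon]
    if aa == "*":
        raise ValueError("expected a sense codon")
    up = down = same = 0
    for pos in range(3):
        for b in BASES:
            if b == codon[pos]:
                continue
            new = CODE[codon[:pos] + b + codon[pos + 1:]]
            if new == "*":
                continue
            diff = NITROGEN[new] - NITROGEN[aa]
            if diff > 0:
                up += 1
            elif diff < 0:
                down += 1
            else:
                same += 1
    return up, down, same


print(nitrogen_changes("AAA"))  # (1, 3, 4): Lys -> Arg gains N; Glu, Thr, Ile lose N
# With every sense codon weighted equally, increases and decreases cancel exactly,
# because each substitution X -> Y is matched by the reverse substitution Y -> X.
totals = [sum(x) for x in zip(*(nitrogen_changes(c) for c, aa in CODE.items() if aa != "*"))]
print(dict(zip(["increase", "decrease", "unchanged"], totals)))
```

The uniform tally balances by symmetry, so on its own it cannot reveal a bias. Any conservation effect must come from how codons and substitutions are weighted, for example by the amino acid abundances the abstract mentions.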