From: montreal, 生信人 (Shengxinren)
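The ranks of preprint servers have kept growing in recent years, yet from time to time news arrives that a platform is shutting down. Recently PeerJ Preprints, a fellow preprint platform to bioRxiv, announced that it will stop accepting preprint submissions on September 30 and will stop accepting updates to already-posted preprints at the end of the year. The PeerJ preprint platform will not go offline, however: existing articles remain available to read, but no new ones will be accepted. Since the platform's launch in 2013, more than 5,000 of the preprints posted on PeerJ Preprints have gone on to be published in peer-reviewed journals.
As an emerging open-access publisher, PeerJ comprises two parts: several peer-reviewed journals and a preprint platform. It was founded with two aims. The first and primary aim was to "provide a superior peer-reviewed experience shaped by its Academic Editors"; the second, pursued through its preprint server, was to "bring preprints back to biology". Now that preprint servers have multiplied, the PeerJ team believes it is time to concentrate on the primary aim and to wind down the preprint service. This is regrettable, but it should be understood that maintaining a preprint site, a completely free and open scholarly platform, carries real costs. As a publisher that needs to turn a profit, PeerJ inevitably has to weigh many factors. The editor also hopes that other preprint platforms will find sustainable models that suit them and serve science better.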
1. 【Transcriptomics】The Krasnow lab at Stanford University maps lung gene expression at single-cell resolution
A molecular cell atlas of the human lung from single cell RNA sequencing
Although single cell RNA sequencing studies have begun providing compendia of cell expression profiles, it has proven more difficult to systematically identify and localize all molecular cell types in individual organs to create a full molecular cell atlas. Here we describe droplet- and plate-based single cell RNA sequencing applied to ∼70,000 human lung and blood cells, combined with a multi-pronged cell annotation approach, which have allowed us to define the gene expression profiles and anatomical locations of 58 cell populations in the human lung, including 41 of 45 previously known cell types or subtypes and 14 new ones. This comprehensive molecular atlas elucidates the biochemical functions of lung cell types and the cell-selective transcription factors and optimal markers for making and monitoring them; defines the cell targets of circulating hormones and predicts local signaling interactions including sources and targets of chemokines in immune cell trafficking and expression changes on lung homing; and identifies the cell types directly affected by lung disease genes. Comparison to mouse identified 17 molecular types that appear to have been gained or lost during lung evolution and others whose expression profiles have been substantially altered, revealing extensive plasticity of cell types and cell-type-specific gene expression during organ evolution including expression switches between cell types. This lung atlas provides the molecular foundation for investigating how lung cell identities, functions, and interactions are achieved in development and tissue engineering and altered in disease and evolution.
2. 【Systems biology】Chemical Checker: a bioactivity resource covering more than 800,000 small molecules
Extending the small molecule similarity principle to all levels of biology (CC-BY-NC-ND 4.0)
We present the Chemical Checker (CC), a resource that provides processed, harmonized and integrated bioactivity data on 800,000 small molecules. The CC divides data into five levels of increasing complexity, ranging from the chemical properties of compounds to their clinical outcomes. In between, it considers targets, off-targets, perturbed biological networks and several cell-based assays such as gene expression, growth inhibition and morphological profilings. In the CC, bioactivity data are expressed in a vector format, which naturally extends the notion of chemical similarity between compounds to similarities between bioactivity signatures of different kinds. We show how CC signatures can boost the performance of drug discovery tasks that typically capitalize on chemical descriptors, including target identification and library characterization. Moreover, we demonstrate and experimentally validate that CC signatures can be used to reverse and mimic biological signatures of disease models and genetic perturbations, options that are otherwise impossible using chemical information alone.
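To make the idea of comparing compounds through vector-format bioactivity signatures concrete, here is a minimal sketch. It assumes that each compound is represented by a fixed-length numeric vector and that cosine similarity is an acceptable measure; the abstract does not say which metric the Chemical Checker uses, and the compound names and values below are invented for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical signatures, invented for illustration (not Chemical Checker data).
signatures = {
    "compound_A": [0.9, 0.1, -0.3, 0.5],
    "compound_B": [0.8, 0.2, -0.2, 0.4],
    "compound_C": [-0.7, 0.6, 0.4, -0.1],
}


def rank_by_similarity(query, signatures):
    """Return the other compounds sorted from most to least similar to `query`."""
    q = signatures[query]
    scores = [
        (name, cosine_similarity(q, vec))
        for name, vec in signatures.items()
        if name != query
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


ranking = rank_by_similarity("compound_A", signatures)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

The same function works whether the vectors encode chemical descriptors or, say, gene-expression responses, which is the sense in which a vector format extends chemical similarity to other kinds of signature.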
3. 【Genomics】Ya-Ping Zhang's team at the Kunming Institute of Zoology: genome-level convergent evolution in high-altitude domestic animals
Convergent genomic signatures of high altitude adaptation among domestic mammals
Abundant and diverse domestic mammals living on the Tibetan Plateau provide useful materials for investigating adaptive evolution and genetic convergence. Here, we utilized 327 genomes from horses, sheep, goats, cattle, pigs and dogs living at both high and low altitudes, including 73 genomes generated for this study, to disentangle the genetic mechanisms underlying local adaptation of domestic mammals. Although molecular convergence is comparatively rare at the DNA sequence level, we found convergent signature of positive selection at the gene level, particularly EPAS1 gene in these Tibetan domestic mammals. We also reported a potential function in response to hypoxia for the gene C10orf67, which underwent positive selection in three of the domestic mammals. Our data provides insight into adaptive evolution of high-altitude domestic mammals, and should facilitate the search for additional novel genes involved in the hypoxia response pathway.
4. 【Bioinformatics】New tool PopDel claims faster detection of deletions across 50,000+ whole-genome sequences
PopDel identifies medium-size deletions jointly in tens of thousands of genomes (CC-BY-NC-ND 4.0)
Thousands of genomic structural variants segregate in the human population and can impact phenotypic traits and diseases. Their identification in whole-genome sequence data of large cohorts is a major computational challenge. Here we present PopDel, which identifies and genotypes deletions of about 500 to at least 10,000 bp in length in many genomes jointly. PopDel scales to tens of thousands of genomes as demonstrated by our evaluation on data of up to 49,962 genomes. Compared to previous tools, PopDel reduces the computational time needed to analyze 150 genomes from weeks to days. The deletions detected by PopDel in a single sample show a large overlap with high-confidence reference call sets. On data of up to 6,794 trios, inheritance patterns suggest a low false positive rate at a high recall. PopDel reliably reports common, rare and de novo deletions and the deletions reflect reported population structure. Therefore, PopDel enables routine scans for deletions in large-scale sequencing studies.
5. 【Omics】Luca Pinello's lab at the Broad Institute: a systematic comparison of 10 scATAC-seq tools
Assessment of computational methods for the analysis of single-cell ATAC-seq data
Background: Recent innovations in single-cell Assay for Transposase Accessible Chromatin using sequencing (scATAC-seq) enable profiling of the epigenetic landscape of thousands of individual cells. scATAC-seq data analysis presents unique methodological challenges. scATAC-seq experiments sample DNA, which, due to low copy numbers (diploid in humans) lead to inherent data sparsity (1-10% of peaks detected per cell) compared to transcriptomic (scRNA-seq) data (20-50% of expressed genes detected per cell). Such challenges in data generation emphasize the need for informative features to assess cell heterogeneity at the chromatin level. Results: We present a benchmarking framework that was applied to 10 computational methods for scATAC-seq on 13 synthetic and real datasets from different assays, profiling cell types from diverse tissues and organisms. Methods for processing and featurizing scATAC-seq data were evaluated by their ability to discriminate cell types when combined with common unsupervised clustering approaches. We rank evaluated methods and discuss computational challenges associated with scATAC-seq analysis including inherently sparse data, determination of features, peak calling, the effects of sequencing coverage and noise, and clustering performance. Running times and memory requirements are also discussed. Conclusions: This reference summary of scATAC-seq methods offers recommendations for best practices with consideration for both the non-expert user and the methods developer. Despite variation across methods and datasets, SnapATAC, Cusanovich2018, and cisTopic outperform other methods in separating cell populations of different coverages and noise levels in both synthetic and real datasets. Notably, SnapATAC was the only method able to analyze a large dataset (> 80,000 cells).
6. 【Genomics】From telomere to telomere
Telomere-to-telomere assembly of a complete human X chromosome (CC0)
After nearly two decades of improvements, the current human reference genome (GRCh38) is the most accurate and complete vertebrate genome ever produced. However, no one chromosome has been finished end to end, and hundreds of unresolved gaps persist 1,2. The remaining gaps include ribosomal rDNA arrays, large near-identical segmental duplications, and satellite DNA arrays. These regions harbor largely unexplored variation of unknown consequence, and their absence from the current reference genome can lead to experimental artifacts and hide true variants when re-sequencing additional human genomes. Here we present a de novo human genome assembly that surpasses the continuity of GRCh38 2, along with the first gapless, telomere-to-telomere assembly of a human chromosome. This was enabled by high-coverage, ultra-long-read nanopore sequencing of the complete hydatidiform mole CHM13 genome, combined with complementary technologies for quality improvement and validation. Focusing our efforts on the human X chromosome 3, we reconstructed the ∼2.8 megabase centromeric satellite DNA array and closed all 29 remaining gaps in the current reference, including new sequence from the human pseudoautosomal regions and cancer-testis ampliconic gene families (CT-X and GAGE). This complete chromosome X, combined with the ultra-long nanopore data, also allowed us to map methylation patterns across complex tandem repeats and satellite arrays for the first time. These results demonstrate that finishing the human genome is now within reach and will enable ongoing efforts to complete the remaining human chromosomes.
Ewan Birney, joint director of the European Bioinformatics Institute, remarked on Twitter that five years ago almost every scientist would have considered this unthinkable!
7. 【Evolution】Fresh controversy over archaeal origins: two back-to-back papers reach starkly different conclusions
7.1 The Martin lab at the University of Düsseldorf, Germany:
Anomalous phylogenetic behavior of ribosomal proteins in metagenome assembled genomes (CC-BY-NC-ND 4.0)
Metagenomic studies have claimed the existence of novel lineages with unprecedented properties never before observed in prokaryotes. Such lineages include Asgard archaea1–3, which are purported to represent archaea with eukaryotic cell complexity, and the Candidate Phyla Radiation (CPR), a novel domain level taxon erected solely on the basis of metagenomic data4. However, it has escaped the attention of most biologists that these metagenomic sequences are not assembled into genomes by sequence overlap, as for cultured archaea and bacteria. Instead, short contigs are sorted into computer files by a process called binning in which they receive taxonomic assignment on the basis of sequence properties like GC content, dinucleotide frequencies, and stoichiometric co-occurrence across samples. Consequently, they are not genome sequences as we know them, reflecting the gene content of real organisms. Rather they are metagenome assembled genomes (MAGs). Debates that Asgard data are contaminated with individual eukaryotic sequences5–7 are overshadowed by the more pressing issue that no evidence exists to indicate that any sequences in binned Asgard MAGs actually stem from the same chromosome, as opposed to simply stemming from the same environment. Here we show that Asgard and CPR MAGs fail spectacularly to meet the most basic phylogenetic criterion8 fulfilled by genome sequences of all cultured prokaryotes investigated to date: the ribosomal proteins of Asgard and CPR MAGs do not share common evolutionary histories. Their phylogenetic behavior is anomalous to a degree never observed with genomes of real organisms. CPR and Asgard MAGs are binning artefacts, assembled from environments where up to 90% of the DNA is from dead cells9–12. Asgard and CPR MAGs are unnatural constructs, genome-like patchworks of genes that have been stitched together into computer files by binning.
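The abstract names GC content and dinucleotide frequencies among the sequence properties used for binning. As a rough illustration of what those two features are, the sketch below computes them for a few contigs. It is not the procedure of any particular binning tool: the contig names and sequences are invented, and the cross-sample co-occurrence signal mentioned in the abstract is left out entirely.

```python
from collections import Counter
from itertools import product


def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return (seq.count("G") + seq.count("C")) / acgt


def dinucleotide_frequencies(seq):
    """Relative frequency of each of the 16 overlapping ACGT dinucleotides."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    keys = ["".join(pair) for pair in product("ACGT", repeat=2)]
    total = sum(counts[k] for k in keys)
    if total == 0:
        raise ValueError("sequence contains no ACGT dinucleotides")
    return {k: counts[k] / total for k in keys}


# Hypothetical contigs, invented for illustration only.
contigs = {
    "contig_1": "ATGCGCGCGGCCGCTAGCGC",
    "contig_2": "ATATATTTAAATTATAATAT",
}

features = {
    name: (gc_content(seq), dinucleotide_frequencies(seq))
    for name, seq in contigs.items()
}

for name, (gc, dinuc) in features.items():
    top = max(dinuc, key=dinuc.get)
    print(f"{name}: GC={gc:.2f}, most frequent dinucleotide={top}")
```

Contigs with similar feature vectors end up in the same bin, which is why the abstract stresses that shared sequence properties alone do not show that two contigs come from the same chromosome.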
7.2 The team of Japanese researcher Ken Takai:
Isolation of an archaeon at the prokaryote-eukaryote interface
The origin of eukaryotes remains enigmatic. Current data suggests that eukaryotes may have risen from an archaeal lineage known as “Asgard archaea”. Despite the eukaryote-like genomic features found in these archaea, the evolutionary transition from archaea to eukaryotes remains unclear due to the lack of cultured representatives and corresponding physiological insight. Here we report the decade-long isolation of a Lokiarchaeota-related Asgard archaeon from deep marine sediment. The archaeon, “Candidatus Prometheoarchaeum syntrophicum strain MK-D1”, is an anaerobic, extremely slow-growing, small cocci (∼550 nm), that degrades amino acids through syntrophy. Although eukaryote-like intracellular complexities have been proposed for Asgard archaea, the isolate has no visible organella-like structure. Ca. P. syntrophicum instead displays morphological complexity – unique long, and often, branching protrusions. Based on cultivation and genomics, we propose an “Entangle-Engulf-Enslave (E3) model” for eukaryogenesis through archaea-alphaproteobacteria symbiosis mediated by the physical complexities and metabolic dependency of the hosting archaeon.
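8. 【Bioinformatics】Benedict Paten of UC Santa Cruz, together with Guojie Zhang of the University of Copenhagen and other teams, develops Cactus, a multiple-genome aligner for very large genome sets
Progressive alignment with Cactus: a multiple-genome aligner for the thousand-genome era (CC-BY-NC-ND 4.0)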
Cactus, a reference-free multiple genome alignment program, has been shown to be highly accurate, but the existing implementation scales poorly with increasing numbers of genomes, and struggles in regions of highly duplicated sequence. We describe progressive extensions to Cactus that enable reference-free alignment of tens to thousands of large vertebrate genomes while maintaining high alignment quality. We show that Cactus is capable of scaling to hundreds of genomes and beyond by describing results from an alignment of over 600 amniote genomes, which is to our knowledge the largest multiple vertebrate genome alignment yet created. Further, we show improvements in orthology resolution leading to downstream improvements in annotation.
9. 【Evolution】Does the long-lived eucalyptus have a lower somatic mutation rate?
A phylogenomic approach reveals a low somatic mutation rate in a long-lived plant (CC-BY 4.0)
Somatic mutations can have important effects on the life history, ecology, and evolution of plants, but the rate at which they accumulate is poorly understood, and has been very difficult to measure directly. Here, we demonstrate a novel method to measure somatic mutations in individual plants and use this approach to estimate the somatic mutation rate in a large, long-lived, phenotypically mosaic Eucalyptus melliodora tree. Despite being 100 times larger than Arabidopsis, this tree has a per-generation mutation rate only ten times greater, which suggests that this species may have evolved mechanisms to reduce the mutation rate per unit of growth. This adds to a growing body of evidence that illuminates the correlated evolutionary shifts in mutation rate and life history in plants.
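The reasoning behind "per unit of growth" can be spelled out with the two round figures quoted in the abstract. The sketch below assumes, as a simplification, that relative size is a usable proxy for the relative amount of growth per generation; the function name is hypothetical and the numbers are the abstract's approximate ratios, not measured values.

```python
def relative_rate_per_unit_growth(mutation_rate_ratio, size_ratio):
    """Ratio of per-unit-growth mutation rates (tree relative to Arabidopsis).

    Assumes size is proportional to the amount of growth per generation.
    """
    if size_ratio <= 0:
        raise ValueError("size_ratio must be positive")
    return mutation_rate_ratio / size_ratio


# Approximate ratios quoted in the abstract (tree relative to Arabidopsis).
size_ratio = 100.0           # the tree is about 100 times larger
mutation_rate_ratio = 10.0   # its per-generation rate is about 10 times greater

ratio = relative_rate_per_unit_growth(mutation_rate_ratio, size_ratio)
print(f"Per-unit-growth mutation rate relative to Arabidopsis: {ratio:.2f}")
```

Under that assumption the tree accumulates roughly a tenth as many mutations per unit of growth, which is the observation the authors take as a hint of mechanisms that lower the rate.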
10. Large collaborative project: microbiome biogeography from nearly 4,000 samples across 58 cities worldwide
Global Genetic Cartography of Urban Metagenomes and Anti-Microbial Resistance (CC-BY 4.0)
Although studies have shown that urban environments and mass-transit systems have geospatially distinct metagenomes, no study has ever systematically studied these dense, human/microbial ecosystems around the world. To address this gap in knowledge, we created a global metagenomic and antimicrobial resistance (AMR) atlas of urban mass transit systems from 58 cities, spanning 3,741 samples and 4,424 taxonomically-defined microorganisms collected for three years. The map provides annotated, geospatial data about microbial strains, functional genetics, antimicrobial resistance, and novel genetic elements, including 10,928 novel predicted viral species. Urban microbiomes often resemble human commensal microbiomes from the skin and airways but contain a consistent “core” of 61 species which are predominantly not human commensal species. These data also show that AMR density across cities varies by several orders of magnitude with many AMRs present on plasmids with cosmopolitan distributions. Conversely, samples may be accurately (91.4%) classified to their city-of-origin using a linear support vector machine over taxa. Together, these results constitute a high-resolution global metagenomic atlas, which enables the discovery of new genetic components of the built human environment, forensic application, and an essential first draft of the global AMR burden of the world’s cities.
Original work by the author; first published on the 生信人 (Shengxinren) WeChat public account.