
Loom is an efficient file format for very large omics datasets, consisting of a main matrix, optional additional layers, a variable number of row and column annotations, and sparse graph objects. We use loom files to store single-cell gene expression data: the main matrix contains the actual expression values (one column per cell, one row per gene); row and column annotations contain metadata for genes and cells, such as Name, Chromosome, Position (for genes), and Strain, Sex, Age (for cells). Graph objects are used to store nearest-neighbor graphs used for graph-based clustering.
The Loom logo illustrates how all the parts fit together:

Loom is an efficient file format for large omics datasets. Loom files contain a main matrix, optional additional layers, a variable number of row and column annotations, and sparse graph objects. Under the hood, Loom files are HDF5 and can be opened from many programming languages, including Python, R, C, C++, Java, MATLAB, Mathematica, and Julia.
Key features:
Single file that can be moved around
Metadata travels with the main data
Data, clustering, layout, annotation stored together
Efficient random access
Automatic, on-the-fly compression
Out-of-memory data processing
Open source, BSD license
网友评论