# Creating and connecting
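## Creating Loom files
To create a loom file, supply a matrix (a numpy ndarray or a scipy sparse matrix) and two dictionaries of row and column attributes, with attribute names as keys and numpy ndarrays as values. If the matrix is N×M, every row attribute must have N elements and every column attribute must have M elements.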
import numpy as np
import loompy
filename = "test.loom"
matrix = np.arange(10000).reshape(100,100)
row_attrs = { "SomeRowAttr": np.arange(100) }
col_attrs = { "SomeColAttr": np.arange(100) }
loompy.create(filename, matrix, row_attrs, col_attrs)
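The shape rule above can be checked before calling `loompy.create()`. The following is a minimal sketch; `check_attr_lengths` is a hypothetical helper written for this example and is not part of loompy.

```python
import numpy as np


def check_attr_lengths(matrix, row_attrs, col_attrs):
    """Raise ValueError if an attribute length disagrees with the matrix shape.

    Hypothetical helper for illustration; not part of the loompy API.
    """
    n_rows, n_cols = matrix.shape
    for name, values in row_attrs.items():
        if len(values) != n_rows:
            raise ValueError(
                f"row attribute {name!r} has {len(values)} elements, expected {n_rows}"
            )
    for name, values in col_attrs.items():
        if len(values) != n_cols:
            raise ValueError(
                f"column attribute {name!r} has {len(values)} elements, expected {n_cols}"
            )


matrix = np.arange(200).reshape(10, 20)            # N=10 rows, M=20 columns
row_attrs = {"SomeRowAttr": np.arange(10)}         # N elements
col_attrs = {"SomeColAttr": np.arange(20)}         # M elements
check_attr_lengths(matrix, row_attrs, col_attrs)   # passes silently
```

Running such a check first gives a readable error message that names the offending attribute.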
`loompy.create()` accepts dense numpy matrices (`numpy.ndarray`) as well as scipy sparse matrices (`scipy.sparse.coo_matrix` or `scipy.sparse.csr_matrix`):
import numpy as np
import loompy
import scipy.sparse as sparse
filename = "test.loom"
matrix = sparse.coo_matrix((100, 100))
row_attrs = { "SomeRowAttr": np.arange(100) }
col_attrs = { "SomeColAttr": np.arange(100) }
loompy.create(filename, matrix, row_attrs, col_attrs)
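- To combine several files, create a new, empty loom file and append the columns of each sample to it (`samples` is a list of input file names):
import logging
import loompy
with loompy.new("outfile.loom") as dsout:  # "outfile.loom" is an example name
    for sample in samples:
        with loompy.connect(sample) as dsin:
            logging.info(f"Appending {sample}.")
            dsout.add_columns(dsin.layers, col_attrs=dsin.col_attrs, row_attrs=dsin.row_attrs)
- Alternatively, `loompy.combine()` merges a list of files in one call; `key` names the row attribute used to check that the rows line up:
loompy.combine(files, output_filename, key="Accession")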
- `loompy.create_from_cellranger()` imports the output of 10X Genomics cellranger:
loompy.create_from_cellranger(folder, output_filename)
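## Connecting to Loom files
- Use a `with` statement to manage the connection, so that the file is closed automatically:
with loompy.connect("filename.loom") as ds:
    # do something with ds
- Without a `with` statement, close the connection yourself by calling `ds.close()` when you are done:
ds = loompy.connect("filename.loom")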
# Manipulate data
## Shape, indexing and slicing
>>> ds.shape
(100, 2345)
Data in the loom matrix can be retrieved by indexing and slicing. The following index types are supported:
- Indices: anything that can be converted to a Python integer
- Slices (e.g. `0:10` or `:`)
- Lists of the rows/columns you want (e.g. `[0, 34, 576]`)
- Mask arrays (i.e. numpy arrays of bool indicating the rows/columns you want)
ds[0:10, 0:10] # Return the 10x10 submatrix starting at row and column zero
ds[99, :] # Return the 100th row
ds[:, 99] # Return the 100th column
ds[[0,3,5], :] # Return rows with index 0, 3 and 5
ds[:, bool_array] # Return columns where bool_array elements are True
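The four index kinds listed above can be tried on a plain numpy array. The sketch below uses an in-memory array `m` as a stand-in for the loom main matrix; it only illustrates the indexing forms, and loompy may differ in details such as the exact shape of the returned array.

```python
import numpy as np

# Plain numpy array used as an in-memory stand-in for the loom main matrix.
m = np.arange(12).reshape(3, 4)

single = m[1, :]                  # integer index: the row with index 1
block = m[0:2, 0:2]               # slices: the 2x2 submatrix at the top left
picked = m[[0, 2], :]             # list of indices: rows 0 and 2
mask = np.array([True, False, False, True])
masked = m[:, mask]               # mask array: columns where mask is True

print(single.shape, block.shape, picked.shape, masked.shape)
```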
## Sparse data
- The main matrix or any layer can be loaded as a sparse matrix:
ds.layers["exons"].sparse() # Returns a scipy.sparse.coo_matrix
ds.layers["unspliced"].sparse(rows, cols) # Returns only the indicated rows and columns (ndarrays of integers or bools)
- Assign a sparse matrix to a layer:
ds.layers["exons"] = my_sparse_matrix
## Modifying layers
ds[:, :] = newdata # Assign a full matrix
ds[3, 500] = 31 # Set the element at (3, 500) to the value 31
ds[99, :] = rowdata # Assign new values to row with index 99
ds[:, 99] = coldata # Assign new values to column with index 99
## Global attributes
>>> ds.attrs.title
"The title of the dataset"
>>> ds.attrs.title = "New title"
>>> ds.attrs["title"]
"New title"
>>> del ds.attrs.title
>>> ds.attrs.keys()
["title", "description"]
>>> for key, value in ds.attrs.items():
...     print(f"{key} = {value}")
title = New title
description = Fancy dataset
## Row and column attributes
- Row and column attributes are accessed through `ds.ra` and `ds.ca`, respectively:
ds.ca.keys() # Return list of column attribute names
ds.ra.Gene = ... # Create or replace the Gene attribute
a = ds.ra.Gene # Assign the array of gene names (assuming the attribute exists)
del ds.ra.Gene # Delete the Gene row attribute
- Attributes can also be accessed by indexing:
a = ds.ra["Gene"] # Assign the array of gene names (assuming the attribute exists)
del ds.ra["Gene"] # Delete the Gene row attribute
- You can extract several attributes of the same type into a single numpy array:
a = ds.ra["Gene", "Attribute"] # Returns a 2D array of shape (n_genes, 2)
b = ds.ca["PCA1", "PCA2"] # Returns a 2D array of shape (n_cells, 2)
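The shape of the combined array can be pictured with plain numpy. In this sketch `pca1` and `pca2` are made-up stand-ins for two column attributes of equal length; stacking them along a new second axis gives one row per cell and one column per attribute.

```python
import numpy as np

n_cells = 5
pca1 = np.linspace(0.0, 1.0, n_cells)   # stand-in for ds.ca.PCA1
pca2 = np.linspace(1.0, 0.0, n_cells)   # stand-in for ds.ca.PCA2

# One column per attribute, one row per cell.
stacked = np.stack([pca1, pca2], axis=1)
print(stacked.shape)  # (5, 2)
```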
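- The same syntax can be used to request alternative attribute names; if only one of them exists, that one is returned: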
a = ds.ra["Gene", "GeneName"] # Return one or the other (if only one exists)
b = ds.ca["TSNE", "PCA", "UMAP"] # Return the one that exists (if only one exists)
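- Using attributes to fancy-index the main matrix gives a compact, readable syntax for selecting subarrays:
>>> ds[ds.ra.Gene == "Actb", :]
array([[ 2., 9., 9., ..., 0., 14., 0.]], dtype=float32)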
>>> ds[(ds.ra.Gene == "Actb") | (ds.ra.Gene == "Gapdh"), :]
array([[ 2., 9., 9., ..., 0., 14., 0.],
[ 0., 1., 4., ..., 0., 14., 3.]], dtype=float32)
>>> ds[:, ds.ca.CellID == "AAACATACATTCTC-1"]
array([[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.]], dtype=float32)
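Note that boolean arrays are combined with the bitwise operators: `|` for 'or', `&` for 'and' and `~` for 'not'. These operators bind more tightly than comparisons, so wrap each comparison in parentheses: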
(a == b) & (a > c) | ~(c <= b)
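As a small illustration of the operator rules above, the following sketch builds masks from two made-up arrays, `genes` and `counts`, which stand in for row attributes and are not taken from any real dataset.

```python
import numpy as np

genes = np.array(["Actb", "Gapdh", "Sox2", "Actb"])
counts = np.array([5, 0, 7, 2])

# Parentheses are required: & and | bind more tightly than == and >.
either = (genes == "Actb") | (genes == "Gapdh")
expressed_actb = (genes == "Actb") & (counts > 3)
not_actb = ~(genes == "Actb")

print(either, expressed_actb, not_actb)
```

Each mask has one boolean per element, so it can be used directly as a row or column index.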
## Modifying attributes
with loompy.connect("filename.loom") as ds:
    ds.ca.ClusterNames = values  # where values is a list or ndarray with one element per column
    # This does not change the attribute on disk:
    ds.ca.ClusterNames[10] = "banana"
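The comment in the snippet above says that an in-place edit of a returned attribute array is not written to disk. One way to picture this is that reading an attribute hands back an in-memory array that is detached from the file; that is an assumption about the mechanism, not a description of loompy internals. The stand-in class below (`CopyOnReadStore`, hypothetical and unrelated to loompy) mimics that behaviour and shows the read, modify, reassign pattern that does store the change.

```python
import numpy as np


class CopyOnReadStore:
    """Hypothetical stand-in for an on-disk attribute store (not loompy)."""

    def __init__(self):
        self._disk = {}

    def __getitem__(self, name):
        # Reading returns a copy, so edits to the result stay in memory only.
        return self._disk[name].copy()

    def __setitem__(self, name, values):
        # Assigning a whole array is what updates the stored data.
        self._disk[name] = np.array(values)


ca = CopyOnReadStore()
ca["ClusterNames"] = ["cluster_a"] * 12

ca["ClusterNames"][10] = "banana"      # modifies a temporary copy only
unchanged = ca["ClusterNames"][10]     # still "cluster_a"

names = ca["ClusterNames"]             # read
names[10] = "banana"                   # modify in memory
ca["ClusterNames"] = names             # reassign the whole attribute
changed = ca["ClusterNames"][10]       # now "banana"
```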
## Adding columns
- You can add columns to an existing loom file. Adding rows, or deleting any part of the matrix, is not supported.
ds.add_columns(submatrix, col_attrs)
- To add columns to an empty file, you must also supply row attributes:
ds.add_columns(submatrix, col_attrs, row_attrs={"Gene": genes})
- You can also add the contents of another .loom file; here `other_file` is appended to `ds`:
ds.add_loom(other_file, key="Gene")
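Conceptually, adding columns extends the matrix and every column attribute along the column axis. The numpy sketch below shows that bookkeeping on in-memory stand-ins. It is not how loompy is implemented, and the checks (equal row counts, identical attribute names) are assumptions made for this example.

```python
import numpy as np

# In-memory stand-ins for an existing file with 3 rows and 4 columns.
matrix = np.zeros((3, 4))
col_attrs = {"CellID": np.array(["c0", "c1", "c2", "c3"])}

# New data: same number of rows, 2 extra columns, same attribute names.
submatrix = np.ones((3, 2))
new_col_attrs = {"CellID": np.array(["c4", "c5"])}

# Assumed preconditions for this sketch.
assert submatrix.shape[0] == matrix.shape[0], "row counts must match"
assert set(new_col_attrs) == set(col_attrs), "attribute names must match"

# Extend the matrix and every column attribute along the column axis.
matrix = np.concatenate([matrix, submatrix], axis=1)
col_attrs = {k: np.concatenate([col_attrs[k], new_col_attrs[k]]) for k in col_attrs}
print(matrix.shape, len(col_attrs["CellID"]))
```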
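Loom supports multiple layers. Each loom file has exactly one main matrix, but it can have one or more additional layers with the same number of rows and columns. Layers are accessed through the `layers` property of the `LoomConnection` object.
- Layers support the same Python API as attributes: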
ds.layers.keys() # Return list of layers
ds.layers["unspliced"] # Return the layer named "unspliced"
ds.layers["spliced"] = ... # Create or replace the "spliced" layer
a = ds.layers["spliced"][:, 10] # Assign the column with index 10 of layer "spliced" to the variable a
del ds.layers["spliced"] # Delete the "spliced" layer
- For convenience, layers are also available directly on the connection object:
ds["spliced"] = ... # Create or replace the "spliced" layer
a = ds["spliced"][:, 10] # Assign the column with index 10 of layer "spliced" to the variable a
del ds["spliced"] # Delete the "spliced" layer
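- Sometimes you need to create an empty layer (all zeros) to fill in later. An empty layer is created by assigning a type to the layer name. For example: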
ds["empty_floats"] = "float32"
ds["empty_ints"] = "int64"
- Row and column graphs are accessed through `ds.row_graphs` and `ds.col_graphs`. For example:
ds.row_graphs.keys() # Return list of row graphs
ds.col_graphs.KNN = ... # Create or replace the column-oriented graph KNN
a = ds.col_graphs.KNN # Assign the KNN column graph to variable a
del ds.col_graphs.KNN # Delete the KNN graph
Loompy views read a selected part of a loom file into memory by slicing, so that it can be inspected there:
ds.view[:, 10:20]
- The `map` method applies functions along the rows or columns of the data:
ds.map([np.mean, np.std], axis=1)
- `map` takes a list of functions and applies each of them across the whole dataset. Even a single function must be passed as a list:
(means,) = ds.map([np.mean], axis=1)
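The list-in, list-out calling pattern can be mimicked in memory with numpy. `map_functions` below is a hypothetical helper, not the loompy method, and its axis convention (axis=0 gives one value per row, axis=1 gives one value per column) is an assumption made for this sketch.

```python
import numpy as np


def map_functions(matrix, f_list, axis):
    """Apply every function in f_list along one axis of an in-memory matrix.

    Hypothetical helper. Assumed convention: axis=0 gives one value per row,
    axis=1 gives one value per column.
    """
    reduce_over = 1 if axis == 0 else 0
    return [np.apply_along_axis(f, reduce_over, matrix) for f in f_list]


m = np.arange(12, dtype=float).reshape(3, 4)
means, stds = map_functions(m, [np.mean, np.std], axis=1)
(only_means,) = map_functions(m, [np.mean], axis=1)  # single function, still a list
print(means)
```

Unpacking with `(only_means,) = ...` works because the result is always a list with one array per function.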
- Permute the order of the rows or columns:
ordering = np.random.permutation(np.arange(ds.shape[1]))
ds.permute(ordering, axis=1)
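What a column permutation means can be seen on a small in-memory example. The sketch assumes that `ordering[j]` names the original column that ends up in position `j`. Whether `ds.permute` uses exactly this convention is not confirmed here, and `cell_ids` is a made-up column attribute.

```python
import numpy as np

rng = np.random.default_rng(0)
m = np.arange(12).reshape(3, 4)
cell_ids = np.array(["c0", "c1", "c2", "c3"])

ordering = rng.permutation(m.shape[1])   # a random order of the column indices
m_perm = m[:, ordering]                  # reorder the columns
ids_perm = cell_ids[ordering]            # reorder column attributes identically
```

The matrix and its column attributes have to be reordered with the same `ordering`, otherwise the attributes no longer describe the right columns.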
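- For very large loom files, it is useful to scan the file in batches (along rows or columns) to avoid loading the whole file into memory. This can be done with the `scan()` method:
for (ix, selection, view) in ds.scan(axis=1):
    # do something with each view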
import numpy as np
import loompy
filename = "test.loom"
matrix = np.arange(10000).reshape(100,100)
row_attrs = { "SomeRowAttr": np.arange(100) }
col_attrs = { "SomeColAttr": np.arange(100) }
loompy.create(filename, matrix, row_attrs, col_attrs)
ds = loompy.connect("test.loom")
for (ix, selection, view) in ds.scan(axis=1):
    print(ix, selection)
0 [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
96 97 98 99]
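The idea behind a batched scan can be sketched in memory with a generator. `scan_columns` is a hypothetical analogue written for this example, not the loompy API; the `batch_size` parameter and the meaning given to `ix` (the first column index of each batch) are assumptions.

```python
import numpy as np


def scan_columns(matrix, batch_size):
    """Yield (ix, selection, view) for consecutive column batches.

    Hypothetical in-memory analogue of a batched scan; not the loompy API.
    ix is the first column index of the batch, selection holds the column
    indices in the batch, and view is the corresponding submatrix.
    """
    n_cols = matrix.shape[1]
    for ix in range(0, n_cols, batch_size):
        selection = np.arange(ix, min(ix + batch_size, n_cols))
        yield ix, selection, matrix[:, selection]


m = np.arange(10000).reshape(100, 100)
total = 0
for ix, selection, view in scan_columns(m, batch_size=32):
    total += view.sum()   # process one batch at a time
print(total == m.sum())
```

Only one batch is held at a time, which is what keeps memory use bounded when the full matrix does not fit in memory.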
- You can also scan over a selected subset of columns or rows. For example:
cells = np.array([0, 1, 2, 3])  # Indices of the columns you want to see
for (ix, selection, view) in ds.scan(items=cells, axis=1):
    # do something with each view
For the R implementation of loom, see the loomR introduction and user guide.