pyradiomics
官方文档里有几个示例文件,里面涉及了包括yaml文件设置、feature extraction、可视化等一系列影像组学常规操作,是非常好的学习资料。源文件链接:https://github.com/AIM-Harvard/pyradiomics
今天学习在名称为''FeatureVisualization"这份文档。
pyRadiomics feature visualization using multi-dimensional scaling and heatmaps
- 下载数据
Note:
前言介绍说示例数据来自Brigham and Women's Surgical Planning Lab(SPL),但是运行代码会报错。手动核实代码里的网址,发现网页已经不存在了。所以这一节(及另外一个以这个data为示例的文档)里就只能学习代码,不能运行它们了。
# Download the zip file if it does not exist
import os, zipfile
import pandas as pd
import seaborn as sns
from six.moves import urllib
url = "http://www.spl.harvard.edu/publications/bitstream/download/5270"
filename = 'example_data/Tumorbase.zip'
if not os.path.isfile(filename):
if not os.path.isdir('example_data'):
os.mkdir('example_data')
print ("retrieving")
urllib.request.urlretrieve(url, filename)
else:
print ("file already downloaded")
extracted_path = 'example_data/tumorbase'
if not os.path.isdir(extracted_path):
print ("unzipping")
z = zipfile.ZipFile(filename)
z.extractall('example_data')
print ("done unzipping")
- 导入必要的函数
# Import some libraries
import SimpleITK as sitk
from radiomics import featureextractor
- 提取特征(Extract features)
# Load up the segmentations, 1 to 10 and extract the features
params = os.path.join(os.getcwd(), '..', 'examples', 'exampleSettings', 'Params.yaml')
extractor = featureextractor.RadiomicsFeatureExtractor(params)
# hang on to all our features
features = {}
for case_id in range(1,11):
path = 'example_data/tumorbase/AutomatedSegmentation/case{}/'.format(case_id)
image = sitk.ReadImage(path + "grayscale.nrrd")
mask = sitk.ReadImage(path + "segmented.nrrd")
# Tumor is in label value 6
features[case_id] = extractor.execute ( image, mask, label=6 )
# A list of the valid features, sorted
feature_names = list(sorted(filter ( lambda k: k.startswith("original_"), features[1] )))
# Make a numpy array of all the values
import numpy as np
samples = np.zeros((10,len(feature_names)))
for case_id in range(1,11):
a = np.array([])
for feature_name in feature_names:
a = np.append(a, features[case_id][feature_name])
samples[case_id-1,:] = a
# May have NaNs
samples = np.nan_to_num(samples)
- Multidimensional scaling
Multidimensional scaling or MDS is as way to visualize very high dimensional data in a lower dimensional space. In our case, the feature space is len(feature_names) (or 93) dimensional space. To help us understand the data, we project into 2d space. MDS preserves the relative distance between sample points during the projection, so two samples close together in 2d space would also be close together in the original 93-dimensional space (and vice versa).
We us non-metric algorithm, because our data are highly non-uniform in the scale of each feature.
Note:
MDS算法思想很简单,一句话就是:保持样本在原空间和低维空间的距离不变。因为距离是样本之间一个很好的分离属性,对于大多数聚类算法来说,距离是将样本分类的重要属性,因此当我们降维后,保持距离不变,那么就相当于保持了样本的相对空间关系不变。(CSDN@changyuanchn)
from sklearn import manifold
from sklearn.metrics import euclidean_distances
from sklearn.decomposition import PCA
similarities = euclidean_distances(samples)
seed = np.random.RandomState(seed=3)
mds = manifold.MDS(n_components=2, max_iter=5000, eps=1e-12, random_state=seed,
n_init=10,
dissimilarity="precomputed", n_jobs=1, metric=False)
pos = mds.fit_transform(similarities)
- 作图(Plot)
# Plot
from matplotlib import pyplot as plt
from matplotlib.collections import LineCollection
import matplotlib.cm as cm
fig = plt.figure(1)
ax = plt.axes([0., 0., 1., 1.])
s = 100
# Type of tumor
meningioma = [0, 1, 2]
glioma = [3,5,9]
astrocytoma = [4, 6, 7, 8]
plt.scatter(pos[meningioma, 0], pos[meningioma, 1], color='navy', alpha=1.0, s=s, lw=1, label='meningioma')
plt.scatter(pos[glioma, 0], pos[glioma, 1], color='turquoise', alpha=1.0, s=s, lw=1, label='glioma')
plt.scatter(pos[astrocytoma, 0], pos[astrocytoma, 1], color='darkorange', alpha=0.5, s=s, lw=1, label='astrocytoma')
plt.legend(scatterpoints=1, loc=5, shadow=False)
similarities = similarities.max() / similarities * 100
similarities[np.isinf(similarities)] = 0
plt.show()
image.png
- 绘制热图(Plot features as a heatmap)
import pandas as pd
import seaborn as sns
# Construct a pandas dataframe from the samples
d = pd.DataFrame(data=samples, columns=feature_names)
corr = d.corr()
# Set up the matplotlib figure, make it big!
f, ax = plt.subplots(figsize=(15, 10))
# Draw the heatmap using seaborn
sns.heatmap(corr, vmax=.8, square=True)
plt.show()
image.png
- 绘制聚类热图(Cluster the heatmap)
# Choose a subset of features for clustering
dd = d.iloc[:,1:50]
pp = sns.clustermap(dd.corr(), linewidths=.5, figsize=(13,13))
_ = plt.setp(pp.ax_heatmap.get_yticklabels(), rotation=0)
plt.show()
image.png
网友评论