The previous section, "Normalization and Standardization in Deep Learning (1)", introduced min-max normalization and mean normalization. This section covers Standardization (Z-score Normalization).
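Standardization is one of the most commonly used normalization methods in machine learning. It subtracts the mean from each feature and divides by the standard deviation. The shape of the data distribution does not change, but its mean and standard deviation are aligned with those of the standard normal distribution, that is, mean (expected value) μ=0 and standard deviation σ=1 (so the variance is also 1). Standardization alone does not make the data normally distributed: only if the original data already follows a normal distribution will the standardized data follow the standard normal distribution.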
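Written as a formula, the description above amounts to the following. The symbols x (a raw feature value) and z (its standardized value) are notation introduced here for illustration; μ and σ are the mean and standard deviation of that feature, as in the text.

```latex
z = \frac{x - \mu}{\sigma},
\qquad
\mathbb{E}[z] = \frac{\mathbb{E}[x] - \mu}{\sigma} = 0,
\qquad
\operatorname{Var}(z) = \frac{\operatorname{Var}(x)}{\sigma^{2}} = 1
```

Because the transform is only a shift and a rescale, it fixes the mean and spread but leaves the shape of the distribution as it was.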
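import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns


def plot(data, title):
    # Overlay the densities of column 0 (height, blue) and column 1 (weight, red)
    sns.set_style('dark')
    f, ax = plt.subplots()
    ax.set(ylabel='probability density')
    ax.set(xlabel='weight(red) : height(blue)')
    ax.set(title=title)
    # histplot(kde=True) stands in for distplot, which is deprecated since seaborn 0.11
    sns.histplot(data[:, 0], kde=True, stat='density', color='blue', ax=ax)
    sns.histplot(data[:, 1], kde=True, stat='density', color='red', ax=ax)
    plt.show()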
# Generate height and weight data with different distributions (uniform vs. normal)
np.random.seed(0)
height = np.random.uniform(low=150, high=190, size=1000).reshape(-1, 1)
weight = np.random.normal(loc=70, scale=10, size=1000).reshape(-1, 1)
# Original data
original_data = np.concatenate((height, weight), axis=1)
plot(original_data, 'Original')
# min-max normalization
min_max_data = (original_data - np.min(original_data, axis=0)) / (
    np.max(original_data, axis=0) - np.min(original_data, axis=0))
plot(min_max_data, 'min-max normalization')
# Mean normalization
mean_normal_data = (original_data - np.mean(original_data, axis=0)) / (
    np.max(original_data, axis=0) - np.min(original_data, axis=0))
plot(mean_normal_data, 'Mean normalization')
# Standardization (Z-score Normalization)
z_score_data = (original_data - np.mean(original_data, axis=0)) / np.std(original_data, axis=0)
plot(z_score_data, 'Standardization (Z-score Normalization)')
Result of Standardization (Z-score Normalization)
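The claim that standardization fixes the mean and standard deviation but not the shape can also be checked numerically. The following is a minimal sketch, not part of the original script: the helper name `standardize` and the use of `np.random.default_rng` are assumptions made for illustration, and the data mirror the uniform height and normal weight columns used above.

```python
import numpy as np


def standardize(data):
    """Z-score each column: subtract the column mean, divide by the column std."""
    return (data - data.mean(axis=0)) / data.std(axis=0)


# Hypothetical stand-in for the height/weight data in this section
rng = np.random.default_rng(0)
height = rng.uniform(low=150, high=190, size=1000)  # uniformly distributed
weight = rng.normal(loc=70, scale=10, size=1000)    # normally distributed
data = np.column_stack((height, weight))

z = standardize(data)

# Both columns end up with mean ~0 and standard deviation ~1
print("mean:", z.mean(axis=0))
print("std: ", z.std(axis=0))

# The shapes still differ: the uniform column stays bounded,
# while the normal column keeps its longer tails
print("max |z|:", np.abs(z).max(axis=0))
```

For a uniform distribution the standardized values stay within roughly ±√3 ≈ 1.73 in theory, whereas a normal sample of this size will typically contain values well beyond that. This is why the height column remains flat-topped after standardization instead of turning into a bell curve.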