Examples
1. Standardization: mean removal and variance scaling
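Standardization rescales each feature so that its mean is 0 and its variance is 1, matching the mean and variance of the standard normal (Gaussian) distribution. It only shifts and rescales the data; it does not change the shape of the distribution.
Standardization matters because a feature whose variance is much larger than the others can dominate the objective function and keep the estimator from learning properly from the other features.
It takes two steps: centering by removing the mean (the mean becomes 0), and scaling the variance (the variance becomes 1).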
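The two steps above can be sketched by hand with NumPy and compared against `StandardScaler`. The toy array `X` and the names `centered` and `X_manual` are made up for illustration; they are not part of the listings data used in this post.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data (made up for illustration): two features on very different scales.
X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [3.0, 500.0],
              [4.0, 700.0]])

# Step 1: centering, subtract each column's mean so the mean becomes 0.
centered = X - X.mean(axis=0)

# Step 2: scaling, divide by each column's population standard deviation
# (ddof=0, NumPy's default) so the variance becomes 1.
X_manual = centered / X.std(axis=0)

# StandardScaler should perform the same two steps column by column.
X_sklearn = StandardScaler().fit_transform(X)

print(X_manual.mean(axis=0))             # per-column means after scaling
print(X_manual.std(axis=0))              # per-column standard deviations after scaling
print(np.allclose(X_manual, X_sklearn))  # whether the manual result matches sklearn
```

After scaling, both columns are on the same footing even though the second one started out hundreds of times larger, which is what a distance-based model such as KNN needs.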
import pandas as pd
from sklearn.preprocessing import StandardScaler
features = ['accommodates','bedrooms','bathrooms','beds','price','minimum_nights','maximum_nights','number_of_reviews']
dc_listings = pd.read_csv('listings.csv')
dc_listings = dc_listings[features].copy()
# Strip "$" and thousands separators from price, then convert to float
dc_listings['price'] = dc_listings['price'].str.replace(r'[\$,]', '', regex=True).astype(float)
# Drop rows with missing values
dc_listings = dc_listings.dropna()
# Standardize every column (including price) to mean 0 and variance 1
dc_listings[features] = StandardScaler().fit_transform(dc_listings[features])
2. KNN model with sklearn
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error
X = dc_listings.loc[:,dc_listings.columns != 'price']
y = dc_listings.loc[:,dc_listings.columns == 'price']
# Split into training and test sets
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size =0.3, random_state =0)
knn = KNeighborsRegressor()
knn.fit(X_train,y_train)
knn_result = knn.predict(X_test)
# Mean squared error
mse = mean_squared_error(y_test,knn_result)
# Root mean squared error (in standardized units, because price was scaled above)
rmse = mse ** (1/2)
rmse
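The code in this section standardizes `price` together with the features and fits the scaler before the train/test split, so the RMSE it reports is in standardized units. One possible alternative, shown here only as a sketch, is to leave the target unscaled and put the scaler in a pipeline so it is fitted on the training rows only. The data below is synthetic (it is not the listings file), and the names `X_demo`, `y_demo`, `model`, and `rmse_demo` are hypothetical.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the listings data (made-up values, not real listings):
# column 0 loosely plays the role of "accommodates", column 1 of "maximum_nights".
rng = np.random.default_rng(0)
X_demo = rng.uniform(low=[1, 0], high=[8, 365], size=(200, 2))
y_demo = 40 * X_demo[:, 0] + 0.1 * X_demo[:, 1] + rng.normal(0, 10, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0
)

# The pipeline fits StandardScaler on the training features only,
# then applies the same transformation to the test features at predict time.
model = make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5))
model.fit(X_tr, y_tr)

pred = model.predict(X_te)

# The target was never scaled, so this RMSE is in the target's original units.
rmse_demo = np.sqrt(mean_squared_error(y_te, pred))
print(rmse_demo)
```

Keeping the target in its original units makes the error directly interpretable (for the listings data, that would be dollars), and fitting the scaler inside the pipeline keeps test-set statistics out of training.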
Other references:
K-nearest neighbors (KNN) algorithm