Environment setup
yum -y install epel-release
yum -y install python-pip
pip install numpy pandas xgboost scikit-learn
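After the install finishes, it can help to confirm that the four packages actually import before timing anything. The snippet below is a minimal sketch rather than part of the original setup; `PACKAGES` and `check_packages` are names invented here for illustration, and Python 3 is assumed.

```python
import importlib
import platform

# Import names of the packages installed above (scikit-learn imports as "sklearn").
PACKAGES = ["numpy", "pandas", "xgboost", "sklearn"]


def check_packages(names):
    """Return {name: version string, or None if the import fails}."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except Exception:
            # A missing package raises ImportError; a broken native library
            # can raise other exception types, so catch broadly here.
            versions[name] = None
        else:
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    print("Python", platform.python_version(), "on", platform.machine())
    for name, version in check_packages(PACKAGES).items():
        print(name, version if version else "NOT INSTALLED")
```

`platform.machine()` also shows at a glance whether the host is x86_64 or an ARM (aarch64) machine.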
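Installation speed
Alibaba Cloud, Tencent Cloud > Baidu Cloud
On Alibaba Cloud and Tencent Cloud, everything installed in under a minute.
Baidu Cloud took 10 minutes, downloading at roughly 50 KB/s.
Huawei Cloud runs ARM servers, so nearly every package has to be compiled from source. It was by far the slowest: the install took more than an hour.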
Test code
This code trains an XGBoost classifier on a dataset and predicts class labels.
Dataset: https://www.kaggle.com/c/forest-cover-type-prediction/data
import random
import time

import pandas as pd
from sklearn import metrics
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

try:
    import joblib
except ImportError:  # older scikit-learn releases bundled joblib
    from sklearn.externals import joblib


def xgboost_train(train, test):
    features = train.drop('Cover_Type', axis=1)
    # Cover_Type is labelled 1..7; recent XGBoost releases expect 0..6
    label = train['Cover_Type'] - 1
    xgbc = XGBClassifier(
        booster='gbtree',
        verbosity=0,  # replaces the deprecated silent=True
        n_jobs=12,  # adjust to the number of CPU cores
        scale_pos_weight=8,
        n_estimators=120,
        max_depth=7,
        min_child_weight=1,
        subsample=1,
        colsample_bytree=1,
        gamma=0.2,
        learning_rate=0.2,
        max_delta_step=0,
        base_score=1,
        colsample_bylevel=1,
        # num_class is inferred from the labels by the scikit-learn wrapper
        objective='multi:softmax',
        reg_alpha=2,
        reg_lambda=2
    )
    model = xgbc.fit(features, label)
    X_test = test.drop('Cover_Type', axis=1)
    y_test = test['Cover_Type'] - 1
    score = xgbc.score(X_test, y_test)  # mean accuracy
    predictions = model.predict(X_test)
    macro = metrics.precision_score(y_test, predictions, average='macro',
                                    labels=list(range(7)))
    return score, macro, model, xgbc


def full_predict():
    batch_no = random.randint(0, 9999)
    dfa = pd.read_csv('train1.csv')
    dfa = dfa.dropna()
    dfa = dfa.drop(['Ground_position'], axis=1)
    train, test = train_test_split(dfa, test_size=0.2)
    # time.clock() was removed in Python 3.8; time.time() measures wall-clock time
    start = time.time()
    scores = xgboost_train(train, test)
    end = time.time()
    elapsed = end - start
    ret = ("Accuracy: " + str(scores[0]) + "; Macro precision: " + str(scores[1])
           + ", elapsed (s): " + str(elapsed))
    print(ret)
    with open('xgboost_train_' + str(batch_no) + '.txt', 'w') as f:
        f.write(ret)
        f.write('\n')
    model = scores[2]
    xgbc = scores[3]
    t_test = pd.read_csv('test1.csv')
    t_test = t_test.drop(['Ground_position'], axis=1)
    X_test2 = t_test[[col for col in t_test.columns if col not in ['Cover_Type']]]
    predictions = model.predict(X_test2) + 1  # back to the original 1..7 labels
    submission = pd.DataFrame({'Cover_Type': predictions})
    submission.to_csv("full_predict_" + str(batch_no) + ".csv", index=False)
    joblib.dump(xgbc, "./train_model_" + str(batch_no) + ".m")


full_predict()
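The thread count in the test code is hard-coded and has to be edited for each machine, and the timing is taken by hand around the training call. The sketch below shows one possible way to derive the thread count from the host and to time a block with a wall clock. It assumes Python 3; `pick_thread_count` and `wall_timer` are hypothetical helper names, not part of the original script.

```python
import os
import time
from contextlib import contextmanager


def pick_thread_count(reserve=0):
    """Logical CPUs visible to Python minus `reserve`, never less than 1."""
    total = os.cpu_count() or 1  # cpu_count() may return None
    return max(1, total - reserve)


@contextmanager
def wall_timer(result):
    """Store the elapsed wall-clock seconds of the block in result['seconds']."""
    start = time.perf_counter()
    try:
        yield result
    finally:
        result["seconds"] = time.perf_counter() - start


if __name__ == "__main__":
    timing = {}
    with wall_timer(timing):
        # Stand-in workload; in the benchmark this would be
        # xgboost_train(train, test).
        sum(i * i for i in range(100000))
    print("threads:", pick_thread_count(), "elapsed: %.3fs" % timing["seconds"])
```

Passing `reserve=2` would leave two logical CPUs free, similar to using 14 of 16 cores on one of the machines below.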
Baidu Cloud
- Instance type: General II (普通II型)
- CPU: E5-2680 v4, 16 cores
- Memory: 32 GB
- OS: CentOS 7.3
- Cores used in test: 14
Test result: 28 s
Tencent Cloud
- Instance type: S2.3XLARGE24
- CPU: Intel Xeon E5-2680 v4 (2.4 GHz), 12 cores
- Memory: 24 GB
- OS: CentOS 7.3
- Cores used in test: 12
Test result: 20 s
Alibaba Cloud
- Instance type: ecs.ic5.3xlarge
- CPU: Platinum 8163, 12 cores
- Memory: 12 GB
- OS: CentOS 7
- Cores used in test: 12
Test result: 29 s
DELL R630 physical server
- CPU: Intel Xeon E5-2620 v3 (2.4 GHz) x2, 12 cores in total
- Memory: 64 GB
- OS: CentOS 7 x64
- Cores used in test: 24
Test result: 21 s
- Cores used in test: 12
Test result: 24 s
Personal desktop
- Instance type: n/a
- CPU: i5-7400, 4 cores
- Memory: 16 GB
- OS: Windows 10
- Cores used in test: 4
Test result: 63 s
Second-hand server (imported surplus hardware)
- Instance type: unknown
- CPU: L5148, 4 cores
- Memory: 12 GB
- OS: CentOS 7
- Cores used in test: 4
Test result: 110 s
Second-hand server 2
- Instance type: unknown
- CPU: Intel(R) Xeon(R) CPU E7-4807 @ 1.87GHz, 12 cores
- Memory: 12 GB
- OS: CentOS 7
- Cores used in test: 12
Test result: 51 s
Huawei Cloud Kunpeng 920 ARM server
- Instance type: kc1.3xlarge.2
- CPU: Kunpeng general computing-plus, 12 vCPUs
- Memory: 24 GB
- OS: CentOS 7
- Cores used in test: 12
# Switch to the ARM (AltArch) repository
curl -o /etc/yum.repos.d/CentOS-Base.repo https://mirrors.huaweicloud.com/repository/conf/CentOS-AltArch-7.repo
yum clean all
yum makecache
yum install gcc -y
yum install python3 -y
yum install python3-devel -y
yum install lapack -y
yum install blas-devel lapack-devel -y
# The packages below are compiled from source, which is slow... there goes my money ):
pip3 install Cython -i https://pypi.tuna.tsinghua.edu.cn/simple
CFLAGS=-std=c99 pip3 install numpy==1.17.0 -i https://pypi.tuna.tsinghua.edu.cn/simple
pip3 install pandas -i https://pypi.tuna.tsinghua.edu.cn/simple
pip3 install scikit-learn -i https://pypi.tuna.tsinghua.edu.cn/simple
pip3 install xgboost -i https://pypi.tuna.tsinghua.edu.cn/simple
Test result: 27 s
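To compare the machines side by side, the timings reported in the sections above can be collected and ranked. The sketch below does only that: the numbers are copied from this post, the `RESULTS` list and the `ranked` helper are names invented here, and the runs used different core counts, so the ranking is a rough guide rather than a controlled comparison.

```python
# (machine, cores used in test, training time in seconds), as reported above
RESULTS = [
    ("Baidu Cloud", 14, 28),
    ("Tencent Cloud", 12, 20),
    ("Alibaba Cloud", 12, 29),
    ("DELL R630", 24, 21),
    ("DELL R630", 12, 24),
    ("Personal desktop (i5-7400)", 4, 63),
    ("Second-hand server (L5148)", 4, 110),
    ("Second-hand server 2 (E7-4807)", 12, 51),
    ("Huawei Cloud Kunpeng 920", 12, 27),
]


def ranked(results):
    """Sort runs by time and append each run's slowdown vs. the fastest run."""
    fastest = min(seconds for _, _, seconds in results)
    ordered = sorted(results, key=lambda row: row[2])
    return [(name, cores, seconds, seconds / fastest)
            for name, cores, seconds in ordered]


if __name__ == "__main__":
    for name, cores, seconds, ratio in ranked(RESULTS):
        print("%-32s %2d cores %4d s  x%.2f" % (name, cores, seconds, ratio))
```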