from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
X, y = make_classification(n_samples=1000, n_features=4)
print(X)  # numpy ndarray, shape (1000, 4): one row per sample, one column per feature
print(y)  # numpy ndarray, shape (1000,): the binary class labels
lr = LogisticRegression()
# hold out the last 200 samples as a test set
X_train = X[:-200]
X_test = X[-200:]
y_train = y[:-200]
y_test = y[-200:]
lr.fit(X_train, y_train)
y_train_predictions = lr.predict(X_train)
print(type(y_train_predictions))  # predictions come back as a numpy ndarray
y_test_predictions = lr.predict(X_test)
print((y_train_predictions == y_train).sum().astype(float) / y_train.shape[0])  # training accuracy
print((y_test_predictions == y_test).sum().astype(float) / y_test.shape[0])  # test accuracy
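The two printed ratios are just the training and test accuracy. As a quick cross-check, the estimator's built-in score method, or accuracy_score from sklearn.metrics, gives the same numbers (assuming the lr and split defined above):

from sklearn.metrics import accuracy_score

print(lr.score(X_train, y_train))                  # mean training accuracy
print(accuracy_score(y_test, y_test_predictions))  # mean test accuracy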
LogisticRegression() accepts quite a few parameters, among them:
(1) penalty: the regularization term. L2 regularization guards against overfitting; it adds a weighted sum of the squared weights to the loss.
(2) C: the inverse of the regularization strength, i.e. the coefficient on the data-fit term of the objective; the larger C is, the weaker the regularization.
(3) tol: tolerance for the solver's stopping criterion.
(4) solver: the optimization algorithm used to fit the model. The options are {'newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga'}, default: 'liblinear' (a library for large-scale linear classification).
According to the API documentation, the trade-offs between the solvers are:
For small datasets, ‘liblinear’ is a good choice, whereas ‘sag’ and ‘saga’ are faster for large ones.
For multiclass problems, only ‘newton-cg’, ‘sag’, ‘saga’ and ‘lbfgs’ handle multinomial loss; ‘liblinear’ is limited to one-versus-rest schemes.
‘newton-cg’, ‘lbfgs’ and ‘sag’ only handle L2 penalty, whereas ‘liblinear’ and ‘saga’ handle L1 penalty.
Note that ‘sag’ and ‘saga’ fast convergence is only guaranteed on features with approximately the same scale. You can preprocess the data with a scaler from sklearn.preprocessing.
(5) dual: whether to solve the dual formulation; dual=True solves the dual problem, dual=False the primal. (In scikit-learn the dual formulation is only implemented for the L2 penalty with the liblinear solver.) Two usage sketches follow below.
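For concreteness, a minimal sketch of constructing the estimator with these parameters spelled out. The values shown are simply the library defaults, used for illustration, and X_train/y_train are the split from the code above:

from sklearn.linear_model import LogisticRegression

lr = LogisticRegression(
    penalty='l2',        # L2 regularization: weighted sum of squared weights added to the loss
    C=1.0,               # inverse regularization strength; larger C = weaker regularization
    tol=1e-4,            # tolerance for the solver's stopping criterion
    solver='liblinear',  # optimization algorithm
    dual=False           # solve the primal problem; the dual needs liblinear + L2
)
lr.fit(X_train, y_train)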
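And per the note above that 'sag'/'saga' only converge quickly on features of roughly the same scale, a sketch of the preprocessing step the documentation suggests, using StandardScaler (the scaler is fit on the training split only, then reused on the test split):

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std on the training set
X_test_scaled = scaler.transform(X_test)        # apply the same scaling to the test set
lr_saga = LogisticRegression(solver='saga', penalty='l1', max_iter=1000)  # saga also handles L1
lr_saga.fit(X_train_scaled, y_train)
print(lr_saga.score(X_test_scaled, y_test))     # test accuracy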
Verbal description of the algorithm
[Figure: 逻辑回归算法语言描述.png, a verbal description of the logistic regression algorithm]
If anything in the algorithm description above is in error, please leave a comment~