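Common Evaluation Metrics for Regression Problems
Common evaluation metrics for regression problems include MAE, MAPE, MSE, RMSE, and the R2 score.
Most of these metrics are already implemented in the sklearn package and can be called directly.
To install sklearn (its full name is scikit-learn):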
pip install -U scikit-learn
# the latest version at the time of writing is 0.22.2.post1
metric | method |
---|---|
MAE | sklearn.metrics.mean_absolute_error |
MAPE | sklearn.metrics.mean_absolute_percentage_error |
MSE | sklearn.metrics.mean_squared_error |
RMSE | sklearn.metrics.mean_squared_error (with squared=False) |
R2 Score | sklearn.metrics.r2_score |
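Notes:
- MAPE cannot be called directly in 0.22.2; at the time of writing it was expected to be released in 0.23 (it ultimately shipped in 0.24).
- See GitHub #15007.
- RMSE can be computed with mean_squared_error by setting squared=False: rmse_score = mean_squared_error(y_test, y_pred, squared=False)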
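For reference, the metrics listed above are usually defined as follows. This is a sketch in notation that is assumed here rather than taken from the original post: $n$ is the number of samples, $y_i$ the true value, $\hat{y}_i$ the prediction, $\bar{y}$ the mean of the true values, and $\epsilon$ the machine epsilon. The MAPE line follows the guarded form used in the patch further down this post, which returns a ratio rather than a percentage.

```latex
\begin{aligned}
\mathrm{MAE}  &= \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert \\
\mathrm{MAPE} &= \frac{1}{n}\sum_{i=1}^{n} \frac{\lvert y_i - \hat{y}_i \rvert}{\max(\lvert y_i \rvert, \epsilon)} \\
\mathrm{MSE}  &= \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \\
\mathrm{RMSE} &= \sqrt{\mathrm{MSE}} \\
R^2           &= 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
\end{aligned}
```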
Example
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import mean_absolute_percentage_error
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score
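x_true = xxx  # placeholder: feature matrix used for evaluation
y_true = yyy  # placeholder: ground-truth target values
y_pred = model.predict(x_true)  # model: an already fitted estimator
# Evaluation metrics: MAE, MAPE, MSE, RMSE, R2 score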
mae_score = mean_absolute_error(y_true, y_pred)
mape_score = mean_absolute_percentage_error(y_true, y_pred)
mse_score = mean_squared_error(y_true, y_pred)
rmse_score = mean_squared_error(y_true, y_pred, squared=False)
r_2_score = r2_score(y_true, y_pred)
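Because the MAPE import in the example fails on 0.22.2, one option that avoids touching the installed package is to fall back to a local function when the import is unavailable. The following is only a minimal sketch under stated assumptions: the fallback is a hypothetical pure-Python stand-in written for this post, it handles 1-D inputs only, and it ignores sample_weight and multioutput.

```python
import sys

try:
    # Works on scikit-learn releases that ship MAPE.
    from sklearn.metrics import mean_absolute_percentage_error
except ImportError:
    # Hypothetical stand-in (not part of sklearn): 1-D inputs, no weights.
    def mean_absolute_percentage_error(y_true, y_pred):
        eps = sys.float_info.epsilon  # same value as np.finfo(np.float64).eps
        ratios = [abs(p - t) / max(abs(t), eps)
                  for t, p in zip(y_true, y_pred)]
        return sum(ratios) / len(ratios)

# Same inputs as the first docstring example in the patch below.
mape_score = mean_absolute_percentage_error([3, -0.5, 2, 7], [2.5, 0.0, 2, 8])
print(round(mape_score, 4))
```

The result agrees with the 0.3273... value quoted in the docstring of the patched function, whichever branch is taken.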
PS: Patching the sklearn (0.22.2) source so that mean_absolute_percentage_error can be called directly
Two files need to be modified: sklearn/metrics/__init__.py and sklearn/metrics/_regression.py.
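Before editing, it helps to know where the installed copies of these two files live. The sketch below uses only the standard library; the helper name files_to_patch is made up for this illustration, and it returns an empty list when the package is not installed.

```python
import importlib.util
from pathlib import Path


def files_to_patch(package="sklearn"):
    """Return the expected paths of the two files inside an installed package."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return []  # package missing, or not a regular package directory
    root = Path(list(spec.submodule_search_locations)[0])
    return [
        root / "metrics" / "__init__.py",
        root / "metrics" / "_regression.py",
    ]


for path in files_to_patch():
    print(path)
```

Edits made this way are overwritten the next time the package is upgraded or reinstalled.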
- sklearn/metrics/__init__.py: two changes.
# around line 63
from ._regression import explained_variance_score
from ._regression import max_error
from ._regression import mean_absolute_error
from ._regression import mean_absolute_percentage_error # +++ add MAPE +++
from ._regression import mean_squared_error
from ._regression import mean_squared_log_error
from ._regression import median_absolute_error
# around line 126
'matthews_corrcoef',
'max_error',
'mean_absolute_error',
'mean_absolute_percentage_error', # +++ add MAPE +++
'mean_squared_error',
'mean_squared_log_error',
'mean_poisson_deviance',
- sklearn/metrics/_regression.py: two changes.
# around line 36
"max_error",
"mean_absolute_error",
"mean_absolute_percentage_error", # +++ add MAPE +++
"mean_squared_error",
"mean_squared_log_error",
"median_absolute_error",
# around line 190
    return np.average(output_errors, weights=multioutput)


# +++ add MAPE +++
def mean_absolute_percentage_error(y_true, y_pred,
                                   sample_weight=None,
                                   multioutput='uniform_average'):
    """Mean absolute percentage error regression loss

    Note here that we do not represent the output as a percentage in range
    [0, 100]. Instead, we represent it in range [0, 1/eps]. Read more in the
    :ref:`User Guide <mean_absolute_percentage_error>`.

    Parameters
    ----------
    y_true : array-like of shape (n_samples,) or (n_samples, n_outputs)
        Ground truth (correct) target values.

    y_pred : array-like of shape (n_samples,) or (n_samples, n_outputs)
        Estimated target values.

    sample_weight : array-like of shape (n_samples,), default=None
        Sample weights.

    multioutput : {'raw_values', 'uniform_average'} or array-like
        Defines aggregating of multiple output values.
        Array-like value defines weights used to average errors.
        If input is list then the shape must be (n_outputs,).

        'raw_values' :
            Returns a full set of errors in case of multioutput input.

        'uniform_average' :
            Errors of all outputs are averaged with uniform weight.

    Returns
    -------
    loss : float or ndarray of floats in the range [0, 1/eps]
        If multioutput is 'raw_values', then mean absolute percentage error
        is returned for each output separately.
        If multioutput is 'uniform_average' or an ndarray of weights, then the
        weighted average of all output errors is returned.

        MAPE output is non-negative floating point. The best value is 0.0.
        But note the fact that bad predictions can lead to arbitrarily large
        MAPE values, especially if some y_true values are very close to zero.

    Examples
    --------
    >>> from sklearn.metrics import mean_absolute_percentage_error
    >>> y_true = [3, -0.5, 2, 7]
    >>> y_pred = [2.5, 0.0, 2, 8]
    >>> mean_absolute_percentage_error(y_true, y_pred)
    0.3273...
    >>> y_true = [[0.5, 1], [-1, 1], [7, -6]]
    >>> y_pred = [[0, 2], [-1, 2], [8, -5]]
    >>> mean_absolute_percentage_error(y_true, y_pred)
    0.5515...
    >>> mean_absolute_percentage_error(y_true, y_pred, multioutput=[0.3, 0.7])
    0.6198...
    """
    y_type, y_true, y_pred, multioutput = _check_reg_targets(
        y_true, y_pred, multioutput)
    check_consistent_length(y_true, y_pred, sample_weight)
    # Clamp the denominator so that y_true == 0 does not divide by zero.
    epsilon = np.finfo(np.float64).eps
    mape = np.abs(y_pred - y_true) / np.maximum(np.abs(y_true), epsilon)
    output_errors = np.average(mape,
                               weights=sample_weight, axis=0)
    if isinstance(multioutput, str):
        if multioutput == 'raw_values':
            return output_errors
        elif multioutput == 'uniform_average':
            # pass None as weights to np.average: uniform mean
            multioutput = None

    return np.average(output_errors, weights=multioutput)
# +++ add MAPE +++


def mean_squared_error(y_true, y_pred,
                       sample_weight=None,
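To make the multioutput handling and the epsilon guard in the patched function easier to follow, here is a small pure-Python re-derivation of the second docstring example. It is only a sketch: the helper names mape_per_output and aggregate are invented for this illustration, sample weights are left out, and the actual patch relies on NumPy instead.

```python
import sys

EPS = sys.float_info.epsilon  # same value as np.finfo(np.float64).eps


def mape_per_output(y_true, y_pred):
    """One MAPE value per output column (the 'raw_values' case)."""
    n_outputs = len(y_true[0])
    errors = []
    for j in range(n_outputs):
        ratios = [abs(p[j] - t[j]) / max(abs(t[j]), EPS)
                  for t, p in zip(y_true, y_pred)]
        errors.append(sum(ratios) / len(ratios))
    return errors


def aggregate(errors, weights=None):
    """Uniform mean when weights is None, otherwise a weighted average."""
    if weights is None:
        return sum(errors) / len(errors)
    return sum(e * w for e, w in zip(errors, weights)) / sum(weights)


y_true = [[0.5, 1], [-1, 1], [7, -6]]
y_pred = [[0, 2], [-1, 2], [8, -5]]

raw = mape_per_output(y_true, y_pred)   # per-column errors
uniform = aggregate(raw)                # 'uniform_average'
weighted = aggregate(raw, [0.3, 0.7])   # multioutput=[0.3, 0.7]

# A true value of exactly zero hits the guard and yields 1/eps.
blow_up = mape_per_output([[0.0]], [[1.0]])[0]

print(raw, uniform, weighted, blow_up)
```

The uniform and weighted results line up with the 0.5515... and 0.6198... values in the docstring, and the last value shows why MAPE becomes unusable when some targets are zero or close to it.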
Reference: GitHub #15007