Tasks
- Reading
- Machine Learning in Action (Chapter 1)
- Practice
- Install a Python environment
- Install NumPy, BeautifulSoup, and the other modules needed for data mining and machine learning
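Practice
- Installing Python
- Download the Python release for your operating system (e.g. Windows) from the official Python website
- On Windows, add the Python install directory to the PATH environment variable
- Install an IDE such as PyCharm (optional)
- Installing third-party Python modules
- Download pip (Python 3.4 and later already include pip, so this step is usually unnecessary)
- Extract the archive to a folder, change into that folder, and run in cmd: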
python setup.py install
- Add the Scripts folder to the PATH environment variable:
<install drive>:\Python<version>\Scripts
- Install the beautifulsoup4 module from cmd:
pip install beautifulsoup4
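- Installing NumPy
- Download the 64-bit NumPy wheel file that matches your Python 3 version
- Install wheel with pip:
pip install wheel
- In the folder containing the NumPy wheel, run in cmd:
pip install numpy-1.12.0+mkl-cp36-cp36m-win_amd64.whl
- To check that NumPy is installed correctly, enter the following in the Python shell; it should print a 4x4 array of random floats in [0, 1):
from numpy import random
random.rand(4, 4)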
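- Installing scikit-learn
- scikit-learn depends on SciPy, so download and install SciPy first
- Download the scikit-learn wheel file that matches your Python version
- In the folder where the wheel file is stored, run:
pip install scikit_learn-0.18.1-cp36-cp36m-win_amd64.whl
- Installing matplotlib
- Download the matplotlib wheel file from http://www.lfd.uci.edu/~gohlke/pythonlibs/#matplotlib and install it
- Test the matplotlib installation: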
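# Official demo: a 3D surface plus filled-contour projections onto the three axis planes
from mpl_toolkits.mplot3d import axes3d  # also registers the '3d' projection on older matplotlib
import matplotlib.pyplot as plt
from matplotlib import cm
fig = plt.figure()
# Newer matplotlib versions no longer accept fig.gca(projection='3d'); add_subplot works across versions
ax = fig.add_subplot(111, projection='3d')
X, Y, Z = axes3d.get_test_data(0.05)  # sample surface data bundled with mplot3d
ax.plot_surface(X, Y, Z, rstride=8, cstride=8, alpha=0.3)
# Project filled contours of the surface onto the z, x and y planes
cset = ax.contourf(X, Y, Z, zdir='z', offset=-100, cmap=cm.coolwarm)
cset = ax.contourf(X, Y, Z, zdir='x', offset=-40, cmap=cm.coolwarm)
cset = ax.contourf(X, Y, Z, zdir='y', offset=40, cmap=cm.coolwarm)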
ax.set_xlabel('X')
ax.set_xlim(-40, 40)
ax.set_ylabel('Y')
ax.set_ylim(-40, 40)
ax.set_zlabel('Z')
ax.set_zlim(-100, 100)
plt.show()
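To confirm all of the installs above in one go, a small script like the one below may help. It is a sketch that is not part of the original steps: it assumes the import names `numpy`, `scipy`, `bs4` (for the beautifulsoup4 package), `sklearn` (for scikit-learn) and `matplotlib`, and simply reports which of them import and what `__version__` they expose.

```python
# Check that the modules installed above can be imported, and report their versions.
# Assumption: these are the import names; some differ from the pip package names
# (beautifulsoup4 -> bs4, scikit-learn -> sklearn).
import importlib

MODULES = ["numpy", "scipy", "bs4", "sklearn", "matplotlib"]


def check_modules(names):
    """Map each module name to its version string, or to None if it cannot be imported."""
    results = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            results[name] = None
        else:
            # Most scientific packages define __version__; fall back to "unknown" if not.
            results[name] = getattr(module, "__version__", "unknown")
    return results


if __name__ == "__main__":
    for name, version in check_modules(MODULES).items():
        print(f"{name:12s} {version if version is not None else 'NOT INSTALLED'}")
```

Any module reported as NOT INSTALLED still needs its wheel or `pip install` step from the list above.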
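Reading
Machine Learning in Action, Chapter 1: Machine learning basics
- Key points from this chapter
- Machine learning lets us extract meaningful information from raw data so that we can gain insight from it
With machine learning we can gain insight from a dataset, not cyborg rote memorization, and not the creation of sentient beings.
Machine learning is turning data into information.
- Machine learning relies on statistics, because for many real-world problems no deterministic solution is available: we either know too little about the problem or lack the computing power to model it properly
Machine learning uses statistics. There are many problems where the solution isn’t deterministic. That is, we don’t know enough about the problem or don’t have enough computing power to properly model the problem. For these problems we need statistics.
- Categories of machine learning problems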
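- General steps in developing a machine learning application
- Collect data
- Prepare the input data
- Analyze the input data
- Train the algorithm
- Test the algorithm
- Use it
- Glossary
- expert system: a system that can handle specialized problems the way an expert in that field would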
By creating a computer program to recognize birds, we’ve replaced an ornithologist with a computer. The ornithologist is a bird expert, so we’ve created an expert system.
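- features/attributes: the individual measurements that describe an item's properties, e.g. a bird's weight, wingspan, or back color
Features can take the following kinds of values: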
- numeric
- binary
- enumeration
- classification: deciding which category an item belongs to
For the moment, assume we have all that information. How do we then decide if a bird at our feeder is an Ivory-billed Woodpecker or something else? This task is called classification.
- regression: predicting a numeric value, which reveals the pattern behind how that value changes
Regression is the prediction of a numeric value.
- training set: the data used to train the algorithm
A training set is the set of training examples we’ll use to train our machine learning algorithms.
- target variable: the value the machine learning algorithm is trying to predict
The target variable is what we’ll be trying to predict with our machine learning algorithms. In classification the target variable takes on a nominal value, and in the task of regression its value could be continuous.
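- test set: a dataset kept separate from the training set (usually split off from the same collected data) and used to measure the algorithm's accuracy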
To test machine learning algorithms what’s usually done is to have a training set of data and a separate dataset, called a test set.
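- supervised learning: machine learning in which people supply the answers, e.g. by labeling each example with its target value
This set of problems is known as supervised because we’re telling the algorithm what to predict.
- unsupervised learning: the data carry no labels or target values; clustering is a typical example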
In unsupervised learning, there’s no label or target value given for the data. A task where we group similar items together is known as clustering.
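The glossary terms and the six general steps can be tied together in a small classification sketch. It is an illustration only: the bird measurements are made up, the three features (weight, wingspan, webbed feet) and the choice of k-nearest neighbours with k=3 are assumptions rather than anything this chapter prescribes, and it uses plain Python so it runs even without NumPy.

```python
import math
from collections import Counter

# Step 1, collect data: hypothetical measurements, made up for illustration.
# Features: weight in grams (numeric), wingspan in cm (numeric), webbed feet (binary, 1 = yes).
# Target variable: species, a nominal value, which makes this a classification task.
BIRDS = [
    ((3.0, 11.0, 0), "hummingbird"),
    ((1100.0, 90.0, 1), "duck"),
    ((1000.0, 120.0, 0), "hawk"),
    ((4.0, 11.5, 0), "hummingbird"),
    ((1200.0, 88.0, 1), "duck"),
    ((950.0, 125.0, 0), "hawk"),
    ((3.5, 10.5, 0), "hummingbird"),
    ((1150.0, 92.0, 1), "duck"),
    ((1050.0, 118.0, 0), "hawk"),
]


def split(data, test_every=4):
    """Hold out every test_every-th example as the test set; the rest form the training set."""
    train = [row for i, row in enumerate(data) if i % test_every != test_every - 1]
    test = [row for i, row in enumerate(data) if i % test_every == test_every - 1]
    return train, test


def fit_scaler(train):
    """Step 2, prepare the input data: per-feature minimum and range, from the training set only."""
    columns = list(zip(*(features for features, _ in train)))
    mins = [min(col) for col in columns]
    ranges = [(max(col) - min(col)) or 1 for col in columns]  # guard against a constant feature
    return mins, ranges


def scale(features, mins, ranges):
    """Min-max scale a feature vector so weight (hundreds of grams) cannot swamp the other features."""
    return [(x - lo) / r for x, lo, r in zip(features, mins, ranges)]


def distance(p, q):
    """Euclidean distance between two scaled feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def classify(features, train, mins, ranges, k=3):
    """k-nearest neighbours: the most common species among the k closest training examples."""
    query = scale(features, mins, ranges)
    nearest = sorted((distance(query, scale(f, mins, ranges)), label) for f, label in train)[:k]
    # On a tie, Counter (Python 3.7+) keeps the label seen first, i.e. the closest one.
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(test, train, mins, ranges, k=3):
    """Step 5, test the algorithm: share of test examples whose species is predicted correctly."""
    correct = sum(classify(f, train, mins, ranges, k) == label for f, label in test)
    return correct / len(test)


if __name__ == "__main__":
    train_set, test_set = split(BIRDS)
    # Step 3 (analyzing the data) is skipped in this toy example.
    # Step 4, train: k-NN has no separate training phase; the "model" is just the
    # stored training set plus the scaling parameters.
    mins, ranges = fit_scaler(train_set)
    print("test accuracy:", accuracy(test_set, train_set, mins, ranges))
    # Step 6, use it: classify a new, unseen bird.
    print("new bird:", classify((5.0, 12.0, 0), train_set, mins, ranges))
```

Scaling parameters are learned from the training set alone so that nothing about the test set leaks into the model before it is evaluated.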
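Regression differs from classification in that the target variable is continuous rather than nominal. A minimal sketch, assuming ordinary least squares with a single feature (one common regression method, chosen here for illustration) and made-up numbers, with a small held-out test set scored by mean squared error:

```python
# Hypothetical toy data: the training points lie exactly on y = 1 + 2x so the fit is easy to check.
TRAIN_X = [1.0, 2.0, 3.0, 4.0]
TRAIN_Y = [3.0, 5.0, 7.0, 9.0]
# Hypothetical held-out test set with a little made-up noise.
TEST_X = [5.0, 6.0]
TEST_Y = [11.2, 12.8]


def fit_line(xs, ys):
    """Ordinary least squares with one feature: return (intercept, slope) of y = a + b*x.

    slope b     = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
    intercept a = mean_y - b * mean_x
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)  # non-zero only if xs has at least two distinct values
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predict(model, x):
    """Predict a continuous target value for feature value x."""
    intercept, slope = model
    return intercept + slope * x


def mean_squared_error(model, xs, ys):
    """Average squared gap between predictions and true target values (lower is better)."""
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


if __name__ == "__main__":
    model = fit_line(TRAIN_X, TRAIN_Y)
    print("intercept, slope:", model)
    print("prediction for x = 5:", predict(model, 5.0))
    print("test MSE:", mean_squared_error(model, TEST_X, TEST_Y))
```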
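For unsupervised learning, clustering can be sketched with k-means (Lloyd's algorithm), which groups points without any target variable. The points, k=2, and the rule of using the first k points as the starting centroids are illustrative assumptions; practical implementations usually pick starting centroids more carefully.

```python
# Hypothetical 2D points forming two obvious groups; note that there is no target variable.
POINTS = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (8.5, 9.0), (1.0, 0.5), (9.0, 8.5)]


def squared_distance(p, q):
    """Squared Euclidean distance (enough for deciding which centroid is closer)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iterations=100):
    """Lloyd's k-means. Returns (labels, centroids); labels[j] is the cluster index of points[j].

    Assumption for illustration: the first k points serve as the initial centroids.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid (ties go to the lower index).
        labels = [min(range(k), key=lambda i: squared_distance(p, centroids[i])) for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                new_centroids.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # leave an empty cluster's centroid where it was
        if new_centroids == centroids:
            break  # converged: another pass would produce the same assignment
        centroids = new_centroids
    return labels, centroids


if __name__ == "__main__":
    labels, centroids = kmeans(POINTS, k=2)
    print("cluster labels:", labels)
    print("centroids:", centroids)
```

Unlike the supervised case, nothing here says which group is "correct"; the algorithm only discovers that the points fall into similar groups.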
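Resources
Official Python website
Python downloads for Windows (official website)
Blog post on preparing a Python environment
Learn Python on Codecademy
The ultimate tutorial for installing Python and pip on Windows
Installing and configuring NumPy, SciPy, and Matplotlib for Python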