Titanic Model Building - Kaggle


Author: 程序猪小羊 | Published 2018-02-25 01:21 · 25 reads

    Thanks a lot to the author!

    Model, predict and solve

    We must understand the type of problem and the solution requirements to narrow down to a select few models that we can evaluate.

    Supervised Learning: Classification and Regression

    Logistic Regression
    KNN or k-Nearest Neighbors
    Support Vector Machines
    Naive Bayes classifier
    Decision Tree
    Random Forest
    Perceptron
    Artificial neural network
    RVM or Relevance Vector Machine

    Logistic Regression

    In statistics, logistic regression (also logit regression or the logit model[1]) is a regression model where the dependent variable (DV) is categorical. This section covers the case of a binary dependent variable, that is, where the output can take only two values, "0" and "1", which represent outcomes such as pass/fail.

    We can use Logistic Regression to validate our assumptions and decisions about feature creation and completion. This can be done by calculating the coefficients of the features in the decision function.

    Positive coefficients increase the log-odds of the response (and thus increase the probability), and negative coefficients decrease the log-odds of the response (and thus decrease the probability).
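The effect of a coefficient's sign on the log-odds can be illustrated with a small pure-Python sketch. The intercept and coefficient values below are made up for illustration (roughly echoing the signs in the correlation table that follows), not the fitted model's actual parameters:

```python
import math

def sigmoid(z):
    # Map log-odds z to a probability in (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical decision function: intercept + coefficient * feature value
intercept = -0.5
coef_positive = 2.2    # e.g. a feature like Sex with a positive coefficient
coef_negative = -0.75  # e.g. a feature like Pclass with a negative coefficient

# Raising a feature with a positive coefficient raises the probability
p_low = sigmoid(intercept + coef_positive * 0)
p_high = sigmoid(intercept + coef_positive * 1)

# Raising a feature with a negative coefficient lowers the probability
q_low = sigmoid(intercept + coef_negative * 3)
q_high = sigmoid(intercept + coef_negative * 1)
```

Here `p_high > p_low` and `q_high > q_low`, matching the statement above: positive coefficients push the log-odds (and probability) up, negative coefficients push them down.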

    Feature Correlation

           Feature  Correlation
    1          Sex     2.201527
    5        Title     0.398234
    2          Age     0.287163
    4     Embarked     0.261762
    6      IsAlone     0.129140
    3         Fare    -0.085150
    7    Age*Class    -0.311200
    0       Pclass    -0.749007

    Model Evaluation
    Each score below is a confidence score generated by the model on our training dataset (i.e. training accuracy, not held-out performance).
    acc_log = round(logreg.score(X_train, Y_train) * 100, 2)
    acc_svc = round(svc.score(X_train, Y_train) * 100, 2)
    acc_svc
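The `score` calls above report accuracy as a percentage rounded to two decimals. The same quantity can be computed by hand; the labels below are toy values, not actual Titanic predictions:

```python
def accuracy(y_true, y_pred):
    # Fraction of predictions that exactly match the true labels
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0]  # one mistake out of five

# Same rounding convention as acc_log / acc_svc above
acc = round(accuracy(y_true, y_pred) * 100, 2)  # → 80.0
```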

    Support Vector Machines
    Reference Wikipedia.

    # Support Vector Machines
    from sklearn.svm import SVC

    svc = SVC()
    svc.fit(X_train, Y_train)
    Y_pred = svc.predict(X_test)
    acc_svc = round(svc.score(X_train, Y_train) * 100, 2)
    acc_svc
    

    k-Nearest Neighbors algorithm
    A sample is classified by a majority vote of its neighbors, being assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). If k = 1, then the object is simply assigned to the class of that single nearest neighbor. Reference Wikipedia.
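The majority-vote rule can be sketched in a few lines of pure Python; the training points and query points below are invented toy data, not Titanic features:

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    # train: list of (feature_tuple, label) pairs.
    # Sort by Euclidean distance to the query, then take a
    # majority vote among the k nearest neighbors.
    nearest = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [((0, 0), 'a'), ((0, 1), 'a'), ((1, 0), 'a'),
         ((5, 5), 'b'), ((6, 5), 'b')]

knn_predict(train, (0.5, 0.5), k=3)  # → 'a' (all 3 nearest are class 'a')
knn_predict(train, (5.5, 5.0), k=1)  # → 'b' (single nearest neighbor rule)
```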

    Naive Bayes classifier
    Naive Bayes classifiers use Bayes' theorem in the classifier's decision rule (although applying Bayes' theorem does not require adopting Bayesian methods more broadly).
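Bayes' theorem itself is a one-line computation. The probabilities below are made-up illustrative numbers, not values estimated from the Titanic data:

```python
# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
# Toy example: probability of survival given the passenger is female.
p_survived = 0.4                # prior P(survived), made up
p_female_given_survived = 0.7   # likelihood P(female | survived), made up
p_female = 0.35                 # evidence P(female), made up

p_survived_given_female = p_female_given_survived * p_survived / p_female
# 0.7 * 0.4 / 0.35 = 0.8
```

A (Gaussian or multinomial) Naive Bayes classifier applies this rule per class, with the "naive" simplification that features are treated as conditionally independent given the class.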

    Decision trees
    Tree models where the target variable can take a finite set of values are called classification trees.
    Leaves - class labels;
    Branches - conjunctions of features that lead to those class labels.
    Decision trees where the target variable can take continuous values (typically real numbers) are called regression trees. Reference Wikipedia.
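A classification tree of the kind described above can be written out by hand as nested branches with class labels at the leaves. The thresholds here are hand-picked for illustration, not splits learned from the Titanic data:

```python
def classify(passenger):
    # A tiny hand-written classification tree:
    # branches test feature values, leaves return class labels.
    if passenger['Sex'] == 'female':
        return 'survived'
    if passenger['Age'] < 10:       # young boys branch (illustrative cutoff)
        return 'survived'
    return 'died'

classify({'Sex': 'male', 'Age': 6})    # → 'survived'
classify({'Sex': 'male', 'Age': 40})   # → 'died'
classify({'Sex': 'female', 'Age': 40}) # → 'survived'
```

A regression tree has the same branching structure, but its leaves return continuous values (e.g. a predicted fare) instead of class labels.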

    In probability theory and statistics, covariance (协方差) measures the joint variability of two variables. Variance is a special case of covariance, arising when the two variables are identical.
    Degrees of freedom (statistics) - worth understanding.
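Both points can be checked with a short pure-Python sketch (toy numbers, using the sample covariance with n - 1 degrees of freedom):

```python
def covariance(xs, ys):
    # Sample covariance, dividing by n - 1 degrees of freedom
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

xs = [1.0, 2.0, 3.0, 4.0]

# Covariance of a variable with itself is its (sample) variance
var_xs = covariance(xs, xs)  # → 5/3

# A perfectly anti-correlated variable gives the negated value
cov_neg = covariance(xs, [4.0, 3.0, 2.0, 1.0])  # → -5/3
```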


          Original link: https://www.haomeiwen.com/subject/lygfxftx.html