Thanks a lot to the author!
Model, predict and solve
We must understand the type of problem and solution requirement to narrow down to a select few models which we can evaluate.
Our problem falls under supervised learning, combining classification and regression. With these criteria, the candidate models include:
Logistic Regression
KNN or k-Nearest Neighbors
Support Vector Machines
Naive Bayes classifier
Decision Tree
Random Forest
Perceptron
Artificial neural network
RVM or Relevance Vector Machine
Logistic Regression
In statistics, logistic regression, or logit regression, or logit model[1] is a regression model where the dependent variable (DV) is categorical. This article covers the case of a binary dependent variable—that is, where the output can take only two values, "0" and "1", which represent outcomes such as pass/fail.
We can use Logistic Regression to validate our assumptions and decisions for feature creation and completion. This can be done by calculating the coefficients of the features in the decision function.
Positive coefficients increase the log-odds of the response (and thus increase the probability), and negative coefficients decrease the log-odds of the response (and thus decrease the probability).
Feature Correlation
   Feature    Correlation
1  Sex           2.201527
5  Title         0.398234
2  Age           0.287163
4  Embarked      0.261762
6  IsAlone       0.129140
3  Fare         -0.085150
7  Age*Class    -0.311200
0  Pclass       -0.749007
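A table like the one above can be built from the fitted model's coefficients. Below is a minimal sketch using the scikit-learn API; a small synthetic dataset stands in for the original X_train and Y_train, so the numbers will differ, but the positive coefficient for Sex illustrates the log-odds reading described above.

```python
# Sketch: ranking features by their logistic-regression coefficients.
# The toy data below is a stand-in for the real X_train / Y_train.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = pd.DataFrame({"Sex": rng.integers(0, 2, 200),
                  "Pclass": rng.integers(1, 4, 200)})
# Synthetic target that depends strongly on Sex
y = (X["Sex"] + rng.random(200) > 1).astype(int)

logreg = LogisticRegression()
logreg.fit(X, y)

# One coefficient per feature; sort to see the strongest effects first
coeff = pd.DataFrame({"Feature": X.columns,
                      "Correlation": logreg.coef_[0]})
print(coeff.sort_values(by="Correlation", ascending=False))
```

Here `coeff` plays the same role as the table above: features with large positive coefficients raise the predicted probability, and negative ones lower it.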
Model Evaluation
We score each model by the accuracy (confidence score) it achieves on our training dataset.
acc_log = round(logreg.score(X_train, Y_train) * 100, 2)
Support Vector Machines
Reference: Wikipedia.
# Support Vector Machines
svc = SVC()
svc.fit(X_train, Y_train)
Y_pred = svc.predict(X_test)
acc_svc = round(svc.score(X_train, Y_train) * 100, 2)
acc_svc
k-Nearest Neighbors algorithm
In k-NN classification, a sample is classified by a majority vote of its neighbors: it is assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). If k = 1, the sample is simply assigned to the class of its single nearest neighbor. Reference: Wikipedia.
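The majority-vote rule can be sketched in the same style as the SVM snippet above, using scikit-learn's KNeighborsClassifier; the tiny arrays below are stand-ins for the real X_train and Y_train.

```python
# k-NN sketch: each prediction is a majority vote of the k nearest neighbors.
from sklearn.neighbors import KNeighborsClassifier

X_train = [[0], [1], [2], [10], [11], [12]]
Y_train = [0, 0, 0, 1, 1, 1]

knn = KNeighborsClassifier(n_neighbors=3)  # k = 3 neighbors vote
knn.fit(X_train, Y_train)

# 1.5 sits among the class-0 points, 10.5 among the class-1 points
Y_pred = knn.predict([[1.5], [10.5]])
acc_knn = round(knn.score(X_train, Y_train) * 100, 2)
```

Small odd values of k (3, 5) are common defaults: odd to avoid tied votes, small to keep the decision local.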
Naive Bayes classifier
Naive Bayes classifiers are named for the use of Bayes' theorem in the classifier's decision rule (although the method itself is not necessarily a Bayesian method).
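For continuous features, the Gaussian variant is the usual starting point. A minimal sketch with scikit-learn's GaussianNB, again with toy data standing in for X_train and Y_train:

```python
# Gaussian Naive Bayes sketch: fits a per-class Gaussian to each feature,
# then applies Bayes' theorem in the decision rule.
from sklearn.naive_bayes import GaussianNB

X_train = [[1.0], [1.2], [0.9], [5.0], [5.2], [4.9]]
Y_train = [0, 0, 0, 1, 1, 1]

gaussian = GaussianNB()
gaussian.fit(X_train, Y_train)

# Points near each cluster are assigned to that cluster's class
Y_pred = gaussian.predict([[1.1], [5.1]])
```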
Decision trees
Tree models where the target variable can take a finite set of values are called classification trees.
Leaves - class labels;
Branches - conjunctions of features that lead to those class labels.
Decision trees where the target variable can take continuous values (typically real numbers) are called regression trees. Reference Wikipedia.
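A classification tree of the kind described above (leaves = class labels, branches = feature conjunctions) can be sketched with scikit-learn's DecisionTreeClassifier; the toy data is a stand-in for the real training set.

```python
# Classification-tree sketch: the tree learns that the class
# equals the first feature, so one split suffices.
from sklearn.tree import DecisionTreeClassifier

X_train = [[0, 1], [1, 1], [0, 0], [1, 0]]
Y_train = [0, 1, 0, 1]  # class is determined by the first feature

tree = DecisionTreeClassifier(random_state=0)
tree.fit(X_train, Y_train)

acc_tree = tree.score(X_train, Y_train)  # perfectly separable toy data
```

Swapping in DecisionTreeRegressor would give the regression-tree case, where leaves hold continuous values instead of class labels.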
In probability theory and statistics, covariance measures the joint variability of two variables. Variance is a special case of covariance, namely the covariance of a variable with itself.
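The variance-as-special-case relationship can be checked numerically with NumPy:

```python
# Covariance of two variables; variance recovered as cov(x, x).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])

cov_xy = np.cov(x, y)[0, 1]  # covariance between x and y
var_x = np.cov(x, x)[0, 1]   # cov(x, x) is the variance of x

# Matches the sample variance (ddof=1, as np.cov uses by default)
assert np.isclose(var_x, np.var(x, ddof=1))
```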
Degrees of freedom (statistics) - understanding