Titanic Model Building - Kaggle


Author: 程序猪小羊 | Published 2018-02-25 01:21 · 25 reads

    Thanks a lot to the author!

    Model, predict and solve

    We must understand the type of problem and the solution requirements to narrow down to a select few models that we can evaluate.

    Supervised Learning: Classification and Regression

    Logistic Regression
    KNN or k-Nearest Neighbors
    Support Vector Machines
    Naive Bayes classifier
    Decision Tree
    Random Forest
    Perceptron
    Artificial neural network
    RVM or Relevance Vector Machine

    Logistic Regression

    In statistics, logistic regression (also logit regression or the logit model[1]) is a regression model where the dependent variable (DV) is categorical. This section covers the case of a binary dependent variable, that is, where the output can take only two values, "0" and "1", which represent outcomes such as pass/fail.

    We can use Logistic Regression to validate our assumptions and decisions about feature creation and completion. This can be done by calculating the coefficients of the features in the decision function.

    Positive coefficients increase the log-odds of the response (and thus increase the probability), and negative coefficients decrease the log-odds of the response (and thus decrease the probability).
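The effect of a coefficient's sign on the log-odds can be illustrated with a small pure-Python sketch. The intercept and coefficient values below are made up for illustration (roughly echoing the signs in the correlation table that follows), not the fitted model's actual parameters:

```python
import math

def sigmoid(z):
    # Map log-odds z to a probability in (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical decision function: intercept + coefficient * feature value
intercept = -0.5
coef_positive = 2.2    # e.g. a feature like Sex with a positive coefficient
coef_negative = -0.75  # e.g. a feature like Pclass with a negative coefficient

# Raising a feature with a positive coefficient raises the probability
p_low = sigmoid(intercept + coef_positive * 0)
p_high = sigmoid(intercept + coef_positive * 1)

# Raising a feature with a negative coefficient lowers the probability
q_low = sigmoid(intercept + coef_negative * 3)
q_high = sigmoid(intercept + coef_negative * 1)
```

Here `p_high > p_low` and `q_high > q_low`, matching the statement above: positive coefficients push the log-odds (and probability) up, negative coefficients push them down.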

    Feature Correlation

           Feature  Correlation
    1          Sex     2.201527
    5        Title     0.398234
    2          Age     0.287163
    4     Embarked     0.261762
    6      IsAlone     0.129140
    3         Fare    -0.085150
    7    Age*Class    -0.311200
    0       Pclass    -0.749007

    Model Evaluation
    Each score below is a confidence score generated by the model on our training dataset (i.e. training accuracy, not held-out performance).
    acc_log = round(logreg.score(X_train, Y_train) * 100, 2)
    acc_svc = round(svc.score(X_train, Y_train) * 100, 2)
    acc_svc
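The `score` calls above report accuracy as a percentage rounded to two decimals. The same quantity can be computed by hand; the labels below are toy values, not actual Titanic predictions:

```python
def accuracy(y_true, y_pred):
    # Fraction of predictions that exactly match the true labels
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0]  # one mistake out of five

# Same rounding convention as acc_log / acc_svc above
acc = round(accuracy(y_true, y_pred) * 100, 2)  # → 80.0
```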

    Support Vector Machines
    Reference Wikipedia.

    # Support Vector Machines
    from sklearn.svm import SVC

    svc = SVC()
    svc.fit(X_train, Y_train)
    Y_pred = svc.predict(X_test)
    acc_svc = round(svc.score(X_train, Y_train) * 100, 2)
    acc_svc
    

    k-Nearest Neighbors algorithm
    A sample is classified by a majority vote of its neighbors, being assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). If k = 1, then the object is simply assigned to the class of that single nearest neighbor. Reference Wikipedia.
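The majority-vote rule can be sketched in a few lines of pure Python; the training points and query points below are invented toy data, not Titanic features:

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    # train: list of (feature_tuple, label) pairs.
    # Sort by Euclidean distance to the query, then take a
    # majority vote among the k nearest neighbors.
    nearest = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [((0, 0), 'a'), ((0, 1), 'a'), ((1, 0), 'a'),
         ((5, 5), 'b'), ((6, 5), 'b')]

knn_predict(train, (0.5, 0.5), k=3)  # → 'a' (all 3 nearest are class 'a')
knn_predict(train, (5.5, 5.0), k=1)  # → 'b' (single nearest neighbor rule)
```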

    Naive Bayes classifier
    Naive Bayes classifiers use Bayes' theorem in the classifier's decision rule (although applying Bayes' theorem does not require adopting Bayesian methods more broadly).
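Bayes' theorem itself is a one-line computation. The probabilities below are made-up illustrative numbers, not values estimated from the Titanic data:

```python
# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
# Toy example: probability of survival given the passenger is female.
p_survived = 0.4                # prior P(survived), made up
p_female_given_survived = 0.7   # likelihood P(female | survived), made up
p_female = 0.35                 # evidence P(female), made up

p_survived_given_female = p_female_given_survived * p_survived / p_female
# 0.7 * 0.4 / 0.35 = 0.8
```

A (Gaussian or multinomial) Naive Bayes classifier applies this rule per class, with the "naive" simplification that features are treated as conditionally independent given the class.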

    Decision trees
    Tree models where the target variable can take a finite set of values are called classification trees.
    Leaves - class labels;
    Branches - conjunctions of features that lead to those class labels.
    Decision trees where the target variable can take continuous values (typically real numbers) are called regression trees. Reference Wikipedia.
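A classification tree of the kind described above can be written out by hand as nested branches with class labels at the leaves. The thresholds here are hand-picked for illustration, not splits learned from the Titanic data:

```python
def classify(passenger):
    # A tiny hand-written classification tree:
    # branches test feature values, leaves return class labels.
    if passenger['Sex'] == 'female':
        return 'survived'
    if passenger['Age'] < 10:       # young boys branch (illustrative cutoff)
        return 'survived'
    return 'died'

classify({'Sex': 'male', 'Age': 6})    # → 'survived'
classify({'Sex': 'male', 'Age': 40})   # → 'died'
classify({'Sex': 'female', 'Age': 40}) # → 'survived'
```

A regression tree has the same branching structure, but its leaves return continuous values (e.g. a predicted fare) instead of class labels.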

    In probability theory and statistics, covariance (协方差) measures the joint variability of two variables. Variance is a special case of covariance, arising when the two variables are identical.
    Degrees of freedom (statistics) - worth understanding.
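Both points can be checked with a short pure-Python sketch (toy numbers, using the sample covariance with n - 1 degrees of freedom):

```python
def covariance(xs, ys):
    # Sample covariance, dividing by n - 1 degrees of freedom
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

xs = [1.0, 2.0, 3.0, 4.0]

# Covariance of a variable with itself is its (sample) variance
var_xs = covariance(xs, xs)  # → 5/3

# A perfectly anti-correlated variable gives the negated value
cov_neg = covariance(xs, [4.0, 3.0, 2.0, 1.0])  # → -5/3
```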


          Original link: https://www.haomeiwen.com/subject/lygfxftx.html