美文网首页
pyspark提升树(GBT)

pyspark提升树(GBT)

作者: 米斯特芳 | 来源:发表于2021-08-12 12:53 被阅读0次

没什么好说的,就是一种调用方式学习

分类

from pyspark.ml import Pipeline
from pyspark.ml.classification import GBTClassifier
from pyspark.ml.feature import StringIndexer, VectorIndexer
from pyspark.ml.evaluation import MulticlassClassificationEvaluator
from pyspark.sql import SparkSession

spark = SparkSession\
    .builder\
    .appName("GradientBoostedTreeClassifierExample")\
    .getOrCreate()

data = spark.read.format("libsvm").load("sample_libsvm_data.txt")
labelIndexer = StringIndexer(inputCol="label", outputCol="indexedLabel").fit(data)
featureIndexer =VectorIndexer(inputCol="features", outputCol="indexedFeatures", maxCategories=4).fit(data)# 小于5种取值的特征转为类别向量
(trainingData, testData) = data.randomSplit([0.7, 0.3])
gbt = GBTClassifier(labelCol="indexedLabel", featuresCol="indexedFeatures", maxIter=10)
pipeline = Pipeline(stages=[labelIndexer, featureIndexer, gbt])
model = pipeline.fit(trainingData)
predictions = model.transform(testData)
predictions.select("prediction", "indexedLabel", "features").show(5)
evaluator = MulticlassClassificationEvaluator(
    labelCol="indexedLabel", predictionCol="prediction", metricName="accuracy")
accuracy = evaluator.evaluate(predictions)
print("Test Error = %g" % (1.0 - accuracy))
gbtModel = model.stages[2]
print(gbtModel) # 打印第3阶段的模型


回归

from pyspark.ml import Pipeline
from pyspark.ml.regression import GBTRegressor
from pyspark.ml.feature import VectorIndexer
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.sql import SparkSession

spark = SparkSession\
    .builder\
    .appName("GradientBoostedTreeRegressorExample")\
    .getOrCreate()

data = spark.read.format("libsvm").load("sample_libsvm_data.txt")
featureIndexer =\
    VectorIndexer(inputCol="features", outputCol="indexedFeatures", maxCategories=4).fit(data)
(trainingData, testData) = data.randomSplit([0.7, 0.3])
gbt = GBTRegressor(featuresCol="indexedFeatures", maxIter=10)
pipeline = Pipeline(stages=[featureIndexer, gbt])
model = pipeline.fit(trainingData)
predictions = model.transform(testData)
predictions.select("prediction", "label", "features").show(5)
evaluator = RegressionEvaluator(
    labelCol="label", predictionCol="prediction", metricName="rmse")
rmse = evaluator.evaluate(predictions)
print("Root Mean Squared Error (RMSE) on test data = %g" % rmse)
gbtModel = model.stages[1]
print(gbtModel)  

相关文章

  • pyspark提升树(GBT)

    没什么好说的,就是一种调用方式学习 分类 回归

  • Spark之获取GBT二分类函数的概率值

      在Spark中,GBT(Gradient Boost Trees,提升树)函数用于实现机器学习中的提升树算法,...

  • XGBOOST

    在了解xgboost之前我们先了解一下梯度提升树(gbt) 梯度提升树 梯度提升是构建预测模型的最强大技术之一,它...

  • Spark Python API Docs(part one)

    pyspark package subpackages pyspark.sql module pyspark.st...

  • GBT

    既然做出了选择那就不要后悔。既然融不进去,那就不要硬融。不适合的地方,很难再回首。唯一伤感的就是没能好聚好散…

  • gbt

    { "capabilities": [ "proposal" ], "version": 4, "previous...

  • 梯度增强分类/回归器参数详解

    1. GBT类库概述 GBT( Gradient Boosting Tree) 有很多简称,有GTB (Gradi...

  • pyspark整理

    pyspark入门资料 公众号回复:pyspark (会有pyspark资料大礼包:Learning PySpar...

  • PySpark初见

    PySpark PySpark 是 Spark 为 Python 开发者提供的 API。 子模块pyspark.s...

  • Jupyter配置教程

    将jupyter notebook作为pyspark的默认编辑器 安装pyspark通过拷贝pyspark包安装源...

网友评论

      本文标题:pyspark提升树(GBT)

      本文链接:https://www.haomeiwen.com/subject/ycllbltx.html