2023-06-16 pyspark custom ML model Transfo

Author: 破阵子沙场秋点兵 | Published 2023-06-15 15:48

Key concepts in Pipelines

MLlib standardizes APIs for machine learning algorithms to make it easier to combine multiple algorithms into a single pipeline, or workflow. This section covers the key concepts introduced by the Pipelines API, where the pipeline concept is mostly inspired by the scikit-learn project.

  • DataFrame: This ML API uses DataFrame from Spark SQL as an ML dataset, which can hold a variety of data types. E.g., a DataFrame could have different columns storing text, feature vectors, true labels, and predictions.

  • Transformer: A Transformer is an algorithm which can transform one DataFrame into another DataFrame. E.g., an ML model is a Transformer which transforms a DataFrame with features into a DataFrame with predictions.

  • Estimator: An Estimator is an algorithm which can be fit on a DataFrame to produce a Transformer. E.g., a learning algorithm is an Estimator which trains on a DataFrame and produces a model.

  • Pipeline: A Pipeline chains multiple Transformers and Estimators together to specify an ML workflow.

  • Parameter: All Transformers and Estimators now share a common API for specifying parameters.
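The relationship between these concepts can be sketched without Spark at all. The block below is a toy illustration only: every class in it (`Tokenizer`, `MeanLengthEstimator`, `MeanLengthModel`, and the simplified `Pipeline`/`PipelineModel`) is a hypothetical stand-in written for this example, not the `pyspark.ml` API, and a "DataFrame" is assumed to be a plain list of dict rows. It shows roughly how an Estimator's `fit` yields a Transformer and how a Pipeline chains stages.

```python
# Toy, dependency-free sketch of the Pipeline concepts.
# All classes here are hypothetical stand-ins, NOT the pyspark.ml classes;
# a "DataFrame" is modelled as a list of dict rows.


class Transformer:
    """Maps one dataset to another dataset."""

    def transform(self, rows):
        raise NotImplementedError


class Estimator:
    """Learns from a dataset and returns a Transformer (the fitted model)."""

    def fit(self, rows):
        raise NotImplementedError


class Tokenizer(Transformer):
    """Stateless Transformer: derives a 'words' column from the 'text' column."""

    def transform(self, rows):
        return [{**r, "words": r["text"].lower().split()} for r in rows]


class MeanLengthModel(Transformer):
    """Fitted model: predicts 1.0 when a row has more words than the learned mean."""

    def __init__(self, mean_len):
        self.mean_len = mean_len

    def transform(self, rows):
        return [
            {**r, "prediction": float(len(r["words"]) > self.mean_len)}
            for r in rows
        ]


class MeanLengthEstimator(Estimator):
    """Estimator: learns the mean word count and returns a MeanLengthModel."""

    def fit(self, rows):
        mean_len = sum(len(r["words"]) for r in rows) / len(rows)
        return MeanLengthModel(mean_len)


class PipelineModel(Transformer):
    """Fitted pipeline: applies every fitted stage in order."""

    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class Pipeline(Estimator):
    """Chains Transformers and Estimators; fitting it yields a PipelineModel."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            if isinstance(stage, Estimator):
                stage = stage.fit(rows)  # an Estimator becomes a Transformer
            fitted.append(stage)
            rows = stage.transform(rows)  # feed the output to the next stage
        return PipelineModel(fitted)


training = [{"text": "spark"}, {"text": "a b c"}]
model = Pipeline([Tokenizer(), MeanLengthEstimator()]).fit(training)
predictions = [r["prediction"] for r in model.transform(training)]
```

In this sketch the pipeline itself is an Estimator and the fitted pipeline is a Transformer, which mirrors the definitions in the list above; the real Spark classes add parameter handling and persistence on top of this idea.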

Reference links
https://stackoverflow.com/questions/51415784/how-to-add-my-own-function-as-a-custom-stage-in-a-ml-pyspark-pipeline

For writing a custom Transformer, see
https://stackoverflow.com/questions/32331848/create-a-custom-transformer-in-pyspark-ml

For serializing a custom Transformer, see
https://stackoverflow.com/questions/41399399/serialize-a-custom-transformer-using-python-to-be-used-within-a-pyspark-ml-pipel

For writing a custom Estimator, see
https://stackoverflow.com/questions/37270446/how-to-create-a-custom-estimator-in-pyspark
