AWS Certified Machine Learning

Author: 数科每日 | Published: 2020-11-26 12:47

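The explanations here are not strictly accurate; they are simplified so they are easy to remember and help you pass the certification exam. For rigorous definitions, consult the official AWS documentation.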


1. Amazon Kinesis Data Streams

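Delivers streaming data to analytics consumers with low latency and a high degree of customization. If a question mentions streaming data that is analyzed directly as it arrives, this option is a strong candidate. Take care to distinguish it from Kinesis Data Firehose.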

Example

  1. Your organization is looking for a solution that can help the business with streaming data. Several services will require access to read and process the same stream concurrently. What AWS service meets the business requirements?
    A. Amazon Kinesis Firehose
    B. Amazon Kinesis Streams
    C. Amazon CloudFront
    D. Amazon SQS

  2. Your application generates a 1 KB JSON payload that needs to be queued and delivered to EC2 instances for applications. At the end of the day, the application needs to replay the data for the past 24 hours. In the near future, you also need the ability for other multiple EC2 applications to consume the same stream concurrently. What is the best solution for this?
    A. Kinesis Data Streams
    B. Kinesis Firehose
    C. SNS
    D. SQS
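
Both questions above turn on two properties: several consumers reading the same records concurrently, and replaying records that were already read. The sketch below is a toy in-memory model of that difference, not the Kinesis or SQS API; `ToyStream` and every name in it are made up for illustration, and real queues have more nuance than this.

```python
from collections import deque


class ToyStream:
    """In-memory stand-in for a stream: records are retained and each
    reader keeps its own position (hypothetical class, not an AWS API)."""

    def __init__(self):
        self.records = []

    def put(self, record):
        self.records.append(record)

    def read(self, position):
        """Return all records from `position` onward and the next position."""
        return self.records[position:], len(self.records)


stream = ToyStream()
queue = deque()
for payload in ("r1", "r2", "r3"):
    stream.put(payload)
    queue.append(payload)

# Two independent consumers each see the full stream.
consumer_a, pos_a = stream.read(0)
consumer_b, pos_b = stream.read(0)

# Replay: read again from position 0 and get the same records.
replayed, _ = stream.read(0)

# A simplified queue hands each message to one worker, then it is gone.
worker_a = [queue.popleft() for _ in range(len(queue))]
worker_b = list(queue)  # nothing left for a second reader
```

In this simplified picture, retained records plus per-reader positions are what make concurrent reads and replay possible.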

2. Amazon Kinesis Firehose

Stores streaming data in an AWS destination such as S3, Elasticsearch Service, or Redshift. Subsequent analysis runs on the stored data.

Example

  1. Your organization needs to ingest a big data stream into their data lake on Amazon S3. The data may stream in at a rate of hundreds of megabytes per second. What AWS service will accomplish the goal with the least amount of management?
    A. Amazon Kinesis Firehose
    B. Amazon Kinesis Streams
    C. Amazon CloudFront
    D. Amazon SQS
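
As a rough illustration of how little the producer has to manage, here is a minimal sketch of sending events to a delivery stream with boto3's `put_record_batch`. The delivery stream name is hypothetical, `send` needs boto3 and AWS credentials, and the sketch does not retry records that the service reports as failed.

```python
import json


def to_firehose_records(events):
    """Encode each event as one newline-terminated JSON line."""
    return [{"Data": (json.dumps(e) + "\n").encode("utf-8")} for e in events]


def chunked(records, size=500):
    """Split records into groups of at most `size`
    (PutRecordBatch accepts up to 500 records per call)."""
    return [records[i:i + size] for i in range(0, len(records), size)]


def send(events, delivery_stream="example-delivery-stream"):
    """Send events to Firehose; the stream name is an assumption."""
    import boto3  # imported lazily so the helpers above work without it

    client = boto3.client("firehose")
    for batch in chunked(to_firehose_records(events)):
        client.put_record_batch(DeliveryStreamName=delivery_stream, Records=batch)


records = to_firehose_records([{"id": 1}, {"id": 2}])
```

The destination (for example an S3 bucket) is configured on the delivery stream itself, so the producer code never mentions it.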

Reference
AWS Kinesis Data Streams vs Kinesis Data Firehose

3. Protobuf RecordIO Format

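Protobuf RecordIO is the data format that AWS repeatedly highlights as a way to speed up training. If a question involves data formats or training speed, look closely at any answer that mentions Protobuf RecordIO. Based on the AWS documentation, I compiled a table of content types versus built-in algorithms. It has two fewer rows than the AWS table, which makes it easier to memorize for the exam.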

ContentType | Algorithm
--- | ---
application/x-image, image/jpeg, image/png | Object Detection Algorithm, Semantic Segmentation
application/x-recordio | Object Detection Algorithm
application/x-recordio-protobuf, text/csv | K-Means, k-NN, Latent Dirichlet Allocation, Linear Learner, NTM, PCA, RCF
application/x-recordio-protobuf | Factorization Machines, Sequence-to-Sequence
text/csv, text/libsvm | XGBoost
application/jsonlines | BlazingText, DeepAR
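
To show where the content type actually appears, the sketch below builds one input channel in the shape used by the SageMaker `CreateTrainingJob` request (`InputDataConfig`). The lookup dictionary copies only a few rows of the table, and the bucket name and the `training_channel` helper are assumptions made for illustration.

```python
# A few rows of the content-type table, keyed by algorithm.
CONTENT_TYPES = {
    "K-Means": ["application/x-recordio-protobuf", "text/csv"],
    "Linear Learner": ["application/x-recordio-protobuf", "text/csv"],
    "PCA": ["application/x-recordio-protobuf", "text/csv"],
    "Factorization Machines": ["application/x-recordio-protobuf"],
    "XGBoost": ["text/csv", "text/libsvm"],
}


def training_channel(algorithm, s3_uri, content_type, input_mode="File"):
    """Build one InputDataConfig entry after checking the table.
    Hypothetical helper; only the returned dict follows the AWS request shape."""
    if content_type not in CONTENT_TYPES[algorithm]:
        raise ValueError(f"{algorithm} is not listed with {content_type}")
    return {
        "ChannelName": "train",
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }
        },
        "ContentType": content_type,
        "InputMode": input_mode,  # "Pipe" streams the data instead of copying it first
    }


channel = training_channel(
    "Linear Learner",
    "s3://example-bucket/train/",  # hypothetical bucket
    "application/x-recordio-protobuf",
    input_mode="Pipe",
)
```

Pairing `application/x-recordio-protobuf` with Pipe input mode is the combination the second reference below discusses.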

Reference

  1. Content Types Supported by Built-In Algorithms

  2. Using Pipe input mode for Amazon SageMaker algorithms

4. Amazon QuickSight

A BI tool that provides dashboards.

1. Amazon QuickSight ML Insights

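A data visualization tool that AWS tailored for ML. Because it is an output channel for ML results and a product AWS promotes heavily, it is very likely to appear on the exam.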

https://aws.amazon.com/quicksight/features-ml/?nc=sn&loc=2&dn=2

5. SageMaker Built-in ML Algorithms

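Algorithm | Comments
--- | ---
BlazingText | Word2vec; text classification
DeepAR Forecasting | RNN-based forecasting for one-dimensional (scalar) time series; supervised
Factorization Machines | Finds interactions between features in high-dimensional sparse data
Image Classification Algorithm | Image classification; supervised
IP Insights | Unsupervised; analyzes data associated with IPv4 addresses (usage patterns)
K-Means Algorithm | Groups data into discrete clusters; unsupervised
K-Nearest Neighbors (k-NN) Algorithm | Classifies using already labeled data; supervised (unlike K-means)
Latent Dirichlet Allocation (LDA) | Unsupervised; groups documents by topic
Linear Learner Algorithm | Supervised; linear classification and regression
Neural Topic Model (NTM) Algorithm | Unsupervised; groups documents by topic
Object2Vec | Supervised; used for feature engineering, replacing high-dimensional features with dense low-dimensional embeddings
Object Detection Algorithm | Supervised; detects objects in images
Principal Component Analysis (PCA) Algorithm | Unsupervised; dimensionality reduction
Random Cut Forest (RCF) Algorithm | Unsupervised; anomaly detection
Semantic Segmentation | Image processing; fine-grained (pixel-level)
Sequence to Sequence (seq2seq) | Supervised; sequence generation, not limited to text
XGBoost Algorithm | Supervised; regression, classification, ranking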
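
K-Means and k-NN are easy to mix up on the exam, so here is a toy one-dimensional sketch of the difference. It is plain Python written for illustration, not the SageMaker implementation of either algorithm, and the data points are made up.

```python
from collections import Counter


def knn_predict(labeled, x, k=3):
    """Supervised: vote among the k labeled points closest to x."""
    nearest = sorted(labeled, key=lambda p: abs(p[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def kmeans_1d(points, iterations=10):
    """Unsupervised: split unlabeled points into two groups around moving centers."""
    centers = [min(points), max(points)]  # simple deterministic start
    assignment = []
    for _ in range(iterations):
        # Assign each point to its nearest center.
        assignment = [min((0, 1), key=lambda c: abs(p - centers[c])) for p in points]
        # Move each center to the mean of its members.
        for c in (0, 1):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = sum(members) / len(members)
    return assignment


labeled = [(1.0, "low"), (1.2, "low"), (5.0, "high"), (5.3, "high")]
prediction = knn_predict(labeled, 1.1)               # uses the labels
groups = kmeans_1d([value for value, _ in labeled])  # ignores the labels
```

k-NN needs the labels to answer anything; K-Means never sees them and only returns group indices.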

Reference

  1. Use Amazon SageMaker Built-in Algorithms

  2. Sequence-to-Sequence Algorithm

  3. XGBoost Algorithm

6. How SageMaker Reads Training Data

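For exam purposes, SageMaker can only read training data from S3, so it cannot read directly from sources such as databases. Data that lives elsewhere must first be stored in S3; the Glue service can help with that.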
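
As a small-scale illustration of "export first, then train", the sketch below pulls rows from a database and writes them as a CSV object for S3. An in-memory SQLite table stands in for the real database, the bucket and key are hypothetical, and `upload` needs boto3 and AWS credentials. The CSV has no header and the label in the first column, which is the layout the built-in algorithms expect for `text/csv` training data.

```python
import csv
import io
import sqlite3


def rows_to_csv(rows):
    """Serialize rows as header-less CSV with the label in the first column."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


def upload(body, bucket="example-bucket", key="train/data.csv"):
    """Store the CSV in S3; bucket and key are assumptions."""
    import boto3  # imported lazily so the export step works without it

    boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=body.encode("utf-8"))


# SQLite stands in for whatever database actually holds the data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (label INTEGER, f1 REAL, f2 REAL)")
conn.executemany(
    "INSERT INTO samples VALUES (?, ?, ?)",
    [(1, 0.5, 2.0), (0, 1.5, 3.0)],
)
rows = conn.execute("SELECT label, f1, f2 FROM samples ORDER BY rowid").fetchall()
csv_body = rows_to_csv(rows)
```

For large or recurring exports a managed service such as Glue would normally do this work instead of a script.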


7. AWS ML Related Services

  • Kinesis : Amazon Kinesis makes it easy to collect, process, and analyze real-time, streaming data so you can get timely insights and react quickly to new information. Amazon Kinesis offers key capabilities to cost-effectively process streaming data at any scale, along with the flexibility to choose the tools that best suit the requirements of your application.
  • Athena : Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.
  • Glue : AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning, and application development.
  • QuickSight : Amazon QuickSight is a scalable, serverless, embeddable, machine learning-powered business intelligence (BI) service built for the cloud.
  • Redshift : No other data warehouse makes it as easy to gain new insights from all your data. With Redshift, you can query and combine exabytes of structured and semi-structured data across your data warehouse, operational database, and data lake using standard SQL.
  • Lex : Amazon Lex is a service for building conversational interfaces into any application using voice and text.
  • Polly : Amazon Polly is a service that turns text into lifelike speech, allowing you to create applications that talk, and build entirely new categories of speech-enabled products.
  • Transcribe : Amazon Transcribe makes it easy for developers to add speech-to-text capabilities to their applications. Audio data is virtually impossible for computers to search and analyze, so recorded speech needs to be converted to text before applications can use it.
