AWS Certified Machine Learning

Author: 数科每日 | Published: 2020-11-26 12:47

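The explanations here are not rigorous. They are simplified so they are easy to remember and help you pass the certification exam. For precise definitions, refer to the official AWS documentation.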

1. AWS Kinesis Data Streams

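Delivers streaming data to downstream analytics with low latency and a high degree of customization. Multiple applications can read the same stream concurrently, and records can be re-read (replayed) for as long as they are retained (24 hours by default). If an exam question mentions stream data that is analyzed directly, suspect this option first. Be careful to distinguish it from Kinesis Data Firehose.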

Examples

Your organization is looking for a solution that can help the business with streaming data. Several services will require access to read and process the same stream concurrently. What AWS service meets the business requirements?
A. Amazon Kinesis Firehose
B. Amazon Kinesis Streams
C. Amazon CloudFront
D. Amazon SQS

Your application generates a 1 KB JSON payload that needs to be queued and delivered to EC2 instances for applications. At the end of the day, the application needs to replay the data for the past 24 hours. In the near future, you also need the ability for multiple other EC2 applications to consume the same stream concurrently. What is the best solution for this?
A. Kinesis Data Streams
B. Kinesis Firehose
C. SNS
D. SQS
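To make the "multiple concurrent consumers" and "replay" points concrete, here is a toy, in-memory model of a single shard. It is a sketch under simplifying assumptions, not the Kinesis API: the class names ToyShard and Consumer are hypothetical, time is a plain number of seconds, and only the 24-hour default retention is modeled. The idea it shows is that reading does not remove records, so each consumer keeps its own position and can rewind to replay retained data.

```python
class ToyShard:
    """Hypothetical in-memory stand-in for one Kinesis shard (not the AWS API)."""

    def __init__(self, retention_seconds=24 * 3600):
        self.retention_seconds = retention_seconds  # mirrors the 24-hour default
        self.records = []  # list of (sequence_number, timestamp, data)
        self.next_seq = 0

    def put_record(self, data, now):
        """Append a record and return its sequence number."""
        self._trim(now)
        seq = self.next_seq
        self.records.append((seq, now, data))
        self.next_seq += 1
        return seq

    def read_from(self, sequence_number, now):
        """Return retained records at or after sequence_number; nothing is deleted."""
        self._trim(now)
        return [r for r in self.records if r[0] >= sequence_number]

    def _trim(self, now):
        # Records older than the retention window can no longer be read.
        cutoff = now - self.retention_seconds
        self.records = [r for r in self.records if r[1] >= cutoff]


class Consumer:
    """Each consumer tracks its own read position, independent of other consumers."""

    def __init__(self, shard):
        self.shard = shard
        self.position = 0

    def poll(self, now):
        batch = self.shard.read_from(self.position, now)
        if batch:
            self.position = batch[-1][0] + 1
        return [data for _, _, data in batch]

    def seek(self, sequence_number):
        """Rewind (or skip ahead) to replay retained records."""
        self.position = sequence_number


# Two applications read the same stream concurrently.
shard = ToyShard()
for i, payload in enumerate([b'{"id": 1}', b'{"id": 2}', b'{"id": 3}']):
    shard.put_record(payload, now=float(i))

analytics, archiver = Consumer(shard), Consumer(shard)
print(analytics.poll(now=10.0))  # all three records
print(archiver.poll(now=10.0))   # the same three records
analytics.seek(0)                # replay from the oldest retained record
print(analytics.poll(now=20.0))  # all three records again
```

By contrast, an SQS message is removed once a consumer deletes it after processing, which is why SQS does not fit the "replay" and "many concurrent readers" requirements in the questions above.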

2. Amazon Kinesis Firehose

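Loads stream data into a destination on AWS, such as S3, Elasticsearch Service, or Redshift. It is fully managed and delivers the data in buffered batches, so subsequent analysis runs on the stored data rather than on the live stream.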

Example

Your organization needs to ingest a big data stream into their data lake on Amazon S3. The data may stream in at a rate of hundreds of megabytes per second. What AWS service will accomplish the goal with the least amount of management?
A. Amazon Kinesis Firehose
B. Amazon Kinesis Streams
C. Amazon CloudFront
D. Amazon SQS

Reference
AWS Kinesis Data Streams vs Kinesis Data Firehose
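The "store first, analyze later" behavior comes from buffering: Firehose collects incoming records and writes them to the destination once a buffer size or a buffer interval is reached, whichever comes first. The sketch below models only that rule. ToyFirehose, its byte-based thresholds, and the list standing in for S3 are hypothetical simplifications; the real service sets the size in MB and the interval in seconds and does much more, such as compression.

```python
class ToyFirehose:
    """Hypothetical model of Firehose-style buffering (not the AWS API).

    Records accumulate in a buffer; the buffer is written to the destination
    as one object when EITHER the size threshold OR the time threshold is
    reached, whichever comes first.
    """

    def __init__(self, buffer_bytes, buffer_seconds):
        self.buffer_bytes = buffer_bytes      # stands in for the buffer size setting
        self.buffer_seconds = buffer_seconds  # stands in for the buffer interval setting
        self.buffer = []
        self.buffer_started = None
        self.delivered = []  # stand-in for objects written to S3

    def put_record(self, data, now):
        if self.buffer_started is None:
            self.buffer_started = now
        self.buffer.append(data)
        self._maybe_flush(now)

    def tick(self, now):
        """Called periodically so the time threshold can fire even with no new data."""
        self._maybe_flush(now)

    def _maybe_flush(self, now):
        if not self.buffer:
            return
        size = sum(len(r) for r in self.buffer)
        if size >= self.buffer_bytes or now - self.buffer_started >= self.buffer_seconds:
            self.delivered.append(b"".join(self.buffer))  # one object per flush
            self.buffer = []
            self.buffer_started = None


fh = ToyFirehose(buffer_bytes=10, buffer_seconds=60)
fh.put_record(b"12345", now=0)
fh.put_record(b"67890", now=1)  # size threshold reached: flushed immediately
fh.put_record(b"abc", now=2)
fh.tick(now=62)                 # interval threshold reached: flushed
print(fh.delivered)             # [b'1234567890', b'abc']
```

This buffering is why Firehose is described as near-real-time, and why it needs so little management compared with Data Streams: there are no consumers of your own to run.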

3. Protobuf RecordIO Format

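Protobuf RecordIO is the data format AWS repeatedly recommends for faster training. The AWS documentation notes that it lets algorithms that support it train in Pipe mode, streaming data from S3 instead of downloading it first. If a question involves data formats or training speed, look closely at any answer that mentions Protobuf RecordIO. Based on the AWS documentation, I compiled the following table of content types vs. built-in algorithms. It has two fewer rows than the AWS version, which makes it easier to memorize for the exam.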

Content type	Built-in algorithms
application/x-image, image/jpeg, image/png	Object Detection Algorithm, Semantic Segmentation
application/x-recordio	Object Detection Algorithm
application/x-recordio-protobuf, text/csv	K-Means, k-NN, Latent Dirichlet Allocation, Linear Learner, NTM, PCA, RCF
application/x-recordio-protobuf	Factorization Machines, Sequence-to-Sequence
text/csv, text/libsvm	XGBoost
application/jsonlines	BlazingText, DeepAR

Reference
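As a rough picture of what the "RecordIO" part adds, here is a pure-Python sketch of the record framing: each record is prefixed with a 4-byte magic number and a 4-byte length word, then padded to a 4-byte boundary, so a reader can stream through a file record by record. Assumptions: little-endian byte order, records small enough that the continuation-flag bits in the length word are zero, and arbitrary bytes in place of the serialized protobuf messages SageMaker actually stores. In practice you would let the SageMaker Python SDK do the conversion (for example, write_numpy_to_dense_tensor in sagemaker.amazon.common) rather than hand-writing the format.

```python
import struct

KMAGIC = 0xCED7230A      # RecordIO magic number that starts every record
MAX_LEN = (1 << 29) - 1  # upper 3 bits of the length word hold a continuation flag


def write_recordio(payloads):
    """Frame payloads as: magic | length | data | zero padding to a 4-byte boundary."""
    out = bytearray()
    for data in payloads:
        if len(data) > MAX_LEN:
            raise ValueError("multi-chunk records are not handled in this sketch")
        out += struct.pack("<II", KMAGIC, len(data))  # assumes little-endian
        out += data
        out += b"\x00" * (-len(data) % 4)
    return bytes(out)


def read_recordio(buf):
    """Walk the buffer record by record, checking the magic number each time."""
    payloads, offset = [], 0
    while offset < len(buf):
        magic, lrec = struct.unpack_from("<II", buf, offset)
        if magic != KMAGIC:
            raise ValueError(f"bad magic number at offset {offset}")
        if lrec >> 29:
            raise ValueError("continuation chunks are not handled in this sketch")
        length = lrec & MAX_LEN
        offset += 8
        payloads.append(buf[offset:offset + length])
        offset += length + (-length % 4)
    return payloads


framed = write_recordio([b"abc", b"12345678"])
print(len(framed))            # (8 + 3 + 1 padding) + (8 + 8) = 28
print(read_recordio(framed))  # [b'abc', b'12345678']
```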

4. Amazon QuickSight

A BI tool for building dashboards.

1. Amazon QuickSight ML Insights

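QuickSight's ML-powered data visualization features (for example, anomaly detection, forecasting, and auto-generated narratives). Because it is an outlet for ML results and a product AWS actively promotes, it is likely to appear on the exam.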

https://aws.amazon.com/quicksight/features-ml/?nc=sn&loc=2&dn=2

5. SageMaker Built-in ML Algorithms

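Algorithm	Comments
BlazingText	Word2vec, text classification
DeepAR Forecasting	Supervised; RNN-based forecasting of one-dimensional (scalar) time series
Factorization Machines	Supervised; captures feature interactions in high-dimensional sparse data
Image Classification Algorithm	Supervised; image classification
IP Insights	Unsupervised; learns associations between IPv4 addresses and entities such as user IDs, e.g. to flag unusual logins
K-Means Algorithm	Unsupervised; clusters data into k discrete groups
K-Nearest Neighbors (k-NN) Algorithm	Supervised; predicts from already-labeled neighbors (unlike k-means)
Latent Dirichlet Allocation (LDA)	Unsupervised; topic modeling (groups documents by topic)
Linear learner algorithm	Supervised; linear classification and regression
Neural Topic Model (NTM) Algorithm	Unsupervised; topic modeling (groups documents by topic)
Object2Vec	Supervised; feature engineering that replaces high-dimensional features with dense, low-dimensional embeddings
Object Detection Algorithm	Supervised; detects objects in images
Principal Component Analysis (PCA) Algorithm	Unsupervised; dimensionality reduction
Random Cut Forest (RCF) Algorithm	Unsupervised; anomaly detection
Semantic Segmentation	Supervised; fine-grained (pixel-level) image labeling
Sequence to Sequence (seq2seq)	Supervised; sequence generation, not limited to text
XGBoost Algorithm	Supervised; regression, classification, ranking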

Reference
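The table's note that k-NN is supervised "unlike k-means" is a common exam trap, so here is a toy pure-Python contrast. These are minimal illustrations, not SageMaker's implementations: kmeans uses naive seeding (the first k points) and a fixed number of iterations, and knn_predict does a brute-force majority vote. The point is the inputs: kmeans never sees a label, while knn_predict cannot work without them.

```python
import math
from collections import Counter


def kmeans(points, k, iterations=10):
    """Unsupervised: groups unlabeled points around k centroids."""
    centroids = list(points[:k])  # naive seeding; real implementations use e.g. k-means++
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids


def knn_predict(labeled, query, k=3):
    """Supervised: majority label among the k labeled points nearest to query."""
    nearest = sorted(labeled, key=lambda pair: math.dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


points = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
print(kmeans(points, k=2))  # two centroids, near (0.33, 0.33) and (10.33, 10.33)

labeled = [((0, 0), "A"), ((0, 1), "A"), ((1, 0), "A"),
           ((10, 10), "B"), ((10, 11), "B"), ((11, 10), "B")]
print(knn_predict(labeled, (9, 9)))  # B
```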

6. How SageMaker Reads Training Data

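For exam purposes, remember that SageMaker reads training data from S3. If the data lives elsewhere (for example in a database, which SageMaker cannot read from directly), copy it into S3 first; AWS Glue can help with this. Strictly speaking, training jobs can also read from Amazon EFS and Amazon FSx for Lustre, but S3 is by far the most common source.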
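When a training job is created through the API, the S3 location is specified per input channel. The helper below is a hypothetical sketch that only builds one entry of the InputDataConfig list accepted by the SageMaker CreateTrainingJob API (for example via boto3's create_training_job). It makes no AWS calls, the bucket name is made up, and a real request needs several other parameters (execution role, algorithm, output location, instance resources).

```python
def s3_training_channel(channel_name, s3_uri, content_type, input_mode="File"):
    """Build one InputDataConfig channel that points a training job at S3 data.

    Hypothetical helper: it only assembles the dictionary and calls no AWS API.
    """
    if not s3_uri.startswith("s3://"):
        # Data in a database or elsewhere must be copied to S3 first (e.g. with AWS Glue).
        raise ValueError(f"expected an s3:// URI, got {s3_uri!r}")
    if input_mode not in ("File", "Pipe"):  # this sketch only accepts these two modes
        raise ValueError(f"unsupported input mode: {input_mode!r}")
    return {
        "ChannelName": channel_name,
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",  # use every object under this prefix
                "S3Uri": s3_uri,
                "S3DataDistributionType": "FullyReplicated",  # each instance gets all the data
            }
        },
        "ContentType": content_type,
        "InputMode": input_mode,  # "Pipe" streams the data instead of downloading it first
    }


channel = s3_training_channel("train", "s3://example-bucket/churn/train/", "text/csv")
print(channel["DataSource"]["S3DataSource"]["S3Uri"])
```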

7. AWS ML Related Services

Kinesis : Amazon Kinesis makes it easy to collect, process, and analyze real-time, streaming data so you can get timely insights and react quickly to new information. Amazon Kinesis offers key capabilities to cost-effectively process streaming data at any scale, along with the flexibility to choose the tools that best suit the requirements of your application.
Athena : Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.
Glue : AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning, and application development.
QuickSight : Amazon QuickSight is a scalable, serverless, embeddable, machine learning-powered business intelligence (BI) service built for the cloud.
Redshift : Amazon Redshift is a data warehouse. With Redshift, you can query and combine exabytes of structured and semi-structured data across your data warehouse, operational database, and data lake using standard SQL.
Lex : Amazon Lex is a service for building conversational interfaces into any application using voice and text.
Polly : Amazon Polly is a service that turns text into lifelike speech, allowing you to create applications that talk, and build entirely new categories of speech-enabled products.
Transcribe : Amazon Transcribe makes it easy for developers to add speech-to-text capabilities to their applications. Because audio is very hard for computers to search and analyze directly, transcribing it to text first makes it searchable and analyzable.

Article link: https://www.haomeiwen.com/subject/ewraiktx.html