AWS Kinesis & Glue

Author: Lyudmilalala | Published 2021-08-17 23:37

Amazon Kinesis Data Streams

Amazon Kinesis Data Streams enables you to build custom applications that process or analyze streaming data for specialized needs.

Amazon Kinesis Data Streams synchronously replicates data across three Availability Zones, providing high availability and data durability.
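A producer writes each record with a partition key (the `PartitionKey` parameter of the `PutRecord` API). Kinesis Data Streams hashes that key with MD5 to choose the shard that stores the record. The sketch that follows is not from the original post. It models this routing in plain Python under two assumptions. First, the digest is read as a big-endian unsigned integer. Second, the stream's hash key space is split evenly across its shards, as in a stream that has never been resharded. The function names are hypothetical.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def hash_key(partition_key: str) -> int:
    """Map a partition key to a 128-bit integer with MD5.

    Assumptions: the key is UTF-8 encoded and the digest is read big-endian.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, byteorder="big")


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Return the index of the shard whose hash range contains the key's hash.

    Assumes the hash key space is divided into `shard_count` equal ranges.
    """
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    return hash_key(partition_key) * shard_count // HASH_SPACE


if __name__ == "__main__":
    for key in ["user-1", "user-2", "user-1"]:
        print(key, "->", shard_for_key(key, 4))
```

The shard is a pure function of the key. Records that share a partition key therefore land on the same shard and keep their relative order. Choosing keys with many distinct values spreads the load across shards.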

Amazon Kinesis Data Firehose

Amazon Kinesis Data Firehose is the easiest way to load streaming data into data stores and analytics tools. It can capture, transform, and load streaming data into Amazon S3, Amazon Redshift, Amazon Elasticsearch Service, and Splunk, enabling near real-time analytics with existing business intelligence tools and dashboards you’re already using today.

It is a fully managed service that automatically scales to match the throughput of your data and requires no ongoing administration.

It can also batch, compress, and encrypt the data before loading it, minimizing the amount of storage used at the destination and increasing security.
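Batching in Firehose is controlled by buffering hints on the delivery stream: a buffer size and a buffer interval. Data is delivered when either threshold is reached, whichever comes first. The toy model that follows, in plain Python, illustrates that rule and gzip-compresses each delivered batch. The class name is hypothetical, and starting the interval clock at the first buffered record is an assumption made for illustration. Neither reflects Firehose internals.

```python
import gzip
from dataclasses import dataclass, field
from typing import List


@dataclass
class BufferSketch:
    """Toy model of size-or-interval buffering; not Firehose's implementation."""

    size_limit_bytes: int      # stands in for the buffer-size hint
    interval_seconds: float    # stands in for the buffer-interval hint
    delivered: List[bytes] = field(default_factory=list)  # gzip-compressed batches
    _buffer: List[bytes] = field(default_factory=list)
    _buffered_bytes: int = 0
    _opened_at: float = 0.0

    def add(self, record: bytes, now: float) -> None:
        """Buffer one record at time `now` (seconds), flushing if a limit is hit."""
        if not self._buffer:
            self._opened_at = now  # assumption: interval starts at the first record
        self._buffer.append(record)
        self._buffered_bytes += len(record)
        self._maybe_flush(now)

    def tick(self, now: float) -> None:
        """Advance the clock with no new data; flushes if the interval expired."""
        self._maybe_flush(now)

    def _maybe_flush(self, now: float) -> None:
        if not self._buffer:
            return
        size_reached = self._buffered_bytes >= self.size_limit_bytes
        interval_reached = now - self._opened_at >= self.interval_seconds
        if size_reached or interval_reached:
            # Concatenate the batch and compress it, roughly like a GZIP-enabled S3 delivery.
            self.delivered.append(gzip.compress(b"".join(self._buffer)))
            self._buffer = []
            self._buffered_bytes = 0
```

Larger buffers mean fewer, bigger objects at the destination. Shorter intervals mean lower delivery latency. The two hints trade off against each other.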

Amazon Kinesis Data Firehose synchronously replicates data across three facilities in an AWS Region, providing high availability and durability for the data as it is transported to the destinations.

delivery stream - A delivery stream is the underlying entity of Amazon Kinesis Data Firehose. You use Firehose by creating a delivery stream and then sending data to it.
record - A record is the data of interest your data producer sends to a delivery stream.
destination - A destination is the data store where your data will be delivered.
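These three concepts meet in a producer: records (byte payloads) are sent to a named delivery stream, which forwards them to its configured destination. The sketch that follows assumes a boto3 Firehose client (`boto3.client("firehose")`) whose `put_record_batch` call takes `DeliveryStreamName` and a list of `{"Data": ...}` records. The limits of 500 records and 4 MB per call reflect the documented PutRecordBatch quotas at the time of writing and should be checked against current documentation. `chunk_records` and `send_to_delivery_stream` are hypothetical helper names.

```python
from typing import Iterable, Iterator, List

MAX_RECORDS_PER_BATCH = 500            # PutRecordBatch record-count limit
MAX_BYTES_PER_BATCH = 4 * 1024 * 1024  # PutRecordBatch size limit (4 MB, treated as 4 MiB here)


def chunk_records(records: Iterable[bytes]) -> Iterator[List[bytes]]:
    """Group records into batches that respect both per-call limits."""
    batch: List[bytes] = []
    batch_bytes = 0
    for data in records:
        if batch and (len(batch) == MAX_RECORDS_PER_BATCH
                      or batch_bytes + len(data) > MAX_BYTES_PER_BATCH):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(data)
        batch_bytes += len(data)
    if batch:
        yield batch


def send_to_delivery_stream(client, stream_name: str, records: Iterable[bytes]) -> List[bytes]:
    """Send records to a delivery stream and return the payloads Firehose rejected.

    `client` is expected to behave like boto3.client("firehose").
    """
    failed: List[bytes] = []
    for batch in chunk_records(records):
        response = client.put_record_batch(
            DeliveryStreamName=stream_name,
            Records=[{"Data": data} for data in batch],
        )
        if response["FailedPutCount"]:
            # RequestResponses is positional: entry i describes batch[i].
            for data, result in zip(batch, response["RequestResponses"]):
                if "ErrorCode" in result:
                    failed.append(data)
    return failed
```

The client is passed in as a parameter, so the helper can be tested with a stub. It returns the rejected payloads because a PutRecordBatch call can partially fail while still returning successfully. The caller decides whether to retry them.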

Amazon Kinesis Data Analytics

Amazon Kinesis Data Analytics is the easiest way to transform and analyze streaming data in real time with Apache Flink. Apache Flink is an open source framework and engine for processing data streams. Amazon Kinesis Data Analytics reduces the complexity of building, managing, and integrating Apache Flink applications with other AWS services.
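Analyzing a stream in real time usually means aggregating over windows of time. The plain-Python sketch that follows shows a tumbling-window count (fixed, non-overlapping windows) over `(timestamp, key)` events. It does not use the Flink API. The function name and event shape are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def tumbling_window_counts(
    events: Iterable[Tuple[float, str]], window_seconds: float
) -> Dict[float, Counter]:
    """Count events per key in fixed, non-overlapping (tumbling) windows.

    Each (event_time_seconds, key) pair is assigned to the window that starts
    at floor(event_time / window_seconds) * window_seconds.
    """
    windows: Dict[float, Counter] = defaultdict(Counter)
    for event_time, key in events:
        window_start = (event_time // window_seconds) * window_seconds
        windows[window_start][key] += 1
    return dict(windows)
```

This version runs over a finite list. A Flink job instead processes an unbounded stream and emits each window's result once the window closes. For event-time windows, that happens when the watermark passes the window's end, and late-arriving events need separate handling.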

Amazon Kinesis Client Library

AWS Glue

AWS Glue is a serverless Apache Spark-based data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning, and application development.
AWS Glue consists of:

  • the AWS Glue Data Catalog, a central metadata repository that describes data discovered in Amazon S3, Amazon RDS, Amazon Redshift, and other sources, and that Amazon Athena, Amazon EMR, and Amazon Redshift Spectrum can use to query that data;
  • an ETL engine that can automatically generate Scala or Python code;
  • a flexible scheduler that handles dependency resolution, job monitoring, and retries;
  • AWS Glue DataBrew, for cleaning and normalizing data with a visual interface;
  • AWS Glue Elastic Views, for combining and replicating data across multiple data stores.
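The scheduler's dependency resolution and retries can be illustrated in a few lines: run jobs in an order that respects their dependencies, and retry a failing job a bounded number of times. This conceptual sketch uses Python's `graphlib.TopologicalSorter` (Python 3.9+) and is not how AWS Glue implements its scheduler. In Glue itself, dependencies are configured with triggers and workflows, and retries with a job's maximum-retries setting. The function `run_jobs` and job names such as `crawl` and `transform` are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_jobs(
    dependencies: Dict[str, Set[str]],
    run_job: Callable[[str], bool],
    max_retries: int = 2,
) -> List[str]:
    """Run jobs in dependency order, retrying each failure up to max_retries times.

    `dependencies` maps a job to the set of jobs that must succeed first.
    `run_job` returns True on success. Returns jobs in the order they succeeded;
    raises RuntimeError if a job still fails after all attempts.
    """
    completed: List[str] = []
    for job in TopologicalSorter(dependencies).static_order():
        for _ in range(1 + max_retries):
            if run_job(job):
                completed.append(job)
                break
        else:  # no attempt succeeded
            raise RuntimeError(f"{job} failed after {1 + max_retries} attempts")
    return completed
```

A dependency cycle makes `static_order()` raise `graphlib.CycleError`, which surfaces a misconfigured pipeline before any job runs.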

When should I use AWS Glue Streaming and when should I use Amazon Kinesis Data Analytics?

Both AWS Glue and Amazon Kinesis Data Analytics can be used to process streaming data. AWS Glue is recommended when your use cases are primarily ETL and when you want to run jobs on a serverless Apache Spark-based platform. Amazon Kinesis Data Analytics is recommended when your use cases are primarily building sophisticated streaming applications to analyze streaming data in real time and when you want to run jobs on a serverless Apache Flink-based platform.


When should I use AWS Glue and when should I use Amazon Kinesis Data Firehose?

Both AWS Glue and Amazon Kinesis Data Firehose can be used for streaming ETL. AWS Glue is recommended for complex ETL, including joining streams, and partitioning the output in Amazon S3 based on the data content. Amazon Kinesis Data Firehose is recommended when your use cases focus on data delivery and preparing data to be processed after it is delivered.
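Partitioning output "based on the data content" means the S3 object key is derived from fields inside each record rather than from the time the record arrived. The sketch that follows builds Hive-style keys (`key=value` path segments, which Athena and the Glue Data Catalog can treat as partitions). The `event_type` and `event_time` fields, the `events` prefix, and the function name are hypothetical.

```python
from datetime import datetime, timezone


def partitioned_key(prefix: str, record: dict, filename: str) -> str:
    """Build a Hive-style S3 object key from fields inside the record.

    Partitions by the record's own event type and event timestamp (epoch
    seconds, interpreted as UTC) so that queries can prune on those columns.
    """
    ts = datetime.fromtimestamp(record["event_time"], tz=timezone.utc)
    return (
        f"{prefix}/event_type={record['event_type']}"
        f"/year={ts:%Y}/month={ts:%m}/day={ts:%d}/{filename}"
    )
```

Two records that arrive at the same moment but carry different event types or dates end up under different prefixes. That kind of routing requires inspecting each record.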

      Article link: https://www.haomeiwen.com/subject/ojhzultx.html