[Repost] Classic Papers on Big Data and Distributed Systems

Author: 贺大伟 | Published: 2019-02-28 14:52

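    The papers below are classics in big data and distributed systems, covering CAP, BASE, 2PC, consensus protocols, consistent hashing, logical clocks, leases, and more. If you know of other good papers, feel free to share them in the comments below.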

    Table of Contents

    1 Distributed Systems Theory

    2 Google

    3 Amazon

    4 Facebook

    5 Streaming

    6 Microsoft

    7 Apache Spark

    8 Apache Hadoop

    9 Apache Flink

    10 Apache ZooKeeper

    11 Apache Mesos

    12 Apache Kafka

    13 KV Database

    14 Schedulers

    Distributed Systems Theory

    Time, Clocks, and the Ordering of Events in a Distributed System

    (Paxos) The Part-Time Parliament

    Paxos Made Simple

    Paxos Made Practical

    Paxos Made Live - An Engineering Perspective

    Revisiting the Paxos algorithm

    Distributed Snapshots: Determining Global States of Distributed Systems

    Reaching Agreement in the Presence of Faults

    The Byzantine Generals Problem

    (CAP) Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services

    (2PC) Concurrency Control and Recovery in Database Systems

    BASE: An Acid Alternative

    An Overview of Clock Synchronization

    Epidemic Algorithms for Replicated Database Maintenance

    Leases: An Efficient Fault-Tolerant Mechanism for Distributed File Cache Consistency

    Weighted Voting for Replicated Data

    A Quorum-Consensus Replication Method for Abstract Data Types

    Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web

    (Raft) In Search of an Understandable Consensus Algorithm

    Google

    The Google File System

    MapReduce: Simplified Data Processing on Large Clusters

    Bigtable: A Distributed Storage System for Structured Data

    The Chubby lock service for loosely-coupled distributed systems

    Dremel: Interactive Analysis of Web-Scale Datasets

    Omega: flexible, scalable schedulers for large compute clusters

    MillWheel: Fault-Tolerant Stream Processing at Internet Scale

    Large-scale cluster management at Google with Borg

    Dapper, a Large-Scale Distributed Systems Tracing Infrastructure

    Percolator: Large-scale Incremental Processing Using Distributed Transactions and Notifications

    Spanner: Google's Globally-Distributed Database

    F1: A Distributed SQL Database That Scales

    Pregel: A System for Large-Scale Graph Processing

    The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing

    Amazon

    Dynamo: Amazon's Highly Available Key-value Store

    Facebook

    Cassandra - A Decentralized Structured Storage System

    Hive - A Warehousing Solution Over a Map-Reduce Framework

    Riffle: Optimized Shuffle Service for Large-Scale Data Analytics

    Streaming

    S4: Distributed Stream Computing Platform

    Microsoft

    Schema-Agnostic Indexing with Azure DocumentDB

    Apache Spark

    Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing

    Spark: Cluster Computing with Working Sets

    GraphX: Graph Processing in a Distributed Dataflow Framework

    MLlib: Machine Learning in Apache Spark

    Delay Scheduling: A Simple Technique for Achieving Locality and Fairness in Cluster Scheduling

    Shark: SQL and Rich Analytics at Scale

    Spark SQL: Relational Data Processing in Spark

    Discretized Streams: Fault-Tolerant Streaming Computation at Scale

    Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters

    Structured Streaming: A Declarative API for Real-Time Applications in Apache Spark

    Apache Hadoop

    The Hadoop Distributed File System

    Apache Hadoop YARN: Yet Another Resource Negotiator

    Apache Flink

    Apache Flink™: Stream and Batch Processing in a Single Engine

    Lightweight Asynchronous Snapshots for Distributed Dataflows

    Apache ZooKeeper

    ZooKeeper's atomic broadcast protocol: Theory and practice

    ZooKeeper: Wait-free coordination for Internet-scale systems

    Apache Mesos

    Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center

    Apache Kafka

    Kafka: a Distributed Messaging System for Log Processing

    KV Database

    Serving Large-scale Batch Computed Data with Project Voldemort

    Schedulers

    Column-Stores vs. Row-Stores: How Different Are They Really?

    Link to this article: https://www.haomeiwen.com/subject/vgghuqtx.html