CSCI 8980
Spring 2018

Large-scale Distributed Systems



Basic Concepts
  • Time, clocks, and the ordering of events in a distributed system,  Leslie Lamport,   Communications of the ACM, July 1978
  • Timestamps in Message-Passing Systems That Preserve the Partial Ordering, J. Fidge, Proceedings of the 11th Australian Computer Science Conference, 1991, pp. 56-78.

 

Consensus and Agreement Protocols

  • Paxos made simple by Leslie Lamport
  • Viewstamped Replication,  Oki and Liskov
  • Viewstamped Replication Revisited by Liskov and Cowling
  • In Search of an Understandable Consensus Algorithm by Diego Ongaro and John Ousterhout (Raft protocol)
  • Paxos vs. Viewstamped Replication vs. Zab by van Renesse, Schiper, and Schneider
  • There Is More Consensus in Egalitarian Parliaments
  • Mastering Agreement Problems in Distributed Systems, Raynal and Singhal, IEEE Software, July/August 2001.

 

Group Communication Models and Reliable Broadcast

  • Reliable Communication in the Presence of Failures, Ken Birman and Thomas Joseph, ACM TOCS 1987.
  • Survey of Broadcast and Multicast Protocols in Distributed Systems
  • The Process Group Approach to Reliable Distributed Computing, Communications of the ACM, December 1993.
  • Mencius: Building Efficient Replicated State Machines for WANs

 

 

Publish-Subscribe Models and Message Brokers

  • Many faces of Publish/Subscribe

  • Advanced Message Queuing Protocol

  • Kafka: Distributed Messaging System for Log Processing

  • RabbitMQ in Action

  • Presentation on RabbitMQ by Matthew Sackman

  • RabbitMQ Performance Evaluation

  • RabbitMQ Presentation by Alexis Richardson

  • Kafka-vs-RabbitMQ

  • Wormhole: Reliable Pub-Sub to Support Geo-Replicated Internet Services

  • Thialfi: A Client Notification Service for Internet-Scale Applications

  • Scalable Consistency Maintenance in Content Distribution Networks using Cooperative Leases, IEEE TKDE July/August 2003.

 

 

Consistency Models
  • Eventually Consistent, Werner Vogels
  • Jungle of Consistency Criteria, Raynal and Mizuno

 

 

Replication Management and Group Configuration Management

  • Implementing fault-tolerant services using the state machine approach: a tutorial, Fred Schneider,  ACM Computing Surveys 1990.

  • Classification of Update Methods for Replicated Data

  • Rambo  - Group Configuration Management

  • SmartMerge

 


Concurrency Control

  • Concurrency Control in Distributed Database Systems, Philip A. Bernstein and Nathan Goodman, ACM Computing Surveys 1981

  • On Optimistic Methods for Concurrency Control by Kung and Robinson

  • Making Snapshot Isolation Serializable

  • Lazy Consistency using Loosely Synchronized Clocks

  • Critique of ANSI SQL Isolation Levels

  • Efficient Optimistic Concurrency Control using Loosely Synchronized Clocks


Cloud File Systems and Key-value based Data Storage Systems

  • Google File System, Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, SOSP'03.

  • Hadoop Distributed File System

  • Data Management Projects at Google, Wilson Hsieh, Jayant Madhavan, Rob Pike, SIGMOD 2006.

  • Bigtable: A Distributed Storage System for Structured Data, Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber, Proceedings of OSDI 2006, Seattle, WA, 2006.

  • Introduction to Hadoop/HBase

  • Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications, Ion Stoica et al., SIGCOMM'01.

  • Cassandra: A Decentralized Structured Storage System

  • Dynamo: Amazon's Highly Available Key-Value Store

  • Spark

  • Hadoop and Spark software stacks

  • TAO: Facebook's Distributed Data Store for the Social Graph

Monitoring and Coordination in Large-scale Systems

  • ZooKeeper: Wait-free coordination for Internet-scale systems

  • The Chubby lock service for loosely-coupled distributed systems

Parallel Computing Models
  • MapReduce
  • Distributed GraphLab
  • Pregel
  • Trinity
  • Resilient Distributed Datasets (RDD) (Spark)

Geo-replication and Transactional Systems

  • Megastore: Providing Scalable, Highly Available Storage for Interactive Services

  • PNUTS: Yahoo!'s Hosted Data Serving Platform

  • Sinfonia: A new paradigm for building scalable distributed systems

  • Don't Settle for Eventual: Scalable Causal Consistency for Wide-Area Storage with COPS

  • Transactional storage for geo-replicated systems

  • Salt: Combining ACID and BASE Transactions

  • Lessons from Giant-Scale Services

  • Listed below are papers utilizing physical clock synchronization for concurrency control

  • Spanner: Google's Globally-Distributed Database

  • GentleRain: Cheap and Scalable Causal Consistency with Physical Clocks


Event Stream Processing Systems

  • Storm and Heron

  • Borealis

  • S4

  • Konark

  • Papers on reliable stream processing


 

Peer-to-Peer Systems and Epidemic Communication

  • A survey of robust P2P networks

  • Papers on BitTorrent

  • Gossip-based peer sampling

  • Efficient and Adaptive Epidemic Style Protocols for Reliable and Scalable Multicast, I. Gupta, Anne-Marie Kermarrec, A. Ganesh. IEEE TPDS July 2006.