
On distributed systems broadly defined and other curiosities. The opinions on this site are my own.
https://muratbuffalo.blogspot.com/ (RSS)
UB Hacking 2024
11 Nov 2024

I attended the University at Buffalo Hacking event over the weekend. It was fun. There were 90+ projects, and I judged 15 of them. There were some interesting talks as well. It was good to see the youth energy, and it feels good to teach the next generation something. Another thing: GeoGuessr played as a group game under time pressure is a lot of fun. This may...

DDIA: Chp 9. Consistency and Consensus
25 Oct 2024

Chapter 9 of the Designing Data Intensive Applications (DDIA) book has the following 4 sections (which contain a total of 10 subsections): Consistency guarantees; Linearizability; Ordering guarantees; Distributed transactions and consensus. TMI (Too much info): The chapter tries to do too much. Almost an entire semester of distributed systems content...

Auto-WLM: machine learning enhanced workload management in Amazon Redshift
16 Oct 2024

This paper appeared in Sigmod'23. What? Auto-WLM is a machine-learning-based *automatic workload manager* currently used in production in Amazon Redshift. I thought this would be a machine learning paper, you know, deep learning and stuff. But this paper turned out to be a practical/applied data systems paper. At its core, this paper is about...

DDIA: Chp 8. The Trouble with Distributed Systems
12 Oct 2024

This is a long chapter. It touches on so many things. Here is the table of contents. Faults and partial failures. Unreliable networks: detecting faults, timeouts and unbounded delays, sync vs async networks. Unreliable clocks: monotonic vs time-of-day clocks, clock sync and accuracy, relying on sync clocks, process pauses. Knowledge, truth and lies: truth defined by...

DDIA: Chp 7. Transactions (Part 2): Serializability
9 Oct 2024

We are continuing from the first part of our Chapter 7 review. Serializable isolation ensures that the final result of concurrent transactions is equivalent to running them one at a time, without any concurrency. This eliminates all concurrency anomalies, since it ensures the transactions would behave as they would in a sequential...
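To make the definition concrete, here is a minimal brute-force sketch (my own illustration, not from the book or the post): it runs a given interleaving of transactions and checks whether the final state matches the final state of some serial order. The transactions, the helpers `run_schedule` and `is_serializable`, and the single key `x` are all hypothetical. Real databases do not enumerate permutations; DDIA covers actual serial execution, two-phase locking, and serializable snapshot isolation instead.

```python
from itertools import permutations


# Hypothetical transaction steps: each step takes the shared database dict
# and the transaction's private dict of values it has read.
def read_x(db, local):
    local["x"] = db["x"]


def write_x_plus_one(db, local):
    db["x"] = local["x"] + 1


# An "increment x" transaction: read x, then write back x + 1.
INCREMENT = [read_x, write_x_plus_one]


def run_schedule(txns, schedule, initial):
    """Execute an interleaving given as a list of transaction indices.
    Each occurrence of index i runs the next pending step of transaction i."""
    db = dict(initial)
    locals_ = [dict() for _ in txns]
    next_step = [0] * len(txns)
    for i in schedule:
        txns[i][next_step[i]](db, locals_[i])
        next_step[i] += 1
    return db


def is_serializable(txns, schedule, initial):
    """True if the schedule's final state equals that of some serial order."""
    final = run_schedule(txns, schedule, initial)
    for order in permutations(range(len(txns))):
        serial = [i for i in order for _ in txns[i]]
        if run_schedule(txns, serial, initial) == final:
            return True
    return False
```

With two increments starting from `x = 0`, the schedule `[0, 0, 1, 1]` is serial and passes, while `[0, 1, 0, 1]` (both transactions read 0, then both write 1) ends with `x = 1` instead of 2: the classic lost update, which no serial order produces.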

DDIA: Chp 7. Transactions (Part 1)
8 Oct 2024

Chapter 7 of the Designing Data Intensive Applications (DDIA) book discusses transactions, yay! Transactions in database systems group multiple operations into a logical unit, a box of operations if you will. They simplify error handling and help manage concurrency issues. See the introduction and fault-tolerance sections of the Gray-Reuter book for the...

700
4 Oct 2024

This is a special milestone: the 700th post, after 14 years of blogging here. 700 posts is a lot of blogging. But that comes down to 50 posts per year, which is one post a week, totally doable, right? If I can get another 14 years of blogging at this rate, I will get to 1400. That is more than the EWD documents in terms of the sheer number of posts,...

SRDS Day 2
4 Oct 2024

Ok, continuing from the SRDS day 1 post, I bring you SRDS day 2. Here are my summaries of the keynote and of the talks for which I took some notes. Mahesh Balakrishnan's Keynote: Mahesh's keynote was titled "An Hourglass Architecture for Distributed Systems". His talk focused on the evolution of his perspective on distributed systems research and...

SRDS Day 1
4 Oct 2024

This week, I was at the 43rd International Symposium on Reliable Distributed Systems (SRDS 2024) in Charlotte, NC. The conference center was at UNC Charlotte, which has a large and beautiful campus. I was the Program Committee chair for SRDS'24 along with Silvia Bonomi. Marco Vieira and Bojan Cukic were the general co-chairs. A lot of work...

DDIA: Chp 6. Partitioning
25 Sept 2024

Chapter 6 of the Designing Data Intensive Applications (DDIA) book discusses partitioning, a key technique for scaling to large datasets and high query throughput in distributed databases. By breaking data into smaller partitions, the data can be distributed across multiple nodes in a shared-nothing cluster. This allows the storage and processing load to...
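As a rough illustration of hash partitioning, here is a minimal sketch assuming the "fixed number of partitions" scheme that DDIA describes for rebalancing: each key hashes to one of a fixed set of partitions, and only the partition-to-node map changes when a node joins. The partition count, node names, and the helpers `partition_for`, `assign_partitions`, `node_for`, and `add_node` are hypothetical.

```python
import hashlib

# Hypothetical: a fixed number of partitions, chosen larger than the node count,
# so that rebalancing moves whole partitions instead of rehashing keys.
NUM_PARTITIONS = 16


def partition_for(key: str) -> int:
    """Map a key to a partition using a stable hash of the key."""
    # Python's built-in hash() of str is randomized per process, so use MD5 instead.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


def assign_partitions(nodes: list[str]) -> dict[int, str]:
    """Initial round-robin assignment of partitions to nodes."""
    return {p: nodes[p % len(nodes)] for p in range(NUM_PARTITIONS)}


def node_for(key: str, assignment: dict[int, str]) -> str:
    """Route a key: key -> partition (fixed) -> node (changes on rebalance)."""
    return assignment[partition_for(key)]


def add_node(assignment: dict[int, str], new_node: str) -> dict[int, str]:
    """Give the new node its fair share by taking partitions from nodes
    that hold more than that share. Keys never change partition."""
    new_assignment = dict(assignment)
    target = NUM_PARTITIONS // (len(set(assignment.values())) + 1)
    moved = 0
    for p in sorted(new_assignment):
        if moved == target:
            break
        owner = new_assignment[p]
        owned = sum(1 for o in new_assignment.values() if o == owner)
        if owned > target:
            new_assignment[p] = new_node
            moved += 1
    return new_assignment
```

With 16 partitions spread over nodes A, B, and C, adding D moves only 4 partitions and leaves every node with 4. Using `hash(key) % number_of_nodes` instead would move most keys whenever the node count changes, which is the reason DDIA gives for avoiding mod-N.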

HPTS day 2, part 2
24 Sept 2024

Continuing with our HPTS series. This is now the afternoon of the second day. The first session was on HTAP and streaming, and the second one on caching. Session 7: HTAP and Streaming. Who cares about HTAP? - Tianyu Li (MIT): Tianyu argued that while Hybrid Transactional/Analytical Processing (HTAP) showed great promise in 2014, it has failed to make a...

HPTS day 2, part 1
23 Sept 2024

Continuing with our series. This is day 2, Tuesday morning. It had two sessions on hardware. I wasn't exaggerating when I said hardware/software codesign was all the buzz at HPTS this year. It looks like future databases will be more tightly integrated with hardware capabilities and more responsive to user needs. You may have gotten a bit tired of...

Transactional storage for geo-replicated systems
23 Sept 2024

This paper from SOSP 2011 describes a distributed storage system called Walter. Walter targets web applications that operate across multiple geographic sites, and it aims to balance consistency and latency in such systems by providing strong consistency within a site and weaker consistency across sites. Parallel Snapshot Isolation (PSI): The paper...

HPTS'24 day 1, part 2
20 Sept 2024

This is part 2 of day 1 of HPTS'24. (You can tell I did some Lisp programming back in the day, huh?) Here is the first part of day 1; you should check that out as well. There were 2 sessions, each with 4 talks, in the afternoon of day 1. After dinner, we had a gong show presentation on miscellaneous topics as well. Session 3: DBOS. Virtual Memory: a...

HPTS'24 Day 1, part 1
20 Sept 2024

Wow, what a week that was! The two days of HPTS (Monday and Tuesday this week) felt like a week to me. I learned a lot, had a lot of good conversations, and was even able to squeeze in some beach walks. HPTS has been operating since 1985, convening mostly every two years. It has been described as the Davos of database systems. Pat...

DDIA: Chp 5. Replication (Part 2)
19 Sept 2024

Chapter 5 of the Designing Data Intensive Applications (DDIA) book discusses strategies and challenges in replicating data across distributed systems. I covered the first part last week; here is the second part, on leaderless replication. Leaderless replication abandons the concept of a leader node, and allows any replica to directly accept...
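A minimal sketch of the quorum idea behind leaderless replication, assuming n replicas, a write quorum w and a read quorum r with w + r > n (the condition DDIA uses), and a simple client-supplied version number per write. The `Replica` and `QuorumClient` names are hypothetical, and real systems add read repair, anti-entropy, and detection of concurrent writes on top of this.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Replica:
    up: bool = True
    # key -> (version, value); versions are assumed to be supplied by the client
    store: dict = field(default_factory=dict)


class QuorumClient:
    def __init__(self, replicas: list[Replica], w: int, r: int):
        # Overlapping quorums: every read quorum shares at least one replica
        # with every write quorum, so a read sees the latest successful write.
        assert w + r > len(replicas), "need w + r > n"
        self.replicas, self.w, self.r = replicas, w, r

    def write(self, key: str, version: int, value: Any) -> bool:
        """Send the write to every reachable replica; succeed if >= w accept.
        Replicas that accepted a failed write are not rolled back."""
        acks = 0
        for rep in self.replicas:
            if not rep.up:
                continue
            current = rep.store.get(key)
            if current is None or current[0] < version:
                rep.store[key] = (version, value)
            acks += 1
        return acks >= self.w

    def read(self, key: str) -> Optional[Any]:
        """Collect r responses from reachable replicas; return the newest value."""
        responses = [rep.store.get(key) for rep in self.replicas if rep.up][: self.r]
        if len(responses) < self.r:
            raise RuntimeError("read quorum not reached")
        found = [resp for resp in responses if resp is not None]
        return max(found, key=lambda vv: vv[0])[1] if found else None
```

For example, with three replicas and w = r = 2, a write that reaches only replicas 0 and 1 still succeeds, and a later read served by replicas 1 and 2 returns the new value, because replica 1 belongs to both quorums even though replica 2 is still stale.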

DDIA: Chp 5. Replication (Part 1)
12 Sept 2024

Chapter 5 of the Designing Data Intensive Applications (DDIA) book discusses strategies and challenges in replicating data across distributed systems. Replication is critical for ensuring data availability, fault tolerance, and scalability. One of the key challenges in replication is maintaining consistency across multiple replicas. Leader-based...

FlexiRaft: Flexible Quorums with Raft
6 Sept 2024

This paper appeared in CIDR'23 and is from Meta (wow, this is the first time I used the new name without needing to mention it is in fact Facebook... wait... goddammit). The paper talks about how they applied Raft to MySQL replication, and used flexible quorums in the process. This is not a technically deep paper, but it was interesting to see a...

DDIA: Chp 4. Encoding and Evolution (Part 2)
4 Sept 2024

This second part of Chapter 4 of the Designing Data Intensive Applications (DDIA) book discusses methods of data flow in distributed systems, covering dataflow through databases, service calls, and asynchronous message passing. For databases, the process writing to the database encodes the data, and the reading process decodes it. We need both...

Taming Consensus in the Wild (with the Shared Log Abstraction)
31 Aug 2024

This paper recently appeared at ACM SIGOPS Operating Systems Review. It provides an overview of the shared log abstraction in distributed systems, particularly focusing on its application in State Machine Replication (SMR) and consensus protocols. The paper argues that this abstraction can simplify the design and implementation of distributed...

DDIA: Chp 4. Encoding and Evolution (Part 1)
28 Aug 2024

This first part of Chapter 4 of the Designing Data Intensive Applications (DDIA) book discusses the concepts of data encoding and evolution in data-intensive applications. As applications inevitably change over time, it's important to build systems that can adapt to these changes easily, a property referred to as evolvability (under...

Looming Liability Machines (LLMs)
24 Aug 2024

As part of our Zoom reading group (wow, 4.5 years old now), we discussed a paper that uses LLMs for automatic root cause analysis (RCA) of cloud incidents. This was a pretty straightforward application of LLMs. The proposed system employs an LLM to match incoming incidents to incident handlers based on their alert types, predicts the incident's...

DDIA: Chp 3. Storage and Retrieval (Part 2)
21 Aug 2024

This is Chapter 3, part 2 of the Designing Data Intensive Applications (DDIA) book. It focuses on storage and retrieval for OLAP databases. Analytics, data warehousing, star and snowflake schemas: A data warehouse is a dedicated database designed for analytical queries. It houses a read-only replica of data from transactional systems within the...

DDIA: Chp 3. Storage and Retrieval (Part 1)
20 Aug 2024

This is Chapter 3, part 1 of the Designing Data Intensive Applications (DDIA) book. This part focuses on storage and retrieval for OLTP databases. Even if you won't be implementing a storage engine from scratch, it is still important to understand how databases handle storage and retrieval internally. This knowledge allows you to select and tune...

Making database systems usable
20 Aug 2024

C. J. Date's Sigmod 1983 keynote, "Database Usability", was prescient. Usability is the most important thing to the customers. They care less about impressive benchmarks or clever algorithms, and more about whether they can operate and use a database efficiently to query, update, analyze, and persist their data with minimal headache. (BTW, does...

Linearizability: A Correctness Condition for Concurrent Objects
9 Aug 2024

This paper by Herlihy and Wing appeared in ACM Transactions on Programming Languages and Systems in 1990. This is the canonical reference for the linearizability definition. I had not read this paper in detail before, so I thought it would be good to go to the source to see if there are additional delightful surprises in the original text....

Designing Data Intensive Applications (DDIA) Book
6 Aug 2024

We started reading this book as part of Alex Petrov's book club. We just got started, so you can join us by joining the Discord channel above. We meet Wednesdays at 11am Eastern Time. Previously we had read the transaction processing book by Gray and Reuter. This page links to my summaries of that book. Chp 1. Reliable, Scalable, and Maintainable...
