Metadata
https://muratbuffalo.blogspot.com/ (RSS)
This concludes our series on the use of time in distributed databases, where we explored how the use of time in distributed systems evolved from a simple ordering mechanism to a sophisticated tool for coordination and performance optimization. A key takeaway is that time serves as a shared reference frame that enables nodes to make consistent...
This is part 4 of our "Use of Time in Distributed Databases" series. In this post, we explore how synchronized physical clocks enhance production database systems. Spanner: Google's Spanner (OSDI'12) implemented a novel approach to handling time in distributed database systems through its TrueTime API. The TrueTime API provides time as an interval that...
I recently came across the Occult paper (NSDI'17) during my series on "The Use of Time in Distributed Databases." I had high expectations, but my in-depth reading surfaced significant concerns about its contributions and claims. Let me share my analysis, as there are still many valuable lessons to learn from Occult about causality maintenance and...
This is part 3 of our "Use of Time in Distributed Databases" series. In this post, we explore how synchronized physical clocks enhance database systems, focusing on research and prototype databases. Discussion of time's role in production databases will follow in our next post. To begin, let's revisit the utility of synchronized clocks in...
This is part 2 of our "Use of Time in Distributed Databases" series. We talk about the use of logical clocks in databases in this post. We consider three different approaches: vector clocks, dependency graph maintenance, and epoch service. In the upcoming posts we will bring in physical clocks for timestamping, so there are no (almost no) physical clocks...
Distributed systems are characterized by nodes executing concurrently with no shared state and no common clock. Coordination between nodes is needed to satisfy some correctness properties, but since coordination requires message communication, there is a performance tradeoff preventing nodes from frequently communicating/coordinating. Timestamping...
This master's thesis at Lund University, Sweden, explores how CockroachDB's transactional performance can be improved by using tightly synchronized clocks. The thesis addresses two questions: how to integrate high-precision clock synchronization into CockroachDB and the resulting impact on performance. Given the publicly available clock...
I can't believe we wasted another good year. It is time to reflect back on the best posts at Metadata blog in 2024. (I think you guys should tip me just because I didn't call this post "Metadata wrapped".) Distributed systems posts: Transactional storage for geo-replicated systems (SOSP11): I like this paper because it asked the right questions, and...
The DDIA book is a great textbook, because it is not written as a textbook, but more as a guidebook. Textbooks are generally bland and boring. Textbooks that are written by professors even more so, because those are often written to impress other professors and to flaunt academic flair. Few textbooks take teaching as the primary goal. DDIA book...
Daily batch processes introduce significant latency, since input changes are reflected in the output only after a day. For a fast-paced business, this is too slow. To reduce delays, stream processing occurs more frequently (e.g., every second) or continuously, where events are handled as they happen. In stream processing, a record is typically called...
I have been impressed by the usability of TLA-Web from Will Schultz. Recently I have been using it for my TLA+ modeling of MongoDB catalog protocols internally, and found it very useful to explore and understand behavior. This got me thinking that TLA-Web would be really useful when exploring and understanding an unfamiliar spec I picked up on...
[trigger warning: blood] I had my first blood draw in 13 years yesterday. The lengthy gap is not random. My last blood draw had gone horribly wrong. The last time: That previous visit had been for a fasting blood draw. Until then, I'd never had issues with blood draws before. Nurses always complimented me on my veins. One of them said that I had...
This paper from CIDR'21 introduces the Deferred Action Framework (DAF). This framework aims to unify transaction control and data structure maintenance under multi-version concurrency control (MVCC) database systems, particularly for complex maintenance tasks like garbage collection and index cleanup. In MVCC systems, transactions and data...
Batch processing allows large-scale data transformations through bulk-synchronous processing. The simplicity of this approach allowed building reliable, scalable, maintainable applications with it. If you recall, "reliable-scalable-maintainable" was what we set out to learn when we began the DDIA book. This story of MapReduce starts when Google...
Incremental computation represents a transformative (!) approach to data processing. Instead of recomputing everything when your input changes slightly, incremental computation aims to reuse the original output and efficiently update the results. Efficiently means performing work proportional only to input and output changes. This paper introduces...
I attended the University at Buffalo Hacking event over the weekend. It was fun. There were 90+ projects, and I judged 15 of them. There were some interesting talks as well. It was good to see youth energy. It feels good to teach the next generation something. Another thing: GeoGuessr played as a group game under time pressure is a lot of fun. This may...
Chapter 9 of the Designing Data Intensive Applications (DDIA) book has the following 4 sections (which contain a total of 10 subsections): Consistency guarantees; Linearizability; Ordering guarantees; Distributed transactions and consensus. TMI (Too much info): The chapter tries to do too much. Almost an entire semester of distributed systems content...
This paper appeared in Sigmod'23. What? Auto-WLM is a machine learning based *automatic workload manager* currently used in production in Amazon Redshift. I thought this would be a machine learning paper, you know deep learning and stuff. But this paper turned out to be a practical/applied data systems paper. At its core, this paper is about...
This is a long chapter. It touches on so many things. Here is the table of contents. Faults and partial failures. Unreliable networks: detecting faults; timeouts and unbounded delays; sync vs async networks. Unreliable clocks: monotonic vs time-of-day clocks; clock sync and accuracy; relying on sync clocks; process pauses. Knowledge, truth, and lies: truth defined by...
We are continuing from the first part of our Chapter 7 review. Serializable isolation ensures that the final result of concurrent transactions is equivalent to what it would be if they had run one at a time, without any concurrency. This eliminates any concurrency anomalies, since it ensures the transactions would behave as they would in a sequential...
Chapter 7 of the Designing Data Intensive Applications (DDIA) book discusses transactions, yay! Transactions in database systems group multiple operations into a logical unit, a box of operations if you will. They simplify error handling and help manage concurrency issues. See the Gray-Reuter book's introduction and fault-tolerance sections for the...
This is a special milestone: 700th post, after 14 years of blogging here. 700 posts is a lot of blogging. But that comes down to 50 posts per year, which is one post a week, totally doable, right? If I can get another 14 years of blogging at this rate, I will get to 1400. That is more than the EWD documents in terms of the sheer number of posts,...
Ok, continuing on the SRDS day 1 post, I bring you SRDS day 2. Here are my summaries from the keynote, and from the talks for which I took some notes. Mahesh Balakrishnan's Keynote: Mahesh's keynote was titled "An Hourglass Architecture for Distributed Systems". His talk focused on the evolution of his perspective on distributed systems research and...
This week, I was at the 43rd International Symposium on Reliable Distributed Systems (SRDS 2024) in Charlotte, NC. The conference center was at UNC Charlotte, which has a large and beautiful campus. I was the Program Committee chair for SRDS'24 along with Silvia Bonomi. Marco Vieira and Bojan Cukic were the general co-chairs. A lot of work...
Chapter 6 of the Designing Data Intensive Applications (DDIA) book discusses partitioning, a key technique for scaling to large datasets and high query throughput in distributed databases. Breaking data into smaller partitions lets it be distributed across multiple nodes in a shared-nothing cluster. This allows the storage and processing load to...
Continuing with our HPTS series. This is now the afternoon of the second day. The first session was on HTAP and streaming, and the second one on caching. Session 7: HTAP and Streaming. Who cares about HTAP? - Tianyu Li (MIT): Tianyu argued that while Hybrid Transactional/Analytical Processing (HTAP) showed great promise in 2014, it has failed to make a...
Continuing with our series. This is day 2, Tuesday morning. It had two sessions on hardware. I wasn't exaggerating when I said hardware/software codesign was all the buzz at HPTS this year. It looks like future databases will be more tightly integrated with hardware capabilities and more responsive to user needs. You may have gotten a bit tired of...
This paper from SOSP 2011 describes a distributed storage system called Walter. Walter targets web applications that operate across multiple geographic sites, and it aims to balance consistency and latency in such systems by providing strong consistency within a site, and weaker consistency across sites. Parallel Snapshot Isolation (PSI): The paper...
This is part 2 of day 1 of HPTS'24. (You can tell I did some lisp programming back in the day, huh?) Here is the first part of day 1, you should check that out as well. There were 2 sessions each with 4 talks in the afternoon of day 2. After dinner, we had a gong show presentation on miscellaneous topics as well. Session 3: DBOS. Virtual Memory: a...
Wow, what a week that was! The two days of HPTS (Monday and Tuesday this week) felt like a week to me. I learned a lot, had a lot of good conversations, and was even able to squeeze in some beach walks. HPTS has been operating since 1985, convening mostly every two years. It has been described as Davos for database systems. Pat...
Chapter 5 of the Designing Data Intensive Applications (DDIA) book discusses strategies and challenges in replicating data across distributed systems. I covered the first part last week; here is the second part, on leaderless replication. Leaderless replication abandons the concept of a leader node, and allows any replica to directly accept...
Chapter 5 of the Designing Data Intensive Applications (DDIA) book discusses strategies and challenges in replicating data across distributed systems. Replication is critical for ensuring data availability, fault tolerance, and scalability. One of the key challenges in replication is maintaining consistency across multiple replicas. Leader-based...
This paper appeared in CIDR23 and is from Meta (wow, this is the first time I used the new name without needing to mention it is in fact Facebook... wait.. goddammit). The paper talks about how they applied Raft to MySQL replication, and used the flexible quorums in the process. This is not a technically deep paper, but it was interesting to see a...
This second part of Chapter 4 of the Designing Data Intensive Applications (DDIA) book discusses methods of data flow in distributed systems, covering dataflow through databases, service calls, and asynchronous message passing. For databases, the process writing to the database encodes the data, and the reading process decodes it. We need both...
This paper recently appeared at ACM SIGOPS Operating Systems Review. It provides an overview of the shared log abstraction in distributed systems, particularly focusing on its application in State Machine Replication (SMR) and consensus protocols. The paper argues that this abstraction can simplify the design and implementation of distributed...
This first part of Chapter 4 of the Designing Data Intensive Applications (DDIA) book discusses the concepts of data encoding and evolution in data-intensive applications. As applications inevitably change over time, it's important to build systems that can adapt to these changes easily, a property referred to as evolvability (under...
As part of our Zoom reading group (wow, 4.5 years old now), we discussed a paper that uses LLMs for automatic root cause analysis (RCA) for cloud incidents. This was a pretty straightforward application of LLMs. The proposed system employs an LLM to match incoming incidents to incident handlers based on their alert types, predicts the incident's...
This is Chapter 3, part 2 for the Designing Data Intensive Applications (DDIA) book. This focuses on storage and retrieval for OLAP databases. Analytics, data warehousing, star and snowflake schemas: A data warehouse is a dedicated database designed for analytical queries. It houses a read-only replica of data from transactional systems within the...
This is Chapter 3, part 1 for the Designing Data Intensive Applications (DDIA) book. This part focuses on storage and retrieval for OLTP databases. Even if you won't be implementing a storage engine from scratch, it is still important to understand how databases handle storage and retrieval internally. This knowledge allows you to select and tune...
C. J. Date's Sigmod 1983 keynote, "Database Usability", was prescient. Usability is the most important thing to the customers. They care less about impressive benchmarks or clever algorithms, and more about whether they can operate and use a database efficiently to query, update, analyze, and persist their data with minimal headache. (BTW, does...
This paper by Herlihy and Wing appeared in ACM Transactions on Programming Languages and Systems in 1990. This is the canonical reference for the linearizability definition. I had not read this paper in detail before, so I thought it would be good to go to the source to see if there are additional delightful surprises in the original text....
We started reading this book as part of Alex Petrov's book club. We just got started, so you can join us by joining the Discord channel above. We meet Wednesdays at 11am Eastern Time. Previously we had read the transaction processing book by Gray and Reuter. This page links to my summaries of that book. Chp 1. Reliable, Scalable, and Maintainable...