NULL BITMAP by Justin Jaffray

It is easier to imagine an end to computing than an end to SQL. This newsletter is where I will write periodic self-indulgent articles on topics in databases, with a focus on query languages, query planning, and transaction processing. This newsletter is an extension of my blog [justinjaffray.com](https://justinjaffray.com/) and some, but not all, posts will be mirrored between them.
https://buttondown.com/jaffray (RSS)
visit blog
The CVM Algorithm
11 Nov 2024 | original ↗

Everything you need to know about query planning can be understood from this query: SELECT * FROM xy WHERE y = 3 ORDER BY x Imagine we have two indexes, one ordered on y, and one ordered on x. Then two possible plans suggest themselves: Plan 1: scan the y index, restricting it to the values of y such that y = 3, then sort the result by x. Plan 2:...

The CVM Algorithm
11 Nov 2024 | original ↗

Everything you need to know about query planning can be understood from this query: SELECT * FROM xy WHERE y = 3 ORDER BY x Imagine we have two indexes, one ordered on y, and one ordered on x. Then two possible plans suggest themselves: Plan 1: scan the y index, restricting it to the values of y such that y = 3, then sort the result by x. Plan 2:...

How Many Grains of Sand are in a Pile
4 Nov 2024 | original ↗

“This code’s no good,” says Pittsford, pointing at the screen: def update_and_get_balance(account, amount): database.debit(account, amount) return database.get_balance(account) Sol, the author of the code, peers over his shoulder and squints. “What’s wrong with it? It adds the amount, then gets back the new balance.” Pittsford shakes his...

How Many Grains of Sand are in a Pile
4 Nov 2024 | original ↗

“This code’s no good,” says Pittsford, pointing at the screen: def update_and_get_balance(account, amount): database.debit(account, amount) return database.get_balance(account) Sol, the author of the code, peers over his shoulder and squints. “What’s wrong with it? It adds the amount, then gets back the new balance.” Pittsford shakes his...

Adventures in Probability
28 Oct 2024 | original ↗

I hope everyone had a good weekend. I went on a hike. It was great. Statistics One of my great school regrets, next to “taking three years of business classes,” is taking less statistics than I did, which was basically the bare minimum. It was, in fact the bare minimum: one probability class and one statistics class. In my defense, my probability...

Adventures in Probability
28 Oct 2024 | original ↗

I hope everyone had a good weekend. I went on a hike. It was great. Statistics One of my great school regrets, next to “taking three years of business classes,” is taking less statistics than I did, which was basically the bare minimum. It was, in fact the bare minimum: one probability class and one statistics class. In my defense, my probability...

Pushing Values to Zero
21 Oct 2024 | original ↗

There is a popular presentation of the exponential distribution that goes through the geometric distribution. The geometric distribution arises from flipping a weighted coin until you see a heads, and reporting how long it took. It can be simulated with this code: def geo(p): count = 1 while rand() < p: count++ return count The...

Pushing Values to Zero
21 Oct 2024 | original ↗

There is a popular presentation of the exponential distribution that goes through the geometric distribution. The geometric distribution arises from flipping a weighted coin until you see a heads, and reporting how long it took. It can be simulated with this code: def geo(p): count = 1 while rand() < p: count++ return count The...

Linearity in Query Processing
14 Oct 2024 | original ↗

I saw a fun post this week about interpreting colours as vectors and the way you can exploit that interpretation to nicely represent certain operations you might want to do on those colours. As the author notes, you can perform an operation on a colour via a matrix multiplication if and only if that operation is linear. Linear maps obey and One...

Linearity in Query Processing
14 Oct 2024 | original ↗

I saw a fun post this week about interpreting colours as vectors and the way you can exploit that interpretation to nicely represent certain operations you might want to do on those colours. As the author notes, you can perform an operation on a colour via a matrix multiplication if and only if that operation is linear. Linear maps obey and One...

The Relational Derivative Pt. 2: Groups are Good
7 Oct 2024 | original ↗

Last week we talked about the notion of the relational derivative and its application to enforcing integrity constraints in databases. The relational derivative is defined by And we can compute it by rearranging: We can use this to do things like incrementally update a query in response to new data, or to tell whether or not a query will change...

The Relational Derivative Pt. 2: Groups are Good
7 Oct 2024 | original ↗

Last week we talked about the notion of the relational derivative and its application to enforcing integrity constraints in databases. The relational derivative is defined by And we can compute it by rearranging: We can use this to do things like incrementally update a query in response to new data, or to tell whether or not a query will change...

Integrity Constraints and the Relational Derivative
30 Sept 2024 | original ↗

In a SQL database, you can set up a foreign key with REFERENCES: nullbitmap=# CREATE TABLE ab (a INT PRIMARY KEY, b INT); CREATE TABLE nullbitmap=# INSERT INTO ab VALUES (1, 10), (2, 20), (3, 30); INSERT 0 3 nullbitmap=# SELECT * FROM ab; a | b ---+---- 1 | 10 2 | 20 3 | 30 (3 rows) nullbitmap=# CREATE TABLE xa (x INT PRIMARY KEY, a INT...

Integrity Constraints and the Relational Derivative
30 Sept 2024 | original ↗

In a SQL database, you can set up a foreign key with REFERENCES: nullbitmap=# CREATE TABLE ab (a INT PRIMARY KEY, b INT); CREATE TABLE nullbitmap=# INSERT INTO ab VALUES (1, 10), (2, 20), (3, 30); INSERT 0 3 nullbitmap=# SELECT * FROM ab; a | b ---+---- 1 | 10 2 | 20 3 | 30 (3 rows) nullbitmap=# CREATE TABLE xa (x INT PRIMARY KEY, a INT...

The Two Machines
23 Sept 2024 | original ↗

There's a joke in my friend circle that asks "is it a database?" A startup, a program, a syscall, a person good with numbers, a person with a good memory. It's all very "is it a sandwich.” But it’s kind of true that it’s weird you can look at RocksDB and Snowflake and say “these are the same class of thing,” because they have very little...

The Two Machines
23 Sept 2024 | original ↗

There's a joke in my friend circle that asks "is it a database?" A startup, a program, a syscall, a person good with numbers, a person with a good memory. It's all very "is it a sandwich.” But it’s kind of true that it’s weird you can look at RocksDB and Snowflake and say “these are the same class of thing,” because they have very little...

Benchmarks That Aren't Your Friends
16 Sept 2024 | original ↗

We’ve now talked twice about an important dimension of a benchmark: the openness of the loop. While there’s more subtlety to it, if what you take away is: open-loop is better for measuring latency, and closed-loop is better for measuring throughput. then you’re not going to be in such a bad place. There’s maybe three main roles someone could find...

Benchmarks That Aren't Your Friends
16 Sept 2024 | original ↗

We’ve now talked twice about an important dimension of a benchmark: the openness of the loop. While there’s more subtlety to it, if what you take away is: open-loop is better for measuring latency, and closed-loop is better for measuring throughput. then you’re not going to be in such a bad place. There’s maybe three main roles someone could find...

Measuring Throughput
9 Sept 2024 | original ↗

Note: I'm trying out enabling comments. Behave! I reserve the right to disable them again for any reason including "the very idea of someone commenting made me anxious." We talked recently about why it’s easy to use a closed-loop benchmark to inappropriately measure latency, and why an open-loop benchmark is a better choice for this, generally....

Measuring Throughput
9 Sept 2024 | original ↗

Note: I'm trying out enabling comments. Behave! I reserve the right to disable them again for any reason including "the very idea of someone commenting made me anxious." We talked recently about why it’s easy to use a closed-loop benchmark to inappropriately measure latency, and why an open-loop benchmark is a better choice for this, generally....

SQL's Grammar Ambiguity
2 Sept 2024 | original ↗

I was going to write a longer, more involved post this week, but I wound up getting Covid and didn't have it in me to work through all the stuff I had to for that. So here is a smaller issue. I know of exactly one situation in which the grammar of SQL is "tricky" to parse. I use the word tricky because I've been out of the game too long to...

SQL's Grammar Ambiguity
2 Sept 2024 | original ↗

I was going to write a longer, more involved post this week, but I wound up getting Covid and didn't have it in me to work through all the stuff I had to for that. So here is a smaller issue. I know of exactly one situation in which the grammar of SQL is "tricky" to parse. I use the word tricky because I've been out of the game too long to...

Languages Without Abstraction
26 Aug 2024 | original ↗

Implementing something like a compiler, there is the understanding that we want different representations of a program for different purposes. This is why we have stuff like a “control-flow graph” or “SSA form.” Some kinds of analyses and transformations are easier in more abstract representations that have thrown away certain information. But...

Languages Without Abstraction
26 Aug 2024 | original ↗

Implementing something like a compiler, there is the understanding that we want different representations of a program for different purposes. This is why we have stuff like a “control-flow graph” or “SSA form.” Some kinds of analyses and transformations are easier in more abstract representations that have thrown away certain information. But...

The Closed-Loop Benchmark Trap
19 Aug 2024 | original ↗

Let's make a little mock database. It won't actually do anything but pretend to handle requests. I've had cause at work lately to write some Go so we're going to use Go. type Database struct { requests chan Request } func NewDatabase() Database { return Database{ requests: make(chan Request), } } const maxConcurrentRequests =...

The Closed-Loop Benchmark Trap
19 Aug 2024 | original ↗

Let's make a little mock database. It won't actually do anything but pretend to handle requests. I've had cause at work lately to write some Go so we're going to use Go. type Database struct { requests chan Request } func NewDatabase() Database { return Database{ requests: make(chan Request), } } const maxConcurrentRequests =...

Some Books I Like // One Year of NULL BITMAP
12 Aug 2024 | original ↗

This is the 52nd NULL BITMAP, which means I have been doing this for a whole year! I skipped one week before I decided that this was going to be a weekly thing, but besides that there has been a NULL BITMAP every Monday. If you have read any of them then thank you for partaking in this whole deal I am very grateful :) I wasn’t sure what an...

Some Books I Like // One Year of NULL BITMAP
12 Aug 2024 | original ↗

This is the 52nd NULL BITMAP, which means I have been doing this for a whole year! I skipped one week before I decided that this was going to be a weekly thing, but besides that there has been a NULL BITMAP every Monday. If you have read any of them then thank you for partaking in this whole deal I am very grateful :) I wasn’t sure what an...

Solving Rubik's Cubes with Computers
5 Aug 2024 | original ↗

The first place I got into programming was making games. You can probably dig up some games I made long ago if you try hard enough (do not). In high school, my focus on programming went over to implementing solvers for various Rubik's Cube like puzzles. Since this is the first thing people wonder when I mention stuff like this, I was okay (this...

Solving Rubik's Cubes with Computers
5 Aug 2024 | original ↗

The first place I got into programming was making games. You can probably dig up some games I made long ago if you try hard enough (do not). In high school, my focus on programming went over to implementing solvers for various Rubik's Cube like puzzles. Since this is the first thing people wonder when I mention stuff like this, I was okay (this...

What do we want? Negation! When do we want it? Not later!
29 Jul 2024 | original ↗

Hello I am traveling this weekend! I am home in Ontario. I wanted to share this picture I took of this guy feeding a bear. Bear eating an object out of a guy’s hand Don’t feed bears! This is not an endorsement of this behaviour! Quite the opposite! We love Datalog here. We love it! We're fiends for it. Datalog is very elemental and that makes it...

What do we want? Negation! When do we want it? Not later!
29 Jul 2024 | original ↗

Hello I am traveling this weekend! I am home in Ontario. I wanted to share this picture I took of this guy feeding a bear. Bear eating an object out of a guy’s hand Don’t feed bears! This is not an endorsement of this behaviour! Quite the opposite! We love Datalog here. We love it! We're fiends for it. Datalog is very elemental and that makes it...

A Query Planning Guideline
22 Jul 2024 | original ↗

A number of readers last week reached out to direct my attention to What if a SQL Statement Returned a Database which is a much more thorough and well-articulated explanation of the core idea—thank you! A Query Planning Guideline Say I want to construct a plan for the following query: SELECT a, sum(b) FROM (SELECT a, a + 1 AS x, b FROM ab)...

A Query Planning Guideline
22 Jul 2024 | original ↗

A number of readers last week reached out to direct my attention to What if a SQL Statement Returned a Database which is a much more thorough and well-articulated explanation of the core idea—thank you! A Query Planning Guideline Say I want to construct a plan for the following query: SELECT a, sum(b) FROM (SELECT a, a + 1 AS x, b FROM ab)...

Multiple Returns in SQL
15 Jul 2024 | original ↗

Podcast Appearance I was on the Thinking About Computers podcast recently! We had a fun chat on the topic of writing technical content and research. Personally I can't stand hearing my own voice so I haven't listened to it but I enjoyed participating. They're also on podcast platforms and they have lots of good episodes! Multiple Returns in SQL...

Multiple Returns in SQL
15 Jul 2024 | original ↗

Podcast Appearance I was on the Thinking About Computers podcast recently! We had a fun chat on the topic of writing technical content and research. Personally I can't stand hearing my own voice so I haven't listened to it but I enjoyed participating. They're also on podcast platforms and they have lots of good episodes! Multiple Returns in SQL...

Deriving Functional Dependencies for Selections
8 Jul 2024 | original ↗

Over the last two weeks you were babysat by the pre-scheduling feature of Buttondown, since I was away in Japan without my laptop. Here's me in Japan: I had a nice trip. To hear more about what I thought, ask your nearest "white guy who likes video games" how he felt about going to Japan. Anyway, I'm back and writing these weekly again! Back to...

Deriving Functional Dependencies for Selections
8 Jul 2024 | original ↗

Over the last two weeks you were babysat by the pre-scheduling feature of Buttondown, since I was away in Japan without my laptop. Here's me in Japan: I had a nice trip. To hear more about what I thought, ask your nearest "white guy who likes video games" how he felt about going to Japan. Anyway, I'm back and writing these weekly again! Back to...

To Understand Correctness, You Must First Understand Incorrectness
1 Jul 2024 | original ↗

Recently I was in a Discord channel where someone wrote something akin to “I have a question about linearizability” which had an attached thread with 80 replies. This is what game designers call "environmental storytelling." Anyway, this exchange got me thinking again about why this is a famously confusing topic despite being so fundamental. I...

To Understand Correctness, You Must First Understand Incorrectness
1 Jul 2024 | original ↗

Recently I was in a Discord channel where someone wrote something akin to “I have a question about linearizability” which had an attached thread with 80 replies. This is what game designers call "environmental storytelling." Anyway, this exchange got me thinking again about why this is a famously confusing topic despite being so fundamental. I...

Physical Properties #4
24 Jun 2024 | original ↗

Previous parts of this series: Physical Properties #1 Physical Properties #2 Physical Properties #3 Relational query planning makes a distinction between "logical properties" and "physical properties." Logical properties are the attributes of a result set that are essential to it being considered correct, such as "which rows are present."...

Physical Properties #4
24 Jun 2024 | original ↗

Previous parts of this series: Physical Properties #1 Physical Properties #2 Physical Properties #3 Relational query planning makes a distinction between "logical properties" and "physical properties." Logical properties are the attributes of a result set that are essential to it being considered correct, such as "which rows are present."...

In Codd we Trust (or not)
17 Jun 2024 | original ↗

Get in loser, we’re doing another Codd philosophizing session. Codd’s paper introducing the relational model opens up like so: Future users of large data banks must be protected from having to know how the data is organized in the machine (the internal representation). Ted Codd (illus. Pittsford Travers) This is a very strong statement. And why...

In Codd we Trust (or not)
17 Jun 2024 | original ↗

Get in loser, we’re doing another Codd philosophizing session. Codd’s paper introducing the relational model opens up like so: Future users of large data banks must be protected from having to know how the data is organized in the machine (the internal representation). Ted Codd (illus. Pittsford Travers) This is a very strong statement. And why...

NULL BITMAP Builds a Database #2: Enter the Memtable
10 Jun 2024 | original ↗

I didn't realize how hard it would be to bin-pack episodes of this thing into the roughly ~750 word chunks that I try to keep issues of this newsletter at. Lots of things are like, oh that's small, so I'm going to stuff it in with something else, but then that something else is sort of big and so it's iffy to cram something else in there too....

NULL BITMAP Builds a Database #2: Enter the Memtable
10 Jun 2024 | original ↗

I didn't realize how hard it would be to bin-pack episodes of this thing into the roughly ~750 word chunks that I try to keep issues of this newsletter at. Lots of things are like, oh that's small, so I'm going to stuff it in with something else, but then that something else is sort of big and so it's iffy to cram something else in there too....

TPC-See?
3 Jun 2024 | original ↗

One thing about concurrency control (“isolation”) in a transactional database is that it incurs costs, and there’s broadly two kinds of such costs. The first such cost is the essential sacrifice of providing isolation. In a general transactional model, two transactions simply cannot concurrently operate mutably on the same piece of data. There’s...

TPC-See?
3 Jun 2024 | original ↗

One thing about concurrency control (“isolation”) in a transactional database is that it incurs costs, and there’s broadly two kinds of such costs. The first such cost is the essential sacrifice of providing isolation. In a general transactional model, two transactions simply cannot concurrently operate mutably on the same piece of data. There’s...

Avoiding Cross Products with the Query Graph
27 May 2024 | original ↗

To compute the join of two relations, we find all pairs of rows their rows which have the same value for any columns with the same name. This is sometimes called the natural join in the software world, and in the join processing world is often just called the join. fn main() { let r = Relation::new(["a", "b"]) .row([1, 2]) ...

Avoiding Cross Products with the Query Graph
27 May 2024 | original ↗

To compute the join of two relations, we find all pairs of rows their rows which have the same value for any columns with the same name. This is sometimes called the natural join in the software world, and in the join processing world is often just called the join. fn main() { let r = Relation::new(["a", "b"]) .row([1, 2]) ...

NULL BITMAP Builds a Database #1: The Log is Literally the Database
20 May 2024 | original ↗

It is time to end the tyranny of people becoming interested in database implementation and building a BTree. Let us turn to the succor of immutable storage. Today we are starting a new series. I know I have several ongoing series. But writing for different series requires different states of mind and I make the rules here. We are going to make an...

NULL BITMAP Builds a Database #1: The Log is Literally the Database
20 May 2024 | original ↗

It is time to end the tyranny of people becoming interested in database implementation and building a BTree. Let us turn to the succor of immutable storage. Today we are starting a new series. I know I have several ongoing series. But writing for different series requires different states of mind and I make the rules here. We are going to make an...

A Card Counting Trick
13 May 2024 | original ↗

I used to call this thing a “game” but my friend Kevin (who is not the same Kevin as last week but who is also a mathematician) kept telling me it’s really more of a “trick.” So, it’s a trick, in the canon, I guess. I was interning at a finance firm where we (the interns) were asked to “teach everyone something.” I didn’t participate (I didn’t...

A Card Counting Trick
13 May 2024 | original ↗

I used to call this thing a “game” but my friend Kevin (who is not the same Kevin as last week but who is also a mathematician) kept telling me it’s really more of a “trick.” So, it’s a trick, in the canon, I guess. I was interning at a finance firm where we (the interns) were asked to “teach everyone something.” I didn’t participate (I didn’t...

The Official NULL BITMAP Glossary: Graph Theory Edition
6 May 2024 | original ↗

Last week a post of mine made it to God’s favourite website and one thing I was struck by was how many people disagreed about basic graph theory terminology. Some people also seemed to complain about the lack of an introduction. To be clear: I have no intention of changing the expositional style of NULL BITMAP, at least explicitly or consciously,...

The Official NULL BITMAP Glossary: Graph Theory Edition
6 May 2024 | original ↗

Last week a post of mine made it to God’s favourite website and one thing I was struck by was how many people disagreed about basic graph theory terminology. Some people also seemed to complain about the lack of an introduction. To be clear: I have no intention of changing the expositional style of NULL BITMAP, at least explicitly or consciously,...

Not all Graphs are Trees
29 Apr 2024 | original ↗

It's pretty easy to imagine how to represent relational algebra expressions as a tree—they are already structurally rooted trees where each operator has its inputs as children. Even in a language like Rust, which has not yet implemented mutable aliasing (an oft-requested feature stuck at the RFC stage) we can implement this easily: enum RelExpr...

Not all Graphs are Trees
29 Apr 2024 | original ↗

It's pretty easy to imagine how to represent relational algebra expressions as a tree—they are already structurally rooted trees where each operator has its inputs as children. Even in a language like Rust, which has not yet implemented mutable aliasing (an oft-requested feature stuck at the RFC stage) we can implement this easily: enum RelExpr...

Heath's Theorem
22 Apr 2024 | original ↗

We don't have all that much in the world of relational query planning that could be considered a "fundamental theorem," as in like, some central idea that everything else rests on. This is partly because we don't really have a lot of "theorems" in the first place, outside of more involved stuff like join bounds and join ordering algorithms, there...

Heath's Theorem
22 Apr 2024 | original ↗

We don't have all that much in the world of relational query planning that could be considered a "fundamental theorem," as in like, some central idea that everything else rests on. This is partly because we don't really have a lot of "theorems" in the first place, outside of more involved stuff like join bounds and join ordering algorithms, there...

The Geometry of SQL
15 Apr 2024 | original ↗

Today I want to talk about a way to think about some relational algebra operations. First, let's start with this relation: This is just a handful of games that appeared on a handful of systems, but let's pretend this was a complete list of games. One way to visualize this data is as a set of points in a 3-dimensional space (the third dimension...

↑ these items are from RSS. Visit the blog itself at https://buttondown.com/jaffray to find other articles and to appreciate the author's digital home.