Comparing full text search algorithms: BM25, TF-IDF, and Postgres
More from Evan Schwartz
If you're developing an application and find yourself running a benchmark whose results are measured in nanoseconds... you should probably stop and get back to more important tasks. But here we are. I'm using binary vector embeddings to build Scour, a service that scours noisy feeds for content related to your interests. Scour uses the Hamming...
A delicious (and somewhat blasphemous) mashup of two very different traditional foods: Chicago Italian beef sandwiches and Chinese soup dumplings.
BM25, or Best Match 25, is a widely used algorithm for full text search. It is the default in Lucene/Elasticsearch and SQLite, among others. Recently, it has become common to combine full text search and vector similarity search into "hybrid search". I wanted to understand how full text search works, and specifically BM25, so here is my attempt...
Vector embeddings by themselves are pretty neat. Binary quantized vector embeddings are extra impressive. In short, they can retain 95+% retrieval accuracy with 32x compression 🤯.
I am reading Mara Bos' Rust Atomics and Locks. On the first pass, I didn't really grok memory ordering. So here's my attempt at understanding by explaining.