Unnecessary Optimization in Rust: Hamming Distances, SIMD, and Auto-Vectorization
Related
More from Evan Schwartz
I wrote another post about Understanding the BM25 full text search algorithm and had initially included comparisons with two other algorithms. However, that post was already quite long so here are the brief comparisons between BM25, TF-IDF, and PostgreSQL's full text search. BM25 vs TF-IDF: TF-IDF was the main model that was used prior to the...
A delicious (and somewhat blasphemous) mashup of two very different traditional foods: Chicago Italian beef sandwiches and Chinese soup dumplings.
BM25, or Best Match 25, is a widely used algorithm for full text search. It is the default in Lucene/Elasticsearch and SQLite, among others. Recently, it has become common to combine full text search and vector similarity search into "hybrid search". I wanted to understand how full text search works, and specifically BM25, so here is my attempt...
Vector embeddings by themselves are pretty neat. Binary quantized vector embeddings are extra impressive. In short, they can retain 95+% retrieval accuracy with 32x compression 🤯.
I am reading Mara Bos' Rust Atomics and Locks. On the first pass, I didn't really grok memory ordering. So here's my attempt at understanding by explaining.