Comparing 13 Rust Crates for Extracting Text from HTML

From the blog of Evan Schwartz
Applications that run documents through LLMs or embedding models need to clean the text before feeding it into the model. I'm building a personalized content feed called Scour and was looking for a Rust crate to extract text from scraped HTML. I started off using a library that's used by a couple of LLM-related projects. However, while hunting a...