Special thanks to Doug Turnbull, Daniel Svonava, Atita Arora, Aarne Talman, Saurabh Rai, Andre Zayarni, Leo Boytsov, Pat Lasserre and Bob van Luijt for reading and commenting on the drafts of this post. Jo Kristian Bergum recently wrote a massively influential X post: “The rise and fall of the vector database infrastructure...
This Fall I co-taught a new course on LLMs and Generative AI at the University of Helsinki. It was the first course of its kind, with quite a large group of students. Understanding LLMs from the ground up is essential, especially as they dominate discussions in tech today. Beyond the allure of impressive demos, diving deeper...
Last week I had the pleasure of teaching the Week-6 topic: “Use cases and applications of LLMs”. Week-5 on RAG can be found here. We looked at multimodal LLMs, a very interesting and in many ways still emerging trend in the LLM world, covering text, image, video and audio modalities (you can ask: “What do you hear in this video?”, for...
This Fall we are teaching a course on LLMs and Generative AI at the University of Helsinki, together with Aarne Talman (Accenture) and Jussi Karlgren (Silo.AI, now AMD). (Figure: screenshot of the PDF RAG Streamlit app.) Syllabus: Week 1: Introduction to Generative AI and Large Language Models (LLM). Introduction to Large Language Models (LLMs) and their...
“Large Language Models are complex systems. So the output, the final weights of the neural network, is just one little part of the entire picture.” This is a quote from Alessandro, from the episode we recorded at Berlin Buzzwords’24. I also tweeted (X’d?) about how alarming it is to see the downward trend in open-sourcing various components of these...
Another re-blog: this time about Lucene’s TokenFilters (originally published on 9 June 2014). For those into neural search from scratch, I also wrote this piece, which deals with embeddings at the Lucene level. At the recent Berlin Buzzwords conference, in his talk on Apache Lucene 4, Robert Muir mentioned Lucene’s internal testing library. This library is...
This re-blog will be a hard one, but I find such topics fascinating and intellectually stimulating. Here it goes, originally published 31 March 2014 on Blogger. Whenever you need to implement a query parser in Solr, you start by sub-classing the LuceneQParserPlugin: public class MyGroundShakingQueryParser extends...
This blog post was originally published on Blogger on 17 November 2014. I’m re-blogging it here on Medium, merely for fun and as a backup (although I don’t expect Blogger to go anywhere soon). If you find it useful in your work, that will be awesome! A colleague of mine has just returned from AWS re:Invent and brought back all the excitement about new...
I’ve just released an episode with Sonam Pankaj. She works on EmbedAnything. We recorded this episode at Berlin Buzzwords back in June, where I also got the chance to test my new audio recording gear (RØDE Wireless GO II). EmbedAnything is an infrastructure layer that allows you to embed anything (different text formats, but also other...
Back in May I announced that I had received an invitation from the Berlin Buzzwords org team to record live onsite. The invite actually came already for 2023, but due to family reasons I could not attend. It was a happy and exciting moment to be able to finally come to...
In this new episode, Eric Pugh, co-founder and CEO at OSC (OpenSource Connections, the same folks behind the Haystack search conference!), shows a cool demo with web sockets in Quepid and a bulk upload feature for persisting ratings. He also talks about new development involving an LLM rater called Judge Judy, which will allow you to take...