Multimodality and Large Multimodal Models (LMMs)

From Chip Huyen's blog, 10 Oct 2023 | original ↗

For a long time, each ML model operated in one data mode – text (translation, language modeling), image (object detection, image classification), or audio (speech recognition). However, natural intelligence is not limited to just a single modality. Humans can read, talk, and see. We listen to music to relax and watch out for strange noises to...

Everything I've learned so far about running local LLMs

null program | original ↗

What's new with ML in production

Vicki Boykis | original ↗

Language Models are Illiterate

matt.sh | original ↗

Llama 3.2: New Edge AI and Vision Models

Tao of Mac | original ↗

LLMs in the middle: Content aware client-side filtering

Karim Jedda | original ↗

Automatically classifying the content of sound files using ML

Get Info | original ↗

A Tale of Two Technologies: Why Large Language Models are the Future and the Metaverse Isn't

Jaz's Blog | original ↗

The risks of OpenAI's Whisper audio transcription model

Baldur Bjarnason's Notes on the Web | original ↗

How I'm using AI as a technical writer

passo.uno | original ↗

Hallucinating with art models

Monica Dinculescu | original ↗

More from Chip Huyen

Common pitfalls when building generative AI applications

16 Jan 2025 | original ↗

As we’re still in the early days of building applications with foundation models, it’s normal to make mistakes. This is a quick note with examples of some of the most common pitfalls that I’ve seen, both from public case studies and from my personal experience. Because these pitfalls are common, if you’ve worked on any AI product, you’ve probably...

Agents

7 Jan 2025 | original ↗

Intelligent agents are considered by many to be the ultimate goal of AI. The classic book by Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach (Prentice Hall, 1995), defines the field of AI research as “the study and design of rational agents.” The unprecedented capabilities of foundation models have opened the door to...

Building A Generative AI Platform

25 Jul 2024 | original ↗

After studying how companies deploy generative AI applications, I noticed many similarities in their platforms. This post outlines the common components of a generative AI platform, what they do, and how they are implemented. I try my best to keep the architecture general, but certain applications might deviate. This is what the overall...

Measuring personal growth

17 Apr 2024 | original ↗

My founder friends constantly think about growth. They think about how to measure their business growth and how to get to the next order of magnitude scale. If they’re making $1M ARR today, they think about how to get to $10M ARR. If they have 1,000 users today, they think about how to get to 10,000 users. This made me wonder if/how people are...

What I learned from looking at 900 most popular open source AI tools

14 Mar 2024 | original ↗

[Hacker News discussion, LinkedIn discussion, Twitter thread] Four years ago, I did an analysis of the open source ML ecosystem. Since then, the landscape has changed, so I revisited the topic. This time, I focused exclusively on the stack around foundation models. The full list of open source AI repos is hosted at llama-police. The list is...