Running prompts against images, PDFs, audio and video with Google Gemini

from blog Simon Willison TIL, 23 Oct 2024 | ↗ original

I'm still working towards adding multi-modal support to my LLM tool. In the meantime, here are notes on running prompts against images, PDFs, audio and video files from the command line using the Google Gemini family of models. Update: I integrated the research from this TIL into my LLM tool, which can now run multi-modal prompts against...


Running prompts against images and PDFs with Google Gemini

Simon Willison's Weblog | original ↗

You can now run prompts against images, audio and video in your terminal using LLM

Simon Willison's Weblog | original ↗

ColdFusion Component for Google Gemini

Raymond Camden | original ↗

llm-gemini 0.4

Simon Willison's Weblog | original ↗

q What do I title this article?

Two-Wrongs | original ↗

docs.jina.ai - the Jina meta-prompt

Simon Willison's Weblog | original ↗

Multimodality and Large Multimodal Models (LMMs)

Chip Huyen | original ↗

Generating Illustrated Stories with AI

Raymond Camden | original ↗

Automating Object Detection with Google Gemini GenAI and Pipedream

Raymond Camden | original ↗

Open-LLMs - A list of LLMs for Commercial Use

Eugene Yan | original ↗

More from Simon Willison TIL

Calculating the size of all LFS files in a repo

25 Dec 2024 | original ↗

I wanted to know how large the deepseek-ai/DeepSeek-V3-Base repo on Hugging Face was without actually downloading all of the files. With some help from Claude, here's the recipe that worked. First, clone the repo without having Git LFS download the files: GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/deepseek-ai/DeepSeek-V3-Base cd...

Named Entity Resolution with dslim/distilbert-NER

24 Dec 2024 | original ↗

I was exploring the original BERT model from 2018, which is mainly useful if you fine-tune a model on top of it for a specific task. dslim/distilbert-NER by David S. Lim is a popular implementation of this, with around 20,000 downloads from Hugging Face every month. I tried the demo from the README but it didn't quite work - it complained about...

Fixes for datetime UTC warnings in Python

12 Dec 2024 | original ↗

I was getting the following warning for one of my Python test suites: DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.now(datetime.UTC). I also saw a deprecation warning elsewhere for my usage of...
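A minimal sketch of the replacement that the warning text suggests, not the code from the original post. The helper name `utc_now` is hypothetical. It uses `timezone.utc`, which works on every Python 3 release; the `datetime.UTC` alias quoted in the warning needs Python 3.11 or later.

```python
from datetime import datetime, timezone


def utc_now() -> datetime:
    """Return the current time as a timezone-aware datetime in UTC."""
    # datetime.now(timezone.utc) stands in for the deprecated
    # datetime.utcnow(), which returned a naive datetime with no tzinfo.
    return datetime.now(timezone.utc)


now = utc_now()
# The ISO string of an aware UTC datetime ends with an explicit +00:00 offset.
print(now.isoformat())
```

Unlike the naive value returned by `utcnow()`, the result carries its offset, so it compares and serializes unambiguously.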

Publishing a simple client-side JavaScript package to npm with GitHub Actions

8 Dec 2024 | original ↗

Here's what I learned about publishing a single file JavaScript package to NPM for my Prompts.js project. The code is in simonw/prompts-js on GitHub. The NPM package is prompts-js. A simple single file client-side package For this project, I wanted to create an old-fashioned JavaScript file that you could include in a web page using a <script> tag. No...

GitHub OAuth for a static site using Cloudflare Workers

29 Nov 2024 | original ↗

My tools.simonwillison.net site is a growing collection of small HTML and JavaScript applications hosted as static files on GitHub Pages. Many of those tools take advantage of external APIs such as those provided by OpenAI, Anthropic and Google Gemini, thanks to the increasingly common access-control-allow-origin: * CORS header. I want to...
