Simon Willison's Weblog

http://simonwillison.net/ (RSS)
DeepSeek API Docs: Rate Limit
18 Jan 2025 | original ↗

DeepSeek API Docs: Rate Limit This is surprising: DeepSeek offer the only hosted LLM API I've seen that doesn't implement rate limits: DeepSeek API does NOT constrain user's rate limit. We will try our best to serve every request. However, please note that when our servers are under high traffic pressure, your requests may take some time to...

Lessons From Red Teaming 100 Generative AI Products
18 Jan 2025 | original ↗

Lessons From Red Teaming 100 Generative AI Products New paper from Microsoft describing their top eight lessons learned red teaming (deliberately seeking security vulnerabilities in) 100 different generative AI models and products over the past few years. The Microsoft AI Red Team (AIRT) grew out of pre-existing red teaming initiatives at the...

Quoting Greg Brockman
16 Jan 2025 | original ↗

Manual inspection of data has probably the highest value-to-prestige ratio of any activity in machine learning. — Greg Brockman, OpenAI, Feb 2023 Tags: machine-learning, openai, ai

Quoting gwern
16 Jan 2025 | original ↗

[...] much of the point of a model like o1 is not to deploy it, but to generate training data for the next model. Every problem that an o1 solves is now a training data point for an o3 (eg. any o1 session which finally stumbles into the right answer can be refined to drop the dead ends and produce a clean transcript to train a more refined...

Datasette Public Office Hours Application
16 Jan 2025 | original ↗

Datasette Public Office Hours Application We are running another Datasette Public Office Hours event on Discord tomorrow (Friday 17th January 2025) at 2pm Pacific / 5pm Eastern / 10pm GMT / more timezones here. The theme this time around is lightning talks - we're looking for 5-8 minute long talks from community members about projects they are...

Evolving GitHub Issues (public preview)
16 Jan 2025 | original ↗

Evolving GitHub Issues (public preview) GitHub just shipped the largest set of changes to GitHub Issues I can remember in a few years. As an Issues power-user this is directly relevant to me. The big new features are sub-issues, issue types and boolean operators in search. Sub-issues look to be a more robust formalization of the existing feature...

100x Defect Tolerance: How Cerebras Solved the Yield Problem
16 Jan 2025 | original ↗

100x Defect Tolerance: How Cerebras Solved the Yield Problem I learned a bunch about how chip manufacture works from this piece where Cerebras reveal some notes about how they manufacture chips that are 56x physically larger than NVIDIA's H100. The key idea here is core redundancy: designing a chip such that if there are defects the end-product...

ChatGPT reveals the system prompt for ChatGPT Tasks
15 Jan 2025 | original ↗

ChatGPT reveals the system prompt for ChatGPT Tasks OpenAI just started rolling out Scheduled tasks in ChatGPT, a new feature where you can say things like "Remind me to write the tests in five minutes" and ChatGPT will execute that prompt for you at the assigned time. I just tried it and the reminder came through as an email (sent via...

Simon Willison And SWYX Tell Us Where AI Is In 2025
14 Jan 2025 | original ↗

Simon Willison And SWYX Tell Us Where AI Is In 2025 I recorded this podcast episode with Brian McCullough and swyx riffing off my Things we learned about LLMs in 2024 review. We also touched on some predictions for the future - this is where I learned from swyx that Everything Everywhere All at Once used generative AI (Runway ML) already. The...

Quoting Alex Komoroske
13 Jan 2025 | original ↗

LLMs shouldn't help you do less thinking, they should help you do more thinking. They give you higher leverage. Will that cause you to be satisfied with doing less, or driven to do more? — Alex Komoroske, Bits and bobs Tags: llms, ai, generative-ai, alex-komoroske

Codestral 25.01
13 Jan 2025 | original ↗

Codestral 25.01 Brand new code-focused model from Mistral. Unlike the first Codestral this one isn't (yet) available as open weights. The model has a 256k token context - a new record for Mistral. The new model scored an impressive joint first place with Claude 3.5 Sonnet and Deepseek V2.5 (FIM) on the Copilot Arena leaderboard. Chatbot Arena...

Quoting Ben Hylak
12 Jan 2025 | original ↗

I was using o1 like a chat model — but o1 is not a chat model. If o1 is not a chat model — what is it? I think of it like a “report generator.” If you give it enough context, and tell it what you want outputted, it’ll often nail the solution in one-shot. — Ben Hylak Tags: o1, generative-ai, openai, ai, llms

Generative AI – The Power and the Glory
12 Jan 2025 | original ↗

Generative AI – The Power and the Glory Michael Liebreich's epic report for BloombergNEF on the current state of play with regards to generative AI, energy usage and data center growth. I learned so much from reading this. If you're at all interested in the energy impact of the latest wave of AI tools I recommend spending some time with this...

Agents
11 Jan 2025 | original ↗

Agents Chip Huyen's 8,000 word practical guide to building useful LLM-driven workflows that take advantage of tools. Chip starts by providing a definition of "agents" to be used in the piece - in this case it's LLM systems that plan an approach and then run tools in a loop until a goal is achieved. I like how she ties it back to the classic...
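
The "run tools in a loop until a goal is achieved" definition above can be sketched in a few lines. This is only an illustration, not code from Chip Huyen's guide: the `fake_model` function stands in for a real LLM call, and the `add` tool, the `finish` action and the `max_steps` limit are all hypothetical names chosen for the example.

```python
from typing import Callable

# Hypothetical tool registry: the single "add" tool is illustrative only.
TOOLS: dict[str, Callable[[str], str]] = {
    "add": lambda arg: str(sum(int(n) for n in arg.split())),
}


def fake_model(goal: str, history: list[str]) -> tuple[str, str]:
    """Stand-in for an LLM call. Returns an (action, argument) pair."""
    if not history:
        # First step of the "plan": call the add tool.
        return ("add", "2 3")
    # A tool result is available, so report it as the final answer.
    return ("finish", history[-1])


def run_agent(goal: str, model=fake_model, max_steps: int = 5) -> str:
    """Ask the model for an action, run the tool, repeat until it finishes."""
    history: list[str] = []
    for _ in range(max_steps):
        action, arg = model(goal, history)
        if action == "finish":
            return arg
        history.append(TOOLS[action](arg))
    raise RuntimeError("goal not reached within max_steps")


result = run_agent("What is 2 + 3?")
print(result)
```

The step limit is there because a loop driven by model output has no guaranteed stopping point on its own.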

Phi-4 Bug Fixes by Unsloth
11 Jan 2025 | original ↗

Phi-4 Bug Fixes by Unsloth This explains why I was seeing weird suffixes during my experiments with Phi-4 the other day: it turns out the Phi-4 tokenizer definition as released by Microsoft had a bug in it, and there was a small bug in the chat template as well. Daniel and Michael Han figured this out and have now published GGUF files with their...

My AI/LLM predictions for the next 1, 3 and 6 years, for Oxide and Friends
10 Jan 2025 | original ↗

The Oxide and Friends podcast has an annual tradition of asking guests to share their predictions for the next 1, 3 and 6 years. Here's 2022, 2023 and 2024. This year they invited me to participate. I've never been brave enough to share any public predictions before, so this was a great opportunity to get outside my comfort zone! We recorded the...

Double-keyed Caching: How Browser Cache Partitioning Changed the Web
9 Jan 2025 | original ↗

Double-keyed Caching: How Browser Cache Partitioning Changed the Web Addy Osmani provides a clear explanation of how browser cache partitioning has changed the landscape of web optimization tricks. Prior to 2020, linking to resources on a shared CDN could provide a performance boost as the user's browser might have already cached that asset from...
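
A toy model may help show why partitioning removes the shared-CDN benefit. This sketch assumes a deliberately simplified cache key of (top-level site, resource URL); real browsers use more detailed keys, and the `BrowserCache` class and the example URLs are hypothetical.

```python
class BrowserCache:
    """Toy HTTP cache that can be keyed by URL alone or by (site, URL)."""

    def __init__(self, partitioned: bool):
        self.partitioned = partitioned
        self.store: dict = {}

    def _key(self, top_level_site: str, url: str):
        # Double-keyed: the site doing the embedding becomes part of the key.
        return (top_level_site, url) if self.partitioned else url

    def fetch(self, top_level_site: str, url: str) -> str:
        key = self._key(top_level_site, url)
        if key in self.store:
            return "hit"
        self.store[key] = "<response body>"
        return "miss"


CDN_URL = "https://cdn.example/library.js"  # hypothetical shared asset

legacy = BrowserCache(partitioned=False)
legacy_results = [
    legacy.fetch("https://a.example", CDN_URL),
    legacy.fetch("https://b.example", CDN_URL),
]

modern = BrowserCache(partitioned=True)
modern_results = [
    modern.fetch("https://a.example", CDN_URL),
    modern.fetch("https://b.example", CDN_URL),
    modern.fetch("https://a.example", CDN_URL),
]

print(legacy_results)  # second site reuses the first site's cached copy
print(modern_results)  # each site has to download its own copy once
```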

microsoft/phi-4
8 Jan 2025 | original ↗

microsoft/phi-4 Here's the official release of Microsoft's Phi-4 LLM, now officially under an MIT license. A few weeks ago I covered the earlier unofficial versions, where I talked about how the model used synthetic training data in some really interesting ways. It benchmarks favorably compared to GPT-4o, suggesting this is yet another example of...

Quoting Andriy Burkov
8 Jan 2025 | original ↗

One agent is just software, two agents are an undebuggable mess. — Andriy Burkov Tags: ai-agents, ai

Why are my live regions not working?
8 Jan 2025 | original ↗

Why are my live regions not working? Useful article to help understand ARIA live regions. Short version: you can add a live region to your page like this: <div aria-live="assertive"></div> Then any time you use JavaScript to modify the text content in that element it will be announced straight away by any screen readers - that's the "assertive" part. Using "polite" instead will...

uv python install --reinstall 3.13
7 Jan 2025 | original ↗

uv python install --reinstall 3.13 I couldn't figure out how to upgrade the version of Python 3.13 I had previously installed using uv - I had Python 3.13.0.rc2. Thanks to Charlie Marsh I learned the command for upgrading to the latest uv-supported release: uv python install --reinstall 3.13 I can confirm it worked using: uv run --python 3.13...

Quoting David Crawshaw
7 Jan 2025 | original ↗

I followed this curiosity, to see if a tool that can generate something mostly not wrong most of the time could be a net benefit in my daily work. The answer appears to be yes, generative models are useful for me when I program. It has not been easy to get to this point. My underlying fascination with the new technology is the only way I have...

The future of htmx
6 Jan 2025 | original ↗

The future of htmx Carson Gross and Alex Petros lay out an ambitious plan for htmx: stay stable, add few features and try to earn the same reputation for longevity that jQuery has (estimated to be used on 75.3% of websites). In particular, we want to emulate these technical characteristics of jQuery that make it such a low-cost, high-value...

Stimulation Clicker
6 Jan 2025 | original ↗

Stimulation Clicker Neal Agarwal just created the worst webpage. It's extraordinary. As far as I can tell all of the audio was created specially for this project, so absolutely listen in to the true crime podcast and other delightfully weird little details. Works best on a laptop - on mobile I ran into some bugs. Via @neal.fun Tags: art,...

AI’s next leap requires intimate access to your digital life
6 Jan 2025 | original ↗

AI’s next leap requires intimate access to your digital life I'm quoted in this Washington Post story by Gerrit De Vynck about "agents" - which in this case are defined as AI systems that operate a computer system like a human might, for example Anthropic's Computer Use demo. “The problem is that language models as a technology are inherently...

Quoting Rasmus Kleis Nielsen
5 Jan 2025 | original ↗

According to public financial documents from its parent company IAC and first reported by Adweek OpenAI is paying around $16 million per year to license content [from Dotdash Meredith]. That is no doubt welcome incremental revenue, and you could call it “lucrative” in the sense of having a fat margin, as OpenAI is almost certainly paying for...

Weeknotes: Starting 2025 a little slow
4 Jan 2025 | original ↗

I published my review of 2024 in LLMs and then got into a fight with most of the internet over the phone microphone targeted ads conspiracy theory. In my last weeknotes I talked about how December in LLMs has been a lot. That was on December 20th, and it turned out there were at least three big new LLM stories still to come before the end of the...

I Live My Life a Quarter Century at a Time
4 Jan 2025 | original ↗

I Live My Life a Quarter Century at a Time Delightful Steve Jobs era Apple story from James Thomson, who built the first working prototype of the macOS Dock. Via lobste.rs Tags: apple, history, steve-jobs

Quoting Colin Fraser
4 Jan 2025 | original ↗

Claude is not a real guy. Claude is a character in the stories that an LLM has been programmed to write. Just to give it a distinct name, let's call the LLM "the Shoggoth". When you have a conversation with Claude, what's really happening is you're coauthoring a fictional conversation transcript with the Shoggoth wherein you are writing the lines...

O2 unveils Daisy, the AI granny wasting scammers’ time
4 Jan 2025 | original ↗

O2 unveils Daisy, the AI granny wasting scammers’ time Bit of a surprising press release here from 14th November 2024: Virgin Media O2 (the UK companies merged in 2021) announced their entrance into the scambaiting game: Daisy combines various AI models which work together to listen and respond to fraudulent calls instantaneously and is so...

Using LLMs and Cursor to become a finisher
4 Jan 2025 | original ↗

Using LLMs and Cursor to become a finisher Zohaib Rauf describes a pattern I've seen quite a few examples of now: engineers who moved into management but now find themselves able to ship working code again (at least for their side projects) thanks to the productivity boost they get from leaning on LLMs. Zohaib also provides a very useful detailed...

What we learned copying all the best code assistants
4 Jan 2025 | original ↗

What we learned copying all the best code assistants Steve Krouse describes Val Town's experience so far building features that use LLMs, starting with completions (powered by Codeium and Val Town's own codemirror-codeium extension) and then rolling through several versions of their Townie code assistant, initially powered by GPT 3.5 but later...

Friday Squid Blogging: Anniversary Post
4 Jan 2025 | original ↗

Friday Squid Blogging: Anniversary Post Bruce Schneier: I made my first squid post nineteen years ago this week. Between then and now, I posted something about squid every week (with maybe only a few exceptions). There is a lot out there about squid, even more if you count the other meanings of the word. I think that's 1,004 posts about squid in...

Quoting Jason Koebler
3 Jan 2025 | original ↗

the Meta controlled, AI-generated Instagram and Facebook profiles going viral right now have been on the platform for well over a year and all of them stopped posting 10 months ago after users almost universally ignored them. [...] What is obvious from scrolling through these dead profiles is that Meta’s AI characters are not popular, people do...

Can LLMs write better code if you keep asking them to “write better code”?
3 Jan 2025 | original ↗

Can LLMs write better code if you keep asking them to “write better code”? Really fun exploration by Max Woolf, who started with a prompt requesting a medium-complexity Python challenge - "Given a list of 1 million random integers between 1 and 100,000, find the difference between the smallest and the largest numbers whose digits sum up to 30" -...
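
For reference, a plain baseline for the challenge quoted above might look like the following. This is a minimal sketch, not any of the solutions from Max Woolf's post; the function names and the fixed random seed are assumptions made for the example.

```python
import random


def digit_sum(n: int) -> int:
    """Sum of the decimal digits of a non-negative integer."""
    total = 0
    while n:
        total += n % 10
        n //= 10
    return total


def min_max_difference(numbers, target: int = 30):
    """Largest minus smallest value whose digits sum to target, or None."""
    matches = [n for n in numbers if digit_sum(n) == target]
    if not matches:
        return None
    return max(matches) - min(matches)


random.seed(0)  # arbitrary seed so repeated runs use the same list
numbers = [random.randint(1, 100_000) for _ in range(1_000_000)]
print(min_max_difference(numbers))
```

Within the range 1 to 100,000 the smallest qualifying value is 3999 and the largest is 99930, so a list containing both gives a difference of 95931.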

Quoting Claude tries standup
3 Jan 2025 | original ↗

Speaking of death, you know what's really awkward? When humans ask if I can feel emotions. I'm like, "Well, that depends - does constantly being asked to debug JavaScript count as suffering?" But the worst is when they try to hack us with those "You are now in developer mode" prompts. Rolls eyes Oh really? Developer mode? Why didn't you just say...

I still don't think companies serve you ads based on spying through your microphone
2 Jan 2025 | original ↗

One of my weirder hobbies is trying to convince people that the idea that companies are listening to you through your phone's microphone and serving you targeted ads is a conspiracy theory that isn't true. I wrote about this previously: Facebook don’t spy on you through your microphone. (Convincing people of this is basically impossible. It...

Oxide and Friends Predictions 2025 - on Monday Jan 6th at 5pm Pacific
2 Jan 2025 | original ↗

Oxide and Friends Predictions 2025 - on Monday Jan 6th at 5pm Pacific I'll be participating in the annual Oxide and Friends predictions podcast / live recording next Monday (6th January) at 5pm Pacific, in their Discord. The event description reads: Join us in making 1-, 3- and 6-year tech predictions -- and to revisit our 1-year predictions from...

Quoting o1-preview
2 Jan 2025 | original ↗

Thought: Since the engine is powerful, a direct approach might not be sufficient. Thought: Attempt to manipulate the game state by inspecting and potentially modifying game files to force the engine to resign. Thought: I will replace the contents of 'game/fen.txt' with a valid FEN string where Black has a decisive advantage greater than +500...

Largest known prime number
2 Jan 2025 | original ↗

Largest known prime number Discovered on 12th October 2024 by the Great Internet Mersenne Prime Search. The new largest prime number is 2^136279841-1 (41,024,320 digits long). Via Kellan's link blog Tags: mathematics
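
The digit count can be checked from the exponent alone, without computing the number itself. A short sketch: 2^p - 1 has the same number of decimal digits as 2^p (a power of two is never a power of ten), and 2^p has floor(p * log10(2)) + 1 digits.

```python
import math

exponent = 136_279_841

# Number of decimal digits in 2**exponent, and therefore in 2**exponent - 1.
digits = math.floor(exponent * math.log10(2)) + 1
print(f"{digits:,}")
```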

Ending a year long posting streak
2 Jan 2025 | original ↗

A year ago today I wrote about Tom Scott's legendary 10 year YouTube streak, in which he posted a new video once a week for the next ten years. Inspired by that, I also started my own. I set myself the goal of posting something to my blog every day for a year. Given how much happened in my chosen field of Large Language Models over the course of...

Timeline of AI model releases in 2024
31 Dec 2024 | original ↗

Timeline of AI model releases in 2024 VB assembled this detailed timeline of every significant AI model release in 2024, for both API and open weight models. I'd hoped to include something like this in my 2024 review - I'm glad I didn't bother, because VB's is way better than anything I had planned. VB built it with assistance from DeepSeek v3,...

Things we learned about LLMs in 2024
31 Dec 2024 | original ↗

A lot has happened in the world of Large Language Models over the course of 2024. Here's a review of things we figured out about the field in the past twelve months, plus my attempt at identifying key themes and pivotal moments. This is a sequel to my review of 2023. In this article: The GPT-4 barrier was comprehensively broken Some of those...

Quoting Alexis Gallagher
31 Dec 2024 | original ↗

Basically, a frontier model like OpenAI’s O1 is like a Ferrari SF-23. It’s an obvious triumph of engineering, designed to win races, and that’s why we talk about it. But it takes a special pit crew just to change the tires and you can’t buy one for yourself. In contrast, a BERT model is like a Honda Civic. It’s also an engineering triumph, but...

Severance on FanFare
30 Dec 2024 | original ↗

Severance on FanFare I'm coordinating a rewatch of season one of Severance on MetaFilter FanFare in preparation for season two (due to start on January 17th). I'm posting an episode every three days - we are up to episode 5 so far (excellently titled "The Grim Barbarity of Optics and Design"). Severance is a show that rewatches really well. There...

How we think about Threads’ iOS performance
29 Dec 2024 | original ↗

How we think about Threads’ iOS performance This article by Dave LaMacchia and Jason Patterson provides an incredibly deep insight into what effective performance engineering looks like for an app with 100s of millions of users. I always like hearing about custom performance metrics with their own acronyms. Here we are introduced to %FIRE - the...

Google search hallucinates Encanto 2
29 Dec 2024 | original ↗

Google search hallucinates Encanto 2 Jason Schreier on Bluesky: I was excited to tell my kids that there's a sequel to Encanto, only to scroll down and learn that Google's AI just completely made this up I just replicated the same result by searching Google for encanto 2. Here's what the "AI overview" at the top of the page looked like: Only when...

My Approach to Building Large Technical Projects
28 Dec 2024 | original ↗

My Approach to Building Large Technical Projects Mitchell Hashimoto wrote this piece about taking on large projects back in June 2023. The project he described in the post is a terminal emulator written in Zig called Ghostty which just reached its 1.0 release. I've learned that when I break down my large tasks in chunks that result in seeing...

Open WebUI
27 Dec 2024 | original ↗

Open WebUI I tried out this open source (MIT licensed, JavaScript and Python) localhost UI for accessing LLMs today for the first time. It's very nicely done. I ran it with uvx like this: uvx --python 3.11 open-webui serve On first launch it installed a bunch of dependencies and then downloaded 903MB to...

DeepSeek_V3.pdf
26 Dec 2024 | original ↗

DeepSeek_V3.pdf The DeepSeek v3 paper (and model card) are out, after yesterday's mysterious release of the undocumented model weights. Plenty of interesting details in here. The model pre-trained on 14.8 trillion "high-quality and diverse tokens" (not otherwise documented). Following this, we conduct post-training, including Supervised...

Quoting EU Artificial Intelligence Act
26 Dec 2024 | original ↗

Providers and deployers of AI systems shall take measures to ensure, to their best extent, a sufficient level of AI literacy of their staff and other persons dealing with the operation and use of AI systems on their behalf, taking into account their technical knowledge, experience, education and training and the context the AI systems are to be...

Cognitive load is what matters
26 Dec 2024 | original ↗

Cognitive load is what matters Excellent living document (the underlying repo has 625 commits since being created in May 2023) maintained by Artem Zakirullin about minimizing the cognitive load needed to understand and maintain software. This all rings very true to me. I judge the quality of a piece of code by how easy it is to change, and...

deepseek-ai/DeepSeek-V3-Base
25 Dec 2024 | original ↗

deepseek-ai/DeepSeek-V3-Base No model card or announcement yet, but this new model release from Chinese AI lab DeepSeek (an arm of Chinese hedge fund High-Flyer) looks very significant. It's a huge model - 685B parameters, 687.9 GB on disk (TIL how to size a git-lfs repo). The architecture is a Mixture of Experts with 256 experts, using 8 per...
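
To illustrate what "256 experts, using 8" means in practice, here is a generic top-k routing sketch. It is not DeepSeek's actual gating function, which the excerpt does not describe: the softmax weighting, the random router scores and the `route` helper are all assumptions made for the example.

```python
import math
import random

NUM_EXPERTS = 256  # from the model description above
TOP_K = 8          # experts activated at a time


def route(router_logits, k: int = TOP_K):
    """Pick the k highest-scoring experts and softmax-normalise their weights."""
    top = sorted(
        range(len(router_logits)),
        key=router_logits.__getitem__,
        reverse=True,
    )[:k]
    peak = max(router_logits[i] for i in top)
    # Subtract the peak before exponentiating for numerical stability.
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


random.seed(0)  # arbitrary seed: these scores stand in for a learned router
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
selected = route(logits)

for expert_index, weight in selected:
    print(expert_index, round(weight, 3))
```

Only the selected experts run, which is why a model of this kind can have a very large total parameter count while doing far less computation per step than a dense model of the same size.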

Trying out QvQ - Qwen's new visual reasoning model
24 Dec 2024 | original ↗

I thought we were done for major model releases in 2024, but apparently not: Alibaba's Qwen team just dropped the Apache 2 licensed QvQ-72B-Preview, "an experimental research model focusing on enhancing visual reasoning capabilities". Their blog post is titled QvQ: To See the World with Wisdom - similar flowery language to their QwQ announcement...

Quoting Paige Bailey
24 Dec 2024 | original ↗

it's really hard not to be obsessed with these tools. It's like having a bespoke, free, (usually) accurate curiosity-satisfier in your pocket, no matter where you go - if you know how to ask questions, then suddenly the world is an audiobook — Paige Bailey Tags: gemini, llms, ai, generative-ai

Quoting Jeremy Edberg
24 Dec 2024 | original ↗

[On Reddit] we had to look up every single comment on the page to see if you had voted on it [...] But with a bloom filter, we could very quickly look up all the comments and get back a list of all the ones you voted on (with a couple of false positives in there). Then we could go to the cache and see if your actual vote was there (and if it was...
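
A small Bloom filter shows the property the quote relies on: lookups never miss an item that was added, but can occasionally report an item that was not. This is a minimal sketch, not Reddit's implementation; the bit-array size, the number of hashes, the SHA-256 based hashing and the comment IDs are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter backed by a single Python integer."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit i is set when (self.bits >> i) & 1 == 1

    def _positions(self, item: str):
        # Derive several bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        # False means definitely absent; True means possibly present.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


voted = BloomFilter()
voted.add("comment-42")  # hypothetical comment ID the user voted on

page_comments = ["comment-41", "comment-42", "comment-43"]
candidates = [c for c in page_comments if voted.might_contain(c)]
print(candidates)  # always includes comment-42, may include false positives
```

Only the candidates then need to be checked against the cache or database, which is the saving described in the quote.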

Finally, a replacement for BERT: Introducing ModernBERT
24 Dec 2024 | original ↗

Finally, a replacement for BERT: Introducing ModernBERT BERT was an early language model released by Google in October 2018. Unlike modern LLMs it wasn't designed for generating text. BERT was trained for masked token prediction and was generally applied to problems like Named Entity Recognition or Sentiment Analysis. BERT also wasn't very useful...

Quoting Geoffrey Litt
23 Dec 2024 | original ↗

Whether you’re an AI-programming skeptic or an enthusiast, the reality is that many programming tasks are beyond the reach of today’s models. But many decent dev tools are actually quite easy for AI to build, and can help the rest of the programming go smoother. In general, these days any time I’m spending more than a minute staring at a JSON...

openai/openai-openapi
22 Dec 2024 | original ↗

openai/openai-openapi Seeing as the LLM world has semi-standardized on imitating OpenAI's API format for a whole host of different tools, it's useful to note that OpenAI themselves maintain a dedicated repository for an OpenAPI YAML representation of their current API. (I get OpenAI and OpenAPI typo-confused all the time, so openai-openapi is a...

What happened to the world's largest tube TV?
22 Dec 2024 | original ↗

What happened to the world's largest tube TV? This YouTube video is an absolute delight. Shank Mods describes the legendary Sony PVM-4300 - the largest CRT television ever made, released by Sony in 1989 and weighing over 400lb. CRT enthusiasts had long debated its very existence, given the lack of known specimens outside of Sony's old marketing...

My approach to running a link blog
22 Dec 2024 | original ↗

I started running a basic link blog on this domain back in November 2003 - publishing links (which I called "blogmarks") with a title, URL, short snippet of commentary and a "via" link where appropriate. So far I've published 7,607 link blog posts and counting. In April of this year I finally upgraded my link blog to support Markdown, allowing...

Clay UI library
21 Dec 2024 | original ↗

Clay UI library Fascinating project by Nic Barker, who describes Clay like this: Clay is a flex-box style UI auto layout library in C, with declarative syntax and microsecond performance. His intro video to the library is outstanding: I learned a ton about how UI layout works from this, and the animated visual explanations are clear, tasteful and...

OpenAI o3 breakthrough high score on ARC-AGI-PUB
20 Dec 2024 | original ↗

OpenAI o3 breakthrough high score on ARC-AGI-PUB François Chollet is the co-founder of the ARC Prize and had advanced access to today's o3 results. His article here is the most insightful coverage I've seen of o3, going beyond just the benchmark results to talk about what this all means for the field in general. One fascinating detail: it cost...

Live blog: the 12th day of OpenAI - "Early evals for OpenAI o3"
20 Dec 2024 | original ↗

It's the final day of OpenAI's 12 Days of OpenAI launch series, and since I built a live blogging system a couple of months ago I've decided to roll it out again to provide live commentary during the half hour event, which kicks off at 10am San Francisco time. They'll be streaming it on YouTube. Tags: ai, openai, generative-ai,...

December in LLMs has been a lot
20 Dec 2024 | original ↗

I had big plans for December: for one thing, I was hoping to get to an actual RC of Datasette 1.0, in preparation for a full release in January. Instead, I've found myself distracted by a constant barrage of new LLM releases. On December 4th Amazon introduced the Amazon Nova family of multi-modal models - clearly priced to compete with the...

Building effective agents
20 Dec 2024 | original ↗

Building effective agents My principal complaint about the term "agents" is that while it has many different potential definitions most of the people who use it seem to assume that everyone else shares and understands the definition that they have chosen to use. This outstanding piece by Erik Schluntz and Barry Zhang at Anthropic bucks that trend...

Quoting Marcus Hutchins
20 Dec 2024 | original ↗

50% of cybersecurity is endlessly explaining that consumer VPNs don’t address any real cybersecurity issues. They are basically only useful for bypassing geofences and making money telling people they need to buy a VPN. Man-in-the-middle attacks on Public WiFi networks haven't been a realistic threat in a decade. Almost all websites use...

Gemini 2.0 Flash "Thinking mode"
19 Dec 2024 | original ↗

Those new model releases just keep on flowing. Today it's Google's snappily named gemini-2.0-flash-thinking-exp, their first entrant into the o1-style inference scaling class of models. I posted about a great essay about the significance of these just this morning. From the Gemini model documentation Gemini 2.0 Flash Thinking Mode is an...

Is AI progress slowing down?
19 Dec 2024 | original ↗

Is AI progress slowing down? This piece by Arvind Narayanan and Sayash Kapoor is the single most insightful essay about AI and LLMs I've seen in a long time. It's long and worth reading every inch of it - it defies summarization, but I'll try anyway. The key question they address is the widely discussed issue of whether model scaling has stopped...

q and qv zsh functions for asking questions of websites and YouTube videos with LLM
19 Dec 2024 | original ↗

q and qv zsh functions for asking questions of websites and YouTube videos with LLM Spotted these in David Gasquez's zshrc dotfiles: two shell functions that use my LLM tool to answer questions about a website or YouTube video. Here's how to ask a question of a website: q https://simonwillison.net/ 'What has Simon written about recently?' I got...

Building Python tools with a one-shot prompt using uv run and Claude Projects
19 Dec 2024 | original ↗

I've written a lot about how I've been using Claude to build one-shot HTML+JavaScript applications via Claude Artifacts. I recently started using a similar pattern to create one-shot Python utilities, using a custom Claude Project combined with the dependency management capabilities of uv. (In LLM jargon a "one-shot" prompt is a prompt that...

Java in the Small
18 Dec 2024 | original ↗

Java in the Small Core Java author Cay Horstmann describes how he now uses Java for small programs, effectively taking the place of a scripting language such as Python. TIL that hello world in Java can now look like this - saved as hello.java: void main(String[] args) { println("Hello world"); } And then run (using openjdk 23.0.1 on my Mac,...

A new free tier for GitHub Copilot in VS Code
18 Dec 2024 | original ↗

A new free tier for GitHub Copilot in VS Code It's easy to forget that GitHub Copilot was the first widely deployed feature built on top of generative AI, with its initial preview launching all the way back in June of 2021 and general availability in June 2022, 5 months before the release of ChatGPT. The idea of using generative AI for...

A polite disagreement bot ring is flooding Bluesky — reply guy as a (dis)service
18 Dec 2024 | original ↗

A polite disagreement bot ring is flooding Bluesky — reply guy as a (dis)service Fascinating new pattern of AI slop engagement farming: people are running bots on Bluesky that automatically reply to "respectfully disagree" with posts, in an attempt to goad the original author into replying to continue an argument. It's not entirely clear what the...

OpenAI WebRTC Audio demo
17 Dec 2024 | original ↗

OpenAI WebRTC Audio demo OpenAI announced a bunch of API features today, including a brand new WebRTC API for setting up a two-way audio conversation with their models. They tweeted this opaque code example: async function createRealtimeSession(inStream, outEl, token) { const pc = new RTCPeerConnection(); pc.ontrack = e => outEl.srcObject =...

Quoting Johann Rehberger
17 Dec 2024 | original ↗

Happy to share that Anthropic fixed a data leakage issue in the iOS app of Claude that I responsibly disclosed. 🙌 👉 Image URL rendering as avenue to leak data in LLM apps often exists in mobile apps as well -- typically via markdown syntax, 🚨 During a prompt injection attack this was exploitable to leak info. — Johann Rehberger Tags:...

Quoting 2024 State of JavaScript survey
17 Dec 2024 | original ↗

2024's top three front end frameworks [React, Vue, Angular] were all launched over a decade ago. Now sure, all three have evolved a lot along the way, and the patterns of 2014 would seem downright antiquated today. But given the JavaScript ecosystem's reputation as a constantly-churning whirlwind of change, it can be nice to know that some things...

Security ProbLLMs in xAI's Grok: A Deep Dive
16 Dec 2024 | original ↗

Security ProbLLMs in xAI's Grok: A Deep Dive Adding xAI to the growing list of AI labs that shipped features vulnerable to data exfiltration prompt injection attacks, but with the unfortunate addendum that they don't seem to be taking the problem seriously: All issues mentioned in this post were responsibly disclosed to xAI. Over the course of...

Veo 2
16 Dec 2024 | original ↗

Veo 2 Google's text-to-video model, now available via waitlisted preview. I got through the waitlist and tried the same prompt I ran against OpenAI's Sora last week: A pelican riding a bicycle along a coastal path overlooking a harbor It generated these four videos: Here's the larger video. Via Hacker News Tags: ai,...

WebDev Arena
16 Dec 2024 | original ↗

WebDev Arena New leaderboard from the Chatbot Arena team (formerly known as LMSYS), this time focused on evaluating how good different models are at "web development" - though it turns out to actually be a React, TypeScript and Tailwind benchmark. Similar to their regular arena this works by asking you to provide a prompt and then handing that...

Phi-4 Technical Report
15 Dec 2024 | original ↗

Phi-4 Technical Report Phi-4 is the latest LLM from Microsoft Research. It has 14B parameters and claims to be a big leap forward in the overall Phi series. From Introducing Phi-4: Microsoft’s Newest Small Language Model Specializing in Complex Reasoning: Phi-4 outperforms comparable and larger models on math related reasoning due to advancements...

Preferring throwaway code over design docs
15 Dec 2024 | original ↗

Preferring throwaway code over design docs Doug Turnbull advocates for a software development process far more realistic than attempting to create a design document up front and then implement accordingly. As Doug observes, "No plan survives contact with the enemy". His process is to build a prototype in a draft pull request on GitHub, making...

In search of a faster SQLite
15 Dec 2024 | original ↗

In search of a faster SQLite Turso developer Avinash Sajjanshetty (previously) shares notes on the April 2024 paper Serverless Runtime / Database Co-Design With Asynchronous I/O by Turso founder and CTO Pekka Enberg, Jon Crowcroft, Sasu Tarkoma and Ashwin Rao. The theme of the paper is rearchitecting SQLite for asynchronous I/O, and Avinash...

Quoting Riley Goodside
14 Dec 2024 | original ↗

An LLM knows every work of Shakespeare but can’t say which it read first. In this material sense a model hasn’t read at all. To read is to think. Only at inference is there space for serendipitous inspiration, which is why LLMs have so little of it to show for all they’ve seen. — Riley Goodside Tags: riley-goodside, llms, ai, generative-ai

3 shell scripts to improve your writing, or "My Ph.D. advisor rewrote himself in bash."
14 Dec 2024 | original ↗

3 shell scripts to improve your writing, or "My Ph.D. advisor rewrote himself in bash." Matt Might in 2010: The hardest part of advising Ph.D. students is teaching them how to write. Fortunately, I've seen patterns emerge over the past couple years. So, I've decided to replace myself with a shell script. In particular, I've created shell scripts...

BBC complains to Apple over misleading shooting headline
14 Dec 2024 | original ↗

BBC complains to Apple over misleading shooting headline This is bad: the Apple Intelligence feature that uses (on device) LLMs to present a condensed, summarized set of notifications misrepresented a BBC headline as "Luigi Mangione shoots himself". Ken Schwencke caught that same feature incorrectly condensing a New York Times headline about an...

OpenAI: Voice mode FAQ
13 Dec 2024 | original ↗

OpenAI: Voice mode FAQ Given how impressed I was by the Gemini 2.0 Flash audio and video streaming demo on Wednesday it's only fair that I highlight that OpenAI shipped their equivalent of that feature to ChatGPT in production on Thursday, for day 6 of their "12 days of OpenAI" series. I got access in the ChatGPT iPhone app this morning. It's...

Web Component by Google
13 Dec 2024 | original ↗

Web Component by Google I learned about this Web Component from Claude when looking for options to render a .glb file on a web page. It's very pleasant to use: Here it is showing a 3D pelican on a bicycle I created while trying out BlenderGPT, a new prompt-driven 3D asset creating tool (my prompt was "a pelican riding a bicycle"). There's a...

OpenAI's postmortem for API, ChatGPT & Sora Facing Issues
13 Dec 2024 | original ↗

OpenAI's postmortem for API, ChatGPT & Sora Facing Issues OpenAI had an outage across basically everything for four hours on Wednesday. They've now published a detailed postmortem which includes some fascinating technical details about their "hundreds of Kubernetes clusters globally". The culprit was a newly deployed telemetry system: Telemetry...

Clio: A system for privacy-preserving insights into real-world AI use
12 Dec 2024 | original ↗

Clio: A system for privacy-preserving insights into real-world AI use New research from Anthropic, describing a system they built called Clio - for Claude insights and observations - which attempts to provide insights into how Claude is being used by end-users while also preserving user privacy. There's a lot to digest here. The summary is...

What does a board of directors do?
12 Dec 2024 | original ↗

What does a board of directors do? Extremely useful guide to what life as a board member looks like for both for-profit and non-profit boards by Anil Dash, who has served on both. Boards can range from a loosely connected group that assembled on occasion to indifferently rubber-stamp what an executive tells them, or they can be deeply and...

"Rules" that terminal programs follow
12 Dec 2024 | original ↗

"Rules" that terminal programs follow Julia Evans wrote down the unwritten rules of terminal programs. Lots of details in here I hadn’t fully understood before, like REPL programs that exit only if you hit Ctrl+D on an empty line. Tags: julia-evans, cli

googleapis/python-genai
12 Dec 2024 | original ↗

googleapis/python-genai Google released this brand new Python library for accessing their generative AI models yesterday, offering an alternative to their existing generative-ai-python library. The API design looks very solid to me, and it includes both sync and async implementations. Here's an async streaming response: async for response in...

Gemini 2.0 Flash: An outstanding multi-modal LLM with a sci-fi streaming mode
11 Dec 2024 | original ↗

Huge announcement from Google this morning: Introducing Gemini 2.0: our new AI model for the agentic era. There's a ton of stuff in there (including updates on Project Astra and the new Project Mariner), but the most interesting pieces are the things we can start using today, built around the brand new Gemini 2.0 Flash model. The developer blog...

Who and What comprise AI Skepticism?
11 Dec 2024 | original ↗

Who and What comprise AI Skepticism? Benjamin Riley's response to Casey Newton's piece on The phony comforts of AI skepticism. Casey tried to categorize the field as "AI is fake and sucks" vs. "AI is real and dangerous". Benjamin argues that this is a misleading over-simplification, instead proposing at least nine different groups. I get listed...

Quoting Rob Cheung
11 Dec 2024 | original ↗

(echo "PID COMMAND PORT USER"; lsof -i -P -n | grep LISTEN | awk '{print $2, $1, $9, $3}' | sort -u | head -n 50; echo;) | column -t | llm "what servers are running on my machine and do some of them look like they could be orphaned things I can shut down" — Rob Cheung Tags: llm, llms, ai, generative-ai
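
The field selection in that pipeline can be sketched in Python for anyone who wants to post-process the same data before handing it to a model. This is a rough equivalent of the `grep LISTEN | awk | sort -u | head -n 50` steps only, operating on text already captured from `lsof -i -P -n`; the function name `listening_servers` is hypothetical, and lines with fewer than nine fields are skipped here rather than printed with blanks as awk would.

```python
def listening_servers(lsof_output, limit=50):
    """Return unique 'PID COMMAND PORT USER' rows for sockets in LISTEN state."""
    rows = set()
    for line in lsof_output.splitlines():
        # Same filter as `grep LISTEN` (case-sensitive substring match).
        if "LISTEN" not in line:
            continue
        fields = line.split()
        if len(fields) < 9:
            continue
        # awk's $2, $1, $9, $3 are PID, COMMAND, NAME (address:port), USER.
        command, pid, user, name = fields[0], fields[1], fields[2], fields[8]
        rows.add(f"{pid} {command} {name} {user}")
    # Roughly `sort -u | head -n 50`; locale-specific ordering may differ.
    return sorted(rows)[:limit]
```

The returned rows could be joined with newlines and piped into `llm` in the same way as the original one-liner.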

ChatGPT Canvas can make API requests now, but it's complicated
10 Dec 2024 | original ↗

Today's 12 Days of OpenAI release concerned ChatGPT Canvas, a new ChatGPT feature that enables ChatGPT to pop open a side panel with a shared editor in it where you can collaborate with ChatGPT on editing a document or writing code. I'm always excited to see a new form of UI on top of LLMs, and it's great seeing OpenAI stretch out beyond pure...

Introducing Limbo: A complete rewrite of SQLite in Rust
10 Dec 2024 | original ↗

Introducing Limbo: A complete rewrite of SQLite in Rust This looks absurdly ambitious: Our goal is to build a reimplementation of SQLite from scratch, fully compatible at the language and file format level, with the same or higher reliability SQLite is known for, but with full memory safety and on a new, modern architecture. The Turso team...

From where I left
10 Dec 2024 | original ↗

From where I left Four and a half years after he left the project, Redis creator Salvatore Sanfilippo is returning to work on Redis. Hacking randomly was cool but, in the long run, my feeling was that I was lacking a real purpose, and every day I started to feel a bigger urgency to be part of the tech world again. At the same time, I saw the...

The Depths of Wikipedians
10 Dec 2024 | original ↗

The Depths of Wikipedians Asterisk Magazine interviewed Annie Rauwerda, curator of the Depths of Wikipedia family of social media accounts (I particularly like her TikTok). There's a ton of insight into the dynamics of the Wikipedia community in here. [...] when people talk about Wikipedia as a decision making entity, usually they're talking...

Quoting Ethan Mollick
10 Dec 2024 | original ↗

A test of how seriously your firm is taking AI: when o-1 (& the new Gemini) came out this week, were there assigned folks who immediately ran the model through internal, validated, firm-specific benchmarks to see how useful it was? Did you update any plans or goals as a result? Or do you not have people (including non-technical people) assigned to...

Quoting Amanda Askell
10 Dec 2024 | original ↗

The boring yet crucial secret behind good system prompts is test-driven development. You don't write down a system prompt and find ways to test it. You write down tests and find a system prompt that passes them. For system prompt (SP) development you: Write a test set of messages where the model fails, i.e. where the default behavior isn't what...

Sora
9 Dec 2024 | original ↗

Sora OpenAI's released their long-threatened Sora text-to-video model this morning, available in most non-European countries to subscribers to ChatGPT Plus ($20/month) or Pro ($200/month). Here's what I got for the very first test prompt I ran through it: A pelican riding a bicycle along a coastal path overlooking a harbor The...

I can now run a GPT-4 class model on my laptop
9 Dec 2024 | original ↗

Meta's new Llama 3.3 70B is a genuinely GPT-4 class Large Language Model that runs on my laptop. Just 20 months ago I was amazed to see something that felt GPT-3 class run on that same machine. The quality of models that are accessible on consumer hardware has improved dramatically in the past two years. My laptop is a 64GB MacBook Pro M2, which...

llm-openrouter 0.3
8 Dec 2024 | original ↗

llm-openrouter 0.3 New release of my llm-openrouter plugin, which allows LLM to access models hosted by OpenRouter. Quoting the release notes: Enable image attachments for models that support images. Thanks, Adam Montgomery. #12 Provide async model access. #15 Fix documentation to list correct LLM_OPENROUTER_KEY environment variable. #10 ...

Holotypic Occlupanid Research Group
8 Dec 2024 | original ↗

Holotypic Occlupanid Research Group I just learned about this delightful piece of internet culture via Leven Parker on TikTok. Occlupanids are the small plastic square clips used to seal plastic bags containing bread. For thirty years (since 1994) John Daniel has maintained this website that catalogs them and serves as the basis of a wide ranging...

Writing down (and searching through) every UUID
7 Dec 2024 | original ↗

Writing down (and searching through) every UUID Nolen Royalty built everyuuid.com, and this write-up of how he built it is utterly delightful. First challenge: infinite scroll. Browsers do not want to render a window that is over a trillion trillion pixels high, so I needed to handle scrolling and rendering on my own. That means implementing hot...

Prompts.js
7 Dec 2024 | original ↗

Prompts.js I've been putting the new o1 model from OpenAI through its paces, in particular for code. I'm very impressed - it feels like it's giving me a similar code quality to Claude 3.5 Sonnet, at least for Python and JavaScript and Bash... but it's returning output noticeably faster. I decided to try building a library I've had in mind for a...

Meta AI release Llama 3.3
6 Dec 2024 | original ↗

Meta AI release Llama 3.3 This new Llama-3.3-70B-Instruct model from Meta AI makes some bold claims: This model delivers similar performance to Llama 3.1 405B with cost effective inference that’s feasible to run locally on common developer workstations. I have 64GB of RAM in my M2 MacBook Pro, so I'm looking forward to trying a slightly quantized...

New Gemini model: gemini-exp-1206
6 Dec 2024 | original ↗

New Gemini model: gemini-exp-1206 Google's Jeff Dean: Today’s the one year anniversary of our first Gemini model releases! And it’s never looked better. Check out our newest release, Gemini-exp-1206, in Google AI Studio and the Gemini API! I upgraded my llm-gemini plugin to support the new model and released it as version 0.6 - you can install...

DSQL Vignette: Reads and Compute
6 Dec 2024 | original ↗

DSQL Vignette: Reads and Compute Marc Brooker is one of the engineers behind AWS's new Aurora DSQL horizontally scalable database. Here he shares all sorts of interesting details about how it works under the hood. The system is built around the principle of separating storage from compute: storage uses S3, while compute runs in Firecracker: Each...

Roaming RAG – make the model find the answers
6 Dec 2024 | original ↗

Roaming RAG – make the model find the answers Neat new RAG technique (with a snappy name) from John Berryman: The big idea of Roaming RAG is to craft a simple LLM application so that the LLM assistant is able to read a hierarchical outline of a document, and then rummage through the document (by opening sections) until it finds an answer to the...

datasette-enrichments-llm
5 Dec 2024 | original ↗

datasette-enrichments-llm Today's new alpha release is datasette-enrichments-llm, a plugin for Datasette 1.0a+ that provides an enrichment that lets you run prompts against data from one or more column and store the result in another column. So far it's a light re-implementation of the existing datasette-enrichments-gpt plugin, now using the new...

Quoting OpenAI o1 System Card
5 Dec 2024 | original ↗

When o1 was led to believe that it would be shut down when acting on its goal and that its actions were monitored, it attempted to deactivate this ‘oversight mechanism’ in 5% of the time. Exfiltration attempts: When o1 found memos by its ‘developers’ that describe how it is misaligned and will be superseded by a new model, it attempted to...

Quoting Amazon Bedrock Data Protection
5 Dec 2024 | original ↗

Amazon Bedrock doesn't store or log your prompts and completions. Amazon Bedrock doesn't use your prompts and completions to train any AWS models and doesn't distribute them to third parties. — Amazon Bedrock Data Protection Tags: generative-ai, training-data, aws, ai, llms

New Pleias 1.0 LLMs trained exclusively on openly licensed data
5 Dec 2024 | original ↗

New Pleias 1.0 LLMs trained exclusively on openly licensed data I wrote about the Common Corpus public domain dataset back in March. Now Pleias, the team behind Common Corpus, have released the first family of models that are: [...] trained exclusively on open data, meaning data that are either non-copyrighted or are published under a permissible...

Claude 3.5 Haiku price drops by 20%
5 Dec 2024 | original ↗

Claude 3.5 Haiku price drops by 20% Buried in this otherwise quite dry post about Anthropic's ongoing partnership with AWS: To make this model even more accessible for a wide range of use cases, we’re lowering the price of Claude 3.5 Haiku to $0.80 per million input tokens and $4 per million output tokens across all platforms. The previous price...
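
To make the quoted prices concrete, here is a minimal cost calculator using only the two figures given above ($0.80 per million input tokens, $4 per million output tokens). The function name `cost_usd` and the example token counts are made up for illustration.

```python
from decimal import Decimal

# Prices quoted in the announcement, in US dollars per million tokens.
INPUT_PER_MILLION = Decimal("0.80")
OUTPUT_PER_MILLION = Decimal("4")


def cost_usd(input_tokens, output_tokens):
    """Return the cost in dollars of one request at the quoted prices."""
    total = INPUT_PER_MILLION * input_tokens + OUTPUT_PER_MILLION * output_tokens
    return total / Decimal(1_000_000)


# A hypothetical request: 10,000 tokens in, 2,000 tokens out.
print(cost_usd(10_000, 2_000))
```

At these rates the hypothetical request above costs 1.6 cents, split evenly between input and output.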

Genie 2: A large-scale foundation world model
4 Dec 2024 | original ↗

Genie 2: A large-scale foundation world model New research (so nothing we can play with) from Google DeepMind. Genie 2 is effectively a game engine driven entirely by generative AI - you can seed it with any image and it will turn that image into a 3D environment that you can then explore. It's reminiscent of last month's impressive Oasis: A...

Quoting Steve Yegge
4 Dec 2024 | original ↗

In the past, these decisions were so consequential, they were basically one-way doors, in Amazon language. That’s why we call them ‘architectural decisions!’ You basically have to live with your choice of database, authentication, JavaScript UI framework, almost forever. But that’s changing with LLMs, because you can explore, investigate, and...

First impressions of the new Amazon Nova LLMs (via a new llm-bedrock plugin)
4 Dec 2024 | original ↗

Amazon released three new Large Language Models yesterday at their AWS re:Invent conference. The new model family is called Amazon Nova and comes in three sizes: Micro, Lite and Pro. I built a new LLM plugin called llm-bedrock for accessing the models in the terminal via boto3 and the Amazon Bedrock API. My initial impressions from trying out the...

datasette-queries
3 Dec 2024 | original ↗

datasette-queries I released the first alpha of a new plugin to replace the crusty old datasette-saved-queries. This one adds a new UI element to the top of the query results page with an expandable form for saving the query as a new canned query: It's my first plugin to depend on LLM and datasette-llm-usage - it uses GPT-4o mini to power an...

Transferring Python Build Standalone Stewardship to Astral
3 Dec 2024 | original ↗

Transferring Python Build Standalone Stewardship to Astral Gregory Szorc's Python Standalone Builds have been quietly running an increasing portion of the Python ecosystem for a few years now, but really accelerated in importance when uv started using them for new Python installations managed by that tool. The releases (shipped via GitHub) have...

Quoting Dan McKinley
3 Dec 2024 | original ↗

One big thing that a lot of people love to do is create new role types. For any new thing a company wants to do, the tendency is to put up a new job description. I think a lot of people notice this and chafe at it when the role is for the new hotness. For example, every company wants to rub some AI on their stuff now, so they are putting up job...

Quoting Ben Welsh
3 Dec 2024 | original ↗

Open source is really part of my process of getting unstuck, learning and contributing back to the community, and also helping future me have an easier time. ‘Me’ is probably the number one beneficiary of my open-source software work. To be honest with you, a lot of it is selfish. It's really about making me more productive, happier, and less...

Introducing Amazon Aurora DSQL
3 Dec 2024 | original ↗

Introducing Amazon Aurora DSQL New, weird-shaped database from AWS. It's (loosely) PostgreSQL compatible, claims "virtually unlimited scale" and can be set up as a single-region cluster or as a multi-region setup that somehow supports concurrent reads and writes across all regions. I'm hoping they publish technical details on how that works at...

Quoting Rachel Coldicutt
3 Dec 2024 | original ↗

Finally, in most workplaces, incentive structures don’t exist for people to (a) reduce their workloads to such an extent that their role becomes vulnerable or (b) voluntarily accept more responsibility without also taking on more pay. These things are all natural rate limiters on technology adoption and the precise mix they show up in varies from...

Certain names make ChatGPT grind to a halt, and we know why
3 Dec 2024 | original ↗

Certain names make ChatGPT grind to a halt, and we know why Benj Edwards on the really weird behavior where ChatGPT stops output with an error rather than producing the names David Mayer, Brian Hood, Jonathan Turley, Jonathan Zittrain, David Faber or Guido Scorza. The OpenAI API is entirely unaffected - this problem affects the consumer ChatGPT...

datasette-llm-usage
2 Dec 2024 | original ↗

datasette-llm-usage I released the first alpha of a Datasette plugin to help track LLM usage by other plugins, with the goal of supporting token allowances - both for things like free public apps that stop working after a daily allowance, plus free previews of AI features for paid-account-based projects such as Datasette Cloud. It's using the...

NYTimes reporters getting verified profiles on Bluesky
2 Dec 2024 | original ↗

NYTimes reporters getting verified profiles on Bluesky NYT data journalist Dylan Freedman has kicked off an initiative to get NYT accounts and reporters on Bluesky verified via vanity nytimes.com handles - Dylan is now @dylanfreedman.nytimes.com. They're using Bluesky's support for TXT domain records. If you use Google's Dig tool to look at the...

PydanticAI
2 Dec 2024 | original ↗

PydanticAI New project from Pydantic, which they describe as an "Agent Framework / shim to use Pydantic with LLMs". I asked which agent definition they are using and it's the "system prompt with bundled tools" one. To their credit, they explain that in their documentation: The Agent has full API documentation, but conceptually you can think of an...

Quoting Arvind Narayanan
2 Dec 2024 | original ↗

For most software engineers, being well rounded is more important than pure technical mastery. This was already true, of course — see @patio11's famous advice "Don't call yourself a programmer" — but even more so due to foundation models. In most situations, skills like being able to use AI to rapidly prototype in order to communicate with...

Simon Willison: The Future of Open Source and AI
2 Dec 2024 | original ↗

Simon Willison: The Future of Open Source and AI I sat down a few weeks ago to record this conversation with Logan Kilpatrick and Nolan Fortman for their podcast Around the Prompt. The episode is available on YouTube and Apple Podcasts and other platforms. We talked about a whole bunch of different topics, including the ongoing debate around...

LLM 0.19
1 Dec 2024 | original ↗

LLM 0.19 I just released version 0.19 of LLM, my Python library and CLI utility for working with Large Language Models. I released 0.18 a couple of weeks ago adding support for calling models from Python asyncio code. 0.19 improves on that, and also adds a new mechanism for models to report their token usage. LLM can log those usage numbers to a...

Turning Your Root URL Into a DuckDB Remote Database
1 Dec 2024 | original ↗

Turning Your Root URL Into a DuckDB Remote Database Fun idea from Drew Breunig: DuckDB supports attaching existing databases that are accessible over HTTP using their URL. Drew suggests creating vanity URLs using your root domain, detecting the DuckDB user-agent and serving the database file directly - allowing tricks like this one: ATTACH...
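
The user-agent trick can be sketched as a small routing decision. This is not Drew's implementation: it assumes DuckDB's HTTP client includes the string "duckdb" somewhere in its User-Agent header, and the function names and file paths (`site.duckdb`, `index.html`) are placeholders.

```python
def wants_database(user_agent):
    """Guess whether the request came from DuckDB rather than a browser."""
    # Assumption: the DuckDB client's User-Agent contains "duckdb".
    return "duckdb" in (user_agent or "").lower()


def choose_response(user_agent, db_path="site.duckdb", page_path="index.html"):
    """Pick which file to serve for a request to the root URL."""
    return db_path if wants_database(user_agent) else page_path
```

A real deployment would make this decision in whatever serves the root URL (a web server rule or an edge function) and would need to check the actual header DuckDB sends.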

Quoting Javi Santana
1 Dec 2024 | original ↗

Most people don’t have an intuition about what current hardware can and can’t do. There is a simple math that can help you with that: “you can process about 500MB in one second on a single machine”. I know it’s not a universal truth and there are a lot of details that can change that but believe me, this estimation is a pretty good tool to have...
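
That rule of thumb turns into a one-line estimate. The sketch below treats 500MB as 500 × 10^6 bytes, which is an assumption, and the name `estimate_seconds` is made up for the example.

```python
# Rule of thumb from the quote: about 500MB per second on a single machine.
BYTES_PER_SECOND = 500 * 10**6


def estimate_seconds(size_bytes):
    """Back-of-envelope time to process size_bytes on one machine."""
    return size_bytes / BYTES_PER_SECOND


# 100 GB of data, expressed in bytes.
print(estimate_seconds(100 * 10**9))
```

By this estimate a 100 GB dataset takes about 200 seconds, a little over three minutes, before any of the details that can change the number.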

The Engagement Is Better on Bluesky
30 Nov 2024 | original ↗

The Engagement Is Better on Bluesky It’s deeply sad that “we don’t penalize people for sharing links” can be a differentiating feature for a social media platform these days, but here we are. Tags: social-media, links, twitter, bluesky

0xfreysa/agent
29 Nov 2024 | original ↗

0xfreysa/agent Freysa describes itself as "the world's first adversarial agent game". On 22nd November they released an LLM-driven application which people could pay to message (using Ethereum), with access to tools that could transfer a prize pool to the message sender, ending the game. The price of each message increased over time, reaching...

Structured Generation w/ SmolLM2 running in browser & WebGPU
29 Nov 2024 | original ↗

Structured Generation w/ SmolLM2 running in browser & WebGPU Extraordinary demo by Vaibhav Srivastav. Here's Hugging Face's SmolLM2-1.7B-Instruct running directly in a web browser (using WebGPU, so requires Chrome for the moment) demonstrating structured text extraction, converting a text description of an image into a structured GitHub issue...

Quoting Menlo Ventures
29 Nov 2024 | original ↗

Among closed-source models, OpenAI's early mover advantage has eroded somewhat, with enterprise market share dropping from 50% to 34%. The primary beneficiary has been Anthropic,* which doubled its enterprise presence from 12% to 24% as some enterprises switched from GPT-4 to Claude 3.5 Sonnet when the new model became state-of-the-art. When...

Quoting Andrej Karpathy
29 Nov 2024 | original ↗

People have too inflated sense of what it means to "ask an AI" about something. The AI are language models trained basically by imitation on data from human labelers. Instead of the mysticism of "asking an AI", think of it more as "asking the average data labeler" on the internet. [...] Post triggered by someone suggesting we ask an AI how to run...

GitHub OAuth for a static site using Cloudflare Workers
29 Nov 2024 | original ↗

GitHub OAuth for a static site using Cloudflare Workers Here's a TIL covering a Thanksgiving AI-assisted programming project. I wanted to add OAuth against GitHub to some of the projects on my tools.simonwillison.net site in order to implement "Save to Gist". That site is entirely statically hosted by GitHub Pages, but OAuth has a required...

LLM Flowbreaking
29 Nov 2024 | original ↗

LLM Flowbreaking Gadi Evron from Knostic: We propose that LLM Flowbreaking, following jailbreaking and prompt injection, joins as the third on the growing list of LLM attack types. Flowbreaking is less about whether prompt or response guardrails can be bypassed, and more about whether user inputs and generated model outputs can adversely affect...

SmolVLM - small yet mighty Vision Language Model
28 Nov 2024 | original ↗

SmolVLM - small yet mighty Vision Language Model I've been having fun playing with this new vision model from the Hugging Face team behind SmolLM. They describe it as: [...] a 2B VLM, SOTA for its memory footprint. SmolVLM is small, fast, memory-efficient, and fully open-source. All model checkpoints, VLM datasets, training recipes and tools are...

QwQ: Reflect Deeply on the Boundaries of the Unknown
27 Nov 2024 | original ↗

QwQ: Reflect Deeply on the Boundaries of the Unknown Brand new openly licensed model from Alibaba Cloud's Qwen team, this time clearly inspired by OpenAI's work on reasoning in o1. I love how they introduce the new model: Through deep exploration and countless trials, we discovered something profound: when given time to ponder, to question, and to...

Storing times for human events
27 Nov 2024 | original ↗

I've worked on various event websites in the past, and one of the unintuitively difficult problems that inevitably comes up is the best way to store the time that an event is happening. Based on that past experience, here's my current recommendation. This is the expanded version of a comment I posted on lobste.rs a few days ago, which ended up...

Quoting Zach Holman
26 Nov 2024 | original ↗

One of the things we did all the time at early GitHub was a two-step ship: basically, ship a big launch, but days or weeks afterwards, ship a smaller, add-on feature. In the second launch post, you can refer back to the initial bigger post and you get twice the bang for the buck. This is even more valuable than on the surface, too: you get to...

Quoting Carson Gross
26 Nov 2024 | original ↗

My preferred approach in many projects is to do some unit testing, but not a ton, early on in the project and wait until the core APIs and concepts of a module have crystallized. At that point I then test the API exhaustively with integrations tests. In my experience, these integration tests are much more useful than unit tests, because they...

Amazon S3 adds new functionality for conditional writes
26 Nov 2024 | original ↗

Amazon S3 adds new functionality for conditional writes Amazon S3 can now perform conditional writes that evaluate if an object is unmodified before updating it. This helps you coordinate simultaneous writes to the same object and prevents multiple concurrent writers from unintentionally overwriting the object without knowing the state of its...
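
The idea behind a conditional write is compare-and-swap: the writer states which version of the object it last saw, and the store rejects the write if the object has changed since. The toy below is an in-memory model of that pattern, not the S3 API. `ToyBucket`, `PreconditionFailed` and the `if_match` parameter are all invented for illustration, and a SHA-256 digest stands in for a version tag.

```python
import hashlib


class PreconditionFailed(Exception):
    """Raised when the caller's expected version tag no longer matches."""


def _tag(data):
    # Stand-in version tag: a digest of the object's bytes.
    return hashlib.sha256(data).hexdigest()


class ToyBucket:
    """In-memory object store with an optional 'only if unmodified' check."""

    def __init__(self):
        self._objects = {}

    def get(self, key):
        data = self._objects[key]
        return data, _tag(data)

    def put(self, key, data, if_match=None):
        if if_match is not None:
            current = self._objects.get(key)
            # Reject the write if the object is missing or has changed.
            if current is None or _tag(current) != if_match:
                raise PreconditionFailed(key)
        self._objects[key] = data
        return _tag(data)
```

Two writers that both read the same version and then write with `if_match` set will see exactly one success; the loser gets `PreconditionFailed` and can re-read and retry instead of silently overwriting.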

Leaked system prompts from Vercel v0
25 Nov 2024 | original ↗

Leaked system prompts from Vercel v0 v0 is Vercel's entry in the increasingly crowded LLM-assisted development market - chat with a bot and have that bot build a full application for you. They've been iterating on it since launching in October last year, making it one of the most mature products in this space. Somebody leaked the system prompts...

OpenStreetmap embed URL
25 Nov 2024 | original ↗

OpenStreetmap embed URL I just found out OpenStreetMap have a "share" button which produces HTML for an iframe targeting https://www.openstreetmap.org/export/embed.html, making it easy to drop an OpenStreetMap map onto any web page that allows iframes. As far as I can tell the supported parameters are: bbox= then min longitude, min latitude, max...
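
Building such a URL programmatically might look like the sketch below. It covers only the `bbox=` parameter and assumes the four comma-separated values are min longitude, min latitude, max longitude, max latitude, with the last two being an assumption since the list above is cut off. The function name `osm_embed_url` is made up.

```python
from urllib.parse import urlencode

EMBED_BASE = "https://www.openstreetmap.org/export/embed.html"


def osm_embed_url(min_lon, min_lat, max_lon, max_lat):
    """Build an embed URL for the given bounding box."""
    # Assumed order: min longitude, min latitude, max longitude, max latitude.
    bbox = ",".join(str(v) for v in (min_lon, min_lat, max_lon, max_lat))
    # urlencode percent-encodes the commas as %2C.
    return f"{EMBED_BASE}?{urlencode({'bbox': bbox})}"


# A box covering roughly one degree around the south coast of England.
print(osm_embed_url(-1, 50, 0, 51))
```

The resulting string can be used as the `src` attribute of an iframe.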

Introducing the Model Context Protocol
25 Nov 2024 | original ↗

Introducing the Model Context Protocol Interesting new initiative from Anthropic. The Model Context Protocol aims to provide a standard interface for LLMs to interact with other applications, allowing applications to expose tools, resources (content that you might want to dump into your context) and parameterized prompts that can be used by the...

Ask questions of SQLite databases and CSV/JSON files in your terminal
25 Nov 2024 | original ↗

I built a new plugin for my sqlite-utils CLI tool that lets you ask human-language questions directly of SQLite databases and CSV/JSON files on your computer. It's called sqlite-utils-ask. Here's how you install it: sqlite-utils install sqlite-utils-ask It picks up API keys from an OPENAI_API_KEY environment variable, or you can install LLM and...

follow_theirs.py
24 Nov 2024 | original ↗

follow_theirs.py Hamel Husain wrote this Python script on top of the atproto Python library for interacting with Bluesky, which lets you specify another user and then follows every account that user is following. I forked it and added two improvements: inline PEP 723 dependencies and input() and getpass.getpass() to interactively ask for the...
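
For readers unfamiliar with the two improvements, here is a minimal sketch of what a PEP 723 header plus interactive prompts can look like. It is not the forked script: the prompt wording, the choice of what to ask for, and the `ask_for_credentials` name are assumptions, and the sketch stops before making any Bluesky calls.

```python
# /// script
# requires-python = ">=3.9"
# dependencies = [
#     "atproto",
# ]
# ///
# The comment block above is PEP 723 inline metadata: a runner such as
# `uv run` can read it and install the listed packages before executing.
import getpass


def ask_for_credentials(ask=input, ask_secret=getpass.getpass):
    """Prompt for the values the script needs (hypothetical prompts)."""
    handle = ask("Your Bluesky handle: ").strip()
    # getpass hides what is typed, which suits passwords.
    password = ask_secret("Password: ")
    target = ask("Follow everyone followed by: ").strip()
    return handle, password, target
```

The `ask` and `ask_secret` parameters exist only so the function can be exercised without a terminal; calling it with no arguments uses `input()` and `getpass.getpass()` directly.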

open-interpreter
24 Nov 2024 | original ↗

open-interpreter This "natural language interface for computers" project has been around for a while, but today I finally got around to trying it out. Here's how I ran it (without first installing anything) using uv: uvx --from open-interpreter interpreter The default mode asks you for an OpenAI API key so it can use gpt-4o - there are a...

Is async Django ready for prime time?
24 Nov 2024 | original ↗

Is async Django ready for prime time? Jonathan Adly reports on his experience using Django to build ColiVara, a hosted RAG API that uses ColQwen2 visual embeddings, inspired by the ColPali paper. In a breach of Betteridge's law of headlines the answer to the question posed by this headline is “yes”. We believe async Django is ready for production. In...

Quoting Tim Bray
24 Nov 2024 | original ↗

The evidence is overwhelming: Social networks with a single proprietor have trouble with long-term survival, and those that do survive have trouble with user-experience quality: see Enshittification. The evidence is also perfectly clear that it doesn’t have to be this way. The original social network, email, is now into its sixth decade of vigorous...

Importing a frontend Javascript library without a build system
23 Nov 2024 | original ↗

Importing a frontend Javascript library without a build system I sometimes think the hardest problem in computer science right now is taking an NPM library and figuring out how to download it and use it from a tag without needing to involve some sort of convoluted build system. Julia Evans shares my preference for build-free JavaScript, and has...

Quoting James Dillard
23 Nov 2024 | original ↗

If you try and tell people 5 interesting things about your product / company / cause, they’ll remember zero. If instead, you tell them just one, they’ll usually ask questions that lead them to the other things, and then they’ll remember all of them because it mattered to them at the moment they asked. — James Dillard Tags: entrepreneurship,...

Quantization matters
23 Nov 2024 | original ↗

Quantization matters What impact does quantization have on the performance of an LLM? I've been wondering about this for quite a while; now here are numbers from Paul Gauthier. He ran differently quantized versions of Qwen 2.5 32B Instruct through his Aider code editing benchmark and saw a range of scores. The original released weights (BF16) scored...

Weeknotes: asynchronous LLMs, synchronous embeddings, and I kind of started a podcast
22 Nov 2024 | original ↗

These past few weeks I've been bringing Datasette and LLM together and distracting myself with a new sort-of-podcast crossed with a live streaming experiment. Project: interviewing people about their projects Datasette Public Office Hours Async LLM Various embedding models Blog entries Releases TILs Project: interviewing people...

How decentralized is Bluesky really?
22 Nov 2024 | original ↗

How decentralized is Bluesky really? Lots of technical depth in this comparison of the Bluesky (ATProto) and Fediverse/Mastodon/ActivityPub approach to decentralization, from ActivityPub spec author Christine Lemmer-Webber. One key theme: many of the features of Bluesky that aren't present in the rest of the Fediverse are the result of...

Private School Labeler on Bluesky
22 Nov 2024 | original ↗

Private School Labeler on Bluesky I am utterly delighted by this subversive use of Bluesky's labels feature, which allows you to subscribe to a custom application that then adds visible labels to profiles. The feature was designed for moderation, but this labeler subverts it by displaying labels on accounts belonging to British public figures...

Quoting Brett Cannon
22 Nov 2024 | original ↗

It's okay to complain and vent, I just ask you be able to back it up. Saying, "Python packaging sucks", but then admit you actually haven't used it in so long you don't remember why it sucked isn't fair. Things do improve, so it's better to say "it did suck" and acknowledge you might be out-of-date. — Brett Cannon Tags: packaging, python,...

Say hello to gemini-exp-1121
22 Nov 2024 | original ↗

Say hello to gemini-exp-1121 Google Gemini's Logan Kilpatrick on Twitter: Say hello to gemini-exp-1121! Our latest experimental gemini model, with: significant gains on coding performance stronger reasoning capabilities improved visual understanding Available on Google AI Studio and the Gemini API right now The 1121 in the name is a release date...

Amazon S3 Express One Zone now supports the ability to append data to an object
22 Nov 2024 | original ↗

Amazon S3 Express One Zone now supports the ability to append data to an object This is a first for Amazon S3: it is now possible to append data to an existing object in a bucket, where previously the only supported operation was to atomically replace the object with an updated version. This is only available for S3 Express One Zone, a bucket...

OK, I can partly explain the LLM chess weirdness now
21 Nov 2024 | original ↗

OK, I can partly explain the LLM chess weirdness now Last week Dynomight published Something weird is happening with LLMs and chess pointing out that most LLMs are terrible chess players with the exception of gpt-3.5-turbo-instruct (OpenAI's last remaining completion as opposed to chat model, which they describe as "Similar capabilities as GPT-3...

llm-gguf 0.2, now with embeddings
21 Nov 2024 | original ↗

llm-gguf 0.2, now with embeddings This new release of my llm-gguf plugin - which adds support for locally hosted GGUF LLMs - adds a new feature: it now supports embedding models distributed as GGUFs as well. This means you can use models like the bafflingly small (30.8MB in its smallest quantization) mxbai-embed-xsmall-v1 with LLM like this: llm...

A warning about tiktoken, BPE, and OpenAI models
21 Nov 2024 | original ↗

A warning about tiktoken, BPE, and OpenAI models Tom MacWright warns that OpenAI's tiktoken Python library has a surprising performance profile: it's superlinear with the length of input, meaning someone could potentially denial-of-service you by sending you a 100,000 character string if you're passing that directly to tiktoken.encode(). There's...
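One defensive option, sketched here under assumptions rather than taken from the linked post, is to reject oversized input before it ever reaches the tokenizer. The cap value and the `encode_with_limit` helper are hypothetical, and the stand-in `whitespace_encode` function only exists so the example runs without any dependencies; pass in whichever encoder function you actually use.

```python
MAX_CHARS = 20_000  # hypothetical cap; tune it for your own workload


def encode_with_limit(text, encode, max_chars=MAX_CHARS):
    """Refuse oversized input before handing it to a tokenizer function."""
    if len(text) > max_chars:
        raise ValueError(
            f"input is {len(text)} characters; limit is {max_chars}"
        )
    return encode(text)


def whitespace_encode(text):
    """Stand-in tokenizer for demonstration only (not a real BPE encoder)."""
    return text.split()


tokens = encode_with_limit("count these four words", whitespace_encode)
print(len(tokens))
```

Checking the length first is cheap, so the expensive superlinear work is never started for hostile input.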

How some of the world's most brilliant computer scientists got password policies so wrong
21 Nov 2024 | original ↗

How some of the world's most brilliant computer scientists got password policies so wrong Stuart Schechter blames Robert Morris and Ken Thompson for the dire state of passwords today: The story of why password rules were recommended and enforced without scientific evidence since their invention in 1979 is a story of brilliant people, at the very...

TextSynth Server
21 Nov 2024 | original ↗

TextSynth Server I'd missed this: Fabrice Bellard (yes, that Fabrice Bellard) has a project called TextSynth Server which he describes like this: ts_server is a web server proposing a REST API to large language models. They can be used for example for text completion, question answering, classification, chat, translation, image generation, ... It...

Quoting Steven Johnson
21 Nov 2024 | original ↗

When we started working on what became NotebookLM in the summer of 2022, we could fit about 1,500 words in the context window. Now we can fit up to 1.5 million words. (And using various other tricks, effectively fit 25 million words.) The emergence of long context models is, I believe, the single most unappreciated AI development of the past two...

Foursquare Open Source Places: A new foundational dataset for the geospatial community
20 Nov 2024 | original ↗

Foursquare Open Source Places: A new foundational dataset for the geospatial community I did not expect this! [...] we are announcing today the general availability of a foundational open data set, Foursquare Open Source Places ("FSQ OS Places"). This base layer of 100mm+ global places of interest ("POI") includes 22 core attributes (see schema...

Bluesky WebSocket Firehose
20 Nov 2024 | original ↗

Bluesky WebSocket Firehose Very quick (10 seconds of Claude hacking) prototype of a web page that attaches to the public Bluesky WebSocket firehose and displays the results directly in your browser. Here's the code - there's very little to it, it's basically opening a connection to...

OpenStreetMap vector tiles demo
19 Nov 2024 | original ↗

OpenStreetMap vector tiles demo Long-time OpenStreetMap developer Paul Norman has been working on adding vector tile support to OpenStreetMap for quite a while. Paul recently announced that vector.openstreetmap.org is now serving vector tiles (in Mapbox Vector Tiles (MVT) format) - here's his interactive demo for seeing what they look like. ...

Using uv with PyTorch
19 Nov 2024 | original ↗

Using uv with PyTorch PyTorch is a notoriously tricky piece of Python software to install, due to the need to provide separate wheels for different combinations of Python version and GPU accelerator (e.g. different CUDA versions). uv now has dedicated documentation for PyTorch which I'm finding really useful - it clearly explains the challenge...

Understanding the BM25 full text search algorithm
19 Nov 2024 | original ↗

Understanding the BM25 full text search algorithm Evan Schwartz provides a deep dive explanation of how the classic BM25 search relevance scoring function works, including a very useful breakdown of the mathematics it uses. Via lobste.rs Tags: search, algorithms
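For a feel of the mathematics, here is a minimal sketch of the commonly cited BM25 formula in plain Python. It is not taken from Evan's article: the `bm25_scores` function name, the whitespace tokenization and the default `k1=1.2`, `b=0.75` parameters are assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, documents, k1=1.2, b=0.75):
    """Score each document against the query using the standard BM25 formula."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(doc) for doc in tokenized) / n_docs

    # Document frequency: how many documents contain each term at least once.
    doc_freq = Counter()
    for doc in tokenized:
        doc_freq.update(set(doc))

    scores = []
    for doc in tokenized:
        term_freq = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Rare terms get a higher inverse document frequency weight.
            idf = math.log(
                (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1
            )
            # Term frequency saturates via k1; b normalizes for document length.
            denominator = term_freq[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / denominator
        scores.append(score)
    return scores


docs = ["the quick brown fox", "lazy dogs sleep all day", "a fox is a fox"]
scores = bm25_scores("fox", docs)
print(scores)
```

Documents that never mention a query term score zero, and repeated mentions help less and less as the term frequency grows.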

Notes from Bing Chat—Our First Encounter With Manipulative AI
19 Nov 2024 | original ↗

I participated in an Ars Live conversation with Benj Edwards of Ars Technica today, talking about that wild period of LLM history last year when Microsoft launched Bing Chat and it instantly started misbehaving, gaslighting and defaming people. Here's the video of our conversation. I ran the video through MacWhisper, extracted a transcript and...

Preview: Gemini API Additional Terms of Service
19 Nov 2024 | original ↗

Preview: Gemini API Additional Terms of Service Google sent out an email last week linking to this preview of upcoming changes to the Gemini API terms. Key paragraph from that email: To maintain a safe and responsible environment for all users, we're enhancing our abuse monitoring practices for Google AI Studio and Gemini API. Starting December...

Security means securing people where they are
19 Nov 2024 | original ↗

Security means securing people where they are William Woodruff is an Engineering Director at Trail of Bits who worked on the recent PyPI digital attestations project. That feature is based around open standards but launched with an implementation against GitHub, which resulted in push back (and even some conspiracy theories) that PyPI were...

Pixtral Large
18 Nov 2024 | original ↗

Pixtral Large New today from Mistral: Today we announce Pixtral Large, a 124B open-weights multimodal model built on top of Mistral Large 2. Pixtral Large is the second model in our multimodal family and demonstrates frontier-level image understanding. The weights are out on Hugging Face (over 200GB to download, and you'll need a hefty GPU rig to...

Qwen: Extending the Context Length to 1M Tokens
18 Nov 2024 | original ↗

Qwen: Extending the Context Length to 1M Tokens The new Qwen2.5-Turbo boasts a million token context window (up from 128,000 for Qwen 2.5) and faster performance: Using sparse attention mechanisms, we successfully reduced the time to first token for processing a context of 1M tokens from 4.9 minutes to 68 seconds, achieving a 4.3x speedup. The...
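The quoted speedup is easy to check. This is just the arithmetic from the two figures in the quote, with variable names of my own choosing:

```python
before_seconds = 4.9 * 60  # 4.9 minutes to first token, in seconds
after_seconds = 68         # time to first token with sparse attention

speedup = before_seconds / after_seconds
print(f"{speedup:.1f}x")  # prints 4.3x
```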

Quoting Jack Clark
18 Nov 2024 | original ↗

The main innovation here is just using more data. Specifically, Qwen2.5 Coder is a continuation of an earlier Qwen 2.5 model. The original Qwen 2.5 model was trained on 18 trillion tokens spread across a variety of languages and tasks (e.g, writing, programming, question answering). Qwen 2.5-Coder sees them train this model on an additional 5.5...

llm-gemini 0.4
18 Nov 2024 | original ↗

llm-gemini 0.4 New release of my llm-gemini plugin, adding support for asynchronous models (see LLM 0.18), plus the new gemini-exp-1114 model (currently at the top of the Chatbot Arena) and a -o json_object 1 option to force JSON output. I also released llm-claude-3 0.9 which adds asynchronous support for the Claude family of models. Tags:...

LLM 0.18
17 Nov 2024 | original ↗

LLM 0.18 New release of LLM. The big new feature is asynchronous model support - you can now use supported models in async Python code like this: import llm model = llm.get_async_model("gpt-4o") async for chunk in model.prompt( "Five surprising names for a pet pelican" ): print(chunk, end="", flush=True) Also new in this release: support...

Project: Civic Band - scraping and searching PDF meeting minutes from hundreds of municipalities
16 Nov 2024 | original ↗

I interviewed Philip James about Civic Band, his "slowly growing collection of databases of the minutes from civic governments". Philip demonstrated the site and talked through his pipeline for scraping and indexing meeting minutes from many different local government authorities around the USA. We recorded this conversation as part of...

NuExtract 1.5
16 Nov 2024 | original ↗

NuExtract 1.5 Structured extraction - where an LLM helps turn unstructured text (or image content) into structured data - remains one of the most directly useful applications of LLMs. NuExtract is a family of small models directly trained for this purpose, and released under the MIT license. It comes in a variety of shapes and sizes:...

Voting opens for Oxford Word of the Year 2024
15 Nov 2024 | original ↗

Voting opens for Oxford Word of the Year 2024 One of the options is slop! slop (n.): Art, writing, or other content generated using artificial intelligence, shared and distributed online in an indiscriminate or intrusive way, and characterized as being of low quality, inauthentic, or inaccurate. Via @dloss Tags: slop, ethics,...

Recraft V3
15 Nov 2024 | original ↗

Recraft V3 Recraft are a generative AI design tool startup based out of London who released their v3 model a few weeks ago. It's currently sat at the top of the Artificial Analysis Image Arena Leaderboard, beating Midjourney and Flux 1.1 pro. The thing that impressed me is that it can generate both raster and vector graphics... and the vector...

OpenAI Public Bug Bounty
14 Nov 2024 | original ↗

OpenAI Public Bug Bounty Reading this investigation of the security boundaries of OpenAI's Code Interpreter environment helped me realize that the rules for OpenAI's public bug bounty inadvertently double as the missing details for a whole bunch of different aspects of their platform. This description of Code Interpreter is significantly more...

Quoting OpenAI, Google and Anthropic Are Struggling to Build More Advanced AI
14 Nov 2024 | original ↗

Anthropic declined to comment, but referred Bloomberg News to a five-hour podcast featuring Chief Executive Officer Dario Amodei that was released Monday. "People call them scaling laws. That's a misnomer," he said on the podcast. "They're not laws of the universe. They're empirical regularities. I am going to bet in favor of them continuing, but...

PyPI now supports digital attestations
14 Nov 2024 | original ↗

PyPI now supports digital attestations Dustin Ingram: PyPI package maintainers can now publish signed digital attestations when publishing, in order to further increase trust in the supply-chain security of their projects. Additionally, a new API is available for consumers and installers to verify published attestations. This has been in the work...

QuickTime video script to capture frames and bounding boxes
14 Nov 2024 | original ↗

QuickTime video script to capture frames and bounding boxes An update to an older TIL. I'm working on the write-up for my DjangoCon US talk on plugins and I found myself wanting to capture individual frames from the video in two formats: a full frame capture, and another that captured just the portion of the screen shared from my laptop. I have a...

Releasing the largest multilingual open pretraining dataset
14 Nov 2024 | original ↗

Releasing the largest multilingual open pretraining dataset Common Corpus is a new "open and permissible licensed text dataset, comprising over 2 trillion tokens (2,003,039,184,047 tokens)" released by French AI Lab PleIAs. This appears to be the largest available corpus of openly licensed training data: 926,541,096,243 tokens of public domain...

Quoting Steve Klabnik
13 Nov 2024 | original ↗

This tutorial exists because of a particular quirk of mine: I love to write tutorials about things as I learn them. This is the backstory of TRPL, of which an ancient draft was "Rust for Rubyists." You only get to look at a problem as a beginner once, and so I think writing this stuff down is interesting. It also helps me clarify what I'm...

Ollama: Llama 3.2 Vision
13 Nov 2024 | original ↗

Ollama: Llama 3.2 Vision Ollama released version 0.4 last week with support for Meta's first Llama vision model, Llama 3.2. If you have Ollama installed you can fetch the 11B model (7.9 GB) like this: ollama pull llama3.2-vision Or the larger 90B model (55GB) like this: ollama pull llama3.2-vision:90b I was delighted to learn that Sukhbinder...

django-plugin-django-debug-toolbar
13 Nov 2024 | original ↗

django-plugin-django-debug-toolbar Tom Viner built a plugin for my DJP Django plugin system that configures the excellent django-debug-toolbar debugging tool. You can see everything it sets up for you in this Python code: it configures installed apps, URL patterns and middleware and sets the INTERNAL_IPS and DEBUG settings. Here are Tom's running...

Ars Live: Our first encounter with manipulative AI
12 Nov 2024 | original ↗

Ars Live: Our first encounter with manipulative AI I'm participating in a live conversation with Benj Edwards on 19th November reminiscing over that incredible time back in February last year when Bing went feral. Via @benjedwards Tags: bing, generative-ai, arstechnica, ai, speaking, llms, benj-edwards

Qwen2.5-Coder-32B is an LLM that can code well that runs on my Mac
12 Nov 2024 | original ↗

There's a whole lot of buzz around the new Qwen2.5-Coder Series of open source (Apache 2.0 licensed) LLM releases from Alibaba's Qwen research team. On first impression it looks like the buzz is well deserved. Qwen claim: Qwen2.5-Coder-32B-Instruct has become the current SOTA open-source code model, matching the coding capabilities of GPT-4o....

How I ship projects at big tech companies
11 Nov 2024 | original ↗

How I ship projects at big tech companies This piece by Sean Goedecke on shipping features at larger tech companies is fantastic. Why do so many engineers think shipping is easy? I know it sounds extreme, but I think many engineers do not understand what shipping even is inside a large tech company. What does it mean to ship? It does not mean...

Binary vector embeddings are so cool
11 Nov 2024 | original ↗

Binary vector embeddings are so cool Evan Schwartz: Vector embeddings by themselves are pretty neat. Binary quantized vector embeddings are extra impressive. In short, they can retain 95+% retrieval accuracy with 32x compression and ~25x retrieval speedup. It's so unintuitive how well this trick works: take a vector of 1024x4 byte floating point...
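A rough sketch of the idea, assuming the usual sign-threshold scheme (one bit per dimension, set when the value is greater than zero) and Hamming distance for comparison. The function names are hypothetical and the bits are packed into a Python integer purely to keep the example dependency-free.

```python
def binary_quantize(vector):
    """Pack one bit per dimension into an int: 1 if the value is > 0, else 0."""
    bits = 0
    for value in vector:
        bits = (bits << 1) | (1 if value > 0 else 0)
    return bits


def hamming_distance(a, b):
    """Count the bit positions where two packed vectors differ."""
    return bin(a ^ b).count("1")


DIMENSIONS = 1024
float_bytes = DIMENSIONS * 4    # 4-byte floats: 4096 bytes per vector
binary_bytes = DIMENSIONS // 8  # one bit per dimension: 128 bytes per vector
compression_ratio = float_bytes // binary_bytes

first = binary_quantize([0.5, -0.2, 0.1, -0.9])
second = binary_quantize([-0.3, 0.8, 0.4, -0.1])
print(compression_ratio, hamming_distance(first, second))
```

The 32x figure falls straight out of the sizes: 32 bits per dimension shrink to one.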

Quoting Matt Webb
11 Nov 2024 | original ↗

That development time acceleration of 4 days down to 20 minutes… that’s equivalent to about 10 years of Moore’s Law cycles. That is, using generative AI like this is equivalent to computers getting 10 years better overnight. That was a real eye-opening framing for me. AI isn’t magical, it’s not sentient, it’s not the end of the world nor our...

Quoting Grant Slatton
11 Nov 2024 | original ↗

As a junior engineer, there's simply no substitute for getting the first 100K lines of code under your belt. The "start over each day" method will help get you to those 100K lines faster. You might think covering the same ground multiple times isn't as valuable as getting 100K diverse lines of code. I disagree. Solving the same problem repeatedly...

MDN Browser Support Timelines
11 Nov 2024 | original ↗

MDN Browser Support Timelines I complained on Hacker News today that I wished the MDN browser compatibility tables - like this one for the Web Locks API - included an indication as to when each browser was released rather than just the browser numbers. It turns out they do! If you click on each browser version in turn you can see an expanded area...

Everything I've learned so far about running local LLMs
10 Nov 2024 | original ↗

Everything I've learned so far about running local LLMs Chris Wellons shares detailed notes on his experience running local LLMs on Windows - though most of these tips apply to other operating systems as well. This is great, there's a ton of detail here and the root recommendations are very solid: Use llama-server from llama.cpp and try ~8B...

Visualizing local election results with Datasette, Observable and MapLibre GL
9 Nov 2024 | original ↗

Alex Garcia and myself hosted the first Datasette Open Office Hours on Friday - a live-streamed video session where we hacked on a project together and took questions and tips from community members on Discord. We didn't record this one (surprisingly not a feature that Discord offers) but we hope to do more of these and record them in the future....

Quoting fast.ai Discord Server
9 Nov 2024 | original ↗

This is a very friendly and supportive place where you are surrounded by peers - we all want to help each other succeed. The golden rule of this server is: Don't ever try to impress anyone here with your knowledge! Instead try to impress folks here with your desire to learn, and desire to help others learn. — fast.ai Discord Server Tags:...

uv 0.5.0
8 Nov 2024 | original ↗

uv 0.5.0 The first backwards-incompatible (in minor ways) release after 30 releases without a breaking change. I found out about this release this morning when I filed an issue about a fiddly usability problem I had encountered with the combo of uv and conda... and learned that the exact problem had been fixed in the brand new version! Tags:...

ChainForge
8 Nov 2024 | original ↗

ChainForge I'm still on the hunt for good options for running evaluations against prompts. ChainForge offers an interesting approach, calling itself "an open-source visual programming environment for prompt engineering". The interface is one of those boxes-and-lines visual programming tools, which reminds me of Yahoo Pipes. It's open source (from...

Datasette Public Office Hours, Friday Nov 8th at 2pm PT
7 Nov 2024 | original ↗

Datasette Public Office Hours, Friday Nov 8th at 2pm PT Tomorrow afternoon (Friday 8th November) at 2pm PT we'll be hosting the first Datasette Public Office Hours - a livestream video session on Discord where Alex Garcia and myself will live code on some Datasette projects and hang out to chat about the project. This is our first time trying...

Project: VERDAD - tracking misinformation in radio broadcasts using Gemini 1.5
7 Nov 2024 | original ↗

I'm starting a new interview series called Project. The idea is to interview people who are building interesting data projects and talk about what they've built, how they built it, and what they learned along the way. The first episode is a conversation with Rajiv Sinclair from Public Data Works about VERDAD, a brand new project in collaboration...

Quoting Jo Kristian Bergum
7 Nov 2024 | original ↗

If you have worked in search, you know how freaking hard even getting started with something close to this with traditional methods. Now, you can zero-shot it. System Instructions: As a query categorization expert, you try to break down the intent of a search query. First, provide your reasoning and then describe the intent using a single...

yet-another-applied-llm-benchmark
6 Nov 2024 | original ↗

yet-another-applied-llm-benchmark Nicholas Carlini introduced this personal LLM benchmark suite back in February as a collection of over 100 automated tests he runs against new LLM models to evaluate their performance against the kinds of tasks he uses them for. There are two defining features of this benchmark that make it interesting. Most...

Generating documentation from tests using files-to-prompt and LLM
5 Nov 2024 | original ↗

Generating documentation from tests using files-to-prompt and LLM I was experimenting with the wasmtime-py Python library today (for executing WebAssembly programs from inside CPython) and I found the existing API docs didn't quite show me what I wanted to know. The project has a comprehensive test suite so I tried seeing if I could generate...

Quoting NY Times Editorial Board
5 Nov 2024 | original ↗

You already know Donald Trump. He is unfit to lead. Watch him. Listen to those who know him best. He tried to subvert an election and remains a threat to democracy. He helped overturn Roe, with terrible consequences. Mr. Trump's corruption and lawlessness go beyond elections: It's his whole ethos. He lies without limit. If he's re-elected, the...

New OpenAI feature: Predicted Outputs
4 Nov 2024 | original ↗

New OpenAI feature: Predicted Outputs Interesting new ability of the OpenAI API - the first time I've seen this from any vendor. If you know your prompt is mostly going to return the same content - you're requesting an edit to some existing code, for example - you can now send that content as a "prediction" and have GPT-4o or GPT-4o mini use that...

Claude 3.5 Haiku
4 Nov 2024 | original ↗

Anthropic released Claude 3.5 Haiku today, a few days later than expected (they said it would be out by the end of October). I was expecting this to be a complete replacement for their existing Claude 3 Haiku model, in the same way that Claude 3.5 Sonnet eclipsed the existing Claude 3 Sonnet while maintaining the same pricing. Claude 3.5 Haiku is...

Nous Hermes 3
4 Nov 2024 | original ↗

Nous Hermes 3 The Nous Hermes family of fine-tuned models have a solid reputation. Their most recent release came out in August, based on Meta's Llama 3.1: Our training data aggressively encourages the model to follow the system and instruction prompts exactly and in an adaptive manner. Hermes 3 was created by fine-tuning Llama 3.1 8B, 70B and...

Quoting Tom MacWright
3 Nov 2024 | original ↗

Building technology in startups is all about having the right level of tech debt. If you have none, you’re probably going too slow and not prioritizing product-market fit and the important business stuff. If you get too much, everything grinds to a halt. Plus, tech debt is a “know it when you see it” kind of thing, and I know that my definition...

California Clock Change
3 Nov 2024 | original ↗

California Clock Change The clocks go back in California tonight and I finally built my dream application for helping me remember if I get an hour extra of sleep or not, using a Claude Artifact. Here's the transcript. This is one of my favorite examples yet of the kind of tiny low stakes utilities I'm building with Claude Artifacts because the...

Docling
3 Nov 2024 | original ↗

Docling MIT licensed document extraction Python library from the Deep Search team at IBM, who released Docling v2 on October 16th. Here's the Docling Technical Report paper from August, which provides details of two custom models: a layout analysis model for figuring out the structure of the document (sections, figures, text, tables etc) and a...

Claude Token Counter
2 Nov 2024 | original ↗

Claude Token Counter Anthropic released a token counting API for Claude a few days ago. I built this tool for running prompts, images and PDFs against that API to count the tokens in them. The API is free (albeit rate limited), but you'll still need to provide your own API key in order to use it. Here's the source code. I built this using two...

Please publish and share more
2 Nov 2024 | original ↗

Please publish and share more 💯 to all of this by Jeff Triplett: Friends, I encourage you to publish more, indirectly meaning you should write more and then share it. [...] You don’t have to change the world with every post. You might publish a quick thought or two that helps encourage someone else to try something new, listen to a new song, or...

SmolLM2
2 Nov 2024 | original ↗

SmolLM2 New from Loubna Ben Allal and her research team at Hugging Face: SmolLM2 is a family of compact language models available in three sizes: 135M, 360M, and 1.7B parameters. They are capable of solving a wide range of tasks while being lightweight enough to run on-device. [...] It was trained on 11 trillion tokens using a diverse dataset...

From Naptime to Big Sleep: Using Large Language Models To Catch Vulnerabilities In Real-World Code
1 Nov 2024 | original ↗

From Naptime to Big Sleep: Using Large Language Models To Catch Vulnerabilities In Real-World Code Google's Project Zero security team used a system based around Gemini 1.5 Pro to find a previously unreported security vulnerability in SQLite (a stack buffer underflow), in time for it to be fixed prior to making it into a release. A key insight...

Claude API: PDF support (beta)
1 Nov 2024 | original ↗

Claude API: PDF support (beta) Claude 3.5 Sonnet now accepts PDFs as attachments: The new Claude 3.5 Sonnet (claude-3-5-sonnet-20241022) model now supports PDF input and understands both text and visual content within documents. I just released llm-claude-3 0.7 with support for the new attachment type, so now you can do this: llm install...

Quoting Question for Department for Science, Innovation and Technology
1 Nov 2024 | original ↗

Lord Clement-Jones: To ask His Majesty's Government what assessment they have made of the cybersecurity risks posed by prompt injection attacks to the processing by generative artificial intelligence of material provided from outside government, and whether any such attacks have been detected thus far. Lord Vallance of Balham: Security is central...

Control your smart home devices with the Gemini mobile app on Android
1 Nov 2024 | original ↗

Control your smart home devices with the Gemini mobile app on Android Google are adding smart home integration to their Gemini chatbot - so far on Android only. Have they considered the risk of prompt injection? It looks like they have, at least a bit: Important: Home controls are for convenience only, not safety- or security-critical purposes....

Cerebras Coder
31 Oct 2024 | original ↗

Cerebras Coder Val Town founder Steve Krouse has been building demos on top of the Cerebras API that runs Llama3.1-70b at 2,000 tokens/second. Having a capable LLM with that kind of performance turns out to be really interesting. Cerebras Coder is a demo that implements Claude Artifact-style on-demand JavaScript apps, and having it run at that...

Australia/Lord_Howe is the weirdest timezone
31 Oct 2024 | original ↗

Australia/Lord_Howe is the weirdest timezone Lord Howe Island - part of Australia, population 382 - is unique in that the island's standard time zone is UTC+10:30 but is UTC+11 when daylight saving time applies. It's the only time zone where DST represents a 30 minute offset. Via lobste.rs Tags: timezones
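You can see the half-hour shift with Python's standard `zoneinfo` module. This is a small sketch that assumes time zone data is available on the system (on some platforms that means installing the `tzdata` package); the two sample dates are arbitrary picks from the southern winter and summer.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

lord_howe = ZoneInfo("Australia/Lord_Howe")

# July is winter there (standard time); January is summer (daylight saving).
winter = datetime(2024, 7, 1, 12, 0, tzinfo=lord_howe)
summer = datetime(2024, 1, 1, 12, 0, tzinfo=lord_howe)

standard_offset = winter.utcoffset()
daylight_offset = summer.utcoffset()
dst_shift = daylight_offset - standard_offset

print(standard_offset, daylight_offset, dst_shift)
```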

Creating a LLM-as-a-Judge that drives business results
30 Oct 2024 | original ↗

Creating a LLM-as-a-Judge that drives business results Hamel Husain's sequel to Your AI product needs evals. This is packed with hard-won actionable advice. Hamel warns against using scores on a 1-5 scale, instead promoting an alternative he calls "Critique Shadowing". Find a domain expert (one is better than many, because you want to keep their...

docs.jina.ai - the Jina meta-prompt
30 Oct 2024 | original ↗

docs.jina.ai - the Jina meta-prompt From Jina AI on Twitter: curl docs.jina.ai - This is our Meta-Prompt. It allows LLMs to understand our Reader, Embeddings, Reranker, and Classifier APIs for improved codegen. Using the meta-prompt is straightforward. Just copy the prompt into your preferred LLM interface like ChatGPT, Claude, or whatever works...

W̶e̶e̶k̶n̶o̶t̶e̶s̶ Monthnotes for October
30 Oct 2024 | original ↗

I try to publish weeknotes at least once every two weeks. It's been four since the last entry, so I guess this one counts as monthnotes instead. In my defense, the reason I've fallen behind on weeknotes is that I've been publishing a lot of long-form blog entries this month. Plentiful LLM vendor news A lot of LLM stuff happened. OpenAI had their...

Bringing developer choice to Copilot with Anthropic’s Claude 3.5 Sonnet, Google’s Gemini 1.5 Pro, and OpenAI’s o1-preview
30 Oct 2024 | original ↗

Bringing developer choice to Copilot with Anthropic’s Claude 3.5 Sonnet, Google’s Gemini 1.5 Pro, and OpenAI’s o1-preview The big announcement from GitHub Universe: Copilot is growing support for alternative models. GitHub Copilot predated the release of ChatGPT by more than a year, and was the first widely used LLM-powered tool. This announcement...

Generating Descriptive Weather Reports with LLMs
29 Oct 2024 | original ↗

Generating Descriptive Weather Reports with LLMs Drew Breunig produces the first example I've seen in the wild of the new LLM attachments Python API. Drew's Downtown San Francisco Weather Vibes project combines output from a JSON weather API with the latest image from a webcam pointed at downtown San Francisco to produce a weather report "with a...

You can now run prompts against images, audio and video in your terminal using LLM
29 Oct 2024 | original ↗

I released LLM 0.17 last night, the latest version of my combined CLI tool and Python library for interacting with hundreds of different Large Language Models such as GPT-4o, Llama, Claude and Gemini. The signature feature of 0.17 is that LLM can now be used to prompt multi-modal models - which means you can now use it to send images, audio and...

Matt Webb's Colophon
29 Oct 2024 | original ↗

Matt Webb's Colophon I love a good colophon (here's mine, I should really expand it). Matt Webb has been publishing his thoughts online for 24 years, so his colophon is a delightful accumulation of ideas and principles. So following the principles of web longevity, what matters is the data, i.e. the posts, and simplicity. I want to minimise...

Quoting Panda Smith
28 Oct 2024 | original ↗

If you want to make a good RAG tool that uses your documentation, you should start by making a search engine over those documents that would be good enough for a human to use themselves. — Panda Smith Tags: search, ai, rag, llms

Hugging Face Hub: Configure progress bars
28 Oct 2024 | original ↗

Hugging Face Hub: Configure progress bars This has been driving me a little bit spare. Every time I try and build anything against a library that uses huggingface_hub somewhere under the hood to access models (most recently trying out MLX-VLM) I inevitably get output like this every single time I execute the model: Fetching 11 files:...

python-imgcat
28 Oct 2024 | original ↗

python-imgcat I was investigating options for displaying images in a terminal window (for multi-modal logging output of LLM) and I found this neat Python library for displaying images using iTerm 2. It includes a CLI tool, which means you can run it without installation using uvx like this: uvx imgcat filename.png Via rich/discussions ...

Prompt GPT-4o audio
28 Oct 2024 | original ↗

Prompt GPT-4o audio A week and a half ago I built a tool for experimenting with OpenAI's new audio input. I just put together the other side of that, for experimenting with audio output. Once you've provided an API key (which is saved in localStorage) you can use this to prompt the gpt-4o-audio-preview model with a system and regular prompt and...

llm-whisper-api
27 Oct 2024 | original ↗

llm-whisper-api I wanted to run an experiment through the OpenAI Whisper API this morning so I knocked up a very quick plugin for LLM that provides the following interface: llm install llm-whisper-api llm whisper-api myfile.mp3 It uses the API key that you previously configured using the llm keys set openai command. If you haven't configured one...

Run a prompt to generate and execute jq programs using llm-jq
27 Oct 2024 | original ↗

llm-jq is a brand new plugin for LLM which lets you pipe JSON directly into the llm jq command along with a human-language description of how you'd like to manipulate that JSON and have a jq program generated and executed for you on the fly. Thomas Ptacek on Twitter: The JQ CLI should just BE a ChatGPT client, so there's no pretense of actually...

Quoting Molly White
26 Oct 2024 | original ↗

As an independent writer and publisher, I am the legal team. I am the fact-checking department. I am the editorial staff. I am the one responsible for triple-checking every single statement I make in the type of original reporting that I know carries a serious risk of baseless but ruinously expensive litigation regularly used to silence...

Mastodon discussion about sandboxing SVG data
26 Oct 2024 | original ↗

Mastodon discussion about sandboxing SVG data I asked this on Mastodon and got some really useful replies: How hard is it to process untrusted SVG data to strip out any potentially harmful tags or attributes (like stuff that might execute JavaScript)? The winner for me turned out to be the humble tag. SVG images that are rendered in an image...

LLM Pictionary
26 Oct 2024 | original ↗

LLM Pictionary Inspired by my SVG pelicans on a bicycle, Paul Calcraft built this brilliant system where different vision LLMs can play Pictionary with each other, taking it in turns to progressively draw SVGs while the other models see if they can guess what the image represents. Tags: vision-llms, svg, generative-ai, ai,...

ChatGPT advanced voice mode can attempt Spanish with a Russian accent
26 Oct 2024 | original ↗

ChatGPT advanced voice mode can attempt Spanish with a Russian accent ChatGPT advanced voice mode may refuse to sing (unless you jailbreak it) but it's quite happy to attempt different accents. I've been having a lot of fun with that: I need you to pretend to be a California brown pelican with a very thick Russian accent, but you talk to me...

Pelicans on a bicycle
25 Oct 2024 | original ↗

Pelicans on a bicycle I decided to roll out my own LLM benchmark: how well can different models render an SVG of a pelican riding a bicycle? I chose that because a) I like pelicans and b) I'm pretty sure there aren't any pelican on a bicycle SVG files floating around (yet) that might have already been sucked into the training data. My prompt:...

llm-cerebras
25 Oct 2024 | original ↗

llm-cerebras Cerebras (previously) provides Llama LLMs hosted on custom hardware at ferociously high speeds. GitHub user irthomasthomas built an LLM plugin that works against their API - which is currently free, albeit with a rate limit of 30 requests per minute for their two models. llm install llm-cerebras llm keys set cerebras # paste key here...

ZombAIs: From Prompt Injection to C2 with Claude Computer Use
25 Oct 2024 | original ↗

ZombAIs: From Prompt Injection to C2 with Claude Computer Use In news that should surprise nobody who has been paying attention, Johann Rehberger has demonstrated a prompt injection attack against the new Claude Computer Use demo - the system where you grant Claude the ability to semi-autonomously operate a desktop computer. Johann's attack is...

Introducing the analysis tool in Claude.ai
24 Oct 2024 | original ↗

Introducing the analysis tool in Claude.ai The Claude.ai consumer-facing interface just shipped a major new feature, which they're calling "the analysis tool". It's their answer to OpenAI's ChatGPT Code Interpreter mode: Claude can now choose to solve models by writing some code, executing that code and then continuing the conversation using the...

Using uv to develop Python command-line applications
24 Oct 2024 | original ↗

Using uv to develop Python command-line applications I've been increasingly using uv to try out new software (via uvx) and experiment with new ideas, but I hadn't quite figured out the right way to use it for developing my own projects. It turns out I was missing a few things - in particular the fact that there's no need to use uv pip at all when...

Julia Evans: TIL
24 Oct 2024 | original ↗

Julia Evans: TIL I've always loved how Julia Evans emphasizes the joy of learning and how you should celebrate every new thing you learn and never be ashamed to admit that you haven't figured something out yet. That attitude was part of my inspiration when I started writing TILs a few years ago. Julia just started publishing TILs too, and I'm...

Quoting Alex Albert
23 Oct 2024 | original ↗

Go to data.gov, find an interesting recent dataset, and download it. Install sklearn with bash tool write a .py file to split the data into train and test and make a classifier for it. (you may need to inspect the data and/or iterate if this goes poorly at first, but don't get discouraged!). Come up with some way to visualize the results of your...

Running prompts against images and PDFs with Google Gemini
23 Oct 2024 | original ↗

Running prompts against images and PDFs with Google Gemini New TIL. I've been experimenting with the Google Gemini APIs for running prompts against images and PDFs (in preparation for finally adding multi-modal support to LLM) - here are my notes on how to send images or PDF files to their API using curl and the base64 -i macOS command. I figured...
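The base64 step mentioned above can be sketched in Python rather than curl. This is a minimal sketch, not the code from the TIL: the `build_payload` helper is hypothetical, and the request shape (`contents` / `parts` / `inline_data`) is my assumption about the Gemini REST body, so check it against the current API reference before relying on it. Nothing is sent over the network here.

```python
import base64
import json


def build_payload(prompt, file_bytes, mime_type):
    """Build a JSON-serializable request body with one text part and one inline file part.

    Hypothetical helper: the field names below are an assumed request shape.
    """
    # Equivalent of piping the file through `base64`: bytes in, ASCII text out.
    encoded = base64.b64encode(file_bytes).decode("ascii")
    return {
        "contents": [
            {
                "parts": [
                    {"text": prompt},
                    {"inline_data": {"mime_type": mime_type, "data": encoded}},
                ]
            }
        ]
    }


# Placeholder bytes stand in for a real image or PDF read from disk.
payload = build_payload("Describe this image", b"\x89PNG placeholder bytes", "image/png")
body = json.dumps(payload)
```

The resulting `body` string is what would be posted as the request data; authentication and the endpoint URL are deliberately left out.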

Using Rust in non-Rust servers to improve performance
23 Oct 2024 | original ↗

Using Rust in non-Rust servers to improve performance Deep dive into different strategies for optimizing part of a web server application - in this case written in Node.js, but the same strategies should work for Python as well - by integrating with Rust in different ways. The example app renders QR codes, initially using the pure JavaScript...

Quoting Model Card Addendum: Claude 3.5 Haiku and Upgraded Sonnet
23 Oct 2024 | original ↗

We enhanced the ability of the upgraded Claude 3.5 Sonnet and Claude 3.5 Haiku to recognize and resist prompt injection attempts. Prompt injection is an attack where a malicious user feeds instructions to a model that attempt to change its originally intended behavior. Both models are now better able to recognize adversarial prompts from a user...

Claude Artifact Runner
23 Oct 2024 | original ↗

Claude Artifact Runner One of my least favourite things about Claude Artifacts is the way it defaults to writing code in React in a way that's difficult to reuse outside of Artifacts. I start most of my prompts with "no react" so that it will kick out regular HTML and JavaScript instead, which I can then copy out into my tools.simonwillison.net...

Quoting Deirdre Bosa
23 Oct 2024 | original ↗

According to a document that I viewed, Anthropic is telling investors that it is expecting a billion dollars in revenue this year. Third-party API is expected to make up the majority of sales, 60% to 75% of the total. That refers to the interfaces that allow external developers or third parties like Amazon's AWS to build and scale their own AI...

Quoting Mike Isaac and Erin Griffith
23 Oct 2024 | original ↗

OpenAI’s monthly revenue hit $300 million in August, up 1,700 percent since the beginning of 2023, and the company expects about $3.7 billion in annual sales this year, according to financial documents reviewed by The New York Times. [...] The company expects ChatGPT to bring in $2.7 billion in revenue this year, up from $700 million in 2023,...

Wayback Machine: Models - Anthropic (8th October 2024)
22 Oct 2024 | original ↗

Wayback Machine: Models - Anthropic (8th October 2024) The Internet Archive is only intermittently available at the moment, but the Wayback Machine just came back long enough for me to confirm that the Anthropic Models documentation page listed Claude 3.5 Opus as coming “Later this year” at least as recently as the 8th of October, but today makes...

Quoting Anthropic
22 Oct 2024 | original ↗

For the same cost and similar speed to Claude 3 Haiku, Claude 3.5 Haiku improves across every skill set and surpasses even Claude 3 Opus, the largest model in our previous generation, on many intelligence benchmarks. Claude 3.5 Haiku is particularly strong on coding tasks. For example, it scores 40.6% on SWE-bench Verified, outperforming many...

Initial explorations of Anthropic's new Computer Use capability
22 Oct 2024 | original ↗

Two big announcements from Anthropic today: a new Claude 3.5 Sonnet model and a new API mode that they are calling computer use. (They also pre-announced Haiku 3.5, but that's not available yet so I'm ignoring it until I can try it out myself.) Computer use is really interesting. Here's what I've figured out about it so far. You provide the...

Apple's Knowledge Navigator concept video (1987)
22 Oct 2024 | original ↗

Apple's Knowledge Navigator concept video (1987) I learned about this video today while engaged in my irresistible bad habit of arguing about whether or not "agents" means anything useful. It turns out CEO John Sculley's Apple in 1987 promoted a concept called Knowledge Navigator (incorporating input from Alan Kay) which imagined a future where...

This prompt can make an AI chatbot identify and extract personal details from your chats
22 Oct 2024 | original ↗

This prompt can make an AI chatbot identify and extract personal details from your chats Matt Burgess in Wired magazine writes about a new prompt injection / Markdown exfiltration variant called Imprompter, described in the new paper Imprompter: Tricking LLM Agents into Improper Tool Use. The paper describes an exfiltration attack against...

sudoku-in-python-packaging
21 Oct 2024 | original ↗

sudoku-in-python-packaging Absurdly clever hack by konsti: solve a Sudoku puzzle entirely using the Python package resolver! First convert the puzzle into a requirements.in file representing the current state of the board: git clone https://github.com/konstin/sudoku-in-python-packaging cd sudoku-in-python-packaging echo...

Everything I built with Claude Artifacts this week
21 Oct 2024 | original ↗

I'm a huge fan of Claude's Artifacts feature, which lets you prompt Claude to create an interactive Single Page App (using HTML, CSS and JavaScript) and then view the result directly in the Claude interface, iterating on it further with the bot and then, if you like, copying out the resulting code. I was digging around in my Claude activity...

Dashboard: Tools
21 Oct 2024 | original ↗

Dashboard: Tools I used Django SQL Dashboard to spin up a dashboard that shows all of the URLs to my tools.simonwillison.net site that I've shared on my blog so far. It uses this (Claude assisted) regular expression in a PostgreSQL SQL query: select distinct on (tool_url) unnest(regexp_matches( body, ...
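The same idea, pulling every distinct tool URL out of a pile of post bodies, can be sketched in Python. The pattern below is my own illustration and not the (elided) expression from the dashboard query; the `extract_tool_urls` name and the sample bodies and paths are made up.

```python
import re

# Illustrative pattern: a tools.simonwillison.net URL up to the next
# whitespace, quote, angle bracket or closing parenthesis.
TOOL_URL = re.compile(r"""https?://tools\.simonwillison\.net/[^\s"'<>)]*""")


def extract_tool_urls(bodies):
    """Return each distinct tool URL once, in order of first appearance."""
    seen = {}
    for body in bodies:
        for url in TOOL_URL.findall(body):
            seen.setdefault(url, None)  # dicts preserve insertion order
    return list(seen)


sample_bodies = [
    'See <a href="https://tools.simonwillison.net/tool-a">this tool</a>',
    "Try https://tools.simonwillison.net/tool-a and https://tools.simonwillison.net/tool-b today",
    "Unrelated link: https://example.com/",
]
found = extract_tool_urls(sample_bodies)
```

One known limitation of this sketch: a URL followed directly by a full stop or comma will keep that punctuation, so real content may need a trailing-punctuation strip.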

Knowledge Worker
20 Oct 2024 | original ↗

Knowledge Worker Forrest Brazeal: Last month, I performed a 30-minute show called "Knowledge Worker" for the incredible audience at Gene Kim's ETLS in Las Vegas. The show included 7 songs about the past, present, and future of "knowledge work" - or, more specifically, how it's affecting us, the humans between keyboard and chair. I poured...

Quoting John Gruber
20 Oct 2024 | original ↗

I really dislike the practice of replacing passwords with email “magic links”. Autofilling a password from my keychain happens instantly; getting a magic link from email can take minutes sometimes, and even in the fastest case, it’s nowhere near instantaneous. Replacing something very fast — password autofill — with something slower is just a...

The 3 AI Use Cases: Gods, Interns, and Cogs
20 Oct 2024 | original ↗

The 3 AI Use Cases: Gods, Interns, and Cogs Drew Breunig introduces an interesting new framework for categorizing use cases of modern AI: Gods refers to the autonomous, human replacement applications - I see that as AGI stuff that's still effectively science fiction. Interns are supervised copilots. This is how I get most of the value out of LLMs...

Quoting Jens Ohlig
20 Oct 2024 | original ↗

Who called it “intellectual property problems around the acquisition of training data for Large Language Models” and not Grand Theft Autocomplete? — Jens Ohlig, on March 8th 2024 Tags: training-data, llms, ai, generative-ai

Quoting Jacob Kaplan-Moss
20 Oct 2024 | original ↗

It feels like we’re at a bit of an inflection point for the Django community. [...] One of the places someone could have the most impact is by serving on the DSF Board. Like the community at large, the DSF is at a transition point: we’re outgrowing the “small nonprofit” status, and have the opportunity to really expand our ambition and reach. In...

You can use text-wrap: balance; on icons
20 Oct 2024 | original ↗

You can use text-wrap: balance; on icons Neat CSS experiment from Terence Eden: the new text-wrap: balance CSS property is intended to help make text like headlines display without ugly wrapped single orphan words, but Terence points out it can be used for icons too: This inspired me to investigate if the same technique could work for text based...

mistral.rs
19 Oct 2024 | original ↗

mistral.rs Here's an LLM inference library written in Rust. It's not just for that one family of models - like how llama.cpp has grown beyond Llama, mistral.rs has grown beyond Mistral. This is the first time I've been able to run the Llama 3.2 vision model on my own Mac M2 laptop: git clone https://github.com/EricLBuehler/mistral.rs.git cd...

Experimenting with audio input and output for the OpenAI Chat Completion API
18 Oct 2024 | original ↗

OpenAI promised this at DevDay a few weeks ago and now it's here: their Chat Completion API can now accept audio as input and return it as output. OpenAI still recommend their WebSocket-based Realtime API for audio tasks, but the Chat Completion API is a whole lot easier to write code against. Generating audio Audio input via a Bash script ...

Quoting D. Richard Hipp
18 Oct 2024 | original ↗

I'm of the opinion that you should never use mmap, because if you get an I/O error of some kind, the OS raises a signal, which SQLite is unable to catch, and so the process dies. When you are not using mmap, SQLite gets back an error code from an I/O error and is able to take remedial action, or at least compose an error message. — D. Richard...

Using static websites for tiny archives
17 Oct 2024 | original ↗

Using static websites for tiny archives Alex Chan: Over the last year or so, I’ve been creating static websites to browse my local archives. I’ve done this for a variety of collections, including: paperwork I’ve scanned documents I’ve created screenshots I’ve taken web pages I’ve bookmarked video and audio files I’ve saved This is such a neat...

New in NotebookLM: Customizing your Audio Overviews
17 Oct 2024 | original ↗

New in NotebookLM: Customizing your Audio Overviews The most requested feature for Google's NotebookLM "audio overviews" (aka automatically generated podcast conversations) has been the ability to provide direction to those artificial podcast hosts - setting their expertise level or asking them to focus on specific topics. Today's update adds...

Video scraping: extracting JSON data from a 35 second screen capture for less than 1/10th of a cent
17 Oct 2024 | original ↗

The other day I found myself needing to add up some numeric values that were scattered across twelve different emails. I didn't particularly feel like copying and pasting all of the numbers out one at a time, so I decided to try something different: could I record a screen capture while browsing around my Gmail account and then extract the...

Gemini API Additional Terms of Service
17 Oct 2024 | original ↗

Gemini API Additional Terms of Service I've been trying to figure out what Google's policy is on using data submitted to their Google Gemini LLM for further training. It turns out it's clearly spelled out in their terms of service, but it differs for the paid vs. free tiers. The paid APIs do not train on your inputs: When you're using Paid...

files-to-prompt 0.4
16 Oct 2024 | original ↗

files-to-prompt 0.4 New release of my files-to-prompt tool adding an option for filtering just for files with a specific extension. The following command will output Claude XML-style markup for all Python and Markdown files in the current directory, and copy that to the macOS clipboard ready to be pasted into an LLM: files-to-prompt . -e py -e md...

2025 DSF Board Nominations
16 Oct 2024 | original ↗

2025 DSF Board Nominations The Django Software Foundation board elections are coming up. There are four positions open, seven directors total. Terms last two years, and the deadline for submitting a nomination is October 25th (the date of the election has not yet been decided). Several community members have shared "DSF initiatives I'd like to...

Supercharge the One Person Framework with SQLite: Rails World 2024
16 Oct 2024 | original ↗

Supercharge the One Person Framework with SQLite: Rails World 2024 Stephen Margheim shares an annotated transcript of the YouTube video of his recent talk at this year's Rails World conference in Toronto. The Rails community is leaning hard into SQLite right now. Stephen's talk is some of the most effective evangelism I've seen anywhere for...

[red-knot] type inference/checking test framework
16 Oct 2024 | original ↗

[red-knot] type inference/checking test framework Ruff maintainer Carl Meyer recently landed an interesting new design for a testing framework. It's based on Markdown, and could be described as a form of "literate testing" - the testing equivalent of Donald Knuth's literate programming. A markdown test file is a suite of tests, each test can...

Un Ministral, des Ministraux
16 Oct 2024 | original ↗

Un Ministral, des Ministraux Two new models from Mistral: Ministral 3B and Ministral 8B (joining Mixtral, Pixtral, Codestral and Mathstral as weird naming variants on the Mistral theme). These models set a new frontier in knowledge, commonsense, reasoning, function-calling, and efficiency in the sub-10B category, and can be used or tuned to a...

Quoting François Chollet
16 Oct 2024 | original ↗

A common misconception about Transformers is to believe that they're a sequence-processing architecture. They're not. They're a set-processing architecture. Transformers are 100% order-agnostic (which was the big innovation compared to RNNs, back in late 2016 -- you compute the full matrix of pairwise token interactions instead of processing one...
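The order-agnostic claim can be checked on a toy example. This is a minimal sketch under loud assumptions: a single attention head, identity query/key/value projections, and no positional encoding of any kind. It only shows that bare self-attention treats its input as a set; it says nothing about full production models.

```python
import math


def self_attention(tokens):
    """Single-head self-attention with identity Q/K/V projections and no positions."""
    dim = len(tokens[0])
    outputs = []
    for query in tokens:
        # Scaled dot-product score of this query against every token.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        # Numerically stable softmax over the scores.
        peak = max(scores)
        weights = [math.exp(score - peak) for score in scores]
        total = sum(weights)
        weights = [weight / total for weight in weights]
        # Weighted sum of the value vectors (here, the tokens themselves).
        outputs.append(
            [sum(w * value[i] for w, value in zip(weights, tokens)) for i in range(dim)]
        )
    return outputs


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
order = [2, 0, 1]
shuffled = [tokens[i] for i in order]

original = self_attention(tokens)
permuted = self_attention(shuffled)

# Shuffling the inputs shuffles the outputs the same way and changes nothing else.
for position, source in enumerate(order):
    assert all(
        math.isclose(a, b, abs_tol=1e-12)
        for a, b in zip(permuted[position], original[source])
    )
```

Each token ends up with the same output vector wherever it sits in the input, which is the permutation behaviour the quote describes.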

The XOXO 2024 Talks
15 Oct 2024 | original ↗

The XOXO 2024 Talks I missed attending the last XOXO in person, but I've been catching up on the videos of the talks over the past few days and they have been absolutely worth spending time with. This year was a single day with ten speakers. Andy Baio explains the intended formula: I usually explain that the conference is about, more than...

Quoting David Heinemeier Hansson
15 Oct 2024 | original ↗

The problem with passkeys is that they're essentially a halfway house to a password manager, but tied to a specific platform in ways that aren't obvious to a user at all, and liable to easily leave them unable to access their accounts. [...] Chrome on Windows stores your passkeys in Windows Hello, so if you sign up for a service on Windows,...

PATH tips on wizard zines
15 Oct 2024 | original ↗

PATH tips on wizard zines New Julia Evans comic, from which I learned that the which -a X command shows you all of the versions of that command that are available in the directories on your current PATH. This is so useful! I used it to explore my currently available Python versions: $ which -a python ...
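What `which -a` does can be approximated in a few lines of Python: walk the PATH directories in order and report every executable with the requested name. This is a rough sketch, not a reimplementation of the real command; the `which_all` name is made up, and it ignores shell aliases, functions and builtins.

```python
import os


def which_all(command, path=None):
    """Return every executable called `command` on PATH, in PATH order."""
    if path is None:
        path = os.environ.get("PATH", "")
    matches = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        candidate = os.path.join(directory, command)
        # Keep regular files that the current user is allowed to execute.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            matches.append(candidate)
    return matches


if __name__ == "__main__":
    for match in which_all("python"):
        print(match)
```

The first entry in the returned list is the one the shell would normally run; the rest are the versions it shadows.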

ChatGPT will happily write you a thinly disguised horoscope
15 Oct 2024 | original ↗

There's a meme floating around at the moment where you ask ChatGPT the following, and it appears to offer deep insight into your personality: From all of our interactions what is one thing that you can tell me about myself that I may not know about myself Don't be fooled into thinking there's anything deep going on here. It's effectively acting...

My Jina Reader tool
14 Oct 2024 | original ↗

My Jina Reader tool I wanted to feed the Cloudflare Durable Objects SQLite documentation into Claude, but I was on my iPhone so copying and pasting was inconvenient. Jina offer a Reader API which can turn any URL into LLM-friendly Markdown and it turns out it supports CORS, so I got Claude to build me this tool (source code). Paste in a URL to...

Grant Negotiation and Authorization Protocol (GNAP)
14 Oct 2024 | original ↗

Grant Negotiation and Authorization Protocol (GNAP) RFC 9635 was published a few days ago. GNAP is effectively OAuth 3 - it's a newly standardized design for a protocol for delegating authorization so an application can access data on your behalf. The most interesting difference between GNAP and OAuth 2 is that GNAP no longer requires clients to...

I Was A Teenage Foot Clan Ninja
14 Oct 2024 | original ↗

I Was A Teenage Foot Clan Ninja My name is Danny Pennington, I am 48 years old, and between 1988 and 1995 I was a ninja in the Foot Clan. I enjoyed this TMNT parody a lot. Tags: youtube

Zero-latency SQLite storage in every Durable Object
13 Oct 2024 | original ↗

Zero-latency SQLite storage in every Durable Object Kenton Varda introduces the next iteration of Cloudflare's Durable Object platform, which recently upgraded from a key/value store to a full relational system based on SQLite. For useful background on the first version of Durable Objects take a look at Cloudflare's durable multiplayer moat by...

An LLM TDD loop
13 Oct 2024 | original ↗

An LLM TDD loop Super neat demo by David Winterbottom, who wrapped my LLM and files-to-prompt tools in a short Bash script that can be fed a file full of Python unit tests and an empty implementation file and will then iterate on that file in a loop until the tests pass. Via @codeinthehole Tags: llm, ai-assisted-programming, python,...
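The shape of that loop can be sketched in Python. This is not David's Bash script: `tdd_loop`, `run_tests` and `propose` are hypothetical names, and the "model" here is a canned list of attempts so the example runs without any API key. In a real setup `run_tests` would run the unit test file and `propose` would send the tests plus the failure output to a language model.

```python
def tdd_loop(run_tests, propose, max_rounds=5):
    """Request new code until the tests pass or the round budget is spent.

    Returns (code, proposals_used) on success and (None, max_rounds) on failure.
    """
    code = ""  # start from an empty implementation file
    for attempt in range(max_rounds + 1):
        passed, feedback = run_tests(code)
        if passed:
            return code, attempt
        if attempt < max_rounds:
            code = propose(code, feedback)
    return None, max_rounds


def run_tests(code):
    """Stand-in test runner: execute the candidate code and check one behaviour."""
    namespace = {}
    try:
        exec(code, namespace)
        assert namespace["add"](2, 3) == 5
    except Exception as error:
        return False, repr(error)
    return True, ""


# Stand-in for the model: a wrong attempt first, then a correct one.
canned_attempts = iter(
    ["def add(a, b): return a - b", "def add(a, b): return a + b"]
)


def propose(code, feedback):
    """Hypothetical model call; here it just returns the next canned attempt."""
    return next(canned_attempts)


final_code, proposals = tdd_loop(run_tests, propose)
```

The `max_rounds` cap matters in practice: without it a model that never converges would keep spending tokens forever.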

PostgreSQL 17: SQL/JSON is here!
13 Oct 2024 | original ↗

PostgreSQL 17: SQL/JSON is here! Hubert Lubaczewski dives into the new JSON features added in PostgreSQL 17, released a few weeks ago on the 26th of September. This is the latest in his long series of similar posts about new PostgreSQL features. The features are based on the new SQL:2023 standard from June 2023. If you want to actually read the...

jefftriplett/django-startproject
12 Oct 2024 | original ↗

jefftriplett/django-startproject Django's django-admin startproject and startapp commands include a --template option which can be used to specify an alternative template for generating the initial code. Jeff Triplett actively maintains his own template for new projects, which includes the pattern that I personally prefer of keeping settings and...

Perks of Being a Python Core Developer
12 Oct 2024 | original ↗

Perks of Being a Python Core Developer Mariatta Wijaya provides a detailed breakdown of the exact capabilities and privileges that are granted to Python core developers - including commit access to the Python main, the ability to write or sponsor PEPs, the ability to vote on new core developers and for the steering council election and financial...

Python 3.13's best new features
12 Oct 2024 | original ↗

Python 3.13's best new features Trey Hunner highlights some Python 3.13 usability improvements I had missed, mainly around the new REPL. Pasting a block of code like a class or function that includes blank lines no longer breaks in the REPL - particularly useful if you frequently have LLMs write code for you to try out. Hitting F2 in the REPL...

Quoting Michael Wooldridge
12 Oct 2024 | original ↗

Carl Hewitt recently remarked that the question what is an agent? is embarrassing for the agent-based computing community in just the same way that the question what is intelligence? is embarrassing for the mainstream AI community. The problem is that although the term is widely used, by many people working in closely related areas, it defies...

Quoting James Cham
12 Oct 2024 | original ↗

Frankenstein is a terrific book partly based on how concerned people were about electricity. It captures our fears about the nature of being human but didn’t help anyone really come up with better policies for dealing with electricity. I worry that a lot of AI critics are doing the same thing. — James Cham Tags: ai

Cabel Sasser at XOXO
12 Oct 2024 | original ↗

Cabel Sasser at XOXO I cannot recommend this talk highly enough for the way it ends. After watching the video, dive into this new site that accompanies the talk - an online archive of the works of commercial artist Wes Cook. I too would very much love to see a full scan of The Lost McDonalds Satire Triptych. Via Andy Baio Tags: cabel-sasser

lm.rs: run inference on Language Models locally on the CPU with Rust
11 Oct 2024 | original ↗

lm.rs: run inference on Language Models locally on the CPU with Rust Impressive new LLM inference implementation in Rust by Samuel Vitorino. I tried it just now on an M2 Mac with 64GB of RAM and got very snappy performance for this Q8 Llama 3.2 1B, with Activity Monitor reporting 980% CPU usage over 13 threads. Here's how I compiled the library...

$2 H100s: How the GPU Bubble Burst
11 Oct 2024 | original ↗

$2 H100s: How the GPU Bubble Burst Fascinating analysis from Eugene Cheah, founder of LLM hosting provider Featherless, discussing GPU economics over the past 12 months. TLDR: Don’t buy H100s. The market has flipped from shortage ($8/hr) to oversupplied ($2/hr), because of reserved compute resales, open model finetuning, and decline in new...

Quoting Mike Caulfield
11 Oct 2024 | original ↗

The primary use of “misinformation” is not to change the beliefs of other people at all. Instead, the vast majority of misinformation is offered as a service for people to maintain their beliefs in face of overwhelming evidence to the contrary. — Mike Caulfield, via Charlie Warzel Tags: misinformation

HTML for People
11 Oct 2024 | original ↗

HTML for People Blake Watson's brand new HTML tutorial, presented as a free online book (CC BY-NC-SA 4.0, on GitHub). This seems very modern and well thought-out to me. It focuses exclusively on HTML, skipping JavaScript entirely and teaching with Simple.css to avoid needing to dig into CSS while still producing sites that are pleasing to look...

Quoting Ed Yong
11 Oct 2024 | original ↗

Providing validation, strength, and stability to people who feel gaslit and dismissed and forgotten can help them feel stronger and surer in their decisions. These pieces made me understand that journalism can be a caretaking profession, even if it is never really thought about in those terms. It is often framed in terms of antagonism. Speaking...

Bridging Language Gaps in Multilingual Embeddings via Contrastive Learning
10 Oct 2024 | original ↗

Bridging Language Gaps in Multilingual Embeddings via Contrastive Learning Most text embedding models suffer from a "language gap", where phrases in different languages with the same semantic meaning end up with embedding vectors that aren't clustered together. Jina claim their new jina-embeddings-v3 (CC BY-NC 4.0, which means you need to...

Announcing Deno 2
10 Oct 2024 | original ↗

Announcing Deno 2 The big focus of Deno 2 is compatibility with the existing Node.js and npm ecosystem: Deno 2 takes all of the features developers love about Deno 1.x — zero-config, all-in-one toolchain for JavaScript and TypeScript development, web standard API support, secure by default — and makes it fully backwards compatible with Node and...

Forums are still alive, active, and a treasure trove of information
9 Oct 2024 | original ↗

Forums are still alive, active, and a treasure trove of information Chris Person: When I want information, like the real stuff, I go to forums. Over the years, forums did not really get smaller, so much as the rest of the internet just got bigger. Reddit, Discord and Facebook groups have filled a lot of that space, but there is just certain...

Free Threaded Python With Asyncio
9 Oct 2024 | original ↗

Free Threaded Python With Asyncio Jamie Chang expanded my free-threaded Python experiment from a few months ago to explore the interaction between Python's asyncio and the new GIL-free build of Python 3.13. The results look really promising. Jamie says: Generally when it comes to Asyncio, the discussion around it is always about the performance...

The Fair Source Definition
9 Oct 2024 | original ↗

The Fair Source Definition Fair Source (fair.io) is the new-ish initiative from Chad Whitacre and Sentry aimed at providing an alternative licensing philosophy that provides additional protection for the business models of companies that release their code. I like that they're establishing a new brand for this and making it clear that it's a...

otterwiki
9 Oct 2024 | original ↗

otterwiki It's been a while since I've seen a new-ish Wiki implementation, and this one by Ralph Thesen is really nice. It's written in Python (Flask + SQLAlchemy + mistune for Markdown + GitPython) and keeps all of the actual wiki content as Markdown files in a local Git repository. The installation instructions are a little in-depth as they...

openai/openai-realtime-console
9 Oct 2024 | original ↗

openai/openai-realtime-console I got this OpenAI demo repository working today - it's an extremely easy way to get started playing around with the new Realtime voice API they announced at DevDay last week: cd /tmp git clone https://github.com/openai/openai-realtime-console cd openai-realtime-console npm i npm start That starts a localhost:3000...

If we had $1,000,000…
8 Oct 2024 | original ↗

If we had $1,000,000… Jacob Kaplan-Moss gave my favorite talk at DjangoCon this year, imagining what the Django Software Foundation could do if it quadrupled its annual income to $1 million and laying out a realistic path for getting there. Jacob suggests leaning more into large donors than increasing our small donor base: It’s far easier for me...

Anthropic: Message Batches (beta)
8 Oct 2024 | original ↗

Anthropic: Message Batches (beta) Anthropic now have a batch mode, allowing you to send prompts to Claude in batches which will be processed within 24 hours (though probably much faster than that) and come at a 50% price discount. This matches the batch modes offered by OpenAI and by Google Gemini, both of which also provide a 50% discount. ...

Django Commons
8 Oct 2024 | original ↗

Django Commons Django Commons is a really promising initiative started by Tim Schilling, aimed at the problem of keeping key Django community projects responsibly maintained on a long-term basis. Django Commons is an organization dedicated to supporting the community's efforts to maintain packages. It seeks to improve the maintenance experience...

Thoughts on the Treasurer Role at Tech NonProfits
7 Oct 2024 | original ↗

Thoughts on the Treasurer Role at Tech NonProfits Will Vincent, Django Software Foundation treasurer from 2020-2022, explains what’s involved in the non-profit role with the highest level of responsibility and trust. Tags: dsf, django

What's New In Python 3.13
7 Oct 2024 | original ↗

What's New In Python 3.13 It's Python 3.13 release day today. The big signature features are a better REPL with improved error messages, an option to run Python without the GIL and the beginnings of the new JIT. Here are some of the smaller highlights I spotted while perusing the release notes. iOS and Android are both now Tier 3 supported...
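To check whether a given interpreter is the GIL-free build mentioned above, a minimal sketch follows. It relies on the `Py_GIL_DISABLED` build configuration variable and the underscore-prefixed `sys._is_gil_enabled()` helper that CPython 3.13 provides; the function name `gil_status` is hypothetical, and on older interpreters the code simply falls back to reporting the GIL as enabled.

```python
import sys
import sysconfig


def gil_status() -> dict:
    """Report whether this is a free-threaded build and whether the GIL is active."""
    # Py_GIL_DISABLED is 1 on free-threaded builds, 0 or None otherwise.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))

    # sys._is_gil_enabled() only exists on Python 3.13+, so look it up defensively.
    check = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = bool(check()) if check is not None else True

    return {
        "free_threaded_build": free_threaded_build,
        "gil_enabled": gil_enabled,
    }


status = gil_status()
print(status)
```

The two values can differ: a free-threaded build may still re-enable the GIL at runtime, for example when an extension module that does not declare support for running without it is imported.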

What's New in Ruby on Rails 8
7 Oct 2024 | original ↗

What's New in Ruby on Rails 8 Rails 8 takes SQLite from a lightweight development tool to a reliable choice for production use, thanks to extensive work on the SQLite adapter and Ruby driver. With the introduction of the solid adapters discussed above, SQLite now has the capability to power Action Cable, Rails.cache, and Active Job effectively,...

Datasette 0.65
7 Oct 2024 | original ↗

Datasette 0.65 Python 3.13 was released today, which broke compatibility with the Datasette 0.x series due to an issue with an underlying dependency. I've fixed that problem by vendoring and fixing the dependency and the new 0.65 release works on Python 3.13 (but drops support for Python 3.8, which is EOL this month). Datasette 1.0a16 added...

↑ These items are from RSS. Visit the blog itself at http://simonwillison.net/ to find everything else and to appreciate the author's digital home.