First Token Cutoff LLM sampling

from blog antirez, 12 Jan 2024

From a theoretical standpoint, the best reply provided by an LLM is obtained by always picking the token associated with the highest probability. This approach makes the LLM output deterministic, which is not a good property for a number of applications. For this reason, in order to balance the LLM's creativity while preserving adherence to the...
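The contrast in the summary, between always taking the most probable token and sampling to allow some creativity, can be sketched in a few lines. This is a generic illustration of greedy decoding versus plain temperature sampling over one step's logits; it is *not* antirez's First Token Cutoff method, whose details are in the full post. The function names and the example logits are hypothetical.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw logits into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy(logits):
    """Deterministic decoding: always return the index of the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])


def sample(logits, temperature=1.0, rng=random):
    """Stochastic decoding: draw one token index from the softmax distribution."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    logits = [2.0, 1.5, 0.1]  # hypothetical scores for three candidate tokens
    print(greedy(logits))  # 0, on every run
    rng = random.Random(42)
    print([sample(logits, temperature=0.8, rng=rng) for _ in range(10)])
```

Greedy decoding returns the same token for the same logits every time, which is exactly the determinism the summary describes. Sampling trades some of that adherence for variety, and the temperature controls how far it strays from the top token.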


Related

Sampling for Text Generation

Chip Huyen

Analyzing GPT-4 Tokens

Koen van Gilst

Is creativity nothing more than a little randomness?

Koen van Gilst

Llama 3.2: New Edge AI and Vision Models

Tao of Mac

How LLMs Work, Explained Without Math

https://blog.miguelgrinberg.com/feed

How Chain of Thought Prompting Boosts LLM Performance

Stanislav Khromov

Speculative Decoding and Beyond: A Survey of Speculative Decoding Techniques

Confessions of a Code Addict

Remember, the computers don’t think

Birchtree

Logic Through the Lens of Neural Networks

cprimozic.net Blog

Layer-wise inferencing + batching: Small VRAM doesn't limit LLM throughput anymore

Languages and Architecture

More from antirez

From where I left

10 Dec 2024

I’m not the kind of person who develops a strong attachment to their own work. When I decided to leave Redis, about 1620 days ago (~4.44 years), I never looked at the source code, commit messages, or anything related to Redis again. From time to time, when I needed Redis, I just downloaded it and compiled it. I just typed “make” and I was very...

Playing audio files in a Pi Pico without a DAC

6 Mar 2024

The Raspberry Pi Pico is suddenly becoming my preferred chip for embedded development. It is well-made, durable hardware, with a ton of features that appear designed with smartness and passion (the state machines driving the GPIOs are a killer feature!). Its main weakness, the lack of connectivity, is now resolved by the W variant. The data sheet is...

Translating blog posts with GPT-4, or: on hope and fear

9 Jan 2024

My usual process for writing blog posts is more or less in two steps: 1. Think about what I want to say for weeks or months. No, I don’t spend weeks focusing on a blog post; the process is exactly reversed: I write blog posts about things that are so important to me that they stay in my mind for weeks. 2. Then, once enough ideas collapsed together in a...

LLMs and Programming in the first days of 2024

2 Jan 2024

I'll start by saying that this article is not meant to be a retrospective on LLMs. It's clear that 2023 was a special year for artificial intelligence: to reiterate that seems rather pointless. Instead, this post aims to be a testimony from an individual programmer. Since the advent of ChatGPT, and later by using LLMs that operate locally, I have...

The origins of the Idle Scan

19 Oct 2023

The Idle scan was conceived at the end of 1998, as evidenced by emails. I had moved to Milan a few months prior, in September if I recall correctly, brimming with new ideas, unaware that my stay in that city would be brief. I spent the summer on the beaches of Sicily, mainly occupied with reading many books recommended by the...
