The ryg blog

UNORM and SNORM to float, hardware edition

24 Dec 2024

I mentioned in a previous post that doing exact UNORM or SNORM conversions to float in hardware was not particularly expensive, but didn’t go into detail on how. Let’s rectify that! (If you haven’t read that post yet, please start there if you need an explanation of what the UNORM and SNORM formats are, as well […]

MRSSE

15 Nov 2024

For BC6H encoding in Oodle Texture, we needed a sensible error metric to use in the encoder core. BC6H is HDR, which is more challenging than LDR data since we need to handle vast differences in magnitude. BC6H internally essentially treats the float16 bits as 16-bit integers (which is a semi-logarithmic mapping) and works with […]

Exact UNORM8 to float

7 Nov 2024

GPUs support UNORM formats that represent a number inside [0,1] as an 8-bit unsigned integer. In exact arithmetic, the conversion to a floating-point number is straightforward: take the integer and divide it by 255. 8-bit integers are for sure machine numbers (exactly represented) in float32 and so is 255, so if you’re willing to do […]
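As a point of reference, the "divide by 255" conversion described above can be sketched as follows. This is a minimal illustration, not the method from the post: the function name `unorm8_to_float32` is hypothetical, and because Python floats are float64, the result is rounded to float32 by a round-trip through `struct`. It makes no claim about matching any particular hardware bit for bit.

```python
import struct


def unorm8_to_float32(x: int) -> float:
    """Reference UNORM8 -> float conversion: divide by 255, round to float32.

    Hypothetical helper for illustration only.
    """
    if not 0 <= x <= 255:
        raise ValueError("UNORM8 input must be in [0, 255]")
    # Python floats are float64; packing and unpacking as "<f" rounds the
    # quotient to the nearest float32 value.
    return struct.unpack("<f", struct.pack("<f", x / 255))[0]
```

The endpoints map to 0.0 and 1.0, and larger inputs give strictly larger outputs.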

BC7 optimal solid-color blocks

4 Nov 2024

That’s right, it’s another texture compression blog post! I’ll keep it short. By “solid-color block”, I mean a 4×4 block of pixels that all have the same color. ASTC has a dedicated encoding for these (“void-extent blocks”), BC7 does not. Therefore we have an 8-bit RGBA input color and want to figure out how to […]

Why those particular integer multiplies?

26 Oct 2024

The x86 instruction set has a somewhat peculiar set of SIMD integer multiply operations, and Intel’s particular implementation of several of these operations in their headline core designs has certain idiosyncrasies that have been there for literally over 25 years at this point. I don’t actually have any inside information, but it’s fun to […]

Inserting a 0 bit in the middle of a value

25 Oct 2024

This one originally came up for me in Oodle Texture’s BC7 decoder. In the BC7 format, each pixel within a 4×4 block can choose from a limited set of between 4 and 16 colors (ignoring some caveats like the dual-index modes that don’t matter here) and consequently between 2 and 4 bits per pixel are […]
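The operation named in the title can be sketched in a straightforward way: split the value at the insertion point, shift the upper part left by one, and recombine. The excerpt does not show which formulation the post settles on, so this is only a plain baseline, and `insert_zero_bit` is a hypothetical name.

```python
def insert_zero_bit(value: int, pos: int) -> int:
    """Insert a 0 bit at bit position `pos` of a non-negative integer.

    Bits below `pos` stay in place; bits at or above `pos` move up by one.
    Hypothetical helper for illustration only.
    """
    if value < 0 or pos < 0:
        raise ValueError("value and pos must be non-negative")
    low_mask = (1 << pos) - 1
    low = value & low_mask    # bits that keep their position
    high = value & ~low_mask  # bits that shift up by one
    return (high << 1) | low
```

For example, inserting a zero at position 2 of `0b1111` yields `0b11011`.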

Zero or sign extend

24 Oct 2024

A while back I had to deal with a bit-packed format that contained a list of integer values encoded in one of a pre-defined set of bit widths, where both the allowed bit widths and the signed-ness were denoted in a header elsewhere. These values never got particularly long (the largest bit width I needed […]
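A minimal sketch of the kind of helper this setup calls for: take a raw field of a given bit width and either zero-extend or sign-extend it, depending on a flag read from the header. The XOR-and-subtract form used here is a common sign-extension idiom; whether the post uses it is not visible in the excerpt, and the name `extend` and its parameters are assumptions.

```python
def extend(raw: int, width: int, signed: bool) -> int:
    """Zero- or sign-extend the low `width` bits of `raw`.

    Hypothetical helper for illustration only.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    raw &= (1 << width) - 1  # keep only the field's bits
    if not signed:
        return raw           # zero extension: nothing more to do
    sign_bit = 1 << (width - 1)
    # Flipping the sign bit and subtracting it back maps values with the
    # sign bit set into the negative range and leaves the rest unchanged.
    return (raw ^ sign_bit) - sign_bit
```

With a 3-bit field, `0b111` reads as 7 when unsigned and as -1 when signed.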

When is a BCn/ASTC endpoints-from-indices solve singular?

29 Aug 2024

This is a result I must have re-derived at least 4 times by now in various ways, but this time I’m writing it down so I just have a link next time. All right. If you’re encoding a BCn or ASTC block and are trying to find optimal endpoints (in a least-squares sense) for a […]

Oodle, Kraken etc. misconceptions

8 Aug 2024

Hi. I’m Fabian “ryg” Giesen, one of the co-authors of Oodle, originally made and sold by RAD Game Tools, now (after an acquisition in late 2020) officially Epic Games Tools as per the business license. Everyone (even at Epic) still mostly refers to us as RAD though; it’s been a long-standing name and nobody sees […]

Entropy decoding in Oodle Data: x86-64 6-stream Huffman decoders

30 Oct 2023

It’s been a while! Last time, I went over how the 3-stream Huffman decoders in Oodle Data work. The 3-stream layout is what we originally went with. It gives near-ideal performance on the last game console generation’s AMD Jaguar cores and is also a good fit for 32-bit x86 and ARM targets, both of which […]

Computational complexity of texture encoding

22 Jul 2023

Most standard texture compression formats use a type of algorithmic vector quantization (meaning that instead of storing an explicit codebook of possible blocks, the codebook entries are determined by an algorithm that uses the encoded block as an input). This is the case for all the BCn formats, ETC/EAC, and ASTC, but not PVRTC, where […]

A very brief BitKnit retrospective

7 May 2023

UPDATE May 7, 2023: I wrote this post yesterday somewhat in a huff (for reasons not worth going into) and the original post contained several inaccuracies. These have been corrected in this version and I’ve marked the places where I was wrong [like this]. Let’s just say I wish I hadn’t posted this in its […]

Notes on FFTs: for implementers

20 Mar 2023

In the previous post I talked about things you might want to know as someone who uses FFTs; this part covers all kinds of FFT implementation details, including the underlying reasons for a lot of the API complexities that showed up last time. I’ll also give some recommendations on what I think are good ways […]

Notes on FFTs: for users

19 Mar 2023

I was just looking over SIMD FFT code I wrote in 2015 for Bink Audio and Miles Sound System to replace the old, all-scalar implementation we had been using since (presumably) the late 90s. That in turn reminded me of writing that code originally, reading a lot of papers on the subject, and how I […]

What’s that magic computation in stb__RefineBlock?

8 Nov 2022

Back in 2007 I wrote my DXT1/5 (aka BC1/3) encoder rygdxt, originally for “fr-041: debris” (so it was size-constrained). A bit later I put up the source and Sean Barrett adapted it into “stb_dxt”, which is probably the form that most know it in today. It’s a simple BC1 encoder that gives decent quality, the […]

On AlphaTensor’s new matrix multiplication algorithms

7 Oct 2022

Two acquaintances independently asked about this today, so it seems worth a write-up: recently (as of this writing), DeepMind published a new paper about a new practical fast matrix multiplication algorithm, along with a press release that is a bit misleading and seems to have led to some confusion already. In particular, while the paper […]

Morton codes addendum

9 Sept 2022

I wrote about Morton codes long ago, and I see that post and the accompanying code referenced frequently, but there are a few points I want to make about it. First, if you’re on an x86 CPU with BMI2 support, you have access to PDEP and PEXT, which make Morton encoding/decoding trivial. 2D Morton encoding one […]
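To show why PDEP and PEXT make this easy, here is a small software emulation of the two instructions and a 2D Morton encode/decode built on top of it. This is a sketch under stated assumptions: the function names are hypothetical, coordinates are taken to be 16 bits wide, and x is placed in the even bits and y in the odd bits (the opposite convention works just as well). On real hardware each `pdep`/`pext` call would be a single BMI2 instruction.

```python
X_MASK = 0x55555555  # even bit positions (assumed to hold x)
Y_MASK = 0xAAAAAAAA  # odd bit positions (assumed to hold y)


def pdep(src: int, mask: int) -> int:
    """Software model of PDEP: scatter the low bits of src to the set bits of mask."""
    result, bit = 0, 0
    while mask:
        lowest = mask & -mask      # lowest set bit still in mask
        if (src >> bit) & 1:
            result |= lowest
        mask &= mask - 1           # clear that bit
        bit += 1
    return result


def pext(src: int, mask: int) -> int:
    """Software model of PEXT: gather the bits of src selected by mask into the low bits."""
    result, bit = 0, 0
    while mask:
        lowest = mask & -mask
        if src & lowest:
            result |= 1 << bit
        mask &= mask - 1
        bit += 1
    return result


def morton_encode_2d(x: int, y: int) -> int:
    """Interleave two 16-bit coordinates into a 32-bit Morton code."""
    return pdep(x, X_MASK) | pdep(y, Y_MASK)


def morton_decode_2d(code: int) -> tuple[int, int]:
    """Split a 32-bit Morton code back into its (x, y) coordinates."""
    return pext(code, X_MASK), pext(code, Y_MASK)
```

Encoding then decoding returns the original pair, for example `morton_decode_2d(morton_encode_2d(1234, 5678)) == (1234, 5678)`.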
