The LAVA Synthetic Bug Corpora
Related
More from Push the Red Button
TL;DR: After noticing an annoying warning, I went on an absurd yak shave, and discovered that because of a tiny handful of Python packages built with an appealing-sounding but dangerous compiler option, more than 2,500 Python packages—some with more than a million downloads per month—could end up causing any program that uses them to compute...
As part of my ongoing attempts to create some nice datasets for training large code models for C/C++, I've recently been attempting to build every package in Debian Unstable from source using bear to log the compilation and generate a compile_commands.json database for each build. Since it's not possible, in general, to parse C/C++ code without...
I suspect a lot of people in academia end up having a lot of ideas and projects that went nowhere for any number of reasons – maybe there were insurmountable technical challenges, maybe the right person to work on it never materialized, or maybe it just got crowded out by other projects and never picked back up. Here are a couple of mine. For...
Summary: recently published results on the LAVA-M synthetic bug dataset are exciting. However, I show that much simpler techniques can also do startlingly well on this dataset; we need to be cautious in our evaluations and not rely too much on getting a high score on a single benchmark. A New Record The LAVA synthetic bug corpora have been...
Every year the NYU School of Engineering hosts Cyber Security Awareness Week (CSAW) – the largest student-run security event in the country. This year, we're trying something new that combines two of my favorite things: security and open source. The inaugural Security: Open Source (SOS) workshop, held this November 10 at NYU Tandon will feature...