TL;DR: After noticing an annoying warning, I went on an absurd yak shave, and discovered that because of a tiny handful of Python packages built with an appealing-sounding but dangerous compiler option, more than 2,500 Python packages—some with more than a million downloads per month—could end up causing any program that uses them to compute...
As part of my ongoing attempts to create some nice datasets for training large code models for C/C++, I've recently been attempting to build every package in Debian Unstable from source using bear to log the compilation and generate a compile_commands.json database for each build. Since it's not possible, in general, to parse C/C++ code without...
I suspect a lot of people in academia end up having a lot of ideas and projects that went nowhere for any number of reasons – maybe there were insurmountable technical challenges, maybe the right person to work on it never materialized, or maybe it just got crowded out by other projects and never picked back up. Here are a couple of mine. For...
Summary: recently published results on the LAVA-M synthetic bug dataset are exciting. However, I show that much simpler techniques can also do startlingly well on this dataset; we need to be cautious in our evaluations and not rely too much on getting a high score on a single benchmark. A New Record The LAVA synthetic bug corpora have been...
Every year the NYU School of Engineering hosts Cyber Security Awareness Week (CSAW) – the largest student-run security event in the country. This year, we're trying something new that combines two of my favorite things: security and open source. The inaugural Security: Open Source (SOS) workshop, held this November 10 at NYU Tandon, will feature...
I'm planning a longer post discussing how we evaluated the LAVA bug injection system, but since we've gotten approval to release the test corpora I wanted to make them available right away. The corpora described in the paper, LAVA-1 and LAVA-M, can be downloaded here: http://panda.moyix.net/~moyix/lava_corpus.tar.xz (101M) Quoting from the...
Using one of the test cases from the previous post, I examine what affects AFL's ability to find a bug placed by LAVA in a program. Along the way, I found what's probably a harmless bug in AFL, and some interesting factors that affect its performance. Although its interface is admirably simple, AFL can still require some tuning, and unexpected...
This is the second in a series of posts about evaluating and improving bug detection software by automatically injecting bugs into programs. Part one, which discussed the setting and motivation, is available here. Now that we understand why we might want to automatically add bugs to programs, let's look at how we can actually do it. We'll first...
This is the first in a series of posts about evaluating and improving bug detection software by automatically injecting bugs into programs. You can find part two, with technical details of our bug injection technique, here. In this series of posts, I'm going to describe how to automatically put bugs in programs, a topic on which we just published...
It's been a very long time coming, but over the holiday break I went through and created basic documentation for all 54 currently available PANDA plugins. Each plugin now includes a manpage-style document named USAGE.md in its plugin directory. You can find a master list of each plugin and a link to its man page here:...
The PANDA virtual machine has once again been updated, and you can download it from: http://laredo-13.mit.edu/~brendan/pandavm-20151002.ova Notable changes: We fixed a record/replay bug that was preventing Debian Wheezy and above from replaying properly. The QEMU GDB stub now works during replay, so you can break, step, etc. at various points...
System calls are of great interest to researchers studying malware, because they are the only way that malware can have any effect on the world – writing files to the hard drive, manipulating the registry, sending network packets, and so on all must be done by making a call into the kernel. In Windows, the system call interface is not publicly...
When I wrote about some of the lessons learned from PANDA Malrec's first 100 days of operation, one of the things I mentioned was that the storage requirements for the system were extremely high. In the four months since, the storage problem only got worse: as of last week, we were storing 24,000 recordings of malware, coming in at a whopping 2.4...
It's now been a little over 100 days since I started running malware samples in PANDA and making the executions publicly available. In that time, we've analyzed 10,794 pieces of malware, which generated: 10,794 record/replay logs, representing 226,163,195,948,195 instructions executed 10,794 packet captures, totaling 26GB of data and 33,968,944...
The PANDA virtual machine has been updated to the latest version of PANDA, which corresponds to commit ce866e1508719282b970da4d8a2222f29f959dcd. You can download it here: http://laredo-13.mit.edu/~brendan/pandavm-20150413.tar.bz2 Some notable changes: The taint system has been rewritten and is now available as the taint2 plugin. It is at least...
Summary: With help from GTISC, I have begun running 100 malware samples per day and posting the PANDA record & replay logs online at http://panda.gtisc.gatech.edu/malrec/. The goal is to lower the barriers to entry for doing dynamic malware research, and to make such research reproducible. Today, I spoke at the ACSAC Malware Memory Forensics...
Regin, a piece of state-sponsored malware that may have been used to attack telecoms and cryptographers, has recently come to light. There are several good writeups out there, and I encourage you to check them out. Getting access to samples in cases like this is often a challenge. Luckily, both The Intercept and VXShare (warning: both links...
By popular request, I've updated the PANDA VM to a more recent version of PANDA. Get it here: pandavm-20141005.tar.bz2 The version in the VM is based on Git revision 28787825aaf514da22e11650fdfca3ba82b9fc57. Enjoy!
Disclaimer: Although I think DRM is both stupid and evil, I don't advocate pirating music. Therefore, this post will stop short of providing a turnkey solution for ripping Spotify music, but it will fully describe the theory behind the technique and its implementation in PANDA. Don't be evil. Update 6/6/2014: The following post assumes you know...
tl;dr: PANDA now supports detached replays (you don't need the underlying VM image to run a replay), and they can be shared at a new site called PANDA Share. Hooray for reproducibility! One of the most inspiring developments of the past few years has been the push for open science, the movement to ensure that scientific publications, data, and...
I have just created a prebuilt VirtualBox VM for testing PANDA. It's a current Debian 7.1 install with the latest (as of 10/4/2013) version of PANDA and prerequisites installed. The username and password for the VM are "panda:panda", with root password "panda". Also included is a Debian i386 QCOW2 image (created by Aurelien Jarno) that can be...
I'm pleased to announce the initial release of a new open source dynamic analysis platform built on QEMU, named PANDA (Platform for Architecture-Neutral Dynamic Analysis). It has a number of features that combine to make it a uniquely powerful platform for analyzing software as it executes: Record and Replay: PANDA is capable of recording the...
I've just gotten word that the Virtuoso source code has been approved by the sponsor for public release, so I've uploaded version 1.0 to the Virtuoso Google Code site! Thanks to Tim Leek at MIT Lincoln Laboratory for seeing this project through the lengthy release review process! Also on Google Code, you can find an installation guide and a...
Over the summer I worked at Microsoft Research, which has a fantastically smart bunch of people working on really cool and interesting problems. I just noticed that they've posted the video of my end-of-internship talk, Monitoring Untrusted Modern Applications with Collective Record and Replay. Please take a look if you're curious about what it...
I've recently returned from Oakland, CA, where the 32nd IEEE Symposium on Security and Privacy was held. There were a lot of excellent talks, and it was great to catch up with others in the security community. Now that the conference is over, I'm happy to release the paper and slides of our work, "Virtuoso: Narrowing the Semantic Gap in Virtual...