Kevin Gallo just announced Bash support on Windows. If you have never had to interact with the Windows Batch language, this might not seem like such a big deal. Surely Batch could not be substantially worse than Bash, right? Bash: a language that was neither designed, nor evolved. An adequate solution to a problem that has since become orders of...
This morning, TechCrunch broke the story that Mesos support is coming to Windows. This story is meant to coincide with Ben Hindman’s MesosCon keynote, in which there will be a real, end-to-end demo showing us scheduling work on a cluster with a mix of Linux and Windows nodes. For the vast majority of the project, I have been the only dedicated...
FoundationDB has been acquired by Apple. A notice on their community site explains that they have pulled download links, and their client libraries now return 404 on GitHub. To database customers, this is a good lesson: assuming FDB did not coordinate with customers ahead of time, this instantly cost at least some FDB customers millions of...
The performance of live site systems — everything from K/V stores to lock servers — is still measured principally in latency and throughput. Server I/O performance still matters here. It is impossible to do well on either of these metrics without a performant I/O subsystem. Oddly, while the last 10 years have seen remarkable improvements in the...
IBM was at the top of the news aggregators a couple days ago as irresponsible rumors spread that it was laying off more than 100,000 people. IBM eventually posted a scathing denial, though not quickly enough to stem the flow of Internet pundits who showed up in droves (at, e.g., Hacker News) to explain just how irrelevant IBM really is. This is a...
Bond is a performant serialization system developed and deployed across dozens of mission-critical, high-scale infrastructure projects internally here at Microsoft. Today the technical lead, Adam Sapek, is open sourcing the project on GitHub under the very permissive MIT license. Since there is going to be no official MSFT announcement, I would...
Update: some colleagues and ex-colleagues from the .NET framework team showed up on HN to comment about this issue. It’s worth reading. Prelude: C# supports kind-of-macros via the very neat Expression Tree API. The gist is: you build a tree that represents some C# expression. When you want to execute that expression, C# basically treats it like a...
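To make the idea concrete without relying on the .NET API, here is a toy analogue in OCaml. The expression is ordinary data that can be inspected, and "compiling" it turns the tree into a callable function. The `expr` type and the `to_string` and `compile` functions are hypothetical names chosen for this sketch. In C#, the corresponding step is calling `Compile()` on a `LambdaExpression`, which produces a delegate.

```ocaml
(* A toy analogue of an expression tree, written in OCaml rather than C#.
   The names here are hypothetical; this is not the System.Linq.Expressions API. *)

(* The tree: data describing an arithmetic expression over one parameter,
   rather than code that computes it directly. *)
type expr =
  | Param                 (* the single input parameter, printed as "x" *)
  | Const of int
  | Add of expr * expr
  | Mul of expr * expr

(* Because the expression is plain data, it can be inspected or rewritten,
   e.g. rendered back into source-like text. *)
let rec to_string = function
  | Param -> "x"
  | Const n -> string_of_int n
  | Add (a, b) -> "(" ^ to_string a ^ " + " ^ to_string b ^ ")"
  | Mul (a, b) -> "(" ^ to_string a ^ " * " ^ to_string b ^ ")"

(* "Compiling" here just turns the tree into an ordinary closure, built once
   so that evaluation does not re-walk the tree on every call. *)
let rec compile e : int -> int =
  match e with
  | Param -> fun x -> x
  | Const n -> fun _ -> n
  | Add (a, b) -> let fa = compile a and fb = compile b in fun x -> fa x + fb x
  | Mul (a, b) -> let fa = compile a and fb = compile b in fun x -> fa x * fb x

let () =
  let tree = Mul (Add (Param, Const 2), Const 2) in
  let f = compile tree in
  print_endline (to_string tree);   (* ((x + 2) * 2) *)
  Printf.printf "%d\n" (f 3)        (* 10 *)
```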
[Translation available in Japanese] So you want to learn OCaml. Where do you start? What do you do? I’ve been an OCaml beginner probably a dozen times — picking it up, dropping it, and picking it up again so many times I’ve lost count. This time it’s stuck, and I think it’s because the community has fundamentally changed. Here’s what worked for...
As a machine learning acolyte, I spent probably as much time trying to understand things like how and when to use machine learning as I did understanding the technical details of machine learning itself. Unfortunately, most of the discussion around machine learning is about the former. The latter gets almost no press by people who are in the...
In my college systems class we were required to implement malloc. I spent a week or so on it. No version control — I was both youthful and arrogant. After ironing out all the little systems bugs, I began cleaning up the directory to package up and send off for grading. I went to remove something in the same directory that also started with the...
I’m sitting here on the 5-hour, 40-minute flight from SEA to PHL. I need to use a regular expression to do something. But, while normally I’d use the re module, I’ve forgotten the details of the API. And to top it off, I’m certainly not ponying up $14 for the crappy plane Internet! I guess that means I have three problems. Looks like I have to go...
For more than a decade, Akamai has guarded their users’ private RSA keys using a security-conscious variant of the malloc family. In effect, this allows their systems to maintain a second, more secure heap, which makes it significantly harder to exploit a broad class of security vulnerabilities. Yesterday, Rich Salz disseminated a patch to...
Hacker School: day 266. (My batch ended on August 22, 2013, but as they say, never graduate.) After learning a bunch about concurrency primitives yesterday, I decided it would be fun to have an operational understanding of their implementation. So I decided to boot up dtruss (which is like strace, but for OS X) and look at the syscall pattern...
Hacker School: day 265. (My batch ended on August 22, 2013, but as they say, never graduate.) I’m sort of embarrassed to admit that I never had a really good handle on how the basic concurrency primitives are implemented. I’m now pretty glad I started to dig into this, because most of the analogies people use to describe mutexes and semaphores...
UPDATE: I guess Apple has released a statement explaining that they’re not going to explain this issue, including how big of a deal it is. Ok, then I will. UPDATE 2: Well, looks like Chad beat me to posting the file to Hacker News. Heh. Tonight my friend Chad Brubaker[1] pointed me at an interesting problem. Apple has been rolling out an iOS...
Kara Swisher wrote her book AOL.com in 1998. In those days, the industry faced an epistemological crisis. The consumer Internet was new and ill-understood. A company worth billions at that time might have been worth nothing a year later. There was simply a limitation to what could be known. Neither the critics nor the advocates really had a good...
Hacker News is up in arms again today about the RapGenius fiasco. See RapGenius statement and HN comments. One response article argues that we need more “viable search engine competition” and the HN community largely seems to agree. In much of the discussion, there is a picaresque notion that the “search engine problem” is really just a product...
Looking back, I think the main advantage of getting a CS degree was that it gave me a lot of time to develop an intuition for how computers behave, what tools are useful for what things, and which problems are amenable to which approaches. Developing this intuition in a semi-directed environment like school is actually really useful because...
Hacker School: day 212. (My batch ended on August 22, 2013, but as they say, never graduate.) In a previous post I talked about how the algebraic structure of a statistic you’re aggregating can give you hints about how to distribute it across a cluster. Such is the premise of Twitter’s neat little library Algebird, which I have continued to poke...
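A minimal sketch of that premise, in OCaml rather than Algebird's Scala: if a statistic forms a monoid (an associative combine operation with an identity element), each shard of the data can be aggregated independently and the partial results merged in any grouping. The `MONOID` signature, the `Aggregate` functor, and the `CountSum` module are hypothetical illustrations, not Algebird's API.

```ocaml
(* Hypothetical monoid signature; not Algebird's API. *)
module type MONOID = sig
  type t
  val zero : t                 (* identity: combine zero x = x *)
  val combine : t -> t -> t    (* must be associative *)
end

(* Associativity is what lets each partition be reduced independently
   (e.g. on different machines) and the partial results merged afterwards. *)
module Aggregate (M : MONOID) = struct
  let sum xs = List.fold_left M.combine M.zero xs
  let distributed partitions = sum (List.map sum partitions)
end

(* Example statistic: (count, total). A mean is not directly combinable,
   but this pair is, and the mean can be derived from it at the end. *)
module CountSum = struct
  type t = int * int
  let zero = (0, 0)
  let combine (c1, s1) (c2, s2) = (c1 + c2, s1 + s2)
end

module Agg = Aggregate (CountSum)

let () =
  let lift v = (1, v) in
  (* Pretend these are three machines' shards of one dataset. *)
  let shards =
    [ List.map lift [3; 1; 4]; List.map lift [1; 5]; List.map lift [9; 2; 6] ] in
  let (c_dist, s_dist) = Agg.distributed shards in
  let (c_all, s_all) = Agg.sum (List.concat shards) in
  (* Sharded and single-pass aggregation agree. *)
  assert (c_dist = c_all && s_dist = s_all);
  Printf.printf "count=%d sum=%d mean=%.3f\n" c_dist s_dist
    (float_of_int s_dist /. float_of_int c_dist)
```

The design choice worth noticing is that the mean is computed only once, after merging, from the combinable (count, sum) pair.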
Hacker School: day 208. (My batch ended on August 22, 2013, but as they say, never graduate.) When I was reverse engineering the Snapchat API, I spent a fair amount of time wondering if there was a quick way to prototype HTTP requests. After I complained to a few friends, it turned out that there is: the telnet utility[1] on unix systems. Goal:...
Hacker School: day 207. (My batch ended on August 22, 2013, but as they say, never graduate.) I’ve been playing with Brushfire, which is a machine learning library that distributes the learning of decision trees across a cluster. Most of ML is basically aggregating counts of stuff, and Brushfire is no different. What’s sort of interesting is the...
Hacker School: day 206. (My batch ended on August 22, 2013, but as they say, never graduate.) Networking is one of the classes I never had time for. Before today, I didn’t even have good answers for basic questions about IP addresses: What are IP addresses for? How do I get an IP address? Who assigns IP addresses? Who can see my IP address? Is it...
One of the serious disadvantages of working at a place like Microsoft is that everything is built for scale, even prototypes. When we roll something out, it gets used[*]. There is no slapping something up in Django “to see if it works” because it only works if it works for millions of clients immediately. When you build systems in this way, the...
The short answer to the question of why Twitter is IPO'ing now is that the timing is excellent, but the particulars of why this is true are actually really interesting. Next year is a comparatively bad time to file. Right now, the Jumpstart Our Business Startups (JOBS) act allows “emerging” companies like Twitter to file confidentially. Briefly...
I’ve recently been learning OCaml in my free time at the Hacker School space. Normally, if you want to define multiple variables in OCaml, you chain a bunch of lets together. This works in roughly the same way lisp’s let and let* work: (* OCaml function, computes (x+2)*2 *) let foo x = let y = x+2 in let z = y*2 in z...
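A small sketch of the two binding styles, assuming standard OCaml semantics. Chained `let ... in` behaves like Lisp's `let*`, where each binding can see the previous ones, while `let ... and ... in` behaves like Lisp's `let`, where the bindings are independent. `foo` mirrors the example in the excerpt; `bar` is a hypothetical counterpart added for contrast.

```ocaml
(* Sequential bindings, like Lisp's let*: z can refer to y. *)
let foo x =
  let y = x + 2 in
  let z = y * 2 in
  z

(* Simultaneous bindings, like Lisp's let: neither binding sees the other.
   Writing [z = y * 2] here would fail with "Unbound value y". *)
let bar x =
  let y = x + 2
  and z = x * 2 in
  y + z

let () =
  Printf.printf "%d\n" (foo 3);  (* (3 + 2) * 2 = 10 *)
  Printf.printf "%d\n" (bar 3)   (* (3 + 2) + (3 * 2) = 11 *)
```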
The Hacker School space has an old Apple //e sitting around. Due to the fact that we are hackers, my friend Martin Törnwall and I decided to turn it into a lisp machine. (Full source available here.) The main obstacle was not developing the lisp itself. It was that developing software on the Apple //e was astonishingly painful: The Apple //e has...
I spent my first few weeks at Hacker School writing a Python compiler basically from scratch. The task of merely parsing a complete language like Python can be quite intimidating at the outset. I’ve found that many people simply assume it’s nearly impossible. I began to wonder if it was possible to write a parser so clear that it would seem...
I often get asked about the difference between statistics and machine learning. It is a tricky distinction because some things that were invented for ML (e.g., PAC theory) also get a lot of play in statistics journals, and vice-versa. To say they’re completely equivalent (which is what I often hear) is probably a bit too strong. I tend to think...
The modern axiomatization of probability theory (proposed in 1933 by Andrey Kolmogorov) was designed to provide a measure-theoretic probability calculus, that is, a definition of the rules for constructing and manipulating mathematical statements involving probabilities. Unfortunately, this axiomatization only tells us how to manipulate...
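For reference, here are Kolmogorov's three axioms as they are usually stated for a probability space \( (\Omega, \mathcal{F}, P) \), with \( \Omega \) the sample space and \( \mathcal{F} \) a σ-algebra of events. Note that they constrain how probabilities combine, not what any particular probability should be:

\[
\begin{aligned}
&\text{(1) Non-negativity:} && P(E) \ge 0 \quad \text{for all } E \in \mathcal{F}, \\
&\text{(2) Unit measure:} && P(\Omega) = 1, \\
&\text{(3) Countable additivity:} && P\Big(\bigcup_{i=1}^{\infty} E_i\Big) = \sum_{i=1}^{\infty} P(E_i) \quad \text{for pairwise disjoint } E_1, E_2, \ldots \in \mathcal{F}.
\end{aligned}
\]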
Finding the \( p \)-th frequency moment, denoted \( F_p \), is one of the most well-studied problems in streaming algorithms, with a broad set of applications ranging from traffic monitoring on networks, to efficient entropy estimation, to database query optimization. In the streaming setting, this task amounts to computing \( F_p(\mathbf{x}) =...
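As an exact, non-streaming baseline, assume the standard definition \( F_p(\mathbf{x}) = \sum_i |x_i|^p \), where \( \mathbf{x} \) is the frequency vector built from a stream of (index, increment) updates. The sketch below computes it by storing every coordinate, which takes linear space; streaming algorithms exist precisely to approximate this quantity in far less. The function names `frequency_vector` and `f_p` are hypothetical, and the sketch assumes \( p > 0 \), since for \( p = 0 \) the usual convention counts nonzero coordinates, which a float power would get wrong for coordinates equal to zero.

```ocaml
(* Exact F_p over a stream of (index, delta) updates. Stores the whole
   frequency vector (linear space), so it is only a reference baseline. *)

(* Apply every update to a table mapping index -> current frequency. *)
let frequency_vector (updates : (int * int) list) : (int, int) Hashtbl.t =
  let x = Hashtbl.create 16 in
  List.iter
    (fun (i, delta) ->
      let cur = Option.value (Hashtbl.find_opt x i) ~default:0 in
      Hashtbl.replace x i (cur + delta))
    updates;
  x

(* F_p = sum over coordinates of |x_i|^p; assumes p > 0. *)
let f_p (p : float) (updates : (int * int) list) : float =
  Hashtbl.fold
    (fun _ xi acc -> acc +. (Float.abs (float_of_int xi) ** p))
    (frequency_vector updates) 0.0

let () =
  (* Items 1, 2, 1, 3 arrive, then one copy of item 3 is deleted,
     giving x = (x_1, x_2, x_3) = (2, 1, 0). *)
  let stream = [ (1, 1); (2, 1); (1, 1); (3, 1); (3, -1) ] in
  Printf.printf "F_1 = %g\n" (f_p 1.0 stream);  (* 2 + 1 + 0 = 3 *)
  Printf.printf "F_2 = %g\n" (f_p 2.0 stream)   (* 4 + 1 + 0 = 5 *)
```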