Introduction I self-host everything but email. I wrote about this here, here, or here. As a summary, at home, I run a 3-node Proxmox cluster with several services, powering a home network with a Mikrotik router, Mikrotik switches, and UniFi WiFi, as well as an external VPS. This article is about two things: Why I still bother and what it has...
Introduction It’s been a while since I wrote about Bridge Four, my Scala 3 distributed data processing system from scratch. In this article, we’ll be discussing some major changes around Bridge Four’s state management, its new-and-improved consistency guarantees, and other features and improvements I’ve added since. In case you haven’t read the...
Introduction My home server is a Proxmox cluster. Recently, one of the host’s SSDs indicated it needed a replacement. I run TrueNAS SCALE on it by passing through all my hard drives via LSI HBA so that ZFS has access to the raw hardware, which makes the migration to a new SSD a bit trickier. For added difficulty, this process assumes the SSD is a...
Introduction Part 2 can be found here! Having found myself recently fun-employed in the great tech layoffs of 2023 (cue ominous thunder in the background) [1], I found myself in a bit of a conundrum: For landing a job, one needs to pass interviews. For passing interviews, one needs to jump through ridiculous hoops and, one of my favorite phrases,...
Introduction In 2019, I built a home server. It was pretty fun. Consumer hardware running Debian, all contained in a 19" Rack. It continued to hum along, basically unchanged (except for some RAM), serving a PiHole (DNS + DHCP), a UniFi controller for WiFi, smb shares for network drives (including TimeMachine), data backups, duplicati for system...
Introduction Sometimes, I come across software that makes me wonder: “How didn’t I know about this before‽”. QGIS is such software. A Free and Open Source Geographic Information System https://www.qgis.org/en/site/ As a bit of background - I’ve always been a fan of playing with geospatial data, as evidenced most recently in my Tiny Telematics...
Introduction This is Part 2. Part 1 can be found here, where we discussed goals and architecture, built a custom Raspbian Linux image, implemented the client part of the application in Python, using gpsd, with concepts borrowed from Scala. Or, take it from the GitHub page: Track your vehicle’s live location offline by using little more than a...
Introduction This article talks about building our own Raspbian image, reading gpsd data, dealing with issues within GPS and NMEA, caching everything with redis, trying strongly typed domain modeling in Python (of all places) and using concepts borrowed from Scala and Haskell to first scaffold function signatures, then building them...
Introduction While my last post on the matter was pretty critical on the topic of Scala (and functional programming in general), this post aims to point out the things I actually like, after having used Scala more or less daily for 4 months or so. I’ll go through some cherry-picked features and design patterns that stand out to me as actually,...
Introduction This is an article that’s relatively rough around the edges, somewhat on purpose: It documents my experience, as a Data Engineer, learning “proper” Scala (and Functional Programming), for better or for worse (and what I’ve learned by doing that), why Spark doesn’t count, and why I’m not sure it matters. Presumably all there is to...
Introduction I’m not exactly known as the most straightforward person when it comes to using tech at home to solve problems nobody ever had. Writing a Telegram Bot to control a Raspberry Pi from afar (to observe Guinea Pigs) might give you an idea with regards to what I’m talking about. Background So, when my SO and I started looking to move out...
Introduction I am not known to always spend my time super-wisely; sometimes, I get these odd obsessions with problems that aren’t really problems. But hear me out on this one! We were planning to take a short vacation, and while not at home, there are still things that would be nice to observe from afar, namely our two guinea pigs, Porkchop and...
Introduction A Journey through the Project In the last iteration of this project, I walked you through my journey of throwing a Raspberry Pi into my garden beds to monitor sun and water exposure. In this article, we’ll improve the prototype into something that can feasibly be used both indoors and outdoors. Just as the previous article, this here...
Introduction (A journey through the project) Like many others during a certain time period around 2020/2021 (in case you’re reading this on holotape, captured in an arctic vault, during your “Ancient American History” class in the year 3000 - I am talking about Covid-19), both my SO and I found joy in gardening and growing vegetables. While the...
Introduction Google Play Music joined its brethren in the Google Graveyard of cancelled products in late 2020. Having used Google Play Music many years ago to back up all my MP3 rips (I collect CDs and Vinyl and have done so for about 17 years), Google sent me several friendly emails, asking me to transfer my music to YouTube Music and/or use...
Introduction One question I do get in earnest quite frequently is why I put up with running GNU/Linux distributions for development work. This article aims to clear some of the confusion, without getting lost in technical details or long ramblings about the shortcomings of systemd. I will also refrain from comparing it to Windows too much - I use...
Throw Away Code Michal Vaner recently published an article on using Rust, rather than Python or Perl, for throw away code and prototyping. Michal argues that for common throw away scripts, such as testing odd API behavior, testing a hypothesis in a paper, checking for broken unicode in files (things that one should hack together over a lunch...
Introduction In Part 2 of our comparison of Python and go from a Data Engineering perspective, we’ll finally take a look at Apache Beam and Google Dataflow and how the go SDK and the Python SDK differ, what drawbacks we’re dealing with, how fast it is by running extensive benchmarks, and how feasible it is to make the switch. You can find Part 1...
Introduction Exploring golang - can we ditch Python for go? And have we (as in “folks that work with a lot of data regularly”) finally found a use case for go? Part 1 explores high-level differences between Python and go and gives specific examples on the two languages, aiming to answer the question based on Apache Beam and Google Dataflow as a...
Introduction For the past 4 or so years, I’ve been using WordPress to push content to this blog. I originally chose it because my knowledge of anything to do with Web and Mobile Development is spotty at best - and because “back in the day”™, it was the only CMS I was somewhat familiar with. Maybe you’ve seen the sluggish mess that the blog is -...
Most of us have seen, debugged, and solved a lot of tech issues in our lives - from understanding that cheap PSUs mean a lot of smoke, through off-by-one errors in your favorite programming language, all the way to finding segfaults in grep. This is the story of how a broken stick of DDR4 memory was hiding in plain sight for almost a year. Humble...
Introduction The amount of time my outdoor cameras are being set off by light, wind, cars, or anything other than a human is insane. Overly cautious security cameras might be a feature, but an annoying one at that. I needed a solution for this problem, without going completely overboard. Something simple, elegant, yet effective. Folks, meet what...
Introduction In 2017, I wrote about how to build a basic, Open Source, Hadoop-driven Telematics application (using Spark, Hive, HDFS, and Zeppelin) that can track your movements while driving, show you how your driving skills are, or how often you go over the speed limit - all without relying on 3rd party vendors processing and using that data on...
Introduction In this article, we’ll take a look at whether Apache Hadoop is still a viable option in 2019, with Cloud driven data processing and analytics on the rise. When I first started using the Apache Hadoop Ecosystem, things around the magic buzzwords of “Big Data” and “Machine Learning” were quite different compared to what happened since. In...
Introduction In this article, I’ll document my process of building a home server - or NAS - for local storage, smb drives, backups, processing, git, CD-rips, and other headless computing. Why a Home Server? The necessity of a NAS these days can be questioned, given the cheap - or free - availability of cloud storage. However, getting into a...
Introduction In the last iteration of this article, we analyzed the top 100 subreddits and tried to understand what makes a reddit post successful by using Google’s Cloud ML tool set to analyze popular pictures. In this article, we will be extending the last article’s premise - to analyze picture-based subreddits with Dataflow - by using Google’s...
Introduction In this article (and its successors), we will use a fully serverless Cloud solution, based on Google Cloud, to analyze the top Reddit posts of the 100 most popular subreddits. We will be looking at images, text, questions, and metadata. We aim to answer the following questions: What are the top posts? What is the content of the top...
Introduction In this article, we will use Heron, the distributed stream processing and analytics engine from Twitter, together with Google’s NLP toolkit, Nominatim and some Machine Learning as well as Google’s BigTable, BigQuery, and Data Studio to plot Twitter users’ assumed locations across the US. We will show how much your Twitter profile...
Introduction This article will talk about how organizations can make use of the wonderful thing that is commonly referred to as “Data Lake” - what constitutes a Data Lake, how they probably should (and shouldn’t) use it to gather insights, and why evaluating technologies is just as important as understanding your data. When organizations talk about the...
A V8 doesn’t need computers Cars these days are quite the experience. Or to use Mike Skinner’s words, they are full of “Space Invader”-technology. Even entry-level models have more wiring, computers and sensors built in than a medium-sized plane. Well, that might be a slight exaggeration, but a modern car collects data. A lot of it....
Introduction This article is part 2 of an upcoming article series, Storm vs. Heron. In the last part of the series, we looked at how to transform your existing Storm topologies to Twitter’s new distributed streaming- and analytics-framework, Heron. In this part of the series, we will actually see why you would want to do this. This part will see...
Introduction This article is part 1 of an upcoming article series, Storm vs. Heron. When upgrading your existing Apache Storm topologies to be compatible with Twitter’s newest distributed stream processing engine, Heron, you can just follow the instructions over at Heron’s page, as Heron aims to be fully compatible with existing Storm topologies....
Introduction This article explains how to edit a structured number of records in HBase by combining Hive on M/R2 and sed - and how to do it properly with Hive. HBase is an impressive piece of software - massively scalable, super-fast and built upon Google’s BigTable… and if anyone knows “How To Big Data”, it ought to be these guys. But whether...
This is my blog. I’m a software engineer that works in “data” (I believe they call that “Data Platform Engineer”) and I usually work on distributed data processing systems, mostly in Scala, Python, and some go. If you got here via the link on my CV, I naturally mean “I develop high-performance, enterprise-grade, world-scale distributed...