Running Dask in Databricks

from blog Posts on Hi, I'm Ben 🛸, 2 Nov 2023 | ↗ original

I should probably admit that there’s a bit of a contradiction between two thoughts that I have: "I really love Spark" and "I really hate Spark." Spark is one of the most powerful dataframe libraries on the planet. It can process multiple petabytes of data. But it’s also overkill and unwieldy for most jobs. For smaller datasets, tools like Polars or DuckDB...

A Data Engineering Perspective on Go vs. Python (Part 2 - Dataflow)

Christian Hollinger | original ↗

Apache Spark

Tao of Mac | original ↗

Data Lakes: Some thoughts on Hadoop, Hive, HBase, and Spark

Christian Hollinger | original ↗

Stuff that bothers me: “100x faster than Hadoop”

Home on Erik Bernhardsson | original ↗

A Data Engineering Perspective on Go vs. Python (Part 1)

Christian Hollinger | original ↗

Goodbye, Data Science

r y x, r | original ↗

Scala, Spark, Books, and Functional Programming: An Essay

Christian Hollinger | original ↗

Transform boring charts into beautiful information with Stable Diffusion (SDXL)

Karim Jedda | original ↗

Bukalapak - Fireside Chat with the Data Science team

Eugene Yan | original ↗

Functional Programming and Big Data

ntietz.com blog | original ↗

More from Posts on Hi, I'm Ben 🛸

Data Contracts as Therapy

27 Jan 2025 | original ↗

I think I’ve heard people say data is the new gold at least twenty times. Having worked in data for a while, I’m pretty sure what they mean by that is that extracting and processing it is a laborious and often violent process. It’s possible that my take is inspired by working as a data engineer, and therefore being the victim of other people’s data...

Fun & Torture with Github Actions

13 Nov 2024 | original ↗

I’ve been working hard on a new project called Wimsey lately. I’ll save writing about the project itself for another day, but it got me thinking about GitHub Actions (or really any CI/CD workflow tool), and crucially, about how my process for setting up a new CI/CD flow always winds up being: write a nice clean YAML config, feel like everything is very...

Fun with Hy and Pandas

2 Oct 2024 | original ↗

I don’t keep it much of a secret: I love functional programming. Or maybe I’m just burned from spending hours of my life chasing back through inheritance to see where an object variable was defined. Either way, when I saw that Hy v1.0 was released the other day, I was pretty keen to try it out! One of the downsides of new or more experimental...

Libraries over tools

20 Aug 2024 | original ↗

Especially in the dark realm of data engineering, there’s a huge range of neat low-code/no-code UI tools. I don’t want to complain about those today, but I do want to talk about why libraries (as opposed to low-code UI) are really awesome. Low code is good code! One thing that I think gets missed is that low-code can still be code. Plotly...

An Imaginary Language

8 May 2024 | original ↗

I don’t know where we sit on the wave of YAML domain-specific languages. I really hope it’s the peak, and that things will simmer down. I like YAML a lot as a configuration language, but every time I have to work in a domain-specific language pretending to be configuration, it sends chills down my spine. A big part of it is that GitHub...