Eugene Yan

Building AI Reading Club: Features & Behind the Scenes

12 Jan 2025 | original ↗

Exploring what an AI-powered reading experience could look like.

2024 Year in Review

22 Dec 2024 | original ↗

A peaceful year of steady progress on my craft and health

A Spark of the Anti-AI Butlerian Jihad (on Bluesky)

8 Dec 2024 | original ↗

How the sharing of 1M Bluesky posts surfaced the strong anti-AI sentiment on Bluesky.

Some Paradoxical Rules of Writing

1 Dec 2024 | original ↗

With regard to writing, there are many rules and also no rules at all.

How to Run a Paper Club and Learn With Your Peers

24 Nov 2024 | original ↗

A Minimal Mac Setup Guide

17 Nov 2024 | original ↗

Setting up my new MacBook Pro from scratch

39 lessons from Industry ML Conferences in 2024

3 Nov 2024 | original ↗

ML systems, production & scaling, execution & collaboration, building for users, conference etiquette.

AlignEval: Building an App to Make Evals Easy, Fun, and Automated

27 Oct 2024 | original ↗

Look at and label your data, build and evaluate your LLM-evaluator, and optimize it against your labels.

Hackathon Judge - Weights & Biases LLM-Evaluator Hackathon

22 Sept 2024 | original ↗

Being a human judge at the Weights & Biases LLM-as-a-Judge Hackathon

Building the Same App Using Various Web Frameworks

8 Sept 2024 | original ↗

FastAPI, FastHTML, Next.js, SvelteKit, and thoughts on how coding assistants influence builders' choices.

Evaluating the Effectiveness of LLM-Evaluators (aka LLM-as-Judge)

18 Aug 2024 | original ↗

Use cases, techniques, alignment, finetuning, and critiques against LLM-evaluators.

How to Interview and Hire ML/AI Engineers

7 Jul 2024 | original ↗

What to interview for, how to structure the phone screen, interview loop, and debrief, and a few tips.

AIE World's Fair 2024 Keynote - What We Learned from a Year of LLMs

27 Jun 2024 | original ↗

Special double-feature closing keynote from the 6 authors of the hit O'Reilly article on Applied LLMs.

Netflix PRS 2024 - Applying LLMs to Recommendation Experiences

31 May 2024 | original ↗

Challenges and lessons from deploying LLM experiences: evals, scalability, guardrails.

Prompting Fundamentals and How to Apply them Effectively

26 May 2024 | original ↗

Structured input/output, prefilling, n-shots prompting, chain-of-thought, reducing hallucinations, etc.

What We've Learned From A Year of Building with LLMs

12 May 2024 | original ↗

From the tactical nuts & bolts to the operational day-to-day to the long-term business strategy.

Building an AI Coach to Help Tame My Monkey Mind

7 Apr 2024 | original ↗

Building an AI coach with speech-to-text, text-to-speech, an LLM, and a virtual number.

Task-Specific LLM Evals that Do & Don't Work

31 Mar 2024 | original ↗

Evals for classification, summarization, translation, copyright regurgitation, and toxicity.

Don't Mock Machine Learning Models In Unit Tests

25 Feb 2024 | original ↗

How unit testing machine learning code differs from typical software practices

How to Generate and Use Synthetic Data for Finetuning

11 Feb 2024 | original ↗

Overcoming the bottleneck of human annotations in instruction-tuning, preference-tuning, and pretraining.

Language Modeling Reading List (to Start Your Paper Club)

7 Jan 2024 | original ↗

Some fundamental papers and a one-sentence summary for each; start your own paper club!

2023 Year in Review

31 Dec 2023 | original ↗

An expanded charter, lots of writing and speaking, and finally learning to snowboard.

Push Notifications: What to Push, What Not to Push, and How Often

24 Dec 2023 | original ↗

Sending helpful & engaging pushes, filtering annoying pushes, and finding the frequency sweet spot.

Out-of-Domain Finetuning to Bootstrap Hallucination Detection

5 Nov 2023 | original ↗

How to use open-source, permissive-use data and collect fewer labeled samples for our tasks.

Reflections on AI Engineer Summit 2023

15 Oct 2023 | original ↗

The biggest deployment challenges, backward compatibility, multi-modality, and SF work ethic.

AI Engineer Summit 2023 Keynote - Building Blocks for LLM Systems

9 Oct 2023 | original ↗

Evals, retrieval-augmented generation, guardrails, and collecting feedback; all that good stuff.

Evaluation & Hallucination Detection for Abstractive Summaries

3 Sept 2023 | original ↗

Reference, context, and preference-based metrics, self-consistency, and catching hallucinations.

How to Match LLM Patterns to Problems

13 Aug 2023 | original ↗

Distinguishing problems with external vs. internal LLMs, and data vs. non-data patterns

Patterns for Building LLM-based Systems & Products

30 Jul 2023 | original ↗

Evals, RAG, fine-tuning, caching, guardrails, defensive UX, and collecting user feedback.

Obsidian-Copilot: An Assistant for Writing & Reflecting

11 Jun 2023 | original ↗

Writing drafts via retrieval-augmented generation. Also reflecting on the week's journal entries.

Some Intuition on Attention and the Transformer

21 May 2023 | original ↗

What's the big deal, intuition on query-key-value vectors, multiple heads, multiple layers, and more.

Open-LLMs - A list of LLMs for Commercial Use

7 May 2023 | original ↗

It started with a question that had no clear answer, and led to eight PRs from the community.

Interacting with LLMs with Minimal Chat

30 Apr 2023 | original ↗

Should chat be the main UX for LLMs? I don't think so and believe we can do better.

More Design Patterns For Machine Learning Systems

23 Apr 2023 | original ↗

9 patterns including HITL, hard mining, reframing, cascade, data flywheel, business rules layer, and more.

Raspberry-LLM - Making My Raspberry Pico a Little Smarter

16 Apr 2023 | original ↗

Generating Dr. Seuss headlines, fake WSJ quotes, HackerNews troll comments, and more.

Experimenting with LLMs to Research, Reflect, and Plan

9 Apr 2023 | original ↗

Also, shortcomings in document retrieval and how to overcome them with search & recsys techniques.

LLM-powered Biographies

19 Mar 2023 | original ↗

Asking LLMs to generate biographies to get a sense of how they memorize and regurgitate.

How to Write Data Labeling/Annotation Guidelines

12 Mar 2023 | original ↗

Writing good instructions to achieve high precision and throughput.

Content Moderation & Fraud Detection - Patterns in Industry

26 Feb 2023 | original ↗

Collecting ground truth, data augmentation, cascading heuristics and models, and more.

Mechanisms for Effective Technical Teams

5 Feb 2023 | original ↗

End of week debrief, weekly business review, monthly learning sessions, and quarter review.

Mechanisms for Effective Machine Learning Projects

22 Jan 2023 | original ↗

Pilot & copilot, literature review, methodology review, and timeboxing.

Goodbye Roam Research, Hello Obsidian

15 Jan 2023 | original ↗

How to migrate and sync notes & images across devices

What To Do If Dependency Teams Can’t Help

8 Jan 2023 | original ↗

Seeking first to understand, earning trust, and preparing for away team work.

2022 in Review & 2023 Goals

24 Dec 2022 | original ↗

Travelled, wrote, and learned a lot, L5 -> L6, gave a keynote at RecSys, and started a meetup.

Autoencoders and Diffusers: A Brief Comparison

11 Dec 2022 | original ↗

A quick overview of variational and denoising autoencoders and comparing them to diffusers.

Text-to-Image: Diffusion, Text Conditioning, Guidance, Latent Space

27 Nov 2022 | original ↗

The fundamentals of text-to-image generation, relevant papers, and experimenting with DDPM.

RecSys 2022: Recap, Favorite Papers, and Lessons

2 Oct 2022 | original ↗

My three favorite papers, 17 paper summaries, and ML and non-ML lessons.

RecSys 2022 Keynote - Is the Juice Worth the Squeeze?

23 Sept 2022 | original ↗

Invited keynote at the Workshop for Online Recommender Systems and User Modeling (ORSUM)

Writing Robust Tests for Data & Machine Learning Pipelines

4 Sept 2022 | original ↗

Or why I should write fewer integration tests.

Simplicity is An Advantage but Sadly Complexity Sells Better

14 Aug 2022 | original ↗

Pushing back on the cult of complexity.

Uncommon Uses of Python in Commonly Used Libraries

31 Jul 2022 | original ↗

Some off-the-beaten-path uses of Python learned from reading libraries.

Why You Should Write Weekly 15-5s

26 Jun 2022 | original ↗

15 minutes a week to document your work, increase visibility, and earn trust.

Design Patterns in Machine Learning Code and Systems

12 Jun 2022 | original ↗

Understanding and spotting patterns to use code and components as intended.

What I Wish I Knew About Onboarding Effectively

22 May 2022 | original ↗

Mindset, 100-day plan, and balancing learning and taking action to earn trust.

Bandits for Recommender Systems

8 May 2022 | original ↗

Industry examples, exploration strategies, warm-starting, off-policy evaluation, and more.

How to Measure and Mitigate Position Bias

17 Apr 2022 | original ↗

Introducing randomness and/or learning from inherent randomness to mitigate position bias.

Counterfactual Evaluation for Recommendation Systems

10 Apr 2022 | original ↗

Thinking about recsys as interventional vs. observational, and inverse propensity scoring.

Traversing High-Level Intent and Low-Level Requirements

20 Mar 2022 | original ↗

How they differ and why they work better in different situations.

Data Science Project Quick-Start

6 Mar 2022 | original ↗

Hard-won lessons on how to start data science projects effectively.

Mailbag: How to Define a Data Team's Vision and Roadmap

18 Feb 2022 | original ↗

I'm heading into a team lead role and would like to define the vision and roadmap.

Red Flags to Look Out for When Joining a Data Team

13 Feb 2022 | original ↗

What to consider in terms of data, roadmap, role, manager, tooling, etc.

How to Keep Learning about Machine Learning

19 Jan 2022 | original ↗

Beyond getting that starting role, how does one continue growing in the field?

The Data Scientist Show - Building end-to-end ML systems

2 Dec 2021 | original ↗

Daliana and I had a 2hr chat on all things data science and machine learning.

2021 Year in Review

28 Nov 2021 | original ↗

Met most of my goals, adopted a puppy, and built ApplyingML.com.

Informal Mentors Grew into ApplyingML.com!

25 Nov 2021 | original ↗

More than two dozen interviews with ML Practitioners sharing their stories and advice

5 Lessons I Learned from Writing Online (Guest post by Susan Shu)

7 Nov 2021 | original ↗

Susan shares 5 lessons she gained from writing online in public over the past year.

What I Learned from Writing Online - For Fellow Non-Writers

17 Oct 2021 | original ↗

Write before you're ready, write for yourself, quantity over quality, and a few other lessons.

RecSys 2021 - Papers and Talks to Chew on

3 Oct 2021 | original ↗

Simple baselines, ideas, tech stacks, and packages to try.

The First Rule of Machine Learning: Start without Machine Learning

19 Sept 2021 | original ↗

Why this is the first rule, some baseline heuristics, and when to move on to machine learning.

MLOps Community - System Design for RecSys & Search

15 Sept 2021 | original ↗

An overview of system design, candidate retrieval, and ranking, with industry examples.

Reinforcement Learning for Recommendations and Search

5 Sept 2021 | original ↗

Focusing on long-term rewards, exploration, and frequently updated items.

Amazon Science - Eugene Yan and the Art of Writing about Science

2 Aug 2021 | original ↗

Why the Amazon applied scientist takes the time to break down his work for readers.

Bootstrapping Labels via ___ Supervision & Human-In-The-Loop

1 Aug 2021 | original ↗

How to generate labels from scratch with semi, active, and weakly supervised learning.

Mailbag: How to Bootstrap Labels for Relevant Docs in Search

20 Jul 2021 | original ↗

Building semantic search; how to calculate recall when relevant documents are unknown.

SF Big Analytics - System Design for RecSys & Search

13 Jul 2021 | original ↗

Why real-time RecSys? What does the system design look like in industry? How to build an MVP?

Influencing without Authority for Data Scientists

4 Jul 2021 | original ↗

Show them the data, the Socratic method, earning trust, and more.

System Design for Recommendations and Search

27 Jun 2021 | original ↗

Breaking it into offline vs. online environments, and candidate retrieval vs. ranking steps.

Patterns for Personalization in Recommendations and Search

13 Jun 2021 | original ↗

A whirlwind tour of bandits, embedding+MLP, sequences, graph, and user embeddings.

Towards Data Science - Author Spotlight with Eugene Yan

2 Jun 2021 | original ↗

My favourite project, how I write weekly and how you can start, and content I would like to see more of.

The Metagame of Applying Machine Learning

2 May 2021 | original ↗

How to go from knowing machine learning to applying it at work to drive impact.

Search: Query Matching via Lexical, Graph, and Embedding Methods

25 Apr 2021 | original ↗

An overview and comparison of the various approaches, with examples from industry search systems.

My Impostor Syndrome Stories (Guest Post by Susan Shu)

18 Apr 2021 | original ↗

Even high achieving individuals experience impostor syndrome; here's how Susan learned to manage it.

How to Live with Chronic Imposter Syndrome

11 Apr 2021 | original ↗

More education, achievements, and awards don't shoo away imposter syndrome. Here's what might help.

Planning Your Career: Values and Superpowers

4 Apr 2021 | original ↗

What do you deeply care about? What do you excel at? Build a career out of that.

Bukalapak - Fireside Chat with the Data Science team

28 Mar 2021 | original ↗

We discussed how to build and run data teams and engage better with business.

TalkPython - What ML can Teach Us About Life

26 Mar 2021 | original ↗

Mike and I take a philosophical detour on Talk Python and discuss life lessons from machine learning.

Choosing Problems in Data Science and Machine Learning

21 Mar 2021 | original ↗

Short vs. long-term gain, incremental vs. disruptive innovation, and resume-driven development.

Seven Habits that Shaped My Last Decade

14 Mar 2021 | original ↗

I wish I started sooner. All have improved my life and several have compounding effects.

How to Write Design Docs for Machine Learning Systems

7 Mar 2021 | original ↗

Pointers to think through your methodology and implementation, and the review process.

How to Write Better with The Why, What, How Framework

28 Feb 2021 | original ↗

Three documents I write (one-pager, design doc, after-action review) and how I structure them.

Feature Stores: A Hierarchy of Needs

21 Feb 2021 | original ↗

Access, serving, integrity, convenience, autopilot; use what you need.

How to Win a Data Hackathon (Hacklytics 2021)

14 Feb 2021 | original ↗

What the top teams did to win the 36-hour data hackathon. No, not machine learning.

DataTalksClub - Building an ML System; Behind the Scenes

7 Feb 2021 | original ↗

Design and architecture, tech stack, methodology, results, and lessons learned.

Growing and Running Your Data Science Team

31 Jan 2021 | original ↗

What I learned about hiring and training, and fostering innovation, discipline, and camaraderie.

You Don't Really Need Another MOOC

24 Jan 2021 | original ↗

Stop procrastinating, go off the happy path, learn just-in-time, and get your hands dirty.

DataTalksClub - The Importance Of Writing In A Tech Career

17 Jan 2021 | original ↗

Why did I start writing? What's my writing process? What's the writing culture at Amazon like?

Mailbag: How to Get Experienced DS Resume Noticed by Recruiters?

16 Jan 2021 | original ↗

How to increase the chances of getting called up by recruiters?

Real-time Machine Learning For Recommendations

10 Jan 2021 | original ↗

Why real-time? How have China & US companies built them? How to design & build an MVP?

2021 Roadmap: Sharing, Helping, and Living More

3 Jan 2021 | original ↗

A public roadmap to track and share my progress; nothing mission or work-related.

2020 Retrospective: New Country, New Role, New Habit

20 Dec 2020 | original ↗

Wrapping up 2020 with writing and site statistics, graphs, and a word cloud.

Catch the Flying Daggers

11 Dec 2020 | original ↗

A short story on flying daggers and life's challenges.

How I’m Reflecting on 2020 and Planning for 2021

6 Dec 2020 | original ↗

Time to clear the cache, evaluate existing processes, and start new threads.

Alexey Grigorev on His Career, Data Science, and Writing

29 Nov 2020 | original ↗

How he switched from engineering to data science, what "senior" means, and how writing helps.

Mailbag: What's the Architecture for your Blog?

24 Nov 2020 | original ↗

How did you set up your site and what's an easy way to replicate it?

What Machine Learning Can Teach Us About Life - 7 Lessons

22 Nov 2020 | original ↗

Data cleaning, transfer learning, overfitting, ensembling, and more.

How to Prevent or Deal with a Data Science Role or Title Mismatch

15 Nov 2020 | original ↗

Interview questions you should ask and how to evolve your job scope.

Applied / Research Scientist, ML Engineer: What’s the Difference?

8 Nov 2020 | original ↗

A personal take on their deliverables and skills, and what it means for the industry and your team.

Chip Huyen on Her Career, Writing, and Machine Learning

1 Nov 2020 | original ↗

Setbacks she faced, overcoming them, and how writing changed her life.

Data Discovery Platforms and Their Open Source Solutions

25 Oct 2020 | original ↗

What questions do they answer? How do they compare? What open-source solutions are available?

Why I switched from Netlify back to GitHub Pages

21 Oct 2020 | original ↗

DNS server snafus led to email & security issues. Also, limited free build minutes monthly.

Why Have a Data Science Portfolio and What It Shows

18 Oct 2020 | original ↗

Not 'How to build a data science portfolio', but 'Whys' and 'Whats'.

How to Install Google Scalable Nearest Neighbors (ScaNN) on Mac

14 Oct 2020 | original ↗

Step-by-step walkthrough on the environment, compilers, and installation for ScaNN.

How Prototyping Can Help You to Get Buy-In

11 Oct 2020 | original ↗

Building prototypes helped get buy-in when roadmaps & design docs failed.

Is Writing as Important as Coding?

4 Oct 2020 | original ↗

As careers grow, how does the balance between writing & coding change? Hear from 4 tech leaders.

RecSys 2020: Takeaways and Notable Papers

27 Sept 2020 | original ↗

Emphasis on bias, more sequential models & bandits, robust offline evaluation, and recsys in the wild.

Appreciating the Present

26 Sept 2020 | original ↗

What if the alternative was nothingness?

CareerFair - Day-to-day as an Applied Scientist at Amazon

21 Sept 2020 | original ↗

What's an average day like? What's great about the role? How's working in Amazon?

Routines and Tools to Optimize My Day (Guest Post by Susan Shu)

20 Sept 2020 | original ↗

For years I've refined my routines and found tools to manage my time. Here I share them with readers.

How to Accomplish More with Less - Useful Tools & Routines

13 Sept 2020 | original ↗

My tools for organization and creation, autopilot routines, and Maker's schedule

Migrating Site Comments to Utterances

7 Sept 2020 | original ↗

A step-by-step of how to migrate from json comments to Utterances.

How to Test Machine Learning Code and Systems

6 Sept 2020 | original ↗

Checking for correct implementation, expected learned behaviour, and satisfactory performance.

Mailbag: Parsing Fields from PDFs—When to Use Machine Learning?

4 Sept 2020 | original ↗

Should I switch from a regex-based to ML-based solution on my application?

Datacast Podcast - Effective Data Science with Eugene Yan

3 Sept 2020 | original ↗

My chat with James Le about my experience, leadership, agile, ML in production, writing, and more.

How Reading Papers Helps You Be a More Effective Data Scientist

30 Aug 2020 | original ↗

Why read papers, what papers to read, and how to read them.

Mailbag: I'm Now a Senior DS—How should I Approach this?

27 Aug 2020 | original ↗

Becoming a senior after three years and dealing with imposter syndrome.

Embrace Beginner's Mind; Avoid The Wrong Way To Be An Expert

23 Aug 2020 | original ↗

How not to become an expert beginner and to progress through beginner, intermediate, and so on.

NLP for Supervised Learning - A Brief Survey

16 Aug 2020 | original ↗

Examining the broad strokes of NLP progress and comparing between models

9 Aug 2020 | original ↗

Why (and why not) be more end-to-end, how to, and Stitch Fix and Netflix's experience

Adding a Checkbox & Download Button to a FastAPI-HTML app

5 Aug 2020 | original ↗

Updating our FastAPI app to let users select options and download results.

What I Did Not Learn About Writing In School

2 Aug 2020 | original ↗

Surprising lessons I picked up from the best books, essays, and videos on writing non-fiction.

Georgia Tech's OMSCS FAQ (based on my experience)

26 Jul 2020 | original ↗

Why OMSCS? How can I get accepted? How much time is needed? Did it help your career? And more...

How to Set Up a HTML App with FastAPI, Jinja, Forms & Templates

23 Jul 2020 | original ↗

I couldn't find any guides on serving HTML with FastAPI, thus I wrote this to plug the hole on the internet.

Why You Need to Follow Up After Your Data Science Project

19 Jul 2020 | original ↗

Ever revisit a project & replicate the results the first time round? Me neither. Thus I adopted these habits.

What I Do During A Data Science Project To Deliver Success

12 Jul 2020 | original ↗

It's not enough to have a good strategy and plan. Execution is just as important.

How to Update a GitHub Profile README Automatically

11 Jul 2020 | original ↗

I wanted to add my recent writing to my GitHub Profile README but was too lazy to do manual updates.

The 85% Rule: When Giving It Your 100% Gets You Less than 85%

9 Jul 2020 | original ↗

I thought giving it my all led to maximum outcomes; then I learnt about the 85% rule.

My Notes From Spark+AI Summit 2020 (Application-Specific Talks)

5 Jul 2020 | original ↗

Part II of the previous write-up, this time on applications and frameworks of Spark in production

My Notes From Spark+AI Summit 2020 (Application-Agnostic Talks)

28 Jun 2020 | original ↗

Sharing my notes & practical knowledge from the conference for people who don't have the time.

Mailbag: Qns on the Intersection of Data Science and Business

21 Jun 2020 | original ↗

Does DS have business requirements? When does it make sense to split DS and DE?

How to Set Up a Python Project For Automation and Collaboration

21 Jun 2020 | original ↗

After this article, we'll have a workflow of tests and checks that run automatically with each git push.

Why Are My Airflow Jobs Running “One Day Late”?

17 Jun 2020 | original ↗

A curious discussion made me realize my expert blind spot. And no, Airflow is not late.

What I Do Before a Data Science Project to Ensure Success

15 Jun 2020 | original ↗

Haste makes waste. Diving into a data science problem may not be the fastest route to getting it done.

What I Love about Scrum for Data Science

7 Jun 2020 | original ↗

Initially, I didn't like it. But over time, it grew on me. Here's why.

How to Apply Crocker's Law for Feedback and Growth

31 May 2020 | original ↗

Crocker's Law, cognitive dissonance, and how to receive (uncomfortable) feedback better.

A Practical Guide to Maintaining Machine Learning in Production

25 May 2020 | original ↗

Can maintaining machine learning in production be easier? I go through some practical tips.

6 Little-Known Challenges After Deploying Machine Learning

18 May 2020 | original ↗

I thought deploying machine learning was hard. Then I had to maintain multiple systems in prod.

How to Write: Advice from David Perell and Sahil Lavingia

9 May 2020 | original ↗

An expansion of my Twitter thread that went viral.

A Hackathon Where the Dinkiest Idea Won. Why?

3 May 2020 | original ↗

What I learnt about evaluating ideas from first-hand participation in a hackathon.

Serendipity: Accuracy’s Unpopular Best Friend in Recommenders

26 Apr 2020 | original ↗

What I learned about measuring diversity, novelty, surprise, and serendipity from 10+ papers.

How to Give a Kick-Ass Data Science Talk

18 Apr 2020 | original ↗

Why you should give a talk and some tips from five years of speaking and hosting meet-ups.

Commando, Soldier, Police and Your Career Choices

12 Apr 2020 | original ↗

Should I join a start-up? Which offer should I accept? A simple metaphor to guide your decisions.

Stop Taking Regular Notes; Use a Zettelkasten Instead

5 Apr 2020 | original ↗

Using a Zettelkasten helps you make connections between notes, improving learning and memory.

Writing is Learning: How I Learned an Easier Way to Write

28 Mar 2020 | original ↗

Writing begins before actually writing; it's a cycle of reading -> note-taking -> writing.

Simpler Experimentation with Jupyter, Papermill, and MLflow

15 Mar 2020 | original ↗

Automate your experimentation workflow to minimize effort and iterate faster.

My Journey from Psych Grad to Leading Data Science at Lazada

27 Feb 2020 | original ↗

How hard work, many failures, and a bit of luck got me into the field and up the ladder.

DataScience SG Meetup - RecSys, Beyond the Baseline

14 Jan 2020 | original ↗

Comparing baselines (matrix factorization) against novel approaches using graphs & NLP.

Beating the Baseline Recommender with Graph & NLP in Pytorch

13 Jan 2020 | original ↗

Beating the baseline using Graph & NLP techniques on PyTorch, AUC improvement of ~21% (Part 2 of 2).

Building a Strong Baseline Recommender in PyTorch, on a Laptop

6 Jan 2020 | original ↗

Building a baseline recsys based on data scraped off Amazon. Warning - Lots of charts! (Part 1 of 2).

OMSCS CS6200 (Introduction to OS) Review and Tips

15 Dec 2019 | original ↗

OMSCS CS6200 (Introduction to OS) - Moving data from one process to another, multi-threaded.

DataScience SG x ODSC Meetup - Applying ML to Healthcare

9 Oct 2019 | original ↗

In-depth sharing on how to put machine learning systems into production.

OLX Prod Tech 2019 Keynote - Asia's Tech Giants & SuperApps

3 Oct 2019 | original ↗

Keynote on how Asia's tech giants scale and their SuperApp strategy.

OMSCS CS6750 (Human Computer Interaction) Review and Tips

3 Sept 2019 | original ↗

OMSCS CS6750 (Human Computer Interaction) - You are not your user! Or how to build great products.

Goodbye Wordpress, Hello Jekyll!

25 Aug 2019 | original ↗

Moving off WordPress and hosting for free on GitHub. And gaining full customization!

OMSCS CS6440 (Intro to Health Informatics) Review and Tips

4 Aug 2019 | original ↗

OMSCS CS6440 (Intro to Health Informatics) - A primer on key tech and standards in healthtech.

OMSCS CS7646 (Machine Learning for Trading) Review and Tips

11 May 2019 | original ↗

OMSCS CS7646 (Machine Learning for Trading) - Don't sell your house to trade algorithmically.

What does a Data Scientist really do?

30 Apr 2019 | original ↗

No, you don't need a PhD or 10+ years of experience.

DATAx - A Production ML system for SEA's Biggest Hospital Group

6 Mar 2019 | original ↗

How we built an ML system to predict hospitalization costs at admission; sharing at DATAx Conference.

Data Science and Agile (Frameworks for Effectiveness)

2 Feb 2019 | original ↗

Taking the best from agile and modifying it to fit the data science process (Part 2 of 2).

Data Science and Agile (What Works, and What Doesn't)

26 Jan 2019 | original ↗

A deeper look into the strengths and weaknesses of Agile in Data Science projects (Part 1 of 2).

DataScience SG Meetup - Panel On the Different Roles in Data

17 Jan 2019 | original ↗

What's the difference between a data scientist, data engineer, and ML engineer? A panel at Google.

OMSCS CS6601 (Artificial Intelligence) Review and Tips

20 Dec 2018 | original ↗

OMSCS CS6601 (Artificial Intelligence) - First, start with the simplest solution, and then add intelligence.

GovTech Conference - Data Science and Agile—Can or Not?

28 Oct 2018 | original ↗

Yes, Agile can be adopted by data science teams. Moderating a panel at GovTech STACK.

OMSCS CS6460 (Education Technology) Review and Tips

25 Aug 2018 | original ↗

OMSCS CS6460 (Education Technology) - How to scale education widely through technology.

OMSCS CS7642 (Reinforcement Learning) Review and Tips

30 Jul 2018 | original ↗

OMSCS CS7642 (Reinforcement Learning) - Landing rockets (fun!) via deep Q-Learning (and its variants).

Big Data & Analytics Summit - Data Science Challenges @ Lazada

21 Jun 2018 | original ↗

Technical challenges are easy compared to business and people issues. Sharing at the BDA Summit.

Building a Strong Data Science Team Culture

12 May 2018 | original ↗

Culture >> Hierarchy, Process, Bureaucracy.

INSEAD Lunchtime Talks - How Lazada uses Data

25 Apr 2018 | original ↗

And my idiosyncratic journey to VP of Data Science at Lazada (Alibaba). A Lunchtime chat at INSEAD.

OMSCS CS7641 (Machine Learning) Review and Tips

27 Dec 2017 | original ↗

OMSCS CS7641 (Machine Learning) - Revisiting the fundamentals and learning new techniques.

My first 100 days as Data Science Lead

25 Sept 2017 | original ↗

How being a Lead / Manager is different from being an individual contributor.

SMU - What is Data Analytics and How do I get into it?

26 Aug 2017 | original ↗

What is data science, how to pick it up, and how to enter the field? A discussion with SMU undergrads.

OMSCS CS6300 (Software Development Process) Review and Tips

13 Aug 2017 | original ↗

OMSCS CS6300 (Software Development Process) - Java and collaboratively developing an Android app.

Tech in Asia - My Journey in Data Science and Advice for others

26 Jul 2017 | original ↗

Sharing about why data science, data science myths, a typical day, and more with TIA.

SMU Masters in IT - How to get started in Data Science

26 Jun 2017 | original ↗

Tools and skills to pick up and how to practice them. An Invited Talk with Masters in IT candidates.

How to get started in Data Science

25 Jun 2017 | original ↗

Tools and skills to pick up, and how to practice them.

OMSCS CS6476 (Computer Vision) Review and Tips

15 May 2017 | original ↗

OMSCS CS6476 Computer Vision - Performing computer vision tasks with ONLY numpy.

One way to help a data science team innovate successfully

19 Feb 2017 | original ↗

If things are not failing, you're not innovating enough. - Elon Musk

Product Categorization API Part 3: Creating an API

13 Feb 2017 | original ↗

Or how to put machine learning models into production.

Image search is now live!

14 Jan 2017 | original ↗

A web app to find similar products based on image.

Product Classification API Part 2: Data Preparation

11 Dec 2016 | original ↗

Cleaning up text and messing with ascii (urgh!)

Strata x Hadoop 2016 - How Lazada Ranks Products

9 Dec 2016 | original ↗

How Lazada ranks products to improve customer experience and conversion at Strata 2016.

Image classification API is now live!

27 Nov 2016 | original ↗

A simple web app to classify fashion images into Amazon categories.

I'm going back to school

2 Nov 2016 | original ↗

Got accepted into Georgia Tech's Computer Science Masters!

SortMySkills is now live!

23 Oct 2016 | original ↗

A card sorting game to discover your passion by identifying skills you like and dislike.

Product Classification API Part 1: Data Acquisition

11 Oct 2016 | original ↗

Parsing json and formatting product titles and categories.

Thoughts on Functional Programming in Scala Course (Coursera)

31 Jul 2016 | original ↗

Learning Scala from Martin Odersky, father of Scala.

First post!

6 Jul 2016 | original ↗

Time to start writing.

DataKind Singapore’s Latest Project Accelerator

17 Sept 2015 | original ↗

Guest post on how DataKind SG worked with NGOs to frame their problems and suggest solutions

DataScience SG Meetup - How we got top 3% in Kaggle

20 Jun 2015 | original ↗

Sharing about my first data science competition at DataScience SG.
