open to senior AI roles

Dhiraj Chaudhary

AI & Data Engineer at KKR
dhiraj@dev:~$ cat about.md

About

I'm an AI & data engineer at KKR. I lead the Core Data Platforms team — a team I started as one person and have grown to five — supporting the Insurance Data Engineering org. We own the platforms the org's data runs on: 10,000+ jobs spanning ETL frameworks, serverless infrastructure, and the lakehouse, along with the AI workflows that automate the operations around them. And — proudly — I was the first in the org to ship LLM-powered systems to production.

The work I find most interesting is deciding where an LLM actually belongs. Plenty of problems don't need one — but when they do, the craft is in grounding the system until it's trustworthy in production, and designing it so the AI does the heavy lifting while a human keeps the final say.

Outside of work, I build my own AI products — Get New Resume, TimeBrew, InspireInbox, and more.

ai / llm
LangGraph · LiteLLM · OpenRouter · RAG · evals · Claude · GPT-4
data
Spark · Scala · Iceberg · Hadoop · Hive · ETL
cloud
AWS Lambda · Step Functions · Glue · EMR · DynamoDB · Serverless
languages
Python · SQL · Scala · TypeScript · Bash
web
Next.js · React · FastAPI · Flask · Tailwind
$ git log --oneline experience/
2022–present · Sr Software / Data Engineer · KKR (Global Atlantic)
2021 · Software Engineer Intern · Tarifica
dhiraj@dev:~$ ls ~/work

Work

KKR (Global Atlantic) · Sr Software / Data Engineer · builds & leads Core Data Platforms (1 → 5) · 2022–present
automated-incident-resolution · ai

When a production data job fails, this multi-agent system investigates the way an engineer would — logs, the underlying database, whatever the failure calls for — pinpoints the root cause, and shows up with the fix already drafted: a GitHub PR for code bugs, a one-click remediation for everything else. It remembers how each job's past incidents were resolved, so recurring failures get faster to fix every time. ~90% accurate when replayed against 200+ real historical failures.

langgraph · litellm · claude
POC · ~90% on 200+ real failures
tpa-file-resolver · ai

One of the first AI systems I shipped to production, years before LLM tooling went mainstream. Insurance partners hand-build their data files in Excel, so columns, types, and date formats drift — and pipelines break. This catches the drift, proposes a fix as a reviewable diff, and ingests on approval. No code change, no asking the partner to resend. Still running: ~120 files a month across ~50 deals.

llm · python · hitl
2–3 days → hours · ~120 files/mo
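
The drift check described above can be illustrated with a minimal sketch. This is a rule-based stand-in written for illustration, not the production system (which is LLM-powered); the schema in `EXPECTED`, the format list in `DATE_FORMATS`, and every function name are hypothetical. It compares an incoming header and first row against an expected schema and returns proposed fixes as diff-style lines that a human could approve or reject.

```python
from datetime import datetime

# Hypothetical expected schema for one partner file: column name -> type tag.
EXPECTED = {"policy_id": "int", "issue_date": "date", "premium": "float"}

# Date formats to try, in order; the first is treated as canonical.
DATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d-%b-%Y"]


def normalize_name(name):
    """Lowercase and snake_case a header so 'Policy ID' matches 'policy_id'."""
    return "_".join(name.strip().lower().replace("-", " ").split())


def detect_date_format(value):
    """Return the first format in DATE_FORMATS that parses value, else None."""
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return fmt
        except ValueError:
            continue
    return None


def propose_fixes(header, first_row):
    """Compare an incoming header/row to EXPECTED and list proposed fixes."""
    fixes = []
    seen = set()
    for raw, value in zip(header, first_row):
        name = normalize_name(raw)
        seen.add(name)
        if name not in EXPECTED:
            fixes.append(f"? unknown column {raw!r}: needs human review")
            continue
        if raw != name:
            fixes.append(f"~ rename {raw!r} -> {name!r}")
        if EXPECTED[name] == "date":
            fmt = detect_date_format(value)
            if fmt is None:
                fixes.append(f"? unparseable date in {name!r}: {value!r}")
            elif fmt != DATE_FORMATS[0]:
                fixes.append(f"~ reparse {name!r} from {fmt} to {DATE_FORMATS[0]}")
    # Columns the schema expects but the file never supplied.
    for missing in sorted(set(EXPECTED) - seen):
        fixes.append(f"- missing column {missing!r}")
    return fixes


fixes = propose_fixes(
    ["Policy ID", "Issue Date", "premium"],
    ["101", "03/15/2024", "250.0"],
)
for line in fixes:
    print(line)
```

Nothing is applied automatically here: the function only returns proposals, which mirrors the approve-then-ingest flow in the description.
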
data-quality-agent · ai

Writing good data-quality rules takes expertise that doesn't scale, so most tables had none. One agent fixes coverage: it profiles a table, infers its domain, and authors 10–100 tailored rules, test-running each against real data before recommending it. A second agent runs daily and reasons over the results like an analyst would, flagging a column that has drifted off its growth trajectory or correlations that quietly broke. It delivers plain-English verdicts and is already catching issues nobody was looking for.

langgraph · tools · dq
POC · 15 tables · rolling out
deal-codegen · ai

Every new insurance deal needs 15–20 pipeline modules built against the same canonical medallion model. This system generates them, grounded in prior deals' code. Integration time dropped ~60%, and it shipped before AI coding assistants such as Cursor or Copilot were in wide use.

llm · python · codegen
−60% integration · pre-Cursor
text2sql · ai

The org's first Text2SQL system, built in 2024 before the pattern was common. Plain-English questions become SQL grounded in real schemas, column context, and curated sample queries, so the output matches actual table structure instead of plausible guesses. It proved the pattern internally and was later merged into what became the org-wide production tool.

rag · llm · sql
2024 · grew into an org-wide tool
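
As a rough illustration of the grounding idea above, the sketch below assembles a prompt from table schemas, per-column descriptions, and curated sample queries. The function name, prompt wording, and sample table are assumptions made for this example, not the actual system; the model call itself is left out.

```python
def build_prompt(question, tables, examples):
    """Assemble a prompt that shows the model real schemas and sample queries.

    tables:   {table_name: {column_name: description}}  (hypothetical shape)
    examples: [(plain_english_question, sql)]            (curated samples)
    """
    schema_lines = []
    for table, columns in tables.items():
        schema_lines.append(f"TABLE {table}")
        for column, description in columns.items():
            schema_lines.append(f"  {column}: {description}")
    example_lines = [f"Q: {q}\nSQL: {sql}" for q, sql in examples]
    return "\n\n".join([
        "Write one SQL query. Use only the tables and columns listed.",
        "\n".join(schema_lines),
        "\n\n".join(example_lines),
        f"Q: {question}\nSQL:",
    ])


prompt = build_prompt(
    "Total premium by deal?",
    {"policies": {"deal_id": "deal identifier",
                  "premium": "annual premium in USD"}},
    [("How many policies?", "SELECT COUNT(*) FROM policies")],
)
print(prompt)
```

Listing the columns explicitly is what lets the output match actual table structure rather than a plausible guess.
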
llm-migrations · ai

A precursor to the Claude Code era: LLM pipelines I built to power a major technology transformation — 2,000+ Sybase stored procedures to Redshift, 500 SAS jobs to Spark, ~1,200 Tableau dashboards repointed and rewritten. The clever part was validation: an LLM compared rendered dashboard images, old source vs. new, and reported every visual difference — catching bugs and drift no one had time to spot by eye. Migrations budgeted in months landed in days, saving roughly $1M in labor.

llm · redshift · spark
months → days · ~$1M saved
doc-translation · ai

The documents were too sensitive to hand to a vendor, so we built translation in-house. The pipeline parses every text and layout element from PDF/PPTX, translates through open-source models inside the VPC, and reassembles a document that looks identical to the source. A round-trip eval loop catches meaning drift. Productionized, and critical during the company's expansion into Japan.

nlp · on-prem · evals
EN ↔ JA · zero data leaves the VPC
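
A round-trip eval of the kind mentioned above can be sketched as follows: translate a segment out and back, then flag it if the result strays too far from the original. The `forward` and `backward` callables stand in for the real translation models, and character-level similarity from `difflib` is a crude proxy chosen only to keep the example self-contained; the threshold of 0.8 is likewise an assumption.

```python
from difflib import SequenceMatcher


def round_trip_score(text, forward, backward):
    """Translate text out and back, then score similarity to the original (0..1)."""
    restored = backward(forward(text))
    return SequenceMatcher(None, text.lower(), restored.lower()).ratio()


def flag_drift(segments, forward, backward, threshold=0.8):
    """Return the segments whose round-trip score falls below threshold."""
    return [
        s for s in segments
        if round_trip_score(s, forward, backward) < threshold
    ]


# Toy stand-ins for translators: a lossless pair and a lossy return leg.
def to_target(s):
    return s[::-1]


def to_source(s):
    return s[::-1]


def lossy_to_source(s):
    return ""


segments = ["Quarterly report", "Net premium written"]
clean = flag_drift(segments, to_target, to_source)
drifted = flag_drift(segments, to_target, lossy_to_source)
```

Flagged segments would go to a human reviewer; a semantic similarity measure would likely be a better fit than character overlap for real EN ↔ JA text.
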
spark-etl-framework · platform

The Insurance Data Engineering org's common tool for building data pipelines — 50+ plug-in components (get RDBMS, put RDBMS, create CSV, send email...) that let engineers assemble complex pipelines with minimal code. Shared logging, incident creation, and central control mean a fix made once propagates to every job. 10,000+ jobs run on it today.

spark · scala · platform
10k+ jobs · −80% runtime
json-ingestion · platform

An existing JSON ingestion pipeline — 300–400k tiny files per load — was taking 17–18 hours, sometimes a full day. I rebuilt the slow parts: a Spark trick I couldn't find documented anywhere (using an RDD as a distributed work queue to move files: 5 hours → 2 minutes), and RDD-level normalization that fixed list-vs-dict key drift before the DataFrame ever existed. Today 95% of runs finish in 10–15 minutes.

spark · rdd · python
17–18 hrs → 10–15 min · 300k+ files
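
The list-vs-dict normalization mentioned above can be sketched in plain Python. The idea is that a key which sometimes arrives as a single object and sometimes as a list of objects is coerced to a list on every record, so all records share one shape before any schema is inferred. The key name `items` and both function names are hypothetical.

```python
import json

# Hypothetical: keys known to arrive as either one object or a list of objects.
LIST_KEYS = {"items"}


def normalize_record(record):
    """Coerce every key in LIST_KEYS to a list so all records share one shape."""
    out = dict(record)
    for key in LIST_KEYS:
        value = out.get(key)
        if value is None:
            out[key] = []          # missing or null becomes an empty list
        elif isinstance(value, dict):
            out[key] = [value]     # a lone object becomes a one-element list
        # a value that is already a list is left unchanged
    return out


def parse_and_normalize(line):
    """Parse one JSON document and normalize its drifting keys."""
    return normalize_record(json.loads(line))


rows = [
    parse_and_normalize('{"id": 1, "items": {"sku": "a"}}'),
    parse_and_normalize('{"id": 2, "items": [{"sku": "b"}]}'),
    parse_and_normalize('{"id": 3}'),
]
```

In PySpark, a function like this could be mapped over an RDD of raw lines, for example `sc.textFile(path).map(parse_and_normalize)`, before a DataFrame is built; that wiring is an assumption about usage, not a description of the actual pipeline.
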
serverless-platform · platform

One of the lead architects of KKR's move off 24/7 EC2 onto Glue, Lambda, and EMR. Designed the architecture, proved feasibility, built the v0, then enabled teams across the org to migrate — with custom tooling where AWS fell short. 8,000+ man-hours saved; error detection ~40% faster.

aws · serverless · platform
8k+ hrs saved · −40% detection · real-time
data-fabric · platform

Land any source database — Oracle, Snowflake, DB2, Redshift — into Apache Iceberg as one cheap, queryable fabric. Adding a source means describing it in JSON, not writing a connector. 500+ tables unified across 4 teams.

iceberg · spark · self-serve
500+ tables · 4 teams
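
To make "describing it in JSON, not writing a connector" concrete, here is one possible shape for a source descriptor and a small function that expands it into per-table load plans. Every field name, the host, and the namespace are invented for illustration; only the JDBC URL patterns follow the standard driver formats.

```python
import json

# Hypothetical descriptor for one source database.
SOURCE_JSON = """
{
  "name": "policy_admin",
  "type": "oracle",
  "host": "db.example.internal",
  "port": 1521,
  "database": "POLICY",
  "tables": ["POLICIES", "CLAIMS"],
  "target_namespace": "fabric.policy_admin"
}
"""

# JDBC URL templates keyed by source type.
URL_TEMPLATES = {
    "oracle": "jdbc:oracle:thin:@//{host}:{port}/{database}",
    "redshift": "jdbc:redshift://{host}:{port}/{database}",
    "db2": "jdbc:db2://{host}:{port}/{database}",
}


def plan_loads(descriptor):
    """Expand one source descriptor into one load plan per table."""
    # str.format ignores the descriptor keys the template does not use.
    url = URL_TEMPLATES[descriptor["type"]].format(**descriptor)
    return [
        {
            "jdbc_url": url,
            "source_table": table,
            "target_table": f'{descriptor["target_namespace"]}.{table.lower()}',
        }
        for table in descriptor["tables"]
    ]


plans = plan_loads(json.loads(SOURCE_JSON))
for plan in plans:
    print(plan)
```

Each plan could then drive a generic Spark JDBC read followed by an Iceberg write, so adding a source only means adding another descriptor.
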
variance-tracker · platform

A full-stack platform for tracking and signing off data variances across multiple databases — built solo in Flask, shipped to AWS after passing the enterprise Architecture Review Board. 50+ stakeholders rely on it; approval cycles cut ~70%.

flask · aws · full-stack
50+ stakeholders · −70% approvals
dhiraj@dev:~$ ls ~/products

Live products

AI products I've designed, built, and shipped end to end — live on the internet, outside of work — plus the autonomous agent that runs one of them.

live

Get New Resume getnewresume.com

Turns a resume + job description into a tailored resume, ATS match score, and cover letter — with a zero-fabrication constraint. 4-stage LLM pipeline, multi-model routing, typed/validated outputs.

Next.js · Lambda · OpenRouter · DynamoDB
running

Milly getnewresume.com

The autonomous Claude agent that runs Get New Resume's back office on a live EC2 box — reads incoming email, decomposes it into tasks, spawns worker sub-agents, and ships a daily briefing. Role-based (dispatcher · babysitter · briefer), token-budgeted, self-healing via cron.

Claude · Node · EC2 · SES/S3/SQS
live

TimeBrew timebrew.news

Personalized AI news briefings in a 'Morning Brew' voice. A 3-stage Step Functions pipeline (curator → editor → dispatcher) across 14 Lambdas, timezone-aware scheduling, ~$0.02 per briefing.

Step Functions · Perplexity · GPT-4 · SES
live

InspireInbox inspireinbox.com

LLM-powered motivational platform — personalized content tuned to your growth goals, with scheduled delivery, feedback collection, and analytics.

FastAPI · React
live

Notion Crafts notioncrafts.com

A web app hosting interactive widgets (clocks, timers, counters) and icon packs you can embed in Notion. Python/Flask backend on EC2 behind Nginx + Cloudflare.

Flask · EC2 · Nginx
dhiraj@dev:~$ ls ~/labs

Labs & open source

Apps, experiments, a published Python package, and undergrad research — the things I build to learn.

jotted

Keyboard-first 'mission control' for running many AI agents at once — drag-drop task lanes per agent, a Cmd-K command palette, and PM-grade timeline / radar / kanban views. Built it to manage my own parallel Claude runs.

next.js · zustand · agents
wip · cmd-k · vim nav
drink-water

Native iOS hydration tracker (SwiftUI) — animated progress styles, 90-day history, smart reminders, streaks, and full VoiceOver/dark-mode support. Built App-Store-ready, MVVM + OSLog.

swift · swiftui · ios
app-store-ready
student-circle

Serverless student social platform — FastAPI on AWS Lambda with Cognito auth, API Gateway, and S3, plus a Vite/React front end. Multi-stage dev/staging/prod infra via the Serverless Framework.

fastapi · lambda · react
serverless · cognito
proboabfunc

A Python package (published to PyPI) for statistical operations and plots over binomial and Gaussian distributions, built with OOP on pandas / NumPy / matplotlib.

python · pypi · stats
published · pip install
research-papers

Undergrad research: a Flood-It solver algorithm in Python (presented at the Mathematical Association of America) and a bio-robotics study on hybrid artificial/biological structures (SJC research symposium).

algorithms · research · math
2 papers · presented
dhiraj@dev:~$ ./contact.sh

Get in touch

Building something that needs solid data & AI plumbing?

dhirajc963@gmail.com
$ cat contact.json
{
  "email": "dhirajc963@gmail.com",
  "github": "github.com/dhirajc963",
  "linkedin": "linkedin.com/in/dhiraj-kumarcdry",
  "location": "New York"
}