open to senior AI roles

Dhiraj Chaudhary

AI & Data Engineer at KKR



dhiraj@dev:~$ cat about.md

About

I'm an AI & data engineer at KKR. I lead the Core Data Platforms team — a team I started as one person and have grown to five — supporting the Insurance Data Engineering org. We own the platforms the org's data runs on: 10,000+ jobs spanning ETL frameworks, serverless infrastructure, and the lakehouse, along with the AI workflows that automate the operations around them. And — proudly — I was the first in the org to ship LLM-powered systems to production.

The work I find most interesting is deciding where an LLM actually belongs. Plenty of problems don't need one — but when they do, the craft is in grounding the system until it's trustworthy in production, and designing it so the AI does the heavy lifting while a human keeps the final say.

Outside of work, I build my own AI products — Get New Resume, TimeBrew, InspireInbox, and more.

ai / llm

LangGraph · LiteLLM · OpenRouter · RAG · evals · Claude · GPT-4

data

Spark · Scala · Iceberg · Hadoop · Hive · ETL

cloud

AWS Lambda · Step Functions · Glue · EMR · DynamoDB · Serverless

languages

Python · SQL · Scala · TypeScript · Bash

web

Next.js · React · FastAPI · Flask · Tailwind

$ git log --oneline experience/

2022 — present

Sr Software / Data Engineer

KKR (Global Atlantic)

2021

Software Engineer Intern

Tarifica

dhiraj@dev:~$ ls ~/work

Work

KKR (Global Atlantic) · Sr Software / Data Engineer · builds & leads Core Data Platforms (1 → 5) · 2022–present

automated-incident-resolution · ai

When a production data job fails, this multi-agent system investigates the way an engineer would — logs, the underlying database, whatever the failure calls for — pinpoints the root cause, and shows up with the fix already drafted: a GitHub PR for code bugs, a one-click remediation for everything else. It remembers how each job's past incidents were resolved, so recurring failures get faster to fix every time. ~90% accurate when replayed against 200+ real historical failures.

langgraph · litellm · claude

POC · ~90% on 200+ real failures

tpa-file-resolver · ai

One of the first AI systems I shipped to production, years before LLM tooling went mainstream. Insurance partners hand-build their data files in Excel, so columns, types, and date formats drift — and pipelines break. This catches the drift, proposes a fix as a reviewable diff, and ingests on approval. No code change, no asking the partner to resend. Still running: ~120 files a month across ~50 deals.

llm · python · hitl

2–3 days → hours · ~120 files/mo
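
As a rough illustration of the drift-catching step (not the production implementation), header drift can be detected by fuzzy-matching incoming column names against the expected schema and rendering the proposed renames as a diff for a reviewer. The function names, the example columns, and the use of `difflib` similarity in place of an LLM are all assumptions made for this sketch.

```python
import difflib


def propose_column_fixes(expected, incoming, cutoff=0.6):
    """Map drifted incoming column names onto expected ones.

    Returns (renames, unmatched): renames maps incoming -> expected,
    unmatched lists incoming columns with no close expected match.
    """
    def norm(name):
        # Ignore case, surrounding whitespace, and space/hyphen vs underscore.
        return name.strip().lower().replace(" ", "_").replace("-", "_")

    expected_by_norm = {norm(col): col for col in expected}
    renames, unmatched = {}, []
    for col in incoming:
        if col in expected:
            continue  # exact match, nothing to fix
        match = difflib.get_close_matches(
            norm(col), list(expected_by_norm), n=1, cutoff=cutoff
        )
        if match:
            renames[col] = expected_by_norm[match[0]]
        else:
            unmatched.append(col)
    return renames, unmatched


def render_diff(renames):
    """Render proposed renames as a small reviewable diff."""
    lines = []
    for old, new in renames.items():
        lines.append(f"- {old}")
        lines.append(f"+ {new}")
    return "\n".join(lines)
```

Nothing is applied automatically here: the diff is only a proposal, which mirrors the approve-then-ingest flow described above.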

data-quality-agent · ai

Writing good data-quality rules takes expertise that doesn't scale, so most tables had none. One agent fixes coverage: it profiles a table, infers its domain, and authors 10–100 tailored rules, test-run against real data before recommending. A second agent runs daily and reasons over the results like an analyst would — flagging a column off its growth trajectory or correlations that quietly broke. Plain-English verdicts, already catching issues nobody was looking for.

langgraph · tools · dq

POC · 15 tables · rolling out

deal-codegen · ai

Every new insurance deal needs 15–20 pipeline modules built against the same canonical medallion model. This system generates them, grounded in prior deals' code. Integration time dropped ~60%, and it shipped before tools like Cursor or Copilot were in everyday use.

llm · python · codegen

−60% integration · pre-Cursor

text2sql · ai

The org's first Text2SQL system, built in 2024 before the pattern was common. Plain-English questions become SQL grounded in real schemas, column context, and curated sample queries, so the output matches actual table structure instead of plausible guesses. It proved the pattern internally and was later merged into what became the org-wide production tool.

rag · llm · sql

2024 · grew into an org-wide tool
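
To make the grounding idea concrete, here is a minimal sketch of how such a prompt might be assembled from schemas, column notes, and curated sample queries. The function name, the prompt wording, and the table and column names in the usage note are hypothetical, and retrieval of the relevant schemas is assumed to have happened already.

```python
def build_sql_prompt(question, schemas, column_notes=None, examples=()):
    """Assemble a Text2SQL prompt grounded in real table structure.

    schemas:      {table_name: [(column_name, column_type), ...]}
    column_notes: {"table.column": "plain-English meaning"}
    examples:     [(question, sql), ...] curated sample queries
    """
    column_notes = column_notes or {}
    lines = [
        "Write one SQL query. Use only the tables and columns listed.",
        "",
        "Tables:",
    ]
    for table, columns in schemas.items():
        cols = ", ".join(f"{name} {ctype}" for name, ctype in columns)
        lines.append(f"- {table}({cols})")
    if column_notes:
        lines += ["", "Column notes:"]
        lines += [f"- {col}: {note}" for col, note in column_notes.items()]
    for sample_question, sample_sql in examples:
        lines += ["", f"Q: {sample_question}", f"SQL: {sample_sql}"]
    # The model is asked to complete the final "SQL:" line.
    lines += ["", f"Q: {question}", "SQL:"]
    return "\n".join(lines)
```

For example, `build_sql_prompt("Total premium by deal?", {"premiums": [("deal_id", "INT"), ("amount", "DECIMAL")]})` yields a prompt that lists `premiums(deal_id INT, amount DECIMAL)` before the question, so the model has the actual columns in front of it.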

llm-migrations · ai

A precursor to the Claude Code era: LLM pipelines I built to power a major technology transformation — 2,000+ Sybase stored procedures to Redshift, 500 SAS jobs to Spark, ~1,200 Tableau dashboards repointed and rewritten. The clever part was validation: an LLM compared rendered dashboard images, old source vs. new, and reported every visual difference — catching bugs and drift no one had time to spot by eye. Migrations budgeted in months landed in days, saving roughly $1M in labor.

llm · redshift · spark

months → days · ~$1M saved

doc-translation · ai

The documents were too sensitive to hand to a vendor, so we built translation in-house. The pipeline parses every text and layout element from PDF/PPTX, translates through open-source models inside the VPC, and reassembles a document that looks identical to the source. A round-trip eval loop catches meaning drift. Productionized, and critical during the company's expansion into Japan.

nlp · on-prem · evals

EN ↔ JA · zero data leaves the VPC
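
A round-trip check of this kind can be sketched as follows: translate a segment forward, translate it back, and flag segments whose back-translation diverges from the source. The translators are passed in as plain callables (hypothetical stand-ins for the in-VPC models), and word-level `difflib` similarity is used here only as a crude proxy for meaning; the threshold of 0.8 is an arbitrary assumption.

```python
import difflib


def round_trip_score(text, forward, backward):
    """Similarity (0.0 to 1.0) between `text` and its back-translation."""
    back = backward(forward(text))
    # Compare word sequences, ignoring case.
    matcher = difflib.SequenceMatcher(
        None, text.lower().split(), back.lower().split()
    )
    return matcher.ratio()


def flag_drift(segments, forward, backward, threshold=0.8):
    """Return the segments whose round trip falls below `threshold`."""
    return [
        segment
        for segment in segments
        if round_trip_score(segment, forward, backward) < threshold
    ]
```

A production eval would likely need a semantic comparison rather than surface overlap, since a faithful translation can legitimately come back with different wording.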

spark-etl-framework · platform

The Insurance Data Engineering org's common tool for building data pipelines — 50+ plug-in components (get RDBMS, put RDBMS, create CSV, send email...) that let engineers assemble complex pipelines with minimal code. Shared logging, incident creation, and central control mean a fix made once propagates to every job. 10,000+ jobs run on it today.

spark · scala · platform

10k+ jobs · −80% runtime
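
The plug-in idea can be illustrated with a small registry: components register under a name, and a pipeline is just an ordered list of component names plus config. This is a Python sketch of the pattern only; the real framework is Spark/Scala, and the registry, the `literal_rows` stand-in source, and the step format are all hypothetical.

```python
import logging

log = logging.getLogger("etl")
COMPONENTS = {}  # hypothetical registry: component name -> callable


def component(name):
    """Register a function as a pipeline component under `name`."""
    def register(fn):
        COMPONENTS[name] = fn
        return fn
    return register


@component("literal_rows")
def literal_rows(_data, config):
    # Stand-in for a real source component such as "get RDBMS".
    return [list(row) for row in config["rows"]]


@component("create_csv")
def create_csv(data, config):
    # Join each row's values into one delimited line.
    sep = config.get("sep", ",")
    return [sep.join(str(value) for value in row) for row in data]


def run_pipeline(steps):
    """Run steps in order, passing each component's output to the next."""
    data = None
    for step in steps:
        name = step["component"]
        log.info("running component %s", name)  # logging lives in one place
        data = COMPONENTS[name](data, step.get("config", {}))
    return data
```

Because logging and dispatch sit in `run_pipeline` rather than in each job, a change there reaches every pipeline that uses the runner, which is the property the paragraph above describes.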

json-ingestion · platform

An existing JSON ingestion pipeline — 300–400k tiny files per load — was taking 17–18 hours, sometimes a full day. I rebuilt the slow parts: a Spark trick I couldn't find documented anywhere (using an RDD as a distributed work queue to move files: 5 hours → 2 minutes), and RDD-level normalization that fixed list-vs-dict key drift before the DataFrame ever existed. Today 95% of runs finish in 10–15 minutes.

spark · rdd · python

17–18 hrs → 10–15 min · 300k+ files
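
The list-vs-dict normalization can be sketched as a plain per-record function, assuming the drift means a key sometimes holds a single object and sometimes a list of objects. The function name and the example key `items` are hypothetical; the file-moving work queue is not shown.

```python
def normalize_keys(record, list_keys):
    """Return a copy of `record` where every key in `list_keys` holds a list."""
    fixed = dict(record)
    for key in list_keys:
        value = fixed.get(key)
        if value is None:
            fixed[key] = []        # missing or null -> empty list
        elif isinstance(value, dict):
            fixed[key] = [value]   # single object -> one-element list
        elif not isinstance(value, list):
            raise TypeError(
                f"unexpected type for {key!r}: {type(value).__name__}"
            )
    return fixed


# With PySpark, this could run per record before any schema is inferred,
# for example: rdd.map(lambda r: normalize_keys(r, ["items"]))
# followed by spark.createDataFrame(...) on the normalized records.
```

Normalizing before the DataFrame exists means schema inference sees one consistent shape for each key instead of conflicting struct and array types.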

serverless-platform · platform

One of the lead architects of KKR's move off 24/7 EC2 onto Glue, Lambda, and EMR. Designed the architecture, proved feasibility, built the v0, then enabled teams across the org to migrate — with custom tooling where AWS fell short. 8,000+ man-hours saved; error detection ~40% faster.

aws · serverless · platform

8k+ hrs saved · −40% detection · real-time

data-fabric · platform

Land any source database — Oracle, Snowflake, DB2, Redshift — into Apache Iceberg as one cheap, queryable fabric. Adding a source means describing it in JSON, not writing a connector. 500+ tables unified across 4 teams.

iceberg · spark · self-serve

500+ tables · 4 teams
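
To show what "describing a source in JSON" could look like, here is a minimal sketch that validates a source descriptor and maps its tables to Iceberg identifiers. The field names (`name`, `kind`, `tables`), the catalog name `lake`, and the naming rule are assumptions for illustration, not the platform's actual format.

```python
import json
from dataclasses import dataclass

SUPPORTED_KINDS = {"oracle", "snowflake", "db2", "redshift"}


@dataclass(frozen=True)
class SourceSpec:
    name: str
    kind: str
    tables: tuple


def load_source_spec(text):
    """Parse and validate a JSON source descriptor."""
    raw = json.loads(text)
    missing = {"name", "kind", "tables"} - set(raw)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    kind = raw["kind"].lower()
    if kind not in SUPPORTED_KINDS:
        raise ValueError(f"unsupported source kind: {raw['kind']}")
    return SourceSpec(raw["name"], kind, tuple(raw["tables"]))


def iceberg_targets(spec, catalog="lake"):
    """Map each source table to a hypothetical Iceberg identifier."""
    return {
        table: f"{catalog}.{spec.name}.{table.split('.')[-1].lower()}"
        for table in spec.tables
    }
```

With this shape, onboarding a new database is a config change: a descriptor such as `{"name": "claims", "kind": "Oracle", "tables": ["APP.POLICY"]}` would map `APP.POLICY` to `lake.claims.policy` with no new connector code.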

variance-tracker · platform

A full-stack platform for tracking and signing off data variances across multiple databases — built solo in Flask, shipped to AWS after passing the enterprise Architecture Review Board. 50+ stakeholders rely on it; approval cycles cut ~70%.

flask · aws · full-stack

50+ stakeholders · −70% approvals

dhiraj@dev:~$ ls ~/products

Live products

AI products I've designed, built, and shipped end to end — live on the internet, outside of work — plus the autonomous agent that runs one of them.

● live

Get New Resume ▸ getnewresume.com

Turns a resume + job description into a tailored resume, ATS match score, and cover letter — with a zero-fabrication constraint. 4-stage LLM pipeline, multi-model routing, typed/validated outputs.

Next.js · Lambda · OpenRouter · DynamoDB

● running

Milly ▸ getnewresume.com

The autonomous Claude agent that runs Get New Resume's back office on a live EC2 box — reads incoming email, decomposes it into tasks, spawns worker sub-agents, and ships a daily briefing. Role-based (dispatcher · babysitter · briefer), token-budgeted, self-healing via cron.

Claude · Node · EC2 · SES/S3/SQS

● live

TimeBrew ▸ timebrew.news

Personalized AI news briefings in a 'Morning Brew' voice. A 3-stage Step Functions pipeline (curator → editor → dispatcher) across 14 Lambdas, timezone-aware scheduling, ~$0.02 per briefing.

Step Functions · Perplexity · GPT-4 · SES

● live

InspireInbox ▸ inspireinbox.com

LLM-powered motivational platform — personalized content tuned to your growth goals, with scheduled delivery, feedback collection, and analytics.

FastAPI · React

● live

Notion Crafts ▸ notioncrafts.com

A web app hosting interactive widgets (clocks, timers, counters) and icon packs you can embed in Notion. Python/Flask backend on EC2 behind Nginx + Cloudflare.

Flask · EC2 · Nginx

dhiraj@dev:~$ ls ~/labs

Labs & open source

Apps, experiments, a published Python package, and undergrad research — the things I build to learn.

jotted

Keyboard-first 'mission control' for running many AI agents at once — drag-drop task lanes per agent, a Cmd-K command palette, and PM-grade timeline / radar / kanban views. Built it to manage my own parallel Claude runs.

next.js · zustand · agents

wip · cmd-k · vim nav

drink-water

Native iOS hydration tracker (SwiftUI) — animated progress styles, 90-day history, smart reminders, streaks, and full VoiceOver/dark-mode support. Built App-Store-ready, MVVM + OSLog.

swift · swiftui · ios

app-store-ready

student-circle

Serverless student social platform — FastAPI on AWS Lambda with Cognito auth, API Gateway, and S3, plus a Vite/React front end. Multi-stage dev/staging/prod infra via the Serverless Framework.

fastapi · lambda · react

serverless · cognito

proboabfunc

A Python package (published to PyPI) for statistical operations and plots over binomial and Gaussian distributions, built with OOP on pandas / NumPy / matplotlib.

python · pypi · stats

published · pip install

research-papers

Undergrad research: a Flood-It solver algorithm in Python (presented at the Mathematical Association of America) and a bio-robotics study on hybrid artificial/biological structures (SJC research symposium).

algorithms · research · math

2 papers · presented

dhiraj@dev:~$ ./contact.sh

Get in touch

Building something that needs solid data & AI plumbing?

↵ dhirajc963@gmail.com

$ cat contact.json

{
  "email": "dhirajc963@gmail.com",
  "github": "github.com/dhirajc963",
  "linkedin": "linkedin.com/in/dhiraj-kumarcdry",
  "location": "New York"
}