Get New Resume ▸ getnewresume.com
Turns a resume + job description into a tailored resume, ATS match score, and cover letter — with a zero-fabrication constraint. 4-stage LLM pipeline, multi-model routing, typed/validated outputs.
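One way to make a zero-fabrication constraint checkable is to validate every typed output against the source resume before it is returned. The sketch below is illustrative only. The `TailoredResume` shape, the `fabricated_skills` and `validate` helpers, and the token-overlap rule are hypothetical, not the product's actual validator.

```python
import re
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TailoredResume:
    """Hypothetical typed output of the tailoring stage."""
    summary: str
    skills: list[str] = field(default_factory=list)


def _tokens(text: str) -> set[str]:
    # Lowercase word-like tokens; keeps '+' and '#' so "C++" and "C#" survive.
    return set(re.findall(r"[a-z0-9+#]+", text.lower()))


def fabricated_skills(source_resume: str, output: TailoredResume) -> list[str]:
    """Return skills in the output that have no support in the source resume."""
    source = _tokens(source_resume)
    return [s for s in output.skills if not _tokens(s) <= source]


def validate(source_resume: str, output: TailoredResume) -> TailoredResume:
    # Reject the whole output rather than silently dropping unsupported claims.
    bad = fabricated_skills(source_resume, output)
    if bad:
        raise ValueError(f"unsupported skills: {bad}")
    return output
```

Rejecting the output, rather than quietly filtering it, lets a pipeline stage retry or route to another model, which fits the multi-model routing described above.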
I'm an AI & data engineer at KKR. I lead the Core Data Platforms team, which I started as a team of one and have grown to five, supporting the Insurance Data Engineering org. We own the platforms the org's data runs on, from ETL frameworks and serverless infrastructure to the lakehouse, which together run 10,000+ jobs. We also own the AI workflows that automate the operations around them. Proudly, I was also the first in the org to ship LLM-powered systems to production.
The work I find most interesting is deciding where an LLM actually belongs. Plenty of problems don't need one — but when they do, the craft is in grounding the system until it's trustworthy in production, and designing it so the AI does the heavy lifting while a human keeps the final say.
Outside of work, I build my own AI products — Get New Resume, TimeBrew, InspireInbox, and more.
When a production data job fails, this multi-agent system investigates the way an engineer would — logs, the underlying database, whatever the failure calls for — pinpoints the root cause, and shows up with the fix already drafted: a GitHub PR for code bugs, a one-click remediation for everything else. It remembers how each job's past incidents were resolved, so recurring failures get faster to fix every time. ~90% accurate when replayed against 200+ real historical failures.
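The "remembers past incidents" part can be pictured as a lookup keyed by job and a normalized error signature, so that an error differing only in timestamps, counts, or IDs still matches earlier resolutions. This is a minimal sketch. `error_signature`, `IncidentMemory`, and the normalization regexes are hypothetical, not the production system's design.

```python
import re
from collections import defaultdict


def error_signature(message: str) -> str:
    """Collapse volatile details so recurring failures share one signature."""
    sig = message.lower()
    sig = re.sub(r"\b[0-9a-f]{8}-[0-9a-f-]{27}\b", "<uuid>", sig)  # UUIDs
    sig = re.sub(r"\d+", "<n>", sig)  # numbers: row counts, timeouts, ports
    return re.sub(r"\s+", " ", sig).strip()


class IncidentMemory:
    """Hypothetical per-job store of how past incidents were resolved."""

    def __init__(self) -> None:
        self._store: dict[tuple[str, str], list[str]] = defaultdict(list)

    def record(self, job: str, message: str, resolution: str) -> None:
        self._store[(job, error_signature(message))].append(resolution)

    def recall(self, job: str, message: str) -> list[str]:
        # Most recent resolution first; empty list if this failure is new.
        return list(reversed(self._store.get((job, error_signature(message)), [])))
```

Recalled resolutions would then be fed to the investigating agent as context, which is one plausible way recurring failures get faster to fix.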
One of the first AI systems I shipped to production, years before LLM tooling went mainstream. Insurance partners hand-build their data files in Excel, so columns, types, and date formats drift — and pipelines break. This catches the drift, proposes a fix as a reviewable diff, and ingests on approval. No code change, no asking the partner to resend. Still running: ~120 files a month across ~50 deals.
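The drift check can be sketched as a comparison between an expected schema and the columns actually found in a partner file, with likely renames paired up and the proposed fix rendered as a unified diff for review. The function names, the `difflib`-based rename matching, and the 0.8 cutoff are assumptions for illustration.

```python
import difflib


def propose_schema_fix(expected: dict[str, str], observed: dict[str, str]) -> dict:
    """Compare column->type maps and propose renames for drifted headers."""
    missing = [c for c in expected if c not in observed]
    extra = [c for c in observed if c not in expected]
    renames: dict[str, str] = {}  # incoming name -> expected name
    for col in missing:
        # Pair each missing column with its closest not-yet-used incoming header.
        match = difflib.get_close_matches(col, extra, n=1, cutoff=0.8)
        if match and match[0] not in renames:
            renames[match[0]] = col
    type_changes = {
        c: (expected[c], observed[c])
        for c in expected
        if c in observed and expected[c] != observed[c]
    }
    unmatched = [c for c in missing if c not in renames.values()]
    return {"renames": renames, "type_changes": type_changes, "unmatched": unmatched}


def render_diff(observed: dict[str, str], renames: dict[str, str]) -> str:
    """Render the proposed header mapping as a reviewable unified diff."""
    before = [f"{c}: {t}" for c, t in observed.items()]
    after = [f"{renames.get(c, c)}: {t}" for c, t in observed.items()]
    return "\n".join(difflib.unified_diff(before, after, "incoming", "proposed", lineterm=""))
```

Anything left in `unmatched` or `type_changes` is exactly what a reviewer would need to look at before approving ingestion.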
Writing good data-quality rules takes expertise that doesn't scale, so most tables had none. One agent fixes coverage: it profiles a table, infers its domain, and authors 10–100 tailored rules, each test-run against real data before it is recommended. A second agent runs daily and reasons over the results the way an analyst would, flagging a column that has drifted off its growth trajectory or a correlation that has quietly broken. It returns plain-English verdicts and is already catching issues nobody was looking for.
Every new insurance deal needs 15–20 pipeline modules built against the same canonical medallion model. This system generates them, grounded in prior deals' code. Integration time dropped ~60% — and it shipped before Cursor or Copilot existed.
The org's first Text2SQL system, built in 2024 before the pattern was common. Plain-English questions become SQL grounded in real schemas, column context, and curated sample queries, so the output matches actual table structure instead of plausible guesses. It proved the pattern internally and was later merged into what became the org-wide production tool.
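Grounding of this kind mostly comes down to what goes into the prompt. Below is a minimal sketch of assembling schema, column descriptions, and curated example queries into one prompt. The `TableContext` shape, `build_prompt`, and the prompt wording are hypothetical, and the model call itself is left out.

```python
from dataclasses import dataclass, field


@dataclass
class TableContext:
    """Hypothetical description of one table made available to the model."""
    name: str
    columns: dict[str, str]  # column name -> plain-English meaning
    examples: list[tuple[str, str]] = field(default_factory=list)  # (question, SQL)


def build_prompt(question: str, tables: list[TableContext]) -> str:
    """Assemble a grounded Text2SQL prompt from real schema context."""
    parts = ["Answer with a single SQL query. Use only the tables and columns listed."]
    for t in tables:
        parts.append(f"\nTable {t.name}:")
        parts += [f"  - {col}: {desc}" for col, desc in t.columns.items()]
        for q, sql in t.examples:
            parts.append(f"  Example: {q}\n  SQL: {sql}")
    parts.append(f"\nQuestion: {question}\nSQL:")
    return "\n".join(parts)
```

Restricting the model to listed tables and columns is what pushes the output toward actual table structure instead of plausible guesses.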
A precursor to the Claude Code era: LLM pipelines I built to power a major technology transformation — 2,000+ Sybase stored procedures to Redshift, 500 SAS jobs to Spark, ~1,200 Tableau dashboards repointed and rewritten. The clever part was validation: an LLM compared rendered dashboard images, old source vs. new, and reported every visual difference — catching bugs and drift no one had time to spot by eye. Migrations budgeted in months landed in days, saving roughly $1M in labor.
The documents were too sensitive to hand to a vendor, so we built translation in-house. The pipeline parses every text and layout element from PDF/PPTX, translates through open-source models inside the VPC, and reassembles a document that looks identical to the source. A round-trip eval loop catches meaning drift. Productionized, and critical during the company's expansion into Japan.
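A round-trip check translates the output back into the source language and scores how far it drifted from the original. In the sketch below the translator is any callable, and `difflib.SequenceMatcher` is a crude lexical stand-in for the semantic comparison a real eval would need. `round_trip_score`, `flag_drift`, and the 0.85 threshold are assumptions.

```python
from difflib import SequenceMatcher
from typing import Callable

Translate = Callable[[str, str, str], str]  # (text, src_lang, dst_lang) -> text


def round_trip_score(text: str, translate: Translate, src: str, dst: str) -> float:
    """Translate src->dst->src and return similarity to the original (0..1)."""
    forward = translate(text, src, dst)
    back = translate(forward, dst, src)
    return SequenceMatcher(None, text.lower(), back.lower()).ratio()


def flag_drift(segments: list[str], translate: Translate, src: str, dst: str,
               threshold: float = 0.85) -> list[str]:
    # Segments scoring below the threshold are routed to a human reviewer.
    return [s for s in segments if round_trip_score(s, translate, src, dst) < threshold]
```

Because the translator is injected, the same loop works with whichever in-VPC model is serving translations.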
The Insurance Data Engineering org's common tool for building data pipelines — 50+ plug-in components (get RDBMS, put RDBMS, create CSV, send email...) that let engineers assemble complex pipelines with minimal code. Shared logging, incident creation, and central control mean a fix made once propagates to every job. 10,000+ jobs run on it today.
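The plug-in pattern can be sketched as a registry of named components that all run through one shared runner, so logging and failure handling live in a single place. The `component` decorator, `run_pipeline`, the two sample components, and the `on_failure` hook (standing in for incident creation) are hypothetical names for illustration, not the framework's API.

```python
import logging
from typing import Any, Callable

log = logging.getLogger("pipeline")
REGISTRY: dict[str, Callable[..., Any]] = {}


def component(name: str):
    """Register a function as a named, reusable pipeline component."""
    def wrap(fn: Callable[..., Any]) -> Callable[..., Any]:
        REGISTRY[name] = fn
        return fn
    return wrap


@component("get_rows")
def get_rows(data: Any, rows: list[dict]) -> list[dict]:
    # Stand-in for a real source component such as "get RDBMS".
    return rows


@component("create_csv")
def create_csv(data: list[dict]) -> str:
    header = ",".join(data[0])
    body = [",".join(str(v) for v in row.values()) for row in data]
    return "\n".join([header, *body])


def run_pipeline(steps: list[tuple[str, dict]],
                 on_failure: Callable[[str, Exception], None]) -> Any:
    """Run steps in order, passing each output to the next step.
    Logging and failure handling are shared by every component."""
    data = None
    for name, params in steps:
        log.info("running %s", name)
        try:
            data = REGISTRY[name](data, **params)
        except Exception as exc:
            on_failure(name, exc)  # e.g. open an incident
            raise
    return data
```

Since every job goes through the same runner, a fix to logging or incident handling lands in one place and reaches every pipeline.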
An existing JSON ingestion pipeline, loading 300–400k tiny files per run, was taking 17–18 hours and sometimes a full day. I rebuilt the slow parts with two changes: using an RDD as a distributed work queue to move files (a Spark trick I couldn't find documented anywhere; 5 hours → 2 minutes), and RDD-level normalization that fixed list-vs-dict key drift before the DataFrame was ever created. Today 95% of runs finish in 10–15 minutes.
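The list-vs-dict fix is easy to show in isolation: when the same key arrives sometimes as a single object and sometimes as an array of objects, coerce it to a list before any schema is inferred. In Spark, a plain function like this would run inside `rdd.map(...)` ahead of DataFrame creation. The `normalize` helper and the `LIST_KEYS` set are hypothetical, and the work-queue trick is not shown.

```python
from typing import Any

# Hypothetical: keys whose values should always be lists of objects.
LIST_KEYS = {"drivers", "coverages"}


def normalize(record: Any, list_keys: set[str] = LIST_KEYS) -> Any:
    """Recursively wrap lone dicts in a list for keys that must be arrays,
    so every record has the same shape before schema inference."""
    if isinstance(record, list):
        return [normalize(item, list_keys) for item in record]
    if not isinstance(record, dict):
        return record
    out = {}
    for key, value in record.items():
        value = normalize(value, list_keys)
        if key in list_keys and isinstance(value, dict):
            value = [value]
        out[key] = value
    return out
```

Fixing shape at the record level avoids Spark inferring conflicting types for the same field across files.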
One of the lead architects of KKR's move off 24/7 EC2 onto Glue, Lambda, and EMR. Designed the architecture, proved feasibility, built the v0, then enabled teams across the org to migrate — with custom tooling where AWS fell short. 8,000+ man-hours saved; error detection ~40% faster.
Land any source database — Oracle, Snowflake, DB2, Redshift — into Apache Iceberg as one cheap, queryable fabric. Adding a source means describing it in JSON, not writing a connector. 500+ tables unified across 4 teams.
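Describing a source in JSON instead of writing a connector might look like the sketch below: a small descriptor is parsed, validated, and expanded into source-to-Iceberg table mappings that one generic loader could consume. Every field name here (`source_type`, `namespace`, `tables`, `mode`), the `lakehouse` catalog name, and the `plan_ingestion` helper are hypothetical, not the framework's actual schema.

```python
import json
from dataclasses import dataclass

SUPPORTED = {"oracle", "snowflake", "db2", "redshift"}


@dataclass(frozen=True)
class TableMapping:
    source_table: str
    iceberg_table: str
    mode: str  # "full" or "incremental"


def plan_ingestion(descriptor_json: str, catalog: str = "lakehouse") -> list[TableMapping]:
    """Validate a source descriptor and expand it into per-table load plans."""
    d = json.loads(descriptor_json)
    if d["source_type"] not in SUPPORTED:
        raise ValueError(f"unsupported source: {d['source_type']}")
    plans = []
    for t in d["tables"]:
        # Drop the source schema prefix and lowercase for the Iceberg name.
        target = f"{catalog}.{d['namespace']}.{t['name'].split('.')[-1].lower()}"
        plans.append(TableMapping(t["name"], target, t.get("mode", "full")))
    return plans
```

Adding a source then means adding a descriptor, while the loader that reads the plans stays unchanged.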
A full-stack platform for tracking and signing off data variances across multiple databases — built solo in Flask, shipped to AWS after passing the enterprise Architecture Review Board. 50+ stakeholders rely on it; approval cycles cut ~70%.
AI products I've designed, built, and shipped end to end — live on the internet, outside of work — plus the autonomous agent that runs one of them.
The autonomous Claude agent that runs Get New Resume's back office on a live EC2 box — reads incoming email, decomposes it into tasks, spawns worker sub-agents, and ships a daily briefing. Role-based (dispatcher · babysitter · briefer), token-budgeted, self-healing via cron.
Personalized AI news briefings in a 'Morning Brew' voice. A 3-stage Step Functions pipeline (curator → editor → dispatcher) across 14 Lambdas, timezone-aware scheduling, ~$0.02 per briefing.
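Timezone-aware scheduling comes down to computing each subscriber's next local delivery time and converting it to UTC for the scheduler. Below is a minimal sketch using the standard-library `zoneinfo` module. `next_delivery_utc` and the 7:00 local send time are assumptions, not details of the actual pipeline.

```python
from datetime import datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo


def next_delivery_utc(now_utc: datetime, tz_name: str, send_at: time = time(7, 0)) -> datetime:
    """Return the next moment (in UTC) when it is `send_at` local time in tz_name."""
    tz = ZoneInfo(tz_name)
    local_now = now_utc.astimezone(tz)
    candidate = datetime.combine(local_now.date(), send_at, tzinfo=tz)
    if candidate <= local_now:
        # Today's slot has passed locally; schedule tomorrow's.
        candidate = datetime.combine(local_now.date() + timedelta(days=1), send_at, tzinfo=tz)
    return candidate.astimezone(timezone.utc)
```

Building the local time first and converting afterward lets `zoneinfo` apply the correct UTC offset for that date, including across DST changes.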
LLM-powered motivational platform — personalized content tuned to your growth goals, with scheduled delivery, feedback collection, and analytics.
A web app hosting interactive widgets (clocks, timers, counters) and icon packs you can embed in Notion. Python/Flask backend on EC2 behind Nginx + Cloudflare.
Apps, experiments, a published Python package, and undergrad research — the things I build to learn.
Keyboard-first 'mission control' for running many AI agents at once — drag-drop task lanes per agent, a Cmd-K command palette, and PM-grade timeline / radar / kanban views. Built it to manage my own parallel Claude runs.
Native iOS hydration tracker (SwiftUI) with animated progress styles, 90-day history, smart reminders, streaks, and full VoiceOver and dark-mode support. App Store–ready, built with MVVM and OSLog.
Serverless student social platform — FastAPI on AWS Lambda with Cognito auth, API Gateway, and S3, plus a Vite/React front end. Multi-stage dev/staging/prod infra via the Serverless Framework.
A Python package (published to PyPI) for statistical operations and plots over binomial & gaussian distributions, built with OOP on pandas / NumPy / matplotlib.
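For a sense of what an OOP distributions package covers, here is a minimal binomial class in the same spirit. It is a hypothetical sketch using only the standard library (`math.comb`), not the published package's API or class names.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Binomial:
    """Binomial distribution: number of successes in n trials with probability p."""
    p: float
    n: int

    @property
    def mean(self) -> float:
        return self.n * self.p

    @property
    def stdev(self) -> float:
        return math.sqrt(self.n * self.p * (1 - self.p))

    def pmf(self, k: int) -> float:
        # P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)
        return math.comb(self.n, k) * self.p ** k * (1 - self.p) ** (self.n - k)

    def __add__(self, other: "Binomial") -> "Binomial":
        # The sum of independent binomials with equal p is binomial with n1 + n2 trials.
        if self.p != other.p:
            raise ValueError("p values must match")
        return Binomial(self.p, self.n + other.n)
```

Overloading `+` only where the math allows it (equal `p`) keeps the class honest about when two distributions can be combined.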
Undergrad research: a Flood-It solver algorithm in Python (presented at the Mathematical Association of America) and a bio-robotics study on hybrid artificial/biological structures (SJC research symposium).
Building something that needs solid data & AI plumbing?
dhirajc963@gmail.com