available · open to interesting work | loc bengaluru, ist | srijanshukla18@gmail.com
builder who ships

Resume.

senior SRE / AI builder / 3× founder · 8+ yrs infra · 9 yrs shipping

I build reliable agent systems, developer tooling, and production infrastructure. Strong in Python and Go. Comfortable from 0→1 product work through production hardening and platform reliability.

based
bengaluru, ist
github
srijanshukla18 · 140★
linkedin
contact
[01]

tldr

Senior SRE turned AI builder. 8+ years in infrastructure and reliability across AWS, Azure, and GCP, with hands-on ownership of large-scale systems, incident response, observability, and cost optimization.

In parallel, I build applied AI products with real users. My open-source MCPs and AI dev tools have 140+ GitHub stars in total. Founder energy, operator discipline.

[02]

focus

04 #focus
  • LLM workflows and tooling
  • Distributed systems & platform reliability
  • Developer ergonomics and observability
  • Product-minded infrastructure decisions
[03]

experience

07 · 2016 → present #experience
[01]
Independent AI Builder
self · 2023 - Present

Building and shipping applied-LLM products and infra with real adoption. RAG, knowledge graphs, agent systems.

  • claude-memory-viz: embedding + clustering visualizer for Anthropic's Claude MCP memory (95★).
  • xray: MCP for progressive code intelligence via ast-grep (43★).
  • contextgraph: decision audit ledger for AI agents; full-stack SDK + server + UI.
  • alpha: IAM policy rightsizing agent with AI risk signals and instant rollback.
  • logsieve: log dedup sidecar using Drain3, production-ready Helm chart, Go.
  • wiki-in-a-box: offline Wikipedia with hybrid no-index RAG.
  • ita-kg: Income Tax Act knowledge graph + RAG for legal lookup.
  • murmur: voice interface for Claude Code / Codex CLI via whisper.cpp with Metal accel.
  • kubectl-smart: CLI that turns Kubernetes debugging into signal prioritization.
[02]
Senior SRE
SteelEye · Jul 2023 - Present
  • Primary SRE for a tier-1 client; deployments, prod debugging, and maintenance.
  • Led infra for a major POC; implemented Azure RBAC with managed identities, including SDK-level deep dives.
  • Cut AWS spend by ~$800K/yr as a core infra contributor on a cloud-agnostic migration.
  • Set up Elasticsearch snapshot archival, saving ~$8,900/mo in storage.
  • Built multi-tenant monitoring on Grafana / Prometheus / Loki / Tempo, including a retention strategy.
  • Co-built ops automation platform in Go + Temporal for long-running maintenance workflows.
  • Packaged Ansible as a Kubernetes Job via Helm for env bootstrapping (secrets, RBAC, base config).
  • Mentored junior engineers; handled incident management within the team.
[03]
Software Engineer / Sr. SRE
Last9 · Dec 2021 - Apr 2023
  • Owned infra for a metrics+events ingest peaking at 350M events/min; tuned throughput, reliability, cost.
  • AWS + GCP perf tuning and capacity planning; ~20% cost reduction via rightsizing.
  • Guided Kubernetes orchestration choices; documented FMEA for critical paths.
  • Streamlined onboarding via VPC peering and AWS PrivateLink; shipped metrics + log pipelines end-to-end.
  • Hardened backups + security controls mapped to SOC 2; improved incident readiness.
[04]
Sr. Software Engineer (Infrastructure)
TNG Innovation Labs · Jan 2019 - Nov 2021
  • Scaled data pipelines to 10M+ packets/day; owned infra, observability, release pipelines.
  • Handled system design; served as interim tech lead and interim database engineer.
  • Led HTTP → MQTT transition; significant cost and performance wins.
  • MySQL → AWS RDS migration with near-zero downtime + DR plan.
[05]
Founding Engineer
Peekstreets · Feb 2024 - Jul 2025

AI infra and backend for a public-equity research SaaS.

  • RAG over multi-source datasets with retrieval caching and query planning.
[06]
Founding Engineer
LiQR (QR-Menu) · Jul 2020 - Sep 2020

Delivered MVP during COVID.

  • Flutter app + Python backend on AWS via Terraform; shipped to multiple restaurants in month one.
[07]
Founder & CTO
Deventree Solutions · May 2016 - Dec 2018

Telematics platform + cost-efficient location services.

  • Scaled to ~20k devices, ~4M messages/day; self-hosted reverse geocoder cut infra costs ~80%.
  • Owned frontend, backend, infra, database; led a small engineering team.
  • Shipped products on Android, Rails, Node.js, Angular, React Native.
  • Solved a complex timetabling problem with a genetic algorithm.
[04]

skills

06 groups #skills
applied ai
RAG · embeddings · LLM eval · prompt + agent design · MCPs · FAISS · pgvector · Neo4j / Cypher · ast-grep · whisper.cpp
infra
Kubernetes · Docker · Terraform · GitHub Actions · GitOps (Flux, Helm) · eBPF
cloud
AWS (EKS, EC2, RDS, S3, EFS, SNS, IAM, VPC, Lambda) · Azure · GCP
languages
Python · Go · C · Bash · JavaScript · SQL
data & ops
PostgreSQL · Redis · Elasticsearch · Kafka · Prometheus · VictoriaMetrics · Grafana · Loki · Tempo · cost optimization · incident response
databases
MySQL · MongoDB · PostgreSQL · Redis · Memcached
[05]

education & certifications

03 #education
  • B.Tech, Computer Science & Engineering · Christ University, Bangalore · 2013 - 2017
  • Certified Kubernetes Administrator (CKA) · Linux Foundation · Sep 2021
  • Y Combinator Startup School · Jun 2017
[06]

talks

02 #talks
  • Emerging architectures for AI agents - One2N Mitramandal Ep 3 · podcast-talk on models, harnesses, context, long-running control loops, multi-agent boards, SKILL.md, MCPs, memory systems, harness-level safety · Apr 2026
  • Lightning Talk: Slow down Disk I/O - stopping rm from nuking your SSD; deep dive into I/O throttling with rsync, ionice, and cgroups v2 · Jan 2023