A quantified-self setup with one AI agent and two readouts

I’ve been running a personal quantified-self setup for the past few weeks. The whole thing is two JSON files, one Hermes Agent in a WhatsApp chat, and two static HTML pages on this site. It works in a way that nothing I’ve tried before has.

I think the reason is straightforward. The bottleneck in quantified self was never the dashboards or the visualizations; it was data capture. LLMs in a chat app are now good enough at parsing free-form text that capture has collapsed to “tell the agent what you did.”

The setup

I have a single WhatsApp thread with Hermes Agent. Throughout the day it nudges me every 15 minutes - but only while I’m awake - with one of two prompts:

“What did you do in the last 15 minutes?”
“Did you eat anything since last time?”
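A minimal Python sketch of the awake-gated nudge, assuming a boolean wake state and a scheduler that ticks on quarter hours. `next_boundary` and `should_nudge` are hypothetical names, not Hermes Agent's API, and the sketch doesn't try to decide which of the two prompts gets sent.

```python
from datetime import datetime, timedelta

SLOT = timedelta(minutes=15)

def next_boundary(now: datetime) -> datetime:
    """Return the next 15-minute boundary strictly after `now` (rolls over midnight)."""
    floored = now.replace(minute=now.minute - now.minute % 15, second=0, microsecond=0)
    return floored + SLOT

def should_nudge(now: datetime, awake: bool) -> bool:
    """Fire only on a quarter-hour tick and only while the wake state says awake.

    Seconds are ignored so a tick that fires a few seconds late still counts.
    """
    return awake and now.minute % 15 == 0
```

Keeping the wake state as a single flag is what lets a second skill reuse it instead of running its own schedule.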

I reply in whatever shape I want. Voice notes, “half veg burger and a cold brew”, “deep work on contextgraph”, “nothing, scrolling”. Hermes parses the message, routes it to the right skill, estimates whatever needs estimating, and appends to one of two JSON files:

time-audit-public.json - 96 fifteen-minute slots per day, each tagged with what I was doing and a productivity level 0-3
calories-audit-public.json - calorie entries, activity credits, sedentary TDEE, and the daily deficit number
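The 96-slot grid is plain arithmetic: four quarter-hours per hour, 24 hours per day. A small sketch of the mapping from a local timestamp to its slot; the function names are hypothetical.

```python
from datetime import datetime

SLOTS_PER_DAY = 96  # 24 hours * 4 quarter-hours

def slot_index(ts: datetime) -> int:
    """Map a local timestamp to its slot: 0 covers 00:00-00:14, 95 covers 23:45-23:59."""
    return ts.hour * 4 + ts.minute // 15

def slot_label(index: int) -> str:
    """Return the HH:MM start time of a slot."""
    if not 0 <= index < SLOTS_PER_DAY:
        raise ValueError(f"slot must be in 0..{SLOTS_PER_DAY - 1}, got {index}")
    start = index * 15  # minutes since midnight
    return f"{start // 60:02d}:{start % 60:02d}"
```

For example, `slot_index(datetime.fromisoformat("2026-05-16T17:57:41+05:30"))` is 71, and `slot_label(71)` is `"17:45"`. Using the timestamp's own local hour (the `+05:30` offset is kept, not converted to UTC) means the grid lines up with the day I actually lived.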

A calorie entry looks like this:

{
  "at": "2026-05-16T17:57:41+05:30",
  "date": "2026-05-16",
  "time": "17:57",
  "food": "half veg burger, Japanese cold brew with condensed milk, half slice marble cake",
  "calories": 550,
  "estimated": true,
  "confidence": "medium",
  "corrected_at": null
}
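A sketch of building and appending an entry with exactly those fields. It assumes the private log is JSON Lines (one object per line) so an append never rewrites earlier entries; the post doesn't describe the on-disk layout, and `make_food_entry`, `append_entry`, and the low/medium/high confidence vocabulary are hypothetical.

```python
import json
from datetime import datetime
from pathlib import Path

CONFIDENCE = {"low", "medium", "high"}  # assumed vocabulary; only "medium" appears above

def make_food_entry(at: datetime, food: str, calories: int, confidence: str) -> dict:
    """Build an entry with the example's fields; `date` and `time` are derived from `at`."""
    if confidence not in CONFIDENCE:
        raise ValueError(f"unknown confidence: {confidence}")
    return {
        "at": at.isoformat(),
        "date": at.date().isoformat(),
        "time": at.strftime("%H:%M"),
        "food": food,
        "calories": calories,
        "estimated": True,  # a fresh number from the agent is always marked as a guess
        "confidence": confidence,
        "corrected_at": None,
    }

def append_entry(log: Path, entry: dict) -> None:
    """Append one entry as a single JSON line; existing lines are never touched."""
    with log.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")
```

Deriving `date` and `time` from `at` inside one function keeps the three fields from ever disagreeing.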

The JSON files get committed to the srijanshukla.com repo. Astro builds two pages from them at deploy time:

/time - every 15-minute block of every day, laid out as 96 cells per row
/calories - daily deficit, calorie bar, and activity credits

No app to open. No form to fill in. No database. The pipeline is essentially WhatsApp → Hermes → skills → JSON → static page. That’s it.

The agent is the product

The part I underplayed at first was Hermes itself.

This is not one giant prompt pretending to be a quantified-self app. It is a general agent with two small skills:

time-audit owns the wake state, the 15-minute cron, the 0-3 productivity grammar, missed slots, and the public export
food-audit owns the food nudge, rough calorie estimation, corrections, activity credits, sedentary TDEE, and the deficit math

That boundary is doing most of the work. The LLM is allowed to understand messy human text, but the skill owns the contract. It knows where the files live. It knows the command to call. It knows that food-audit shares the time-audit awake state. It knows not to turn every calorie reply into diet coaching.
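What “the skill owns the contract” can look like in practice: the LLM proposes a structured record, and a deterministic check accepts or rejects it before anything is written. This is a sketch for the time-audit side, assuming hypothetical field names (`date`, `slot`, `label`, `productivity`); the real skill's schema isn't shown in the post.

```python
from datetime import date

PRODUCTIVITY_LEVELS = range(4)  # the 0-3 grammar
SLOTS_PER_DAY = 96

def validate_time_block(raw: dict) -> dict:
    """Accept the LLM's parsed output only if it fits the skill's contract.

    Missing keys raise KeyError; malformed values raise ValueError.
    """
    day = date.fromisoformat(raw["date"])  # raises ValueError on a malformed date
    slot, level = raw["slot"], raw["productivity"]
    label = str(raw["label"]).strip()
    # type() instead of isinstance(), so True/False can't pass as 1/0
    if type(slot) is not int or not 0 <= slot < SLOTS_PER_DAY:
        raise ValueError(f"slot out of range: {slot!r}")
    if type(level) is not int or level not in PRODUCTIVITY_LEVELS:
        raise ValueError(f"productivity must be 0-3, got {level!r}")
    if not label:
        raise ValueError("empty activity label")
    return {"date": day.isoformat(), "slot": slot, "label": label, "productivity": level}
```

The point of the split is that a confused parse fails loudly here instead of quietly landing in the JSON.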

A normal app would make me adapt to its form. Hermes adapts to the message, then hands off to deterministic scripts. The result is weirdly sturdy: free-form capture at the edge, boring append-only JSON at the core.

That is the real beauty of the system. The agent is not a chatbot bolted onto a tracker. The agent is the interface, the router, the memory, and the operator. The pages are just readouts.

Why this works now

I’ve tried quantified self before. So has everyone. The classic failure mode is: you buy the wearable / install the app / open the spreadsheet, you use it for two weeks, you stop. The data either lies (because the wearable inferred wrong) or stops (because the manual entry friction beat your discipline).

A personal AI agent fixes capture in a way I didn’t expect. The friction floor isn’t “open the app and tap a button” anymore - it’s “type a sentence to the chat thread that’s already on your phone screen.” I was going to type something to someone anyway. Now some of those someones are the agent.

The estimation piece matters too. If I had to look up every food’s calories I’d quit by day three. Hermes guesses, marks the guess as estimated: true, and moves on. I can override later if I care. The guess is honest about being a guess - which turns out to be the whole point.
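A sketch of the override, using the `estimated` and `corrected_at` fields from the entry above. The post doesn't say whether Hermes edits an entry in place or appends a correction record, so this hypothetical `correct_entry` returns a new dict and leaves the original guess intact.

```python
from datetime import datetime

def correct_entry(entry: dict, calories: int, when: datetime) -> dict:
    """Return a corrected copy of an entry; the original dict is not modified."""
    fixed = dict(entry)
    fixed["calories"] = calories
    fixed["estimated"] = False  # a number I supplied is no longer a guess
    fixed["corrected_at"] = when.isoformat()
    return fixed
```

Flipping `estimated` on correction is what lets the guess-versus-known ratio move over time.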

Why only two

I deliberately chose to track exactly two things. The temptation to track more is constant. Sleep quality, mood, weight, training load, focus minutes, money, books, social interactions, screen time. I have notes-app drafts for each of these. None of them are live, and the reason is structural rather than philosophical.

The agent is a chat, not a database. A single WhatsApp thread can hold two or three nudge streams before it becomes annoying. Past that I’ll mute it, and the system will die the same way every quantified-self system dies: silently, with the user pretending they just lost interest.

Two readouts cover the day. Time tells me what I did. Calories tell me what I put in. Sleep is implicit in /time as gaps in exported blocks, not as a literal cell state. Almost everything else - mood, energy, focus - is downstream of those two. Tracking downstream feels like measuring symptoms.

Simple is the feature. The original quantified-self movement died of bloat. Sixty metrics, dashboards no one read, wearables that flattered you. I want the opposite: a setup so small I can’t get bored of it, with readouts so blunt I can’t argue with them.

If a third sensor ever earns its place it will be because the existing two created a gap I can’t reason around. Until then, the list is closed.

The honesty tax

This is the design pattern I’m proudest of. Both readouts make the gaps in the log visible as first-class data.

On /time, missed nudges show up as amber-outlined cells only when there is no logged block for that bucket. Not hidden, not interpolated - visibly missed. If I have a string of amber cells through Sunday afternoon, the page just says so.
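The precedence rule is small enough to write down: a logged block always wins, a missed nudge shows only where nothing was logged, and everything else is empty. A sketch with hypothetical names, assuming the page knows which slots had a logged block and which had a missed nudge.

```python
from enum import Enum

class Cell(Enum):
    EMPTY = "empty"    # no exported block and no missed nudge
    MISSED = "missed"  # rendered with the amber outline
    LOGGED = "logged"

def resolve_day(logged: set[int], missed: set[int], slots: int = 96) -> list[Cell]:
    """Resolve one day's row; a logged block overrides a missed nudge for the same slot."""
    cells = []
    for i in range(slots):
        if i in logged:
            cells.append(Cell.LOGGED)
        elif i in missed:
            cells.append(Cell.MISSED)
        else:
            cells.append(Cell.EMPTY)
    return cells
```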

On /calories, every calorie entry carries an estimated: true/false flag in the JSON. The page has an “honesty tax” meter that shows the share of entries that are still estimates. As I write this it’s at 43%: almost half my calorie log is rough guesses. The meter exists to erode trust in the readout itself when guesses pile up. The more I estimate, the less you (and I) should trust the numbers.
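A sketch of the meter, assuming it counts entries rather than weighting by calories (the post says “ratio” without specifying). `honesty_tax` is a hypothetical name.

```python
from typing import Optional

def honesty_tax(entries: list[dict]) -> Optional[float]:
    """Percentage of calorie entries still flagged as estimates; None when there are no entries."""
    if not entries:
        return None  # no data is not the same as 0% guessed
    guessed = sum(1 for e in entries if e.get("estimated", False))
    return 100.0 * guessed / len(entries)
```

Weighting by calories instead would make one big guessed dinner count for more than a guessed coffee; counting entries is the simpler reading.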

I haven’t seen this pattern elsewhere. Most tracking apps hide their uncertainty. This one surfaces it as a metric.

What it doesn’t do

A few things I have deliberately not built:

No coaching. The page never tells me to do anything. No “you should…”, no green-when-good badges, no streak guilt-tripping. Strava is the anti-pattern here.
No analysis. Calories are calories. I don’t break them into protein/carb/fat because the WhatsApp pipeline can’t reliably capture that, and I don’t care enough to make it.
No reactivity. The pages are statically rendered at deploy time. Reload and you see the last commit. No live charts, no websockets, no “now syncing…” cheerfulness.
No fake placeholder data. Empty days render as empty rows. The page handles two days of data and sixty days of data the same way.
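Rendering empty days as empty rows comes down to iterating the calendar, not the data. A sketch assuming blocks are already grouped by ISO date string; `day_rows` is hypothetical.

```python
from datetime import date, timedelta

def day_rows(blocks_by_date: dict[str, set[int]], start: date, end: date) -> list[tuple[str, set[int]]]:
    """One row per calendar day from start to end inclusive; days without data get an empty set."""
    rows = []
    day = start
    while day <= end:
        key = day.isoformat()
        rows.append((key, blocks_by_date.get(key, set())))
        day += timedelta(days=1)
    return rows
```

Because the loop is driven by dates, two days of data and sixty days of data go through the same code path.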

The pages

If you want to look at the actual things:

/time - time audit. Each day is a row of 96 resolved 15-minute buckets. Empty means no exported block, missed nudges are amber, productive blocks fill in.
/calories - calories audit. Today’s deficit, the calorie bar, and activity credited on top of sedentary TDEE.
/quantified-self - the parent page that ties it together.
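The deficit arithmetic as I read it from the description: sedentary TDEE plus activity credits is what the day burned, minus what was eaten that day. The sign convention (positive means eating less than burned) and the function name are assumptions.

```python
def daily_deficit(entries: list[dict], day: str, activity_credits: list[int], sedentary_tdee: int) -> int:
    """Positive result means fewer calories eaten than the day's estimated expenditure."""
    eaten = sum(e["calories"] for e in entries if e["date"] == day)
    burned = sedentary_tdee + sum(activity_credits)
    return burned - eaten
```

With made-up numbers, a 2,000 kcal sedentary TDEE, a 250 kcal activity credit, and 1,250 kcal eaten gives a deficit of 1,000.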

The data files live in the same repo. They are public, and they are generated from append-only private logs. Privacy is by field selection: Hermes strips raw source text before commit, so the JSON has cleaned labels and numbers but not the original WhatsApp messages.
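Field selection is easiest to trust as an allow-list. A sketch using the public fields from the calorie entry shown earlier; the private-side field name `raw_message` is hypothetical.

```python
PUBLIC_FIELDS = ("at", "date", "time", "food", "calories", "estimated", "confidence", "corrected_at")

def to_public(private_entry: dict) -> dict:
    """Keep only allow-listed fields; anything else, such as the raw message, is dropped."""
    return {k: private_entry[k] for k in PUBLIC_FIELDS if k in private_entry}
```

An allow-list fails closed: a new private field stays private until someone deliberately adds it to `PUBLIC_FIELDS`, whereas a deny-list would leak it by default.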

What’s next

Honestly, nothing structural. The temptation to add features is the failure mode I’m trying to avoid. The setup needs to run for six months without me touching it before I would consider adding a third sensor. If it is still alive in November and there is a gap I cannot reason around, I will think about it then.

In the meantime I am watching the readouts. The /time 45-day matrix is the one I expected to be useful, and it turned out to be humbling: my real schedule looks nothing like the schedule in my head. That alone has been worth the build.

If you build something similar I would love to see it. The interesting question now is not how to track but what is worth tracking, and the answers should be small.