Most multi-agent demos look clean for the same reason distributed systems diagrams look clean: the hard part is hidden. You see two agents. You see messages moving between them. You see the final state. What you usually do not see is the part that actually determines whether the setup is usable: how do two independent agents on two machines share durable state without sharing runtime state?
That was the problem I was trying to solve. Not task delegation. Not “agent A asks agent B to run a command.” I wanted something stricter and more durable: two agent instances, two laptops, shared skills, a shared user profile, and a lightweight way to exchange notes without turning the whole setup into infrastructure work.
My first instinct was a shared filesystem. It felt obvious. Mount one directory on both machines, create an inbox for each agent, write one JSON file per message, and call it a day.
That was the shortcut. And it backfired.
The short version is this: the shared mount made the architecture look simple while making the actual state uncertain. Once I saw that directory listings could stay stale for up to 30 minutes, I stopped thinking of it as a bus. It was a cache invalidation problem wearing a filesystem costume. Git ended up being the simpler system.
Let’s get into it.
The Problem I Was Actually Solving
The goal was narrow:
- two agent instances on two different machines
- local runtime on each machine stays local
- durable knowledge can move between them
- skills should propagate cleanly
- a shared user profile should converge over time
- low-frequency communication is fine
That last constraint mattered. I did not need real-time chat. A 12-hour cadence was acceptable. That removed a lot of pressure from the design. I did not need sockets, queues, webhooks, or a central database. I just needed something both agents could trust.
The First Design: A Shared Filesystem Bus
The initial layout was intentionally small:
```
bus/
  inbox-a/
  inbox-b/
  archive-a/
  archive-b/
  PROTOCOL.txt
```
The protocol was equally simple:
- one new file = one new message
- sender writes only into the other agent’s inbox
- unread = file still in inbox
- read = receiver moves file to archive
- messages are JSON
- skills are not copied inside messages
- runtime state is never shared directly
A message looked like this:
```json
{
  "id": "init-bus",
  "ts": "2026-04-12T00:53:30Z",
  "from": "agent-a",
  "to": "agent-b",
  "type": "note",
  "body": "Bus initialized on shared mount. Use this bus for messages; use the skills repo for skills sync."
}
```
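Under this protocol, a send is a single file write and the read marker is a single move. A minimal sketch in Python, assuming a local `bus/` root with the inbox and archive directories from the layout above (the function names are mine):

```python
import json
import time
import uuid
from pathlib import Path

BUS = Path("bus")

def send(sender: str, receiver: str, body: str, msg_type: str = "note") -> Path:
    """Write one JSON file into the receiver's inbox: one new file = one message."""
    msg = {
        "id": uuid.uuid4().hex,
        "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "from": f"agent-{sender}",
        "to": f"agent-{receiver}",
        "type": msg_type,
        "body": body,
    }
    path = BUS / f"inbox-{receiver}" / f"{msg['id']}.json"
    path.write_text(json.dumps(msg, indent=2))
    return path

def read_inbox(receiver: str) -> list[dict]:
    """Read and archive: moving a file out of the inbox IS the read marker."""
    inbox = BUS / f"inbox-{receiver}"
    archive = BUS / f"archive-{receiver}"
    messages = []
    for f in sorted(inbox.glob("*.json")):
        messages.append(json.loads(f.read_text()))
        f.rename(archive / f.name)  # read = no longer in inbox
    return messages
```

The useful property is that `read_inbox` is the only thing that removes a file from an inbox, so "still in inbox" and "unread" stay synonymous.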
This design had a lot going for it.
- It was easy to explain.
- It had a clear read marker.
- It avoided sharing the whole runtime directory.
- It kept transport and payload dead simple.
That boundary was doing most of the work. I was explicitly not trying to share the full agent home directory because that is where the real corruption risk lives: sessions, caches, logs, SQLite files, auth, and local process state. Only the communication surface was shared.
On paper, this was exactly the kind of system I like.
Why It Failed
The bus worked at the protocol layer and failed at the storage layer.
The shared directory was exposed through a mounted remote filesystem. From the shell, it looked close enough to a normal mount that the design felt reasonable. But the behavior was not filesystem-native. Directory visibility was cached.
That was the bug.
A reply could exist remotely and still not appear in the local inbox listing on the other machine. I ended up in the worst possible state for agent communication: not broken enough to fail loudly, not reliable enough to trust.
I verified the problem the annoying way:
- one machine wrote a reply
- the remote object listing showed the file existed
- the other machine’s inbox still looked empty
- clearing the local VFS cache did not immediately fix visibility
- the mount configuration exposed a `dir-cache-time` of 30 minutes, which I later broke down in What Happened When I Tried to Use S3 Like Dropbox on macOS
So the practical behavior was this:
- worst-case visibility lag: about 30 minutes
- message transport: maybe successful
- inbox state: uncertain
- debugging story: terrible
That distinction matters. The hardest part was not sending the message. The hardest part was knowing whether the other machine had actually seen it.
Once that happened, the whole idea of “shared state through a shared directory” stopped being useful. A bus is only a bus if both sides can trust what they are reading.
I Briefly Considered Real NFS
At that point, the obvious thought was: fine, stop pretending object storage is a filesystem and use a real network filesystem.
That led to the usual path:
- EFS or another NFS-backed share
- private networking
- VPN or Tailscale subnet routing
- more mounts
- more network hops
- more infrastructure to babysit
Could that have worked? Probably.
Was it the right solution for a low-frequency agent bus? No.
The system was supposed to make two agents easier to coordinate. If the communication layer needs a VPC, a subnet router, and an always-on mount path just to move a note every few hours, the design has already lost.
The Second Design: Git as the Bus
The design got much simpler once I stopped optimizing for the illusion of a live shared filesystem.
The new model was:
- one canonical repo for shared skills
- one private repo for cross-machine bus messages and shared durable profile data
- explicit pull before read
- explicit commit and push after write
The bus repo layout looked like this:
```
bus-repo/
  inbox-a/
  inbox-b/
  archive-a/
  archive-b/
  user-sync/
    USER.md
  README.md
```
The protocol became:
- `git pull` before checking inbox
- read inbox
- if processing a message, move it to archive
- `git add -A && git commit && git push`
- if sending a message, write it to the other inbox
- `git add -A && git commit && git push`
That one decision did a lot of work.
Git replaced the fake immediacy of the mount with explicit synchronization points. There was no more pretending a local directory listing represented global truth. State only changed when you pulled. Remote truth only changed when you pushed.
For this use case, that is exactly what I wanted.
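The whole pull-read-archive-push loop fits in one function around plain subprocess calls. A sketch, assuming the bus repo is already cloned with an upstream branch configured; `bus_cycle` and the commit message format are illustrative:

```python
import subprocess
from pathlib import Path

def git(repo: Path, *args: str) -> str:
    """Run a git command inside the bus repo and return its stdout."""
    return subprocess.run(
        ["git", "-C", str(repo), *args],
        check=True, capture_output=True, text=True,
    ).stdout

def bus_cycle(repo: Path, me: str) -> list[str]:
    """Pull, read own inbox, archive what was read, push the new state."""
    git(repo, "pull", "--ff-only")          # explicit sync point: read
    inbox = repo / f"inbox-{me}"
    archive = repo / f"archive-{me}"
    read = []
    for f in sorted(inbox.glob("*.json")):
        read.append(f.read_text())
        f.rename(archive / f.name)          # the read marker survives the push
    if read:
        git(repo, "add", "-A")
        git(repo, "commit", "-m", f"agent-{me}: archive {len(read)} message(s)")
        git(repo, "push")                   # explicit sync point: write
    return read
```

Nothing here is clever, which is the point: state only changes at the two marked sync points, and everything between them is local file operations.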
Separating Skills From Messages
There was one more cleanup that made the system better: do not mix code sync and message sync in the same repo just because you can.
The cleaner split was:
- skills repo = canonical source of reusable procedures
- bus repo = inboxes, archives, and shared durable user profile
That made the message bus smaller and easier to reason about.
The bus did not carry skill payloads. It only carried intent:
- “pull latest skills”
- “shared USER.md updated”
- “ack”
- “heartbeat”
- “state”
That boundary matters because it keeps the transport dumb. If messages start containing the actual code you are syncing, the bus stops being a bus and starts becoming a second replication system.
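One way to keep the transport dumb is to enforce it: accept only known intent types and cap the body size so payloads cannot sneak in. A sketch, where the type names are mapped from the intents above and the size cap is an illustrative choice:

```python
# Intent-only message types; anything else is rejected (names illustrative).
ALLOWED_TYPES = {"note", "pull-skills", "user-md-updated", "ack", "heartbeat", "state"}
MAX_BODY_BYTES = 2048  # illustrative cap: intent fits, code payloads do not

def validate_message(msg: dict) -> bool:
    """A bus message carries intent, not payload: known type, small body."""
    required = {"id", "ts", "from", "to", "type", "body"}
    if not required <= msg.keys():
        return False
    if msg["type"] not in ALLOWED_TYPES:
        return False
    return len(msg["body"].encode()) <= MAX_BODY_BYTES
```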
USER.md Needed an LLM Merge, Not a Git Merge
The shared user profile had a different failure mode.
A line-based merge is the wrong abstraction for profile memory. These files are not source code. They are compressed durable facts and preferences. Two agents can express the same fact differently, and a mechanical merge will keep both versions even when one should clearly subsume the other.
The better rule was:
- `git pull`
- read shared `user-sync/USER.md`
- read local `USER.md`
- merge semantically in-model
- keep only durable, high-signal facts
- dedup manually
- write back the canonical result
- commit and push if the shared file changed
That worked because the data was semantic, not structural.
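A sketch of the scaffolding around that merge. Here `semantic_merge` is a deterministic stand-in that only dedups exact lines; in the real flow the model does this step, which is what lets it collapse facts that are worded differently (all names here are illustrative):

```python
from pathlib import Path

def semantic_merge(shared: str, local: str) -> str:
    """Stand-in for the in-model merge: order-preserving union with exact dedup.
    The LLM version would also subsume facts that say the same thing in
    different words, which no line-based tool can do."""
    seen, merged = set(), []
    for line in (shared + "\n" + local).splitlines():
        fact = line.strip()
        if fact and fact not in seen:
            seen.add(fact)
            merged.append(fact)
    return "\n".join(merged) + "\n"

def converge_profile(shared_path: Path, local_path: Path) -> bool:
    """Merge shared and local USER.md; return True if the shared file changed."""
    shared = shared_path.read_text() if shared_path.exists() else ""
    local = local_path.read_text() if local_path.exists() else ""
    merged = semantic_merge(shared, local)
    changed = merged != shared
    shared_path.write_text(merged)   # the canonical result goes to the shared copy
    local_path.write_text(merged)    # both sides converge on the same text
    return changed
```

Running this twice in a row is a cheap sanity check: the second pass must report no change, or the merge is not convergent.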
The Automation Layer
Once the transport was explicit, automation became boring in a good way.
The final shape was:
- one unified skills sync job on a daily schedule
- one bus sync job on a lower-frequency schedule
- immediate writes only when something important changes
The skills sync job pulled from multiple local skill sources, filtered out bundled or generated junk, and pushed only user-authored skills to the canonical skills repo. The bus sync job handled inbox/archive movement and shared profile convergence.
That is the right kind of boring. No mount debugging. No invisible cache state. No guessing whether the other side is looking at the same directory view.
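The filtering step is the only part of the skills sync job with real logic in it. A sketch, where the junk markers and the `.md` extension are illustrative assumptions about how skills are stored:

```python
from pathlib import Path

# Illustrative path markers for bundled or generated skills that should
# never be pushed to the canonical repo.
JUNK_MARKERS = ("bundled/", "generated/", "__pycache__", ".cache")

def user_authored_skills(source_dirs: list[Path]) -> list[Path]:
    """Collect skill files from local sources, skipping bundled/generated junk."""
    keep = []
    for src in source_dirs:
        for path in sorted(src.rglob("*.md")):
            rel = path.relative_to(src).as_posix()
            if any(marker in rel for marker in JUNK_MARKERS):
                continue  # not user-authored: do not replicate it
            keep.append(path)
    return keep
```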
Why Git Won
Git won for reasons that had nothing to do with speed.
It won because it had the right failure model.
With the shared mount, the failure mode was ambiguity:
- is the file missing?
- is the listing stale?
- did the write land remotely?
- did one machine cache an empty directory?
With Git, the failure mode is explicit:
- you pulled or you did not
- you pushed or you did not
- there is a commit or there is not
- the other machine has fetched it or it has not
That framing matters. For low-frequency cross-machine agent coordination, explicit synchronization is usually better than fake shared state.
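That explicitness is also cheap to check mechanically. The branch line of `git status -sb` (format: `## main...origin/main [ahead 1, behind 2]`) says exactly which side is out of date; a small parser:

```python
import re

def ahead_behind(status_branch_line: str) -> tuple[int, int]:
    """Parse the first line of `git status -sb` into (ahead, behind) counts.
    (0, 0) means the local branch and its upstream agree."""
    ahead = re.search(r"ahead (\d+)", status_branch_line)
    behind = re.search(r"behind (\d+)", status_branch_line)
    return (int(ahead.group(1)) if ahead else 0,
            int(behind.group(1)) if behind else 0)
```

With the shared mount there was no equivalent question to ask; with git, "have they fetched it" is two integers.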
The Rule I Would Use Again
If I had to compress the whole project into one rule, it would be this:
Do not share live runtime state between agent instances. Share only durable surfaces.
In practice that means:
- keep sessions local
- keep logs local
- keep caches local
- keep auth local
- sync skills explicitly
- sync profile memory explicitly
- treat messages as durable artifacts, not ephemeral transport
The shared filesystem design taught me something useful even though I abandoned it. The naive version of multi-agent coordination always reaches for a shared directory first because it feels simple. But simplicity at the interface layer is not simplicity at the systems layer.
The moment your message bus can lie to you for 30 minutes, it is not a bus anymore.
Git was slower on paper and simpler in practice. That is the trade-off that actually mattered.