Can you read this thread: https://x.com/jamesacowling/status/2042656748738351136, and turn it into a reusable skill.
X/Twitter requires JavaScript to render, so direct fetching won't work. Let me try alternative approaches to get the thread content.
Let me try searching the web for the thread content instead.
The first search found an older thread about principal engineering, not matching that tweet ID. Let me search more specifically.
I found the thread via fxtwitter. Let me get the full content.
Got tweet 1/5. Now let me find the rest of the thread.
Here's the text: Time for a big systems advice thread!
In distributed systems there's no magic "push everything to prod at once" button. Every service gets pushed independently and nodes within a service get updated incrementally. If you mess up forwards/backwards compatibility you can fail irrecoverably.
So how to avoid this?
1/5: Decouple data and code changes. Never push out a release that changes how data is stored at the same time as the code that uses this new data. If there's a bug and you need to roll back to the old version of your code it won't be able to handle the new data in the new format. Instead push out a release that first changes the data in a way that’s compatible with both the old and new code (e.g., optional fields etc), when that’s stable push out the new code that uses it, then when that’s stable you can change the data to remove backwards compatibility. This is known as a “migration” in the database world and yes it’s annoying, but yes you need to do it.
1:31 PM · Apr 10, 2026
James Cowling
@jamesacowling
·
12m
2/5: Don’t change two services at once. If service A talks to service B, you can’t just add a new API to both of them and push them out. What if someone pushes A but not B? What if there’s a bug in B that needs to be rolled back? Just like with data changes, API changes need to be made in a forwards and backwards compatible way. Engineers forget to do this all the time.
3/5: Only allow one version “step” to exist in prod at any time. It’s common to have most of your nodes at version 5 but a few are still at version 4 because they haven’t finished migrating. Never ever allow someone to push to version 6 while version 4 is still running. Otherwise it’s too hard for engineers to reason about which version is “stable” when making multi-step migrations. You need monitoring and alerting for this plus protections against corner cases, e.g., if a node was offline and then came back online after missing a code push.
4/5: Codify forwards/backwards compatibility in your release process by pushing some nodes to the new version, run them for a while and keep track of monitoring, then roll them back before doing the full release again. If someone messed up a migration you want to smoke this out on some staging nodes or at small scale while an operator is watching not once your entire system is down.
5/5: Design cleanly composed systems with simple APIs, thin clients, type safety, information hiding, and well-articulated guarantees. Avoid anything fancy or ambiguous in your APIs. This is the only way you can feasibly maintain a large distributed system with multiple versions of every service running at once. It’s also an area where skilled humans significantly outperform LLMs, like it’s not even close. Don’t turn off your brain just yet.
Follow-up note to @convex customers: you don't have to think about most of these things, e.g., we force you to finish a schema migration before pushing the code that uses it, we handle version skew internally for you, etc, but you still do have to think about what happens if an old client shows up and calls a function that doesn't exist anymore. This is rarely a problem on the web but is an issue for mobile apps. We'd like to make this problem go away too but for now if you're a mobile developer you probably already know you should be careful about deprecating APIs, since client code can live for a long time.
Thanks for pasting the thread. Let me check how skills are structured in your setup, then create one.
Now I understand the structure. I'll create a skill that reviews code changes against the 5 distributed systems safety principles from that thread.
Now create the slash command so it can be invoked with /dist-review.
The skill is live. Here's what was created:
Skill: /Users/tarunsachdeva/.claude/skills/distributed-systems-review/skill.md
Command: /Users/tarunsachdeva/.claude/commands/dist-review.md
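You can invoke it with /dist-review (optionally passing file paths or a PR number). It reviews code changes against James Cowling's 5 principles: data/code decoupling, single-service changes, one version step, release process safety, and API cleanliness.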
Each principle gets a PASS / WARN / FAIL rating with an explanation. It defaults to reviewing your current git diff if no arguments are given.
Here's the text:
Review code changes against five critical distributed systems safety principles.
Based on James Cowling's (@jamesacowling) systems advice thread.
In distributed systems there's no "push everything to prod at once" button. Every
service gets pushed independently and nodes within a service get updated
incrementally. If you mess up forwards/backwards compatibility you can fail
irrecoverably.
Analyze the staged changes (or specified files/PR) and evaluate against each of
these five principles. For each principle, output one of PASS, WARN, or FAIL.
Never push a release that changes how data is stored at the same time as the code
that uses the new data. If there's a bug and you need to roll back to the old code,
it won't be able to handle data in the new format.
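As a rough illustration of that ordering, here is a minimal Python sketch. The `User` record, the `display_name` field, and the `write_new_field` flag are all hypothetical names invented for this example. The idea is that the tolerant reader ships first, and the writer only starts emitting the new field once that reader is stable everywhere.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    # Hypothetical record; "display_name" is the newly added, optional field.
    name: str
    display_name: Optional[str] = None


def decode_user(row: dict) -> User:
    """Release 1: read rows written by either the old or the new code."""
    # Old rows simply lack "display_name", so it decodes to None.
    return User(name=row["name"], display_name=row.get("display_name"))


def encode_user(user: User, write_new_field: bool) -> dict:
    """Release 2: start writing the new field only after release 1 is stable."""
    row = {"name": user.name}
    if write_new_field and user.display_name is not None:
        row["display_name"] = user.display_name
    return row

# Release 3 (not shown): once release 2 is stable, drop the compatibility path.
```

Rolling back from release 2 to release 1 stays safe here because the release 1 reader already understands rows with or without the new field.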
Safe pattern (3-step migration):
Red flags to look for:
If service A talks to service B, you can't add a new API to both and push them
simultaneously. What if A deploys but B doesn't? What if B has a bug and needs
rollback?
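One way to picture a compatible API change is a caller in A that prefers the new endpoint but still works against an older B. This is only a sketch: `ServiceB`, `UnknownMethod`, `get_profile`, and `get_profile_v2` are made-up stand-ins, not any real RPC library.

```python
class UnknownMethod(Exception):
    """Raised by the hypothetical RPC layer when the callee lacks a method."""


class ServiceB:
    """Stand-in for service B; `methods` models whichever version is deployed."""

    def __init__(self, methods: dict):
        self.methods = methods

    def call(self, name: str, *args):
        if name not in self.methods:
            raise UnknownMethod(name)
        return self.methods[name](*args)


def fetch_profile(b: ServiceB, user_id: int) -> dict:
    """Service A's caller: prefer the new API, but keep working against old B."""
    try:
        return b.call("get_profile_v2", user_id)
    except UnknownMethod:
        # B has not been pushed yet, or was rolled back: use the old API.
        legacy = b.call("get_profile", user_id)
        return {"id": legacy["id"], "name": legacy["name"], "avatar": None}
```

With this shape, A and B can be pushed or rolled back in either order, as long as B keeps serving the old method until no caller depends on it.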
Red flags to look for:
Never allow version N+2 to be pushed while version N is still running. It's too
hard for engineers to reason about which version is "stable" when making multi-step
migrations.
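A push gate for this rule could look roughly like the sketch below. It assumes versions are consecutive integers and that something else (monitoring, a fleet inventory) supplies the set of versions currently running; `can_push` is a hypothetical helper, not part of any deploy tool.

```python
def can_push(running_versions: set[int], target: int) -> bool:
    """Allow a push only if prod would still span at most one version step."""
    # Include the target so that both roll-forwards and rollbacks are checked.
    versions = running_versions | {target}
    return max(versions) - min(versions) <= 1
```

A node that was offline and comes back on an old version shows up in `running_versions` and blocks the next push until it has been upgraded.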
Red flags to look for:
Push some nodes to the new version, run them, keep track of monitoring, then roll
them back before doing the full release. Smoke out migration bugs at small scale
while an operator is watching, not once the entire system is down.
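The rehearsal step can be sketched as a small function that pushes a few canary nodes forward, checks them, and always rolls them back. The `deploy` and `healthy` callables are assumptions standing in for whatever deploy tooling and monitoring you actually have.

```python
from typing import Callable


def rehearse_release(
    deploy: Callable[[str, str], None],
    healthy: Callable[[str], bool],
    canaries: list[str],
    old: str,
    new: str,
) -> bool:
    """Push canaries forward, roll them back, and report if both directions stayed healthy."""
    try:
        for node in canaries:
            deploy(node, new)
        forward_ok = all(healthy(node) for node in canaries)
    finally:
        # Always roll back, so the rollback path is exercised on every release.
        for node in canaries:
            deploy(node, old)
    backward_ok = all(healthy(node) for node in canaries)
    return forward_ok and backward_ok
```

Only when this returns `True` would the full release proceed. In practice the health check would watch monitoring over a period of time, not a single probe.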
Red flags to look for:
Use simple APIs, thin clients, type safety, information hiding, and
well-articulated guarantees. Avoid anything fancy or ambiguous in APIs. This is the
only way to feasibly maintain a large distributed system with multiple versions of
every service running at once.
Red flags to look for:
Structure your review as:
## Distributed Systems Safety Review
### 1. Data/Code Decoupling: [PASS|WARN|FAIL]
[explanation]
### 2. Single-Service Changes: [PASS|WARN|FAIL]
[explanation]
### 3. One Version Step: [PASS|WARN|FAIL]
[explanation]
### 4. Release Process Safety: [PASS|WARN|FAIL]
[explanation]
### 5. API Cleanliness: [PASS|WARN|FAIL]
[explanation]
### Summary
[overall assessment and recommended actions]
If no arguments are given, review the staged changes (git diff --cached), or else the unstaged changes (git diff). Arguments: $ARGUMENTS
Want to make any final edits? I'm going to publish this conversation online.
Looks good to me — no edits needed. It's ready to use as /dist-review.