Summary

Can you read this thread: https://x.com/jamesacowling/status/2042656748738351136 and turn it into a reusable skill?

X/Twitter requires JavaScript to render, so direct fetching won't work. Let me try alternative approaches to get the thread content.

Let me try searching the web for the thread content instead.

The first search found an older thread about principal engineering, not matching that tweet ID. Let me search more specifically.

I found the thread via fxtwitter. Let me get the full content.

Got tweet 1/5. Now let me find the rest of the thread.

Request interrupted by user for tool use

Here's the text: Time for a big systems advice thread!

In distributed systems there's no magic "push everything to prod at once" button. Every service gets pushed independently and nodes within a service get updated incrementally. If you mess up forwards/backwards compatibility you can fail irrecoverably.

So how to avoid this?

1/5: Decouple data and code changes. Never push out a release that changes how data is stored at the same time as the code that uses this new data. If there's a bug and you need to roll back to the old version of your code it won't be able to handle the new data in the new format. Instead push out a release that first changes the data in a way that’s compatible with both the old and new code (e.g., optional fields etc), when that’s stable push out the new code that uses it, then when that’s stable you can change the data to remove backwards compatibility. This is known as a “migration” in the database world and yes it’s annoying, but yes you need to do it.
1:31 PM · Apr 10, 2026

James Cowling

@jamesacowling
2/5: Don’t change two services at once. If service A talks to service B, you can’t just add a new API to both of them and push them out. What if someone pushes A but not B? What if there’s a bug in B that needs to be rolled back? Just like with data changes, API changes need to be made in a forwards and backwards compatible way. Engineers forget to do this all the time.
3/5: Only allow one version “step” to exist in prod at any time. It’s common to have most of your nodes at version 5 but a few are still at version 4 because they haven’t finished migrating. Never ever allow someone to push to version 6 while version 4 is still running. Otherwise it’s too hard for engineers to reason about which version is “stable” when making multi-step migrations. You need monitoring and alerting for this plus protections against corner cases, e.g., if a node was offline and then came back online after missing a code push.
4/5: Codify forwards/backwards compatibility in your release process by pushing some nodes to the new version, run them for a while and keep track of monitoring, then roll them back before doing the full release again. If someone messed up a migration you want to smoke this out on some staging nodes or at small scale while an operator is watching not once your entire system is down.
5/5: Design cleanly composed systems with simple APIs, thin clients, type safety, information hiding, and well-articulated guarantees. Avoid anything fancy or ambiguous in your APIs. This is the only way you can feasibly maintain a large distributed system with multiple versions of every service running at once. It’s also an area where skilled humans significantly outperform LLMs, like it’s not even close. Don’t turn off your brain just yet.
Follow-up note to @convex customers: you don't have to think about most of these things, e.g., we force you to finish a schema migration before pushing the code that uses it, we handle version skew internally for you, etc, but you still do have to think about what happens if an old client shows up and calls a function that doesn't exist anymore. This is rarely a problem on the web but is an issue for mobile apps. We'd like to make this problem go away too but for now if you're a mobile developer you probably already know you should be careful about deprecating APIs, since client code can live for a long time.

Thanks for pasting the thread. Let me check how skills are structured in your setup, then create one.

Now I understand the structure. I'll create a skill that reviews code changes against the 5 distributed systems safety principles from that thread.

Now create the slash command so it can be invoked with /dist-review.

The skill is live. Here's what was created:

Skill: /Users/tarunsachdeva/.claude/skills/distributed-systems-review/skill.md
Command: /Users/tarunsachdeva/.claude/commands/dist-review.md

You can invoke it with /dist-review (optionally passing file paths or a PR number). It reviews code changes against James Cowling's 5 principles:

Decouple data and code changes — 3-step migration pattern
Don't change two services at once — forward/backward compatible API changes
Only one version step in prod — no skipping migration steps
Codify compatibility in the release process — canary deploys, rollback testing
Design cleanly composed systems — simple APIs, type safety, information hiding

Each principle gets a PASS / WARN / FAIL rating with an explanation. It defaults to reviewing your current git diff if no arguments are given.

Here's the text:

name: distributed-systems-review
description: Review code changes for distributed systems safety — checks forward/backward compatibility, safe migrations, API versioning, and multi-service deployment risks.
metadata:
author: tarunsachdeva
version: "1.0.0"
cli-contract-version: "1"
argument-hint: [optional file paths or PR number to review]
source: "https://x.com/jamesacowling/status/2042656748738351136"

Distributed Systems Release Safety Review

Review code changes against five critical distributed systems safety principles.
Based on James Cowling's (@jamesacowling) systems advice thread.

Triggers

"review for distributed systems safety"
"check release compatibility"
"migration safety review"
"/dist-review"

Context

In distributed systems there's no "push everything to prod at once" button. Every
service gets pushed independently and nodes within a service get updated
incrementally. If you mess up forwards/backwards compatibility you can fail
irrecoverably.

Review Checklist

Analyze the staged changes (or specified files/PR) and evaluate against each of
these five principles. For each principle, output one of:

PASS — no issues found
WARN — potential concern worth verifying
FAIL — this change violates the principle and needs to be fixed before deploy

1. Decouple Data and Code Changes

Never push a release that changes how data is stored at the same time as the code
that uses the new data. If there's a bug and you need to roll back to the old code,
it won't be able to handle data in the new format.

Safe pattern (3-step migration):

1. Deploy a release that changes data in a way compatible with both old and new code (e.g. optional fields)
2. When stable, deploy the new code that uses the new data format
3. When stable, remove backwards compatibility from the data layer

Red flags to look for:

Schema migrations bundled with application logic changes in the same changeset
New required fields added without defaults or fallback handling
Data format changes alongside the code that reads/writes the new format
ORM model changes coupled with business logic changes
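
As a rough illustration of the first two steps of the safe pattern, here is a minimal Python sketch. It assumes records are plain dictionaries and that the change adds an optional `display_name` field; the function names and the field are hypothetical and do not come from the thread. The point it tries to show is that both the old and the new reader can handle both record shapes, so either release can be rolled back.

```python
from typing import Any, Optional


def write_user(name: str, display_name: Optional[str] = None) -> dict[str, Any]:
    """Writer after step 1: the new field is optional and omitted when unset."""
    record: dict[str, Any] = {"name": name}
    if display_name is not None:
        record["display_name"] = display_name
    return record


def read_user_old(record: dict[str, Any]) -> str:
    """Old code: only knows about 'name' and ignores any extra fields."""
    return record["name"]


def read_user_new(record: dict[str, Any]) -> str:
    """New code (step 2): prefers the new field, falls back to the old one."""
    return record.get("display_name") or record["name"]
```

Step 3 (dropping the fallback) would only follow once every record carries the new field and the new code has been stable for a while.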

2. Don't Change Two Services at Once

If service A talks to service B, you can't add a new API to both and push them
simultaneously. What if A deploys but B doesn't? What if B has a bug and needs
rollback?

Red flags to look for:

New RPC/API endpoints added to a service alongside a client that calls them
Breaking changes to request/response schemas without versioning
Tight coupling between service deployments (e.g. "deploy A then immediately deploy B")
Removal of API endpoints that other services may still call
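
One way to make a client tolerate either version of service B is to fall back to the old call when the new one is missing. The sketch below is an assumption-laden illustration: `ServiceB`, `MethodNotFound`, and the method names `get_profile` and `get_profile_v2` are all hypothetical stand-ins for whatever RPC layer is actually in use.

```python
from typing import Any, Callable


class MethodNotFound(Exception):
    """Raised by the hypothetical RPC layer when B does not know a method."""


class ServiceB:
    """In-memory stand-in for service B with a table of exposed methods."""

    def __init__(self, methods: dict[str, Callable[..., dict[str, Any]]]) -> None:
        self._methods = methods

    def call(self, method: str, **kwargs: Any) -> dict[str, Any]:
        if method not in self._methods:
            raise MethodNotFound(method)
        return self._methods[method](**kwargs)


def fetch_profile(b: ServiceB, user_id: str) -> dict[str, Any]:
    """Client code in service A that works whether or not B has the new API."""
    try:
        return b.call("get_profile_v2", user_id=user_id)
    except MethodNotFound:
        # B has not been upgraded yet, or was rolled back: use the old API
        # and fill the field that only the new API provides with a default.
        legacy = b.call("get_profile", user_id=user_id)
        return {**legacy, "locale": None}
```

With this shape, A can ship before B, after B, or survive a rollback of B, which is the property the principle asks for.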

3. Only Allow One Version Step in Prod

Never allow version N+2 to be pushed while version N is still running. It's too
hard for engineers to reason about which version is "stable" when making multi-step
migrations.

Red flags to look for:

Multi-step migrations that assume previous migrations have completed
Missing version checks or guards in migration scripts
No monitoring/alerting for version skew across nodes
Migrations that skip intermediate states
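
A deploy gate for this rule could look roughly like the following. This is a sketch under the assumption that versions are consecutive integers and that the set of versions currently running is known from monitoring; `can_push` is a hypothetical name, not part of any real deploy tool.

```python
from typing import Iterable


def can_push(running_versions: Iterable[int], target: int) -> bool:
    """Allow a push only if prod would still span at most one version step.

    Assumes versions are consecutive integers (4, 5, 6, ...).
    """
    versions = set(running_versions) | {target}
    return max(versions) - min(versions) <= 1
```

For example, with nodes at versions 4 and 5, pushing 5 is allowed but pushing 6 is blocked until the last version 4 node has migrated or been removed.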

4. Codify Forward/Backward Compatibility in the Release Process

Push some nodes to the new version, run them, keep track of monitoring, then roll
them back before doing the full release. Smoke out migration bugs at small scale
while an operator is watching, not once the entire system is down.

Red flags to look for:

No canary or staged rollout strategy
Missing rollback plan or rollback not tested
No health checks or monitoring for the new version
Changes that are difficult or impossible to roll back
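
The push, observe, roll back, then release sequence can be sketched as a small orchestration function. Everything here is illustrative: the fleet is modeled as a dictionary from node name to version, and `healthy` is a hypothetical callback standing in for real monitoring.

```python
from typing import Callable


def staged_release(
    fleet: dict[str, int],
    canaries: list[str],
    new_version: int,
    healthy: Callable[[dict[str, int]], bool],
) -> bool:
    """Exercise upgrade and rollback on a few nodes before the full release."""
    previous = {node: fleet[node] for node in canaries}

    # 1. Push only the canary nodes to the new version and observe.
    for node in canaries:
        fleet[node] = new_version
    if not healthy(fleet):
        fleet.update(previous)  # abort: restore the canaries
        return False

    # 2. Deliberately roll the canaries back to prove rollback works.
    fleet.update(previous)
    if not healthy(fleet):
        return False

    # 3. Only now do the full release.
    for node in fleet:
        fleet[node] = new_version
    return True
```

A real process would run each phase for some time with an operator watching, rather than checking health once.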

5. Design Cleanly Composed Systems

Use simple APIs, thin clients, type safety, information hiding, and
well-articulated guarantees. Avoid anything fancy or ambiguous in APIs. This is the
only way to feasibly maintain a large distributed system with multiple versions of
every service running at once.

Red flags to look for:

Complex, overloaded API endpoints doing too many things
Leaky abstractions exposing internal implementation details
Missing or weak type definitions for API contracts
Implicit contracts or undocumented behavior between services
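
As one possible reading of "type safety" and "well-articulated guarantees", the sketch below defines an explicit request type and a strict parser. The `CreateOrderRequest` type and its fields are hypothetical examples. Unknown fields are ignored on purpose, which is an assumption made here so that a newer client can talk to an older server.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class CreateOrderRequest:
    """Explicit contract: two required fields and one optional field."""

    order_id: str
    quantity: int
    note: Optional[str] = None


def parse_create_order(payload: dict[str, Any]) -> CreateOrderRequest:
    """Validate a raw payload; extra unknown keys are ignored."""
    order_id = payload.get("order_id")
    quantity = payload.get("quantity")
    note = payload.get("note")
    if not isinstance(order_id, str) or not order_id:
        raise ValueError("order_id must be a non-empty string")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(quantity, bool) or not isinstance(quantity, int) or quantity <= 0:
        raise ValueError("quantity must be a positive integer")
    if note is not None and not isinstance(note, str):
        raise ValueError("note must be a string when present")
    return CreateOrderRequest(order_id=order_id, quantity=quantity, note=note)
```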

Output Format

Structure your review as:

## Distributed Systems Safety Review

### 1. Data/Code Decoupling: [PASS|WARN|FAIL]
[explanation]

### 2. Single-Service Changes: [PASS|WARN|FAIL]
[explanation]

### 3. One Version Step: [PASS|WARN|FAIL]
[explanation]

### 4. Release Process Safety: [PASS|WARN|FAIL]
[explanation]

### 5. API Cleanliness: [PASS|WARN|FAIL]
[explanation]

### Summary
[overall assessment and recommended actions]
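
If the review output is consumed by a script (for example, to block a deploy on any FAIL), the headings above are regular enough to parse. The following is a minimal sketch that assumes the rating appears as a single word after the colon, optionally wrapped in brackets; `parse_ratings` and `has_blocking_failure` are hypothetical helper names.

```python
import re

# Matches lines such as "### 1. Data/Code Decoupling: FAIL".
RATING_LINE = re.compile(
    r"^### \d+\. (.+?): \[?(PASS|WARN|FAIL)\]?\s*$", re.MULTILINE
)


def parse_ratings(review: str) -> dict[str, str]:
    """Map each principle heading to its PASS/WARN/FAIL rating."""
    return {m.group(1): m.group(2) for m in RATING_LINE.finditer(review)}


def has_blocking_failure(review: str) -> bool:
    """True if any principle was rated FAIL."""
    return "FAIL" in parse_ratings(review).values()
```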

How to Run

1. If given file paths or a PR number, review those specific changes.
2. If no arguments, review the current staged git changes (git diff --cached), or if nothing is staged, review unstaged changes (git diff).
3. Read relevant files to understand the full context of each change.
4. Apply each of the 5 checks above.
5. Output the structured review.
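
The fallback from staged to unstaged changes described above can be sketched in Python as follows. It covers only the no-arguments case; handling file paths or a PR number is left out. `run_git` and `select_diff` are hypothetical helper names, and the runner is injectable so the selection logic can be exercised without a repository.

```python
import subprocess
from typing import Callable, Sequence


def run_git(args: Sequence[str]) -> str:
    """Run a git subcommand in the current directory and return its stdout."""
    result = subprocess.run(
        ["git", *args], capture_output=True, text=True, check=True
    )
    return result.stdout


def select_diff(run: Callable[[Sequence[str]], str] = run_git) -> str:
    """Prefer staged changes; fall back to unstaged ones if nothing is staged."""
    staged = run(["diff", "--cached"])
    if staged.strip():
        return staged
    return run(["diff"])
```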

$ARGUMENTS

Want to make any final edits? I'm going to publish this conversation online.

Looks good to me — no edits needed. It's ready to use as /dist-review.

Create Skill to Review Distributed Systems Safety
