Implement the following plan:
Gitgres: PostgreSQL as a Git Backend
Context
Store git objects and refs in Postgres tables. Two implementations: pure SQL (PL/pgSQL functions, works on managed Postgres) and a C extension (native types, faster parsing). A libgit2-based bridge program registers Postgres as the storage backend so standard git push/fetch/clone work against the database.
Project Structure
gitgres/
Makefile # top-level, delegates to sub-makes
sql/
schema.sql # tables, indexes
functions/
object_hash.sql
object_read_write.sql
tree_parse.sql
commit_parse.sql
ref_manage.sql
views/
queryable.sql # materialized views
ext/ # Postgres C extension
Makefile # PGXS
gitgres.c # PG_MODULE_MAGIC, init
git_oid_type.c # 20-byte OID type with operators
sha1_hash.c # C SHA1 via OpenSSL
tree_parse.c # C tree entry parser
gitgres.control
sql/gitgres--0.1.sql # CREATE TYPE, CREATE FUNCTION
backend/ # libgit2 pluggable backend
Makefile
main.c # CLI entry point (serves repos)
odb_postgres.c # git_odb_backend implementation
odb_postgres.h
refdb_postgres.c # git_refdb_backend implementation
refdb_postgres.h
writepack_postgres.c # packfile receive handling
import/
gitgres-import.sh # shell script: git plumbing + psql
test/
test_helper.rb
schema_test.rb
object_hash_test.rb
object_store_test.rb
tree_parse_test.rb
commit_parse_test.rb
ref_test.rb
roundtrip_test.rb # push then clone, diff working trees
Schema
Depends on pgcrypto (available on all managed Postgres). OIDs stored as bytea (20 bytes raw).
sql
PL/pgSQL Functions
git_object_hash(type, content) -- SHA1 of <type> <size>\0<content> using pgcrypto digest()
git_object_write(repo_id, type, content) -- compute hash, INSERT ON CONFLICT DO NOTHING, return oid
git_object_read(repo_id, oid) -- return (type, size, content)
git_object_read_prefix(repo_id, prefix, prefix_len) -- abbreviated OID lookup
git_tree_entries(content) -- parse binary tree into (mode, name, entry_oid) rows
git_commit_parse(content) -- parse commit into tree_oid, parent_oids[], author fields, committer fields, message
git_ref_update(repo_id, name, new_oid, old_oid, force) -- compare-and-swap with SELECT FOR UPDATE
git_ls_tree_r(repo_id, tree_oid, prefix) -- recursive tree walk returning (mode, path, oid, type)
Materialized Views
sql
libgit2 Backend (C)
This is the bridge. Implement git_odb_backend and git_refdb_backend structs that store/retrieve from Postgres via libpq. libgit2 handles all protocol work (pack negotiation, delta resolution, ref advertisement).
ODB Backend (odb_postgres.c)
Implements these callbacks from git_odb_backend:
read(data, len, type, backend, oid) -- SELECT type, size, content FROM objects WHERE repo_id=$1 AND oid=$2
read_header(len, type, backend, oid) -- SELECT type, size FROM objects WHERE ...
read_prefix(full_oid, data, len, type, backend, short_oid, prefix_len) -- prefix match on oid column
write(backend, oid, data, len, type) -- INSERT INTO objects ... ON CONFLICT DO NOTHING
exists(backend, oid) -- SELECT 1 FROM objects WHERE ...
exists_prefix(full_oid, backend, short_oid, prefix_len) -- prefix existence check
foreach(backend, cb, payload) -- SELECT oid FROM objects WHERE repo_id=$1, call cb for each
writepack(writepack_out, backend, odb, progress_cb, payload) -- return a git_odb_writepack that accumulates pack bytes, then on commit delegates to libgit2's indexer to extract objects and calls write for each
free(backend) -- close PG connection
The backend struct holds a PGconn* and the repo_id.
Refdb Backend (refdb_postgres.c)
Implements these callbacks from git_refdb_backend:
exists(exists_out, backend, ref_name) -- SELECT 1 FROM refs WHERE ...
lookup(ref_out, backend, ref_name) -- SELECT oid, symbolic FROM refs WHERE ..., construct git_reference
iterator(iter_out, backend, glob) -- SELECT name, oid, symbolic FROM refs WHERE name LIKE ..., return custom iterator
write(backend, ref, force, who, message, old_id, old_target) -- CAS update using a transaction with SELECT FOR UPDATE
rename(ref_out, backend, old_name, new_name, force, who, message) -- UPDATE refs SET name=$1
del(backend, ref_name, old_id, old_target) -- DELETE with CAS check
has_log / ensure_log / reflog_read / reflog_write / reflog_rename / reflog_delete -- operate on the reflog table
lock / unlock -- use Postgres advisory locks (pg_advisory_xact_lock)
free(backend) -- close connection
CLI Entry Point (main.c)
A program that serves one or more repos. For testing, it can run as a one-shot helper that git invokes. Longer term it could serve HTTP smart protocol or the git:// protocol.
Initial approach: implement a git-remote-gitgres helper. When you do git clone gitgres::postgres://localhost/mydb/myrepo, git invokes git-remote-gitgres postgres://localhost/mydb/myrepo. The helper:
- Parses the connection string and repo name
- Connects to Postgres
- Creates a libgit2 repo with the Postgres ODB and refdb backends
- Speaks the transport protocol over stdin/stdout
libgit2 has git_transport_register for custom transports, or we can use the simpler git_remote_callbacks approach.
C Extension (ext/)
The Postgres extension, separate from the libgit2 backend. Adds performance for SQL queries.
git_oid type: 20-byte fixed binary, hex I/O, btree + hash operator classes. Replaces bytea columns with a proper type so you can write WHERE oid = 'abc123...'.
git_object_hash_c(type, content) RETURNS git_oid: C SHA1 via OpenSSL, ~10x faster than pgcrypto
git_tree_entries_c(content) RETURNS SETOF (mode, name, oid): C pointer arithmetic instead of PL/pgSQL byte walking
Built with PGXS, links against OpenSSL.
Import Script (import/gitgres-import.sh)
Shell script that imports an existing git repo into the database using git plumbing commands and psql.
Usage: gitgres-import <repo-path> <conninfo> <repo-name>
git -C <repo> rev-list --objects --all to get all object OIDs
- Pipe through
git -C <repo> cat-file --batch to get type + content
- Compute hashes (or trust git), INSERT into objects table via psql COPY
git -C <repo> show-ref to get all refs, INSERT into refs table
Build Order
-
Schema + object hash + read/write functions -- the foundation. Verify that git_object_hash(3, 'hello') matches git hash-object --stdin.
-
Tree + commit parsing functions -- test against real objects from a git repo using git cat-file.
-
Import script -- import a small test repo (~50 commits). Verify object count matches git rev-list --objects --all | wc -l.
-
Materialized views -- REFRESH after import. Write example queries against commits and trees.
-
libgit2 ODB backend -- implement read/write/exists. Test by creating a libgit2 repo with the Postgres backend, writing objects through it, verifying they appear in the database.
-
libgit2 refdb backend -- implement CRUD + iterator. Test ref creation and lookup.
-
git-remote-gitgres helper -- wire up libgit2 transport so git push/clone work.
-
Roundtrip test -- push a repo, clone it elsewhere, diff the working trees.
-
C extension -- git_oid type, C hash, C tree parse. Benchmark against PL/pgSQL.
-
Scale test -- import homebrew-core.
Testing
Minitest. Each test connects to a gitgres_test database, runs schema setup in a BEGIN, rolls back in teardown. SQL functions tested by comparing output against git CLI on the same objects.
Roundtrip integration test: create temp repo with commits, push via the backend, clone into new directory, diff -r.
Key Decisions
- bytea(20) for OIDs: half the storage of hex text, faster comparisons. The C extension upgrades this to a proper
git_oid type.
- Content stored without git header: type and size in columns, OID recomputed from type+content on write for verification.
- libgit2 for the protocol bridge: it handles pack negotiation, delta resolution, ref advertisement, capabilities. We only implement storage.
- writepack delegates to libgit2 indexer: when receiving a push, we let libgit2 parse the packfile and extract objects, then store them individually. We don't need our own pack parser.
- pgcrypto for pure SQL SHA1: available on all managed Postgres services.
- No delta compression in pack generation initially: libgit2 handles this when building packs for fetch.
If you need specific details from before exiting plan mode (like exact code snippets, error messages, or content you generated), read the full transcript at: /Users/andrew/.claude/projects/-Users-andrew-code-gitgres/0d86fef0-4a2d-473c-bb7e-29e502603055.jsonl