Now I have a thorough understanding of both the forges module and the git-pkgs CLI tool, so I can design the plan. First, let me check what the GitHub API provides for listing org repos and what the other SDKs offer for org listing.
Let me now think about the design and write the plan.
Key decisions:
- Forges module changes: Add ListRepositories(ctx, owner) returning []Repository to the Forge interface. Each backend implements it with pagination.
- Orgs CLI tool: New Go module in this directory that:
  - Takes an org name + forge domain
  - Uses forges to list repos
  - Clones each repo (shallow)
  - Runs git-pkgs indexing on each
  - Stores per-repo databases in a data directory
  - Maintains a meta SQLite DB that tracks repos and aggregated data
  - Provides org-level queries
For the CLI commands, I'm thinking of git orgs init to bootstrap the org by listing and indexing all repos, git orgs stats for aggregated metrics, git orgs search to find which repos use a specific package, and git orgs vulns to surface vulnerabilities across the organization.
I'd also want git orgs licenses for compliance checking, git orgs list to see what's being tracked, and git orgs sync to refresh the indexes.
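Git runs an executable named git-orgs found on PATH when you type git orgs, so the CLI can be a plain binary that dispatches on its first argument. This sketch assumes a simple map from subcommand names to handlers; the handlers are hypothetical stubs, and a flag or command library would work equally well.

```go
package main

import (
	"fmt"
	"os"
	"sort"
)

// command is a hypothetical handler signature; real handlers would also take
// a context, parsed flags, and workspace paths.
type command func(args []string) error

// notYet returns a placeholder handler that just reports its arguments.
func notYet(name string) command {
	return func(args []string) error {
		fmt.Printf("%s: not implemented (args=%v)\n", name, args)
		return nil
	}
}

var commands = map[string]command{
	"init":     notYet("init"),     // list repos via forges, clone, index
	"sync":     notYet("sync"),     // refresh clones and indexes
	"list":     notYet("list"),     // show tracked repos
	"stats":    notYet("stats"),    // aggregated metrics
	"search":   notYet("search"),   // which repos use a package
	"vulns":    notYet("vulns"),    // vulnerabilities across the org
	"licenses": notYet("licenses"), // license compliance
}

// commandNames returns the subcommands in sorted order for usage messages.
func commandNames() []string {
	names := make([]string, 0, len(commands))
	for n := range commands {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

// dispatch routes the first argument to its handler.
func dispatch(args []string) error {
	if len(args) == 0 {
		return fmt.Errorf("usage: git orgs <command>; commands: %v", commandNames())
	}
	cmd, ok := commands[args[0]]
	if !ok {
		return fmt.Errorf("unknown command %q", args[0])
	}
	return cmd(args[1:])
}

func main() {
	if err := dispatch(os.Args[1:]); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```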
For the meta database structure, I really just need a repos table with basic info like the full name, owner, language, and last indexed timestamp. The key insight is that each repo already has its own complete dependency database, so the meta DB doesn't need to duplicate that—it just tracks which repos exist and where to find their databases. Then I can leverage SQLite's ability to attach multiple databases at query time, letting me run cross-repo queries by attaching each repo's database and querying across them all at once.
But there's a practical limit: SQLite allows at most 10 attached databases by default (SQLITE_MAX_ATTACHED), and that ceiling can only be raised, to a maximum of 125, at compile time. For organizations with more repos than that, the approach won't work. A better approach is to have the meta database pull aggregated snapshots from each repo's database during sync, storing the dependency data alongside a repo reference. That way org-level queries hit a single database.
Defining the schema...
Now I'm thinking through the sync workflow: for each repo, I'd open its local database, extract the latest dependencies and stats, then upsert them into the meta database. For the directory structure, since this is a CLI tool in the orgs repo, I'm organizing it with a main entry point, command modules for each operation, and internal packages for database and workspace management.
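The sync loop can be sketched against two small interfaces so the logic is testable without real databases. RepoReader, MetaStore, and Dependency are hypothetical; in practice LatestDependencies would query a repo's .git/pkgs.sqlite3, and ReplaceSnapshot would delete and re-insert that repo's rows in one meta-database transaction. Errors are collected per repo so one broken clone doesn't stop the rest of the org from syncing (errors.Join needs Go 1.20 or later).

```go
package main

import (
	"errors"
	"fmt"
)

// Dependency is a hypothetical flattened row read from one repo's database.
type Dependency struct {
	Ecosystem, Name, Requirement string
}

// RepoReader stands in for opening a repo's database and reading its latest deps.
type RepoReader interface {
	LatestDependencies(repo string) ([]Dependency, error)
}

// MetaStore stands in for the meta database; ReplaceSnapshot is meant to
// delete and re-insert one repo's rows inside a single transaction.
type MetaStore interface {
	ReplaceSnapshot(repo string, deps []Dependency) error
}

// syncAll refreshes each repo's snapshot, collecting failures instead of
// stopping at the first one.
func syncAll(repos []string, r RepoReader, m MetaStore) error {
	var errs []error
	for _, repo := range repos {
		deps, err := r.LatestDependencies(repo)
		if err == nil {
			err = m.ReplaceSnapshot(repo, deps)
		}
		if err != nil {
			errs = append(errs, fmt.Errorf("%s: %w", repo, err))
		}
	}
	return errors.Join(errs...) // nil when every repo succeeded
}

// In-memory implementations used only for the example.
type mapReader map[string][]Dependency

func (m mapReader) LatestDependencies(repo string) ([]Dependency, error) {
	deps, ok := m[repo]
	if !ok {
		return nil, errors.New("no database for repo")
	}
	return deps, nil
}

type memStore map[string][]Dependency

func (s memStore) ReplaceSnapshot(repo string, deps []Dependency) error {
	s[repo] = deps
	return nil
}

func main() {
	reader := mapReader{"myorg/api": {{"gomod", "golang.org/x/net", "v0.20.0"}}}
	store := memStore{}
	err := syncAll([]string{"myorg/api", "myorg/missing"}, reader, store)
	fmt.Println("synced:", len(store), "errors:", err)
}
```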
But the user's actual goal is investigatory—they want to explore using the forges module to aggregate stats across multiple repos. So rather than over-engineering the full CLI, I should focus on the core pieces: leveraging the new ListRepositories functionality, building a minimal CLI that can list org repos, clone and index them, and surface aggregated stats. One constraint I'm noticing is that git-pkgs keeps its internals private, so I'd need to either shell out to existing CLI commands, duplicate logic, or work with the public modules like manifests and registries. Shelling out seems like the most pragmatic approach for now.
Actually, there's a better path: I can clone the repos, call git pkgs init on each one to populate their databases, then read the .git/pkgs.sqlite3 files directly since the schema is stable and versioned. This gives me full data access without reimplementing internal code. Now I need to think through the Forge interface change—adding ListRepositories will break all existing implementations, so I should consider a separate interface to avoid that.
I'm defining an OrgForge interface with the ListRepositories method, then having the Client check if a forge implements it before calling the method. But actually, since all four backends can support listing repositories, I should just add it directly to the main interface instead.
The forges module is still in v0.x, so interface changes are expected. Looking at how each SDK handles this: GitHub has separate methods for listing org and user repositories, GitLab uses different endpoints for groups versus users, and Gitea has dedicated methods for org and user repos as well.
For Bitbucket, it's a simple GET request to the repositories endpoint. The tricky part is that the caller shouldn't need to know whether the owner is an organization or a user; the implementation should handle that abstraction. I could try the org endpoint first and fall back to the user endpoint when the org lookup returns not found, or add an optional OwnerType field to ListOptions to let callers specify. The simplest approach is probably to attempt both endpoints transparently.
Now I'm thinking through the interface design: should ListRepositories live on the Forge interface taking just the owner, while the Client method takes the full URL and parses it out? That would keep the pattern consistent with how FetchRepository and FetchTags work.
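If the Client method takes a URL and the Forge method takes only the owner, the Client needs a small parsing step like the one below. parseOwnerURL is a hypothetical helper: it accepts input with or without a scheme, lowercases the host so it can be used to pick a backend by domain, and keeps the full owner path so a nested GitLab group such as group/subgroup passes through intact.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// parseOwnerURL splits "https://github.com/myorg" or "github.com/myorg" into
// a lowercased forge domain and the owner path.
func parseOwnerURL(in string) (domain, owner string, err error) {
	raw := in
	if !strings.Contains(raw, "://") {
		raw = "https://" + raw // allow scheme-less input
	}
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", err
	}
	owner = strings.Trim(u.Path, "/")
	if u.Host == "" || owner == "" {
		return "", "", fmt.Errorf("expected <domain>/<owner>, got %q", in)
	}
	return strings.ToLower(u.Host), owner, nil
}

func main() {
	for _, in := range []string{"https://github.com/myorg", "gitlab.com/group/subgroup", "codeberg.org"} {
		domain, owner, err := parseOwnerURL(in)
		fmt.Printf("%q -> domain=%q owner=%q err=%v\n", in, domain, owner, err)
	}
}
```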
For the CLI itself, I'll start with a focused two-phase approach: first add the ListRepositories method to the Forge interface and implement it across all four backends with tests, then build out the orgs CLI command.
The CLI will need a meta database to track repositories and dependency snapshots, with commands to initialize by listing repos from the forges and cloning them, gather organization-level dependency statistics, search for which repos use specific packages, and list all tracked repositories.
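Once snapshots live in one meta database, search and stats reduce to single-table queries. To keep the sketch runnable without a SQLite driver, it models the meta database as an in-memory map and shows roughly equivalent SQL in comments; the table and column names there (repos, dependencies, repo_id, ecosystem, name) are assumptions, as are the Snapshots and PackageCount types.

```go
package main

import (
	"fmt"
	"sort"
)

type Dependency struct{ Ecosystem, Name string }

// Snapshots maps a repo's full name to its latest dependencies, mirroring
// what the meta database would hold after sync.
type Snapshots map[string][]Dependency

// ReposUsing answers "which repos use this package?", roughly:
//   SELECT DISTINCT r.full_name FROM dependencies d JOIN repos r ON r.id = d.repo_id
//   WHERE d.ecosystem = ? AND d.name = ? ORDER BY r.full_name;
func (s Snapshots) ReposUsing(ecosystem, name string) []string {
	var out []string
	for repo, deps := range s {
		for _, d := range deps {
			if d.Ecosystem == ecosystem && d.Name == name {
				out = append(out, repo)
				break // count each repo once
			}
		}
	}
	sort.Strings(out)
	return out
}

type PackageCount struct {
	Dependency
	Repos int
}

// TopPackages counts how many distinct repos depend on each package, roughly:
//   SELECT ecosystem, name, COUNT(DISTINCT repo_id) AS repos FROM dependencies
//   GROUP BY ecosystem, name ORDER BY repos DESC, name, ecosystem LIMIT ?;
func (s Snapshots) TopPackages(limit int) []PackageCount {
	counts := map[Dependency]int{}
	for _, deps := range s {
		seen := map[Dependency]bool{} // dedupe repeats within one repo
		for _, d := range deps {
			if !seen[d] {
				seen[d] = true
				counts[d]++
			}
		}
	}
	out := make([]PackageCount, 0, len(counts))
	for d, n := range counts {
		out = append(out, PackageCount{d, n})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Repos != out[j].Repos {
			return out[i].Repos > out[j].Repos
		}
		if out[i].Name != out[j].Name {
			return out[i].Name < out[j].Name
		}
		return out[i].Ecosystem < out[j].Ecosystem
	})
	if limit >= 0 && limit < len(out) {
		out = out[:limit]
	}
	return out
}

func main() {
	snap := Snapshots{
		"myorg/api": {{"npm", "express"}, {"npm", "lodash"}},
		"myorg/web": {{"npm", "lodash"}},
	}
	fmt.Println(snap.ReposUsing("npm", "lodash"))
	for _, pc := range snap.TopPackages(5) {
		fmt.Printf("%s:%s used by %d repos\n", pc.Ecosystem, pc.Name, pc.Repos)
	}
}
```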