Implement the following plan:
Go REUSE parsing library
Parsing library for the REUSE specification v3.3 at github.com/git-pkgs/reuse. Extracts SPDX license and copyright information from files, REUSE.toml, .reuse/dep5, and .license sidecars. No CLI, no linting output, just parsing.
Context
git-pkgs needs to extract licensing information from repositories. The REUSE spec (fsfe.org) standardises where and how license/copyright info lives in a project. The Python reference implementation (fsfe/reuse-tool) is GPL-licensed and has complex dependencies. A focused Go parsing library lets git-pkgs consume REUSE data without shelling out to Python.
Test fixtures
Add fsfe/reuse-example as a git submodule at testdata/reuse-example for conformance testing. Its main branch is compliant, with LICENSES/, SPDX headers in source files, and .license sidecars for images.
Also create testdata/fake_repository/ modelled on the reuse-tool's test fixtures, with various license files, source files with headers, and edge cases.
Dependencies
- github.com/BurntSushi/toml for REUSE.toml parsing (Go stdlib has no TOML parser)
- SPDX expressions stored as raw strings; no dependency on git-pkgs/spdx (consumers validate if needed)
- No dep5 library needed; write a minimal parser for the Debian copyright format
Files
All under /Users/andrew/code/git-pkgs/reuse/.
reuse.go - Package docs and top-level types
Core types that other files use:
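Below is a minimal sketch of what reuse.go could hold. The plan leaves the field and constant names open, so LicenseExpressions, CopyrightNotices, Contributors, SourcePath, SourceType, the Source* constants and ParsePrecedence are all hypothetical. License expressions stay raw strings, in line with the decision not to depend on git-pkgs/spdx. A missing REUSE.toml precedence is read as closest.

```go
package main

import "fmt"

// PrecedenceType is one of the three REUSE.toml precedence values.
type PrecedenceType string

const (
	PrecedenceClosest   PrecedenceType = "closest"
	PrecedenceAggregate PrecedenceType = "aggregate"
	PrecedenceOverride  PrecedenceType = "override"
)

// ParsePrecedence maps a REUSE.toml precedence string to a PrecedenceType,
// treating an absent value as closest. (Hypothetical helper.)
func ParsePrecedence(s string) (PrecedenceType, error) {
	switch p := PrecedenceType(s); p {
	case "":
		return PrecedenceClosest, nil
	case PrecedenceClosest, PrecedenceAggregate, PrecedenceOverride:
		return p, nil
	default:
		return "", fmt.Errorf("reuse: unknown precedence %q", s)
	}
}

// SourceType records where a ReuseInfo came from (hypothetical names).
type SourceType int

const (
	SourceFileHeader SourceType = iota
	SourceDotLicense
	SourceReuseTOML
	SourceDep5
)

// ReuseInfo holds the licensing data found for one file. License
// expressions are raw SPDX strings; consumers validate them if needed.
type ReuseInfo struct {
	LicenseExpressions []string
	CopyrightNotices   []string
	Contributors       []string
	SourcePath         string // file the info was read from
	SourceType         SourceType
}

// IsEmpty reports whether neither licensing nor copyright info was found.
func (r ReuseInfo) IsEmpty() bool {
	return len(r.LicenseExpressions) == 0 && len(r.CopyrightNotices) == 0
}
```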
extract.go - SPDX tag extraction from file contents
Port of Python's extract.py. The core parsing engine.
- ExtractReuseInfo(text string) ReuseInfo - find SPDX-License-Identifier, SPDX-FileCopyrightText, SPDX-FileContributor tags in text
- ExtractFromFile(path string) (ReuseInfo, error) - read a file and extract
- FilterIgnoreBlocks(text string) string - strip REUSE-IgnoreStart/End regions
- Regex patterns ported from Python: copyright notice pattern, license identifier pattern, contributor pattern
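The extraction step can be sketched as follows. The regular expressions are illustrative only and much narrower than reuse-tool's: only the SPDX-*CopyrightText tags count as copyright notices, and only */ and --> are stripped as comment terminators. Treating a NUL byte as the sign of a binary file is an assumption, not the reference behaviour. An unterminated REUSE-IgnoreStart block runs to the end of the text.

```go
package main

import (
	"os"
	"regexp"
	"strings"
)

type ReuseInfo struct {
	LicenseExpressions []string
	CopyrightNotices   []string
	Contributors       []string
}

// tagEnd swallows trailing blanks and a closing */ or --> comment marker.
const tagEnd = `[ \t]*(?:\*/|-->)?[ \t]*$`

// Simplified, illustrative patterns (not reuse-tool's exact regexes).
var (
	licensePattern     = regexp.MustCompile(`(?m)SPDX-License-Identifier:[ \t]+(.*?)` + tagEnd)
	copyrightPattern   = regexp.MustCompile(`(?m)(SPDX-(?:File|Snippet)CopyrightText:[ \t]+.*?)` + tagEnd)
	contributorPattern = regexp.MustCompile(`(?m)SPDX-FileContributor:[ \t]+(.*?)` + tagEnd)
	// Non-greedy, so each Start pairs with the next End; no End means "to end of text".
	ignoreBlock = regexp.MustCompile(`(?s)REUSE-IgnoreStart.*?(?:REUSE-IgnoreEnd|\z)`)
)

// FilterIgnoreBlocks removes REUSE-IgnoreStart ... REUSE-IgnoreEnd regions.
func FilterIgnoreBlocks(text string) string {
	return ignoreBlock.ReplaceAllString(text, "")
}

// captures returns the trimmed, non-empty first submatch of every match.
func captures(re *regexp.Regexp, text string) []string {
	var out []string
	for _, m := range re.FindAllStringSubmatch(text, -1) {
		if v := strings.TrimSpace(m[1]); v != "" {
			out = append(out, v)
		}
	}
	return out
}

// ExtractReuseInfo finds SPDX tags in text outside ignore blocks.
func ExtractReuseInfo(text string) ReuseInfo {
	if strings.ContainsRune(text, '\x00') { // crude binary check (assumption)
		return ReuseInfo{}
	}
	text = FilterIgnoreBlocks(text)
	return ReuseInfo{
		LicenseExpressions: captures(licensePattern, text),
		CopyrightNotices:   captures(copyrightPattern, text),
		Contributors:       captures(contributorPattern, text),
	}
}

// ExtractFromFile reads path and extracts its REUSE info.
func ExtractFromFile(path string) (ReuseInfo, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return ReuseInfo{}, err
	}
	return ExtractReuseInfo(string(data)), nil
}
```

Each copyright notice keeps its SPDX-FileCopyrightText tag, so the line is preserved verbatim. A consumer that wants only the holder can strip the tag afterwards.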
extract_test.go
- Tags in various comment styles (// # /* -- etc.)
- REUSE-IgnoreStart/End filtering
- Multiple copyright notices and licenses in one file
- SPDX-SnippetBegin/End handling
- Empty files, binary files (return empty ReuseInfo)
- Malformed tags
toml.go - REUSE.toml parsing
Port of Python's global_licensing.py (ReuseTOML parts).
- ParseReuseTOML(content string) (*ReuseTOML, error)
- ParseReuseTOMLFile(path string) (*ReuseTOML, error)
- (a *Annotation) Matches(path string) bool - glob matching with * and ** support
- (t *ReuseTOML) ReuseInfoOf(path string) (ReuseInfo, PrecedenceType, bool) - find matching annotation for a path
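The following sketches the REUSE.toml data model and lookup. It assumes BurntSushi/toml decodes into these structs; the toml tags and the StringOrSlice type are hypothetical. StringOrSlice follows the shape of that library's Unmarshaler interface, UnmarshalTOML(any) error, so path, SPDX-FileCopyrightText and SPDX-License-Identifier can each be either a string or an array. It assumes arrays arrive as []any. Matches inlines a compact regexp translation of the glob rules so the block compiles on its own. ReuseInfoOf scans from the end, so the last matching annotation wins. ParseReuseTOML itself is left out to avoid an external import; it would call toml.Decode and then reject any version other than 1.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

type ReuseInfo struct {
	LicenseExpressions []string
	CopyrightNotices   []string
}

type PrecedenceType string

const (
	Closest   PrecedenceType = "closest"
	Aggregate PrecedenceType = "aggregate"
	Override  PrecedenceType = "override"
)

// StringOrSlice accepts `key = "x"` as well as `key = ["x", "y"]`.
type StringOrSlice []string

// UnmarshalTOML mirrors BurntSushi/toml's Unmarshaler method (assumed: arrays come in as []any).
func (s *StringOrSlice) UnmarshalTOML(v any) error {
	switch t := v.(type) {
	case string:
		*s = StringOrSlice{t}
	case []any:
		out := make(StringOrSlice, 0, len(t))
		for _, e := range t {
			str, ok := e.(string)
			if !ok {
				return fmt.Errorf("expected string, got %T", e)
			}
			out = append(out, str)
		}
		*s = out
	default:
		return fmt.Errorf("expected string or array, got %T", v)
	}
	return nil
}

type Annotation struct {
	Paths      StringOrSlice  `toml:"path"`
	Precedence PrecedenceType `toml:"precedence"`
	Copyrights StringOrSlice  `toml:"SPDX-FileCopyrightText"`
	Licenses   StringOrSlice  `toml:"SPDX-License-Identifier"`
}

type ReuseTOML struct {
	Version     int          `toml:"version"`
	Annotations []Annotation `toml:"annotations"`
}

// globRegexp: * stays inside one path segment, ** crosses /, \* is a literal asterisk.
func globRegexp(pattern string) *regexp.Regexp {
	var b strings.Builder
	b.WriteString("^")
	for i := 0; i < len(pattern); i++ {
		switch {
		case strings.HasPrefix(pattern[i:], `\*`):
			b.WriteString(`\*`)
			i++
		case strings.HasPrefix(pattern[i:], "**"):
			b.WriteString(".*")
			i++
		case pattern[i] == '*':
			b.WriteString("[^/]*")
		default:
			b.WriteString(regexp.QuoteMeta(pattern[i : i+1]))
		}
	}
	b.WriteString("$")
	return regexp.MustCompile(b.String())
}

func (a *Annotation) Matches(path string) bool {
	for _, p := range a.Paths {
		if globRegexp(p).MatchString(path) {
			return true
		}
	}
	return false
}

// ReuseInfoOf returns the last matching annotation; an empty precedence means closest.
func (t *ReuseTOML) ReuseInfoOf(path string) (ReuseInfo, PrecedenceType, bool) {
	for i := len(t.Annotations) - 1; i >= 0; i-- {
		a := &t.Annotations[i]
		if !a.Matches(path) {
			continue
		}
		prec := a.Precedence
		if prec == "" {
			prec = Closest
		}
		return ReuseInfo{LicenseExpressions: a.Licenses, CopyrightNotices: a.Copyrights}, prec, true
	}
	return ReuseInfo{}, "", false
}
```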
toml_test.go
- Valid REUSE.toml with single and multiple annotations
- Glob pattern matching (* vs ** vs escaped)
- Precedence values (closest, aggregate, override)
- Missing version field, invalid TOML, wrong version number
- Last-match-wins when multiple annotations match
- Copyright and license as string vs array
dep5.go - .reuse/dep5 parsing
Minimal Debian copyright format 1.0 parser (no external dep).
- ParseDep5(content string) (*Dep5, error)
- ParseDep5File(path string) (*Dep5, error)
- (d *Dep5) ReuseInfoOf(path string) (ReuseInfo, bool) - find matching paragraph for a path
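A minimal sketch of the dep5 parser follows, built on these assumptions:

- Paragraphs are separated by blank lines.
- Continuation lines start with a space or tab, and a lone "." stands for an empty line.
- Field names compare case-insensitively.
- The first paragraph must carry a Format field.

In Files patterns, * matches any run of characters and ? matches any single character, both including /. When several Files paragraphs match, the last one applies, as in the Debian copyright format 1.0. Backslash escapes in patterns are not handled. Dep5Paragraph and the error messages are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strings"
)

type ReuseInfo struct {
	LicenseExpressions []string
	CopyrightNotices   []string
}

// Dep5Paragraph is one Files paragraph (hypothetical type).
type Dep5Paragraph struct {
	Files     []string // whitespace-separated patterns
	Copyright []string // one notice per line
	License   string   // first line of the License field
}

type Dep5 struct{ Paragraphs []Dep5Paragraph }

func ParseDep5(content string) (*Dep5, error) {
	var paras []map[string][]string
	cur, field := map[string][]string{}, ""
	flush := func() {
		if len(cur) > 0 {
			paras = append(paras, cur)
		}
		cur, field = map[string][]string{}, ""
	}
	for _, line := range strings.Split(strings.ReplaceAll(content, "\r\n", "\n"), "\n") {
		switch {
		case strings.TrimSpace(line) == "":
			flush() // a blank line ends the paragraph
		case line[0] == ' ' || line[0] == '\t':
			if field == "" {
				return nil, fmt.Errorf("dep5: continuation line without a field: %q", line)
			}
			v := strings.TrimSpace(line)
			if v == "." { // " ." encodes an empty line
				v = ""
			}
			cur[field] = append(cur[field], v)
		default:
			name, value, ok := strings.Cut(line, ":")
			if !ok {
				return nil, fmt.Errorf("dep5: malformed line %q", line)
			}
			field = strings.ToLower(strings.TrimSpace(name)) // field names are case-insensitive
			cur[field] = []string{strings.TrimSpace(value)}
		}
	}
	flush()
	if len(paras) == 0 || len(paras[0]["format"]) == 0 {
		return nil, errors.New("dep5: header paragraph has no Format field")
	}
	d := &Dep5{}
	for _, p := range paras[1:] {
		files := strings.Fields(strings.Join(p["files"], " "))
		if len(files) == 0 {
			continue // e.g. standalone License paragraphs
		}
		var notices []string
		for _, c := range p["copyright"] {
			if c != "" {
				notices = append(notices, c)
			}
		}
		if len(notices) == 0 || len(p["license"]) == 0 || p["license"][0] == "" {
			return nil, fmt.Errorf("dep5: Files %v lacks Copyright or License", files)
		}
		d.Paragraphs = append(d.Paragraphs, Dep5Paragraph{Files: files, Copyright: notices, License: p["license"][0]})
	}
	return d, nil
}

// dep5Match: * is any run of characters, ? any single character, both crossing /.
func dep5Match(pattern, path string) bool {
	var b strings.Builder
	for _, r := range pattern {
		switch r {
		case '*':
			b.WriteString(".*")
		case '?':
			b.WriteString(".")
		default:
			b.WriteString(regexp.QuoteMeta(string(r)))
		}
	}
	return regexp.MustCompile(`(?s)^` + b.String() + `$`).MatchString(path)
}

// ReuseInfoOf returns the info from the last Files paragraph matching path.
func (d *Dep5) ReuseInfoOf(path string) (ReuseInfo, bool) {
	var found *Dep5Paragraph
	for i := range d.Paragraphs {
		for _, pat := range d.Paragraphs[i].Files {
			if dep5Match(pat, path) {
				found = &d.Paragraphs[i]
				break
			}
		}
	}
	if found == nil {
		return ReuseInfo{}, false
	}
	return ReuseInfo{LicenseExpressions: []string{found.License}, CopyrightNotices: found.Copyright}, true
}
```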
dep5_test.go
- Valid dep5 with multiple file paragraphs
- Wildcard patterns (*, ?)
- Multi-line copyright fields
- Missing required fields
- Continuation lines
covered.go - Covered file logic
Determines which files need licensing info and which are excluded per the spec.
- IsCoveredFile(path string) bool - checks against exclusion patterns
- IsIgnoredDir(name string) bool - .git, .hg, LICENSES, .reuse
- IsIgnoredFile(name string) bool - LICENSE*, COPYING*, *.license, REUSE.toml, *.spdx
- CoveredFiles(root string) ([]string, error) - walk directory returning covered files
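Here is a sketch of the exclusion checks. The file-name patterns are assumptions to verify against the spec:

- LICENSE, LICENCE or COPYING, optionally followed by - or . and a suffix
- *.license
- REUSE.toml
- SPDX documents ending in .spdx or .spdx.{rdf,json,xml,yml,yaml}

CoveredFiles relies on filepath.WalkDir not following symlinks. It skips ignored directories with filepath.SkipDir and drops symlinks, non-regular files and zero-byte files.

```go
package main

import (
	"io/fs"
	"path/filepath"
	"regexp"
	"strings"
)

// Assumed ignore patterns; verify against the REUSE spec before relying on them.
var ignoredFile = regexp.MustCompile(
	`^(?:(?:LICEN[CS]E|COPYING)(?:[-.].*)?|.*\.license|REUSE\.toml|.*\.spdx(?:\.(?:rdf|json|xml|ya?ml))?)$`)

// IsIgnoredDir reports directories whose contents are never covered.
func IsIgnoredDir(name string) bool {
	switch name {
	case ".git", ".hg", "LICENSES", ".reuse":
		return true
	}
	return false
}

// IsIgnoredFile reports file names excluded from coverage.
func IsIgnoredFile(name string) bool { return ignoredFile.MatchString(name) }

// IsCoveredFile applies the name rules to a relative path. Size and symlink
// checks need the filesystem and are done in CoveredFiles.
func IsCoveredFile(path string) bool {
	parts := strings.Split(filepath.ToSlash(path), "/")
	for _, dir := range parts[:len(parts)-1] {
		if IsIgnoredDir(dir) {
			return false
		}
	}
	return !IsIgnoredFile(parts[len(parts)-1])
}

// CoveredFiles walks root and returns covered files as slash-separated relative paths.
func CoveredFiles(root string) ([]string, error) {
	var files []string
	err := filepath.WalkDir(root, func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			if p != root && IsIgnoredDir(d.Name()) {
				return filepath.SkipDir
			}
			return nil
		}
		// WalkDir does not follow symlinks, so d.Type() reports them directly.
		if d.Type()&fs.ModeSymlink != 0 || IsIgnoredFile(d.Name()) {
			return nil
		}
		info, err := d.Info()
		if err != nil {
			return err
		}
		if !info.Mode().IsRegular() || info.Size() == 0 {
			return nil // skip devices, sockets and zero-byte files
		}
		rel, err := filepath.Rel(root, p)
		if err != nil {
			return err
		}
		files = append(files, filepath.ToSlash(rel))
		return nil
	})
	return files, err
}
```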
covered_test.go
- Exclusion of LICENSE, COPYING, .git, LICENSES/, .reuse/
- .license sidecar files excluded
- REUSE.toml excluded
- SPDX documents excluded
- Zero-byte files excluded
- Symlinks excluded
- Normal source files included
project.go - Project-level parsing
Ties everything together. Given a project root, find and parse all licensing info.
- OpenProject(root string) (*Project, error) - discover REUSE.toml or dep5, scan LICENSES/
- (p *Project) ReuseInfoOf(path string) (ReuseInfo, error) - resolve all sources with precedence:
  - Check for .license sidecar
  - Check REUSE.toml override annotations
  - Extract from file header
  - Check REUSE.toml closest/aggregate annotations
  - Check dep5
- (p *Project) AllReuseInfo() (map[string]ReuseInfo, error) - walk all covered files
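Once the individual sources have been looked up, the resolution order for ReuseInfoOf can be expressed as a pure function. This is a sketch, and resolve, merge and sidecarPath are hypothetical helper names. Here own is the info read from path + ".license" when that sidecar exists, and from the file header otherwise. The three REUSE.toml precedence values behave as follows:

- override replaces everything.
- aggregate combines the file's info with the annotation.
- closest, simplified here, keeps the file's own info unless it is empty.

Combining dep5 info with the file's own info is an assumption to check against the spec.

```go
package main

type ReuseInfo struct {
	LicenseExpressions []string
	CopyrightNotices   []string
}

type PrecedenceType string

const (
	Closest   PrecedenceType = "closest"
	Aggregate PrecedenceType = "aggregate"
	Override  PrecedenceType = "override"
)

func (r ReuseInfo) isEmpty() bool {
	return len(r.LicenseExpressions) == 0 && len(r.CopyrightNotices) == 0
}

// merge concatenates two ReuseInfo values without aliasing either input.
func merge(a, b ReuseInfo) ReuseInfo {
	return ReuseInfo{
		LicenseExpressions: append(append([]string(nil), a.LicenseExpressions...), b.LicenseExpressions...),
		CopyrightNotices:   append(append([]string(nil), a.CopyrightNotices...), b.CopyrightNotices...),
	}
}

// sidecarPath names the .license file that, when present, replaces the header.
func sidecarPath(path string) string { return path + ".license" }

// resolve applies precedence to sources that have already been looked up.
// ann/prec come from the last matching REUSE.toml annotation (ann nil if none);
// dep5 is the matching dep5 paragraph (nil if none).
func resolve(own ReuseInfo, ann *ReuseInfo, prec PrecedenceType, dep5 *ReuseInfo) ReuseInfo {
	if ann != nil {
		switch prec {
		case Override:
			return *ann // the file's own info is ignored
		case Aggregate:
			return merge(own, *ann)
		default: // closest, simplified: the file wins if it has any info
			if own.isEmpty() {
				return *ann
			}
			return own
		}
	}
	if dep5 != nil {
		return merge(own, *dep5) // assumption: dep5 info is combined with the file's own
	}
	return own
}
```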
project_test.go
- Uses testdata/reuse-example submodule for real-world conformance
- Uses testdata/fake_repository for edge cases
- Project with REUSE.toml
- Project with dep5
- Project with neither (header-only)
- Precedence resolution between sources
- .license sidecar overrides
glob.go - REUSE.toml glob matching
Custom glob implementation matching the REUSE.toml spec:
- * matches everything except /
- ** matches everything including /
- \\* is a literal asterisk
- Forward slashes only
- GlobMatch(pattern, path string) bool
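GlobMatch can be sketched recursively following the four rules above. The sketch works byte by byte on forward-slash paths. A \* in the decoded pattern, written \\* inside a TOML basic string, matches a literal asterisk.

```go
package main

import "strings"

// GlobMatch reports whether a forward-slash path matches a REUSE.toml-style glob:
// * matches any run of characters except /, ** matches anything including /,
// and \* matches a literal asterisk.
func GlobMatch(pattern, path string) bool {
	for len(pattern) > 0 {
		switch {
		case strings.HasPrefix(pattern, `\*`):
			if !strings.HasPrefix(path, "*") {
				return false
			}
			pattern, path = pattern[2:], path[1:]
		case strings.HasPrefix(pattern, "**"):
			// Try every split point, including ones that consume slashes.
			rest := pattern[2:]
			for i := 0; i <= len(path); i++ {
				if GlobMatch(rest, path[i:]) {
					return true
				}
			}
			return false
		case pattern[0] == '*':
			// Consume characters up to, but never including, a slash.
			rest := pattern[1:]
			for i := 0; i <= len(path); i++ {
				if GlobMatch(rest, path[i:]) {
					return true
				}
				if i < len(path) && path[i] == '/' {
					break
				}
			}
			return false
		default:
			if path == "" || path[0] != pattern[0] {
				return false
			}
			pattern, path = pattern[1:], path[1:]
		}
	}
	return path == ""
}
```

Backtracking makes patterns with many stars slow in the worst case, which is usually acceptable for REUSE.toml-sized inputs. Translating each pattern to a compiled regexp is the main alternative.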
glob_test.go
- *.go matches foo.go but not dir/foo.go
- **/*.go matches dir/foo.go and dir/sub/foo.go
- src/** matches everything under src/
- Escaped asterisks
- Edge cases
Infrastructure files
- go.mod - module github.com/git-pkgs/reuse, go 1.25.6, depends only on BurntSushi/toml (no git-pkgs/spdx, per Dependencies)
- LICENSE - MIT, copied from vers
- README.md - usage examples
- .gitignore - *.test, *.prof
- .gitmodules - testdata/reuse-example submodule
- .github/workflows/ci.yml - matching other modules (with submodules: true)
- .github/workflows/zizmor.yml - matching other modules
Build order
- reuse.go - types
- glob.go + tests - glob matching (no deps on other files)
- extract.go + tests - SPDX tag extraction (no deps on other files)
- dep5.go + tests - dep5 parsing
- toml.go + tests - REUSE.toml parsing (depends on glob.go)
- covered.go + tests - covered file logic
- project.go + tests - ties it all together
- Add submodule, infrastructure, README
Verification
- go test -v -race ./... - all tests pass
- go vet ./... - clean
- Tests against the reuse-example submodule confirm real-world parsing works
If you need specific details from before exiting plan mode (like exact code snippets, error messages, or content you generated), read the full transcript at: /Users/andrew/.claude/projects/-Users-andrew-code-git-pkgs/ced24506-3bc7-4de2-84fd-76adc53e1512.jsonl