# Trace: How the system identifies change types

Agent: claude-code | Model: Claude Opus 4.6 | Project: git-ai

---

## User

how does the git hook work in this? what triggers recording the agent activity, what is recorded and does it use notes or refs?

## Assistant

### Tool: Task

**Args:**
```json
{
  "description": "Explore git hook mechanism",
  "prompt": "Research how the git hook works in this codebase. I need to understand:\n\n1. What git hook is used and how it's installed\n2. What triggers recording agent activity\n3. What data is recorded\n4. Whether it uses git notes, refs, or some other mechanism for storage\n\nSearch thoroughly through the codebase - look at hook installation, hook scripts, any recording/logging logic, and how data is stored in git. Look at all relevant source files.",
  "subagent_type": "Explore"
}
```

## Assistant

**Result** (success):
```
Now let me create a comprehensive research summary based on all my findings:

## Git Hook Research - Comprehensive Summary

Based on a thorough exploration of the codebase, here's how the git hook system works and what activity it records:

### 1. Git Hooks Installation & Management

**Hook Types Managed:**
- The system manages **9 core git hooks** through `/Users/andrew/code/git-ai/src/commands/git_hook_handlers.rs`:
  - `pre-commit`
  - `prepare-commit-msg`
  - `post-commit`
  - `pre-rebase`
  - `post-checkout`
  - `post-merge`
  - `pre-push`
  - `post-rewrite`
  - `reference-transaction`

**Installation Process:**
- Located in `/Users/andrew/code/git-ai/src/mdm/hook_installer.rs` and `/Users/andrew/code/git-ai/src/commands/install_hooks.rs`
- The `install-hooks` command checks IDE/coding agent configurations (Claude Code, Cursor, Windsurf, etc.)
- On **Unix**: hooks are installed as symlinks to the git-ai binary
- On **Windows**: hooks are binary copies of the git-ai executable
- The hooks path is configured via `core.hooksPath` git config
- Each IDE has its own installer implementation in `/Users/andrew/code/git-ai/src/mdm/agents/`

**Hook Forwarding Mechanism:**
- When git invokes a hook, it calls the git-ai wrapper binary
- The wrapper parses the git command and dispatches to appropriate hook handlers in `/Users/andrew/code/git-ai/src/commands/hooks/`
- Hook implementations exist for: commit, push, merge, rebase, cherry-pick, fetch, checkout, reset, stash, and switch operations

### 2. Activity Recording - What Gets Triggered

**Pre-Commit Hook** (`commit_hooks.rs`):
- Triggers `checkpoint` operation to track file states before commit
- Stores baseline authorship data (who wrote which lines)
- Captures agent/AI interactions if applicable

**Post-Commit Hook** (`commit_hooks.rs` + `post_commit.rs`):
- **MAIN RECORDING POINT**: Runs after successful commit
- Reads the working log from the parent commit
- Updates prompts/transcripts to latest versions
- Computes file-level and line-level AI attribution statistics
- Creates authorship logs
- Triggers metrics emission (background flush)

**Post-Push Hook** (`push_hooks.rs`):
- Pushes authorship notes to remote
- Triggers metrics flush and CAS (Content-Addressed Storage) upload if configured

**Other Hooks**:
- Handle state changes during rebase, cherry-pick, merge, reset, stash operations
- Maintain consistency of working logs across git operations

### 3. Data Recorded - Checkpoints & Authorship Logs

**Storage Location:**
- **In-repo storage**: `.git/ai/` directory containing:
  - `/working_logs/`: Checkpoint data keyed by commit SHA
  - `/rewrite_log`: JSONL file tracking commit rewrites (amend, rebase, etc.)
  - `/logs/`: Sentry event logs

**What's Stored in Each Checkpoint:**
Located in `/Users/andrew/code/git-ai/src/authorship/working_log.rs`:
- `kind`: Whether change is Human, AiAgent, or AiTab
- `diff`: The unified diff of changes
- `author`: Human-readable author identifier
- `entries`: Per-file working log entries containing:
  - File path
  - `blob_sha`: SHA256 of file content
  - `attributions`: Line-level ownership data (which agent/human wrote what)
  - `line_attributions`: Detailed line-by-line attribution tracking
- `timestamp`: UNIX timestamp of when checkpoint was created
- `transcript`: Optional AI conversation transcript (tool/model dependent)
- `agent_id`: Tool identifier (claude, cursor, codex, etc.) and model name
- `agent_metadata`: Tool-specific metadata (transcript_path, chat_session_path, etc.)
- `line_stats`: Per-checkpoint additions/deletions counts
- `git_ai_version`: Version of git-ai that created the checkpoint

**Git Notes Storage:**
Located in `/Users/andrew/code/git-ai/src/git/refs.rs` and `/sync_authorship.rs`:
- **Primary mechanism**: Git notes (`refs/notes/ai`)
- **What's stored in notes**: Complete AuthorshipLog (serialized JSON) attached to each commit
- **Note content includes**:
  - All line-by-line AI attribution data
  - Prompt session details (prompts, messages)
  - File-level and line-level statistics
  - Author information
  - Commit metadata
- **Syncing**: Notes are pushed/pulled to/from remotes via:
  - Refspec: `refs/notes/ai:refs/notes/ai`
  - Tracking refs for each remote: `refs/notes/ai-remote/<remote-name>`
- **Stash handling**: Authorship logs stored in separate notes ref `refs/notes/ai-stash` during stash operations

**Rewrite Log:**
- JSONL file storing git operation events:
  - Commit creation/amendment
  - Rebase operations (start, complete, abort)
  - Cherry-pick operations (start, complete, abort)
  - Merge/squash events
  - Reset events
  - Stash events
  - Authorship sync events
- Used to maintain attribution consistency across git history rewrites

### 4. Metrics Recording - Activity Telemetry

**Storage**: Observability logs at `~/.git-ai/internal/logs/{PID}.log`

**Event Types Recorded:**
From `/Users/andrew/code/git-ai/src/metrics/events.rs`:

1. **CommittedValues** - Recorded on successful commit with AI changes:
   - `human_additions`: Lines authored by human
   - `git_diff_added_lines`, `git_diff_deleted_lines`: Total diff stats
   - `tool_model_pairs`: Which AI tool/model was used
   - `mixed_additions`: Lines with mixed human/AI authorship
   - `ai_additions`: Lines added by AI
   - `ai_accepted`: Lines user accepted from AI
   - `total_ai_additions`, `total_ai_deletions`: Cumulative AI stats per tool
   - `time_waiting_for_ai`: Total time waiting for AI response
   - `first_checkpoint_ts`: First checkpoint timestamp for this commit
   - `commit_subject`, `commit_body`: Commit message content

2. **AgentUsageValues** - Rate-limited telemetry (min interval: 2.5 minutes per prompt):
   - Tool/model information
   - Base commit SHA
   - Prompt/session identification
   - Repository URL and branch
   - Usage event attributes

3. **CheckpointValues** - Checkpoint event data (when checkpoints are created)

4. **InstallHooksValues** - Hook installation success/failure metrics

**Metrics Pipeline:**
- Events written to observability logs immediately
- Background flush process (`flush-logs` command) uploads batches (max 250 events per envelope)
- Flushes debounced to max once per 60 seconds to avoid storms
- Uploaded to ClickHouse or stored in SQLite for later upload
- Also includes Sentry error/performance events

### 5. No Direct Persistent Metrics Database During Hooks

**Important**: During hook execution, metrics are NOT directly persisted to a database. Instead:
- Events are serialized to observability logs
- Background `flush-logs` process (spawned as subprocess) handles upload
- This keeps hooks fast and non-blocking
- Metrics DB (`MetricsDatabase`) is only consulted for rate-limiting agent usage events

### 6. Configuration & Prompt Storage Modes

The system supports 3 prompt storage modes (from `/post_commit.rs`):
- **Local**: Prompts stored in SQLite database only (stripped from git notes)
- **Notes**: Prompts stored in git notes with secrets redacted
- **Default**: Attempt CAS upload, never keep full messages in git notes, redact secrets

### Key Files Reference

| Component | Location |
|-----------|----------|
| Hook installation | `/src/mdm/hook_installer.rs`, `/src/commands/install_hooks.rs` |
| Hook handlers | `/src/commands/git_hook_handlers.rs` |
| Per-command hooks | `/src/commands/hooks/*.rs` (commit_hooks, push_hooks, etc.) |
| Checkpoint data | `/src/authorship/working_log.rs` |
| Post-commit processing | `/src/authorship/post_commit.rs` |
| Authorship notes | `/src/git/refs.rs`, `/src/git/sync_authorship.rs` |
| Metrics recording | `/src/metrics/events.rs`, `/src/metrics/mod.rs` |
| Observability/logging | `/src/observability/mod.rs` |
| Storage management | `/src/git/repo_storage.rs` |
| Rewrite tracking | `/src/git/rewrite_log.rs` |
agentId: afdd559a78627b939 (for resuming to continue this agent's work if needed)
<usage>total_tokens: 74193
tool_uses: 34
duration_ms: 65602</usage>
```

## Assistant

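The system installs itself as the git hook handler by pointing `core.hooksPath` at a directory of hooks that are symlinks to the git-ai binary on Unix and copies of it on Windows. It manages nine hooks (pre-commit, prepare-commit-msg, post-commit, pre-rebase, post-checkout, post-merge, pre-push, post-rewrite, reference-transaction), and when git invokes any of them, the binary dispatches to the appropriate handler.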
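
As a rough illustration of how one binary can serve many hooks, here is a minimal sketch that picks the hook from the name it was invoked under. Dispatching on `argv[0]` is an assumption on my part (the exploration above only says the binary dispatches to handlers), and `Hook` and `hook_from_argv0` are hypothetical names, not git-ai's API.

```rust
use std::path::Path;

/// The nine hooks listed in the research summary.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Hook {
    PreCommit,
    PrepareCommitMsg,
    PostCommit,
    PreRebase,
    PostCheckout,
    PostMerge,
    PrePush,
    PostRewrite,
    ReferenceTransaction,
}

/// Hypothetical helper: map the invoked program name (argv[0]) to a hook.
/// When a hook is a symlink to a shared binary, argv[0] usually carries the
/// hook's own file name, so the final path component identifies it.
pub fn hook_from_argv0(argv0: &str) -> Option<Hook> {
    let name = Path::new(argv0).file_name()?.to_str()?;
    match name {
        "pre-commit" => Some(Hook::PreCommit),
        "prepare-commit-msg" => Some(Hook::PrepareCommitMsg),
        "post-commit" => Some(Hook::PostCommit),
        "pre-rebase" => Some(Hook::PreRebase),
        "post-checkout" => Some(Hook::PostCheckout),
        "post-merge" => Some(Hook::PostMerge),
        "pre-push" => Some(Hook::PrePush),
        "post-rewrite" => Some(Hook::PostRewrite),
        "reference-transaction" => Some(Hook::ReferenceTransaction),
        // Any other name means the binary was not invoked as a hook.
        _ => None,
    }
}
```

A real `main` would pass `std::env::args().next()` to this helper and fall back to normal CLI parsing when it returns `None`.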

## What triggers recording

The main recording points:

- **Pre-commit**: creates a "checkpoint" capturing the file state and authorship baseline before the commit happens. Each checkpoint records the diff, who authored it (human vs AI agent vs AI tab completion), timestamps, and optionally the conversation transcript.

- **Post-commit**: the heavy lifting. It reads the working log from the parent commit, computes file-level and line-level AI attribution stats, builds an authorship log, and writes it out. It also triggers a background metrics flush.

- **Post-push**: pushes the authorship notes to the remote and flushes metrics.

- **Post-rewrite / post-merge / post-checkout / etc.**: maintain attribution consistency when history gets rewritten (rebase, amend, cherry-pick).

## What gets recorded

Each checkpoint in `.git/ai/working_logs/` stores:

- The kind of change (Human, AiAgent, AiTab)
- The unified diff
- Line-by-line attribution (which agent/human wrote which lines)
- Agent identity (tool like claude/cursor/codex + model name)
- Agent metadata (transcript paths, session info)
- Timestamp, line stats (additions/deletions), git-ai version
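
To make the shape of a checkpoint concrete, here is a simplified sketch of a few of those fields. The field names follow the list above, but the types, the `CheckpointSketch` name, the placeholder version string, and the `line_stats` heuristic are all assumptions for illustration; the real definitions live in `working_log.rs`.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CheckpointKind {
    Human,
    AiAgent,
    AiTab,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct AgentId {
    pub tool: String,
    pub id: String,
    pub model: String,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct LineStats {
    pub additions: u32,
    pub deletions: u32,
}

/// Hypothetical, trimmed-down checkpoint record (types are assumed).
#[derive(Debug, Clone)]
pub struct CheckpointSketch {
    pub kind: CheckpointKind,
    pub diff: String,
    pub timestamp: u64,
    pub agent_id: Option<AgentId>, // None for Human checkpoints
    pub line_stats: LineStats,
    pub git_ai_version: String,
}

/// Count added and deleted lines in a unified diff, skipping the
/// `+++` / `---` file headers. This is a simple heuristic, not a diff parser.
pub fn line_stats(diff: &str) -> LineStats {
    let mut stats = LineStats { additions: 0, deletions: 0 };
    for line in diff.lines() {
        if line.starts_with("+++") || line.starts_with("---") {
            continue;
        }
        if line.starts_with('+') {
            stats.additions += 1;
        } else if line.starts_with('-') {
            stats.deletions += 1;
        }
    }
    stats
}

impl CheckpointSketch {
    pub fn new(kind: CheckpointKind, diff: &str, timestamp: u64, agent_id: Option<AgentId>) -> Self {
        Self {
            kind,
            diff: diff.to_string(),
            timestamp,
            agent_id,
            line_stats: line_stats(diff),
            git_ai_version: "0.0.0-sketch".to_string(), // placeholder value
        }
    }
}
```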

A rewrite log (`.git/ai/rewrite_log`, JSONL) tracks commit rewrites so attribution survives rebases and amends.

## Storage: git notes

Yes, it uses **git notes** under `refs/notes/ai`. The complete `AuthorshipLog` (serialized JSON) gets attached to each commit as a note. This includes all the line-level attribution data, prompt sessions (with secrets redacted depending on config), file stats, and author info.

Syncing to remotes uses the refspec `refs/notes/ai:refs/notes/ai`, with tracking refs at `refs/notes/ai-remote/<remote-name>`. Stash operations use a separate `refs/notes/ai-stash` ref.
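
The ref names above can be assembled like this. It is a minimal sketch: the ref strings come from the exploration, while the helper names and the exact `git push` argument list are assumptions about how a sync might be invoked, not git-ai's actual code.

```rust
/// Notes ref that holds the authorship logs.
pub const AI_NOTES_REF: &str = "refs/notes/ai";
/// Separate notes ref used during stash operations.
pub const AI_STASH_NOTES_REF: &str = "refs/notes/ai-stash";

/// Refspec used when syncing notes: `refs/notes/ai:refs/notes/ai`.
pub fn notes_refspec() -> String {
    format!("{}:{}", AI_NOTES_REF, AI_NOTES_REF)
}

/// Per-remote tracking ref: `refs/notes/ai-remote/<remote-name>`.
pub fn tracking_ref(remote: &str) -> String {
    format!("refs/notes/ai-remote/{}", remote)
}

/// Hypothetical helper: arguments for `git push <remote> <refspec>`.
/// It only builds the argument list; running git is left to the caller.
pub fn push_notes_args(remote: &str) -> Vec<String> {
    vec!["push".to_string(), remote.to_string(), notes_refspec()]
}
```

The resulting vector could be handed to `std::process::Command::new("git").args(...)` by a caller that wants to run the push.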

There are three prompt storage modes configurable in post-commit: **local** (SQLite only, stripped from notes), **notes** (stored in git notes with secrets redacted), and a default mode that attempts a CAS upload and never keeps full messages in notes.
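
A small sketch of how the three modes affect what ends up in a note. The enum and function names are hypothetical, and `redact` is a toy stand-in (it masks words starting with `sk-`) rather than git-ai's actual secret redaction.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PromptStorageMode {
    Local,   // SQLite only, prompts stripped from notes
    Notes,   // prompts kept in notes, secrets redacted
    Default, // CAS upload attempted, full messages never kept in notes
}

/// Toy redactor for illustration: masks any space-separated word that
/// starts with "sk-". Real redaction would be far more thorough.
pub fn redact(message: &str) -> String {
    message
        .split(' ')
        .map(|word| if word.starts_with("sk-") { "[REDACTED]" } else { word })
        .collect::<Vec<_>>()
        .join(" ")
}

/// Returns the messages that would be written into the git note,
/// or `None` when the mode keeps full messages out of notes.
pub fn messages_for_note(mode: PromptStorageMode, messages: &[&str]) -> Option<Vec<String>> {
    match mode {
        PromptStorageMode::Notes => Some(messages.iter().map(|m| redact(m)).collect()),
        PromptStorageMode::Local | PromptStorageMode::Default => None,
    }
}
```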

## User

how does it work out the kind of change?

## Assistant

### Tool: Task

**Args:**
```json
{
  "description": "Find checkpoint kind detection",
  "prompt": "Research how this codebase determines the \"kind\" of change - whether a change is Human, AiAgent, or AiTab. I need to understand the detection mechanism: how does it figure out if a change came from a human, an AI agent (like Claude Code, Cursor, etc.), or AI tab completion?\n\nLook at:\n1. The enum/type definition for the kind (Human/AiAgent/AiTab)\n2. The checkpoint creation logic that sets the kind\n3. Any process detection, file watching, or other mechanism used to determine what made the change\n4. How it detects which specific AI tool is active\n\nSearch thoroughly - look at checkpoint creation, agent detection, process inspection, file system watchers, etc.",
  "subagent_type": "Explore"
}
```

## Assistant

**Result** (success):
```
Perfect! Now I have a comprehensive understanding. Let me create a detailed report:

## Summary: How Git-AI Determines Change Kind (Human/AiAgent/AiTab)

I've thoroughly analyzed the codebase and can now explain the complete detection mechanism for determining whether a change is Human, AiAgent, or AiTab.

### 1. **Enum/Type Definition**

Located in `/Users/andrew/code/git-ai/src/authorship/working_log.rs` (lines 49-54):

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
pub enum CheckpointKind {
    Human,
    AiAgent,
    AiTab,
}
```

Each checkpoint stores this kind in the working log JSON, with serialization as `"human"`, `"ai_agent"`, or `"ai_tab"`.

### 2. **Detection Mechanism Overview**

The "kind" is determined **NOT by automatic process detection**, but by **explicit hook payloads** sent by the AI tool/IDE when it triggers a checkpoint. The flow is:

1. An IDE extension (Claude Code, Cursor, Copilot, etc.) sends a structured JSON hook payload
2. The payload is passed to the appropriate **AgentCheckpointPreset** handler
3. The preset inspects a `hook_event_name` field in the payload
4. Based on `hook_event_name` value, it returns `CheckpointKind::Human` or `CheckpointKind::AiAgent` (or `AiTab`)

### 3. **Hook Event Name Dispatch**

Located in `/Users/andrew/code/git-ai/src/commands/git_ai_handlers.rs` (lines 315-530), the `handle_checkpoint()` function accepts a preset argument and dispatches to the appropriate handler:

**Supported presets:**
- `claude` → ClaudePreset
- `codex` → CodexPreset
- `gemini` → GeminiPreset
- `continue-cli` → ContinueCliPreset
- `cursor` → CursorPreset
- `github-copilot` → GithubCopilotPreset
- `ai_tab` → AiTabPreset
- `agent-v1` → AgentV1Preset
- `droid` → DroidPreset
- `opencode` → OpenCodePreset

### 4. **Human vs AI Detection per Preset**

Each preset examines the `hook_event_name` field in the JSON payload and returns `CheckpointKind::Human` before the AI edit, or `CheckpointKind::AiAgent`/`AiTab` after the AI edit.

#### **Claude Code Preset** (`/Users/andrew/code/git-ai/src/commands/checkpoint_agent/agent_presets.rs`, lines 42-151):

- **Event name**: `hook_event_name` from JSON
- **Human checkpoint trigger**: `"PreToolUse"` → returns `CheckpointKind::Human` (line 126-137)
- **AI checkpoint trigger**: Any other event → returns `CheckpointKind::AiAgent` (line 140-150)
- **Transcript extraction**: Reads Claude JSONL file at `transcript_path`, parses Claude Code messages
- **Agent ID**: UUID extracted from transcript filename
- **Model**: Extracted from assistant messages in JSONL

#### **Cursor Preset** (`/Users/andrew/code/git-ai/src/commands/checkpoint_agent/agent_presets.rs`, lines 1167-1340):

- **Event name**: `hook_event_name` field
- **Human checkpoint trigger**: `"beforeSubmitPrompt"` → returns `CheckpointKind::Human` (line 1248-1263)
- **AI checkpoint trigger**: `"afterFileEdit"` → returns `CheckpointKind::AiAgent` (line 1333)
- **Transcript extraction**: Queries Cursor's SQLite database (`globalStorage/state.vscdb`), extracts from composer data
- **Agent ID**: conversation_id from hook payload
- **Model**: Extracted from hook input directly

#### **AiTab Preset** (`/Users/andrew/code/git-ai/src/commands/checkpoint_agent/agent_presets.rs`, lines 3303-3398):

This is the key differentiator for tab completion vs agent-based tools:

- **Event name**: `hook_event_name` field
- **Human checkpoint trigger**: `"before_edit"` → returns `CheckpointKind::Human` (line 3374-3384)
- **AI checkpoint trigger**: `"after_edit"` → returns `CheckpointKind::AiTab` (line 3387-3397) ← **Note: AiTab, not AiAgent**
- **Transcript**: None (tab completions don't have conversation history)
- **Agent ID**: Constructed as `"ai_tab-{completion_id}"` or `"ai_tab-{timestamp}"` (line 3365-3372)
- **Tool**: Comes directly from hook payload (e.g., "github-copilot-tab", "copilot-lab", etc.)
- **Model**: Comes directly from hook payload

### 5. **Checkpoint Creation Logic**

In `/Users/andrew/code/git-ai/src/commands/checkpoint.rs` (lines 115-534), the `run()` function:

1. Accepts `kind: CheckpointKind` as a parameter (line 118)
2. Creates a `Checkpoint` struct with that kind (line 393-398)
3. If not Human checkpoint and `agent_run_result` is available, also captures:
   - `transcript`: The AI conversation history (line 407)
   - `agent_id`: Tool, session ID, and model (line 408)
   - `agent_metadata`: Tool-specific metadata like transcript paths (line 409)

### 6. **What Determines Each Kind**

**`Human`**: 
- Returned when `hook_event_name` is `"PreToolUse"` (Claude), `"beforeSubmitPrompt"` (Cursor), or `"before_edit"` (AiTab)
- No transcript or AI metadata
- Captured BEFORE the AI tool starts its work
- Uses `will_edit_filepaths` to narrow scope

**`AiAgent`**:
- Returned when the hook_event_name indicates the AI completed its work (e.g., `"afterFileEdit"` for Cursor)
- Includes full transcript and conversation history
- Has `agent_id` with tool name, session ID, and model
- Used for agent-based tools like Claude Code, Cursor, GitHub Copilot (in composer mode)
- Uses `edited_filepaths` to track what was actually changed

**`AiTab`**:
- Returned when `hook_event_name` is `"after_edit"` from AiTab preset specifically
- No transcript (tab completion is not a conversation)
- Has agent_id but with `tool` field like "github-copilot-tab" or other tab completion engine
- Agent ID formatted as `"ai_tab-{completion_id}"`
- Used for inline tab completion suggestions (GitHub Copilot tab, Copilot Lab, etc.)

### 7. **Key Detection Insight**

The system **does NOT** auto-detect by:
- Process inspection or PID checking
- Environment variables (though some presets use them for testing, like `CODEX_HOME`)
- IDE detection from `vscode.env` (that's only done in the VSCode extension itself)
- File system watchers

Instead, it relies on **explicit hook integration**: each IDE extension must call `git-ai checkpoint <preset> --hook-input <json>` with the appropriate preset name and a structured JSON payload containing the hook event name.

### 8. **AgentId and Metadata Tracking**

From `/Users/andrew/code/git-ai/src/authorship/working_log.rs` (lines 42-47):

```rust
pub struct AgentId {
    pub tool: String,        // e.g., "cursor", "windsurf", "github-copilot-tab"
    pub id: String,          // Unique session/completion ID
    pub model: String,       // e.g., "gpt-4o", "claude-opus"
}
```

The tool name itself distinguishes between AI agents ("cursor", "claude", "codex") and tab completions ("github-copilot-tab").

This architecture makes the system **extensible** and **tool-agnostic**: any IDE extension can integrate by sending the correct hook payload with the appropriate event names.
agentId: a81300e54dd490295 (for resuming to continue this agent's work if needed)
<usage>total_tokens: 87382
tool_uses: 25
duration_ms: 47496</usage>
```

## Assistant

It's not automatic detection. The IDE extensions tell git-ai what kind of change is happening by calling `git-ai checkpoint <preset> --hook-input <json>`.

Each supported tool (Claude Code, Cursor, Copilot, etc.) has a **preset** that interprets a `hook_event_name` field in the JSON payload. The pattern is the same across all presets: two checkpoints bracket each AI edit.

**Before the AI edits** — the extension fires a "before" event (`PreToolUse` for Claude, `beforeSubmitPrompt` for Cursor, `before_edit` for tab completions). The preset returns `CheckpointKind::Human`. This captures the baseline state of files so git-ai knows what the human wrote before the AI touched anything.

**After the AI edits** — the extension fires an "after" event. The preset returns either `CheckpointKind::AiAgent` (for conversational tools like Claude Code, Cursor composer) or `CheckpointKind::AiTab` (for inline tab completions like Copilot tab).

The difference between AiAgent and AiTab is just which preset handles it. The `ai_tab` preset always returns `AiTab` and carries no transcript (tab completions aren't conversations). Agent presets return `AiAgent` and include the full conversation transcript, model name, and session ID.
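
The mapping described so far can be condensed into a single match. This is a sketch built only from the event names reported above; `kind_for` is a hypothetical function, and returning `None` for Cursor and tab events that the exploration did not cover is my assumption.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CheckpointKind {
    Human,
    AiAgent,
    AiTab,
}

/// Hypothetical summary of the per-preset rules: the kind is decided by
/// the preset name plus a string match on `hook_event_name`.
pub fn kind_for(preset: &str, hook_event_name: &str) -> Option<CheckpointKind> {
    match (preset, hook_event_name) {
        // Claude: "PreToolUse" is the human baseline, anything else is the AI edit.
        ("claude", "PreToolUse") => Some(CheckpointKind::Human),
        ("claude", _) => Some(CheckpointKind::AiAgent),
        // Cursor: before the prompt is submitted vs. after a file edit.
        ("cursor", "beforeSubmitPrompt") => Some(CheckpointKind::Human),
        ("cursor", "afterFileEdit") => Some(CheckpointKind::AiAgent),
        // Tab completion: the only preset that yields AiTab.
        ("ai_tab", "before_edit") => Some(CheckpointKind::Human),
        ("ai_tab", "after_edit") => Some(CheckpointKind::AiTab),
        // Unknown preset or event (behaviour here is assumed, not confirmed).
        _ => None,
    }
}
```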

So the before/after pair is what lets git-ai diff the two snapshots and attribute the lines in between to the right author. Lines that changed between a Human checkpoint and the subsequent AiAgent checkpoint were written by the AI. Lines changed outside that window are human-authored.
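
To illustrate the before/after idea, here is a simplified sketch that labels each line of the "after" snapshot as either carried over from the baseline or introduced in the AI window. It uses a plain longest-common-subsequence walk over whole lines; git-ai's actual attribution logic was not shown in the exploration and is likely more involved, and the `Origin` and `attribute_lines` names are hypothetical.

```rust
/// Where a line of the "after" snapshot came from.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Origin {
    Baseline, // already present at the Human checkpoint
    AiEdit,   // introduced between the Human and AI checkpoints
}

/// Returns one `Origin` per line of `after`, using a longest common
/// subsequence between the two snapshots to find unchanged lines.
pub fn attribute_lines(before: &[&str], after: &[&str]) -> Vec<Origin> {
    let (n, m) = (before.len(), after.len());
    // lcs[i][j] = LCS length of before[i..] and after[j..]
    let mut lcs = vec![vec![0usize; m + 1]; n + 1];
    for i in (0..n).rev() {
        for j in (0..m).rev() {
            lcs[i][j] = if before[i] == after[j] {
                lcs[i + 1][j + 1] + 1
            } else {
                lcs[i + 1][j].max(lcs[i][j + 1])
            };
        }
    }
    // Walk both snapshots; matched lines are baseline, the rest stay AiEdit.
    let mut out = vec![Origin::AiEdit; m];
    let (mut i, mut j) = (0, 0);
    while i < n && j < m {
        if before[i] == after[j] {
            out[j] = Origin::Baseline;
            i += 1;
            j += 1;
        } else if lcs[i + 1][j] >= lcs[i][j + 1] {
            i += 1; // line was removed from the baseline
        } else {
            j += 1; // line was added in the AI window
        }
    }
    out
}
```

Note that "baseline" here only means "unchanged by this edit"; a baseline line may itself carry an earlier AI attribution from a previous checkpoint.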

Each preset also extracts tool-specific metadata differently — Claude's reads the JSONL transcript file, Cursor's queries its SQLite database, tab completions just pass through the completion ID. But the kind determination is always just a string match on `hook_event_name`.
