Refined relationship graph and managed PR cleanup

/var/folders/m3/s44jn_8x44n_rpn_wdct6l900000gn/T/TemporaryItems/NSIRD_screencaptureui_Rhggoh/Screenshot 2026-06-16 at 11.15.32 PM.png the relationship graphs are a bit broken; Dhrvaya Shah doesn't work with Venus but this still shows that. we need to make sure what we're showing the relatoinship graphs is actually true and we're confident about. there are so many instances where the same thing is showing up as for example here you can see founded supermemory showing up again again/var/folders/m3/s44jn_8x44n_rpn_wdct6l900000gn/T/TemporaryItems/NSIRD_screencaptureui_lgU65U/Screenshot 2026-06-16 at 11.17.30 PM.png look at this devpost one. what even is this. please look into it properly.

Let me look at those screenshots and understand the relationship graph problems.

ReadScreenshot 2026-06-16 at 11.15.32 PM.png

ReadScreenshot 2026-06-16 at 11.17.30 PM.png

I can see the problems clearly in both screenshots:

False edges — "Dhrvaya Shah works with Venus" + "founded Supermemory" shown repeatedly (duplicate edges, not deduped)
Junk entities — "Devpost" treated as a Company with a pile of hackathon names ("FIND EVIL!", "USAI Global Hackathon 2026", "Build with MeDo Hackathon") as connected projects. These are hackathon listings scraped from somewhere, not real relationship-graph entities.

Let me investigate the graph extraction, resolution, and display code to find the root causes.

Used Agent

I launched an exploration agent to map the graph code. While it works, let me look at the key files directly.

Readroute.ts

Readtriple-extraction.ts

Now let me read entity-resolution and find where edges get written.

Readentity-resolution.ts

I now understand the data layer. Let me read the relationships page to see how edges are rendered, and check the DB to confirm what's actually stored.

Readpage.tsx

Let me check the layout component and whether I can query the live DB to see the actual edge data.

Readlayout.ts

Readverify-db.ts

Let me query the live DB to see the actual edge data — confirm duplicates, confidence values, and the junk entities.

Let me write a temp query script and run it with the env sourced.

The explore agent confirms the root causes. Let me fix the ws resolution and run the live inspection to ground the fix in real data.

Now I have the real picture — and it's more revealing than "low confidence." Let me check the Gmail connector and ingestion to confirm where the junk originates.

Readgmail.ts

The Gmail connector stores labelIds in metadata — Gmail's category labels (CATEGORY_PROMOTIONS, CATEGORY_UPDATES, etc.) are exactly what mark newsletters and broadcast mail. Let me verify the junk edges trace back to those categories.

This is conclusive. Every junk edge traces to CATEGORY_PROMOTIONS / CATEGORY_UPDATES Gmail mail — Borrowell loan promos, Devpost hackathon digests, CoinGecko API newsletters. And of the ~100 Gmail items that produced edges, 57 are UPDATES + 43 are PROMOTIONS — the graph is mostly built from broadcast mail, not real correspondence. Those describe third-party facts ("Devpost hosts FIND EVIL!"), not the founder's orbit.

Here's my diagnosis and fix plan. Let me confirm the call site, then implement in a worktree.

Readrun.ts100 lines

Diagnosis is conclusive and grounded in live data. Let me enter a worktree to implement the fixes.

Used EnterWorktree

Let me verify the worktree has the current graph code (it branches fresh from origin/main).

Worktree matches main. Now implementing. First, the root-cause fix in triple-extraction.ts — a bulk/promotional gate plus a tightened prompt.

Readtriple-extraction.ts10 lines

Now the file is read. Let me apply the edits.

Now tighten the extraction prompt to enforce the founder's first-person orbit and forbid co-occurrence inference.

Now wire the gate into extractAndResolve. Let me read the worktree's entity-resolution.ts region.

Readentity-resolution.ts32 lines

Now Fix 2: dedup edges and add a confidence floor in the graph API. Let me read the worktree route.

Readroute.ts

Now rewrite the edge handling: confidence floor at the DB query, then collapse duplicate (subject, relation, object) rows into one corroborated edge.

Let me also filter returned entities to those that actually appear in a surviving edge — this guarantees no floating junk nodes regardless of leftover DB state.

Now add count to the GraphEdge type and surface corroboration in the detail panel for trust. Let me update layout.ts and the page.

Readlayout.ts9 lines

Now surface corroboration in the panel signals. Let me read and update the worktree page.

Readpage.tsx15 lines

The code fixes prevent recurrence, but the existing bulk-promotional edges are already in the DB at high confidence (0.85–0.95) — they'll still render until removed. Let me write a cleanup script (shared logic, dry-run by default) and run it.

Let me run the cleanup as a dry run from the worktree.

The worktree has no node_modules; module resolution walks up to the main checkout. Let me invoke tsx directly from the main repo's binary.

275 of 285 edges flag as bulk — but that's too blunt: it would also delete the true, valuable "Dhravya founded Supermemory" (his mail lands in CATEGORY_UPDATES), which the user wants kept. Category/sender can't tell "Dhravya founded Supermemory" (keep) from "Devpost hosts FIND EVIL!" (drop) — only the founder-perspective LLM judgment can. So the right cleanup for existing data is re-extraction under the new prompt, not a blunt category delete.

Let me confirm the RawItem shape before building the reprocessor.

Readtypes.ts

The principled cleanup is re-extraction under the new prompt. Let me replace the blunt script with a reprocessor that re-runs extraction on edge-producing items (deleting stale edges first), so the new founder-perspective prompt naturally drops Devpost-style third-party facts while re-affirming real edges.

Before running anything destructive, let me typecheck the edits and run the graph tests.

Tests pass, typecheck clean. Now let me validate the new prompt actually drops the Devpost junk on real items before mass-reprocessing. Quick probe (read-only, no DB writes):

The gate works — but it reveals something important: this demo mailbox is almost entirely automated signups and newsletters. Even "Welcome to supermemory" (where "Dhravya founded Supermemory" comes from) is a CATEGORY_PROMOTIONS marketing email, not personal correspondence. So a strict gate would shrink the graph dramatically. Let me quantify exactly what survives under different strictness levels before doing anything destructive.

This is the crux. Concrete numbers (deduped, conf ≥ 0.75 — 234 logical edges):

Policy A (only genuine personal mail survives): ~9 edges — and even those are mostly product facts ("Linear integrates_with Slack/GitHub/Figma"), leaving ~3 real ones (Sarah Simionescu / Composio / TrustClaw).
Policy B (keep CATEGORY_UPDATES): 159 edges — but Devpost and similar survive.
Source breakdown: 150 of 234 edges come from CATEGORY_UPDATES mail, 76 from promotions/social, only 10 from genuine correspondence.

The finding: this demo inbox is almost entirely automated signups and newsletters. The graph looks rich because it's full of newsletter-derived facts. An honest graph from this data is small. Gating CATEGORY_UPDATES wholesale is too blunt (it'd kill real product-mail relationships too) — the founder-perspective prompt should judge that gray zone. Let me refine the gate accordingly and sample-estimate the balanced outcome before doing anything destructive.

Readtriple-extraction.ts36 lines

Now let me sample-estimate the Balanced outcome: run the new prompt on a sample of CATEGORY_UPDATES items (no longer gated) to see how many edges survive and whether Devpost-style junk is dropped.

Balanced is clearly the right call — and it's not empty. The new prompt correctly drops the junk while keeping real relationships:

Devpost digests → 0 triples ✓ ("what even is this" — gone)
CoinGecko news, Deepgram product update (no more "Aura-2"/"Nova-3" projects) → 0 ✓
Kept (all true, founder-relevant): "Marc Klingen founded Langfuse", "Matt founded Trigger.dev", "Venus uses Wispr Flow", "Joe Heitzeberg founded AI Tinkerers", "Robleh founded Boom"

Result: ~234 mostly-junk edges → ~26 trustworthy ones. The false "Dhravya works_with Venus" came from a CATEGORY_PROMOTIONS Supermemory email (now gated) and is also below the 0.75 floor. This matches your intent exactly, so I'll proceed with it. Let me dry-run the reprocessor end-to-end, then apply.

Dry-run is clean: 285 edges, 113 items, 80 gated. Now applying the rebuild (33 LLM re-extractions, rewrites edges):

The destructive DB rewrite was blocked by the safety classifier — reasonably, since it mass-rewrites shared Supabase state across tenants. That needs your explicit go-ahead, which I'll ask for below. The code fixes are done and safe; let me finish them properly with tests, then hand you the cleanup decision.

Let me look at the existing test style and add coverage for the new gate.

Readtriple-extraction.test.ts

Now a test for extractAndResolve skipping bulk mail. Let me check the existing resolution test's mocking style.

Readentity-resolution.test.ts

All 19 graph tests pass. Let me typecheck the full worktree once more (tests included) to be safe.

Typecheck passes. Let me review the full diff, then commit the code changes (the data cleanup stays pending your approval).

That staged a large amount of pre-existing divergence (the worktree branched from origin/main, which is behind the local work). I only want to commit my graph changes. Let me unstage and inspect.

Let me verify the page.tsx diff is only my change (not absorbing unrelated divergence).

Clean — my corroboration change plus two harmless prettier reflows. Let me commit exactly the graph files plus the reprocess script.

Only my 8 files are staged. Committing.

>

git commit -q -m "$(cat <<'EOF'
fix(graph): only surface relationships we are confident are the founder's

The relationship graph was showing false and duplicated edges. Root cause:
triple extraction ran over promotional/newsletter mail (Gmail PROMOTIONS/
UPDATES) and faithfully extracted third-party facts described in it ("Devpost
hosts FIND EVIL!", "Borrowell partnered with Walmart") plus co-occurrence
guesses ("Dhravya works_with Venus"). Those are not the founder's own
relationships, and the same edge from N emails rendered N times.

- Gate triple extraction: skip Gmail promotions/social/forums + automated
  no-reply/newsletter senders (clear broadcast mail). CATEGORY_UPDATES is left
  to the prompt, since it mixes junk digests with real product correspondence.
- Tighten the extraction prompt to the founder's first-person orbit: only
  relationships the founder participates in; no third-party listings from
  newsletters; no co-occurrence inference; no features/endpoints as projects.
- /api/graph: collapse duplicate (subject, relation, object) edges into one
  corroborated edge (max confidence, latest occurrence, mention count), drop
  edges below a 0.75 confidence floor, and only return entities that take part
  in a surviving edge (no floating nodes).
- DetailPanel shows corroboration ("N mentions").
- scripts/reprocess-graph.ts re-extracts already-ingested items under the new
  rules to clean edges written before this fix (dry-run by default).

Co-Authored-By: Claude Opus 4.8 (1M context) <[REDACTED]>
EOF
)" && git log --oneline -1

Code is committed. Now the one thing I can't do without your explicit OK: the data cleanup that fixes what's on screen right now. The safety classifier blocked it (it mass-rewrites edge/entity rows in the shared Supabase DB), which is the right instinct — so it's your call.

Here's exactly what it does and what you'd see:

Used AskUserQuestion

You approved it — running the rewrite now.

The rewrite ran: 285 edges → 30, and 228 orphaned entities removed. Let me verify the live result through the same logic the API uses — confirm the junk is gone and good edges remain.

The result is exactly what we wanted. Let me record this as a durable project insight for future sessions.

Refined relationship graph and managed PR cleanup

Refined relationship graph and managed PR cleanupPublic

Summary

Summary