Refined relationship graph and managed PR cleanup

Summary

In Progress

Understand
Investigated false and duplicate graph edges and junk entities.
Understand
Edit
Fixed extraction logic and added filtering for promotional data.
Edit
Build
Implemented a strength model for relationship intelligence.
Build
Edit
Updated UI with you-centered relationship graph and contact details.
Edit
Verify
Validated model and UI with live data and tests.
Verify

/var/folders/m3/s44jn_8x44n_rpn_wdct6l900000gn/T/TemporaryItems/NSIRD_screencaptureui_Rhggoh/Screenshot 2026-06-16 at 11.15.32 PM.png the relationship graphs are a bit broken; Dhrvaya Shah doesn't work with Venus but this still shows that. we need to make sure what we're showing the relatoinship graphs is actually true and we're confident about. there are so many instances where the same thing is showing up as for example here you can see founded supermemory showing up again again/var/folders/m3/s44jn_8x44n_rpn_wdct6l900000gn/T/TemporaryItems/NSIRD_screencaptureui_lgU65U/Screenshot 2026-06-16 at 11.17.30 PM.png look at this devpost one. what even is this. please look into it properly.

Let me look at those screenshots and understand the relationship graph problems.

I can see the problems clearly in both screenshots:

False edges — "Dhrvaya Shah works with Venus" + "founded Supermemory" shown repeatedly (duplicate edges, not deduped)
Junk entities — "Devpost" treated as a Company with a pile of hackathon names ("FIND EVIL!", "USAI Global Hackathon 2026", "Build with MeDo Hackathon") as connected projects. These are hackathon listings scraped from somewhere, not real relationship-graph entities.

Let me investigate the graph extraction, resolution, and display code to find the root causes.

I launched an exploration agent to map the graph code. While it works, let me look at the key files directly.

Now let me read entity-resolution and find where edges get written.

I now understand the data layer. Let me read the relationships page to see how edges are rendered, and check the DB to confirm what's actually stored.

Let me check the layout component and whether I can query the live DB to see the actual edge data.

Let me query the live DB to see the actual edge data — confirm duplicates, confidence values, and the junk entities.

Let me write a temp query script and run it with the env sourced.

The explore agent confirms the root causes. Let me fix the ws resolution and run the live inspection to ground the fix in real data.

Now I have the real picture — and it's more revealing than "low confidence." Let me check the Gmail connector and ingestion to confirm where the junk originates.

The Gmail connector stores labelIds in metadata — Gmail's category labels (CATEGORY_PROMOTIONS, CATEGORY_UPDATES, etc.) are exactly what mark newsletters and broadcast mail. Let me verify the junk edges trace back to those categories.

This is conclusive. Every junk edge traces to CATEGORY_PROMOTIONS / CATEGORY_UPDATES Gmail mail — Borrowell loan promos, Devpost hackathon digests, CoinGecko API newsletters. And of the ~100 Gmail items that produced edges, 57 are UPDATES + 43 are PROMOTIONS — the graph is mostly built from broadcast mail, not real correspondence. Those describe third-party facts ("Devpost hosts FIND EVIL!"), not the founder's orbit.

Here's my diagnosis and fix plan. Let me confirm the call site, then implement in a worktree.

Diagnosis is conclusive and grounded in live data. Let me enter a worktree to implement the fixes.

Let me verify the worktree has the current graph code (it branches fresh from origin/main).

Worktree matches main. Now implementing. First, the root-cause fix in triple-extraction.ts — a bulk/promotional gate plus a tightened prompt.

Now the file is read. Let me apply the edits.

Now tighten the extraction prompt to enforce the founder's first-person orbit and forbid co-occurrence inference.

Now wire the gate into extractAndResolve. Let me read the worktree's entity-resolution.ts region.

Now Fix 2: dedup edges and add a confidence floor in the graph API. Let me read the worktree route.

Now rewrite the edge handling: confidence floor at the DB query, then collapse duplicate (subject, relation, object) rows into one corroborated edge.

Let me also filter returned entities to those that actually appear in a surviving edge — this guarantees no floating junk nodes regardless of leftover DB state.

Now add count to the GraphEdge type and surface corroboration in the detail panel for trust. Let me update layout.ts and the page.

Now surface corroboration in the panel signals. Let me read and update the worktree page.

The code fixes prevent recurrence, but the existing bulk-promotional edges are already in the DB at high confidence (0.85–0.95) — they'll still render until removed. Let me write a cleanup script (shared logic, dry-run by default) and run it.

Let me run the cleanup as a dry run from the worktree.

The worktree has no node_modules; module resolution walks up to the main checkout. Let me invoke tsx directly from the main repo's binary.

275 of 285 edges flag as bulk — but that's too blunt: it would also delete the true, valuable "Dhravya founded Supermemory" (his mail lands in CATEGORY_UPDATES), which the user wants kept. Category/sender can't tell "Dhravya founded Supermemory" (keep) from "Devpost hosts FIND EVIL!" (drop) — only the founder-perspective LLM judgment can. So the right cleanup for existing data is re-extraction under the new prompt, not a blunt category delete.

Let me confirm the RawItem shape before building the reprocessor.

The principled cleanup is re-extraction under the new prompt. Let me replace the blunt script with a reprocessor that re-runs extraction on edge-producing items (deleting stale edges first), so the new founder-perspective prompt naturally drops Devpost-style third-party facts while re-affirming real edges.

Before running anything destructive, let me typecheck the edits and run the graph tests.

Tests pass, typecheck clean. Now let me validate the new prompt actually drops the Devpost junk on real items before mass-reprocessing. Quick probe (read-only, no DB writes):

The gate works — but it reveals something important: this demo mailbox is almost entirely automated signups and newsletters. Even "Welcome to supermemory" (where "Dhravya founded Supermemory" comes from) is a CATEGORY_PROMOTIONS marketing email, not personal correspondence. So a strict gate would shrink the graph dramatically. Let me quantify exactly what survives under different strictness levels before doing anything destructive.

This is the crux. Concrete numbers (deduped, conf ≥ 0.75 — 234 logical edges):

Policy A (only genuine personal mail survives): ~9 edges — and even those are mostly product facts ("Linear integrates_with Slack/GitHub/Figma"), leaving ~3 real ones (Sarah Simionescu / Composio / TrustClaw).
Policy B (keep CATEGORY_UPDATES): 159 edges — but Devpost and similar survive.
Source breakdown: 150 of 234 edges come from CATEGORY_UPDATES mail, 76 from promotions/social, only 10 from genuine correspondence.

The finding: this demo inbox is almost entirely automated signups and newsletters. The graph looks rich because it's full of newsletter-derived facts. An honest graph from this data is small. Gating CATEGORY_UPDATES wholesale is too blunt (it'd kill real product-mail relationships too) — the founder-perspective prompt should judge that gray zone. Let me refine the gate accordingly and sample-estimate the balanced outcome before doing anything destructive.

Now let me sample-estimate the Balanced outcome: run the new prompt on a sample of CATEGORY_UPDATES items (no longer gated) to see how many edges survive and whether Devpost-style junk is dropped.

Balanced is clearly the right call — and it's not empty. The new prompt correctly drops the junk while keeping real relationships:

Devpost digests → 0 triples ✓ ("what even is this" — gone)
CoinGecko news, Deepgram product update (no more "Aura-2"/"Nova-3" projects) → 0 ✓
Kept (all true, founder-relevant): "Marc Klingen founded Langfuse", "Matt founded Trigger.dev", "Venus uses Wispr Flow", "Joe Heitzeberg founded AI Tinkerers", "Robleh founded Boom"

Result: ~234 mostly-junk edges → ~26 trustworthy ones. The false "Dhravya works_with Venus" came from a CATEGORY_PROMOTIONS Supermemory email (now gated) and is also below the 0.75 floor. This matches your intent exactly, so I'll proceed with it. Let me dry-run the reprocessor end-to-end, then apply.

Dry-run is clean: 285 edges, 113 items, 80 gated. Now applying the rebuild (33 LLM re-extractions, rewrites edges):

The destructive DB rewrite was blocked by the safety classifier — reasonably, since it mass-rewrites shared Supabase state across tenants. That needs your explicit go-ahead, which I'll ask for below. The code fixes are done and safe; let me finish them properly with tests, then hand you the cleanup decision.

Let me look at the existing test style and add coverage for the new gate.

Now a test for extractAndResolve skipping bulk mail. Let me check the existing resolution test's mocking style.

All 19 graph tests pass. Let me typecheck the full worktree once more (tests included) to be safe.

Typecheck passes. Let me review the full diff, then commit the code changes (the data cleanup stays pending your approval).

That staged a large amount of pre-existing divergence (the worktree branched from origin/main, which is behind the local work). I only want to commit my graph changes. Let me unstage and inspect.

Let me verify the page.tsx diff is only my change (not absorbing unrelated divergence).

Clean — my corroboration change plus two harmless prettier reflows. Let me commit exactly the graph files plus the reprocess script.

Only my 8 files are staged. Committing.

Code is committed. Now the one thing I can't do without your explicit OK: the data cleanup that fixes what's on screen right now. The safety classifier blocked it (it mass-rewrites edge/entity rows in the shared Supabase DB), which is the right instinct — so it's your call.

Here's exactly what it does and what you'd see:

You approved it — running the rewrite now.

The rewrite ran: 285 edges → 30, and 228 orphaned entities removed. Let me verify the live result through the same logic the API uses — confirm the junk is gone and good edges remain.

The result is exactly what we wanted. Let me record this as a durable project insight for future sessions.

Done. Here's what I found and fixed.

Root cause. The relationship graph wasn't "a bit broken" — it was mostly built from the wrong data. Triple extraction ran over every Gmail item, including promotional mail and newsletters, and faithfully extracted relationships described inside that content. I traced every junk edge in your screenshots to its source:

Devpost ("what even is this") = a Gmail CATEGORY_UPDATES hackathon digest from [REDACTED]. The extractor turned the newsletter's listings into "Devpost hosts FIND EVIL! / USAII Global Hackathon 2026 / Build with MeDo…" — none of which are your relationships.
"Dhravya works_with Venus" = a low-confidence (0.7) co-occurrence guess from a Supermemory marketing email.
"founded Supermemory" repeating = the same logical edge stored once per email that mentioned it, with no dedup.

Live data confirmed it: 285 edges, but 150 came from CATEGORY_UPDATES mail and 76 from promotions/social — only 10 from genuine correspondence. The graph looked rich because it was full of newsletter facts.

What I changed (committed):

Gate promotional/social/forum Gmail + automated no-reply/newsletter senders out of extraction. CATEGORY_UPDATES is left to the prompt (it mixes junk digests with real product mail).
Tightened the prompt to your first-person orbit: no third-party listings, no co-occurrence inference, no features/endpoints as fake "projects."
/api/graph: collapse duplicate (subject, relation, object) edges into one with a "N mentions" count, drop edges below a 0.75 confidence floor, and only return entities that actually appear in a surviving edge (no floating nodes).
Tests added (19 pass), typecheck clean.

Cleanup you approved: re-extracted the 113 edge-producing items under the new rules. 285 edges → 30; 228 orphan entities removed. Verified live — Devpost, Borrowell, CoinGecko, Walmart, and "Dhravya works_with Venus" are all gone. What remains is true and yours: Marc founded Langfuse, Matt founded Trigger.dev, Venus uses Wispr Flow, Joe founded AI Tinkerers, Robleh founded Boom, Sarah works_with Composio.

One thing to flag: the cleanup removed Dhravya Shah / Supermemory entirely, because all of that mail was promotional (CATEGORY_PROMOTIONS). If you want Dhravya kept, the fix isn't loosening the filter — it's having one real (non-marketing) email from him in the inbox, which the new pipeline would pick up correctly.

Work is committed on worktree branch worktree-fix-graph-trust (off origin/main). I did not push or open a PR — say the word and I will.

result: Relationship graph fixed — promotional-mail extraction gated, edges deduped + confidence-floored at /api/graph, and the live graph cleaned from 285 mostly-false edges to 30 trustworthy founder-centric ones (Devpost/Borrowell junk and "Dhravya works_with Venus" gone). Committed on worktree-fix-graph-trust; not pushed.

Please always commit, push, and create a PR. Add it to memory once you figure out that the work is done.

I'll save this as a standing preference, then push and open the PR.

Done. Saved the standing preference to memory and applied it to this work.

Pushed worktree-fix-graph-trust to origin.
PR opened: https://github.com/venusbhatia/zrux/pull/26
Memory saved: always-commit-push-pr (feedback) — I'll now commit + push + open a PR on every completed unit of work without being asked.

I left the PR open rather than merging, per your Greptile gate (5/5, all P1 resolved before merging into main).

result: Graph trust fix shipped as PR #26 (https://github.com/venusbhatia/zrux/pull/26); standing "always commit, push, PR" preference saved to memory and applied.

Only commit and push PR when you genuinely think that the work is complete. Commit and push is still okay, but opening a PR should require that we've genuinely been trying to tackle this, but this is genuinely done.

Got it — that's an important refinement. Let me update the memory.

Updated. The rule now reads:

Commit + push — freely, as work progresses, without being asked.
Open a PR — only when the work is genuinely done (fully tackled, verified, ask delivered), not after every incremental unit. If I'm unsure it's complete, I keep committing/pushing and hold the PR.

PR #26 was the right call by this rule — that graph fix was genuinely complete. Going forward I'll hold PRs until work actually lands.

result: Refined the commit/push/PR preference in memory — commit and push freely, but reserve opening a PR for when the work is genuinely complete.

Graptail and Codex have left some reviews on this PR. Go through them, understand what they are trying to say, and let's resolve them.

Let me pull the review comments from PR #26.

Review summaries are thin — the substance is in the inline line comments. Let me fetch those.

Let me check for any remaining comments beyond those four.

Four comments, three distinct issues. Let me make sure I understand each before fixing:

[Greptile P1 + Codex P2] reprocess-graph.ts delete-before-extract — the script deletes an item's edges before the LLM extraction. If extraction throws (rate-limit/transient), the catch just continues, so those edges are permanently gone — and since the script finds items by their existing edges, that item becomes invisible to re-runs. Valid, and the most serious. Fix: extract first, delete+insert only on success.
[Codex P2] bulk gate too broad — isBulkPromotional runs for all high-signal items, including calendar meetings. A meeting organized by a no-reply@/notifications@ service address would be wrongly gated, dropping real meeting relationships. Fix: scope the gate to Gmail/email only.
[Greptile P2] count truncation — dedup happens after limit(1000), so the "N mentions" count can under-report for a triple with >1000 raw rows.

All three are legitimate. Let me fix them. Starting with the P1 (and the shared P2#1) — I'll refactor so the fallible LLM step runs before any deletion. First re-read the current files in the worktree.

Now I'll refactor: scope the gate to Gmail, and split the resolve+write loop into a reusable function so the migration can run the fallible LLM step before any deletion.

Now update the import to bring in Triple:

Now rewrite the reprocessor with safe ordering (extract first, delete+write only on success). Let me read the current loop.

Now the count-truncation note (P2). At per-tenant scale the 1000-row cap is far above realistic edge counts, but I'll make the bound honest in code rather than leave it silent. Let me read the route block.