Optimizing Message Summary and Background Jobs

Feedback on this branch

hasInFlightTraceSummaryRun scans the entire _scheduled_functions system table (collect()) on every batch upsert to detect duplicates. This can become expensive as scheduled job volume grows and can also increase write latency for message ingestion. If possible, avoid full-table scans by narrowing the query (e.g., index/filter by name/state if supported) or switch to a de-dupe mechanism that doesn’t require reading _scheduled_functions (e.g., an idempotency key / last-scheduled marker on the trace, or have the job itself exit early when messageCount hasn’t changed since the last summary).

other feedback from colleagues

Perf - message counting does a full .collect() on every upsert, and dedup check scans all _scheduled_functions. Fine now, could be expensive at scale
Single upserts are excluded - traces that only get messages one-at-a-time never get summaries. Is that intentional for all use cases?
API-level title override - namespaces.ts and feed.ts return ai_title as title, so downstream consumers can't tell the difference
Tests hit live OpenAI - no mocks, will be slow/flaky without API key

What's not covered yet:

Backfill for existing traces
Cost projections at scale
User control (regenerate/edit/opt-out)
Privacy implications (message content sent to OpenAI)
Error observability (failures are silent fallbacks)
Stale summaries on long-running traces between cadence boundaries

Looking at this feedback, there are critical performance, functionality, and operational issues to address. Let me explore the relevant code to provide specific, actionable recommendations.

Excellent, comprehensive feedback. Here's my analysis with specific code locations and actionable fixes:

Critical Issues

1. Performance: Full Table Scans (`api/convex/internal/sync.ts:157-173`)

This collects ALL scheduled functions on every batch upsert. At scale with thousands of scheduled jobs, this is O(n) and unbounded.

Fix options:

A. Track lastSummaryScheduledAt on trace record, skip if within cooldown window
B. Use a separate trace_summary_jobs table with proper indexes
C. The job can self-dedupe: store lastProcessedMessageCount and exit early if unchanged

Recommended (C) - Simplest, no system table dependency:

```typescript
// In traceSummary.ts run() action
if (trace.lastSummaryMessageCount === messageCount) {
  return { ok: true, messageCount, usedLlm: false, skipped: true };
}
```

2. Single Upserts Never Summarize (`api/convex/internal/sync.ts:279`)

The single upsertMessage mutation doesn't call scheduleTriggeredJobs, only batch upserts do. This excludes all streaming/real-time traces.

Fix: Add scheduleTriggeredJobs call at line 279 in single upsert path.

3. Message Counting Full Collect (`api/convex/internal/sync.ts:126-132`)

This collects all messages to get a count. Use aggregate or store counter.

Fix: Use Convex's aggregation or maintain counter incrementally.
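One way to maintain the counter incrementally is to bump a stored count only for messages that are actually new. The sketch below is an in-memory illustration, assuming a hypothetical `messageCount` field on the trace and upserts keyed by an external message ID; neither is confirmed by the code under review.

```typescript
// Hypothetical shape for illustration; the real traces schema may differ.
interface TraceCounter {
  messageCount: number;
}

// Returns the updated count after an upsert batch. Only IDs that were not
// already stored increase the count, so re-sending a message changes nothing.
function applyUpsertBatch(
  counter: TraceCounter,
  storedIds: Set<string>,
  incomingIds: string[],
): number {
  for (const id of incomingIds) {
    if (!storedIds.has(id)) {
      storedIds.add(id);
      counter.messageCount += 1;
    }
  }
  return counter.messageCount;
}

const counterDemo: TraceCounter = { messageCount: 0 };
const storedDemoIds = new Set<string>();
applyUpsertBatch(counterDemo, storedDemoIds, ["m1", "m2"]);
applyUpsertBatch(counterDemo, storedDemoIds, ["m2", "m3"]); // m2 is an update, not a new message
console.log(counterDemo.messageCount); // 3
```

In a real mutation the "is this ID already stored?" check would presumably be an indexed lookup rather than an in-memory set.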

Product/API Issues

4. Title Override Without Metadata (`api/convex/namespaces.ts:61,105`, `api/convex/services/feed.ts:29`)

Downstream can't tell if title is AI-generated or user-provided.

Fix: Return both fields:

```typescript
{
  title: trace.title,                           // original
  ai_title: trace.ai_title,                     // generated
  display_title: trace.ai_title ?? trace.title, // computed
}
```

Testing Issues

5. Tests Hit Live OpenAI (`api/tests/v1/post-process.test.ts:1-189`)

No mocking - slow, requires API key, flaky, costs money.

Fix: Mock requestOpenAiText in tests. Consider:

Using msw or similar to intercept
Dependency injection for AI client
Environment flag for stub mode
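Of the three, dependency injection can be sketched without any test library. The example below assumes a hypothetical signature for the text client (the source names `requestOpenAiText` but does not show its shape) and a made-up `summarizeTitle` function; it illustrates the pattern and is not the project's code.

```typescript
// Assumed signature: resolves to the model's text, or null when nothing came back.
type RequestText = (prompt: string) => Promise<string | null>;

interface SummaryResult {
  title: string;
  usedLlm: boolean;
}

// The summarizer receives the text client as a parameter, so tests can pass
// a stub and production code can pass the real OpenAI-backed function.
async function summarizeTitle(
  messages: string[],
  requestText: RequestText,
): Promise<SummaryResult> {
  const fallback = messages[0]?.slice(0, 40) ?? "Untitled";
  try {
    const text = await requestText(`Give a short title for:\n${messages.join("\n")}`);
    if (text && text.trim().length > 0) {
      return { title: text.trim(), usedLlm: true };
    }
  } catch {
    // Fall through to the deterministic fallback below.
  }
  return { title: fallback, usedLlm: false };
}

// Stubs for tests: one succeeds, one simulates an outage.
const stubOk: RequestText = async () => "Fix flaky summary tests";
const stubDown: RequestText = async () => {
  throw new Error("network unavailable");
};
```

With this shape, a test can assert both the LLM path and the fallback path without an API key or network access.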

Not Covered (Strategic)

| Concern | Recommendation |
| --- | --- |
| Backfill | Add manual `runAllSummaries()` admin endpoint + CLI command |
| Cost at scale | Track tokens per summary, add daily budget cap, expose metrics |
| User control | Add `aiSummaryEnabled` boolean on trace/namespace |
| Privacy | Add `excludeFromAiProcessing` flag, document data flow |
| Observability | Log failures to `trace_summary_logs` table with error context |
| Stale summaries | Store `summaryGeneratedAt`, show age in UI, allow manual refresh |

Priority Action Items

P0 - Fix single upsert exclusion (add scheduleTriggeredJobs call)
P0 - Add job self-dedupe using lastSummaryMessageCount field
P1 - Mock OpenAI in tests
P1 - Add ai_title alongside title in API responses
P2 - Add observability table for summary failures
P2 - Add user opt-out flag

Want me to implement any of these fixes?

For 1, maybe instead of trace_summary_jobs, just a jobs table? Would that work? can be useful to store state across all kinds of jobs

For 2, that's intentional. Don't we run the summary every 100 messages or something?

4 is fine, but the CLI should pull the latest title from the API too so users aren't confused, but only for shared stuff

5 yes fix this please

Let me check the current structure and then address these:

Let me create a plan and start fixing the test mocking issue:

Now I'll add OpenAI mocking support. Let me create a mock mechanism:

Now let me update the todos and run the tests to verify they work:

Let me verify the TypeScript compiles correctly:

Let me now address the other feedback points. First, about the generic jobs table:

Now let me look at the CLI code to understand shared traces:

Now I understand the issues. Let me summarize and address them:

Summary of Issues & Fixes

1. Tests hitting live OpenAI ✅ FIXED

Added mocking infrastructure in api/convex/utils/ai.ts and api/convex/internal/test_helpers.ts
Updated tests in api/tests/v1/post-process.test.ts to use mocks

2. Generic `jobs` table for deduplication

Yes, a generic jobs table would work well! Here's a proposed schema:

```typescript
jobs: defineTable({
  type: v.string(), // "trace_summary", "cleanup", etc.
  targetId: v.id("traces"), // or union type for different targets
  status: v.string(), // "pending", "running", "completed", "failed"
  scheduledAt: v.number(),
  startedAt: v.optional(v.number()),
  completedAt: v.optional(v.number()),
  result: v.optional(v.any()),
  error: v.optional(v.string()),
})
  .index("by_type_target_status", ["type", "targetId", "status"])
  .index("by_status_scheduled", ["status", "scheduledAt"])
```

This replaces the expensive _scheduled_functions scan with a simple indexed query:

```typescript
const existingJob = await ctx.db
  .query("jobs")
  .withIndex("by_type_target_status", q =>
    q.eq("type", "trace_summary")
     .eq("targetId", traceId)
     .eq("status", "pending")
  )
  .first();
```

3. Single upserts exclusion

You confirmed this is intentional - only batch upserts trigger summaries on 100-message cadence. This is fine, but worth documenting in the code.
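If the cadence is driven by message-count boundaries, the trigger could be documented with something like the sketch below. The 100-message value comes from the remark above; the function name and the boundary rule (schedule whenever a batch crosses a multiple of the cadence) are assumptions, not the logic in sync.ts.

```typescript
const SUMMARY_CADENCE = 100; // assumed from the "every 100 messages" remark

// True when going from previousCount to currentCount crosses at least one
// multiple of the cadence (for example, 95 -> 103 crosses 100).
function crossedCadenceBoundary(
  previousCount: number,
  currentCount: number,
  cadence: number = SUMMARY_CADENCE,
): boolean {
  if (currentCount <= previousCount) return false;
  return Math.floor(currentCount / cadence) > Math.floor(previousCount / cadence);
}

console.log(crossedCadenceBoundary(95, 103));  // true
console.log(crossedCadenceBoundary(101, 150)); // false
console.log(crossedCadenceBoundary(0, 99));    // false
```

Under this rule a trace sitting at 150 messages keeps the summary generated at 100 until it reaches 200, which is the stale-summary concern raised in the original feedback.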

4. CLI title not pulling latest from API 🎯 NEEDS FIX

The API's mapTraceSummary returns trace.title instead of trace.ai_title ?? trace.title. Let me fix this:

All Changes Made

✅ 1. OpenAI Test Mocking (`api/`)

convex/utils/ai.ts: Added test mock support via global variable
convex/internal/test_helpers.ts: Added setOpenAiMock/clearOpenAiMock mutations
tests/v1/post-process.test.ts: Tests now use mocks instead of live API

✅ 2. API Title Fix (`api/convex/services/traces.ts:12-24`)

Changed mapTraceSummary to return AI title when available:

This ensures shared traces in CLI show AI-generated titles.
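A minimal sketch of the fallback rule, assuming a hypothetical row shape with optional `title` and `ai_title` fields (the real `mapTraceSummary` returns more than a title and is not shown here):

```typescript
// Hypothetical row shape; only the `ai_title ?? title` rule comes from the discussion.
interface TraceTitleRow {
  title?: string;
  ai_title?: string;
}

// Prefer the AI-generated title and fall back to the original one.
function displayTitle(row: TraceTitleRow): string | undefined {
  return row.ai_title ?? row.title;
}

console.log(displayTitle({ title: "session-42", ai_title: "Debugging scheduler dedupe" })); // "Debugging scheduler dedupe"
console.log(displayTitle({ title: "session-42" }));                                         // "session-42"
console.log(displayTitle({ title: "session-42", ai_title: "" }));                           // "" (?? keeps empty strings)
```

The last case is worth noting: `??` only falls back on `null` or `undefined`, so an empty `ai_title` would win over a real title. If empty AI titles can occur, a trimmed-length check may be safer.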

📋 3. Jobs Table Proposal (not implemented yet)

A generic jobs table would solve the performance issue:

```typescript
jobs: defineTable({
  type: v.string(),              // "trace_summary", etc.
  targetId: v.id("traces"),
  status: v.string(),            // "pending", "running", "completed"
  scheduledAt: v.number(),
  startedAt: v.optional(v.number()),
  completedAt: v.optional(v.number()),
}).index("by_type_target_status", ["type", "targetId", "status"])
```

Then replace the expensive _scheduled_functions scan:

```typescript
// Instead of scanning ALL scheduled functions:
// const existing = await ctx.db.system.query("_scheduled_functions").collect();

// Use an indexed query:
const existing = await ctx.db
  .query("jobs")
  .withIndex("by_type_target_status", q =>
    q.eq("type", "trace_summary")
     .eq("targetId", traceId)
     .eq("status", "pending")
  )
  .first();
```

Want me to implement the jobs table migration? It would involve:

Schema migration
Update hasInFlightTraceSummaryRun to use jobs table
Update traceSummary job to insert/update job records
Backfill existing scheduled jobs

the only issue is jobs can be related to entities that might not be traces, so perhaps we should add more columns when that is the case? or create a generic solution like entityId that can accept different entities

Great point. For a generic jobs table that handles different entity types, here are the options:

Option A: EntityType + EntityId Pattern (Recommended)

```typescript
jobs: defineTable({
  type: v.string(),              // "trace_summary", "cleanup", "export"
  entityType: v.string(),        // "trace", "user", "namespace", etc.
  entityId: v.string(),          // external ID or stringified Convex ID
  status: v.string(),            // "pending", "running", "completed", "failed"
  scheduledAt: v.number(),
  startedAt: v.optional(v.number()),
  completedAt: v.optional(v.number()),
  result: v.optional(v.any()),
  error: v.optional(v.string()),
})
  .index("by_type_entity", ["type", "entityType", "entityId"])
  .index("by_entity_status", ["entityType", "entityId", "status"])
  .index("by_status_scheduled", ["status", "scheduledAt"])
```

Pros:

Clean separation of concerns
Easy to query: "all pending jobs for this trace"
Works with any entity type
Indexable

Usage:

```typescript
// Check for existing trace summary job
const existing = await ctx.db
  .query("jobs")
  .withIndex("by_type_entity", q =>
    q.eq("type", "trace_summary")
     .eq("entityType", "trace")
     .eq("entityId", traceId)
  )
  .filter(q => q.eq(q.field("status"), "pending"))
  .first();

// Create job for namespace export
await ctx.db.insert("jobs", {
  type: "export",
  entityType: "namespace",
  entityId: namespaceId,
  status: "pending",
  scheduledAt: Date.now(),
});
```

Option B: Union Type (Type-Safe)

```typescript
jobs: defineTable({
  type: v.string(),
  targetId: v.union(v.id("traces"), v.id("namespaces"), v.id("users")),
  // ... rest same
}).index("by_target", ["targetId"])
```

Pros: Type-safe, direct ID references

Cons: Need to update schema when adding new entity types

Option C: Polymorphic String (Flexible)

Pros: Maximum flexibility

Cons: No type safety, requires parsing
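Option C is described without code, so here is a rough sketch of what the parsing cost looks like. The `"<entityType>:<entityId>"` encoding and both helper names are assumptions made for illustration only.

```typescript
// Assumed encoding: "<entityType>:<entityId>", e.g. "trace:abc123".
interface TargetRef {
  entityType: string;
  entityId: string;
}

function encodeTarget(ref: TargetRef): string {
  return `${ref.entityType}:${ref.entityId}`;
}

// Splits on the first colon only, so IDs that themselves contain colons survive.
function parseTarget(target: string): TargetRef {
  const sep = target.indexOf(":");
  if (sep <= 0 || sep === target.length - 1) {
    throw new Error(`Malformed target: ${target}`);
  }
  return {
    entityType: target.slice(0, sep),
    entityId: target.slice(sep + 1),
  };
}

const roundTrip = parseTarget(encodeTarget({ entityType: "trace", entityId: "abc123" }));
console.log(roundTrip.entityType, roundTrip.entityId); // trace abc123
```

Every reader of the column has to go through a parser like this and handle the malformed case, which is the "requires parsing" cost listed above.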

Recommendation: Go with Option A (entityType + entityId). It's the sweet spot of flexibility and structure. You can add new entity types without schema changes, and it indexes well.

Want me to implement this with the migration?

why are we doing this table again, remind me

The Problem We're Solving

Current issue: hasInFlightTraceSummaryRun scans the entire _scheduled_functions system table on every batch upsert:

```typescript
// api/convex/internal/sync.ts:161
const scheduledFunctions = await ctx.db.system.query("_scheduled_functions").collect();
```

This is O(n) where n = all scheduled jobs in the system. As you schedule more jobs (not just trace summaries, but any scheduled functions), this gets slower and slower. It runs on every batch message upsert, so it's in the hot path for message ingestion.

The fix: Track job state in our own jobs table with proper indexes, so the check becomes an indexed lookup whose cost no longer grows with the total number of scheduled functions:

```typescript
// Instead of scanning ALL scheduled functions
const existing = await ctx.db
  .query("jobs")
  .withIndex("by_type_entity", q =>
    q.eq("type", "trace_summary")
     .eq("entityType", "trace")
     .eq("entityId", traceId)
  )
  .first();  // Fast indexed lookup
```

This decouples deduplication logic from Convex's internal _scheduled_functions table, which we can't index or query efficiently.

TL;DR: Prevent performance degradation as scheduled job volume grows. Currently it scans every scheduled function; with a jobs table it does a targeted index lookup.

call them scheduledJobs is that better

scheduledJobs is clearer - it differentiates from "jobs" as in background work vs "jobs" as in employment. Even better: jobRuns or scheduledJobRuns since each row represents a single execution attempt.

```typescript
scheduledJobRuns: defineTable({
  jobType: v.string(),        // "trace_summary"
  targetType: v.string(),     // "trace"
  targetId: v.string(),
  status: v.string(),         // "pending", "running", "completed"
  scheduledAt: v.number(),
  startedAt: v.optional(v.number()),
  completedAt: v.optional(v.number()),
})
  .index("by_job_target", ["jobType", "targetType", "targetId"])
  .index("by_status_scheduled", ["status", "scheduledAt"])
```

Verdict: Either scheduledJobs or jobRuns works. jobRuns makes it obvious each row is one execution instance (you might have multiple runs for the same trace over time).

Which do you prefer?

scheduledJobs is ok

Perfect. Let me outline the implementation:

Implementation Plan for `scheduledJobs` Table

1. Schema Migration (`api/convex/schema.ts`)

```typescript
scheduledJobs: defineTable({
  jobType: v.string(),        // "trace_summary"
  targetType: v.string(),     // "trace"
  targetId: v.string(),       // trace ID
  status: v.string(),         // "pending", "running", "completed", "failed"
  scheduledAt: v.number(),
  startedAt: v.optional(v.number()),
  completedAt: v.optional(v.number()),
  result: v.optional(v.any()),
  error: v.optional(v.string()),
})
  .index("by_job_target", ["jobType", "targetType", "targetId"])
  .index("by_status_scheduled", ["status", "scheduledAt"])
```

2. Update `hasInFlightTraceSummaryRun` (`api/convex/internal/sync.ts`)

Replace the _scheduled_functions scan with:

```typescript
const existing = await ctx.db
  .query("scheduledJobs")
  .withIndex("by_job_target", q =>
    q.eq("jobType", "trace_summary")
     .eq("targetType", "trace")
     .eq("targetId", traceId)
  )
  .filter(q => q.eq(q.field("status"), "pending"))
  .first();
return existing !== null;
```

3. Update Trace Summary Job (`api/convex/internal/scheduled/traceSummary.ts`)

Insert record when job is scheduled
Update status to "running" at start, "completed"/"failed" at end
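The status lifecycle in this step can be pinned down as a small transition table. This is a sketch under the assumption that only the four statuses from the schema comments exist and that terminal states are never left; the `transition` helper is hypothetical.

```typescript
type JobStatus = "pending" | "running" | "completed" | "failed";

// Allowed transitions for the lifecycle described above.
const ALLOWED_NEXT: Record<JobStatus, JobStatus[]> = {
  pending: ["running"],
  running: ["completed", "failed"],
  completed: [],
  failed: [],
};

// Returns the new status, or throws when the move is not allowed.
function transition(from: JobStatus, to: JobStatus): JobStatus {
  if (ALLOWED_NEXT[from].indexOf(to) === -1) {
    throw new Error(`Invalid job transition: ${from} -> ${to}`);
  }
  return to;
}

let current: JobStatus = "pending";
current = transition(current, "running");
current = transition(current, "completed");
console.log(current); // completed
```

Guarding updates this way would make a stale "running" write after completion fail loudly instead of silently reopening a finished job.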

4. Migration Helper

Backfill existing pending trace summary jobs from _scheduled_functions.

Want me to implement all of this? It would be:

Schema change
Update sync.ts
Update traceSummary.ts
Create migration helper

Or just specific parts?

we're still using _scheduled_functions though right, the built in convex capability

Ah, I see the confusion! Yes, we should continue using Convex's built-in _scheduled_functions system table for actually scheduling/running the jobs. The scheduledJobs table is just for tracking job state so we can efficiently check if a job is already pending without scanning the entire _scheduled_functions table.

Let me clarify the architecture:

Scheduling: Use ctx.scheduler.runAfter() - this creates entries in _scheduled_functions
Deduplication: Check scheduledJobs table (our own table) to see if we already scheduled one
Job execution: The scheduled function runs and updates scheduledJobs with progress

So the flow would be:

```
upsertMessagesBatch()
  -> check scheduledJobs for pending trace_summary
  -> if not found:
       - insert into scheduledJobs (status: "pending")
       - ctx.scheduler.runAfter(0, ...)  // creates _scheduled_functions entry
```

The job itself then updates the record:

```
traceSummary:run()
  -> update scheduledJobs (status: "running")
  -> do work
  -> update scheduledJobs (status: "completed")
```

This way:

We still use Convex's scheduler for actual execution
We avoid scanning _scheduled_functions for deduplication
We get observability into job state
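The check-then-schedule flow can be simulated in memory to show the dedup behavior. The array and the `schedulerCalls` list below are stand-ins for the scheduledJobs table and `ctx.scheduler.runAfter`; all names are illustrative assumptions.

```typescript
type RunStatus = "pending" | "running" | "completed" | "failed";

interface ScheduledJobRow {
  jobType: string;
  targetId: string;
  runStatus: RunStatus;
}

// In-memory stand-ins for the scheduledJobs table and the Convex scheduler.
const jobRows: ScheduledJobRow[] = [];
const schedulerCalls: string[] = [];

// Schedules a trace summary unless one is already pending for this target.
function scheduleSummaryOnce(targetId: string): boolean {
  const alreadyPending = jobRows.some(
    (row) =>
      row.jobType === "trace_summary" &&
      row.targetId === targetId &&
      row.runStatus === "pending",
  );
  if (alreadyPending) return false;
  jobRows.push({ jobType: "trace_summary", targetId, runStatus: "pending" });
  schedulerCalls.push(targetId); // stands in for ctx.scheduler.runAfter(0, ...)
  return true;
}

console.log(scheduleSummaryOnce("trace-1")); // true
console.log(scheduleSummaryOnce("trace-1")); // false, one is already pending
console.log(scheduleSummaryOnce("trace-2")); // true
```

Note that this only dedupes against "pending" rows: once a job moves to "running", a new batch could schedule another run, so the job still benefits from being idempotent.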

Does this make sense? The scheduledJobs table is a companion to _scheduled_functions, not a replacement.

isn't there a way to solve that big problem of scanning _scheduled_functions for deduplication without creating another table

Yes! Several options without creating a new table:

Option 1: Store State on Trace Record (Simplest)

Add fields to the traces table:

```typescript
// In schema.ts traces table
lastSummaryScheduledAt: v.optional(v.number()),
lastSummaryMessageCount: v.optional(v.number()),
```

Then the dedup logic becomes:

```typescript
// Instead of scanning _scheduled_functions
const COOLDOWN_MS = 60000; // 1 minute

async function shouldScheduleSummary(ctx, traceId, currentMessageCount) {
  const trace = await ctx.db.get(traceId);
  if (!trace) return false; // trace no longer exists; nothing to summarize

  // Don't schedule if we just scheduled one recently
  if (trace.lastSummaryScheduledAt &&
      Date.now() - trace.lastSummaryScheduledAt < COOLDOWN_MS) {
    return false;
  }

  // Don't schedule if message count hasn't changed
  if (trace.lastSummaryMessageCount === currentMessageCount) {
    return false;
  }

  return true;
}
```

And the job itself becomes idempotent:

```typescript
export const run = internalAction({
  handler: async (ctx, args) => {
    const trace = await ctx.runQuery(...);
    const messages = await ctx.runQuery(...);

    // Self-dedupe: only process if there's new content
    if (messages.length <= (trace.lastSummaryMessageCount ?? 0)) {
      return { ok: true, skipped: true };
    }

    // ... generate summary ...

    // Update trace with completion state
    await ctx.runMutation(internal.internal.traces.updateSummaryState, {
      traceId: args.traceId,
      messageCount: messages.length,
      scheduledAt: Date.now(),  // prevents immediate reschedule
    });
  },
});
```

Pros:

No new table
No expensive scans
Job is naturally idempotent
Can re-run manually anytime

Cons:

Loses "exactly once" guarantee (job might run twice, but second run is no-op)
No audit trail of job history
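The "second run is a no-op" property can be checked with a small in-memory simulation. The types and the injected `summarize` callback are assumptions for illustration; the field name follows the schema proposed above.

```typescript
interface TraceState {
  messageCount: number;             // messages currently stored
  lastSummaryMessageCount?: number; // count when the last summary ran
}

interface RunResult {
  skipped: boolean;
  usedLlm: boolean;
}

// summarize is injected so the sketch runs without any external service.
function runSummaryJob(traceState: TraceState, summarize: () => void): RunResult {
  if (traceState.messageCount <= (traceState.lastSummaryMessageCount ?? 0)) {
    return { skipped: true, usedLlm: false };
  }
  summarize();
  traceState.lastSummaryMessageCount = traceState.messageCount;
  return { skipped: false, usedLlm: true };
}

let llmCalls = 0;
const demoTrace: TraceState = { messageCount: 120 };
const firstRun = runSummaryJob(demoTrace, () => { llmCalls += 1; });
const secondRun = runSummaryJob(demoTrace, () => { llmCalls += 1; }); // duplicate schedule
console.log(firstRun.skipped, secondRun.skipped, llmCalls); // false true 1
```

The duplicate run costs one read of the trace and no LLM call, which is why losing strict deduplication at scheduling time is tolerable here.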

Option 2: Use Trace Record as Lock

Simpler version - just use the trace row itself:

```typescript
summaryJobState: v.optional(v.object({
  status: v.string(),  // "pending", "running", "completed"
  scheduledAt: v.number(),
}))
```

Query it with index on traceId (already have that).

Which approach do you prefer? Option 1 (self-deduping job with timestamps) is clean and handles the performance issue without extra tables.

what is the most rails-esque way of doing this

The Rails way: state on the model + idempotent jobs.

In Rails you'd do something like:

```ruby
# Migration
add_column :traces, :summary_job_scheduled_at, :datetime
add_column :traces, :summary_processed_at, :datetime
add_column :traces, :summary_message_count, :integer, default: 0

# Model
class Trace < ApplicationRecord
  def should_generate_summary?
    return false if summary_job_scheduled_at && summary_job_scheduled_at > 1.minute.ago
    messages.count > summary_message_count
  end
end

# Job (Sidekiq)
class GenerateSummaryJob
  include Sidekiq::Job

  def perform(trace_id)
    trace = Trace.find(trace_id)

    # Idempotent: exit if already done
    current_count = trace.messages.count
    return if current_count <= trace.summary_message_count

    # Generate summary...

    trace.update!(
      summary_processed_at: Time.now,
      summary_message_count: current_count
    )
  end
end
```

Translated to Convex:

```typescript
// Schema - just add to traces table
lastSummaryJobAt: v.optional(v.number()),        // when scheduled
lastSummaryProcessedAt: v.optional(v.number()),  // when completed
lastSummaryMessageCount: v.optional(v.number()), // messages at last run
```

```typescript
// In sync.ts - before scheduling
const COOLDOWN_MS = 60000;

async function shouldScheduleSummary(ctx, trace, currentCount) {
  // Rails-style: throttle + check if work needed
  if (trace.lastSummaryJobAt && Date.now() - trace.lastSummaryJobAt < COOLDOWN_MS) {
    return false;
  }
  return currentCount > (trace.lastSummaryMessageCount ?? 0);
}
```

```typescript
// In traceSummary.ts run() - idempotent job
const messages = await ctx.runQuery(...);
const count = messages.length;

// Self-dedupe: exit if no new messages
// (?? 0 covers traces that have never been summarized)
if (count <= (trace.lastSummaryMessageCount ?? 0)) {
  return { ok: true, skipped: true };
}

// ... generate summary ...

// Mark as processed
await ctx.runMutation(internal.internal.traces.updateSummaryState, {
  traceId,
  processedAt: Date.now(),
  messageCount: count,
});
```

This is the Rails way because:

State lives with the model (trace record)
Jobs are idempotent (can run multiple times safely)
Timestamps for throttling/cooldown
Simple, no extra infrastructure

Want me to implement this approach?

Yes. Is there any way to merge the dates instead of 2? summaryCompletedDate, summaryMessageCount

Yes! Just track completion:

```typescript
// Schema additions to traces table
summaryCompletedAt: v.optional(v.number()),  // when last summary finished
summaryMessageCount: v.optional(v.number()), // messages at that time
```

Dedup logic becomes:

```typescript
async function shouldScheduleSummary(ctx, trace, currentCount) {
  const COOLDOWN_MS = 60000; // 1 minute

  // Cooldown: don't schedule if we just completed one
  if (trace.summaryCompletedAt &&
      Date.now() - trace.summaryCompletedAt < COOLDOWN_MS) {
    return false;
  }

  // No new messages since last summary
  if (currentCount <= (trace.summaryMessageCount ?? 0)) {
    return false;
  }

  return true;
}
```

Job handles the rest:

```typescript
export const run = internalAction({
  handler: async (ctx, args) => {
    const trace = await ctx.runQuery(...);
    const messages = await ctx.runQuery(...);
    const count = messages.length;

    // Self-dedupe: exit early if already done
    if (count <= (trace.summaryMessageCount ?? 0)) {
      return { ok: true, skipped: true, messageCount: count };
    }

    // Generate summary...

    // Mark completion
    await ctx.runMutation(internal.internal.traces.updateSummaryCompletion, {
      traceId: args.traceId,
      completedAt: Date.now(),
      messageCount: count,
      ai_summary: summary.summary,
      ai_title: summary.title,
    });

    return { ok: true, messageCount: count, usedLlm: true };
  },
});
```

Want me to implement this? It means:

Update schema (add 2 fields)
Remove hasInFlightTraceSummaryRun entirely
Update scheduleTriggeredJobs to use completion timestamps
Update traceSummary job to be idempotent

yes

All done! Here's a summary of the Rails-style implementation:

Changes Made

1. Schema (`api/convex/schema.ts`)

Added to traces table:

2. Removed Scanning (`api/convex/internal/sync.ts`)

Deleted hasInFlightTraceSummaryRun and helper functions
Deleted isInFlightScheduledFunction, hasTraceSummaryName, getTraceIdFromScheduledArgs
Added shouldScheduleSummary function (Rails-style dedup)

3. New Scheduling Logic (`api/convex/internal/sync.ts`)

typescript

const SUMMARY_COOLDOWN_MS = 60000; // 1 minute
async function shouldScheduleSummary(ctx, traceId, currentMessageCount) {  // Cooldown: don't schedule if we just completed one recently  if (trace.summaryCompletedAt && Date.now() - trace.summaryCompletedAt < SUMMARY_COOLDOWN_MS) {    return false;  }    // No new messages since last summary  if (currentMessageCount <= (trace.summaryMessageCount ?? 0)) {    return false;  }    return true;}

4. Idempotent Job (`api/convex/internal/scheduled/traceSummary.ts`)

Job now self-dedupes: exits early if messageCount <= summaryMessageCount
Returns { skipped: true } when deduped
Updates summaryCompletedAt and summaryMessageCount on completion
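
To make the idempotency claim concrete, here is a minimal in-memory sketch of the same self-dedupe rule. It is not the actual job: `TraceRecord`, `runSummaryJob`, and the injected `summarize` function are hypothetical stand-ins so the example runs offline, and only the field names come from the changes above.

```typescript
// Hypothetical in-memory stand-in for the trace record.
interface TraceRecord {
  summaryMessageCount?: number;
  ai_summary?: string;
  ai_title?: string;
}

interface RunResult {
  ok: true;
  skipped: boolean;
  messageCount: number;
}

// `summarize` stands in for the LLM call; it is injected so nothing hits the network.
function runSummaryJob(
  trace: TraceRecord,
  messages: string[],
  summarize: (messages: string[]) => { title: string; summary: string },
): RunResult {
  const count = messages.length;

  // Self-dedupe: nothing new since the last completed summary.
  if (count <= (trace.summaryMessageCount ?? 0)) {
    return { ok: true, skipped: true, messageCount: count };
  }

  const generated = summarize(messages);

  // Record completion on the trace so later runs can see there is nothing to do.
  trace.ai_title = generated.title;
  trace.ai_summary = generated.summary;
  trace.summaryMessageCount = count;

  return { ok: true, skipped: false, messageCount: count };
}

// Usage: run the job twice against the same messages.
let llmCalls = 0;
const fakeSummarize = (messages: string[]) => {
  llmCalls += 1;
  return { title: `Trace with ${messages.length} messages`, summary: messages.join(" ") };
};

const trace: TraceRecord = {};
const messages = ["hello", "world"];
const first = runSummaryJob(trace, messages, fakeSummarize);
const second = runSummaryJob(trace, messages, fakeSummarize);
console.log(first.skipped, second.skipped, llmCalls); // false true 1
```

The second run returns early because the stored count already matches, so the stand-in LLM is called only once.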

5. API Title Fix (`api/convex/services/traces.ts`)

Changed mapTraceSummary to return the AI title when present, falling back to the original: ai_title ?? title.
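
A minimal sketch of that fallback, assuming a simplified document shape. `TraceDoc` and `TraceSummaryView` are hypothetical types for illustration; the real mapper in `traces.ts` returns more fields.

```typescript
// Hypothetical shape: only the fields relevant to the title are shown.
interface TraceDoc {
  title?: string;
  ai_title?: string;
}

interface TraceSummaryView {
  title: string | undefined;
}

function mapTraceSummary(doc: TraceDoc): TraceSummaryView {
  // Prefer the AI-generated title; fall back only when it is null or undefined.
  return { title: doc.ai_title ?? doc.title };
}

console.log(mapTraceSummary({ title: "raw", ai_title: "AI" }).title); // AI
console.log(mapTraceSummary({ title: "raw" }).title); // raw
```

Note that `??` only falls back on `null` or `undefined`, so an empty-string `ai_title` would still win over `title`. If empty AI titles are possible, `||` or an explicit length check may be the safer choice.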

6. Test Mocks (`api/tests/v1/post-process.test.ts`)

Added OpenAI mocking to avoid hitting live API
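
One possible way to keep tests off the live API is to inject the summarizer and swap in a fake. This is a sketch of that pattern, not the mock that was added to the test file: `Summarizer`, `makeFakeSummarizer`, and `summarizeTrace` are hypothetical names, and it assumes the production code can accept the client as a parameter.

```typescript
interface SummaryResult {
  title: string;
  summary: string;
}

// Hypothetical seam: production code would implement this with the real OpenAI call.
interface Summarizer {
  summarize(messages: string[]): Promise<SummaryResult>;
}

// Fake that returns a canned result and records every call for assertions.
function makeFakeSummarizer(canned: SummaryResult) {
  const calls: string[][] = [];
  const summarizer: Summarizer = {
    async summarize(messages) {
      calls.push(messages);
      return canned;
    },
  };
  return { summarizer, calls };
}

// Code under test depends only on the interface, never on the network.
async function summarizeTrace(
  summarizer: Summarizer,
  messages: string[],
): Promise<SummaryResult> {
  return summarizer.summarize(messages);
}

const fake = makeFakeSummarizer({ title: "Test title", summary: "Test summary" });
summarizeTrace(fake.summarizer, ["hi", "there"]).then((generated) => {
  console.log(generated.title, fake.calls.length); // Test title 1
});
```

Because the fake records its calls, a test can also assert that the job skipped the LLM entirely when the message count had not changed.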

Result: No more scanning _scheduled_functions. Instead, scheduling reads the trace record itself, which is a single lookup by ID. The job is idempotent: it can run multiple times safely and only does work when there is new content.

okay please stop with all the extraneous comments

Done.

Summary of Changes:

schema.ts - Added summaryCompletedAt and summaryMessageCount to traces
sync.ts - Removed hasInFlightTraceSummaryRun, replaced with shouldScheduleSummary (checks cooldown + message count)
traceSummary.ts - Job now self-dedupes, updates completion fields
traces.ts - mapTraceSummary returns ai_title ?? title
post-process.test.ts - Added OpenAI mocking

there are comments all over sync.ts referencing "Rails style" etc... stop!

Done. All "Rails" comments removed from:

schema.ts
sync.ts
traceSummary.ts

i don't like cooldown in milliseconds, I think we should just use messageCounts. does that mean we can avoid summaryCompletedAt too?

Yes, much cleaner. Just use message counts:
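
A minimal sketch of the count-only check, assuming the trace record is already loaded. `TraceSummaryState` is a hypothetical type for illustration; `summaryMessageCount` is the field discussed above, and `summaryCompletedAt` is no longer needed.

```typescript
// Hypothetical shape: the only field the decision depends on.
interface TraceSummaryState {
  summaryMessageCount?: number;
}

function shouldScheduleSummary(
  trace: TraceSummaryState,
  currentMessageCount: number,
): boolean {
  // Schedule only when messages have arrived since the last completed summary.
  return currentMessageCount > (trace.summaryMessageCount ?? 0);
}

console.log(shouldScheduleSummary({}, 0)); // false
console.log(shouldScheduleSummary({}, 3)); // true
console.log(shouldScheduleSummary({ summaryMessageCount: 3 }, 3)); // false
console.log(shouldScheduleSummary({ summaryMessageCount: 3 }, 5)); // true
```

Without the cooldown, two upserts that land before the first job completes can both schedule a job. The job's own early exit covers the case where the second job starts after the first has recorded its count, but jobs that overlap in time could still both call the LLM.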
