# Trace: Ensuring Reliable Onboarding Wizard Display

Agent: opencode | Model: GPT-5.2 Codex | Project: traces

---

## User

how deterministic is the way we're deciding to not show the @frontend/components/HomeOnboardingWizard.tsx ? where can it break? how can we make it bulletproof?

## Assistant

It’s mostly deterministic but not bulletproof. The wizard shows unless all steps are “complete,” and completion is a mix of sync session data + async network state, so it can flicker or get stuck.

Here’s the current logic in `frontend/components/HomeOnboardingWizard.tsx`:
- It renders only when `loaded && session` is true; otherwise it returns `null`.
- It hides entirely when `completedCount === resolvedSteps.length`.
- Completion is computed from:
  - `hasProfile: true` (hardcoded, so this step never blocks).
  - `hasDeviceIdentity`: async poll to `/v1/identities`, gated on `NEXT_PUBLIC_CONVEX_HTTP_URL` and `sessionUserId`. Starts as `null` (loading) and becomes `false` on errors or a missing env var.
  - `hasSharedTrace`: derived from `getAllTraces` or `getPublicTraces` once queries resolve; `null` while both are `undefined`.
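
As a rough model of the rules above, the visibility decision can be sketched as a pure function over three tri-state values. This is not the component's actual code: `StepState`, `OnboardingSteps`, and `wizardVisible` are hypothetical names, and the only behavior taken from the component is "render when `loaded && session`, hide when every step is complete."

```typescript
// true = complete, false = incomplete, null = still loading.
type StepState = boolean | null;

// Hypothetical shape holding the three step flags described above.
interface OnboardingSteps {
  hasProfile: StepState;
  hasDeviceIdentity: StepState;
  hasSharedTrace: StepState;
}

// Counts only steps that are definitively complete.
// Both `false` and `null` count as "not complete".
function completedCount(steps: OnboardingSteps): number {
  const all: StepState[] = [
    steps.hasProfile,
    steps.hasDeviceIdentity,
    steps.hasSharedTrace,
  ];
  return all.filter((s) => s === true).length;
}

// Mirrors the described rule: render nothing without a loaded session,
// and hide the wizard only when all three steps are complete.
function wizardVisible(
  loaded: boolean,
  hasSession: boolean,
  steps: OnboardingSteps,
): boolean {
  if (!loaded || !hasSession) return false;
  return completedCount(steps) !== 3;
}
```

Under this model a `null` step keeps the wizard on screen just like a `false` one, which is why loading and error states matter so much for determinism.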

Where it can break or be nondeterministic:
- **Env gating:** If `NEXT_PUBLIC_CONVEX_HTTP_URL` is missing, `hasDeviceIdentity` becomes `false` immediately. That makes the wizard never complete even if the user has device identities.
- **Network errors/timeouts:** `/v1/identities` failures are treated as “no device identity,” so the wizard can show forever under transient failures.
- **Query race/undefined:** `hasSharedTrace` stays `null` until at least one query resolves. If neither query ever resolves (auth issues, skip conditions), it stays loading and the wizard never completes.
- **Polling snapshot:** `identitySnapshot` is cached per userId but updates every 15s; if the user logs in/out quickly, there’s a short window where it can show stale state.
- **Hardcoded hasProfile:** this step always completes, so the wizard can hide even if a real profile is missing (if that matters).
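
To make the first two failure modes concrete, here is a small sketch of the mapping as described, where a missing env var or a failed request collapses into `false`. The `IdentityFetch` type and the function name are assumptions for illustration; the real component's fetch handling is not shown in this trace.

```typescript
type StepState = boolean | null;

// Hypothetical outcome of one poll to /v1/identities.
type IdentityFetch =
  | { kind: "pending" }
  | { kind: "ok"; identityCount: number }
  | { kind: "error" }
  | { kind: "missing-env" };

// Behavior as described above: errors and missing env both become `false`,
// which is indistinguishable from "this user has no device identity".
function identityStepAsDescribed(result: IdentityFetch): StepState {
  switch (result.kind) {
    case "pending":
      return null;
    case "ok":
      return result.identityCount > 0;
    case "error":
    case "missing-env":
      return false;
  }
}
```

A user with registered identities who hits a transient error gets the same `false` as a brand-new user, so the step reads as incomplete until a later poll succeeds.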

How to make it bulletproof (directional changes):
1) **Move completion to a single backend source of truth.**  
   Add a Convex query like `getOnboardingStatus` that returns `{ hasProfile, hasDeviceIdentity, hasSharedTrace }` from authoritative data. One query, one result, no polling, no env dependency.

2) **Treat errors as “unknown” not “false.”**  
   Right now errors turn into “incomplete,” which can show the wizard forever. Return `null` on error and decide on a fallback (e.g., show a retry state or skip the step after a timeout).

3) **Persist completion.**  
   Once completed, set a durable flag (Convex or local storage) so the wizard won’t reappear due to transient backend unavailability.

4) **Remove the polling fetch to `/v1/identities`.**  
   Use Convex for device identity, or subscribe to a real-time query so it’s updated deterministically without a timer.

5) **Define a consistent loading policy.**  
   E.g. “do not show the wizard until all steps are resolved (not null)” vs. “show with loading states.” Right now it can show with loading steps, which can look flaky.
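
Points 2, 3, and 5 can be combined in one small decision function. The sketch below is a minimal illustration under stated assumptions, not a drop-in implementation: `OnboardingStatus` stands in for whatever a `getOnboardingStatus` query would return, `FlagStore` is a hypothetical subset of the Web Storage interface (so `window.localStorage` would satisfy it), and the storage key format is invented. It picks the "do not show until every step is resolved" policy; the alternative of showing per-step loading states is equally valid.

```typescript
type StepState = boolean | null;

// Hypothetical result of a single backend status query.
interface OnboardingStatus {
  hasProfile: StepState;
  hasDeviceIdentity: StepState;
  hasSharedTrace: StepState;
}

type WizardView = "hidden" | "loading" | "visible";

// Minimal storage interface so the sketch also runs outside a browser.
interface FlagStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical key format, scoped per user so accounts do not share the flag.
function doneKey(userId: string): string {
  return `onboarding:completed:${userId}`;
}

function decideWizardView(
  userId: string,
  status: OnboardingStatus | null, // null = query pending or failed (unknown)
  store: FlagStore,
): WizardView {
  // A durable flag wins, so a transient outage cannot bring the wizard back.
  if (store.getItem(doneKey(userId)) === "1") return "hidden";

  // Unknown is not the same as incomplete: wait instead of showing the wizard.
  if (status === null) return "loading";

  const steps: StepState[] = [
    status.hasProfile,
    status.hasDeviceIdentity,
    status.hasSharedTrace,
  ];
  if (steps.some((s) => s === null)) return "loading";

  if (steps.every((s) => s === true)) {
    // Persist completion. In a React component this write would more
    // naturally live in an effect than in the render path.
    store.setItem(doneKey(userId), "1");
    return "hidden";
  }
  return "visible";
}
```

With this shape, a failed status query yields `"loading"` (or a retry state) rather than a false "incomplete," and a user who has completed onboarding once stays hidden even if the backend is briefly unreachable.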

If you want, I can propose a minimal change path (e.g., add a `getOnboardingStatus` query and update this component to depend on that single result).
