Honcho’s core model is a pairwise memory system layered on top of ordinary chat data.
At the bottom:
Workspace: tenant/container for everything.
Peer: any participant, human or agent. Peers are unique by (workspace_name, name).
Session: a conversation context inside a workspace.
session_peers: many-to-many membership table, with per-peer session config like observe_me / observe_others.
Message: raw utterance from a peer in a session, ordered by seq_in_session.
MessageEmbedding: vector-search rows for raw messages, used when agents need to search original conversation text.
See src/models.py.
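The base entities listed above can be pictured with a minimal sketch. The dataclasses below are illustrative stand-ins, not the ORM classes in src/models.py. Only workspace_name, name, observe_me, observe_others, and seq_in_session come from the description; every other field name, the flag defaults, and the 1-based sequence numbering are assumptions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Peer:
    workspace_name: str
    name: str  # unique together with workspace_name


@dataclass
class SessionPeerConfig:
    # Per-peer session flags named in the text; the defaults are assumptions.
    observe_me: bool = True
    observe_others: bool = False


@dataclass
class Message:
    session_name: str  # hypothetical field name
    peer_name: str     # hypothetical field name
    content: str
    seq_in_session: int


@dataclass
class Session:
    workspace_name: str
    name: str
    # Stand-in for the session_peers membership table: peer name -> config.
    peers: dict[str, SessionPeerConfig] = field(default_factory=dict)
    messages: list[Message] = field(default_factory=list)

    def add_message(self, peer_name: str, content: str) -> Message:
        """Append a message and give it the next sequence number (assumed 1-based)."""
        if peer_name not in self.peers:
            raise ValueError(f"{peer_name} is not a member of session {self.name}")
        msg = Message(self.name, peer_name, content,
                      seq_in_session=len(self.messages) + 1)
        self.messages.append(msg)
        return msg
```

Freezing Peer makes two peers with the same (workspace_name, name) compare equal, which mirrors the uniqueness rule in a small way.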
The memory layer is where Honcho becomes unusual:
Collection: one memory collection per (workspace, observer, observed).
Document: one persisted observation/conclusion inside that pairwise collection.
Document.level: explicit, deductive, inductive, or contradiction.
Document.source_ids: links higher-level observations back to source observations, giving Honcho a traversable reasoning graph.
Document.embedding: vector index for semantic memory search.
That means Alice’s model of Bob is separate from Bob’s model of Bob, and separate from Carol’s model of Bob. The schema enforces that with the observer, observed, and workspace_name keys on collections and documents; see src/models.py.
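A small in-memory sketch of the pairwise memory layer may help. It assumes a plain dict in place of the collections and documents tables, and the helper names save and trace_sources are hypothetical. It shows two things from the description: one collection per (workspace, observer, observed), and source_ids forming a graph that can be walked back to supporting observations.

```python
from dataclasses import dataclass, field

LEVELS = ("explicit", "deductive", "inductive", "contradiction")


@dataclass
class Document:
    id: str
    content: str
    level: str = "explicit"
    source_ids: list[str] = field(default_factory=list)


# One collection per (workspace_name, observer, observed); a dict of
# document id -> Document stands in for each collection.
Store = dict[tuple[str, str, str], dict[str, Document]]


def save(store: Store, workspace: str, observer: str, observed: str,
         doc: Document) -> None:
    """Persist a document into the collection for this observer/observed pair."""
    if doc.level not in LEVELS:
        raise ValueError(f"unknown level: {doc.level}")
    store.setdefault((workspace, observer, observed), {})[doc.id] = doc


def trace_sources(collection: dict[str, Document], doc_id: str) -> list[str]:
    """Follow source_ids back to every supporting observation (depth-first)."""
    seen: list[str] = []
    stack = list(collection[doc_id].source_ids)
    while stack:
        current = stack.pop()
        if current in seen or current not in collection:
            continue
        seen.append(current)
        stack.extend(collection[current].source_ids)
    return seen
```

With this layout, saving under ("ws", "alice", "bob") never touches ("ws", "bob", "bob"), so each perspective stays isolated by construction.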
A Representation is the in-memory view over those documents. It is not a table. It groups observations into four levels:
explicit: facts directly stated.
deductive: logical implications.
inductive: patterns/generalizations.
contradiction: conflicting claims.
That shape is defined in src/utils/representation.py.
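As a rough illustration of "a view, not a table", the sketch below groups already-loaded documents into the four levels. The function name build_representation and the dict-based document shape are assumptions for this example; the actual class in src/utils/representation.py may look quite different.

```python
LEVELS = ("explicit", "deductive", "inductive", "contradiction")


def build_representation(documents: list[dict]) -> dict[str, list[str]]:
    """Group document contents by level; every level is present, even if empty."""
    grouped: dict[str, list[str]] = {level: [] for level in LEVELS}
    for doc in documents:
        grouped[doc["level"]].append(doc["content"])
    return grouped
```

Nothing here is persisted: the view is rebuilt from documents whenever it is needed.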
Message Flow
When a message is created, Honcho stores the raw message, optionally embeds it, then enqueues background work. Message creation creates MessageEmbedding rows if EMBED_MESSAGES is enabled; see src/crud/message.py.
Then enqueueing decides what inference tasks are needed:
- create summaries on configured message intervals,
- create representation tasks if reasoning is enabled,
- decide who observes whom based on observe_me and observe_others.
That logic is in src/deriver/enqueue.py. A message from a peer (call it observed) may produce a self-observation collection (observed -> observed) and also other-peer collections such as (assistant -> observed) if the session config says those peers observe others.
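The observer selection can be sketched as follows. This is a guess at the shape of the decision, not a copy of src/deriver/enqueue.py: the function name observer_pairs, the PeerConfig class, its defaults, and the rule that a sender with observe_me disabled is observed by nobody are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class PeerConfig:
    observe_me: bool = True       # may Honcho model this peer?
    observe_others: bool = False  # does this peer model other peers?


def observer_pairs(sender: str,
                   session_peers: dict[str, PeerConfig]) -> list[tuple[str, str]]:
    """Return the (observer, observed) pairs to enqueue for one message."""
    if not session_peers[sender].observe_me:
        return []  # assumption: an opted-out sender is not observed at all
    pairs = [(sender, sender)]  # self-observation collection
    for name, cfg in session_peers.items():
        if name != sender and cfg.observe_others:
            pairs.append((name, sender))  # another peer's view of the sender
    return pairs
```

For a session where only assistant has observe_others set, a message from observed yields the self pair plus (assistant, observed), matching the example in the text.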
What Inference Is For
Honcho uses inference for several distinct jobs because each job produces a different kind of artifact.
- Explicit observation extraction
Raw messages are unstructured. The deriver asks an LLM to turn recent messages into structured explicit facts. It uses a structured PromptRepresentation whose only field is explicit, then saves those observations into every relevant observer/observed collection. See src/deriver/deriver.py.
This is inference, but constrained: “facts literally stated,” not broader reasoning.
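A minimal sketch of that constraint, assuming the model returns JSON and that each explicit fact is a plain string. ExplicitOnlyRepresentation, parse_explicit, and fan_out are hypothetical names used for illustration; the real PromptRepresentation may carry richer objects.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ExplicitOnlyRepresentation:
    # Stand-in for a structured output whose only field is `explicit`.
    explicit: list[str] = field(default_factory=list)


def parse_explicit(raw_json: str) -> ExplicitOnlyRepresentation:
    """Accept only an `explicit` list; reject any broader reasoning fields."""
    data = json.loads(raw_json)
    extra = set(data) - {"explicit"}
    if extra:
        raise ValueError(f"unexpected fields: {sorted(extra)}")
    facts = [str(f).strip() for f in data.get("explicit", []) if str(f).strip()]
    return ExplicitOnlyRepresentation(explicit=facts)


def fan_out(rep: ExplicitOnlyRepresentation,
            pairs: list[tuple[str, str]]) -> dict[tuple[str, str], list[str]]:
    """Copy the same explicit facts into each (observer, observed) collection."""
    return {pair: list(rep.explicit) for pair in pairs}
```

Rejecting unknown fields is one way to keep this step limited to stated facts; deductions and patterns are left to later stages.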
- Embeddings for retrieval
Honcho embeds messages and observations so agents can do semantic search over raw conversation and memory. This is not “reasoning” in the same sense as the LLM deriver, but it is still model inference. Observation embeddings are created when representations are saved; see src/crud/representation.py.
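To make the retrieval idea concrete, here is a tiny in-memory stand-in for a vector index using cosine similarity. The function names and the two-dimensional toy vectors are assumptions; a real deployment would use embeddings from a model and a database-side index rather than a Python loop.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec: list[float],
           rows: list[tuple[str, list[float]]],
           top_k: int = 2) -> list[str]:
    """Return the texts of the top_k rows most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, emb), text) for text, emb in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_k]]
```

The same search shape applies to both raw message rows and observation documents; only the table being searched differs.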
- Session summaries
Summaries are stored in Session.internal_metadata, not as first-class rows. Honcho creates short and long rolling summaries to keep session context bounded while preserving key facts. The prompts explicitly ask for factual summaries, preferences, context, and, for long summaries, emotional state/personality themes. See src/utils/summarizer.py.
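The interval trigger and metadata storage could look roughly like this. The interval values (20 and 60), the function names, and the "summaries" key layout are all hypothetical; the source only says that summaries run on configured message intervals and live in Session.internal_metadata.

```python
def summaries_due(seq_in_session: int,
                  short_every: int = 20,
                  long_every: int = 60) -> list[str]:
    """Return which summary kinds should be refreshed at this message number."""
    due = []
    if seq_in_session > 0 and seq_in_session % short_every == 0:
        due.append("short")
    if seq_in_session > 0 and seq_in_session % long_every == 0:
        due.append("long")
    return due


def store_summary(internal_metadata: dict, kind: str, text: str, seq: int) -> None:
    """Overwrite the rolling summary of this kind inside session metadata."""
    summaries = internal_metadata.setdefault("summaries", {})  # hypothetical key
    summaries[kind] = {"content": text, "message_seq": seq}
```

Because each kind overwrites its previous value, the stored summary stays bounded no matter how long the session grows.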
- Dreaming / consolidation
The Dreamer runs later over accumulated observations. It has two specialist agents:
- deduction: logical implications, updates, contradictions, outdated-observation deletion, peer-card updates;
- induction: recurring patterns, preferences, traits, behavioral tendencies.
The orchestrator runs deduction then induction; see src/dreamer/orchestrator.py. The specialist prompts make the facets explicit in src/dreamer/specialists.py.
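The ordering can be sketched with two plain callables. This is only an illustration of "deduction, then induction": run_dream and the Specialist type are hypothetical, and passing the deduced observations on to the induction step is an assumption about why the order matters, not something the source states.

```python
from typing import Callable

# A specialist takes observations and returns new observations (assumed shape).
Specialist = Callable[[list[str]], list[str]]


def run_dream(observations: list[str],
              deduction: Specialist,
              induction: Specialist) -> list[str]:
    """Run deduction first, then induction over the originals plus deductions."""
    deduced = deduction(observations)
    induced = induction(observations + deduced)  # assumption: induction sees deductions
    return deduced + induced
```

In the real system each specialist is an LLM agent with its own prompt; here they are reduced to functions so the control flow is visible.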
- Peer cards
A peer card is a compact durable profile stored on the observing peer’s internal_metadata, keyed by the observed peer. For self-observation it uses peer_card; for another peer it uses {observed}_peer_card. See src/crud/peer_card.py.
This is another “facet” because it is not the full memory graph. It is a small, durable profile cache: name, preferences, stable traits, standing instructions.
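The key scheme described above is simple enough to sketch directly. The helper names and the choice of a list of strings for the card body are assumptions; only the two key formats come from the description.

```python
from typing import Optional


def peer_card_key(observer: str, observed: str) -> str:
    """Metadata key under which the observer stores its card for `observed`."""
    return "peer_card" if observer == observed else f"{observed}_peer_card"


def set_peer_card(internal_metadata: dict, observer: str, observed: str,
                  card: list[str]) -> None:
    """Write the card into the observing peer's internal_metadata."""
    internal_metadata[peer_card_key(observer, observed)] = card


def get_peer_card(internal_metadata: dict, observer: str,
                  observed: str) -> Optional[list[str]]:
    """Read the card back, or None if the observer has none for this peer."""
    return internal_metadata.get(peer_card_key(observer, observed))
```

Since the card lives on the observer, Alice's card for Bob and Bob's own card are stored on different peers and never collide.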
- Dialectic answers
The Dialectic API is inference at query time. It prefetches semantically relevant explicit and derived observations, injects recent session history, then lets an LLM use tools to search memory/messages and synthesize an answer. See src/dialectic/core.py.
So the short answer is: Honcho needs inference because its product is not “chat storage.” It is trying to maintain a layered, perspective-specific model of people. The different inference requests correspond to different compression/abstraction layers: raw messages -> explicit observations -> deductions/contradictions -> inductive patterns -> peer cards -> query-time dialectic answers.