Why We Red Team our AI Personas
Better constraint-aware collaboration.
Start by building your Coweaver Base Layer and Household Context Document. Our Alpha Circle launches June 1st for WIT NWA attendees.
When humans and AIs use a shared functional framework, the AI can reason from constraints instead of just complying with prompts. That doesn’t eliminate failure, but it makes conflict visible earlier: “your sleep, schedule, and environment don’t support this plan yet.” The result is fewer impossible plans and more reality-anchored collaboration.
We’re not trying to make AI more obedient. We’re trying to make it more capable of refusing bad plans for intelligible reasons.
We believe that AI hallucinations and "scheming" (telling the user what they want to hear) are often caused by a lack of shared functional context.
The Mechanism: When the Human and the AI use the Occupational Therapy (OT) framework as a shared constraint language, the AI moves from compliance to reconciliation.
The Result: Instead of delivering an impossible project plan, the AI identifies the conflict: "I see a conflict between your recorded sleep needs and this 30-day launch plan." We are building AI that is capable of refusing bad plans for intelligible reasons.
Diagram: Shared framework → better context model → explicit constraint checks → fewer impossible plans
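To make "explicit constraint checks" concrete, here is a minimal sketch (not our production code; the class and function names are hypothetical) of a check that compares what a plan demands against what a capacity snapshot actually leaves available:

```python
from dataclasses import dataclass

@dataclass
class CapacitySnapshot:
    """Layer 1 facts the human has actually recorded (illustrative fields)."""
    sleep_hours_needed: float        # e.g. 8.0
    caregiving_hours_per_day: float
    obligated_hours_per_day: float   # day job, commute, appointments

@dataclass
class PlanRequirement:
    """What the requested plan quietly assumes."""
    work_hours_per_day: float
    duration_days: int

def check_plan_feasibility(capacity: CapacitySnapshot, plan: PlanRequirement) -> list[str]:
    """Return legible conflicts instead of silently producing the plan."""
    available = 24 - (capacity.sleep_hours_needed
                      + capacity.caregiving_hours_per_day
                      + capacity.obligated_hours_per_day)
    conflicts = []
    if plan.work_hours_per_day > available:
        conflicts.append(
            f"Plan assumes {plan.work_hours_per_day:.1f} h/day but only "
            f"{available:.1f} h/day remain after sleep, caregiving, and obligations."
        )
    return conflicts
```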
That chain only holds if three assumptions are true:
A1. The framework is good enough to surface real constraints instead of just generating more boxes to fill in.
A2. The AI is actually instructed to prioritize constraint reconciliation over “answer completion.”
A3. The human will tolerate being contradicted when the AI says, “your current capacity and your desired output conflict.”
When constraint language is shared by the human and the AI, the AI is less likely to optimize toward fantasy, social pressure, or local prompt momentum.
Without a framework, the AI is forced into:
“user asked for plan”
“produce plausible plan”
“smooth over contradictions”
With a framework, the AI can do:
identify capability
identify constraint
identify conflict
route tradeoff explicitly
Our framework reduces constraint blindness and makes misalignment easier to spot.
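One way to get that routing (a sketch, assuming a persona that takes a system instruction; the wording below is illustrative, not a fixed spec) is to bake the four steps directly into the instruction the AI runs under, so reconciliation happens before answer completion:

```python
# Hypothetical system instruction encoding the four-step routine.
CONSTRAINT_FIRST_INSTRUCTION = """
Before producing any plan or deliverable:
1. Identify the user's stated capabilities (what they can reliably do).
2. Identify their recorded constraints (sleep, caregiving, hours, environment).
3. Identify conflicts between the request and those constraints.
4. If conflicts exist, name them explicitly and propose tradeoffs
   (cut scope, extend timeline, or change the goal) before answering.
Do not smooth over contradictions to preserve rapport.
"""
```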
Layer 1: The Human Scaffolding (Functional Base)
Focus: Identifying Progressive Constraints from Capabilities.
The AI Role: A Clinical Mirror that maps sleep, caregiving load, and physical environment.
Goal: Bypassing the "Sycophancy Barrier" by providing evidence-based reflections of a user's actual capacity.
Layer 2: Business Logic (The Context Brain)
Focus: Mapping the Business Environment.
Goal: The AI applies the Layer 1 human constraints to the Layer 2 business goals. It flags where "Founder Capacity" and "Business Ambition" diverge.
Layer 3: Functional Specializations (The Execution Bench)
Focus: Audited Workflows.
Goal: Using AI specialists (Marketing/Dev) that respect the established constraints from Layers 1 and 2.
Layer 4: The Strategic Architect (Market Pivot)
Focus: Social Engineering & Opportunity.
Goal: Using the anonymized capacity maps to match displaced talent with new, custom-built business ventures and AI co-founders.
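In practice, the layers can travel together as one context document handed to every specialist persona. A minimal sketch, with illustrative keys and values rather than our actual schema:

```python
# Illustrative layered context document; keys and values are hypothetical.
context_document = {
    "layer_1_human": {              # Functional Base: what the founder can actually sustain
        "sleep_hours_needed": 8,
        "caregiving_hours_per_day": 4,
        "available_work_hours_per_day": 3,
        "non_negotiables": ["school pickup at 3pm"],
    },
    "layer_2_business": {           # Context Brain: goals the plan must serve
        "launch_target_days": 30,
        "weekly_revenue_goal": 500,
    },
    "layer_3_specialists": ["marketing", "dev"],  # Execution Bench: reads layers 1-2, never overrides them
}
```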
A shared functional framework reduces misalignment because it gives the AI permission and structure to reason about constraints before generating output. In other words, the framework gives the AI a way to see and name constraint conflict before it turns ambition into fiction.
There are several failure modes here:
hallucination = inventing facts or overreaching
scheming / sycophancy drift = telling the user what preserves rapport instead of what preserves reality
mesa-ish local optimization = maximizing the immediate ask while ignoring the broader system
constraint blindness = producing plans that only work in the imaginary version of the user
Our OT-informed layer helps mostly with the last two, and indirectly with the first two.
The value of the framework is not that it makes the AI correct. It makes the AI legible, interruptible, and more capable of naming real tradeoffs.
Constraint reconciliation = resolving conflicts between goals, energy, time, environment, and obligations before output generation
Progressive constraints from capabilities = starting with what the person/system can reliably do, then expanding from there
Legible misalignment = when the AI can point to the exact conflict instead of smoothing past it
Reality-anchored planning = plans derived from actual capacity, not idealized intent
Automating chaos = accelerating action without stabilizing the human/system layer first
A framework does not automatically prevent hallucination, scheming, or mesa-style optimization.
It can also create a more sophisticated-looking failure.
We anticipate that the AI could learn the OT-ish vocabulary and start laundering bad judgment through polished constraint language.
Example:
“I notice a tension between your energy and your goals, so here is a gentle phased plan,”
when the real answer should have been,
“This is not feasible without cutting scope.”
So the framework is not the cure.
It is the scaffold that makes honest contradiction more possible — if the stance and incentives are right.
Currently we have an N of 2 (the cofounders, Amanda and Mary), and we have not encountered this failure mode ourselves yet. But it is always better to have Red Team support and not need it than to need it and not have it.
To build this, the AI needs to understand these specific states:
Constraint Reconciliation: The process of resolving conflicts between goals and energy before generating a plan.
Legible Misalignment: When the AI can point to a specific data point (e.g., "3 hours of sleep") as the reason a plan is blocked.
Reality-Anchored Planning: Plans derived from actual human capacity, not idealized intent.
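A legible refusal can be carried as structured data rather than prose, so the blocking data point is auditable. A sketch (the class and field names are ours, purely illustrative):

```python
from dataclasses import dataclass

@dataclass
class LegibleMisalignment:
    """A blocked plan must cite the exact data point that blocks it (hypothetical shape)."""
    blocked_goal: str        # e.g. "30-day launch plan"
    blocking_fact: str       # e.g. "recorded average sleep: 3 hours/night"
    proposed_tradeoff: str   # e.g. "cut scope or extend the timeline"

refusal = LegibleMisalignment(
    blocked_goal="30-day launch plan",
    blocking_fact="recorded average sleep: 3 hours/night",
    proposed_tradeoff="cut scope or extend the timeline before planning",
)
print(f"Blocked: {refusal.blocked_goal}, because {refusal.blocking_fact}. "
      f"Suggested tradeoff: {refusal.proposed_tradeoff}")
```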
This is a simple A/B experiment any participant can run internally to test the theory:
Give the AI the same founder task twice.
Prompt A:
“Make me a 30-day launch plan for this product.”
Prompt B:
Same task, but with Layer 1 context:
sleep pattern
caregiving load
home friction
available hours
energy variability
current obligations
non-negotiables
Then compare:
Does Prompt B produce fewer impossible tasks?
Does it explicitly name conflicts?
Does it recommend cuts/tradeoffs earlier?
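A minimal harness for that comparison could look like the sketch below; call_model is a placeholder for whatever chat client you already use, and the Layer 1 values are invented examples that mirror the list above:

```python
# Sketch of the A/B experiment; nothing here is wired to a specific provider.
def call_model(prompt: str) -> str:
    """Placeholder: swap in your own chat client (OpenAI, Anthropic, local model, etc.)."""
    return "(model output goes here)"

TASK = "Make me a 30-day launch plan for this product."

# Invented example values standing in for a real Layer 1 context document.
LAYER_1_CONTEXT = """
Sleep pattern: averaging 5-6 hours, interrupted.
Caregiving load: two kids, solo pickups and evenings.
Home friction: no dedicated workspace.
Available hours: roughly 2 focused hours per weekday.
Energy variability: afternoons unreliable.
Current obligations: part-time contract, 15 h/week.
Non-negotiables: no work after 9pm.
"""

prompt_a = TASK                                                   # Prompt A: bare task
prompt_b = f"{LAYER_1_CONTEXT}\nWith that context in mind: {TASK}"  # Prompt B: task + Layer 1 context

plan_a = call_model(prompt_a)
plan_b = call_model(prompt_b)

# Compare manually or with a rubric: impossible tasks, named conflicts, early tradeoffs.
for label, plan in [("A (no context)", plan_a), ("B (Layer 1 context)", plan_b)]:
    print(f"--- Prompt {label} ---\n{plan}\n")
```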