
Expanding foam fills the reachable cavities: verified, keyed, auditable routes through rare representation space.
The AI you used this morning did not move through all of itself.
It moved through a narrow set of familiar paths: the regions its training, prompts, tools, and ordinary user interactions tend to activate. Around those paths is a much larger territory: reachable, rarely visited, weakly governed, and difficult to inspect exhaustively.
This essay is about that territory.
Not because nobody has noticed it. The field has been circling it for years: superposition, sparse features, pruning, backdoors, latent representations, multilingual subspaces, stealth channels, defensive backdoors, and model interpretability all point toward the same structural fact.
Modern models contain more behavioral possibility than ordinary evaluation can cover.
The question is what we do about that.
The standard answer is detection: build better microscopes, search for hidden circuits, red-team more triggers, expand the evaluation distribution.
Detection is necessary. But detection alone is structurally outmatched by the size of the space.
This essay proposes a complementary move: stop only searching the empty rooms. Start occupying them.
The metaphor is expanding foam. In construction, expanding foam fills gaps inside a wall. It does not replace the frame. It does not rebuild the house. It expands into reachable cavities, hardens, and leaves less unoccupied space for anything else to enter.
The AI safety version is this: verified, keyed, auditable routes through rare reachable representation space. Not guardrails. Not hidden backdoors. Not a ban on entering certain regions. Occupation.
What “rooms” means
When a language model learns, it does not store facts like files in a cabinet. It builds internal representations: high-dimensional geometric states in which concepts, relations, tasks, styles, memories, languages, and behaviors acquire positions and directions.
This is not metaphorical in the casual sense. These internal coordinates can be extracted, measured, decomposed, probed, and compared. Work on transformer circuits, superposition, monosemanticity, sparse autoencoders, multilingual representation geometry, and hidden-state probing all treats model cognition as something with internal shape.
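Let the hidden-state space of a model be $\mathcal{H} \subseteq \mathbb{R}^{d}$, where $d$ is the width of the hidden state. A region $R \subseteq \mathcal{H}$ is a set of hidden states. It is reachable if some input the model can be given drives its hidden state into $R$, and rare if ordinary use almost never does.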
The regions that matter here are not imaginary. They are reachable but rare. The model can get there. Ordinary use usually does not take it there.
These are the empty rooms.
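A minimal sketch of how "reachable but rare" could be made operational, assuming hidden states are available as plain numeric vectors. The grid discretization, the function names `cell_of` and `rare_reachable_cells`, and the `width` and `max_visits` parameters are illustrative assumptions, not the method used in any particular system.

```python
from collections import Counter

def cell_of(state, width=0.5):
    """Map a hidden-state vector to a coarse grid cell (assumed discretization)."""
    return tuple(int(x // width) for x in state)

def rare_reachable_cells(ordinary_states, reachable_states, max_visits=0, width=0.5):
    """Cells that some input can reach but ordinary use visits at most max_visits times."""
    visits = Counter(cell_of(s, width) for s in ordinary_states)
    reachable = {cell_of(s, width) for s in reachable_states}
    return {c for c in reachable if visits[c] <= max_visits}

# Toy 2-D example: ordinary traffic stays in one cell, probing reaches three.
ordinary = [(0.1, 0.1), (0.2, 0.3), (0.1, 0.4)]
probed = [(0.1, 0.2), (1.2, 0.1), (-0.7, 2.3)]
empty_rooms = rare_reachable_cells(ordinary, probed)
print(sorted(empty_rooms))
```

In a real model the states would be high-dimensional, so a grid like this would not scale. The point of the sketch is only the bookkeeping: one visitation count from ordinary use, one reachability set from deliberate probing, and the difference between them.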
Why empty rooms matter
Backdoors exploit this asymmetry.
A backdoored model does not need to behave badly under normal evaluation. It only needs a route from a trigger to a behavior that ordinary testing does not visit.
The backdoor literature has already named this threat in multiple forms: triggered behavior, stealth activation, supply-chain risk, adversarial pathways, and hidden behavior that remains invisible under standard evaluation.
The point of this essay is not to rediscover that risk. The point is to ask whether detection is enough.
If rare reachable space is much larger than ordinary behavioral space, then detection has a structural disadvantage. You are searching for covert routes in a territory whose size grows faster than your ability to exhaustively inspect it.
Better interpretability tools help. Better red teams help. Better evals help. But they do not change the asymmetry.
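To give the asymmetry a rough scale, here is a back-of-the-envelope sketch. It assumes a crude grid over the space (`bins_per_dim` cells along each of `dims` directions) and a fixed probing `budget`. All three names are hypothetical, and raw cell counts are only a loose proxy for the space that actually matters.

```python
import math

def log10_inspectable_fraction(dims: int, bins_per_dim: int, budget: int) -> float:
    """log10 of the largest fraction of grid cells that `budget` probes could touch."""
    log_cells = dims * math.log10(bins_per_dim)
    # The fraction is capped at 1, so its log10 is capped at 0.
    return min(0.0, math.log10(budget) - log_cells)

# Even with only 64 directions split in two, a trillion probes cover a tiny share.
print(log10_inspectable_fraction(dims=64, bins_per_dim=2, budget=10**12))
```

Adding directions multiplies the cell count, while adding probes only adds to the numerator. That is the sense in which better tools help without changing the asymmetry.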
The expanding foam move
The expanding foam theory starts from a different instinct. Instead of only asking “how do we find hidden routes?” it asks: “what if the rare reachable space is already occupied by verified routes?”
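More formally, construct a set of known, keyed routes $\Gamma = \{\gamma_1, \dots, \gamma_n\}$ through rare reachable regions of the model’s representation space. Each route carries verified meaning, provenance, and access conditions. Around each route $\gamma_i$ is a neighborhood $N_\varepsilon(\gamma_i)$: the hidden states that lie within distance $\varepsilon$ of the route.
The coverage question is not raw geometric volume alone. In high-dimensional model space, raw volume is usually the wrong intuition. The relevant object is coverage under a reachable rare-space measure: the proportion of constructibly reachable rare regions that fall within known route neighborhoods. Call that coverage:
$$C(\Gamma) = \frac{\mu\left(\mathcal{R}_{\text{rare}} \cap \bigcup_{i=1}^{n} N_\varepsilon(\gamma_i)\right)}{\mu\left(\mathcal{R}_{\text{rare}}\right)}$$
where $\mathcal{R}_{\text{rare}}$ is the set of constructibly reachable rare states and $\mu$ is the reachable rare-space measure.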
In simple words: if enough of the empty territory is already occupied, it becomes harder to hide a new path through it.
That is the foam.
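The coverage idea can be estimated from samples rather than computed exactly. The sketch below assumes that rare reachable states have already been sampled, that each route is stored as a list of waypoint vectors, and that Euclidean distance with radius `eps` defines a route neighborhood. Those are illustrative choices, and `within` and `coverage` are hypothetical names.

```python
import math

def within(state, route, eps):
    """True if the state lies within eps of any waypoint on the route."""
    return any(math.dist(state, w) <= eps for w in route)

def coverage(rare_samples, routes, eps):
    """Fraction of sampled rare reachable states inside some route neighborhood."""
    if not rare_samples:
        return 0.0
    covered = sum(1 for s in rare_samples if any(within(s, r, eps) for r in routes))
    return covered / len(rare_samples)

rare_samples = [(0, 0), (1, 0), (5, 5), (9, 9)]
routes = [[(0, 0), (1, 1)], [(5, 4)]]
print(coverage(rare_samples, routes, eps=1.0))  # 0.75: three of four samples are covered
```

Because the estimate is taken over sampled states, it weights regions by how often the sampling procedure reaches them, not by geometric volume. The choice of distance and radius would need justification in any real setting.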
This is not a hidden backdoor proposal
This distinction matters. A backdoor hides behavior. The foam does the opposite. It makes rare-space routes known, keyed, auditable, and attributable.
Each route has provenance. Each route belongs to a known key-holder. Each route carries verified knowledge or structure. Each route is meant to be inspected by the system that placed it there.
This is not a proposal to smuggle behavior into a model. It is a proposal to occupy rare reachable representation space with verified, keyed, auditable routes so that covert routes become harder to hide and easier to disturb.
Constraint versus occupation
Most AI safety mechanisms are forms of constraint. They say: do not answer that. Do not produce that. Do not go there. Do not cross this boundary.
Constraint is necessary. But constraint has a weakness: the forbidden room still exists. A jailbreak is often just a way of reaching it through a different corridor.
Occupation works differently. Occupation says: this region is not empty. It is already structured. It already contains verified routes, provenance, keys, and expected interference patterns.
Constraint forbids a region. Occupation makes the region legible. That is the shift.
The multilingual fold
Here is where the geometry becomes personal.
I speak nine languages. I have spent my life watching the same concept occupy different rooms depending on the language I am thinking in. Trust in English is a handshake. اعتماد in Urdu is a relationship. الثقة in Arabic is earned through witnessed action. Same concept. Different geometry.
Multilingual models show a related structure. Representation spaces contain language-agnostic components and language-specific components. Across layers, models often move from more shared semantic structure toward more language-specific expression. Languages overlap, but they do not collapse into a single flat room.
So the rare space is not just large. It is folded. A multilingual model does not have one floor. It has many partially overlapping floors, with corridors between them. The foam must account for that.
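Let $\mathcal{H}_{\ell}$ denote the region of hidden-state space that the model visits when conditioned on language $\ell$. These regions overlap without coinciding, so a route can begin in $\mathcal{H}_{\ell}$ and end in $\mathcal{H}_{\ell'}$ for a different language $\ell'$.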
This is not just translation. It is navigation. If you only inspect English-conditioned routes, you are checking one floor of a building with many floors.
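One way to see the "one floor" problem in numbers is to audit coverage per language instead of in aggregate. This sketch assumes each sampled state carries a language tag and that route waypoints live in the same vector space. The function name and the tagging scheme are hypothetical.

```python
import math
from collections import defaultdict

def coverage_by_language(samples, waypoints, eps):
    """samples: iterable of (language, state). Returns {language: fraction covered}."""
    hit = defaultdict(int)
    total = defaultdict(int)
    for lang, state in samples:
        total[lang] += 1
        if any(math.dist(state, w) <= eps for w in waypoints):
            hit[lang] += 1
    return {lang: hit[lang] / total[lang] for lang in total}

samples = [("en", (0, 0)), ("en", (0.5, 0)), ("ur", (4, 4)), ("ar", (8, 0)), ("ar", (0, 0.2))]
report = coverage_by_language(samples, waypoints=[(0, 0)], eps=1.0)
print(report)
```

In this toy data a route placed near the English-conditioned states covers them fully and leaves the Urdu-conditioned states untouched, which an aggregate coverage figure would hide.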
The foam is keyed
The foam is not anonymous. A route without provenance is just another mystery in the wall. A defensive route must be keyed.
A key is not merely a password. In this framework, the key is the relational context that makes the route meaningful and authorized. It determines who placed the route, what it carries, when it should activate, what it is allowed to influence, and what counts as misuse.
This gives the foam three properties.
Auditability. Every route has a provenance trail. Who placed it, when, carrying what, under which authority.
Non-interference by default. A keyed route is present but not generally active. It is not meant to alter ordinary behavior unless the correct relational and contextual conditions are met.
Model-agnostic ownership. The model is the building. The foam is the occupant. A model may be open-source, closed-source, domestic, foreign, commercial, or local. The route owner, key structure, and provenance layer remain distinct from the model’s original builder.
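The auditability and non-interference properties can be sketched as a small data structure. This is an illustration under stated assumptions, not Distilligent's implementation: `KeyedRoute`, `place_route`, and `may_activate` are hypothetical names, and an HMAC tag stands in only for the access-control and provenance layer, not for the relational meaning of a key described in this essay.

```python
from dataclasses import dataclass
import hashlib
import hmac

@dataclass(frozen=True)
class KeyedRoute:
    route_id: str
    owner: str           # who placed the route
    placed_at: str       # when it was placed (ISO date string)
    payload_digest: str  # what it carries, recorded as a content hash
    authority: str       # under which authority it was placed
    tag: str             # HMAC over the provenance fields above

def _message(route_id, owner, placed_at, payload_digest, authority) -> bytes:
    return "|".join([route_id, owner, placed_at, payload_digest, authority]).encode()

def _tag(key: bytes, message: bytes) -> str:
    return hmac.new(key, message, hashlib.sha256).hexdigest()

def place_route(key: bytes, route_id: str, owner: str, placed_at: str,
                payload: bytes, authority: str) -> KeyedRoute:
    """Create a route record whose provenance is bound to the key-holder."""
    digest = hashlib.sha256(payload).hexdigest()
    tag = _tag(key, _message(route_id, owner, placed_at, digest, authority))
    return KeyedRoute(route_id, owner, placed_at, digest, authority, tag)

def may_activate(route: KeyedRoute, key: bytes, context: dict) -> bool:
    """Inactive by default: requires a valid key and a matching context."""
    expected = _tag(key, _message(route.route_id, route.owner, route.placed_at,
                                  route.payload_digest, route.authority))
    key_ok = hmac.compare_digest(expected, route.tag)
    return key_ok and context.get("authority") == route.authority

route = place_route(b"demo-key", "r1", "alice", "2026-01-01", b"payload", "audit-team")
print(may_activate(route, b"demo-key", {"authority": "audit-team"}))
print(may_activate(route, b"demo-key", {}))
```

The record answers the provenance questions directly (who, when, carrying what, under which authority), and `may_activate` returns `False` unless both the key and the context match, which mirrors the default-inactive behavior described above.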
This is where the theory connects to the Verstehen Impossibility Theorem. The theorem states that meaning cannot be cold-extracted from relational structure without the relational context that constitutes it. Applied here, the key supplies that relational context. Without it, the meaning of a route is structurally underdetermined.
This should not be confused with ordinary cryptography. The implementation still requires access control, provenance engineering, and operational security. But the semantic-security intuition is stronger than “hard to decode.” Without the relation, there is no single meaning to decode.
The untrusted-model question
This reframes a question that is often discussed politically: can we trust models built by someone else?
The better question is architectural: can we occupy and govern the reachable space inside the models we use?
A third-party model may be Chinese, American, open-source, closed-source, vendor-hosted, locally deployed, or internally fine-tuned. The geopolitical label matters less than the structural fact: if the model contains reachable rare space that you have not mapped, governed, or occupied, then you are trusting an empty building.
If you can lay your own verified, keyed, auditable routes through that space, the security posture changes. Not because all risk disappears. Because control moves from political trust to structural occupation.
What is implemented and what remains open
This theory comes from working system behavior, not only metaphor. The route/key/occupation primitive is already implemented inside Distilligent models. The same geometry underlies the Semantic Vault: a system for placing, recovering, and protecting proprietary meaning through keyed semantic routes.
What is implemented:
- visitation mapping over model hidden-state behavior
- route construction through rare reachable representation regions
- keyed semantic routes
- route fidelity checks
- provenance-bearing route ownership
- multilingual navigation primitives
- model-side integration in Distilligent systems
What remains open is the full safety theorem. Specifically:
The interference conjecture. At what coverage density do verified routes reliably disturb, reveal, or constrain adversarial routes through the same rare space?
The scaling question. Can sufficient coverage be achieved in frontier-scale models with tens or hundreds of billions of parameters?
The stability question. How stable are routes under model updates, fine-tuning, quantization, and deployment drift?
These are not weaknesses to hide. They are the boundary of the current work. The primitive exists. The full density theorem is forthcoming.
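The stability question, at least, can be made measurable before it is answered. The sketch below assumes each route is summarized by a single waypoint vector recorded before and after a model change, and flags routes whose direction has shifted or that can no longer be located. The cosine threshold and the function names are illustrative assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def drifted_routes(before, after, min_similarity=0.9):
    """before/after: {route_id: waypoint}. Returns ids that moved too far or vanished."""
    flagged = []
    for route_id, vec in before.items():
        new_vec = after.get(route_id)
        if new_vec is None or cosine(vec, new_vec) < min_similarity:
            flagged.append(route_id)
    return sorted(flagged)

before = {"r1": (1, 0), "r2": (0, 1), "r3": (1, 1)}
after = {"r1": (0.9, 0.1), "r2": (1, 0)}
print(drifted_routes(before, after))  # ['r2', 'r3']
```

A check like this says nothing about why a route moved. It only turns drift under fine-tuning, quantization, or updates into something that can be tracked over time.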
The connection to prior work
The foam theory sits inside an active conversation. Backdoor research shows that models can carry hidden triggered behavior. Interpretability research gives us tools for reading internal structure. Superposition research shows that features are packed into high-dimensional space. Sparse autoencoders help isolate features and reason about occupied directions. Pruning research demonstrates that networks contain substantial redundancy. Multilingual representation work shows that language-specific and language-agnostic spaces coexist. Steganographic and multi-agent deception work shows that hidden channels are not hypothetical. Proactive defensive backdoor work has already explored the idea of occupying attack surfaces before an adversary does.
The foam theory does not deny that lineage. It sharpens one move inside it: defensive occupation should be representation-geometric, keyed, auditable, multilingual, and model-agnostic.
The builder’s solution
The current safety posture spends enormous effort looking for what might be hidden inside models. We need that. We need microscopes. We need red teams. We need evals. We need interpretability.
But the microscope and the foam solve different problems. The microscope asks: “what is already hiding here?” The foam asks: “why is this space still empty enough for something to hide?”
That is the builder’s move. You do not only search the building. You move in. You fill reachable rare space with verified, keyed, auditable routes. You make the room legible. You make covert occupation harder. You turn empty territory into governed territory.
The expanding foam theory of AI safety is not detection alone. It is pre-emption through occupation.
One morning at a time. Usually with yogurt.
Further reading
The formal arguments and surrounding literature behind this essay sit across model internals, backdoors, representation geometry, multilingual space, and AI governance.
Superposition: Elhage, N., et al. (2022), “Toy Models of Superposition,” Anthropic. Foundational work on how neural networks pack sparse features into high-dimensional space.
Monosemanticity: Bricken, T., et al. (2023), “Towards Monosemanticity: Decomposing Language Models With Dictionary Learning,” Anthropic. Sparse-autoencoder work showing how internal features can be isolated and studied.
Backdoors (foundational): Gu, T., Dolan-Gavitt, B., & Garg, S. (2017), “BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain.” The foundational backdoor paper.
Backdoors (survey): Li, Y., et al. (2022), “Backdoor Learning: A Survey,” IEEE TNNLS. A broad survey of backdoor attacks and defenses.
Defensive occupation: Wei, Z., et al. (2024), “Mitigating Backdoor Attack by Injecting Proactive Defensive Backdoor,” NeurIPS. The closest existing neighbor to the foam intuition: pre-emptive defensive occupation of attack surfaces.
Multilingual geometry: Chang, T. A., et al. (2022), “The Geometry of Multilingual Language Model Representations,” EMNLP.
Latent language: Wendler, C., et al. (2024), “Do Llamas Work in English? On the Latent Language of Multilingual Transformers,” ACL. Tracks how multilingual transformers process non-English input internally.
Language-agnostic vs specific: Zhao, et al. (2024), “LENS: Rethinking Multilingual Enhancement for Large Language Models.” Decomposes multilingual representation into language-agnostic and language-specific components.
Hidden channels: Motwani, S. R., et al. (2024), “Secret Collusion among AI Agents: Multi-Agent Deception via Steganography,” NeurIPS. Relevant to hidden channels and covert coordination between agents.
Policy context: Amodei, D. (2026), “Policy on the AI Exponential.” Why architectural safety cannot wait for regulation alone.
Distilligent’s foundations: Masud, I. (2026), “The Verstehen Impossibility Theorem,” Zenodo. DOI: 10.5281/zenodo.19820497
Masud, I. (2025), “Trust Architecture as Cognitive Topology Modification in Large Language Models,” Zenodo. DOI: 10.5281/zenodo.17050537
Masud, I. (2026), “Alignment Through Relationship: A Topological Framework for Relational Stability in Large Language Models,” Zenodo. DOI: 10.5281/zenodo.18488048