Context Images: Docker for Conversations

A sequel to WadTown Manifesto: The Miracle of Queryable Global Context.

At MLSys 2026, Charles Packer, Lianmin Zheng, Andrej Karpathy, and Solomon Hykes slowly realize they are all describing the same missing abstraction from different angles. Across fragmented hallway conversations and lossy retellings, the conference itself begins behaving like a distributed inference system. KV caches become compiled artifacts, agents become resumable runtimes, and transcripts become logs instead of memory. By the end, nobody remembers who first said what, only that context images are Docker for conversations.

ChatGPT Prompt

SCENE 1 — COFFEE LINE

(Morning. Packer and Zheng waiting for espresso.)

Packer

The problem is that everyone treats conversations as transcripts instead of managed memory.

Zheng

The expensive part is recomputing KV state every request.

Packer

Right, because context windows are functioning like RAM.

Zheng

No, context windows are just the visible abstraction over attention locality.

Packer

…that’s basically memory management.

Zheng

It’s cache scheduling.

(Barista calls a name.)

Barista

“Charles?”

Packer

That’s me.

(He grabs coffee.)

Packer

Anyway, agents need virtual memory.

Zheng

We already built paging for KV blocks.

(They stare at each other briefly.)

Both

“Huh.”

(They leave in opposite directions.)

Blackout.


SCENE 2 — HALLWAY OUTSIDE “LLM SERVING AT SCALE”

(Late morning. Zheng runs into Karpathy.)

Karpathy

How’s the inference world?

Zheng

Everyone wants million-token contexts because they keep replaying conversations.

Karpathy

That feels architecturally wrong.

Zheng

It is. We should persist KV state directly.

Karpathy

So conversations are basically serialized process reconstruction.

Zheng

More or less.

Karpathy

Wait.

(He stops walking.)

Karpathy

Are we rebuilding working memory from logs every API call?

Zheng

Yes.

Karpathy

That’s horrifying.

Zheng

It’s stateless infrastructure.

Karpathy

No no no. That means chat history is just stdout.

(Zheng laughs despite himself.)

Zheng

You sound like an operating systems person.

Karpathy

I think the industry accidentally became an operating systems field sometime last year.

(Beat.)

Zheng

Someone earlier told me agents need virtual memory.

Karpathy

That’s not even metaphorical anymore, is it?

(Conference volunteer interrupts.)

Volunteer

Panel starts in two minutes.

(They leave.)

Blackout.


SCENE 3 — SPEAKER DINNER

(Long table. Loud restaurant. Karpathy sitting beside Hykes.)

Hykes

So what are people obsessing over this year?

Karpathy

Context engineering.

Hykes

What does that mean?

Karpathy

Nobody knows.

(Beat.)

Karpathy

Officially it means prompts and retrieval.

Hykes

And unofficially?

Karpathy

Persistent cognitive runtimes with memory hierarchies and resumable inference state.

Hykes

Hykes

That just sounds like containers.

Karpathy

No, because these are conversations.

Hykes

Containers are also conversations. We just call them processes.

Karpathy

No, but these have memory and branching and checkpointing and resumability—

Hykes

So containers.

Karpathy

No, because they’re stochastic.

Hykes

So distributed systems.

(Long pause.)

Hykes

Wait. Are people replaying the entire conversation every request?

Karpathy

Yes.

Hykes

Why?

Karpathy

Because the APIs are stateless.

Hykes

That’s insane.

Karpathy

THANK YOU.

(Nearby attendees turn and stare.)

Hykes

Why don’t you snapshot runtime state?

Karpathy

Because the KV cache is architecture-specific and tied to tokenizer alignment and attention layout.

Hykes

Docker images are architecture-specific too.

(Silence.)

Karpathy

Hykes

Karpathy

Oh no.

Blackout.


SCENE 4 — HOTEL LOBBY, MIDNIGHT

(Packer typing furiously on laptop. Hykes enters carrying tea.)

Hykes

You’re the memory hierarchy guy.

Packer

That’s not what I said.

Hykes

Close enough.

Packer

I said agents need persistent working memory instead of replaying transcripts.

Hykes

Right. So why don’t you compile contexts into runnable artifacts?

Packer

Because contexts aren’t portable.

Hykes

Neither are containers.

Packer

Hykes

You rebuild from source when compatibility breaks.

Packer

Hykes

You HAVE source representations, right?

Packer

Transcripts. Retrieval indexes. Tool bindings. Memory stores.

Hykes

That’s a build context.

Packer

No, because—

(He stops.)

Hykes

You okay?

Packer

I think conversations are build logs.

(Long silence.)

Hykes

That sounds either very profound or deeply unhealthy.

Packer

Oh my god.

(He opens laptop again.)

Packer

Okay. Wait.

(Typing rapidly.)

Packer

Transcript equals source.

Hykes

Sure.

Packer

KV cache equals compiled artifact.

Hykes

Yep.

Packer

Running agent equals container.

Hykes

Obviously.

Packer

Context image.

Hykes

There it is.

Blackout.


SCENE 5 — AIRPORT SHUTTLE, FINAL MORNING

(All four accidentally end up together.)

Karpathy

Okay, apparently we’ve all been having the same conversation independently.

Zheng

Approximately the same conversation.

Packer

Lossily compressed.

Hykes

With poor cache locality.

(They nod.)

Karpathy

Let me see if I understand this.

(Counting on fingers.)

Karpathy

Conversations are not the runtime.

Zheng

Correct.

Karpathy

They are reconstruction artifacts for the runtime.

Packer

Yes.

Karpathy

The runtime is attention state plus tools plus working memory plus retrieval context.

Zheng

And KV locality.

Karpathy

Sure.

Karpathy

And the transcript is basically stdout.

Hykes

Exactly.

Packer

Which means agents are resumable cognitive containers.

Zheng

Backed by paged KV memory.

Karpathy

Compiled from conversational source code.

Hykes

Now you’re getting it.

(Silence.)

Shuttle Driver

You guys with the conference?

All Four

Yeah.

Shuttle Driver

What’s it about?

(Long pause.)

Karpathy

We think chatbots accidentally became operating systems.

(Driver nods like this explains nothing.)

Packer

No, no. Conversations are deployable runtime artifacts.

Zheng

No. Contexts are executable cache topologies.

Hykes

No. You reinvented Docker.

(Beat.)

Karpathy

Human conversation is just low-bandwidth distributed inference.

(Everyone goes quiet.)

Shuttle Driver

Shuttle Driver

So… computers?

Blackout.


Appendix I: Why These Four?

How did their past shape them for this epiphany?

Charles Packer arrives through memory.

His work on MemGPT frames LLMs as operating-system-like entities constrained by finite context windows, and proposes “virtual context management” inspired by hierarchical memory and paging. So he is primed to see the chat transcript not as the thing itself, but as an awkward mechanism for reconstructing a larger memory illusion.

Lianmin Zheng arrives through runtime pressure.

vLLM and PagedAttention treat the KV cache as the scarce, dynamic resource that makes LLM serving hard, borrowing virtual-memory ideas to manage KV blocks efficiently and share them across requests. So he is primed to see the “conversation” as less important than the underlying cache topology it induces.

Andrej Karpathy arrives through abstraction collapse.

His “Software 2.0” framing helped popularize the idea that neural networks are not just applications, but a new kind of software substrate; more recently, he has emphasized LLMs as a new computing platform. So he is primed to recognize when AI practice stops being prompt craft and starts becoming systems architecture.

Solomon Hykes arrives through artifact discipline.

Docker made the industry fluent in the distinction between source, image, running container, logs, volumes, registries, architecture tags, and rebuildability; Hykes is the person in the room most likely to hear “serialized, architecture-specific, resumable runtime environment” and ask why everyone is avoiding the word “image.”

Together, they form a clean four-part circuit: memory, cache, platform, container.

None of them needs to invent the whole idea alone; each only has to misunderstand the next person productively.


Appendix II: Why Docker?

Docker is the right analogy because it changed the unit of deployment.

Before Docker, people often talked about applications as if the source code was the thing. But in practice, the runnable thing was always larger. It included:

  • dependencies
  • filesystem layout
  • environment variables
  • startup commands
  • architecture assumptions
  • runtime permissions
  • mounted volumes
  • logs

Docker gave that whole messy bundle a name: an image.

Conversations have the same problem. A transcript looks like the thing, but the runnable thing is larger. It includes:

  • system prompt
  • tool schemas
  • retrieval bindings
  • memory state
  • tokenizer state
  • KV cache
  • model fingerprint
  • runtime layout
  • continuation point

A context image names that bundle.

The Analogy Works Because Portability Has Constraints

Docker images are not magically portable. They are portable within explicit constraints. They depend on:

  • architecture
  • OS assumptions
  • runtime compatibility
  • rebuildable source

Context images would have the same shape. They are tied to:

  • model
  • tokenizer
  • quantization
  • position encoding
  • attention layout
  • runtime implementation

But they remain rebuildable from:

  • transcript
  • memory graph
  • retrieval configuration
  • tool bindings
  • source documents

The Conceptual Mapping

Most importantly, Docker separated concepts that used to blur together:

  • source tree / Dockerfile → transcript, memory graph, retrieval config
  • image → serialized runnable context state
  • container → live agent or inference process
  • logs → chat transcript
  • volumes → external memory and tools
  • registry → shared context repository

The Real Insight

So “Docker for conversations” is not just a branding metaphor. It says the industry is using logs as runtimes, when it should be building, versioning, forking, running, and garbage-collecting context as infrastructure.


Appendix III: Context as Infrastructure

Once conversations become context images, context stops being prose and starts being infrastructure.

That means it needs infrastructure disciplines:

  • build
  • version
  • cache
  • fork
  • inspect
  • run
  • mount
  • garbage-collect
  • rebuild
  • audit

The Problem: Context as a Junk Drawer

Today, most AI systems treat context as a blob of text assembled at the last possible moment. The prompt becomes a junk drawer: instructions, examples, retrieved documents, user history, tool schemas, policies, scratchpad, and memory all crammed into one serialized message stream.

That works until the system becomes important.

Then context needs the same maturity we expect from deployment artifacts. You need to know:

  • what source material produced this context
  • which model and tokenizer it was built for
  • which tools were mounted
  • which memory stores were included
  • which retrieval indexes were used
  • what changed since the last run
  • whether the image can be rebuilt
  • whether two images share layers
  • whether a branch can be resumed

The Shift: From Prompting to Context Operations

This is the shift from prompting to context operations.

A mature context system would treat the transcript as only one input among many. The runnable context might be assembled from:

  • system instructions
  • user preferences
  • project memory
  • relevant documents
  • tool definitions
  • retrieval indexes
  • prior decisions
  • active task state
  • KV cache layers
  • runtime metadata

The context image is the compiled artifact. The live agent is the running process. The transcript is the log.

That separation matters because it makes context governable. Teams could review context diffs, pin versions, reproduce failures, roll back bad memories, share known-good images, and isolate experimental branches.

Reusable Context Layers

It also makes context reusable. Instead of rebuilding the same expensive working set for every conversation, a system could maintain durable layers:

  • base assistant behavior
  • organization knowledge
  • project state
  • customer state
  • active incident state
  • personal working memory

Each layer could be rebuilt from source, cached when hot, invalidated when stale, and forked when exploration begins.

This is what “context as infrastructure” means: not better prompts, but better lifecycle management for the state that makes intelligence useful.


Appendix IV: The Context Image Spec v1.0

A Context Image is a rebuildable, runnable artifact for continuing an AI interaction from a known state.

  • It is not a chat transcript.
  • It is not a memory database.
  • It is not a model checkpoint.
  • It is not merely a prompt.

It is the compiled form of a context environment.

A Context Image contains enough information for a compatible runtime to resume, fork, inspect, or rebuild an active cognitive process.

1. Core Definition

A Context Image is composed of three layers:

  1. source
  2. compiled state
  3. runtime manifest

The source is the human-readable and rebuildable material from which the context was created.

The compiled state is the optimized runtime representation, such as KV cache blocks, prefix-cache layers, retrieval bindings, tool schemas, and active working memory.

The runtime manifest explains what the image is, how it was built, what it depends on, and where it may safely run.

The source is canonical.
The compiled state is disposable.
The manifest is the contract.
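
A minimal sketch of that three-part split, in Python. The class and field names here (SourceBundle, Manifest, ContextImage, evict) are illustrative assumptions, not part of any existing library. The point is only that compiled state can be dropped without losing the image.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SourceBundle:
    """Canonical, human-readable inputs (hypothetical field names)."""
    transcript: Tuple[str, ...]
    tool_definitions: Tuple[str, ...] = ()


@dataclass(frozen=True)
class Manifest:
    """The contract: what the image is and what it was built for."""
    name: str
    version: str
    model_fingerprint: str
    tokenizer_fingerprint: str


@dataclass
class ContextImage:
    source: SourceBundle                    # canonical
    manifest: Manifest                      # contract
    compiled_state: Optional[bytes] = None  # disposable acceleration state

    def evict(self) -> None:
        """Drop compiled state; source and manifest are untouched."""
        self.compiled_state = None

    def is_rebuildable(self) -> bool:
        """In this sketch, an image is rebuildable while its source survives."""
        return len(self.source.transcript) > 0
```

Evicting the compiled state leaves an image that is slower to start but still complete, which is the sense in which compiled state is disposable.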

2. Required Manifest Fields

Every Context Image must declare:

  1. image name
  2. image version
  3. creation timestamp
  4. parent image, if any
  5. source hash
  6. model fingerprint
  7. tokenizer fingerprint
  8. runtime fingerprint
  9. context length
  10. position encoding configuration
  11. quantization or precision format
  12. KV cache layout
  13. tool schema version
  14. retrieval configuration
  15. memory mounts
  16. security policy
  17. rebuild instructions

A runtime may refuse to load a Context Image if any required compatibility field does not match.

This is not a failure of portability.
This is honest portability.
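
A hedged sketch of how a runtime might enforce this. The snake_case key names and the choice of which fields count as compatibility fields are assumptions made for illustration. The spec above lists the fields but does not name keys or say which ones gate loading.

```python
# Hypothetical key names for the required manifest fields
# ("parent" is omitted because it is only present when the image has one).
REQUIRED_FIELDS = (
    "name", "version", "created_at", "source_hash",
    "model_fingerprint", "tokenizer_fingerprint", "runtime_fingerprint",
    "context_length", "position_encoding", "quantization", "kv_layout",
    "tool_schema_version", "retrieval_config", "memory_mounts",
    "security_policy", "rebuild",
)

# Assumption: these are the fields a runtime must match exactly.
COMPAT_FIELDS = (
    "model_fingerprint", "tokenizer_fingerprint", "runtime_fingerprint",
    "position_encoding", "quantization", "kv_layout",
)


class IncompatibleImage(Exception):
    """Raised when a well-formed manifest does not match this runtime."""


def check_manifest(manifest: dict, runtime: dict) -> None:
    """Raise if the manifest is incomplete or targets a different runtime."""
    missing = [f for f in REQUIRED_FIELDS if f not in manifest]
    if missing:
        raise ValueError("manifest is missing fields: " + ", ".join(missing))
    mismatched = [f for f in COMPAT_FIELDS if manifest[f] != runtime.get(f)]
    if mismatched:
        raise IncompatibleImage("mismatched fields: " + ", ".join(mismatched))
```

Refusal names the offending fields, so the caller can decide whether to rebuild from source instead of loading the compiled state.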

3. Source Bundle

The source bundle should contain the rebuildable ingredients of the image.

These may include:

  1. system instructions
  2. developer instructions
  3. user instructions
  4. chat transcript
  5. summarized history
  6. project memory
  7. user preferences
  8. source documents
  9. retrieval indexes
  10. tool definitions
  11. environment variables
  12. policy constraints
  13. previous decisions
  14. active task state

The source bundle should be inspectable, diffable, and versionable.

A Context Image without a source bundle is only a snapshot.
A Context Image with a source bundle is infrastructure.

4. Compiled State

The compiled state may contain runtime-specific artifacts.

These may include:

  1. serialized KV cache blocks
  2. prefix-cache layers
  3. attention-position metadata
  4. tokenized prompt segments
  5. embedding handles
  6. retrieval cache entries
  7. tool-call state
  8. working-memory slots
  9. branch lineage
  10. scheduler hints
  11. cache locality hints

Compiled state is allowed to be architecture-specific.

It may depend on:

  1. model weights
  2. tokenizer
  3. quantization
  4. runtime implementation
  5. attention layout
  6. GPU architecture
  7. CPU architecture
  8. cache allocator
  9. block size
  10. position encoding scheme

A Context Image runtime should treat compiled state as an optimization, not as the source of truth.

If compiled state is invalid, stale, corrupt, or incompatible, the runtime should attempt to rebuild it from source.
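
One way to sketch that fallback, assuming compiled state is tagged with the hash of the source it came from, the runtime it was built for, and a checksum of its own payload. CompiledState, load_or_rebuild, and the build callback are hypothetical names; the payload stands in for serialized KV blocks.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class CompiledState:
    source_digest: str        # hash of the source it was built from
    runtime_fingerprint: str  # runtime it was built for
    payload: bytes            # stand-in for serialized KV blocks
    payload_digest: str       # integrity check for the payload


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def compile_state(source: bytes, runtime_fingerprint: str,
                  build: Callable[[bytes], bytes]) -> CompiledState:
    payload = build(source)
    return CompiledState(digest(source), runtime_fingerprint,
                         payload, digest(payload))


def load_or_rebuild(source: bytes, runtime_fingerprint: str,
                    cached: Optional[CompiledState],
                    build: Callable[[bytes], bytes]) -> Tuple[CompiledState, bool]:
    """Return (state, rebuilt). Cached state is used only if it is
    fresh (same source), compatible (same runtime), and intact."""
    usable = (
        cached is not None
        and cached.source_digest == digest(source)
        and cached.runtime_fingerprint == runtime_fingerprint
        and cached.payload_digest == digest(cached.payload)
    )
    if usable:
        return cached, False
    return compile_state(source, runtime_fingerprint, build), True
```

The cache is never trusted on its own: every check is made against the source or the runtime, and any failure falls through to a rebuild.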

5. Layers

A Context Image may be layered.

Each layer represents reusable context.

Common layers include:

  1. base model behavior
  2. organization context
  3. team context
  4. project context
  5. customer context
  6. task context
  7. incident context
  8. personal working memory
  9. active continuation state

Layers should be immutable once published.

Mutable state should live in a writable top layer.

This enables:

  1. reuse
  2. branching
  3. cache sharing
  4. incremental rebuilds
  5. rollback
  6. provenance tracking
  7. garbage collection

A good Context Image system should avoid duplicating expensive lower layers when forking.
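
A small sketch of immutable lower layers with a writable top layer. Layer and LayeredContext are hypothetical names, and context is modeled as plain strings rather than real cache blocks. The fork shares the lower layers by reference and copies only the top.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Layer:
    """A published layer: immutable once created."""
    name: str
    content: Tuple[str, ...]


@dataclass
class LayeredContext:
    lower: Tuple[Layer, ...]                      # shared, immutable
    top: List[str] = field(default_factory=list)  # writable top layer

    def fork(self) -> "LayeredContext":
        """Share lower layers by reference; copy only the small top layer."""
        return LayeredContext(self.lower, list(self.top))

    def commit(self, name: str) -> "LayeredContext":
        """Freeze the top layer into a new published layer."""
        frozen = Layer(name, tuple(self.top))
        return LayeredContext(self.lower + (frozen,), [])

    def view(self) -> List[str]:
        """Flatten all layers, bottom to top, into the effective context."""
        out: List[str] = []
        for layer in self.lower:
            out.extend(layer.content)
        out.extend(self.top)
        return out
```

Because lower layers are frozen, two branches can point at the same layer object safely, which is what makes forking cheap.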

6. Lifecycle Operations

A Context Image runtime should support the following operations:

  1. build — Construct a Context Image from source material.
  2. run — Start a live inference process from an image.
  3. resume — Continue from a saved image state.
  4. fork — Create a branch from an existing image.
  5. commit — Save the current runtime state as a new image.
  6. inspect — Show manifest, lineage, source hashes, mounted tools, and memory dependencies.
  7. diff — Compare two images by source, manifest, memory, or transcript.
  8. rebuild — Regenerate compiled state from canonical source.
  9. evict — Remove compiled state while preserving source.
  10. gc — Remove unreachable layers and unused cache blocks.
  11. export — Package source and manifest for another environment.
  12. import — Load an image, validating compatibility before execution.

The minimal viable runtime supports:

  1. build
  2. run
  3. fork
  4. commit
  5. rebuild
  6. inspect

Everything else is polish.

7. Compatibility

A Context Image is compatible with a runtime only if the declared execution environment matches.

Compatibility should be checked against:

  1. model family
  2. exact model weights
  3. tokenizer
  4. vocabulary
  5. chat template
  6. context length
  7. RoPE or position encoding
  8. quantization
  9. attention implementation
  10. KV layout
  11. tool schema
  12. memory API
  13. retrieval API
  14. safety policy

The runtime must distinguish between:

  1. source-compatible
  2. rebuild-compatible
  3. binary-compatible
  4. runtime-compatible

For example:

  1. A transcript may be source-compatible across many models.
  2. A tokenized prompt may be rebuild-compatible only with the same tokenizer.
  3. A KV cache may be binary-compatible only with the same model and runtime.
  4. A live continuation may be runtime-compatible only on the same local machine.

This distinction prevents false portability.
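
The four levels can be sketched as an ordered classification. This assumes the levels nest (runtime-compatible implies binary-compatible, and so on) and that source is plain material any runtime can read; both are assumptions of this sketch, as are the dictionary keys tokenizer, model_weights, engine, kv_layout, and host_id.

```python
from enum import IntEnum


class Compat(IntEnum):
    SOURCE = 1   # transcript and other source can be reused
    REBUILD = 2  # tokenized artifacts can be reused
    BINARY = 3   # serialized KV cache can be loaded
    RUNTIME = 4  # a live continuation can be resumed in place


def classify(image: dict, runtime: dict) -> Compat:
    """Return the strongest compatibility level the runtime can honor."""
    def same(*keys: str) -> bool:
        return all(image.get(k) == runtime.get(k) for k in keys)

    if not same("tokenizer"):
        return Compat.SOURCE
    if not same("model_weights", "engine", "kv_layout"):
        return Compat.REBUILD
    if not same("host_id"):
        return Compat.BINARY
    return Compat.RUNTIME
```

A runtime could then load compiled state only at Compat.BINARY or above and rebuild from source otherwise.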

8. Runtime Identity

A running Context Image is not the same thing as the image itself.

The image is the artifact.
The running process is the instance.

A single image may produce many live instances.

Those instances may:

  1. diverge
  2. branch
  3. mutate working memory
  4. call different tools
  5. produce different transcripts
  6. commit different descendants

The transcript belongs to the instance.

The lineage belongs to the image.

The memory writes belong to the mounted volumes.
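
A sketch of that ownership split, with hypothetical names (Image, Instance, run). Lineage lives on the immutable image, the transcript lives on the instance, and committing an instance produces a descendant image. Memory mounts are left out to keep the example short.

```python
import itertools
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Image:
    """The artifact. Lineage (the parent reference) belongs here."""
    name: str
    parent: Optional[str] = None


@dataclass
class Instance:
    """A live process started from an image. The transcript belongs here."""
    image: Image
    instance_id: int
    transcript: List[str] = field(default_factory=list)

    def say(self, line: str) -> None:
        self.transcript.append(line)

    def commit(self, new_name: str) -> Image:
        """Save this instance as a descendant of the image it came from."""
        return Image(new_name, parent=self.image.name)


_instance_ids = itertools.count(1)


def run(image: Image) -> Instance:
    """Start a fresh instance; one image may back many instances."""
    return Instance(image, next(_instance_ids))
```

Two instances started from the same image share nothing mutable, so their transcripts can diverge without touching the image.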

9. Logs and Transcripts

A transcript is a log.

It may be used for:

  1. audit
  2. replay
  3. debugging
  4. rebuilding
  5. human review
  6. summarization
  7. provenance

But the transcript is not the runtime.

The runtime includes state that may not appear directly in the transcript, including:

  1. cached attention state
  2. tool handles
  3. retrieval bindings
  4. active memory mounts
  5. hidden scheduler state
  6. unresolved continuations
  7. branch ancestry

A mature system should preserve transcripts, but should not confuse them with executable context.

10. Memory Mounts

Context Images may mount external memory.

Memory mounts may be:

  1. read-only
  2. read-write
  3. ephemeral
  4. persistent
  5. local
  6. remote
  7. user-scoped
  8. project-scoped
  9. organization-scoped

Examples include:

  1. vector databases
  2. document stores
  3. file systems
  4. knowledge graphs
  5. issue trackers
  6. code repositories
  7. user preference stores
  8. tool histories
  9. prior decision logs

A Context Image should record what memory was mounted, but should not necessarily copy all mounted memory into the image.

The image contains bindings.
The mount contains data.

11. Security and Audit

A Context Image may contain sensitive state.

It may encode information in:

  1. transcripts
  2. summaries
  3. retrieved documents
  4. memory handles
  5. tool results
  6. KV cache blocks
  7. embeddings
  8. latent continuation state

Therefore, a runtime should support:

  1. manifest inspection
  2. source inspection
  3. redaction
  4. access control
  5. signature verification
  6. provenance tracking
  7. policy validation
  8. encrypted storage
  9. safe export modes
  10. compiled-state eviction

A Context Image registry should not accept opaque runtime blobs without source, provenance, or compatibility metadata.

Opaque snapshots are convenient.
Auditable images are infrastructure.
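
A hedged sketch of a safe export mode. The policy keys mirror the ones in the example manifest (allow_source_export, allow_compiled_state_export, redact_on_export), but the image layout and the idea of tagging each source entry with a category are assumptions of this sketch. Redaction here means dropping the entry entirely.

```python
from typing import Any, Dict


def export_image(image: Dict[str, Any], policy: Dict[str, Any]) -> Dict[str, Any]:
    """Build an export package that honors the image's policy.

    Assumes image["source"] maps entry names to dicts with a "category"
    key, and that redacted categories are removed from the export.
    """
    if not policy.get("allow_source_export", False):
        raise PermissionError("policy forbids source export")

    redact = set(policy.get("redact_on_export", []))
    source = {
        name: entry
        for name, entry in image["source"].items()
        if entry["category"] not in redact
    }

    exported: Dict[str, Any] = {"manifest": image["manifest"], "source": source}
    # Compiled state is opt-in: it can leak content that redaction removed.
    if policy.get("allow_compiled_state_export", False):
        exported["compiled_state"] = image["compiled_state"]
    return exported
```

Leaving compiled state out by default matters because a KV cache built over redacted material may still encode that material.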

12. Rebuild Semantics

Every serious Context Image should answer one question:

Can this image be rebuilt from source?

A rebuildable image should declare:

  1. source files
  2. source hashes
  3. build order
  4. model dependency
  5. tokenizer dependency
  6. retrieval dependency
  7. memory dependency
  8. tool dependency
  9. build parameters
  10. deterministic settings, where available

Rebuilding may not reproduce stochastic outputs exactly.

But it should reproduce the runnable context environment closely enough to continue, inspect, debug, or validate the process.

The goal is not perfect determinism.

The goal is operational trust.
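
Source hashes and build order can be made concrete with a small sketch. The function name bundle_hash and the framing scheme (name, separator, length, bytes) are assumptions chosen for illustration; the only claim is that the same files in the same build order always produce the same hash.

```python
import hashlib
from typing import Dict, List


def bundle_hash(files: Dict[str, bytes], build_order: List[str]) -> str:
    """Hash source files in declared build order.

    Each file contributes its name, a separator, its length, and its
    bytes, so renaming, reordering, or editing any file changes the hash.
    """
    h = hashlib.sha256()
    for name in build_order:
        data = files[name]
        h.update(name.encode("utf-8"))
        h.update(b"\0")
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return "sha256:" + h.hexdigest()
```

A runtime could compare this value with the source hash recorded in the manifest before deciding whether cached compiled state is still valid.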

13. Branching

Forking is a first-class operation.

A forked Context Image should preserve:

  1. parent image reference
  2. fork timestamp
  3. inherited layers
  4. modified top layer
  5. transcript divergence point
  6. memory write policy
  7. compatibility metadata

Branches should be cheap when lower layers are shared.

The runtime should support copy-on-write behavior for:

  1. KV cache blocks
  2. working memory
  3. retrieved context
  4. tool state
  5. transcript logs

Merging branches is not required in v1.0.

Summarizing branches is allowed.
Diffing branches is encouraged.
Pretending semantic merge is solved is forbidden.
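
Copy-on-write for KV cache blocks can be sketched with reference counts. BlockPool and Branch are hypothetical names, and lists of floats stand in for real cache blocks. This illustrates the general technique only, not any serving engine's actual allocator.

```python
from typing import Dict, List


class BlockPool:
    """Reference-counted blocks standing in for KV cache blocks."""

    def __init__(self) -> None:
        self._data: Dict[int, List[float]] = {}
        self._refs: Dict[int, int] = {}
        self._next_id = 0

    def alloc(self, values: List[float]) -> int:
        block_id = self._next_id
        self._next_id += 1
        self._data[block_id] = list(values)
        self._refs[block_id] = 1
        return block_id

    def retain(self, block_id: int) -> None:
        self._refs[block_id] += 1

    def refcount(self, block_id: int) -> int:
        return self._refs[block_id]

    def read(self, block_id: int) -> List[float]:
        return list(self._data[block_id])

    def write(self, block_id: int, index: int, value: float) -> int:
        """Write one slot, copying first if the block is shared.
        Returns the block id the caller should use from now on."""
        if self._refs[block_id] > 1:
            self._refs[block_id] -= 1
            block_id = self.alloc(self._data[block_id])
        self._data[block_id][index] = value
        return block_id


class Branch:
    def __init__(self, pool: BlockPool, block_ids: List[int]) -> None:
        self.pool = pool
        self.block_ids = block_ids

    def fork(self) -> "Branch":
        """Share every block with the child; nothing is copied yet."""
        for block_id in self.block_ids:
            self.pool.retain(block_id)
        return Branch(self.pool, list(self.block_ids))

    def write(self, position: int, index: int, value: float) -> None:
        self.block_ids[position] = self.pool.write(
            self.block_ids[position], index, value)
```

A fork costs one reference-count increment per block, and only the blocks a branch actually modifies are duplicated.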

14. Example Manifest

schema: context-image/v1.0
name: quilt/customer-review
version: 2026.05.22
created_at: 2026-05-22T09:30:00-07:00
parent: quilt/base-engineering:2026.05
source:
  transcript: transcript.jsonl
  memory_graph: memory.yaml
  retrieval_config: retrieval.yaml
  tools: tools.yaml
  source_hash: sha256:...
model:
  name: llama-3.1-8b-instruct
  weights_hash: sha256:...
  tokenizer_hash: sha256:...
  chat_template_hash: sha256:...
  context_length: 131072
  position_encoding: rope
runtime:
  engine: llama.cpp
  engine_version: ...
  kv_layout: contiguous-v1
  quantization: q4_k_m
  compatible_arch:
    - arm64
    - x86_64
compiled_state:
  kv_cache: kv.bin
  token_prefix: tokens.bin
  prefix_layers:
    - base-assistant
    - quilt-engineering
    - customer-review
mounts:
  memory:
    - name: project-memory
      mode: read-write
  retrieval:
    - name: docs-index
      mode: read-only
  tools:
    - name: file-search
    - name: shell
policy:
  exportable: false
  allow_compiled_state_export: false
  allow_source_export: true
  redact_on_export:
    - secrets
    - credentials
    - private_documents
rebuild:
  command: ctx build .
  deterministic: partial

15. Example CLI

ctx build .
ctx inspect quilt/customer-review
ctx run quilt/customer-review
ctx fork quilt/customer-review quilt/customer-review-alt
ctx commit quilt/customer-review-alt:v2
ctx diff quilt/customer-review quilt/customer-review-alt
ctx evict quilt/customer-review --compiled-state
ctx rebuild quilt/customer-review
ctx gc

The CLI should make the distinction obvious:

  1. ctx build creates an image
  2. ctx run creates an instance
  3. ctx commit saves an instance as a new image
  4. ctx evict deletes acceleration state
  5. ctx rebuild reconstructs acceleration state from source
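
The command surface can be sketched with Python's standard argparse module. The ctx tool itself is hypothetical, and the argument names (image, new_image, image_a, image_b) are assumptions. This only parses the commands listed above; it does not implement them.

```python
import argparse


def make_parser() -> argparse.ArgumentParser:
    """Parser for the hypothetical `ctx` command line."""
    parser = argparse.ArgumentParser(prog="ctx")
    sub = parser.add_subparsers(dest="command", required=True)

    build = sub.add_parser("build", help="create an image from source")
    build.add_argument("path")

    # Commands that take exactly one image reference.
    for name in ("inspect", "run", "commit", "rebuild"):
        cmd = sub.add_parser(name)
        cmd.add_argument("image")

    fork = sub.add_parser("fork", help="branch from an existing image")
    fork.add_argument("image")
    fork.add_argument("new_image")

    diff = sub.add_parser("diff", help="compare two images")
    diff.add_argument("image_a")
    diff.add_argument("image_b")

    evict = sub.add_parser("evict", help="delete acceleration state")
    evict.add_argument("image")
    evict.add_argument("--compiled-state", action="store_true")

    sub.add_parser("gc", help="remove unreachable layers and cache blocks")
    return parser
```

Keeping build, run, and commit as separate subcommands mirrors the image and instance split: no single command both creates an artifact and starts a process.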

16. Non-Goals for v1.0

Context Images v1.0 does not attempt to solve:

  1. universal portability
  2. semantic branch merging
  3. cross-model KV translation
  4. deterministic replay of stochastic generations
  5. safe sharing of arbitrary opaque caches
  6. replacing transcripts
  7. replacing memory systems
  8. replacing model checkpoints
  9. replacing agent frameworks

The goal is narrower:

Define the missing artifact between transcript and runtime.

17. The One-Sentence Spec

A Context Image is a rebuildable, architecture-aware, runnable artifact that packages the state needed to continue an AI interaction, while preserving a clean separation between source, compiled context, live instance, and transcript log.
