Overview

Checkpointing lets you persist workflow state over time so you can:
  • Recover from failures
  • Resume long-running agents
  • Support human-in-the-loop workflows
  • Build time-travel debugging tools
A checkpoint is a structured object containing a run_id, step_name, timestamp, and a serializable snapshot of state.
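To make that shape concrete, here is a minimal sketch of such a record. The class name `CheckpointSketch` and its helper methods are hypothetical and are not Coevolved's actual type; only the four field names come from the description above. The sketch also assumes that "serializable" means JSON-serializable.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Any, Dict


@dataclass(frozen=True)
class CheckpointSketch:
    """Hypothetical record mirroring the four fields described above."""

    run_id: str
    step_name: str
    state: Dict[str, Any]  # assumed to be JSON-serializable in this sketch
    timestamp: float = field(default_factory=time.time)

    def to_json(self) -> str:
        # json.dumps raises TypeError if the state is not JSON-serializable.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, raw: str) -> "CheckpointSketch":
        return cls(**json.loads(raw))


cp = CheckpointSketch(
    run_id="run-1",
    step_name="plan",
    state={"messages": ["hi"], "iteration": 0},
)
restored = CheckpointSketch.from_json(cp.to_json())
```

Round-tripping through a string is the property a durable store relies on: anything that survives `to_json` and `from_json` can be written to a database or object store and loaded back later.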

Checkpoint stores

Checkpoint storage is abstracted by the CheckpointStore protocol. Coevolved includes an in-memory store for development:
  • MemoryCheckpointStore: convenient, but not durable (checkpoints are lost when the process exits)
In production, implement your own store (Redis, Postgres, S3, etc.).
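As a rough illustration of what a custom store might look like, the sketch below backs checkpoints with SQLite from the Python standard library. The method names `save`, `history`, and `latest` are assumptions made for this example; the real CheckpointStore protocol may define different methods and signatures, so check it before adapting this.

```python
import json
import sqlite3
import time
from typing import Any, Dict, List, Optional, Protocol


class CheckpointStoreLike(Protocol):
    """Assumed interface for illustration; not Coevolved's actual protocol."""

    def save(self, run_id: str, step_name: str, state: Dict[str, Any]) -> None: ...
    def history(self, run_id: str) -> List[Dict[str, Any]]: ...
    def latest(self, run_id: str) -> Optional[Dict[str, Any]]: ...


class SQLiteCheckpointStore:
    """Hypothetical store; pass a file path instead of ':memory:' for durability."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS checkpoints ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "run_id TEXT NOT NULL, step_name TEXT NOT NULL, "
            "timestamp REAL NOT NULL, state TEXT NOT NULL)"
        )

    def save(self, run_id: str, step_name: str, state: Dict[str, Any]) -> None:
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute(
                "INSERT INTO checkpoints (run_id, step_name, timestamp, state) "
                "VALUES (?, ?, ?, ?)",
                (run_id, step_name, time.time(), json.dumps(state)),
            )

    def history(self, run_id: str) -> List[Dict[str, Any]]:
        rows = self._conn.execute(
            "SELECT step_name, timestamp, state FROM checkpoints "
            "WHERE run_id = ? ORDER BY id",
            (run_id,),
        ).fetchall()
        return [
            {"run_id": run_id, "step_name": name, "timestamp": ts,
             "state": json.loads(raw)}
            for name, ts, raw in rows
        ]

    def latest(self, run_id: str) -> Optional[Dict[str, Any]]:
        items = self.history(run_id)
        return items[-1] if items else None


store: CheckpointStoreLike = SQLiteCheckpointStore()
store.save("run-1", "plan", {"iteration": 0})
store.save("run-1", "act", {"iteration": 1})
last = store.latest("run-1")
```

Ordering by an auto-incrementing id rather than by timestamp keeps the history stable even when two checkpoints are written within the same clock tick.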

Checkpoint policies

CheckpointPolicy controls when to checkpoint:
  • Before a step/iteration
  • After a step/iteration
  • On error
  • On interrupt
Prebuilt loops (like agent_loop) accept a checkpoint store and a policy, and apply the policy around each iteration.
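The sketch below shows one way a loop could apply such a policy around each iteration. It is not the implementation of agent_loop: `PolicySketch`, its field names, `InterruptSketch`, and `run_loop` are all hypothetical names invented for this example, and the real CheckpointPolicy may expose its options differently.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

State = Dict[str, Any]


@dataclass(frozen=True)
class PolicySketch:
    """Hypothetical flags for the four trigger points listed above."""

    before_step: bool = False
    after_step: bool = True
    on_error: bool = True
    on_interrupt: bool = True


class InterruptSketch(Exception):
    """Stand-in for a pause request, e.g. waiting on human input."""


def run_loop(
    step: Callable[[State], State],
    state: State,
    policy: PolicySketch,
    save: Callable[[str, State], None],
    max_iterations: int = 3,
) -> State:
    for i in range(max_iterations):
        name = f"iteration-{i}"
        if policy.before_step:
            save(f"{name}:before", dict(state))
        try:
            state = step(state)
        except InterruptSketch:
            # Handled before the generic branch because it subclasses Exception.
            if policy.on_interrupt:
                save(f"{name}:interrupt", dict(state))
            raise
        except Exception:
            if policy.on_error:
                save(f"{name}:error", dict(state))
            raise
        if policy.after_step:
            save(f"{name}:after", dict(state))
    return state


saved: List[Tuple[str, State]] = []


def increment(state: State) -> State:
    return {"count": state["count"] + 1}


final = run_loop(
    increment,
    {"count": 0},
    PolicySketch(before_step=True),
    lambda name, snapshot: saved.append((name, snapshot)),
    max_iterations=2,
)
```

On error or interrupt, the sketch saves the last state known to be good (the state before the failing step) and re-raises, so the caller decides whether to retry or resume.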

Resume workflows

Checkpointing is most useful when paired with:
  • A stable run_id you can use to query history
  • A resume mechanism (for interrupts or retries)
Coevolved’s checkpointing primitives are intentionally storage-agnostic. You decide how to persist, index, and load state.
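Because loading state is left to you, a resume mechanism can be as simple as "start from the last snapshot saved under this run_id". The sketch below illustrates that idea with a plain dictionary standing in for a store. The function `run_or_resume`, the `History` alias, and the state layout are hypothetical and are not part of Coevolved.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
History = Dict[str, List[State]]  # run_id -> snapshots in the order they were saved


def run_or_resume(
    run_id: str,
    history: History,
    step: Callable[[State], State],
    total_steps: int,
) -> State:
    snapshots = history.setdefault(run_id, [])
    # Resume from the last saved snapshot, or start fresh if there is none.
    state = dict(snapshots[-1]) if snapshots else {"step": 0, "total": 0}
    while state["step"] < total_steps:
        state = step(state)
        snapshots.append(dict(state))  # checkpoint after each completed step
    return state


history: History = {}
calls = {"failed": False}


def flaky_add(state: State) -> State:
    # Fails once when it reaches step 2, simulating a transient error.
    if state["step"] == 2 and not calls["failed"]:
        calls["failed"] = True
        raise RuntimeError("transient failure")
    return {"step": state["step"] + 1, "total": state["total"] + 10}


try:
    run_or_resume("run-42", history, flaky_add, total_steps=4)
except RuntimeError:
    pass  # the first two steps are already checkpointed

final = run_or_resume("run-42", history, flaky_add, total_steps=4)
```

The second call reuses the same run_id, so it picks up at step 2 instead of repeating the work already done. This is why a stable run_id matters: it is the key that links a retry to its earlier history.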

Next steps