Every Jido agent runs in its own BEAM process. If it crashes, OTP restarts it. If one agent fails, no other agent is affected. This is not a feature you configure. It is how the runtime works.
At a glance
| Item | Summary |
|---|---|
| Best for | Teams building production agent systems that need uptime guarantees |
| Core packages | jido, jido_action, jido_signal |
| Package status | jido (Beta), jido_action (Beta), jido_signal (Beta) |
| First proof path | Start agents under a supervisor → crash one → watch it recover |
| Key idea | Process isolation + OTP supervision = self-healing agents with no custom recovery code |
Supervise your agents
children = [
{Jido.AgentServer,
id: :support_agent,
agent: MyApp.SupportAgent,
name: :support_agent},
{Jido.AgentServer,
id: :billing_agent,
agent: MyApp.BillingAgent,
name: :billing_agent}
]
{:ok, _pid} = Supervisor.start_link(children, strategy: :one_for_one)
Two agents, each in its own process, both watched by a supervisor. If :support_agent crashes, OTP restarts it. :billing_agent keeps running, unaware anything happened.
Process isolation per agent
Each Jido.AgentServer is a separate BEAM process with its own:
- Memory. No shared heap. A memory issue in one agent cannot corrupt another.
- Mailbox. Messages are queued per process. A slow agent does not block a fast one.
- Failure boundary. An unhandled error crashes one process, not the system.
This is the same isolation model that lets telecom systems run for years without downtime. Jido inherits it directly from the BEAM.
Automatic crash recovery
OTP supervision handles recovery without custom code:
| What happens | What OTP does |
|---|---|
| Agent process crashes | Supervisor restarts it with a fresh state |
| Agent crashes repeatedly | Supervisor applies restart intensity limits and can escalate |
| External dependency times out | Agent process stays isolated; other agents are unaffected |
| Supervisor itself crashes | Its parent supervisor restarts the entire subtree |
You define the supervision strategy (:one_for_one, :one_for_all, :rest_for_one). OTP enforces it.
Failure isolation in practice
Agent systems without process isolation share failure modes:
| Without isolation | With BEAM process isolation |
|---|---|
| One agent’s exception crashes the entire pipeline | Exception is contained to one process |
| Memory leak in one agent degrades all agents | Each process has its own heap |
| Slow LLM response blocks other agents | Each process has its own mailbox and execution |
| Recovery requires custom retry/restart logic | OTP supervisor restarts automatically |
Jido’s deterministic cmd/2 model reinforces this: agent state transitions are pure. If a process crashes after cmd/2 but before directive execution, the agent state is still consistent. The supervisor restarts the process, and work continues.
What to explore next
- Agent foundations:How Jido agents work
- Coordination:Agents that work together
- Observability:Observe everything
- Adoption path:Start small, grow safely
- Reference docs:Architecture
Get Building
Start two agents under a supervisor. Crash one and verify the other is unaffected. Then read Observe everything to add telemetry to your agent lifecycle.