Parallel & Async Execution¶
HBIA supports three independently configurable execution modes: serial, parallel, and async. These can be combined for different performance characteristics.
Execution Stages¶
The foundation of parallelism is the execution stage. `GraphTopology.execution_stages()` analyzes the DAG and groups vertices into stages such that vertices within a stage have no mutual dependencies:
Diamond DAG: A → (B, C) → D
Stage 1: [A] → must execute first (no dependencies)
Stage 2: [B, C] → can execute in parallel (independent)
Stage 3: [D] → depends on both B and C
You don't write parallel code. You declare data dependencies in YAML, and HBIA discovers parallelism automatically.
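To make the grouping concrete, here is a small illustrative sketch that peels off vertices whose dependencies are already placed. The dict-of-dependencies representation is an assumption for this example; it is not HBIA's `GraphTopology` API.

```python
def execution_stages(deps):
    """Group vertices into stages: a vertex joins the first stage
    in which all of its dependencies have already been placed."""
    stages = []
    placed = set()
    remaining = set(deps)
    while remaining:
        # Every vertex whose dependencies are all placed is independent
        # of the other candidates, so they can share a stage.
        stage = sorted(v for v in remaining if set(deps[v]) <= placed)
        if not stage:
            raise ValueError("cycle detected in DAG")
        stages.append(stage)
        placed.update(stage)
        remaining.difference_update(stage)
    return stages

# Diamond DAG: A -> (B, C) -> D
diamond = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
print(execution_stages(diamond))  # [['A'], ['B', 'C'], ['D']]
```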
Serial Mode (Default)¶
Vertices execute one at a time, stage by stage. This is:
- Safest — no concurrency issues.
- Easiest to debug — execution order is deterministic and predictable.
- Recommended for development and testing.
Parallel Mode¶
```python
result = run_flow(
    graph,
    handlers={...},
    initial_data={...},
    parallel_enabled=True,
    max_workers=4,
)
```
Vertices in the same execution stage run concurrently using `concurrent.futures.ThreadPoolExecutor`:
- Stages still execute sequentially (Stage 1 finishes before Stage 2 starts).
- Within each stage, all vertices run simultaneously.
- `max_workers` controls the thread pool size (default: 4).
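Conceptually, parallel mode behaves like the following sketch: each stage is submitted to a thread pool as a batch, and the next stage starts only once the batch completes. This is illustrative only; `run_handler` and the stage list stand in for HBIA internals.

```python
from concurrent.futures import ThreadPoolExecutor

def run_stages_parallel(stages, run_handler, max_workers=4):
    """Run each stage's vertices concurrently; stages stay sequential."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for stage in stages:
            # Submit every vertex in the stage, then wait for all of them
            # before moving on: Stage N finishes before Stage N+1 starts.
            futures = {v: pool.submit(run_handler, v) for v in stage}
            for vertex, future in futures.items():
                results[vertex] = future.result()
    return results

stages = [["A"], ["B", "C"], ["D"]]
out = run_stages_parallel(stages, run_handler=lambda v: f"done:{v}")
print(out)  # {'A': 'done:A', 'B': 'done:B', 'C': 'done:C', 'D': 'done:D'}
```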
When to Use¶
- CPU-bound work with independent vertices.
- I/O-bound work where threads release the GIL (network calls, file I/O).
- DAGs with wide fan-out patterns (many parallel branches).
Thread Safety¶
Your handler functions must be thread-safe when parallel mode is enabled. The DataStore and StateStore are designed for concurrent access, but your handlers should not share mutable global state.
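To illustrate what "thread-safe" means in practice, here are three hypothetical handlers (none of them part of HBIA): one that races on shared state, one that is safe because it only touches its inputs, and one that guards shared state with a lock.

```python
import threading

# NOT thread-safe: mutates a shared global without synchronization.
counter = 0
def unsafe_handler(data):
    global counter
    counter += 1  # read-modify-write race under parallel mode
    return counter

# Thread-safe: works only on its inputs and returns a new value.
def safe_handler(data):
    return {"total": sum(data["values"])}

# If shared state is unavoidable, guard every access with a lock.
lock = threading.Lock()
def guarded_handler(data):
    global counter
    with lock:
        counter += 1
        return counter
```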
Async Mode¶
```python
result = await run_flow_async(
    graph,
    handlers={
        "fetch": async_fetch_fn,      # async def
        "process": async_process_fn,  # async def
    },
    initial_data={...},
    async_enabled=True,
)
```
Async mode uses `asyncio` to execute `async def` handlers:
- Handlers are `await`-ed natively.
- Ideal for I/O-heavy workloads with many concurrent network requests.
- Uses `run_flow_async()` instead of `run_flow()`.
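The per-stage behavior can be pictured with `asyncio.gather`: every handler in a stage is awaited concurrently, and stages still run in order. This is a sketch under assumed handler names (`fetch`, `process`), not HBIA's implementation.

```python
import asyncio

async def run_stages_async(stages, handlers):
    """Await each stage's async handlers concurrently; stages stay sequential."""
    results = {}
    for stage in stages:
        # gather() drives all of the stage's coroutines at once.
        stage_results = await asyncio.gather(*(handlers[v]() for v in stage))
        results.update(zip(stage, stage_results))
    return results

async def fetch():
    await asyncio.sleep(0)  # stand-in for a network call
    return "fetched"

async def process():
    return "processed"

out = asyncio.run(run_stages_async([["fetch"], ["process"]],
                                   {"fetch": fetch, "process": process}))
print(out)  # {'fetch': 'fetched', 'process': 'processed'}
```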
When to Use¶
- I/O-bound workloads with many network calls (API requests, database queries).
- When your existing codebase already uses `async`/`await`.
Combining Modes¶
The three flags (`parallel_enabled`, `async_enabled`, `cache_enabled`) are independent and can be combined:
| Configuration | Behavior |
|---|---|
| Serial only (default) | One vertex at a time, synchronous |
| Cache only | Serial, but skip re-executing unchanged pure vertices |
| Parallel only | Same-stage vertices run in threads |
| Async only | async def handlers are awaited, one at a time |
| Parallel + Cache | Threads + skip cached results |
| Parallel + Async | Maximum throughput: threads + coroutines |
| All three | Threads, coroutines, and caching combined |
Configuration¶
Per-Call¶
```python
result = run_flow(
    graph, handlers={...}, initial_data={...},
    parallel_enabled=True,
    async_enabled=False,
    cache_enabled=True,
    max_workers=8,
)
```
Via Settings¶
```python
from honey_badgeria.conf import Settings, configure

configure(Settings(
    PARALLEL_ENABLED=True,
    ASYNC_ENABLED=False,
    CACHE_ENABLED=True,
    MAX_WORKERS=8,
))
```
Via Environment Variables¶
```bash
export HBIA_PARALLEL_ENABLED=true
export HBIA_ASYNC_ENABLED=false
export HBIA_CACHE_ENABLED=true
export HBIA_MAX_WORKERS=8
```
Priority¶
Settings are resolved in this order (highest priority first):
- Per-call arguments
- Explicit Settings object
- Environment variables (`HBIA_*`)
- Built-in defaults
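The precedence chain can be sketched for a single boolean flag like so. The function name `resolve_setting` and the dict-based `settings` argument are hypothetical, chosen only to illustrate the resolution order; HBIA's actual resolver is not shown here.

```python
import os

def resolve_setting(name, per_call=None, settings=None, default=None):
    """Resolve one boolean flag: per-call arg > Settings > env var > default.
    (Illustrative sketch, not HBIA's actual resolver.)"""
    if per_call is not None:
        return per_call                      # 1. per-call argument wins
    if settings is not None and name in settings:
        return settings[name]                # 2. explicit Settings object
    env = os.environ.get(f"HBIA_{name}")
    if env is not None:
        return env.lower() in ("1", "true", "yes")  # 3. environment variable
    return default                           # 4. built-in default

os.environ["HBIA_PARALLEL_ENABLED"] = "true"
print(resolve_setting("PARALLEL_ENABLED", default=False))   # True  (env beats default)
print(resolve_setting("PARALLEL_ENABLED", per_call=False))  # False (per-call beats env)
```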
Atomicity and Parallelism¶
Atomic groups override parallel settings. When a group declares `no_parallel: true` (the default), vertices in that group always execute serially, even if `parallel_enabled=True` globally.
This is intentional — atomic operations (database transactions, payment processing) typically need serial execution for correctness:
```yaml
atomic_groups:
  payment:
    vertices: [reserve, charge, confirm]
    on_failure: rollback
    no_parallel: true  # Serial within this group
    no_cache: true     # No caching within this group
```
Vertices outside the atomic group still run in parallel if `parallel_enabled=True`.
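The `no_parallel` aspect of the override can be pictured as splitting any stage that contains atomic-group vertices into one-vertex stages. This sketch only illustrates the serialization within a stage; it does not model the group's ordering or rollback semantics, and the vertex names outside the `payment` group are invented for the example.

```python
def split_atomic(stages, atomic_vertices):
    """Pull atomic-group vertices out of each stage so they run one at a time."""
    new_stages = []
    for stage in stages:
        parallel_part = [v for v in stage if v not in atomic_vertices]
        if parallel_part:
            new_stages.append(parallel_part)  # still runs concurrently
        for v in stage:
            if v in atomic_vertices:
                new_stages.append([v])        # serial: one-vertex stage
    return new_stages

stages = [["reserve", "fetch"], ["charge", "render"], ["confirm"]]
print(split_atomic(stages, {"reserve", "charge", "confirm"}))
# [['fetch'], ['reserve'], ['render'], ['charge'], ['confirm']]
```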