Parallel & Async Execution¶
HBIA supports three independently configurable execution modes: serial, parallel, and async. These can be combined for different performance characteristics.
Execution Stages¶
The foundation of parallelism is the execution stage. `GraphTopology.execution_stages()` analyzes the DAG and groups vertices into stages such that vertices within a stage have no mutual dependencies:
Diamond DAG: A → (B, C) → D
Stage 1: [A] → must execute first (no dependencies)
Stage 2: [B, C] → can execute in parallel (independent)
Stage 3: [D] → depends on both B and C
You don't write parallel code. You declare data dependencies in YAML, and HBIA discovers parallelism automatically.
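To make the grouping concrete, here is a small illustrative sketch that peels off vertices whose dependencies are already placed. The dict-of-dependencies representation is an assumption for this example; it is not HBIA's `GraphTopology` API.

```python
def execution_stages(deps):
    """Group vertices into stages: a vertex joins the first stage
    in which all of its dependencies have already been placed."""
    stages = []
    placed = set()
    remaining = set(deps)
    while remaining:
        # Every vertex whose dependencies are all placed is independent
        # of the other candidates, so they can share a stage.
        stage = sorted(v for v in remaining if set(deps[v]) <= placed)
        if not stage:
            raise ValueError("cycle detected in DAG")
        stages.append(stage)
        placed.update(stage)
        remaining.difference_update(stage)
    return stages

# Diamond DAG: A -> (B, C) -> D
diamond = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
print(execution_stages(diamond))  # [['A'], ['B', 'C'], ['D']]
```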
Serial Mode (Default)¶
Vertices execute one at a time, stage by stage. This is:
- Safest — no concurrency issues.
- Easiest to debug — execution order is deterministic and predictable.
- Recommended for development and testing.
Parallel Mode¶
```python
result = run_flow(
    graph,
    handlers={...},
    initial_data={...},
    parallel_enabled=True,
    max_workers=4,
)
```
Vertices in the same execution stage run concurrently using `concurrent.futures.ThreadPoolExecutor`:
- Stages still execute sequentially (Stage 1 finishes before Stage 2 starts).
- Within each stage, all vertices run simultaneously.
- `max_workers` controls the thread pool size (default: 4).
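Conceptually, parallel mode behaves like the following sketch: each stage is submitted to a thread pool as a batch, and the next stage starts only once the batch completes. This is illustrative only; `run_handler` and the stage list stand in for HBIA internals.

```python
from concurrent.futures import ThreadPoolExecutor

def run_stages_parallel(stages, run_handler, max_workers=4):
    """Run each stage's vertices concurrently; stages stay sequential."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for stage in stages:
            # Submit every vertex in the stage, then wait for all of them
            # before moving on: Stage N finishes before Stage N+1 starts.
            futures = {v: pool.submit(run_handler, v) for v in stage}
            for vertex, future in futures.items():
                results[vertex] = future.result()
    return results

stages = [["A"], ["B", "C"], ["D"]]
out = run_stages_parallel(stages, run_handler=lambda v: f"done:{v}")
print(out)  # {'A': 'done:A', 'B': 'done:B', 'C': 'done:C', 'D': 'done:D'}
```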
When to Use¶
- CPU-bound work with independent vertices.
- I/O-bound work where threads release the GIL (network calls, file I/O).
- DAGs with wide fan-out patterns (many parallel branches).
Thread Safety¶
Your handler functions must be thread-safe when parallel mode is enabled. The DataStore and StateStore are designed for concurrent access, but your handlers should not share mutable global state.
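To illustrate what "thread-safe" means in practice, here are three hypothetical handlers (none of them part of HBIA): one that races on shared state, one that is safe because it only touches its inputs, and one that guards shared state with a lock.

```python
import threading

# NOT thread-safe: mutates a shared global without synchronization.
counter = 0
def unsafe_handler(data):
    global counter
    counter += 1  # read-modify-write race under parallel mode
    return counter

# Thread-safe: works only on its inputs and returns a new value.
def safe_handler(data):
    return {"total": sum(data["values"])}

# If shared state is unavoidable, guard every access with a lock.
lock = threading.Lock()
def guarded_handler(data):
    global counter
    with lock:
        counter += 1
        return counter
```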
Async Mode¶
```python
result = await run_flow_async(
    graph,
    handlers={
        "fetch": async_fetch_fn,      # async def
        "process": async_process_fn,  # async def
    },
    initial_data={...},
    async_enabled=True,
)
```
Async mode uses `asyncio` to execute `async def` handlers:
- Handlers are `await`-ed natively.
- Ideal for I/O-heavy workloads with many concurrent network requests.
- Uses `run_flow_async()` instead of `run_flow()`.
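The per-stage behavior can be pictured with `asyncio.gather`: every handler in a stage is awaited concurrently, and stages still run in order. This is a sketch under assumed handler names (`fetch`, `process`), not HBIA's implementation.

```python
import asyncio

async def run_stages_async(stages, handlers):
    """Await each stage's async handlers concurrently; stages stay sequential."""
    results = {}
    for stage in stages:
        # gather() drives all of the stage's coroutines at once.
        stage_results = await asyncio.gather(*(handlers[v]() for v in stage))
        results.update(zip(stage, stage_results))
    return results

async def fetch():
    await asyncio.sleep(0)  # stand-in for a network call
    return "fetched"

async def process():
    return "processed"

out = asyncio.run(run_stages_async([["fetch"], ["process"]],
                                   {"fetch": fetch, "process": process}))
print(out)  # {'fetch': 'fetched', 'process': 'processed'}
```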
When to Use¶
- I/O-bound workloads with many network calls (API requests, database queries).
- When your existing codebase already uses `async`/`await`.
Combining Modes¶
The three flags (`parallel_enabled`, `async_enabled`, `cache_enabled`) are independent and can be combined:
| Configuration | Behavior |
|---|---|
| Serial only (default) | One vertex at a time, synchronous |
| Cache only | Serial, but skip re-executing unchanged pure vertices |
| Parallel only | Same-stage vertices run in threads |
| Async only | async def handlers are awaited, one at a time |
| Parallel + Cache | Threads + skip cached results |
| Parallel + Async | Maximum throughput: threads + coroutines |
| All three | Threads, coroutines, and caching combined |
Configuration¶
Per-Call¶
```python
result = run_flow(
    graph, handlers={...}, initial_data={...},
    parallel_enabled=True,
    async_enabled=False,
    cache_enabled=True,
    max_workers=8,
)
```
Via Settings¶
```python
from honey_badgeria.conf import Settings, configure

configure(Settings(
    PARALLEL_ENABLED=True,
    ASYNC_ENABLED=False,
    CACHE_ENABLED=True,
    MAX_WORKERS=8,
))
```
Via Environment Variables¶
```bash
export HBIA_PARALLEL_ENABLED=true
export HBIA_ASYNC_ENABLED=false
export HBIA_CACHE_ENABLED=true
export HBIA_MAX_WORKERS=8
```
Priority¶
Settings are resolved in this order (highest priority first):
- Per-call arguments
- Explicit Settings object
- Environment variables (`HBIA_*`)
- Built-in defaults
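The precedence chain can be sketched for a single boolean flag like so. The function name `resolve_setting` and the dict-based `settings` argument are hypothetical, chosen only to illustrate the resolution order; HBIA's actual resolver is not shown here.

```python
import os

def resolve_setting(name, per_call=None, settings=None, default=None):
    """Resolve one boolean flag: per-call arg > Settings > env var > default.
    (Illustrative sketch, not HBIA's actual resolver.)"""
    if per_call is not None:
        return per_call                      # 1. per-call argument wins
    if settings is not None and name in settings:
        return settings[name]                # 2. explicit Settings object
    env = os.environ.get(f"HBIA_{name}")
    if env is not None:
        return env.lower() in ("1", "true", "yes")  # 3. environment variable
    return default                           # 4. built-in default

os.environ["HBIA_PARALLEL_ENABLED"] = "true"
print(resolve_setting("PARALLEL_ENABLED", default=False))   # True  (env beats default)
print(resolve_setting("PARALLEL_ENABLED", per_call=False))  # False (per-call beats env)
```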
Atomicity and Parallelism¶
Atomic groups override parallel settings. When a group declares `no_parallel: true` (the default), vertices in that group always execute serially, even if `parallel_enabled=True` globally.
This is intentional — atomic operations (database transactions, payment processing) typically need serial execution for correctness:
```yaml
atomic_groups:
  payment:
    vertices: [reserve, charge, confirm]
    on_failure: rollback
    no_parallel: true  # Serial within this group
    no_cache: true     # No caching within this group
```
Vertices outside the atomic group still run in parallel if `parallel_enabled=True`.
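The `no_parallel` aspect of the override can be pictured as splitting any stage that contains atomic-group vertices into one-vertex stages. This sketch only illustrates the serialization within a stage; it does not model the group's ordering or rollback semantics, and the vertex names outside the `payment` group are invented for the example.

```python
def split_atomic(stages, atomic_vertices):
    """Pull atomic-group vertices out of each stage so they run one at a time."""
    new_stages = []
    for stage in stages:
        parallel_part = [v for v in stage if v not in atomic_vertices]
        if parallel_part:
            new_stages.append(parallel_part)  # still runs concurrently
        for v in stage:
            if v in atomic_vertices:
                new_stages.append([v])        # serial: one-vertex stage
    return new_stages

stages = [["reserve", "fetch"], ["charge", "render"], ["confirm"]]
print(split_atomic(stages, {"reserve", "charge", "confirm"}))
# [['fetch'], ['reserve'], ['render'], ['charge'], ['confirm']]
```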