Parallel & Async Execution

HBIA supports three independently configurable execution modes: serial, parallel, and async. These can be combined for different performance characteristics.

Execution Stages

The foundation of parallelism is the execution stage. GraphTopology.execution_stages() analyzes the DAG and groups vertices into stages where all vertices in a stage have no mutual dependencies:

Diamond DAG: A → (B, C) → D

Stage 1: [A]        → must execute first (no dependencies)
Stage 2: [B, C]     → can execute in parallel (independent)
Stage 3: [D]        → depends on both B and C

You don't write parallel code. You declare data dependencies in YAML, and HBIA discovers parallelism automatically.
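The staging logic can be sketched as a level-by-level topological grouping (Kahn's algorithm). This is an illustrative standalone version, not HBIA's actual `GraphTopology.execution_stages()` implementation:

```python
from collections import defaultdict

def execution_stages(edges, vertices):
    """Group DAG vertices into stages where no two vertices in the
    same stage depend on each other (level-by-level Kahn's algorithm)."""
    indegree = {v: 0 for v in vertices}
    children = defaultdict(list)
    for src, dst in edges:
        indegree[dst] += 1
        children[src].append(dst)

    stages = []
    ready = [v for v in vertices if indegree[v] == 0]
    while ready:
        stages.append(sorted(ready))
        next_ready = []
        for v in ready:
            for child in children[v]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    next_ready.append(child)
        ready = next_ready
    return stages

# Diamond DAG: A -> (B, C) -> D
stages = execution_stages(
    edges=[("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")],
    vertices=["A", "B", "C", "D"],
)
print(stages)  # [['A'], ['B', 'C'], ['D']]
```

Each stage contains exactly the vertices whose dependencies have all completed in earlier stages, which is what makes within-stage parallelism safe.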

Serial Mode (Default)

result = run_flow(
    graph,
    handlers={...},
    initial_data={...},
    # Default: serial execution
)

Vertices execute one at a time, stage by stage. This is:

  • Safest — no concurrency issues.
  • Easiest to debug — execution order is deterministic and predictable.
  • Recommended for development and testing.

Parallel Mode

result = run_flow(
    graph,
    handlers={...},
    initial_data={...},
    parallel_enabled=True,
    max_workers=4,
)

Vertices in the same execution stage run concurrently using concurrent.futures.ThreadPoolExecutor:

  • Stages still execute sequentially (Stage 1 finishes before Stage 2 starts).
  • Within each stage, vertices run concurrently, up to max_workers at a time.
  • max_workers controls the thread pool size (default: 4).
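Conceptually, each stage is handed to a thread pool and the next stage does not start until every future has resolved. A minimal sketch (the handler signatures here are illustrative, not HBIA's internals):

```python
from concurrent.futures import ThreadPoolExecutor

def run_stage(stage, handlers, data, max_workers=4):
    """Run every vertex in one stage concurrently and collect results.
    Stages themselves still execute one after another."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {v: pool.submit(handlers[v], data) for v in stage}
        # .result() blocks, so the stage only finishes when all vertices do
        return {v: f.result() for v, f in futures.items()}

# Illustrative handlers for the diamond's middle stage [B, C]
handlers = {"B": lambda d: d["a"] + 1, "C": lambda d: d["a"] * 2}
print(run_stage(["B", "C"], handlers, {"a": 10}))  # {'B': 11, 'C': 20}
```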

When to Use

  • CPU-bound work with independent vertices.
  • I/O-bound work where threads release the GIL (network calls, file I/O).
  • DAGs with wide fan-out patterns (many parallel branches).

Thread Safety

Your handler functions must be thread-safe when parallel mode is enabled. The DataStore and StateStore are designed for concurrent access, but your handlers should not share mutable global state.
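For example, appending to a module-level list from several handlers at once is a race. Keep work local to the handler, and guard any unavoidable shared mutation with a lock. This is a general Python threading pattern, not an HBIA API:

```python
import threading

results = []
results_lock = threading.Lock()

def safe_handler(data):
    """Do the computation on local variables, then serialize the
    one shared mutation behind a lock."""
    value = data["x"] * 2          # local work: inherently thread-safe
    with results_lock:             # shared mutation: one thread at a time
        results.append(value)
    return value

threads = [threading.Thread(target=safe_handler, args=({"x": i},))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # [0, 2, 4, 6]
```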

Async Mode

result = await run_flow_async(
    graph,
    handlers={
        "fetch": async_fetch_fn,      # async def
        "process": async_process_fn,  # async def
    },
    initial_data={...},
    async_enabled=True,
)

Async mode uses asyncio to execute async def handlers:

  • Handlers are await-ed natively.
  • Ideal for I/O-heavy workloads with many concurrent network requests.
  • Uses run_flow_async() instead of run_flow().
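Conceptually, an async stage awaits all of its handlers at once rather than spawning threads. A minimal sketch using asyncio.gather (the handler names and bodies are illustrative):

```python
import asyncio

async def run_stage_async(stage, handlers, data):
    """Await every async handler in a stage concurrently."""
    results = await asyncio.gather(*(handlers[v](data) for v in stage))
    return dict(zip(stage, results))

async def fetch(data):
    await asyncio.sleep(0)         # stands in for a real network call
    return data["url"] + "/users"

async def process(data):
    await asyncio.sleep(0)
    return len(data["url"])

out = asyncio.run(run_stage_async(
    ["fetch", "process"],
    {"fetch": fetch, "process": process},
    {"url": "https://api.example.com"},
))
print(out)  # {'fetch': 'https://api.example.com/users', 'process': 23}
```

While one handler is blocked on I/O, the event loop runs the others, which is why this mode shines for many concurrent network requests.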

When to Use

  • I/O-bound workloads with many network calls (API requests, database queries).
  • When your existing codebase already uses async/await.

Combining Modes

The parallel_enabled, async_enabled, and cache_enabled flags are independent and can be combined:

Configuration            Behavior
---------------------    -----------------------------------------------------
Serial only (default)    One vertex at a time, synchronous
Cache only               Serial, but skip re-executing unchanged pure vertices
Parallel only            Same-stage vertices run in threads
Async only               async def handlers are awaited, one at a time
Parallel + Cache         Threads + skip cached results
Parallel + Async         Maximum throughput: threads + coroutines
All three                Threads, coroutines, and caching combined

Configuration

Per-Call

result = run_flow(
    graph, handlers={...}, initial_data={...},
    parallel_enabled=True,
    async_enabled=False,
    cache_enabled=True,
    max_workers=8,
)

Via Settings

from honey_badgeria.conf import Settings, configure

configure(Settings(
    PARALLEL_ENABLED=True,
    ASYNC_ENABLED=False,
    CACHE_ENABLED=True,
    MAX_WORKERS=8,
))

Via Environment Variables

export HBIA_PARALLEL_ENABLED=true
export HBIA_ASYNC_ENABLED=false
export HBIA_CACHE_ENABLED=true
export HBIA_MAX_WORKERS=8

Priority

Settings are resolved in this order (highest priority first):

  1. Per-call arguments
  2. Explicit Settings object
  3. Environment variables (HBIA_*)
  4. Built-in defaults
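The resolution order above amounts to a simple fallback chain. An illustrative helper (not HBIA's actual resolver; the names and coercion rules here are assumptions for the sketch):

```python
import os

DEFAULTS = {"parallel_enabled": False, "max_workers": 4}

def resolve(name, call_arg=None, settings=None):
    """Resolve one setting: per-call arg > Settings dict > env var > default."""
    if call_arg is not None:                      # 1. per-call argument
        return call_arg
    if settings is not None and name.upper() in settings:
        return settings[name.upper()]             # 2. explicit Settings
    env = os.environ.get("HBIA_" + name.upper())  # 3. HBIA_* env var
    if env is not None:
        if env.lower() in ("true", "false"):
            return env.lower() == "true"
        return int(env)
    return DEFAULTS[name]                         # 4. built-in default

os.environ["HBIA_MAX_WORKERS"] = "8"
print(resolve("max_workers"))              # 8     (env var)
print(resolve("max_workers", call_arg=2))  # 2     (per-call wins)
print(resolve("parallel_enabled",
              settings={"PARALLEL_ENABLED": True}))  # True (Settings wins over default)
```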

Atomicity and Parallelism

Atomic groups override parallel settings. When a group declares no_parallel: true (the default), vertices in that group always execute serially, even if parallel_enabled=True globally.

This is intentional — atomic operations (database transactions, payment processing) typically need serial execution for correctness:

atomic_groups:
  payment:
    vertices: [reserve, charge, confirm]
    on_failure: rollback
    no_parallel: true    # Serial within this group
    no_cache: true       # No caching within this group

Vertices outside the atomic group still run in parallel if parallel_enabled=True.