Architectural Brief: Inventory Allocation Simulator

Inventory planning fails when yesterday's stock position is treated as today's truth. This system separates mutable planning data from frozen simulation evidence so a planner can ask why a transfer was recommended after the warehouse, SKU, or lane data has already changed.
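
The separation described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the simulator's Julia code; `create_simulation` and the in-memory `live_inventory` table are hypothetical stand-ins, and only the `input_snapshot` name comes from this brief.

```python
import copy

# Hypothetical stand-in for the live, mutable planning tables.
live_inventory = {("WH-1", "SKU-9"): 40, ("WH-2", "SKU-9"): 5}


def create_simulation(live_tables: dict) -> dict:
    """Freeze a deep copy of the planning data when the run is created."""
    return {"status": "queued", "input_snapshot": copy.deepcopy(live_tables)}


run = create_simulation({"inventory": live_inventory})

# Planning data changes after the run was created.
live_inventory[("WH-2", "SKU-9")] = 500

# The run is still explained by the evidence it was created with.
frozen = run["input_snapshot"]["inventory"][("WH-2", "SKU-9")]
print(frozen, live_inventory[("WH-2", "SKU-9")])
```

A worker that reread `live_inventory` instead of the snapshot would see 500 units and could no longer justify a transfer that was recommended when the position was 5.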

System Topology

[Architecture diagram]

Infrastructure Decisions

Compute: Docker Compose with a Julia application container. Chose this over a split Python optimizer service because Genie, JuMP, and the worker lifecycle can ship as one deployable unit without a cross-language boundary.
Data Layer: PostgreSQL 16 for tenant data, simulations, recommendations, decisions, local notifications, and outbox rows. Chose this over document storage because the domain depends on scoped relationships between warehouses, SKUs, lanes, policies, and audit records.
Analytics Layer: DuckDB for backtests. Chose this over loading every historical analysis into PostgreSQL because CSV-heavy replay work should not compete with the operational API tables.
Optimization: JuMP with HiGHS. Chose this over a black-box model because planners need binding constraints, service-level tail behavior, solver status, and net-value math they can inspect.
Frontend: Server-rendered Genie views with HTMX. Chose this over a separate SPA because the console is form-heavy and operational, not a consumer app.
Jobs: Redis plus persisted job and run tables. Chose this over purely in-memory tasks because imports, outbox dispatch, and simulations need restart-safe status.

Constraints That Shaped the Design

Input: The system accepts CSV and API data for warehouses, SKUs, inventory positions, demand history, transfer lanes, and allocation policies.
Output: The system produces simulation runs, stored demand scenarios, allocation recommendations, decision audit rows, CSV exports, local notifications, and optional ecosystem outbox events.
Scale Handled: The benchmark fixture covers 50 warehouses, 2,000 SKUs, and 100 demand scenarios. The live Batch 053 run completed in 17,928.4753 ms (about 17.9 seconds) against a 600,000 ms (10-minute) target and produced 2,000 recommendations.
Hard Constraints: scenario_count is capped at 100 in src/planning/simulations_lifecycle.jl even though configuration parsing accepts higher defaults. The cap prevents one API request from creating runaway worker and database load.
Solver Limits: SOLVER_TIMEOUT_SECONDS=120 and MAX_SOLVER_GAP=0.05 come from .env.example. Time-limit incumbents are accepted only when the gap is at or below the configured ceiling.
Audit Boundary: capture_simulation_input_snapshot reads up to SNAPSHOT_MAX_ROWS = 1_000_000 for each planning surface and stores the snapshot before worker processing begins.
Integration Boundary: Delivery Gateway, Notification Hub, and Workflow Engine are disabled by default. The simulator keeps CSV import, simulation, review, notification, and export paths alive without them.
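
The hard cap and the solver acceptance rule above can be sketched as follows. This is a Python illustration under stated assumptions, not the Julia code in src/planning/simulations_lifecycle.jl: the function names are hypothetical, the sketch assumes an over-limit request is clamped rather than rejected, and the status strings only mirror the names of common solver termination codes.

```python
MAX_SCENARIO_COUNT = 100  # hard cap described in this brief
MAX_SOLVER_GAP = 0.05     # MAX_SOLVER_GAP from .env.example


def effective_scenario_count(requested: int) -> int:
    """Clamp a requested scenario count to the hard cap (assumed behavior)."""
    if requested < 1:
        raise ValueError("scenario_count must be at least 1")
    return min(requested, MAX_SCENARIO_COUNT)


def accept_solution(status: str, relative_gap: float,
                    max_gap: float = MAX_SOLVER_GAP) -> bool:
    """Accept optimal solutions, and time-limit incumbents within the gap ceiling."""
    if status == "OPTIMAL":
        return True
    if status == "TIME_LIMIT":
        # An incumbent found before the timeout is usable only if it is
        # provably close enough to the best bound.
        return relative_gap <= max_gap
    return False


print(effective_scenario_count(250))       # clamped to the cap
print(accept_solution("TIME_LIMIT", 0.05))  # at the ceiling: accepted
print(accept_solution("TIME_LIMIT", 0.08))  # above the ceiling: rejected
```

A time-limit stop with no incumbent would typically report an infinite gap, which this rule rejects without a special case.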

Decision Log

Decision	Alternative Rejected	Why
Frozen `input_snapshot` on simulation creation	Reread live tables during worker processing	Completed runs must remain explainable after inventory, demand, lanes, or policy settings change.
Stockout-adjusted demand cleaning	Treat observed sales as demand	A zero-sales period during a stockout reflects unavailable inventory, not proof of low demand.
Shared `recommendation_net_value`	Recompute net value separately in API, UI, CSV, and notifications	One calculation path prevents a planner from seeing one value in the console and another in the export.
Local notifications plus optional outbox	Direct external notifications only	Adapter failure cannot change recommendation truth or make the standalone console unusable.
PostgreSQL decision rows	Status changes without an audit record	Approve, reject, export, and expire actions need user, reason, time, and idempotency evidence.
Server-rendered console	React SPA	The UI is an operations workbench with tables, forms, and review screens; server rendering keeps deployment smaller.
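
The stockout-adjusted cleaning decision can be illustrated with a small Python sketch. The brief does not say how stockout periods are adjusted, so the mean-fill rule below is an assumption chosen for clarity, and `clean_demand` is a hypothetical name.

```python
from statistics import mean


def clean_demand(sales, stocked_out):
    """Treat stockout periods as unobserved demand instead of zero demand.

    sales:       units sold per period
    stocked_out: True where no inventory was available in that period
    """
    observed = [s for s, out in zip(sales, stocked_out) if not out]
    if not observed:
        return [None] * len(sales)  # no usable demand signal at all
    fill = mean(observed)  # assumed imputation rule: mean of in-stock periods
    return [fill if out else s for s, out in zip(sales, stocked_out)]


sales = [10, 12, 0, 0, 11]
stocked_out = [False, False, True, True, False]

cleaned = clean_demand(sales, stocked_out)
print(mean(sales), mean(cleaned))
```

With these inputs, treating observed sales as demand gives an average of 6.6 units per period, while the cleaned series averages 11. The rejected alternative would understate demand by 40 percent for a SKU that simply ran out of stock.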

Scaling Limits

The current solver benchmark proves the reference workload, not every network shape. A dense lane graph across every warehouse and SKU would create many more lane-SKU variables than the shipped benchmark. At that point the next design change would be partitioning by region, category, or policy rather than asking one optimization model to cover the full network at once.
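
A rough count shows why a dense lane graph changes the problem. The Python sketch below assumes one decision variable per directed lane and SKU, a lane between every ordered pair of warehouses, and a hypothetical partition into five regions of ten warehouses with transfers allowed only inside a region. None of these modeling details are stated in the brief.

```python
def dense_lane_sku_variables(warehouses: int, skus: int) -> int:
    """One variable per (directed lane, SKU), with every ordered pair connected."""
    return warehouses * (warehouses - 1) * skus


# Benchmark-sized network solved as a single dense model.
full_model = dense_lane_sku_variables(50, 2_000)

# Hypothetical partition: 5 regions of 10 warehouses each.
per_region = dense_lane_sku_variables(10, 2_000)
partitioned_total = 5 * per_region

print(full_model, per_region, partitioned_total)
```

Under these assumptions the single model carries 4,900,000 lane-SKU variables, while each regional model carries 180,000 and the five together carry 900,000. The trade-off is that cross-region transfers are no longer visible to any one model.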

The architecture already has the important boundary: simulations are frozen, jobs are persisted, and recommendations are audit records. Scaling the optimizer changes the worker strategy. It does not change the planner contract.