Architectural Brief: Inselvolt Pricing Engine

The operations team was spending 90 minutes every morning updating product prices by hand. Thirteen suppliers, 4,162 products, one person with a spreadsheet. This is the system that replaced that process and made the store reactive to competitor pricing on Idealo within minutes instead of hours.

System Topology

Infrastructure Decisions

Compute: AWS ECS Fargate (containerized). Chose over Lambda because the full pipeline runs for 7-12 minutes end-to-end. Lambda's 15-minute timeout leaves no margin for retries, and the memory footprint with 13 supplier DataFrames in memory exceeds Lambda's practical limits.
Data Layer: Pandas DataFrames in-memory. Chose over PostgreSQL or SQLite because this is a batch script, not an API. No concurrent reads, no persistent state between runs. Everything loads, processes, and exports in a single execution. AWS DynamoDB is used only for run-tracking metadata, not core data.
Core Language: Python 3.10 with no web framework. Chose Python for pandas ecosystem and the team's familiarity. No FastAPI or Flask: there's no API to serve. The entry point is run.py calling run_inselvolt_bot() with a 2-hour timeout watchdog.
Shopify Integration: GraphQL Admin API with Bulk Operations (JSONL staged upload, mutation, polling, result download). Chose over REST Admin API because individual REST variant updates took 30+ minutes for 4,162 products and hit rate limits. Bulk operations complete in under 7 minutes with SHOPIFY_RATE_LIMIT_BUFFER=0.8.
Data Bus: Google Drive API (service account). Chose over S3 because the operations team manages pricing rules, supplier feeds, and competition monitoring in Google Sheets. Google Drive is both data source and data sink. Replacing it with S3 would have required rebuilding their entire workflow.
Idealo Integration: Custom scraping API, offer aggregation, and CSV export pipeline. No official Idealo API for real-time pricing data. Competition data flows through a separate module (idealo_operations/) with its own retry logic.

Constraints That Shaped the Design

Input: 13 supplier price feeds (CSV, XLSX, varying column names, mixed decimal formats) downloaded from Google Drive subfolders. Base product catalog pulled from Matrixify CSV export URL. Pricing rules from a Google Sheet with price-logic and price-logic-exclusions tabs.
Output: Updated prices and metafields synced to Shopify (bulk GraphQL). CSV exports for Idealo, Solute, and Adcell marketplaces. Daily backup snapshots to Google Drive.
Scale Handled: ~4,162 products per run. At 40,000 products, the in-memory DataFrame approach would need chunked processing and partitioned bulk operations. The pricing logic itself is stateless per-product and would scale linearly.
Hard Constraints: Shopify API rate limit at 80% buffer (SHOPIFY_RATE_LIMIT_BUFFER=0.8). Bulk operation timeout at 3,600 seconds. retry_with_backoff decorator on all Google Drive and API calls (5 retries, exponential backoff, 60s max wait). A 2-hour global timeout in run.py kills the process if anything hangs.

Decision Log

Decision	Alternative Rejected	Why
Two-pass SKU matching (product SKU then supplier part number)	Fuzzy string matching	Solar part numbers are dense alphanumerics. `STP8.0-3AV-40` and `STP10.0-3AV-40` are different products at different prices, one an 8kW inverter and one a 10kW. Fuzzy matching produced dangerous false positives. Two deterministic passes reached 99.5% coverage with zero false matches.
Per-supplier cleaner functions (11 separate functions)	Generic data normalizer with rules dictionary	Three suppliers in, the generic approach was already unreadable. Each supplier has unique format quirks: German decimals, text stock values, inconsistent column names. Isolated cleaners are ugly but safe to modify independently.
Category surcharge before pricing rules	Surcharge after pricing rules	The category-specific surcharge must be part of the base cost before margin brackets apply. Applying it after creates compound markup drift that looks plausible but is consistently a few percent too high. Caught this only after manual spot-checking specific products.
JSONL staged bulk mutations for Shopify	Individual GraphQL mutations	4,162 individual mutations took 30+ minutes with rate limit pauses. JSONL upload, single bulk mutation, poll-wait-download takes under 7 minutes. The complexity cost is a state machine for upload-mutate-poll-download, but the 20x speedup justifies it.
Google Drive as primary I/O	AWS S3 or managed file store	The client's operations team manages pricing rules, competitor monitoring, and supplier feeds in Google Sheets. Forcing a workflow change would have delayed delivery by months. Google Drive API with service account auth handles both read and write.
Supplier selection with exclusion priority	Simple lowest-price deduplication	The operations team sets fixed prices on specific SKUs via a Google Sheet. Automation must not override these manual decisions. Additionally, the cheapest supplier row might have zero inventory. The selection logic checks exclusions first, then inventory, then price.

Architectural Brief: Inselvolt Pricing Engine

System Topology

Infrastructure Decisions

Constraints That Shaped the Design

Decision Log

Contents