
FlowFabric API — Performance Guide

Benchmarks, competitive analysis, and query optimization patterns for production hydrologic workflows.

All production figures below were measured against the deployed API at flowfabric-api.lynker-spatial.com from within AWS us-east-1 (≤10 ms RTT), 5 iterations each, warm cache.

Use case                               | Measured time | Rows    | Response size
Rating curves — 2 reaches              | 0.78 s        | 672     | 12 KB
NWM forecast — 100 reaches, latest run | 0.42 s        | 1,800   | 45 KB
NWM reanalysis — 10 reaches, 10 years  | 8.3 s         | 876,000 | 45 KB

Local development: Querying from a developer laptop adds S3 egress latency — typically 3–10× slower than the deployed path. The numbers above reflect what end users see in production.


Benchmark Details

Rating Curves — 2 reaches

import httpx, pyarrow as pa, io

response = httpx.post(
    "https://flowfabric-api.lynker-spatial.com/v1/ratings",
    headers={"Authorization": f"Bearer {token}"},
    json={"feature_ids": ["8318793", "8318787"], "type": "rem"}
)
table = pa.ipc.open_stream(io.BytesIO(response.content)).read_all()

Metric        | Value
Average       | 0.78 s
Std dev       | ±0.04 s
Min / Max     | 0.73 s / 0.82 s
Rows returned | 672
Response size | 12 KB (Arrow IPC) vs ~28 KB JSON

Why it's fast: Partition-level filtering prunes to the relevant VPU shards before any row is read, skipping 99%+ of stored data.
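The pruning step can be sketched in a few lines. Everything below is illustrative — the real index structure, shard names, and VPU assignments are internal to the API:

```python
# Hypothetical sketch of partition-level filtering: an in-memory index
# maps each feature ID to the VPU shard that stores it, so only the
# shards that matter are ever opened. All names here are invented.
VPU_INDEX = {
    "8318793": "vpu_13",
    "8318787": "vpu_13",
    "1074884": "vpu_02",
}

ALL_SHARDS = ["vpu_01", "vpu_02", "vpu_13", "vpu_14"]  # stand-in for all stored shards

def shards_to_read(feature_ids):
    """Return only the shards containing the requested features."""
    return sorted({VPU_INDEX[fid] for fid in feature_ids})

# A 2-reach ratings query touches a single shard; every other shard is pruned
# before any row is read.
print(shards_to_read(["8318793", "8318787"]))  # ['vpu_13']
```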


NWM Forecast — 100 reaches, latest run

response = httpx.post(
    "https://flowfabric-api.lynker-spatial.com/v1/datasets/nws_owp_nwm_short_range/streamflow",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "query_mode": "run",
        "issue_time": "latest",
        "scope": "features",
        "feature_ids": ["8318793", "8318787", ...]   # 100 IDs
    }
)

Metric        | Value
Average       | 0.42 s
Std dev       | ±0.04 s
Min / Max     | 0.39 s / 0.47 s
Rows returned | 1,800 (18 time steps × 100 reaches)
Response size | 45 KB (Arrow IPC) vs ~120 KB JSON

NWM Reanalysis — 10 reaches, 10 years

response = httpx.post(
    "https://flowfabric-api.lynker-spatial.com/v1/datasets/nws_owp_nwm_reanalysis_3_0/streamflow",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "query_mode": "absolute",
        "start_time": "2013-01-01T00:00:00Z",
        "end_time":   "2022-12-31T23:00:00Z",
        "scope": "features",
        "feature_ids": ["8318793", "8318787", ...]   # 10 IDs
    }
)

Metric        | Value
Average       | 8.3 s
Std dev       | ±0.4 s
Min / Max     | 7.9 s / 8.7 s
Rows returned | 876,000 (10 reaches × 10 yr × 8,760 hr/yr)
Response size | 45 KB (Arrow IPC)

Why it's fast given the scale: The entire reanalysis corpus is petabyte-scale. The API reads only the Zarr chunks that intersect the requested feature IDs and time window — roughly 0.00001% of total stored data.
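The chunk arithmetic can be sketched as below, assuming a (time, feature)-chunked array; the chunk shapes are invented for illustration and are not the service's actual layout:

```python
# Hedged sketch of Zarr chunk selection: only chunks whose (time, feature)
# block intersects the query need to be fetched. Chunk shapes are
# hypothetical, not the real reanalysis store's.
TIME_CHUNK = 672       # hours per chunk (assumption)
FEATURE_CHUNK = 30000  # reaches per chunk (assumption)

def chunks_needed(time_indices, feature_indices):
    """Map requested (time, feature) indices to the set of chunk coordinates."""
    return {(t // TIME_CHUNK, f // FEATURE_CHUNK)
            for t in time_indices for f in feature_indices}

# 10 adjacent reaches over one chunk-width of hours hit exactly one chunk;
# everything else in the corpus is never touched.
print(len(chunks_needed(range(0, 672), range(100, 110))))  # 1
```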


Why Arrow IPC?

All endpoints default to Apache Arrow IPC — a binary columnar wire format.

Format    | Relative size | Python parse time
Arrow IPC | 1× (smallest) | ~1 ms
Parquet   | ~1.2×         | ~5 ms
JSON      | ~2.5–3×       | ~50 ms
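The size gap can be reproduced with the standard library alone — no Arrow required. This toy comparison encodes one forecast-sized column of values as JSON text versus fixed-width 32-bit floats (an assumption made for illustration; the API's actual element type is not documented here):

```python
# Rough stdlib-only illustration of why a binary columnar encoding beats
# JSON text: fixed-width values vs. decimal strings with separators.
import array
import json
import random

random.seed(0)
# ~one forecast response worth of streamflow-like values
values = [round(random.uniform(0.0, 500.0), 3) for _ in range(1800)]

json_bytes = len(json.dumps(values).encode())           # decimal text + commas
binary_bytes = len(array.array("f", values).tobytes())  # 4 bytes per float32

print(round(json_bytes / binary_bytes, 1))  # roughly 2-3x, matching the table
```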

Arrow is natively supported in Python (pyarrow), R (arrow), JavaScript (apache-arrow), and most BI tools.

Python

import pyarrow as pa, io
table = pa.ipc.open_stream(io.BytesIO(response.content)).read_all()
df = table.to_pandas()

R

library(arrow)
df <- as.data.frame(arrow::read_ipc_stream(httr::content(resp, "raw")))

JavaScript

import * as arrow from "apache-arrow";
const table = arrow.tableFromIPC(await response.arrayBuffer());

Competitive Analysis

Four direct access paths to NWM data exist alongside FlowFabric. Each fills a different niche; none covers FlowFabric's full scope.

CIROH Hub is a resource aggregator and documentation portal — it catalogs NWM data and access options across AWS, GCP, and Azure but is not itself a query service. The tools below are what CIROH Hub points to.

Capability                     | FlowFabric       | CIROH GCP API           | NOAA NOMADS         | NOAA NWPS       | AWS Archive
Server-side reach filter       | ✅ Yes           | ✅ Yes                  | ❌ No               | ✅ Yes          | Zarr only
Batch (many reaches, one call) | ✅ Yes           | ✅ Yes                  | ❌ No               | ❌ One per call | ❌ No
Forecast data                  | ✅ Yes           | ✅ Yes                  | ✅ Rolling 2–4 days | ✅ Operational  | ❌ No
Reanalysis (1979–2023)         | ✅ Yes           | ⚠ 2018–present only     | ❌ No               | ❌ No           | ✅ Static archive
Wire format                    | Arrow IPC        | JSON / CSV              | NetCDF (full file)  | JSON            | NetCDF / Zarr
Authentication                 | Bearer / API key | API key (CIROH members) | None                | None            | None (AWS SDK)
Access model                   | REST API         | REST API                | File download       | REST API        | S3 CLI/SDK
Public / no membership         | ✅ Yes           | ❌ CIROH members only   | ✅ Yes              | ✅ Yes          | ✅ Yes

CIROH / NOAA GCP API

hub.ciroh.org/docs/products/data-management/bigquery-api/

The CIROH GCP API (nwm-api.ciroh.org) is a FastAPI service backed by NWM data on Google Cloud. It exposes four endpoints: /forecast, /analysis-assim, /geometry, /return-period. The comids parameter provides true server-side reach filtering — the closest structural analog to FlowFabric's architecture.

Key gaps: access requires CIROH membership and an approved project; responses are JSON or CSV only; reanalysis coverage starts at September 2018 (operational GCP archive start, not 1979); and BigQuery scan-based billing can surface unexpected cost on large queries.

NOAA NOMADS File Server

nomads.ncep.noaa.gov/pub/data/nccf/com/nwm/prod/

NOMADS is NOAA's raw operational file server. Files appear as model cycles complete — roughly 30–60 minutes post-run — and are retained for approximately 2–4 days. Each file covers all CONUS reaches for one time step (~13 MB per channel_rt file). There is no per-reach filter; every query is a full file download followed by local parsing.
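Using the figures above, the minimum download for one short-range forecast (18 hourly steps, per the forecast benchmark earlier) works out as:

```python
# Back-of-envelope cost of the NOMADS path for one short-range forecast:
# one ~13 MB channel_rt file per time step, regardless of how many reaches
# you actually need.
FILE_MB = 13     # approximate size of one channel_rt file
TIME_STEPS = 18  # hourly steps in a short-range forecast

download_mb = FILE_MB * TIME_STEPS
print(download_mb)  # 234 MB downloaded to extract even a single reach
```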

NOAA Water Prediction Services (NWPS) API

api.water.noaa.gov/nwps/v1/docs/

The NOAA NWPS API is a public REST service with native reach-ID routing (GET /reaches/{reachId}/streamflow). It returns analysis assimilation (~3-day past window) and short/medium-range forecasts in a single JSON response. No authentication or API key required.

The constraints are structural: one reach per request (no batch endpoint), operational data only (no reanalysis), JSON-only responses with streamflow in ft³/s (no binary format), and best-effort availability — 503 responses have been observed. There is no stated SLA for third-party use.

AWS NWM Retrospective Archive

registry.opendata.aws/nwm-archive/

The AWS Open Data registry hosts all NWM retrospective runs: v1.2 (1993–2017), v2.0 (1993–2018), v2.1 (1979–2020), and v3.0 (1979–2023, ~250 TB). Data is free with no authentication; standard AWS egress charges apply outside us-east-1.

This is the authoritative source for long-record reanalysis — FlowFabric's reanalysis backend reads from this same archive. Accessing it directly requires xarray + zarr expertise: the Zarr format enables per-reach partial reads, but naive NetCDF access downloads the full file. There is no API layer, no operational data, and no updates after January 2023.


Overall Assessment

FlowFabric is the only service that combines:

  1. Both forecast and reanalysis — all alternatives cover one or the other, not both.
  2. True batch reach filtering in a public, no-membership REST API — the CIROH GCP API does this too, but requires CIROH affiliation.
  3. Binary columnar output (Arrow IPC) — every alternative returns JSON, CSV, or requires client-side NetCDF parsing.
  4. Measured production latency — 0.42 s for 100-reach forecasts, 8.3 s for 10-reach 10-year reanalysis.

The closest competitor for operational use is the CIROH GCP API (members only); the closest for reanalysis bulk work is direct S3 access to the AWS archive. FlowFabric's design goal is to make the reach-level query case fast and simple enough that building a direct S3/Zarr pipeline is rarely worth it.


Optimization Tips

1. Preview before you query

?estimate=true returns row and byte counts instantly and does not count against your quota:

estimate = httpx.post(
    "https://flowfabric-api.lynker-spatial.com/v1/datasets/nws_owp_nwm_reanalysis_3_0/streamflow?estimate=true",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "query_mode": "absolute",
        "start_time": "2013-01-01T00:00:00Z",
        "end_time":   "2022-12-31T23:00:00Z",
        "scope": "features",
        "feature_ids": ["8318793"]
    }
).json()

print(estimate["estimated_rows"], estimate["estimated_bytes"])
if estimate["would_exceed_sync_limits"]:
    print("Switch to mode='export'")

2. Batch your feature IDs

One call with 100 feature IDs is 50–100× faster than 100 separate calls:

# ❌ Slow — 100 separate requests
for fid in feature_ids:
    df = query(feature_ids=[fid])

# ✅ Fast — one request
df = query(feature_ids=feature_ids)

3. Use export mode for large queries

mode="sync" streams immediately — suitable for < 100 MB. For larger payloads, mode="export" writes a Parquet file to S3 and returns a pre-signed download link:

result = httpx.post(..., json={..., "mode": "export"}).json()
# result["download_url"] ready in 30–60 seconds
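A polling loop for the pre-signed link might look like the sketch below; the `check` callable and its return convention are assumptions for illustration, not part of the documented API:

```python
# Hedged sketch of waiting for an export to become ready: poll a caller-
# supplied check() (returns the download URL once available, else None)
# with a capped retry loop instead of a single fixed sleep.
import time

def wait_for_download(check, attempts=12, delay=5):
    """Poll check() until it returns a URL, or raise after the retry budget."""
    for _ in range(attempts):
        url = check()
        if url is not None:
            return url
        time.sleep(delay)
    raise TimeoutError("export not ready after polling window")
```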

4. Use /v1/stage instead of chaining two calls

The stage endpoint chains streamflow lookup and rating curve translation in one round trip:

# ❌ Two round trips
sf    = query_streamflow(dataset_id=..., feature_ids=...)
rc    = query_ratings(feature_ids=...)
stage = translate(sf, rc)

# ✅ One round trip
stage = httpx.post("/v1/stage", json={
    "dataset_id":   "nws_owp_nwm_analysis",
    "issue_time":   "latest",
    "feature_ids":  [...],
    "ratings_type": "rem"
}).json()

5. Filter by Strahler order for regional queries

When using POST /v1/features/bbox over a large area, stream_order_min reduces the feature count before any data query:

features = httpx.post("/v1/features/bbox", json={
    "bbox": [-110, 35, -100, 45],
    "stream_order_min": 4,    # main stems only
    "max_features": 1000
}).json()["feature_ids"]

Rate Limits & Quotas

Rate limits reset every 60 seconds. Current window status is included in every response:

X-RateLimit-Limit: 120
X-RateLimit-Remaining: 117
X-RateLimit-Reset: 1745836800

Tier       | Requests / min | Monthly data
free       | 20             | 3 GB
standard   | 120            | Unlimited
pro        | 600            | Unlimited
enterprise | 3,000          | Unlimited

Check remaining quota at GET /v1/me/usage.
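A client-side throttle built on these headers could look like the following sketch (the `floor` threshold is an arbitrary choice for illustration, not an API requirement):

```python
# Hedged client-side throttling sketch using the rate-limit headers shown
# above: when the window is nearly exhausted, wait until X-RateLimit-Reset.
import time

def seconds_until_safe(headers, now=None, floor=3):
    """Return how long to wait before the next request (0 if plenty left)."""
    remaining = int(headers["X-RateLimit-Remaining"])
    if remaining > floor:
        return 0.0
    reset_at = int(headers["X-RateLimit-Reset"])  # Unix epoch seconds
    now = time.time() if now is None else now
    return max(0.0, float(reset_at - now))

headers = {"X-RateLimit-Remaining": "2", "X-RateLimit-Reset": "1745836800"}
# seconds_until_safe(headers, now=1745836795) -> 5.0
```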


Benchmark Methodology

Detail             | Value
Environment        | AWS us-east-1, deployed API
Round-trip latency | ~10 ms
Iterations         | 5 (3 for long reanalysis queries)
Cache state        | Warm (vpuid index pre-loaded)
Date measured      | January 9, 2026

Results represent steady-state performance after the first request warms the in-process partition index. Cold-start (first request after a fresh deployment) adds a one-time index initialization cost.
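The reported statistics can be reproduced with a small harness of this shape — illustrative only, with `run_query` standing in for any zero-argument callable that performs the request:

```python
# Illustrative benchmark harness matching the methodology: one warm-up
# call, then N timed iterations reporting avg / std dev / min / max.
import statistics
import time

def benchmark(run_query, iterations=5):
    run_query()  # warm-up request primes the in-process partition index
    times = []
    for _ in range(iterations):
        t0 = time.perf_counter()
        run_query()
        times.append(time.perf_counter() - t0)
    return {
        "avg": statistics.mean(times),
        "std": statistics.stdev(times),
        "min": min(times),
        "max": max(times),
    }
```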


Support