
I love demos. I also don’t trust them.
Over 25 years I’ve shipped many computer-vision systems that looked great in the lab and became painful the moment they met weather, procurement, and people.
This article is about an idea I had in 2012 and shelved. The economics didn’t work. But hardware costs have changed enough that I recently decided to dust it off to see if the calculus has shifted.
The idea: use CV at a few “calibration” sites to generate ground-truth counts, then deploy cheap RF sensors (aggregate WiFi/Bluetooth presence) at hundreds of sites where cameras would be too expensive. Train a model to predict counts from the RF signal.
In 2012, this felt like consulting disguised as product: expensive edge compute, costly cellular, every site a mini science fair. I shelved it.
Now? A Pi Zero-class board costs single-digit euros at scale. ESP32-class modules are essentially commodity. Cellular IoT plans exist. Solar/battery deployments are routine. Per-unit economics have changed by an order of magnitude.
Does that change the answer?
The competitive landscape
Traffic counting has mature solutions. The opportunity, if any, sits where incumbents are predictably weak.
Intrusive sensors (loops, piezo, tubes): Accurate, but installation is civil works. Scaling to hundreds becomes a construction programme.
Camera + CV: Rich output, but operational burden compounds: permissions, power, connectivity, cleaning, weather, privacy, procurement.
Probe data (phones, nav apps): Huge coverage, zero hardware. But weak on low-volume segments, fine time resolution, and specific locations you care about. This competitor didn’t exist in 2012.
Bluetooth/WiFi MAC tracking: Platform privacy countermeasures keep degrading it. Not a new bet.
Radar: Weather-robust, but meaningful per-unit cost and site-specific calibration.
A CV-calibrated RF approach only makes sense in a narrow wedge: on-site measurement (not modelled proxy), lower friction than cameras at scale, more reliable than probe data where probe data is weak, and privacy-clean – aggregate presence, not identity tracking.
If it can’t do that, it’s not a product. It’s a demo with recurring costs.
What’s changed since 2012
Edge compute (CV-capable): $200-500 then, $15-50 now. 10-20x cheaper.
Edge ML inference: Required GPU/DSP then. $50-100 now (Coral, Hailo). Newly feasible.
Cellular IoT: $20-50/month then. $2-5/month now. 10x cheaper.
Solar/battery edge: Custom and expensive then. Commodity $50-100 now.
RF sensor fully loaded: $300-500 then, $50-100 now.
CV calibration station: $1000+ then, $200-400 now.
But cheaper hardware doesn’t mean better business. Deployment friction still exists. Privacy complexity is worse (GDPR). And GPS probe data is now a real competitor.
The technical bet
Core hypothesis: aggregate RF activity correlates with traffic volume strongly enough to predict counts once calibrated against CV ground truth.
Plausible but unproven in 2012. Still plausible but unproven in 2025.
Questions that matter: Does correlation generalise across sites? Is it robust to iOS/Android behaviour changes? Can you separate vehicles from pedestrians? What’s the accuracy floor?
For planning-grade use, you probably need daily volume within +/-20% and correct directional trends. Anything worse than +/-50% is noise. The gap between “interesting correlation” and “useful product” is where most sensing concepts die.
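To make that bar concrete, here is a minimal sketch of the single-site calibration check: fit a simple regressor from daily RF aggregates to daily CV counts, then measure how many held-out days land inside the +/-20% band. The choice of Ridge regression, the time-ordered splits, and the feature layout are illustrative assumptions, not a design.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import TimeSeriesSplit

def within_band(y_true, y_pred, band=0.20):
    """Fraction of days whose predicted volume lands within +/-band of truth."""
    rel_err = np.abs(y_pred - y_true) / np.maximum(y_true, 1.0)
    return float(np.mean(rel_err <= band))

def evaluate_calibration(X, y, n_splits=5):
    """X: (n_days, n_features) daily RF aggregates; y: daily CV vehicle counts."""
    scores = []
    for train_idx, test_idx in TimeSeriesSplit(n_splits=n_splits).split(X):
        model = Ridge(alpha=1.0).fit(X[train_idx], y[train_idx])
        scores.append(within_band(y[test_idx], model.predict(X[test_idx])))
    return float(np.mean(scores))  # "useful" only if most days land in the band
```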
The design choice: aggregate statistics, not tracking
In 2012, the obvious move was MAC detection – count unique devices. MAC randomisation has since largely broken that approach, and even in the best case it's brittle and drags you into exactly the wrong procurement conversation.
A modern version treats RF like an analogue sensor. Instead of “how many devices?”, the question is: what’s aggregate RF activity doing, and does it correlate with flow?
Compute on-device: management-frame rates by channel, RSSI histograms, short-window autocorrelation, ratios between antenna views. No persistent identifiers. No payload capture. Discard raw observations quickly.
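A minimal sketch of that per-window aggregation, assuming a made-up frame-tuple layout, window length, and histogram bins; the point is only that nothing identifying leaves the function.

```python
import numpy as np

def window_features(frames, window_s=300):
    """frames: RAM-only list of (timestamp_s, channel, rssi_dbm, antenna_id)
    observations for one window. Returns aggregate statistics only."""
    if not frames:
        return {"frame_rate_hz": 0.0, "rssi_hist": [0] * 8,
                "autocorr_lag1": 0.0, "antenna_ratio": 1.0}

    rssi = np.array([f[2] for f in frames], dtype=float)
    per_antenna = {a: sum(1 for f in frames if f[3] == a) for a in (0, 1)}

    # 1-second activity series, used for short-window autocorrelation
    t0 = min(f[0] for f in frames)
    counts = np.zeros(window_s)
    for ts, *_ in frames:
        counts[min(int(ts - t0), window_s - 1)] += 1
    a, b = counts[:-1], counts[1:]
    ac = float(np.corrcoef(a, b)[0, 1]) if a.std() > 0 and b.std() > 0 else 0.0

    # only aggregates leave this function; raw observations go out of scope
    return {
        "frame_rate_hz": len(frames) / window_s,  # management-frame rate
        "rssi_hist": np.histogram(rssi, bins=8, range=(-100, -20))[0].tolist(),
        "autocorr_lag1": ac,
        "antenna_ratio": (per_antenna[0] + 1) / (per_antenna[1] + 1),
    }
```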
Extracting directional signal
Mobile hotspots act like moving access points: when enabled, they emit periodic beacons (on the order of 10 Hz, though OS and power state vary this). Useful signal – but beacons carry identifiers, so the only defensible design keeps them in RAM just long enough to form ephemeral tracklets (seconds, not days), then outputs only directional aggregates.
Hardware: Two directional antennas aimed in opposite directions along the road.
Direction inference: For each short-lived emitter, compare the RSSI envelope across the two antenna views. If the upstream view sees the rise/peak before the downstream view, classify upstream-to-downstream. Opposite lag, opposite direction. Ambiguous gets labelled unknown.
Aggregation: Once classified, discard the tracklet and increment only window-level counters (e.g. 5-10 minutes): “N pass-bys, directional split ~70/30, confidence Z”.
Filter stationary APs: Down-weight or ignore emitters that persist across windows or show flat RSSI (no pass-by shape). Everything on-device. Nothing exported except aggregates and health metrics. A sketch of this pipeline follows after the caveat below.
Caveat: hotspot penetration is biased and variable. Treat direction-split as a feature that improves estimates when present – never the only input.
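Here is a rough sketch of the tracklet-to-counter step under those constraints. The tracklet shape, lag and swing thresholds, and window size are all illustrative assumptions, and the persistence-across-windows filter is omitted for brevity.

```python
from dataclasses import dataclass, field

@dataclass
class Tracklet:
    """RAM-only, seconds-lived: (timestamp_s, rssi_dbm) samples per antenna view."""
    upstream: list = field(default_factory=list)
    downstream: list = field(default_factory=list)

def classify_direction(t, min_lag_s=0.3, min_swing_db=6.0):
    """Compare where the RSSI envelope peaks in each antenna view."""
    if not t.upstream or not t.downstream:
        return "unknown"
    levels = [s for _, s in t.upstream + t.downstream]
    if max(levels) - min(levels) < min_swing_db:
        return "stationary"                                # flat RSSI: no pass-by shape
    peak_up = max(t.upstream, key=lambda s: s[1])[0]       # time of upstream peak
    peak_down = max(t.downstream, key=lambda s: s[1])[0]   # time of downstream peak
    lag = peak_down - peak_up
    if lag > min_lag_s:
        return "up_to_down"
    if lag < -min_lag_s:
        return "down_to_up"
    return "unknown"

def aggregate_window(tracklets, window_min=5):
    """Classify, then discard tracklets; only counters leave the device."""
    counts = {"up_to_down": 0, "down_to_up": 0, "unknown": 0, "stationary": 0}
    for t in tracklets:
        counts[classify_direction(t)] += 1
    passes = counts["up_to_down"] + counts["down_to_up"]
    return {
        "window_min": window_min,
        "pass_bys": passes,
        "directional_split": counts["up_to_down"] / passes if passes else None,
        "unknown": counts["unknown"],
    }
```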
Failure modes
1. RF variance makes cross-site prediction brittle. Site A is a quiet roadside with minimal nearby WiFi. Site B is near a cafe, bus stop, and apartment block – RF activity dominated by stationary infrastructure and waiting pedestrians. The model trained at A confidently over-predicts “traffic” during lunch peaks at B and under-predicts late-night vehicle flow.
2. OS updates silently break your model. An iOS update changes probe behaviour – frequency, burst patterns, background suppression. Your RF features drop 15-30% overnight across a subset of devices. Counts look “down” everywhere. Nothing is physically wrong on the road. Your sensor just lost sensitivity to a chunk of the population, and the model keeps outputting numbers as if nothing happened.
3. CV “ground truth” has systematic errors you don’t notice. Your calibration camera undercounts cyclists at night due to glare, and misclassifies e-scooters as pedestrians in rain. The RF model learns those biases as truth. Six months later you change the camera angle or upgrade the detector – now your RF predictor appears to drift, but the real drift was in your labels.
4. Events distort baselines. A stadium event causes a pedestrian surge (RF spikes) while vehicle traffic is restricted (counts drop). Your model interprets RF as traffic and reports a phantom vehicle peak. Or roadworks move cars away but leave stationary WiFi noise unchanged, so the model fails to detect the intervention impact.
5. No confidence scoring – trust collapses after bad predictions. Water gets into an enclosure, detunes an antenna, drops received power by 10 dB. The model outputs "traffic collapsed" for two weeks. The customer acts on it. When you discover it was a hardware fault, you've lost trust – not because the model was wrong, but because it never said "I don't know". (A minimal plausibility gate is sketched after this list.)
6. Fleet ops ignored – you become field services. Ten devices go offline after the first cold snap because battery capacity halves and solar yield drops. Another ten fill their storage because logging is too verbose. Without remote health monitoring and OTA rollback, you’re driving around swapping SD cards. At 200 sites, your “product” is a maintenance operation with graphs.
7. Privacy posture unclear – procurement blocks you. A security review asks: “Do you store MAC addresses? Can you prove deletion? What’s the retention policy?” If your answer is “we don’t think so” or “it’s anonymised”, it’s game over. Worse: a debug build accidentally ships with raw identifier logging, and now you have a real compliance incident.
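Several of these failure modes (2, 5 and 6 in particular) share one mitigation: every exported count should carry a confidence label derived from cheap health checks against a rolling site baseline. A minimal sketch, with assumed feature names and thresholds:

```python
def prediction_confidence(window_feats, baseline,
                          rssi_drop_db=8.0, rate_drop_ratio=0.5):
    """Compare current window aggregates with a rolling site baseline and
    return a label that ships alongside every count."""
    if window_feats["median_rssi_dbm"] < baseline["median_rssi_dbm"] - rssi_drop_db:
        return "suspect_hardware"      # e.g. water-detuned antenna, not traffic
    if window_feats["frame_rate_hz"] < baseline["frame_rate_hz"] * rate_drop_ratio:
        return "suspect_sensitivity"   # e.g. an OS update changed probe behaviour
    return "ok"
```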
Privacy is the product
If your product goes anywhere near WiFi/Bluetooth, assume procurement's first question is: "Are you tracking people?"
The answer must be absolute: We measure aggregate flow. We do not track individuals. Closed loop. No identifiers stored or exported (RAM-only, short-lived grouping where needed). On-device aggregation. Immediate discard of raw observations. Procurement-ready documentation from day one.
Who buys
Assuming the technical bet pays off, the path to market isn’t “all cities” – it’s specific buyers with acute needs and short decision cycles.
Early: Construction impact monitoring. Traffic engineering consultancies. Campuses, airports, ports.
Medium-term: Mid-sized cities needing to justify interventions without sensor coverage.
Long-term: National road authorities – hard procurement, large footprints once approved.
Conclusion
Economics have improved enough to justify a bounded experiment – but not enough to make this an obvious business idea.
Better than 2012: Hardware costs, edge ML, cellular IoT.
Worse than 2012: Privacy complexity, MAC detection viability, competitive landscape.
Unchanged: Core technical uncertainty, deployment friction, risk of becoming consulting.
If I were serious, I'd want three things: one site instrumented for a month (does RF correlate with CV at all?), a two-site generalisation test (does the model transfer?), and early pricing conversations (would anyone pay – and how much – for directional trends with explicit uncertainty?).
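The generalisation test is just the calibration check run across sites: fit at site A, score the same within-band metric at site B. A sketch under the same assumed data layout as above:

```python
import numpy as np
from sklearn.linear_model import Ridge

def transfer_score(X_a, y_a, X_b, y_b, band=0.20):
    """Fit at calibration site A, report the +/-band hit rate at site B."""
    model = Ridge(alpha=1.0).fit(X_a, y_a)
    rel_err = np.abs(model.predict(X_b) - y_b) / np.maximum(y_b, 1.0)
    return float(np.mean(rel_err <= band))
```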
The tech might be interesting. The real question is whether you can productise the boring parts: privacy posture, ground truth integrity, drift detection, confidence scoring, fleet ops.
That’s where the business would live – if there is one.
👉 How to think about productisation questions: start with the competitive landscape, identify the specific gap you’d need to fill, enumerate the failure modes, and be honest about what you don’t know. Most sensing concepts don’t survive this analysis. The ones that do may be worth building!