The Evolution of Edge Vision Systems

Figure: One of two server racks powering the Greyhound Tracking System (2004-2005)

Here are two declassified case studies showing how classic computer vision delivered real‑time results on modest hardware in the early 2000s. They mark the first known, publicly deployed real‑time, multi‑camera tracking systems of their kind on commodity CPUs:

  • Speedway (2003): live tracking of four motorcycles across nine trackside cameras
  • Greyhound (2004-2005): live tracking of six dogs and the hare across 52 cameras

Both ran live, streamed data to dial‑up users, and had to succeed on tiny FLOPS budgets. That scarcity forged habits that still matter on modern edge devices. For additional detail, the original case‑study articles are the best place to start (the Greyhound piece in particular includes many photos); a brief summary and the lessons learned are outlined below.


2003–2005: What Shipped Under Tight Constraints

Context
When these systems went live, OpenCV was a fledgling v0.x, AlexNet (2012) was years away, and a single Pentium 4 delivered on the order of 12 GFLOPS (≈0.012 TOPS) of peak compute. There was no practical GPGPU, high‑end systems carried 1 GB of RAM, and video arrived as interlaced PAL at 25 fps.

Systems at a glance

| System | Cameras | Compute | Objects | End‑to‑End Latency | Notable Tricks |
|---|---|---|---|---|---|
| Speedway (2003) | 9 × PAL CCTV | 3 × Pentium 4 | 4 motorcycles | <200 ms | SSE2 color kernels, helmet‑cam identity hints |
| Greyhound (2004-2005) | 52 × PAL CCTV | 9 × Pentium 4 | 6 dogs + hare | <220 ms | 64×16 analog video matrix; 1‑D “track‑unwrapped” EKF |

Why the “primitive” hardware helped

| Constraint | Counter‑measure |
|---|---|
| 25 fps interlaced PAL | Single‑field processing to halve motion blur; regain detail via multi‑view geometry |
| Zero GPUs, 1 GB RAM | Hand‑rolled SIMD, LUT color classifiers, early‑exit motion masks, ROI pyramids |
| 100 Mb LAN; many dial‑up users | Stream state vectors (<1 kB per frame) instead of video (see the packing sketch below) |
| Dust, glare, dropouts | Per‑pixel variance masks; auto‑recovery and camera failover |
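
To make the bandwidth row concrete, here is a minimal sketch of how a sub‑kilobyte per‑frame state packet could be laid out. The field layout, names, and sizes are illustrative assumptions, not the original wire format:

```python
import struct

# Assumed packet layout (illustrative, not the original protocol):
# header: uint32 frame id, uint8 object count; then per object:
# uint8 id, float32 arclength s (m), float32 speed (m/s), uint8 confidence.
HEADER = struct.Struct("<IB")
OBJECT = struct.Struct("<BffB")

def pack_frame(frame_id, objects):
    """Serialise one frame of tracker state for the downstream feed."""
    payload = HEADER.pack(frame_id, len(objects))
    for obj_id, s, v, conf in objects:
        payload += OBJECT.pack(obj_id, s, v, conf)
    return payload

# Six dogs plus the hare: 5 + 7 * 10 = 75 bytes per frame,
# under 2 kB/s at 25 fps, comfortably inside a 56 kbps dial-up link.
frame = pack_frame(1024, [(i, 120.5 + i, 16.2, 200) for i in range(7)])
print(len(frame))  # 75
```

Shipping state instead of pixels is what makes the dial‑up budget workable: the client can redraw the race locally from a few dozen bytes per frame.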

Engineering highlights

  • Geometry‑first pipelines. In both systems the oval track was “unwrapped” to a 1‑D arclength coordinate s. Each camera produced observations z = s + noise; a single EKF fused them into smooth trajectories (a minimal sketch of this fuse‑and‑gate step follows this list). Occlusions became gaps along s, not hard 2‑D re‑identification problems.
  • Deterministic latency. Fixed time‑budgets per stage (capture → mask → blob → association → fuse), with watchdogs that degraded gracefully (smaller ROIs, shorter association windows) under load.
  • Robust association. Simple gating (Mahalanobis distance) + nearest‑neighbour across cameras outperformed heavier global solvers on commodity CPUs of the era.
  • Operational pragmatism. Camera‑by‑camera health scores; automatic de‑weighting in the filter when variance spiked (rain, floodlights, spectators).
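
To ground the first and third bullets, here is a minimal, hedged sketch of a constant‑velocity filter along the unwrapped arclength with a Mahalanobis gate on each camera observation. The track length, noise values, and linearised observation model are assumptions; the deployed EKF also handled the nonlinear camera‑to‑track mapping, which is abstracted away here into a pre‑computed arclength measurement:

```python
import numpy as np

TRACK_LEN = 480.0   # unwrapped track length in metres (illustrative value)
GATE_1DOF = 6.63    # chi-square 99% gate for a scalar residual

class TrackFilter1D:
    """Constant-velocity filter along the unwrapped arclength s.

    Sketch of the fuse-and-gate step described above. Each camera is
    assumed to deliver an arclength observation z = s + noise, which
    makes this toy version linear.
    """

    def __init__(self, s0, v0):
        self.x = np.array([s0, v0])        # state: [arclength (m), speed (m/s)]
        self.P = np.diag([4.0, 4.0])       # initial covariance (assumed tuning)
        self.H = np.array([[1.0, 0.0]])    # cameras observe arclength only

    def predict(self, dt):
        F = np.array([[1.0, dt], [0.0, 1.0]])
        Q = np.diag([0.05 * dt, 0.5 * dt])  # process noise (assumed tuning)
        self.x = F @ self.x
        self.x[0] %= TRACK_LEN              # wrap around the oval
        self.P = F @ self.P @ F.T + Q

    def update(self, z, r):
        """Fuse one camera's arclength measurement; False means gated out."""
        # Shortest signed distance around the loop, so laps don't break residuals.
        y = (z - self.x[0] + TRACK_LEN / 2.0) % TRACK_LEN - TRACK_LEN / 2.0
        S = (self.H @ self.P @ self.H.T).item() + r
        if y * y / S > GATE_1DOF:           # Mahalanobis gate rejects outliers
            return False
        K = (self.P @ self.H.T / S).ravel() # Kalman gain
        self.x = self.x + K * y
        self.x[0] %= TRACK_LEN
        self.P = self.P - np.outer(K, self.H @ self.P)
        return True

# 25 fps cadence: predict once per frame, then fuse whichever cameras fired.
trk = TrackFilter1D(s0=12.0, v0=16.5)
trk.predict(dt=0.04)
accepted = trk.update(z=12.7, r=0.25)
```

Gating before fusing is what kept the association step cheap: an observation that fails the scalar gate is simply dropped rather than triggering a global re‑solve.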

Core idea: Use the world’s structure (track layout, motion priors, order constraints) so that simple algorithms win in real time.


Lessons that still apply today

| 2003 Approach | 2025 Equivalent |
|---|---|
| Hand‑optimised kernels and cache awareness | Better quantisation strategies, compiler pragmas, and memory layouts for edge TPUs / NPUs (a toy example follows the table) |
| Geometry before deep nets | Smaller models, fewer labels; homographies and EKFs reduce the training burden |
| Bandwidth‑first design | On‑device inference plus lightweight uplinks (telemetry, not video) lowers cost and improves privacy |
| Designed for failure | Self‑healing nodes, health telemetry, and graceful degradation matter as much as your models' mean average precision |
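
As a toy illustration of the quantisation row, the sketch below applies standard affine int8 quantisation to a weight tensor in pure NumPy. The scale/zero‑point scheme is the common textbook one; the shapes and values are illustrative and tied to no particular vendor toolchain:

```python
import numpy as np

def quantize_int8(w):
    """Affine (asymmetric) int8 quantisation of one weight tensor.

    Toy version of what edge-NPU toolchains do per tensor or per channel.
    """
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    zero_point = int(round(-lo / scale)) - 128            # maps lo -> -128
    q = np.clip(np.round(w / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

w = np.random.randn(64, 64).astype(np.float32)
q, s, z = quantize_int8(w)
print("max abs error:", np.abs(w - dequantize(q, s, z)).max())  # about scale/2
```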

These early deployments showed that real‑time tracking is achievable on modest hardware by leaning on geometry, priors, and simplification of the problem space.

Evergreen lesson: Scarcity clarifies vision, whether it’s squeezing handcrafted kernels onto a Pentium 4 in 2003 or quantising modern detectors onto a 3-5 W edge accelerator in 2025.
