Diagnostics

A diagnostic is a small test you write in your firmware that the platform can run on a single device or across the whole fleet. You declare it once in a YAML file. The build pipeline registers it. The cloud knows about it. From there, the test fires from one of three places.

Three trigger surfaces

Manual from the dashboard. An operator opens a gateway, clicks the Diagnostics tab, and hits Run on the test. The cloud publishes a command to that one device, the device executes the C function you wrote, and the result lands back in the dashboard within seconds. Useful when a customer calls support and you want to see whether the temperature sensor is actually responding without driving to the install site.

Automatic, after every OTA. Set run_after_ota: true on a diagnostic and the platform runs it on every device the moment that device applies a new firmware release. The release status flips to verifying while checks are in flight, then to verified once they all pass.
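A declaration with that flag set might look like the sketch below. Of these fields, run_after_ota is the only one confirmed above; the other keys (name, description, timeout_ms) are illustrative assumptions — see Config schema for the real field list.

```yaml
# .scadable/diagnostics/motor_health.yaml — illustrative sketch only;
# every field name here other than run_after_ota is an assumption.
name: motor_health
description: Verify the motor driver reports healthy current draw
run_after_ota: true    # run on every device right after it applies an OTA
timeout_ms: 5000       # assumed knob: treat a hung check as a failure
```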

Auto-rollback when verification fails. This is the keystone. If any run_after_ota: true diagnostic returns fail, timeout, or error after an OTA, the platform automatically re-pins the gateway to the previous successfully-verified release. No operator clicks. No paging. The bad release is contained on the gateways it already reached, and the rest of the fleet rolling out behind it gets blocked because the release is now in verification_failed state.

Why this matters for fleets

A firmware regression that breaks the motor_health check gets caught by the platform on the first device it reaches, before the release rolls out to all 1,000. Without diagnostics, the failure surfaces as "support tickets a week later when customers notice their motors are running rough." With diagnostics, it surfaces in the verification panel within a minute of the first device applying the release, and the rollout halts itself.

The trade-off is real: a buggy diagnostic can roll your whole fleet back. The platform mitigates this with a loop guard (don't cycle if the rollback target also fails verification) and storm protection (auto-disable the auto-rollback flag if a project sees three consecutive verification failures in 24 hours). See Post-OTA verification for the full state machine.

What you write vs. what the platform handles

You write:

  • A YAML declaration per check in .scadable/diagnostics/.
  • A C function per check in your firmware, using the SCD_DIAG macro and DIAG_PASS / DIAG_FAIL return helpers.
  • A single scadable_register_diagnostic() call per check (or use codegen, which emits scadable_init_diagnostics() and does this for you).

The platform handles:

  • Parsing the YAML into the cloud catalog on every release build.
  • Dispatching the run command to the device over MQTT.
  • Capturing the result envelope back into Postgres with full history.
  • Aggregating per-gateway results into a per-release verification verdict.
  • Rolling back automatically on verification failure.
  • Surfacing everything in the dashboard.

Pages in this section

  • Config schema: every field in .scadable/diagnostics/*.yaml, validation rules, future-type extensibility.
  • C API: SCD_DIAG, DIAG_PASS / DIAG_FAIL, registration timing, error semantics.
  • Post-OTA verification: the full lifecycle, auto-rollback, loop guard, storm protection.
  • Dashboard walkthrough: where to enable the beta flag and how to read the new tabs.

Status

Diagnostics ship as a beta feature, gated per-namespace via feature_flags.diagnostics_v1. To enable it for your namespace, contact SCADABLE support or, if you have admin access, flip the flag in your namespace settings. Backend code paths run for everyone (the cloud always ingests catalogs and accepts results); the UI surfaces stay hidden unless the flag is on. See Dashboard walkthrough for the enablement path.

Requires libscadable >= 0.3.0. Older firmware silently no-ops on cloud-triggered runs; the dashboard surfaces "Requires libscadable v0.3.0+" with the Run button greyed out.