Remote updates
Handling the lifecycle of IoT devices
Lifecycle management is a key part of identity and access management (IAM) for connected things (source: Connected Device Life Cycle: How does it impact the viability of IoT projects?, Wavestone, 2020):
A few points are worth noting:
At the manufacturing step, there is a risk of fraud or backdoors. Securing the hardware supply chain is a topic in itself. In some cases blockchain technologies may help, but the physical requirements make this challenging (for instance, see DUST Identity as an interesting use case). In addition, a software bill of materials (SBOM; see the FAQ and https://cyclonedx.org) completes the picture.
Bootstrapping trust between enterprise systems and devices can be very challenging at scale. We must, somehow, provision unique cryptographic keys and credentials in large fleets of remote devices. For enrollment, we refer the reader to existing protocols.
Final decommissioning of IT systems requires that all data be erased and that electronic waste be handled according to regulations.
Of course, there may be cases where an update mechanism is either not possible or considered too critical. When it is possible, the update process should be managed in line with threat management and mitigation strategies. In some cases, updates can be applied over a local connection to the machine. From here on, we focus only on remote firmware updates.
In the healthcare domain, an update can be necessary when a) a device needs to be repurposed for a new user or, more frequently, when b) a bug fix or security patch needs to be applied. This corresponds to requirement "5.3 Keep software updated" from ETSI EN 303 645.
For vendors, it is both a way to future-proof the device and a way to sell value-added services. There are compelling reasons to provide an update mechanism from the start:
Fix your own code, since QA can't catch all bugs
Fix configuration (e.g. public keys) and supply chain dependencies (e.g. we can expect chip SDKs to evolve over the device lifetime)
Fix security issues (e.g. a new MITM attack on the Bluetooth protocol, an update to TLS, etc.) or anticipate major threats
This mechanism should take into account the CE marking for medical devices.
We started with a review of existing methods. Many healthcare IT systems are based on Windows computers and can simply use traditional IT updates. Embedded systems, of course, have stronger limitations in terms of bandwidth, memory, storage and processing power. For instance, flash memory degrades over time after a significant number of erase and write operations. In some critical cases, a reboot may not be possible.
The update mechanism should be dependable. Typically, one wants to avoid these kinds of issues:
An additional complexity is the variety of targets, even if embedded Linux is more present in Europe (30%). In our case, we target 32-bit microcontrollers and 64-bit architectures.
Update platforms are thus usually specialized by target: while Mender targets embedded Linux, Pelion logically targets ARM devices. High-level update frameworks compare as follows (source: Firmware over-the-air programming techniques for IoT networks - A survey):
We plan to use a different strategy from the A/B approach (i.e. position independent firmware images) described in the survey, because that approach would introduce physical memory mapping constraints and would not allow using external non-volatile memory to store the updates.
Rolling your own over-the-air mechanism and secure bootloader is risky, as one needs to handle many complexities (reliability, code safety, key management, etc.). Standards such as The Update Framework (TUF) provide guidance, which can be complemented by in-toto or IETF SUIT metadata about firmware images. A first evaluation of those standards was carried out in 2019 as part of the EU project SPARTA. Another related example is wolfBoot. However, most update mechanisms are designed with only the vendor side in mind, whereas here we also need to handle the business impact of those roll-outs. This can only be done if the scheduling engine can be customized and deployed either as a service or on-premise.
Our own prototyping scenario is close to figure 1 in SPARTA (a classical architecture for consumer IoT), albeit with the following differences due to organizational constraints:
updates are scheduled by the healthcare organisation, based on data provided by the vendor (which acts as the resource server) and on internal requirements. GNAP authorization makes it possible to validate both the update and its schedule
the device regularly fetches the schedule and, if needed, gets an update from the schedule server (e.g. a full or delta update as a signed image such as a bin or elf file). Note: in a degraded mode where the device needs to be fully isolated, it would still be possible to access the device manually and apply the patch over a JTAG connection
Note that, compared to the SPARTA project, we don't cover premature optimization benchmarks (like optimizing the cryptographic libraries), because we are more interested in audited implementations and are focusing on architectural patterns
A vendor may therefore provide its firmware updates as protected resources (protected by an access token delivered by a GNAP authorization server). The scheduler is typically (although not necessarily) deployed within the healthcare organization's network to fetch and verify any new update, and to plan the installation based on internal policies. In less critical environments, the vendor may provide the scheduler as a service too, and the device may be given a Biscuit token to download the update directly.
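As an illustration of the device side of this flow, here is a minimal sketch of how a device might decide whether a schedule entry applies to it and whether a downloaded image matches that entry. The `ScheduleEntry` record, its fields and both helper functions are hypothetical names introduced for this example; they are not taken from SUIT, GNAP or any vendor API. A plain SHA-256 comparison only detects a mismatched or corrupted image: a real deployment would additionally verify a signature over the manifest.

```python
import hashlib
import hmac
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ScheduleEntry:
    """Hypothetical record published by the scheduler for one device model."""
    version: tuple          # target firmware version, e.g. (1, 2, 0)
    sha256: str             # expected hex digest of the firmware image
    not_before: datetime    # earliest installation time chosen by the organisation


def should_install(entry, current_version, now):
    """Install only newer versions, and only once the planned time is reached."""
    return entry.version > current_version and now >= entry.not_before


def image_matches(entry, image):
    """Compare the image digest with the scheduled one (integrity check only)."""
    digest = hashlib.sha256(image).hexdigest()
    return hmac.compare_digest(digest, entry.sha256)
```

Keeping the time constraint in the schedule entry rather than in the firmware metadata reflects the split described above: the vendor describes the update, the organisation decides when it runs.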
We separate the scheduling and authorization part from the actual update. This is necessary for several reasons:
The medical impact should be assessed first, so that the best time can be chosen. Rules can be manual or semi-automated, based on the device registry (possibly with policies such as staged rollouts and must-pass-through releases if needed)
Devices aren't required to reach outside the network to fetch update data, so the internal network segmentation can remain the same. This also removes a strong dependency on the performance of the distribution system provided by the vendor. For limited devices, we can use any secure channel (transport independence is mandated by SUIT)
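The two rollout policies mentioned above can be sketched as follows. This is only an illustration under simple assumptions: versions are comparable tuples, the device registry is reduced to a list of identifiers, and the function names `upgrade_path` and `rollout_waves` are hypothetical.

```python
def upgrade_path(current, target, must_pass_through):
    """Return the versions to install, in order, to go from current to target.

    Any mandatory intermediate release lying strictly between the two
    versions is installed first; releases outside that range are ignored.
    """
    if target <= current:
        return []
    steps = sorted(v for v in must_pass_through if current < v < target)
    return steps + [target]


def rollout_waves(device_ids, cumulative_fractions):
    """Split the fleet into successive waves for a staged rollout.

    cumulative_fractions gives the share of the fleet covered once each
    wave is done, e.g. (0.1, 0.5, 1.0) for 10%, then 50%, then everyone.
    """
    ordered = sorted(device_ids)
    waves, start = [], 0
    for fraction in cumulative_fractions:
        end = round(len(ordered) * fraction)
        waves.append(ordered[start:end])
        start = end
    return waves
```

In practice the scheduler would pause between waves and only continue if the devices already updated report no errors; that decision is where the medical impact assessment comes in.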
To avoid inadvertently breaking the device, and to ease the traceability requirements, the update code should be kept separate from the application code. If we lose power while updating the application loader, there may be an invalid image at the address the chip boots from (0x0 by default for Cortex-M, but aliased to 0x08000000 on STM32). The solution is to add a small, immutable bootloader whose sole job is to sit at the start address and load our application loader.
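The decision taken by such a first-stage bootloader can be modelled in a few lines. The sketch below is a host-side model, not firmware: the image header layout (`FWIM` magic, length, CRC32), the existence of a backup copy of the application loader, and the slot names are all assumptions made for this example. A CRC only detects a torn or corrupted write; a secure bootloader would verify a signature instead.

```python
import struct
import zlib

# Hypothetical header: 4-byte magic, payload length, CRC32 of the payload.
HEADER = struct.Struct("<4sII")
MAGIC = b"FWIM"


def pack_image(payload):
    """Build an image as it would be written to a flash slot."""
    return HEADER.pack(MAGIC, len(payload), zlib.crc32(payload)) + payload


def is_valid(slot):
    """Check magic, length and CRC, so a partially written slot is rejected."""
    if len(slot) < HEADER.size:
        return False
    magic, length, crc = HEADER.unpack_from(slot)
    payload = slot[HEADER.size:HEADER.size + length]
    return magic == MAGIC and len(payload) == length and zlib.crc32(payload) == crc


def select_boot_slot(primary, backup):
    """Pick what the immutable bootloader jumps to."""
    if is_valid(primary):
        return "primary"
    if is_valid(backup):
        return "backup"
    return "recovery"  # nothing bootable: wait for a manual (e.g. JTAG) repair
```

Because the bootloader itself is never rewritten, a power loss during an update can at worst invalidate one slot, which this check detects on the next boot.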
A threat assessment of the update itself should be possible. The firmware update code itself should be updatable and versioned in case it also needs to be fixed (e.g. allocate more code space to the application, rotate a security key)
All updates should be recoverable. Non-volatile data must be versioned
Errors should be tracked, and it should be possible to abort and revert an update
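The last two requirements (recoverable updates with versioned data, error tracking with abort and revert) can be illustrated with a small transactional model. The `Device` record, its fields and the `self_test` callback are hypothetical; on real hardware the snapshot would be the previous image and data kept in non-volatile memory rather than a Python tuple.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    firmware: str                 # version of the running firmware
    data_version: int             # schema version of the non-volatile data
    errors: list = field(default_factory=list)  # persistent error log


def apply_update(device, new_firmware, new_data_version, self_test):
    """Install an update, then commit it only if the self-test passes.

    On any failure the previous firmware and data versions are restored
    and the error is recorded. Returns True when the update is committed.
    """
    snapshot = (device.firmware, device.data_version)
    device.firmware, device.data_version = new_firmware, new_data_version
    try:
        if not self_test(device):
            raise RuntimeError("self-test failed")
    except Exception as exc:
        device.firmware, device.data_version = snapshot
        device.errors.append(f"{new_firmware}: {exc}")
        return False
    return True
```

Versioning the data together with the firmware matters here: reverting the code without reverting a migrated data layout would leave the old firmware reading data it does not understand.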
Therefore we end up with the following device architecture, which is the only strong requirement for vendors: