Firmware security risks
One of the most neglected areas of cyber security is the firmware that underpins every device, from small sensors right through to the servers running the cloud, which together form the infrastructure of the modern digital world. While much emphasis is placed on the security of the operating system, network, applications and so on, none of this helps if the firmware itself is compromised.
What used to be considered a complex, esoteric and rare suite of threats is now one of the major attack vectors: Stuxnet and Triton are the visible, canonical and highly publicised cases, but these pale into insignificance compared to the estimated 500-600 firmware exploits currently announced per year. To make matters worse, we rarely, if ever, check for firmware compromises, despite technologies such as the TPM (Trusted Platform Module), remote attestation, and measured boot being readily available. For example, every x86 UEFI (Unified Extensible Firmware Interface) based system will perform a measured boot. Microsoft now mandates a TPM 2.0 for Windows 11, which is a step in the right direction, albeit controversial to some.
What makes firmware attacks so devastating is that they are easily hidden. For example, the UEFI firmware, commonplace in nearly all x86 systems, is designed to get out of the way once the system has been set up to load the operating system. Some information may be deliberately left behind, such as the UEFI event log recording the boot process and the ACPI (Advanced Configuration and Power Interface) tables that the operating system uses to set up devices. This makes any exploit embedded in firmware invisible to the operating system's security checks. Consider also that firmware exploits have targeted disk drive and battery firmware.
Secure vs. Measured Boot
Secure boot is a mechanism for ensuring the validity of firmware, boot loaders, and other components through digital signatures; UEFI Secure Boot is a common example of this. The tables of signing keys, both valid and revoked, must be regularly updated, and of course components need to be signed by trusted authorities. What signing cannot do is prove that the component is not malware – sadly, it still comes as a surprise to some that malware can be signed (yes, really think about that statement for a moment) and we do have examples of malware with valid cryptographic signatures.
Measured boot is complementary and is based on the idea that we collect measurements – cryptographic hashes – of each component and build from a known, secure, untamperable root of trust to form a chain of trust. This trust comes from the additional knowledge that each component is known a priori by its cryptographic hash, and that any deviation results in a completely different hash, which cascades up the chain to give “wrong” results. Mathematically this is a hash chain, a linear form of Merkle tree.
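The cascading effect can be illustrated with a minimal sketch. It assumes SHA-256 and a register that starts as all zeros and is updated as hash(old value || measurement); the function names and the component byte strings are hypothetical and stand in for real firmware images.

```python
import hashlib


def measure(component: bytes) -> bytes:
    """A measurement is simply the cryptographic hash of the component."""
    return hashlib.sha256(component).digest()


def extend(register: bytes, measurement: bytes) -> bytes:
    """Fold a measurement into the register: SHA-256(old value || measurement)."""
    return hashlib.sha256(register + measurement).digest()


def chain(components: list[bytes]) -> bytes:
    """Measure each component in boot order, starting from an all-zero register."""
    register = bytes(32)
    for component in components:
        register = extend(register, measure(component))
    return register


# Hypothetical boot components; real ones would be firmware and loader images.
good = chain([b"firmware v1", b"boot loader v1", b"kernel v1"])
tampered = chain([b"firmware v1", b"boot loader EVIL", b"kernel v1"])

print(good.hex())
print(tampered.hex())
print("match" if good == tampered else "mismatch")
```

Because each step hashes the previous register value, a change to one component, or to the order of components, alters every later value, so the final value can only be reproduced by measuring exactly the same components in the same order.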
A trusted reporting mechanism is required, and this is where the Trusted Platform Module (TPM) comes into use. As well as being a place to store and generate keys, the TPM is its own certificate and identity authority – each one is unique and identifiable. It has mechanisms for storing the cryptographic hashes for the chains of trust in its Platform Configuration Registers (PCRs) and a further mechanism for reporting these with cryptographic proof that the report – known as a quote – came from that TPM. This mechanism of attestation has many points of cross-reference, making tampering with the process exceptionally difficult. In addition, the TPM’s keys, NVRAM and other objects can be protected by policies based upon the contents of the PCRs.
Firmware such as UEFI generates logs of the boot process, and these can be cross-referenced against the values stored in the TPM. Known Good Values for the PCRs, especially the all-important core root of trust measurement, need to be generated from reference environments. Fortunately, manufacturers are starting to do this – an excellent example is the Linux Vendor Firmware Service (LVFS) where many manufacturers are including the critical PCR 0 values for reference.
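The cross-reference between an event log and the TPM can be sketched as a replay: each logged digest is folded into a simulated register, and the result is compared with the value the TPM reports. This is a simplified illustration only. It assumes a SHA-256 bank and a log already parsed into (PCR index, digest) pairs; real UEFI event logs carry more structure, and the names below are hypothetical.

```python
import hashlib

BANK_SIZE = 32  # bytes in a SHA-256 register


def replay(event_log):
    """Recompute register values from a list of (pcr_index, digest) events."""
    pcrs = {}
    for pcr_index, digest in event_log:
        current = pcrs.get(pcr_index, bytes(BANK_SIZE))
        pcrs[pcr_index] = hashlib.sha256(current + digest).digest()
    return pcrs


def log_matches_tpm(event_log, reported_pcrs):
    """True if every reported register equals the value replayed from the log."""
    replayed = replay(event_log)
    return all(
        replayed.get(index, bytes(BANK_SIZE)) == value
        for index, value in reported_pcrs.items()
    )


# Hypothetical log: two events measured into PCR 0 and one into PCR 4.
log = [
    (0, hashlib.sha256(b"firmware code").digest()),
    (0, hashlib.sha256(b"firmware code, part 2").digest()),
    (4, hashlib.sha256(b"boot loader").digest()),
]
reported = replay(log)  # stands in for values read from the TPM

# A log with one altered entry no longer explains the reported values.
forged_log = [log[0], (0, hashlib.sha256(b"something else").digest()), log[2]]

print(log_matches_tpm(log, reported))
print(log_matches_tpm(forged_log, reported))
```

If the replay matches, the individual log entries can then be compared against known good values to find out which component changed.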
Remote Attestation
Finally, we come to remote attestation, where the identities and the known-good or expected values for the systems under attestation are stored and checked. The term “remote” comes from the fact that these checks are made off-device, allowing greater scalability of attestation, for example for data centre environments or dispersed IoT deployments. Several remote attestation environments exist, varying in capabilities and scope: Keylime, Nokia Attestation Engine, Charra, OpenAttestation and so on.
It is worth mentioning the main forms of attestation with a TPM:
- The TPM’s Endorsement Key can be validated against the manufacturer’s certificates for that device, which proves the validity of the device against the manufacturer.
- Any quote of measurements is signed by a key derived from the endorsement key and contains additional metadata and a nonce, so that it can be cross-referenced not just against the requestor but also against the TPM key hierarchy.
- The keys on the TPM can be further attested using an “identity attestation” process known as make and activate credential, which only works if the keys can be loaded onto that TPM and the cryptographic operations are performed on that specific TPM.
All the above are cross-referenceable and, if logs such as the UEFI event log are produced, can be further cross-referenced against those.
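The verifier's side of a quote check can be sketched as follows. This is an illustration under stated assumptions, not a TPM library: the `Quote` class and function names are hypothetical, signature verification is represented by a boolean supplied by the caller, and the quoted digest is assumed to be the SHA-256 hash of the selected PCR values concatenated in ascending index order.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass


@dataclass
class Quote:
    """Hypothetical, already-parsed quote: only the fields used in this sketch."""
    nonce: bytes
    pcr_digest: bytes


def digest_of(pcrs: dict) -> bytes:
    """Assumed digest rule: hash of the PCR values joined in index order."""
    return hashlib.sha256(b"".join(pcrs[i] for i in sorted(pcrs))).digest()


def check_quote(quote, sent_nonce, expected_pcrs, signature_valid):
    """Return the list of failed checks; an empty list means the quote passed."""
    failures = []
    if not signature_valid:  # real signature verification is out of scope here
        failures.append("signature")
    if not hmac.compare_digest(quote.nonce, sent_nonce):
        failures.append("nonce")  # stale or replayed quote
    if not hmac.compare_digest(quote.pcr_digest, digest_of(expected_pcrs)):
        failures.append("pcr_digest")  # measurements differ from expected
    return failures


expected = {0: bytes([1]) * 32, 4: bytes([2]) * 32}  # hypothetical known good values
nonce = os.urandom(16)  # fresh for every request

fresh = Quote(nonce=nonce, pcr_digest=digest_of(expected))
replayed = Quote(nonce=bytes(16), pcr_digest=digest_of(expected))

print(check_quote(fresh, nonce, expected, signature_valid=True))
print(check_quote(replayed, nonce, expected, signature_valid=True))
```

Returning the list of failed checks rather than a single boolean keeps the reason for a loss of trust available to whatever handles the response.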
The power of remote attestation comes to the fore when it is integrated with the management capabilities of a system, and in particular with how the response to a loss of trust is handled. The first response should always be to isolate the system in question – this might be anything from shutting it down to sandboxing the system. The next step must be to preserve the system’s state and investigate why the loss of trust occurred – this would be part of any digital forensics process and would focus on the root cause of such failures.
Loss of trust, however, need not be binary in nature – we can lose degrees of trust in a single device, in part of a device, or even in an entire system. The de facto standard for TPM PCR values gives us a clue as to where in the boot and run-time a loss of trust occurred and how extensive it is. For example, was it the firmware that was updated, the firmware configuration, the boot loader, the run-time file system (the responsibility of Linux IMA and auditd) and so on? Investigating boot logs, especially the UEFI event logs, can pinpoint the actual change. Depending upon this we might still “trust” the system, at least to a degree.
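As a sketch of how PCR values localise a loss of trust, the following compares expected and actual values and names the affected stage. The allocation table is an assumption based on the commonly used PC client convention for the low-numbered PCRs and the Linux IMA default of PCR 10; a real deployment should take it from the platform's own profile. The register values are shortened placeholder strings.

```python
# Assumed PCR allocation (PC client convention plus the Linux IMA default);
# verify against the platform's own documentation before relying on it.
PCR_MEANING = {
    0: "core firmware code",
    1: "firmware configuration",
    2: "option ROM code",
    3: "option ROM configuration",
    4: "boot loader code",
    5: "boot loader configuration and partition table",
    7: "Secure Boot policy",
    10: "run-time file measurements (Linux IMA)",
}


def changed_components(expected, actual):
    """List (pcr_index, description) for every register that deviates."""
    changes = []
    for index in sorted(expected):
        if actual.get(index) != expected[index]:
            description = PCR_MEANING.get(index, "unassigned in this sketch")
            changes.append((index, description))
    return changes


# Placeholder values; real ones would be full register digests.
expected = {0: "aa", 1: "bb", 4: "cc"}
actual = {0: "aa", 1: "ff", 4: "cc"}

print(changed_components(expected, actual))
# prints [(1, 'firmware configuration')]
```

A deviation confined to the firmware configuration register might, for instance, warrant a different response from one in the core firmware code.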
Practical applications
As we scale out to cloud environments of all kinds, the results of remote attestation on the trust of devices can be used to drive the orchestration. For example, it might be a requirement that certain workloads only be placed on suitably trusted machines, or that network configurations are only constructed from devices that adhere to given identities and integrity measurements as attested (e.g., for 5G slicing use cases).
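A placement rule of this kind can be sketched as a simple filter over attestation results. The `Node` record, its fields and the node names are hypothetical; a real orchestrator would obtain these results from its attestation service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical summary of the latest attestation result for a machine."""
    name: str
    identity_verified: bool
    failed_pcrs: frozenset = frozenset()


def eligible(nodes, tolerated=frozenset()):
    """Names of nodes whose identity is verified and whose PCR deviations,
    if any, all fall within the set the workload is willing to tolerate."""
    return [
        node.name
        for node in nodes
        if node.identity_verified and node.failed_pcrs <= tolerated
    ]


nodes = [
    Node("edge-a", identity_verified=True),
    Node("edge-b", identity_verified=True, failed_pcrs=frozenset({1})),
    Node("edge-c", identity_verified=False),
]

# A sensitive workload tolerates no deviation at all.
print(eligible(nodes))
# A less sensitive one accepts a deviation in PCR 1 only.
print(eligible(nodes, tolerated=frozenset({1})))
```

The `tolerated` set is one way to express degrees of trust: a deviation that is acceptable for one workload can still exclude a node from another.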
Beyond this, the notion of attestation and trust is currently largely confined to devices with a TPM, which in practice means x86 and UEFI devices. However, it is increasingly being extended to ARM and ARM-based SoC devices, and can be extended to other kinds of elements (the author has a particular wish for RISC-V devices with measured boot and TPM). One interesting target is the remote attestation of containers, coupled with container signing processes (cosign, Sigstore) and the software bill of materials (SBOM). Another target, which already has interfaces for attestation, is the processor enclave (Intel SGX, ARM TrustZone); when included in the chain of trust of the underlying hardware, this provides a much stronger environment for confidential computing – a precursor for anything with the word “trustworthy” in its name: Trustworthy AI/ML and so on.
Outlook
We must be honest and state that none of this comes without overhead: to take advantage of a root of trust and its attestation capabilities, it is necessary to bring them into active use and construct some infrastructure. Primarily this means knowing device identities and known good configurations, and recording this information as part of supply-chain or equipment databases. This then forms the basis of any attestation process. We can proceed further and integrate the results of any attestation with the monitoring and auditing processes, and even with cloud workload management and other systems.
Given that the hardware already exists as standard – not necessarily a TPM per se, but even just a permanent device identity – it seems almost foolish not to take these features into use. Given the interest in concepts such as Zero Trust, trusted computing and remote attestation offer the cornerstone for establishing good Zero Trust practices. It must be noted that no security technique exists in isolation or without overhead, and successful use of these techniques comes from integrating them into the wider system.
With TPM and related root of trust technologies becoming more integrated into IoT systems, we are now finding applications in areas such as railway and medical technologies. The full scope of trust technologies is still being mapped, but this forms the backbone of device security from the supply chain through to run-time, and of the verticals that will eventually run atop this trusted infrastructure.