Finished Projects & Research Results

vecteezy_abstract-technology-big-data-background-concept-artificial_21867500_621.jpg

Research

Finished Projects & Research Results

Publication

KyberSlash: Exploiting secret-dependent division timings in Kyber implementations

◆ Collaboration with Daniel J. Bernstein (University of Illinois at Chicago), Karthikeyan Bhargavan (Inria, Paris), Shivam Bhasin (Nanyang Technological University, Singapore), Anupam Chattopadhyay (Nanyang Technological University, Singapore), Tee Kiah Chia (Nanyang Technological University, Singapore), Franziskus Kiefer (Cryspen, Berlin), Thales Paiva (University of Sao Paulo, Brazil), Prasanna Ravi (Nanyang Technological University, Singapore), and Goutam Tamvada (Cryspen, Berlin)

◆ Abstract: This paper presents KyberSlash1 and KyberSlash2 – two timing vulnerabilities in several implementations (including the official reference code) of the Kyber Post-Quantum Key Encapsulation Mechanism, currently undergoing standardization as ML-KEM. We demonstrate the exploitability of both KyberSlash1 and KyberSlash2 on two popular platforms: the Raspberry Pi 2 (Arm Cortex-A7) and the Arm Cortex-M4 microprocessor. Kyber secret keys are reliably recovered within minutes for KyberSlash2 and a few hours for KyberSlash1.

We responsibly disclosed these vulnerabilities to maintainers of various libraries and they have swiftly been patched. We present two approaches for detecting and avoiding similar vulnerabilities. First, we patch the dynamic analysis tool Valgrind to allow detection of variable-time instructions operating on secret data, and apply it to more than 1000 implementations of cryptographic primitives in SUPERCOP. We report multiple findings. Second, we propose a more rigid approach to guarantee the absence of variable-time instructions in cryptographic software using formal methods.

Read More: https://kyberslash.cr.yp.to/

Nibbling MAYO: Optimized Implementations for AVX2 and Cortex-M4

◆ Collaboration with Ward Beullens (IBM Research), Fabio Campos (RheinMain University of Applied Science), Sofía Celi (Brave Software), Basil Hess (IBM Research)

◆ Abstract: MAYO is a popular high-calorie condiment as well as an auspicious candidate in the ongoing NIST competition for additional post-quantum signature schemes achieving competitive signature and public key sizes.

In this work, we present high-speed implementations of MAYO using the AVX2 and Armv7E-M instruction sets targeting recent x86 platforms and the Arm Cortex-M4.

Moreover, the main contribution of our work is showing that MAYO can be even faster when switching from a bitsliced representation of keys to a nibble-sliced representation.

While the bitsliced representation was primarily motivated by faster arithmetic on microcontrollers, we show that it is not necessary for achieving high performance on Cortex-M4.On Cortex-M4, we instead propose to implement the large matrix multiplications of MAYO using the Method of the Four Russians (M4R), which allows us to achieve better performance than when using the bitsliced approach. This results in up to 21% faster signing.

For AVX2, the change in representation allows us to implement the arithmetic much faster using shuffle instructions. Signing takes up to 3.2x fewer cycles and key generation and verification enjoy similar speedups. This shows that MAYO is competitive with lattice-based signature schemes on x86 CPUs, and a factor of 2-6 slower than lattice-based signature schemes on Cortex-M4 (which can still be considered competitive).

Fast and Clean: Auditable high-performance assembly via constraint solving

◆ Collaboration with Amin Abdulrahman (Max Planck Institute for Security and Privacy), Hanno Becker (AWS), Fabien Klein (Arm)

◆ Abstract: Handwritten assembly is a widely used tool in the development of high-performance cryptography: By providing full control over instruction selection, instruction scheduling, and register allocation, highest performance can be unlocked. On the flip side, developing handwritten assembly is not only time-consuming, but the artifacts produced also tend to be difficult to review and maintain – threatening their suitability for use in practice.

In this work, we present SLOTHY (Super (Lazy) Optimization of Tricky Handwritten assemblY), a framework for the automated superoptimization of assembly with respect to instruction scheduling, register allocation, and loop optimization (software pipelining): With SLOTHY, the developer controls and focuses on algorithm and instruction selection, providing a readable “base” implementation in assembly, while SLOTHY automatically finds optimal and traceable instruction scheduling and register allocation strategies with respect to a model of the target (micro)architecture.

We demonstrate the flexibility of SLOTHY by instantiating it with models of the Cortex-M55, Cortex-M85, Cortex-A55 and Cortex-A72 microarchitectures, implementing the Armv8.1-M+Helium and AArch64+Neon architectures. We use the resulting tools to optimize three workloads: First, for Cortex-M55 and Cortex-M85, a radix-4 complex Fast Fourier Transform (FFT) in fixed-point and floating-point arithmetic, fundamental in Digital Signal Processing. Second, on Cortex-M55, Cortex-M85, Cortex-A55 and Cortex-A72, the instances of the Number Theoretic Transform (NTT) underlying CRYSTALS-Kyber and CRYSTALS-Dilithium, two recently announced winners of the NIST Post-Quantum Cryptography standardization project. Third, for Cortex-A55, the scalar multiplication for the elliptic curve key exchange X25519. The SLOTHY-optimized code matches or beats the performance of prior art in all cases, while maintaining compactness and readability.

pqm4: Benchmarking NIST Additional Post-Quantum Signature Schemes on Microcontrollers

◆ Collaboration with Markus Krausz (Ruhr University Bochum), and Richard Petri (Max Planck Institute for Security and Privacy)

◆ Abstract. In July 2022, the US National Institute for Standards and Technology (NIST) announced the first set of Post-Quantum Cryptography standards: Kyber, Dilithium, Falcon, and SPHINCS+. Shortly after, NIST published a call for proposals for additional post-quantum signature schemes to complement their initial portfolio. In 2023, 50 submissions were received, and 40 were accepted as round-1 candidates for future standardization.

In this paper, we study the suitability and performance of said candidates on the popular Arm Cortex-M4 microcontroller. We integrate the suitable implementations into the benchmarking framework pqm4 and provide benchmarking results on the STM32L4R5ZI featuring 640 KB of RAM. pqm4 currently includes reference implementations for 15 submissions and M4-optimized implementations for five submissions. For the remaining candidates, we describe the reasons that hinder integration - the predominant reason being large key size or excessive memory consumption.

While the performance of reference implementations is rather meaningless and often does not correlate with the performance of well-optimized implementations, this work provides some first indication of which schemes are most promising on microcontrollers. The publicly available implementations in pqm4 also provide a good starting point for future optimization efforts.

Initially, we were hoping for a much higher code quality than for initial submissions to NIST's previous PQC project. However, we got grossly disappointed: Half of the submissions make use of dynamic memory allocations, often completely without reason; Many implementations have compiler warnings, sometimes hinting at more serious issues; Many implementations do not pass simple sanitizer tests such as using valgrind; Multiple implementations make use of static memory.

Algorithm Specification

UOV

Collaboration with Ward Beullens (IBM Research), Ming-Shing Chen (Academia Sinica), Jintai Ding (Tsinghua University), Boru Gong (Beijing Institute of Mathematical Sciences and Applications), Jacques Patarin (CNRS), Bo-Yuan Peng (Academia Sinica), Dieter Schmidt, Cheng-Jhih Shih (National Taiwan University), Chengdong Tao (Beijing Institute of Mathematical Sciences and Applications), Bo-Yin Yang (Academia Sinica)
Abstract: Unbalanced Oil and Vinegar (UOV) is a digital signature scheme using the hash-and-sign paradigm from a trapdoored multivariate quadratic map. First proposed in 1999, UOV has withstood two decades of cryptanalysis, testifying to its enduring security and reliability. UOV excels in time efficiency and signature size. Particularly at the NIST security level 1, UOV outshines other post-quantum digital signature candidates such as Dilithium, Falcon, and SPHINCS+. This results in either 128-byte signatures with 44 kB public keys or 96-byte signatures with 67 kB public keys at NIST security level 1.

Open Source Software

mlkem-c-aarch64

Jointly maintained with Hanno Becker (Amazon Web Services)
Description: High-speed high-assurance implementations of ML-KEM for AArch64.

mlkem-c-embedded

Jointly maintained with Richard Petri (Max Planck Institute for Security and Privacy)
Description: High-speed high-assurance implementations of ML-KEM for 32-bit microcontrollers.

MAYO in Go

Description: Go implementation of the MAYO signature scheme integrated into Cloudflare’s CIRCL library.

FrodoKEM Jasmin

Collaboration with Manuel Barbosa (Max Planck Institute for Security and Privacy), and Peter Schwabe (Max Planck Institute for Security and Privacy)
Description: High-assurance Jasmin implementation of FrodoKEM.

a. publication

b. Algorithm Speacifcation

NIBB

FAST AND CLEAN

pqm4

c. Open Source Software

Research

Finished Projects & Research Results

Publication

KyberSlash: Exploiting secret-dependent division timings in Kyber implementations​ ​

​

Algorithm Specification

UOV​

​

Open Source Software

mlkem-c-aarch64​

​

KyberSlash: Exploiting secret-dependent division timings in Kyber implementations

UOV

mlkem-c-aarch64