Our Cheddar paper was accepted at ASPLOS 2026!

This work accelerates CKKS, a popular fully homomorphic encryption (FHE) scheme, on the latest GPUs. As we state in the paper, “Cheddar is simply fast,” delivering performance improvements of 2.18–4.45× for representative FHE workloads compared to state-of-the-art GPU implementations.

The key contributions of the paper are:

The 25-30 prime system, a 32-bit residue-number-system design with an inverted-terminal data layout, enabling systematic and efficient FHE execution on GPUs.
Highly optimized 32-bit GPU kernels that use signed Montgomery reduction and architecture-aware optimizations to improve computational efficiency and memory usage in core FHE operations.
Extensive kernel fusion, along with reordering and splitting of operational sequences, to mitigate the memory bandwidth bottleneck.
Reducing encrypted ResNet-20 inference latency to 0.72 seconds on a single RTX 5090 GPU.
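For readers unfamiliar with signed Montgomery reduction, here is a minimal Python sketch of the textbook signed variant with a 32-bit radix (R = 2^32). It is an illustration only, not code from Cheddar: the function names and the prime 998244353 are our own choices, and Python integers stand in for the 32×32 → 64-bit multiplies a GPU kernel would issue. It assumes an odd modulus q < 2^31 and inputs with |a| < q · 2^31.

```python
R_BITS = 32
R = 1 << R_BITS  # Montgomery radix: one 32-bit machine word


def to_signed32(x: int) -> int:
    """Keep the low 32 bits of x and read them as a two's-complement int32."""
    x &= R - 1
    return x - R if x >= (1 << 31) else x


def signed_montgomery_reduce(a: int, q: int, q_inv: int) -> int:
    """Return r with r ≡ a * 2^-32 (mod q) and |r| < q.

    Assumes q is odd, q < 2^31, |a| < q * 2^31, and q_inv = q^-1 mod 2^32.
    """
    # t = a * q^-1 mod± 2^32; only low 32-bit words matter here.
    t = to_signed32(a * q_inv)
    # a - t*q is divisible by 2^32, so the arithmetic shift divides exactly.
    return (a - t * q) >> R_BITS


def to_montgomery(x: int, q: int) -> int:
    """Map x to Montgomery form x * 2^32 mod q (usually precomputed)."""
    return (x * R) % q


def montgomery_mul(x_m: int, y_m: int, q: int, q_inv: int) -> int:
    """Multiply two Montgomery-form values; the result stays in Montgomery form."""
    # On a GPU, x_m * y_m would be a 32x32 -> 64-bit product.
    return signed_montgomery_reduce(x_m * y_m, q, q_inv)


def from_montgomery(x_m: int, q: int, q_inv: int) -> int:
    """Leave Montgomery form and return the canonical residue in [0, q)."""
    return signed_montgomery_reduce(x_m, q, q_inv) % q


# Usage with an NTT-friendly prime below 2^30, chosen only for illustration.
q = 998244353
q_inv = pow(q, -1, R)
x, y = 123456789, 987654321
prod_m = montgomery_mul(to_montgomery(x, q), to_montgomery(y, q), q, q_inv)
assert from_montgomery(prod_m, q, q_inv) == (x * y) % q
```

The signed form keeps each reduced value in (-q, q), so no conditional final subtraction is needed between consecutive multiplications; whether and how Cheddar's kernels exploit this is described in the paper, not in this sketch.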

Cheddar will be open-sourced at the start of the conference. We'd like to acknowledge the exceptional contributions of our co-first authors, Wonseok Choi and Jongmin Kim. (Notably, Jongmin's impressive track record includes co-first authorship on papers at ASPLOS, HPCA, ISCA, and MICRO.) Stay tuned for more updates!

Cheddar

Title

Cheddar: A Swift Fully Homomorphic Encryption Library Designed for GPU Architectures

Authors

Wonseok Choi, Jongmin Kim, and Jung Ho Ahn

Abstract

Fully homomorphic encryption (FHE) frees cloud computing from privacy concerns by enabling secure computation on encrypted data. However, its substantial computational and memory overhead results in significantly slower performance compared to unencrypted processing. To mitigate this overhead, we present Cheddar, a high-performance FHE library for GPUs, achieving substantial speedups over previous GPU implementations. We systematically enable 32-bit FHE execution, leveraging the 32-bit integer datapath within GPUs. We optimize GPU kernels using efficient low-level primitives and algorithms tailored to specific GPU architectures. Further, we alleviate the memory bandwidth burden by adjusting common FHE operational sequences and extensively applying kernel fusion. Cheddar delivers performance improvements of 2.18–4.45× for representative FHE workloads compared to state-of-the-art GPU implementations.
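The 32-bit execution described in the abstract builds on the residue number system (RNS) used by RNS-CKKS: the large ciphertext modulus Q is a product of word-sized primes, so arithmetic mod Q splits into independent arithmetic mod each prime. A minimal Python sketch of that idea, assuming four well-known NTT-friendly primes below 2^30 picked purely for illustration (Cheddar's actual 25-30 prime system is more involved and is not reproduced here):

```python
from math import prod

# Illustrative NTT-friendly primes, each below 2^30, so every residue fits
# in a 32-bit word. These are NOT Cheddar's actual moduli.
PRIMES = [998244353, 754974721, 469762049, 167772161]
Q = prod(PRIMES)  # the large composite modulus, roughly 115 bits here


def to_rns(x: int) -> list[int]:
    """Split x (mod Q) into one word-sized residue per prime."""
    return [x % p for p in PRIMES]


def rns_mul(a: list[int], b: list[int]) -> list[int]:
    """Multiply limb by limb; each limb is an independent modular multiply."""
    return [(ai * bi) % p for ai, bi, p in zip(a, b, PRIMES)]


def from_rns(residues: list[int]) -> int:
    """Recombine residues with the Chinese Remainder Theorem."""
    x = 0
    for r, p in zip(residues, PRIMES):
        q_hat = Q // p  # product of all the other primes
        x += r * q_hat * pow(q_hat, -1, p)
    return x % Q


a = 123456789012345678901234567890 % Q
b = 314159265358979323846264338327 % Q
assert from_rns(rns_mul(to_rns(a), to_rns(b))) == (a * b) % Q
```

In an FHE library each residue would be a full polynomial of N coefficients rather than a single integer, and because the limbs never interact during multiplication, the per-prime work maps naturally onto independent GPU threads.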