13 February 2012

New Keccak mid-range core hardware implementation

We released the VHDL code of a new mid-range core hardware implementation of Keccak.

The new implementation is inspired by the work of Bernhard Jungk and Jürgen Apfelbeck presented at ReConFig 2011. It cuts Keccak's state into pieces, typically 2 or 4, and so fits naturally between the fast core (1 piece) and Jungk and Apfelbeck's compact implementation (8 pieces). The resulting circuit is not as fast as the fast core, but it is more compact.

The implementation is parametrized by Nb, which determines the amount of folding. The Keccak-f[1600] permutation is computed in 74 clock cycles with Nb=2 and in 124 clock cycles with Nb=4. Higher values of Nb are possible, although they are not the main target of our new architecture.

We ran a preliminary synthesis of this mid-range core and evaluated the corresponding throughput, using the same STM 130 nm technology as for the other implementations of Keccak. At 500 MHz, the core reaches a throughput of 5.6 Gbit/s in 28 kGE with Nb=2, or 3.6 Gbit/s in 22 kGE with Nb=4. For comparison, at the same frequency the fast core processes 21.3 Gbit/s and requires 48 kGE. (In all cases, the throughput is for a rate of 1024 bits.)
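To make the trade-off easier to compare, the short Python sketch below derives two quantities from the figures quoted above: throughput per unit of area, and a simple upper bound on throughput that counts only the cycles of the permutation (rate × frequency / cycles). The function and variable names are ours, chosen for illustration. The 24-cycle figure for the fast core is an assumption (one round of Keccak-f[1600] per clock cycle), not a number stated in this post. The bound ignores any cycles spent outside the permutation, and the reported mid-range figures sit below it.

```python
# Figures quoted in the post: 500 MHz clock, rate of 1024 bits.
RATE_BITS = 1024
CLOCK_HZ = 500e6

# name -> (permutation cycles, reported throughput in Gbit/s, area in kGE)
DESIGNS = {
    "mid-range, Nb=2": (74, 5.6, 28),
    "mid-range, Nb=4": (124, 3.6, 22),
    # ASSUMPTION: 24 cycles (one round per clock); not stated in the post.
    "fast core": (24, 21.3, 48),
}


def permutation_bound_gbps(cycles, rate_bits=RATE_BITS, clock_hz=CLOCK_HZ):
    """Upper bound on throughput if only permutation cycles are counted."""
    return rate_bits * clock_hz / cycles / 1e9


def efficiency(throughput_gbps, area_kge):
    """Reported throughput per unit area, in Gbit/s per kGE."""
    return throughput_gbps / area_kge


if __name__ == "__main__":
    for name, (cycles, reported, area) in DESIGNS.items():
        bound = permutation_bound_gbps(cycles)
        print(
            f"{name:16s} reported {reported:5.1f} Gbit/s, "
            f"bound {bound:5.2f} Gbit/s, "
            f"{efficiency(reported, area):.3f} Gbit/s per kGE"
        )
```

With these figures, the fast core delivers the most throughput per gate equivalent, while the mid-range core trades throughput for a smaller absolute area.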

We will report more data and a description of the architecture in an upcoming release of the Keccak implementation overview document.