Guido Bertoni, Joan Daemen^{1,2}, Michaël Peeters^{1}, Gilles Van Assche^{1} and Ronny Van Keer^{1}
^{1}STMicroelectronics - ^{2}Radboud University
Welcome to the web pages of the Keccak Team!
In these pages, you can find information about our different cryptographic schemes and constructions, their specifications, cryptanalysis on them, the ongoing contests and the related scientific papers.
If SHA-2 is not broken, why would one switch to SHA-3 and not just stay with SHA-2? There are several arguments why Keccak/SHA-3 is a better choice than SHA-2. In this post, we come back on a particular design choice of Keccak and explain why Keccak is not ARX, unlike SHA-2.
We specified Keccak at the bit-level using only transpositions, bit-level additions and multiplications (in GF(2)). We arranged these operations to allow efficient software implementations using fixed sequences of bitwise Boolean instructions and (cyclic) shifts. In contrast, many designers specify their primitives directly in pseudocode similarly including bitwise Boolean instructions and (cyclic) shifts, but on top of that also additions. These additions are modulo 2^{n} with n a popular CPU word length such as 8, 32 or 64. Such primitives are dubbed ARX that stands for “addition, rotation and exclusive-or (XOR)”. The ARX approach is widespread and adopted by popular designs MD4, MD5, SHA-1, SHA-2, Salsa, ChaCha, Blake(2) and Skein.
So why isn't Keccak following the ARX road? We give some arguments in the following paragraphs.
One of the main selling points of ARX is its efficiency in software: Addition, rotation and XOR usually only take a single CPU cycle. For addition, this is not trivial because the carry bits may need to propagate from the least to the most significant bit of a word. Processor vendors have gone through huge efforts to make additions fast, and ARX primitives take advantage of this in a smart way. When trying to speed up ARX primitives by using dedicated hardware, not so much can be gained, unlike in bit-oriented primitives as Keccak. Furthermore, the designer of an adder must choose between complexity (area, consumption) or gate delay (latency): It is either compact or fast, but not at the same time. A bitwise Boolean XOR (or AND, OR, NOT) does not have this trade-off: It simply take a single XOR per bit and has a gate delay of a single binary XOR (or AND, OR, NOT) circuit. So the inherent computational cost of additions is a factor 3 to 5 higher than that of bitwise Boolean operations.
But even software ARX gets into trouble when protection against power or electromagnetic analysis is a threat. Effective protection at primitive level requires masking, namely, where each sensitive variable is represented as the sum of two (or more) shares and where the operations are performed on the shares separately. For bitwise Boolean operations and (cyclic) shifts, this sum must be understood bitwise (XOR), and for addition the sum must be modulo 2^{n}. The trouble is that ARX primitives require many computationally intensive conversions between the two types of masking.
The cryptographic strength of ARX comes from the fact that addition is not associative with rotation or XOR. However, it is very hard to estimate the security of such primitives. We give some examples to illustrate this. For MD5, it took almost 15 years to be broken while the collision attacks that have finally been found can be mounted almost by hand. For SHA-1, it took 10 years to convert the theoretical attacks of around 2006 into a real collision. More recently, at the FSE 2017 conference in Tokyo, some attacks on Salsa and ChaCha were presented, which in retrospect look trivial but that remained undiscovered for many years.
Nowadays, when a new cryptographic primitive is published, one expects arguments on why it would provide resistance against differential and linear cryptanalysis. Evaluating this resistance implies investigating propagation of difference patterns and linear masks through the round function. In ARX designs, the mere description of such difference propagation is complicated, and the study of linear mask propagation has only barely started, more than 25 years after the publication of MD5.
A probable reason for this is that (crypt)analyzing ARX, despite its merits, is relatively unrewarding in terms of scientific publications: It does not lend itself to a clean mathematical description and usually amounts to hard and ad-hoc programming work. A substantial part of the cryptographic community is therefore reluctant to spend their time trying to cryptanalyze ARX designs. We feel that the cryptanalysis of more structured designs such as Rijndael/AES or Keccak/SHA-3 leads to publications that provide more insight.
But if ARX is really so bad, why are there so many primitives from prominent cryptographers using it? Actually, the most recent hash function in Ronald L. Rivest's MD series, the SHA-3 candidate MD6, made use of only bitwise Boolean instructions and shifts. More recently, a large team including Salsa and ChaCha designer Daniel J. Bernstein published the non-ARX permutation Gimli. Gimli in turn refers to NORX for its design approach, a CAESAR candidate proposed by a team including Jean-Philippe Aumasson and whose name stems from a rather explicit “NO(T A)RX”. Actually, they are moving in the direction where Keccak and its predecessors (e.g., RadioGatún, Noekeon, BaseKing) always were.
So, maybe better skip ARX?
We congratulate Yao Sun^{1} and Ting Li^{1} for solving the 3-round pre-image challenge on Keccak[r=240, c=160].
The previous pre-image challenge on the 400-bit version was solved on 2 rounds by Paweł Morawiecki in 2011. This present challenge was solved by combining brute-force and algebraic techniques. It took 5 days with eight GPU cards (nvidia 1080Ti).
We are happy to announce that we moved the contents from {keccak, sponge, ketje, keyak}.noekeon.org
to our new domain keccak.team
. Many thanks to Benoit Viguier for designing the engine behind these new pages!
We hope that you will enjoy the refreshed look. Do not hesitate to contact us if you see something incorrect or missing.
In a recent post, Adam Langley complains that “SHA-3 is slow”. Similar comments appear from time to time on the web (see also David Wong's post). But what does it mean precisely? Let us dig into it.
There are a couple of ambiguities in this statement. Let's start with the first one: where would it be “slow”?
Keccak, the winner of the SHA-3 competition, is blazing fast when implemented on dedicated (ASIC) or programmable (FPGA) hardware. Its throughput for a given circuit area is an order of magnitude higher than SHA-2 or any of the SHA-3 finalists. And if you care beyond plain speed, note that it also consumes much less energy per bit. In this sense, Keccak is a green cryptographic primitive.
Keccak has other implementation advantages, like efficient protections against side-channel attacks, but let's go to the point: what seems to be at stake is the speed in software.
How come then, that SHA3-512 is slower than SHA-512 on modern processors? This brings up to the second ambiguity of the statement: what is “SHA-3”?
“SHA-3” initially refers to the target of the competition that NIST organized from 2008 to 2012, namely a new hash standard that would complement SHA-2 in case it is broken. Hence, the initially-intended outcome of the competition is a set of four functions called SHA3-224, SHA3-256, SHA3-384, and SHA3-512.
If “SHA-3” means these four functions, then indeed SHA3-512 is unnecessarily slowed down by an excessive security parameter. Due to an absurd rule in the competition, followed by a fierce controversy in 2013, the parameters of SHA3-512 are stuck at offering 512 bits of pre-image security, a nonsense for anyone who can compute powers of two.
It turned out that Keccak has more to offer than just the drop-in replacements for SHA-{224, 256, 384, 512}. In FIPS 202 (“the SHA-3 standard”), NIST also approves two extendable-output functions (XOFs) called SHAKE128 and SHAKE256. These generalize the traditional hashing paradigm by allowing any arbitrary output length and by being parameterized by their security strength, instead of their output length.
If the term “SHA-3” embraces the aforementioned XOFs, then SHAKE{128, 256} are on par with SHA-2 on common processors. But could “SHA-3” actually be super fast in software?
To answer this last question, let us go further down the standardization road. Recently, NIST released the SP 800-185 standard (“SHA-3 derived functions”), which proposes a framework for customizable functions, called cSHAKE, that generalize SHAKE{128, 256}. And, as an application of this framework, it approves the ParallelHash functions. These functions internally use a parallel mode of operation, which an implementation can exploit to speed-up the processing.
With these standard functions included, “SHA-3” can actually outperform SHA-2 and even SHA-1 (broken but usually considered fast) for long messages on modern processors.
Of course, a cryptographic functions should carefully balance speed and security. Keccak makes use of sound design properties, like fast linear and differential propagation or purposely weak alignment, and it clearly stays away from the ARX (modular additions, rotations and exclusive or's) approach. What we understand of Keccak now made us wonder: aren't 24 rounds too much?
KangarooTwelve is a recently proposed variant of Keccak, in which the number of rounds has been reduced to 12—yet, with exactly the same round function, no tweak! It may seem like a drastic reduction, but it is in line with our proposed solution for authenticated encryption, Keyak, submitted to the CAESAR competition. And, more importantly, this wouldn't have been possible if Keccak hadn't had the chance of getting such a rather extensive scrutiny from cryptanalysis experts throughout the years—and to resist to them.
KangarooTwelve is defined on the “SHA-3” components from FIPS 202 and may please the aficionados of extreme software speed. But beware: with such speeds, the hard drive or the network connection has long become the bottleneck for most applications.
In a joint work with Silvia Mella (STMicroelectronics and University of Milano), we propose a framework for bounding the weight of differential trails. We apply this on Keccak-f with widths of 200, 400, 800 and 1600 bits to show that no trail of weight less than 92 over 6 rounds exist in either of these Keccak-f instances. Should a 6-round differential trail be used as part of a collision attack, the ratio of complying pairs is thus guaranteed to be at most 2^{-92}.
This work improves over our results of FSE 2012, both extending the coverage of differential trails both over the different Keccak-f widths and over a higher weight per round. In particular, Silvia was able to scan all 3-round trails up to weight 45. At 15 per round, and given the exponential growth of trail number per weight, this is a significant improvement over previous works.
Silvia presented her work at FSE 2017. The paper can be found here.