Guido Bertoni, Joan Daemen^{1,2}, Michaël Peeters^{1}, Gilles Van Assche^{1} and Ronny Van Keer^{1}

^{1}STMicroelectronics - ^{2}Radboud University

19 September 2017

If SHA-2 is not broken, why would one switch to SHA-3 and not just stay with SHA-2? There are several arguments why Keccak/SHA-3 is a better choice than SHA-2. In this post, we come back on a particular design choice of Keccak and explain **why Keccak is not ARX, unlike SHA-2**.

We specified Keccak at the bit-level using only transpositions, bit-level additions and multiplications (in GF(2)). We arranged these operations to allow efficient software implementations using fixed sequences of bitwise Boolean instructions and (cyclic) shifts. In contrast, many designers specify their primitives directly in pseudocode similarly including bitwise Boolean instructions and (cyclic) shifts, but on top of that also additions. These additions are modulo 2^{n} with n a popular CPU word length such as 8, 32 or 64. Such primitives are dubbed ARX that stands for *“addition, rotation and exclusive-or (XOR)”*. The ARX approach is widespread and adopted by popular designs MD4, MD5, SHA-1, SHA-2, Salsa, ChaCha, Blake(2) and Skein.

So why isn't Keccak following the ARX road? We give some arguments in the following paragraphs.

One of the main selling points of ARX is its efficiency in software: Addition, rotation and XOR usually only take a single CPU cycle. For addition, this is not trivial because the carry bits may need to propagate from the least to the most significant bit of a word. Processor vendors have gone through huge efforts to make additions fast, and ARX primitives take advantage of this in a smart way. When trying to speed up ARX primitives by using dedicated hardware, not so much can be gained, unlike in bit-oriented primitives as Keccak. Furthermore, the designer of an adder must choose between complexity (area, consumption) or gate delay (latency): It is either compact or fast, but not at the same time. A bitwise Boolean XOR (or AND, OR, NOT) does not have this trade-off: It simply take a single XOR per bit and has a gate delay of a single binary XOR (or AND, OR, NOT) circuit. So **the inherent computational cost of additions is a factor 3 to 5 higher than that of bitwise Boolean operations**.

But even software ARX gets into trouble when protection against power or electromagnetic analysis is a threat. Effective protection at primitive level requires masking, namely, where each sensitive variable is represented as the sum of two (or more) shares and where the operations are performed on the shares separately. For bitwise Boolean operations and (cyclic) shifts, this sum must be understood bitwise (XOR), and for addition the sum must be modulo 2^{n}. The trouble is that **ARX primitives require many computationally intensive conversions** between the two types of masking.

The cryptographic strength of ARX comes from the fact that addition is not associative with rotation or XOR. However, **it is very hard to estimate the security of such primitives**. We give some examples to illustrate this. For MD5, it took almost 15 years to be broken while the collision attacks that have finally been found can be mounted almost by hand. For SHA-1, it took 10 years to convert the theoretical attacks of around 2006 into a real collision. More recently, at the FSE 2017 conference in Tokyo, some attacks on Salsa and ChaCha were presented, which in retrospect look trivial but that remained undiscovered for many years.

Nowadays, when a new cryptographic primitive is published, one expects arguments on why it would provide resistance against differential and linear cryptanalysis. Evaluating this resistance implies investigating propagation of difference patterns and linear masks through the round function. In ARX designs, the mere description of such difference propagation is complicated, and the study of linear mask propagation has only barely started, more than 25 years after the publication of MD5.

A probable reason for this is that (crypt)analyzing ARX, despite its merits, is relatively unrewarding in terms of scientific publications: It does not lend itself to a clean mathematical description and usually amounts to hard and ad-hoc programming work. A substantial part of the cryptographic community is therefore reluctant to spend their time trying to cryptanalyze ARX designs. We feel that the cryptanalysis of more structured designs such as Rijndael/AES or Keccak/SHA-3 leads to publications that provide more insight.

But if ARX is really so bad, why are there so many primitives from prominent cryptographers using it? Actually, the most recent hash function in Ronald L. Rivest's MD series, the SHA-3 candidate MD6, made use of only bitwise Boolean instructions and shifts. More recently, a large team including Salsa and ChaCha designer Daniel J. Bernstein published the **non-ARX** permutation Gimli. Gimli in turn refers to NORX for its design approach, a CAESAR candidate proposed by a team including Jean-Philippe Aumasson and whose name stems from a rather explicit **“NO(T A)RX”**. Actually, they are moving in the direction where Keccak and its predecessors (e.g., RadioGatún, Noekeon, BaseKing) always were.

So, maybe better skip ARX?