The ECHO hash function offers a very wide range of throughput/area tradeoffs for hardware implementations. Additionally, since the main building block is an AES round, it is particularly suited to co-exist with AES implementations.
This page gives an overview of the performance of the ECHO hash function on various types of hardware.
quick overview
FPGA | 256-bit hash: Throughput (Gbps) | 256-bit hash: #Slices | 512-bit hash: Throughput (Gbps) | 512-bit hash: #Slices |
High speed FPGA (Virtex 6) | 29.457 | 8,071 | — | — |
High speed FPGA (Virtex 5) | 26.390 | 10,407 | 7.810 | 9,097 |
Low area FPGA (Virtex 5) | 0.072 | 127 | — | — |
ASIC | 256-bit hash: Throughput (Gbps) | 256-bit hash: #Gates (KGates) | 512-bit hash: Throughput (Gbps) | 512-bit hash: #Gates (KGates) |
High speed ASIC (0.13 μm) | 14.850 | 521.1 | 7.750 | 516.8 |
Low area ASIC (0.09 μm) | 0.204 | 60.0 | — | — |
hardware implementation strategies
Because the ECHO hash function echoes the AES structure, there are basically two levels of throughput/area tradeoff:
- the AES round level, where it is possible to use between 1 and 16 modules for the S-box step, as well as 1 to 4 modules for the MixColumns step. Each of these modules is in turn subject to a specific tradeoff.
- the ECHO state level, where BIG.SubWords (abbreviated BIG.Sub below) can use 1 to 32 AES round modules and BIG.Mix can use 1 to 64 MixColumns modules (on 32-bit inputs).
Thus, hardware implementations (FPGA or ASIC) offer a very large choice of speed/area tradeoffs. In addition, it is possible to reuse the MixColumns modules of the BIG.Sub step to perform the BIG.Mix step, at the cost of extra cycles of latency. Finally, it is possible to unroll the whole compression function by pipelining the BIG.Sub+BIG.Mix step.
Classes of implementation strategies can thus be represented by a 6-tuple:
(#Sbox, #MixColumns, #AESRound, ReuseMix, CompactSbox, CompactMix)
where the number of modules (the first three entries of the tuple) is per compression function iteration, and where the ReuseMix flag indicates whether the MixColumns modules are shared between the BIG.Sub and BIG.Mix steps.
Among the wide spectrum of implementations, we find the two extremes:
- High Throughput, corresponding to (32×16, 3×16×4, 2, 0, 1, 0). The latency of the compression function iteration is lowered by pipelining all stages of BIG.Sub and BIG.Mix with minimal inter-layer FSM logic. This implementation strategy was explored in [1], [2], and [4], resulting in high throughputs on Virtex 5, Virtex 6 as well as on ASICs (at the cost of a large area).
- Low Area, corresponding to (1, 1, 1, 1, 1, 1). The aim is maximum reuse of the atomic units, each unit being as small as possible. A single S-box and a single MixColumns unit are needed to implement an AES round as well as the BIG.Mix step. This strategy was implemented in [7] on a Virtex 5, resulting in a highly compact design (at the expense of a relatively reduced throughput).
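As an illustration only, the 6-tuple notation can be written down as a small Python record. The class and field names below are hypothetical (they do not come from any of the cited implementations); the two instances are simply the High Throughput and Low Area extremes listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    """Hypothetical record mirroring the 6-tuple notation used on this page."""
    sbox: int            # #Sbox modules per compression function iteration
    mixcolumns: int      # #MixColumns modules per compression function iteration
    aes_round: int       # #AESRound
    reuse_mix: bool      # MixColumns modules shared between BIG.Sub and BIG.Mix
    compact_sbox: bool   # CompactSbox flag
    compact_mix: bool    # CompactMix flag

    def as_tuple(self):
        """Return the strategy in the (n, n, n, 0/1, 0/1, 0/1) form used in the tables."""
        return (self.sbox, self.mixcolumns, self.aes_round,
                int(self.reuse_mix), int(self.compact_sbox), int(self.compact_mix))


# The two extremes, with the module counts spelled out as in the text.
HIGH_THROUGHPUT = Strategy(32 * 16, 3 * 16 * 4, 2, False, True, False)
LOW_AREA = Strategy(1, 1, 1, True, True, True)

print(HIGH_THROUGHPUT.as_tuple())  # (512, 192, 2, 0, 1, 0)
print(LOW_AREA.as_tuple())         # (1, 1, 1, 1, 1, 1)
```

The expanded high-throughput tuple matches the (512, 192, 2, 0, 1, 0) notation used in the detailed tables.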
detailed FPGA performances
Here is a summary of ECHO double-pipe performance on common FPGA platforms. For comparative performance figures with other candidates, take a look at the SHA-3 Zoo hardware page. The FPGA sections cover the high-speed and low-area extremes. Note that ECHO offers one of the highest throughputs and one of the most compact areas on FPGA.
Category | Vendor | Device [ref] | 256-bit hash: Tput. (Gbps) | 256-bit hash: #Slices or #LEs | 256-bit hash: Freq. (MHz) | 256-bit hash: Latency (cycles) | 512-bit hash: Tput. (Gbps) | 512-bit hash: #Slices or #LEs | 512-bit hash: Freq. (MHz) | 512-bit hash: Latency (cycles) | Strategy |
high speed | Xilinx | Virtex 6 [1] (xc6vlx75t-3ff784) | 29.457 | 8,071 (Slice LUTs: 25,892; Slice Registers: 6,411) | 172.6 | 9 | — | — | — | — | Fully unrolled and parallel (512, 192, 2, 0, 1, 0) |
high speed | Xilinx | Virtex 5 [1] (xc5vlx155t-3ff1136) | 26.390 | 10,407 (Slice LUTs: 33,152; Slice Registers: 10,870) | 154.6 | 9 | — | — | — | — | Fully unrolled and parallel (512, 192, 2, 0, 1, 0) |
high speed | Xilinx | Virtex 5 [2] | 14.860 | 9,333 | 87.1 | 9 | 7.810 | 9,097 | 83.9 | 11 | Fully unrolled and parallel (512, 192, 2, 0, ?, 0) |
high speed | Xilinx | Virtex 5 [4] (xc5vlx155t, core) | 23.860 | 15,006 (Slice LUTs: 29,330; Slice Registers: 4,105) | 139.0 | 9 | — | — | — | — | Fully unrolled and parallel (512, 192, 2, 0, ?, 0) |
high speed | Xilinx | Virtex 5 [4] (xc5vlx155t, core) | 3.56 | 12,061 (Slice LUTs: 14,407; Slice Registers: 8,800) | 187.0 | 81 | — | — | — | — | BIG.Sub: 1/4 of ECHO, 16 S-boxes/AES, 2 rounds; BIG.Mix: 64 in a row (128, 92, 2, 0, ?, 0) |
high speed | Xilinx | Virtex 5 [5] (xc5vlx30-3ff324) | 2.312 | 2,827 (Slice LUTs: 9,885; Slice Registers: 4,198) | 149.0 | 99 | — | — | — | — | BIG.Sub: 1/8 of ECHO, 16 S-boxes/AES, 1 round; BIG.Mix: 16 cells/reuse 4 (64, 16, 1, 1, 1, 0) |
high speed | Altera | Cyclone II [3] | 0.397 | 39,091 | 70.6 | 273 | 0.212 | 39,091 | 70.6 | 341 | BIG.Sub: 1/32 of ECHO, 16 S-boxes/AES, 1 round; BIG.Mix: 64 in a row (16, 68, 1, 0, 0, 0) |
low area | Xilinx | Virtex 5 [7] (xc5vlx50-2) | 0.072 | 127 + 1 mem | 352.0 | 6593 | — | — | — | — | BIG.Sub: 1/256 of ECHO, 1 S-box/AES, 1 round; BIG.Mix: 1 MixColumns reused (1, 1, 1, 1, 1, 1) |
The VHDL implementations, whenever publicly available, can be downloaded. We also provide a new implementation with very high throughput on Xilinx Virtex 5 and Virtex 6: the synthesis and mapping reports of the Xilinx ISE software are included in the source package.
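Most of the throughput figures above appear consistent with the usual relation throughput = message block size × clock frequency / latency. The sketch below checks this for two rows of the table. The message block sizes (1536 bits for the 256-bit hash, 1024 bits for the 512-bit hash) are taken from the ECHO specification rather than from this page, and the function and constant names are illustrative assumptions.

```python
# Message block sizes in bits (assumption: values from the ECHO specification,
# not stated on this page).
BLOCK_BITS_256 = 1536
BLOCK_BITS_512 = 1024


def throughput_gbps(block_bits, freq_mhz, latency_cycles):
    """Throughput in Gbps when one message block is absorbed every
    `latency_cycles` clock cycles at `freq_mhz` MHz."""
    mbps = block_bits * freq_mhz / latency_cycles  # bits * MHz / cycles = Mbps
    return mbps / 1000.0


# Virtex 6 row from [1]: 172.6 MHz, 9 cycles, 256-bit hash.
v6 = throughput_gbps(BLOCK_BITS_256, 172.6, 9)
# Virtex 5 row from [2]: 83.9 MHz, 11 cycles, 512-bit hash.
v5_512 = throughput_gbps(BLOCK_BITS_512, 83.9, 11)

print(f"{v6:.3f} Gbps")      # reported in the table as 29.457
print(f"{v5_512:.3f} Gbps")  # reported in the table as 7.810
```

Some rows deviate slightly from this relation, presumably because of rounding of the reported frequencies or extra overhead cycles counted by the respective authors.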
detailed ASIC performances
Here is a summary of ECHO double-pipe performance on ASIC platforms. As for FPGAs, comparative studies with other candidates can be found on the SHA-3 Zoo for high speed as well as low cost designs.
Category | Technology [ref] | 256-bit hash: Tput. (Gbps) | 256-bit hash: #Gates (KGates) | 256-bit hash: Freq. (MHz) | 256-bit hash: Latency (cycles) | 512-bit hash: Tput. (Gbps) | 512-bit hash: #Gates (KGates) | 512-bit hash: Freq. (MHz) | 512-bit hash: Latency (cycles) | Strategy |
high speed | UMC 0.09 μm [6] | 13.966 | 260.0 | 291 | 32 | — | — | — | — | BIG.Sub: 1/4 of ECHO, 16 S-boxes/AES, 2 rounds; BIG.Mix: 16 in a row (128, 32, 2, 0, 1, 0) |
high speed | UMC 0.13 μm [2] | 14.850 | 521.1 | 87.1 | 9 | 7.750 | 516.8 | 83.3 | 11 | Fully unrolled and parallel (512, 192, 2, 0, ?, 0) |
high speed | UMC 0.18 μm [8] | 2.246 | 141.49 | 141.84 | 97 | — | — | — | — | BIG.Sub: 1/8 of ECHO, 16 S-boxes/AES, 1 round; BIG.Mix: 16 in a row (64, 32, 1, 0, ?, 0) |
low area | UMC 0.09 μm [6] | 0.204 | 60.0 | 137.061 | 1034 | — | — | — | — | BIG.Sub: 1/128 of ECHO, 4 S-boxes/AES, 1 round; BIG.Mix: 64 in a row (4, 65, 1, 0, 1, 0) |
low area | UMC 0.13 μm [2] | 0.373 | 82.8 | 66.6 | 274 | — | — | — | — | BIG.Sub: 1/32 of ECHO, 16 S-boxes/AES, 1 round; BIG.Mix: 64 in a row (16, 68, 1, 0, ?, 0) |
The smallest of the reported implementations of ECHO requires 60.0 KGE. We claim, however, that this figure is an overestimate. Indeed, the authors of [6] use a (4, 65, 1, 0, 1, 0) strategy, which is still far from the (1, 1, 1, 1, 1, 1) strategy used for FPGAs in [7]. In addition, the most compact implementations of the AES (using a single S-box and a single MixColumns unit) have an area of 3.1 KGE [9], but the authors use two S-boxes as well as a dedicated key expansion unit. Without those units, the area drops to roughly 1.77 KGE. ECHO additionally requires storing the state and a (fixed) salt, a 64-bit key, a 64-bit addition unit, and some logic and counters for the FSM driving the BIG.Sub/BIG.Mix/BIG.Final units. The addition unit and the FSM logic amount to approximately 1.25 KGE, and storing the state requires about 5000 bits (i.e. 30 KGE, since a bit on a UMC 0.13 μm ASIC costs 6 GE; see for instance [10]). Hence, it should be possible to implement ECHO in about 33 KGE.
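The arithmetic behind the 33 KGE estimate can be spelled out as a short Python sketch. The constants are the figures quoted in the paragraph above; the variable and function names are illustrative only, and the result is a back-of-the-envelope bound rather than a synthesis result.

```python
# Figures quoted in the text (in kilo gate equivalents unless noted).
AES_CORE_KGE = 1.77        # compact AES round without second S-box and key expansion
ADDER_AND_FSM_KGE = 1.25   # 64-bit addition unit plus FSM logic and counters
STORAGE_BITS = 5000        # approximate storage for the state, salt and key
GE_PER_BIT = 6             # cost of one stored bit on UMC 0.13 um, per [10]


def echo_area_estimate_kge():
    """Sum the three contributions named in the text."""
    storage_kge = STORAGE_BITS * GE_PER_BIT / 1000.0  # 30 KGE
    return AES_CORE_KGE + ADDER_AND_FSM_KGE + storage_kge


print(f"{echo_area_estimate_kge():.2f} KGE")  # 33.02 KGE, i.e. about 33 KGE
```

Storage dominates the estimate: roughly 30 of the 33 KGE come from holding the state, so further savings in the datapath would change the total only marginally.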
references
- [1] Optimized fully unrolled and parallel compress for ECHO, by Mabrouk and Benadjila
- [2] Hardware Evaluation of SHA-3 Hash Function Candidate ECHO, by Lu, O'Neill, and Swartzlander
- [3] Implementation and evaluation of SHA-3 candidates on FPGA, by Kinsy and Uhler
- [4] SHA-3: FPGA Implementation of ESSENCE and ECHO Hash Algorithm Candidates Using Bluespec, by Ramakers and Narinx
- [5] Evaluation of Hardware Performance for the SHA-3 Candidates Using SASEBO-GII, by Kobayashi, Ikegami, Matsuo, Sakiyama, and Ohta
- [6] Developing a Hardware Evaluation Method for SHA-3 Candidates, by Henzen, Gendotti, Guillet, Pargaetzi, Zoller and Gurkaynak
- [7] A Compact FPGA Implementation of the SHA-3 Candidate ECHO, by Beuchat, Okamoto, and Yamazaki
- [8] High-Speed Hardware Implementations of BLAKE, Blue Midnight Wish, CubeHash, ECHO, Fugue, Grøstl, Hamsi, JH, Keccak, Luffa, Shabal, SHAvite-3, SIMD and Skein, by Tillich, Feldhofer, Kirschbaum, Plos, Schmidt, and Szekely
- [9] Design and Implementation of Low-area and Low-power AES Encryption Hardware Core, by Hämäläinen, Alho, Hännikäinen, and Hämäläinen
- [10] Compact Implementations of Pairings, by Van Herrewege, Batina, Knežević, Verbauwhede, and Preneel