

## FRAMEWORK PARTNERSHIP AGREEMENT IN EUROPEAN LOW-POWER MICROPROCESSOR TECHNOLOGIES



THIS PROJECT HAS RECEIVED FUNDING FROM THE EUROPEAN UNION'S HORIZON 2020 RESEARCH AND INNOVATION PROGRAMME UNDER GRANT AGREEMENT NO 826647




## 54<sup>TH</sup> EDITION OF THE TOP500 LIST (NOVEMBER 2019)

- Top #1 performance today:
  - 0.2 × 10<sup>18</sup> Flop/s peak
  - This is 1/5 of exascale-level performance
- Users:



| Rank | Site                                                     | System                                                                                                                                                | Cores      | Rmax<br>(TFlop/s) | Rpeak<br>(TFlop/s) | Power<br>(kW) |
|------|----------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------|------------|-------------------|--------------------|---------------|
| 1    | DOE/SC/Oak Ridge<br>National Laboratory<br>United States | <b>Summit</b> - IBM Power System AC922, IBM<br>POWER9 22C 3.07GHz, NVIDIA Volta GV100,<br>Dual-rail Mellanox EDR Infiniband<br>IBM                    | 2,414,592  | 148,600.0         | 200,794.9          | 10,096        |
| 2    | DOE/NNSA/LLNL<br>United States                           | <b>Sierra</b> - IBM Power System AC922, IBM<br>POWER9 22C 3.1GHz, NVIDIA Volta GV100,<br>Dual-rail Mellanox EDR Infiniband<br>IBM / NVIDIA / Mellanox | 1,572,480  | 94,640.0          | 125,712.0          | 7,438         |
| 3    | National Supercomputing<br>Center in Wuxi<br>China       | <b>Sunway TaihuLight</b> - Sunway MPP, Sunway<br>SW26010 260C 1.45GHz, Sunway<br>NRCPC                                                                | 10,649,600 | 93,014.6          | 125,435.9          | 15,371        |
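The "1/5 of exascale" figure and the energy efficiency of each system follow directly from the table. The sketch below is a small illustration using only the numbers listed above; the function and variable names are chosen here for the example.

```python
# Figures copied from the TOP500 table above (November 2019 list).
SYSTEMS = {
    "Summit": {"rmax_tflops": 148_600.0, "rpeak_tflops": 200_794.9, "power_kw": 10_096},
    "Sierra": {"rmax_tflops": 94_640.0, "rpeak_tflops": 125_712.0, "power_kw": 7_438},
    "Sunway TaihuLight": {"rmax_tflops": 93_014.6, "rpeak_tflops": 125_435.9, "power_kw": 15_371},
}

EXAFLOP_IN_TFLOPS = 1_000_000.0  # 10^18 Flop/s expressed in TFlop/s


def exascale_fraction(rpeak_tflops: float) -> float:
    """Peak performance as a fraction of 10^18 Flop/s."""
    return rpeak_tflops / EXAFLOP_IN_TFLOPS


def gflops_per_watt(rmax_tflops: float, power_kw: float) -> float:
    """Sustained energy efficiency in GFlop/s per watt.

    TFlop/s -> GFlop/s and kW -> W are both a factor of 1000,
    so the two conversions cancel.
    """
    return rmax_tflops / power_kw


def hpl_efficiency(rmax_tflops: float, rpeak_tflops: float) -> float:
    """Fraction of peak performance sustained on the benchmark (Rmax / Rpeak)."""
    return rmax_tflops / rpeak_tflops


if __name__ == "__main__":
    for name, s in SYSTEMS.items():
        print(
            f"{name}: {exascale_fraction(s['rpeak_tflops']):.2f} of exascale (peak), "
            f"{gflops_per_watt(s['rmax_tflops'], s['power_kw']):.1f} GFlop/s per W, "
            f"{hpl_efficiency(s['rmax_tflops'], s['rpeak_tflops']):.0%} of peak sustained"
        )
```

For Summit this gives a peak of roughly 0.2 of an exaflop, matching the bullet above.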

Processor design & technology:

| Chip               | Design | Manuf. |
|--------------------|--------|--------|
| IBM POWER9         |        |        |
| NVIDIA Volta GV100 |        | *      |
| Sunway SW26010     | *3     | *3     |



## RACE TO EXASCALE

- CPU architecture choice:
  - Japan approach: Arm/SVE (homogeneous)
  - China approach: Custom many-cores (homogeneous)
  - US approach: x86 + GPU (heterogeneous)





Copyright © European Processor Initiative 2020. Power Management in EPI/PRACE Course Energy Efficiency in HPC, Ostrava/29-01-2020

heterogeneous, accelerated




## WHY EUROPE NEEDS ITS OWN PROCESSORS

- Processors now control almost every aspect of our lives
- Security (back doors etc.)
- Possible future restrictions on exports to EU due to increasing protectionism
- A competitive EU supply chain for HPC technologies will create jobs and growth in Europe
- Sovereignty (data, economic, embargo)



https://www.pearse-trust.ie/blog/the-us-cloud-act-v-the-eus-gdpr-data-privacy-security

https://www.defensenews.com/global/europe/2018/08/01/a-jet-sale-to-egypt-is-being-blocked-by-a-us-regulation-and-france-is-over-it.



## EUROPE'S AMBITION: EUROHPC

- Developing a new European supercomputing ecosystem: HPC systems, network, software, applications, access through the cloud
- Making HPC resources available to public and private users, including SMEs.
- Stimulating a technology supply industry



## EUROPEAN PROCESSOR INITIATIVE

Design a roadmap of future European low-power processors




**<u>Budget</u>: 80 M€** 




## **EPI TECHNOLOGIES**

#### Energy Efficiency

- Adopt Arm general-purpose CPU core with SVE / vector acceleration in the first EPI chip
- Develop power management solutions for the EPI chip
- Develop acceleration technologies based on RISC-V for better DP GFLOPS/Watt performance
- Inclusion of MPPA for real-time application acceleration
- Inclusion of eFPGA for reconfigurable logic

#### Modularity

- Supply sufficient memory bandwidth (Byte/FLOP)
- Focus on programming models that include acceleration
- Develop a Common Platform that enables EPI accelerators and eases incremental roadmap implementation
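The Byte/FLOP target can be made concrete with a short calculation. The sketch below is an illustration only: the bandwidth and peak-performance numbers are hypothetical and are not EPI specifications, and the roofline-style bound is a standard textbook model brought in here for the example, not something stated in the slides.

```python
def bytes_per_flop(mem_bandwidth_gbs: float, peak_gflops: float) -> float:
    """Machine balance: bytes of memory traffic available per floating-point operation."""
    if peak_gflops <= 0:
        raise ValueError("peak_gflops must be positive")
    return mem_bandwidth_gbs / peak_gflops


def attainable_gflops(peak_gflops: float, mem_bandwidth_gbs: float,
                      flops_per_byte: float) -> float:
    """Simple roofline bound.

    A kernel performing `flops_per_byte` operations per byte moved cannot run
    faster than the compute peak or than the memory system can feed it.
    """
    return min(peak_gflops, mem_bandwidth_gbs * flops_per_byte)


if __name__ == "__main__":
    # Hypothetical chip: 1000 GB/s of memory bandwidth, 4000 GFlop/s peak.
    bw, peak = 1000.0, 4000.0
    print(f"balance: {bytes_per_flop(bw, peak):.2f} Byte/FLOP")
    # A streaming kernel with 0.5 Flop per byte is limited by memory bandwidth.
    print(f"memory-bound kernel: {attainable_gflops(peak, bw, 0.5):.0f} GFlop/s")
    # A dense kernel with 8 Flop per byte reaches the compute peak.
    print(f"compute-bound kernel: {attainable_gflops(peak, bw, 8.0):.0f} GFlop/s")
```

A low Byte/FLOP ratio means that bandwidth-bound HPC kernels leave most of the arithmetic units idle, which is why the slide lists memory bandwidth as a design goal.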





## **EPI FABLESS COMPANY**

- EPI's fabless company will:
  - license IPs from the partners
  - develop its own IPs around them
  - license the missing components from the market
  - generate revenue from the HPC, AI, server and eHPC markets
  - integrate, market, support and sell the chip
  - work on the next generations





## CONCLUSION

- HPC is crucial to resolve societal challenges and preserve European competitiveness
- Europe is going in the right direction with EuroHPC. This must be sustained in the long-term
- The chip design effort must continue for the EU's security and competitiveness, and should create a processor ecosystem covering IoT, servers, cloud, autonomous connected vehicles and HPC










#### **GPP AND COMMON ARCHITECTURE**





### **EPAC - RISC-V ACCELERATOR**



- EPAC TITAN = EPI Accelerator
- VPU: Vector Processing Unit (plan of record, PoR)
- STX: Stencil/Tensor accelerator (PoR)



## **RHEA PROCESSOR**

- Rhea is the first EPI General Purpose Processor
- Rhea targets HPC applications
- Rhea is the first instantiation of the EPI Common Platform
- Rhea design is led by SiPEARL (the EPI fabless company) and jointly developed by the EPI partners.



Rhea chip will be integrated into test platforms in order to validate the hardware units, develop the software, and run applications.







Test platforms pictured: PCIe daughter card, HPC blade.



## **RHEA DESIGN**

- Generic processing multi-core backbone:
  - Multi-core Arm Zeus processor with SVE engines for pre-ExaScale level generic processing.
  - Coherent NoC with distributed system level cache to keep the data local.
- Prototypes of highly energy-efficient accelerator tiles:
  - RISC-V based acceleration (EPAC) for better GFLOPS/Watt performance.
  - Multi-Purpose Processing Array (MPPA-Kalray) for real-time application acceleration.
  - eFPGA (Menta) reconfigurable logic for flexibility.
  - Accelerators work in I/O coherent mode and share the same memory view as the multi-core backbone.
- HBM2E, DDR5 memory support.
- PCIe gen5 support for loosely coupled accelerators.
- High speed links for SMP extension and tightly coupled accelerators.
- Power management infrastructure with low-voltage support for energy efficiency.
- Security infrastructure.
- Peripherals to connect an automotive MCU for proof-of-concept (PoC) purposes.
- The first Rhea chip will be fabricated in 6nm technology, aiming at the highest processing capabilities and energy efficiency.



## RHEA ARCHITECTURE

- Memory-coherent NoC connects
  - Array of computing units (CU): Arm cores, EPAC, MPPA, eFPGA
  - Memory and I/O controllers
  - Bridge to links
- High speed links
  - Die-2-Die links to connect on-package dies
  - HSL links to connect on-board packages
- Top level infrastructures
  - Power management & controller
  - Security

NoC: network on chip. HSL: high-speed links (with memory-coherent support).



# **POWER ASPECTS**

ANDREA BARTOLINI (UNIBO) – POWER MANAGEMENT LEADER







## POWER MANAGEMENT SOA & REQUIREMENTS

|                                   | Intel                                       | IBM                                                                                             | ARM                                                                  | AMD                                | Cray                                                               | Fujitsu                                           |
|-----------------------------------|---------------------------------------------|-------------------------------------------------------------------------------------------------|----------------------------------------------------------------------|------------------------------------|--------------------------------------------------------------------|---------------------------------------------------|
| Monitor<br>(Domain, Granularity)  | S, M, A, T<br>1 ms                          | N, S, M, A, T, U<br><b>500 µs</b>, 10 ms aggregation<br>16 ms for T & U, 100 ms aggregation     | S, M, T<br>1-10 kHz with SCP                                         | N, S, M, A, T<br>1 s (C), 1 ms (G) | N, S, M, A, N<br>OOB (100 ms)                                      | N, S, C, M<br><b>1 ms</b> (N), ~ns - model based (C) |
| Control<br>(Domain, Granularity)  | S, M<br>RAPL 1 ms (in-band), DVFS 500 µs    | N, S, M, A<br>10-100 ms                                                                         | S, M<br>1-10 kHz (100 ms to 1 s)                                     | N, S, M, A<br>~secs                | N, S, M, A<br>DVFS, RAPL, min-max range, 10-30 s at job launch     | S, C, M<br>DVFS, Decode Width, HBM2 B/W           |
| Interfaces, Tools, etc.           | RAPL MSRs, msr-safe, libmsr, PAPI, likwid   | OpenBMC, amester, Memory Map                                                                    | ACPI, SCP (sys ctrl proc), IPA (intelligent allocator), PAPI         | Likwid, PAPI, Memory Map           | CapMC, PAPI, Cray BMC interfaces                                   | Power API, PAPI                                   |

Source: PowerStack 19

EPI power management design targets:

- Support for fine-grained power monitoring and control
- A higher-performance power controller capable of supporting advanced power control algorithms.

Socket (S), Core (C), Memory (M), Accelerator (G), Node (N), Utilization (U), Temperature (T)
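As an illustration of what in-band monitoring at a given granularity looks like in practice, the sketch below derives average power from two readings of a cumulative energy counter. It assumes a Linux host that exposes RAPL through the powercap sysfs interface (`energy_uj` and `max_energy_range_uj` files); the zone path used in the example is an assumption, may be absent on a given machine, and often requires elevated privileges to read. It is not the EPI monitoring interface.

```python
import time
from pathlib import Path


def average_power_w(e0_uj: int, e1_uj: int, dt_s: float, max_range_uj: int) -> float:
    """Average power in watts between two cumulative energy readings (microjoules).

    The counter wraps around at `max_range_uj`, so a negative difference
    means exactly one wrap happened between the two samples.
    """
    if dt_s <= 0:
        raise ValueError("dt_s must be positive")
    delta_uj = e1_uj - e0_uj
    if delta_uj < 0:
        delta_uj += max_range_uj
    return delta_uj / 1e6 / dt_s


def read_counter(path: Path) -> int:
    """Read one integer value from a sysfs file."""
    return int(path.read_text().strip())


def sample_power_w(zone: Path, interval_s: float) -> float:
    """Sample one powercap zone twice, `interval_s` apart, and return average power."""
    max_range = read_counter(zone / "max_energy_range_uj")
    t0, e0 = time.monotonic(), read_counter(zone / "energy_uj")
    time.sleep(interval_s)
    t1, e1 = time.monotonic(), read_counter(zone / "energy_uj")
    return average_power_w(e0, e1, t1 - t0, max_range)


if __name__ == "__main__":
    # Assumed zone path for socket 0; adjust or skip if it does not exist.
    zone = Path("/sys/class/powercap/intel-rapl:0")
    if zone.exists():
        print(f"{sample_power_w(zone, 0.1):.1f} W")
    else:
        print("powercap RAPL zone not available on this machine")
```

The sampling interval plays the role of the monitoring granularity in the table: shorter intervals expose faster power transients but cost more in-band overhead.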



## GENERAL ARCHITECTURE

- Top level infrastructures
  - Power management & controller
  - Dedicated power management and control network
  - Security
- EPI Power Management Subsystem
  - RISC-V ISA, derived from the PULP platform
  - Parallel processor with DSP extensions
  - Open-source design





https://github.com/pulp-platform







## THE POWER CONTROLLER FIRMWARE



PM task

BMC

- Read voltage regulator, power, status (VR)
- Power model update
- Read pending command queue
- Decode Command/data
- Perform action:
  - Change target P/C state, power budget
    - Set pending BMC
  - Request telemetry data
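The steps listed above can be read as one iteration of a periodic control loop. The sketch below is a hypothetical host-side model of such a loop, not the actual EPI firmware: the class, the command opcodes, the moving-average power model, and the interpretation of "set pending BMC" as queuing a reply for the BMC are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Command:
    opcode: str          # hypothetical opcodes: SET_PSTATE, SET_POWER_BUDGET, GET_TELEMETRY
    value: float = 0.0


@dataclass
class PowerController:
    pstate: int = 0
    power_budget_w: float = 100.0
    model_power_w: float = 0.0
    alpha: float = 0.5                            # smoothing factor of the toy power model
    commands: deque = field(default_factory=deque)  # pending command queue
    replies: list = field(default_factory=list)     # data pending for the BMC

    def read_vr(self, voltage_v: float, current_a: float) -> float:
        """Stand-in for reading the voltage regulator: power = V * I."""
        return voltage_v * current_a

    def update_model(self, measured_w: float) -> None:
        """Toy power model: exponential moving average of measured power."""
        self.model_power_w = self.alpha * measured_w + (1 - self.alpha) * self.model_power_w

    def step(self, voltage_v: float, current_a: float) -> None:
        """One control period: read VR, update the model, then drain the command queue."""
        self.update_model(self.read_vr(voltage_v, current_a))
        while self.commands:
            cmd = self.commands.popleft()         # read and decode the pending command
            if cmd.opcode == "SET_PSTATE":
                self.pstate = int(cmd.value)
            elif cmd.opcode == "SET_POWER_BUDGET":
                self.power_budget_w = cmd.value
            elif cmd.opcode == "GET_TELEMETRY":
                self.replies.append({"power_w": self.model_power_w, "pstate": self.pstate})
            else:
                raise ValueError(f"unknown opcode: {cmd.opcode}")


if __name__ == "__main__":
    pc = PowerController()
    pc.commands.append(Command("SET_PSTATE", 3))
    pc.commands.append(Command("GET_TELEMETRY"))
    pc.step(voltage_v=1.0, current_a=50.0)
    print(pc.replies)
```

In real firmware the loop would run at a fixed period on the controller cores and the command queue would be filled by the BMC or the operating system; here both are simulated in a single process.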



[ICECS 19] A. Bartolini et al. A PULP-based Parallel Power Controller for Future Exascale Systems



## CONCLUSIONS

- Power management is a key aspect of HPC processors
- It is implemented by means of an embedded computing subsystem with extensions for interfacing with the power management IPs.
- EPI will leverage a best-in-class power management subsystem based on a parallel architecture with DSP extensions.