





## Mobiles: on the road to Supercomputers

**GoingDigital Community - Mobile World Congress 19** 

**Prof. Mateo Valero** 

**BSC** Director








## Barcelona Supercomputing Center Centro Nacional de Supercomputación

#### **BSC-CNS** objectives



Supercomputing services to Spanish and EU researchers



R&D in Computer, Life, Earth and Engineering Sciences



PhD programme, technology transfer, public engagement

BSC-CNS is a consortium that includes:

- **Spanish Government**: 60%
- **Catalonian Government**: 30%
- Univ. Politècnica de Catalunya (UPC): 10%



## **Technological Achievements**

- Transistor (Bell Labs, 1947)
  - DEC PDP-1 (1957)
  - IBM 7090 (1960)
- Integrated circuit (1958)
  - IBM System 360 (1965)
  - DEC PDP-8 (1965)
- Microprocessor (1971)
  - Intel 4004





#### **Technology Trends: Microprocessor Capacity**





2X transistors per chip every 1.5 years, called "Moore's Law"

Microprocessors have become smaller, denser, and more powerful, and the trend is not limited to processors: bandwidth, storage, etc. have followed. Gordon Moore (co-founder of Intel) predicted in 1965 that the transistor density of semiconductor chips would keep doubling at a regular pace, a rate commonly quoted as roughly every 18 months.
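As a rough illustration of what doubling every 1.5 years implies, the sketch below projects a transistor count forward in time. The function name `projected_transistors` and the starting point (roughly 2,300 transistors for the Intel 4004 in 1971) are assumptions added for illustration; they are not part of the original slide.

```python
def projected_transistors(start_count: float, years: float,
                          doubling_period: float = 1.5) -> float:
    """Transistor count after `years`, doubling every `doubling_period` years."""
    return start_count * 2 ** (years / doubling_period)


if __name__ == "__main__":
    # Assumed starting point: Intel 4004 (1971), roughly 2,300 transistors.
    for years in (0, 3, 15, 30):
        count = projected_transistors(2300, years)
        print(f"after {years:2d} years: about {count:,.0f} transistors")
```

The projection is purely exponential, so it only shows the shape of the trend and should not be read as a count for any real chip.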

## **ANNOUNCING TESLA V100**

## GIANT LEAP FOR AI & HPC VOLTA WITH NEW TENSOR CORE

21B xtors | TSMC 12nm FFN | 815mm<sup>2</sup>

5,120 CUDA cores

7.5 FP64 TFLOPS | 15 FP32 TFLOPS

**NEW 120 Tensor TFLOPS** 

20MB SM RF | 16MB Cache

16GB HBM2 @ 900 GB/s

300 GB/s NVLink
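One way to read these figures together is the ratio of peak arithmetic to memory bandwidth, sometimes called machine balance. The sketch below computes it from the numbers on the slide. The helper name `flops_per_byte` is hypothetical, and peak figures say nothing about sustained performance.

```python
def flops_per_byte(peak_tflops: float, bandwidth_gb_s: float) -> float:
    """Peak floating-point operations available per byte moved from memory."""
    return (peak_tflops * 1e12) / (bandwidth_gb_s * 1e9)


HBM2_BANDWIDTH_GB_S = 900  # from the slide
PEAKS_TFLOPS = {"FP64": 7.5, "FP32": 15, "Tensor": 120}  # from the slide

if __name__ == "__main__":
    for name, tflops in PEAKS_TFLOPS.items():
        ratio = flops_per_byte(tflops, HBM2_BANDWIDTH_GB_S)
        print(f"{name}: {ratio:.1f} FLOP per byte of HBM2 traffic")
```

A kernel that performs fewer operations per byte than this ratio would be limited by memory bandwidth rather than by the arithmetic units, under these peak assumptions.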




## In the beginning ... there were only supercomputers

- Built to order
  - Very few of them
- Special purpose hardware
  - Very expensive
- Control Data
- Cray-1
  - 1975, 160 MFLOPS
    - 80 units, 5-8 M\$
- Cray X-MP
  - 1982, 800 MFLOPS
- Cray-2
  - 1985, 1.9 GFLOPS
- Cray Y-MP
  - 1988, 2.6 GFLOPS
- Fortran + vectorizing compilers





## "Killer microprocessors"



- Microprocessors killed the Vector supercomputers
  - They were not faster ...
  - ... but they were significantly cheaper and greener
- 10 microprocessors approx. 1 Vector CPU
  - SIMD vs. MIMD programming paradigms

### Then, commodity took over special purpose





#### ASCI Red, Sandia

- 1997, 1 TFLOPS (Linpack)
- 9,298 processors at 200 MHz
- 1.2 terabytes
- Intel Pentium Pro
  - Upgraded to Pentium II Xeon, 1999, 3.1 TFLOPS

#### ASCI White, Lawrence Livermore Lab.

- 2001, 7.3 TFLOPS
- 8,192 proc. RS6000 at 375 MHz
- 6 terabytes
- IBM Power 3
- (3+3) MWatts

Message-Passing Programming Models

## Finally, commodity hardware + commodity software

- MareNostrum
  - Nov 2004, #4 Top500
    - 20 Tflops, Linpack
  - IBM PowerPC 970 FX
    - Blade enclosure
  - Myrinet + 1 GbE network
  - SuSe Linux







## The Killer Mobile processors<sup>TM</sup>



- Microprocessors killed the Vector supercomputers
  - They were not faster ...
  - ... but they were significantly cheaper and greener

- History may be about to repeat itself ...
  - Mobile processors are not faster ...
  - ... but they are significantly cheaper and greener



Network of c.2,000 European R+D experts in advanced computing: **high-performance** and **embedded** architecture and compilation

720 members, 449 affiliated members and 871 affiliated PhD students from 430 institutions in 46 countries.





### ARM-based prototypes at BSC









- **2011** Tibidabo: ARM multicore
- **2012** KAYLA: ARM + GPU, CUDA on ARM
- **2013** Pedraforca: ARM + GPU, InfiniBand, RDMA
- **2014** Mont-Blanc: single chip ARM+GPU, OpenCL on ARM GPU











#### Tibidabo: The first ARM HPC multicore cluster



| Level | Contents | Peak | Power | Efficiency |
| --- | --- | --- | --- | --- |
| Q7 Tegra 2 | 2 x Cortex-A9 @ 1 GHz | 2 GFLOPS | 5 W (?) | 0.4 GFLOPS/W |
| Q7 carrier board | 2 x Cortex-A9, 1 GbE + 100 MbE | 2 GFLOPS | 7 W | 0.3 GFLOPS/W |
| 1U rackable blade | 8 nodes | 16 GFLOPS | 65 W | 0.25 GFLOPS/W |
| 2 racks | 32 blade containers, 256 nodes, 512 cores, 9x 48-port 1 GbE switch | 512 GFLOPS | 3.4 kW | 0.15 GFLOPS/W |
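The efficiency figures quoted for Tibidabo follow directly from dividing peak GFLOPS by power at each level of integration. The sketch below redoes that arithmetic; the names `LEVELS` and `gflops_per_watt` are hypothetical, and the 5 W figure for the Tegra 2 module is marked as uncertain on the slide itself.

```python
# (level, peak GFLOPS, power in watts), as listed for Tibidabo
LEVELS = [
    ("Q7 Tegra 2 module", 2, 5),       # 5 W is flagged "(?)" on the slide
    ("Q7 carrier board", 2, 7),
    ("1U blade (8 nodes)", 16, 65),
    ("2 racks (256 nodes)", 512, 3400),
]


def gflops_per_watt(gflops: float, watts: float) -> float:
    """Peak energy efficiency in GFLOPS per watt."""
    return gflops / watts


if __name__ == "__main__":
    for level, gflops, watts in LEVELS:
        print(f"{level}: {gflops_per_watt(gflops, watts):.2f} GFLOPS/W")
```

Efficiency falls at every step up from the module to the full system, which is consistent with boards, switches and power delivery adding watts without adding peak FLOPS.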



- Proof of concept
  - It is possible to deploy a cluster of smartphone processors
- Enable software stack development



#### Mont-Blanc HPC Stack for ARM



#### Industrial applications



#### **Applications**











#### System software













#### Hardware













## **Press Impacts**

















scientific computing world







gizmología

































































## BullSequana compute blade: X1310 Marvell ThunderX2™ (ARMv8) processor







#### Atos and ARMv8

- Atos is the industrial pivot of Mont-Blanc 3
- ARM is one of the Atos strategic directions for the next years
- Europe is leading in ARM development
- The Mont-Blanc project is proceeding as expected

#### BullSequana X1310 blade

- Up to 288 nodes in one BullSequana X1000 and up to 96 nodes in one BullSequana XH2000, with:
  - 3 compute nodes with 2 Marvell ThunderX2 (ARMv8) processors
  - Up to 1024 GB of memory per node, DDR4 @ 2666 MT/s (w/ 64 GB DIMMs)
  - High-speed Ethernet, InfiniBand EDR, HDR or HDR100 on the mezzanine interconnect
  - Up to 192 cores per blade
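The blade figures above can be cross-checked with a little arithmetic. The sketch below assumes that the 192 cores are split evenly across the blade's sockets and that the quoted node counts correspond to fully populated blades. The helper names are hypothetical.

```python
NODES_PER_BLADE = 3        # from the slide
SOCKETS_PER_NODE = 2       # from the slide
MAX_CORES_PER_BLADE = 192  # from the slide


def cores_per_socket(cores_per_blade: int, nodes_per_blade: int,
                     sockets_per_node: int) -> int:
    """Cores per processor, assuming an even split across all sockets."""
    return cores_per_blade // (nodes_per_blade * sockets_per_node)


def blades_needed(nodes: int, nodes_per_blade: int) -> int:
    """Number of blades required to host `nodes` (ceiling division)."""
    return -(-nodes // nodes_per_blade)


if __name__ == "__main__":
    per_socket = cores_per_socket(MAX_CORES_PER_BLADE, NODES_PER_BLADE,
                                  SOCKETS_PER_NODE)
    print("cores per ThunderX2 socket:", per_socket)
    print("blades for 288 nodes (X1000):", blades_needed(288, NODES_PER_BLADE))
    print("blades for 96 nodes (XH2000):", blades_needed(96, NODES_PER_BLADE))
```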









ARM processor – a credible alternative to X86 processor clusters



## **Fujitsu Processor Development**








#### Sandia Labs:

- HPE, Astra supercomputer
- 2,592 nodes, dual 28-core processors
- 2.3 petaflops peak (1.529 petaflops Linpack)
- #203, Top500 (Nov. 2018)
- #36, HPCG (Nov. 2018)
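The gap between peak and Linpack performance for Astra can be expressed as an efficiency ratio. The sketch below computes it from the figures above, and also a total core count for the full machine. Reading "28 core dual" as two 28-core processors per node is an assumption, and the function names are hypothetical.

```python
def linpack_efficiency(rmax_pflops: float, rpeak_pflops: float) -> float:
    """Fraction of theoretical peak achieved on the Linpack benchmark."""
    return rmax_pflops / rpeak_pflops


def total_cores(nodes: int, sockets_per_node: int, cores_per_socket: int) -> int:
    """Core count of the full machine, assuming identical nodes."""
    return nodes * sockets_per_node * cores_per_socket


if __name__ == "__main__":
    eff = linpack_efficiency(1.529, 2.3)
    print(f"Linpack efficiency: {eff:.1%}")
    # Assumption: each node holds two 28-core processors.
    print("total cores:", total_cores(2592, 2, 28))
```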



#### Others: (smaller systems)

- NERSC: Cray, 1080 cores
- Argonne Labs: HPE, Comanche system

## Why Europe needs its own Processors

- Processors now control almost every aspect of our lives
- Security (back doors etc.)
- Possible future restrictions on exports to EU due to increasing protectionism
- A competitive EU supply chain for HPC technologies will create jobs and growth in Europe



Images courtesy of European Processor Initiative

## **HPC** is a global competition

"The country with the strongest computing capability will host the world's next scientific breakthroughs".

US House Science, Space and Technology Committee Chairman

Lamar Smith (R-TX)





"Our goal is for Europe to become one of the top 3 world leaders in high-performance computing by 2020".

European Commission President **Jean-Claude Juncker** (27 October 2015)

"Europe can develop an exascale machine with ARM technology. Maybe we need an **AIRBUS** consortium for HPC and Big Data".

Seymour Cray Award Ceremony Nov. 2015
Mateo Valero





#### **BSC** and the EC



#### Final plenary panel at ICT - Innovate, Connect, Transform conference, 22 October 2015 Lisbon, Portugal.

the transformational impact of excellent science in research and innovation

"Europe needs to develop an entire domestic exascale stack from the processor all the way to the system and application software", Mateo Valero, Director of Barcelona Supercomputing Center

Director of Barcelona Supercomputing Center, Mateo Valero, makes a pledge for developing a strong HPC ecosystem.

Published on 12/04/2016

Europe has the competence and skills to engage in the global competition towards Exascale Supercomputing. To fully benefit from the opportunities of the digital single market, Europe must strengthen the fundamental research on which digital transformation is based and build a stronger European High Performance Computing (HPC) ecosystem.

In a guest blog post on Commissioner Günther Oettinger's website, Mateo Valero stresses the need for Europe to join the race towards Exascale supercomputing. According to him, there is an open window of opportunity for the High Performance Computing (HPC) development that would stimulate scientific breakthroughs and have tremendous impact on society and







## **EuroHPC & EPI (European Processor Initiative)**

- High Performance General Purpose Processor for HPC
- High-performance RISC-V based accelerator
- Computing platform for autonomous cars
- Will also target the AI, Big Data and other markets in order to be economically sustainable




## The Open-Source Hardware Opportunity

- In 2015 I said I believed a European Supercomputer based on ARM was possible (Mont-Blanc).
- Even though ARM is no longer European, it can form part of the short-term solution
- The fastest-growing movement in computing at the moment is Open-Source and is called RISC-V
- The future is Open and RISC-V is democratising chip-design



# EuroHPC opens a window of opportunity to create the Airbus/Galileo of HPC









## Mare Nostrum RISC-V inauguration 202X



By the author of The Da Vinci Code







## Thank you !!!

For further information please contact mateo.valero@bsc.es