Collaboration among European research initiatives is crucial for optimising novel technologies and software. Through their participation in the European eFlows4HPC and EPI projects, researchers from the Barcelona Supercomputing Center (BSC) and the Polytechnic University of Valencia (UPV) developed optimised kernels for machine learning and deep learning (DL) operators that can replace compute-intensive simulations in application workflows. EPI provided a set of hardware and software tools for testing on EPAC-VEC, and the tests demonstrate that this technology can significantly accelerate convolution.
After three years of research, the European project eFlows4HPC came to an end in February 2024. One of its goals was to investigate the benefits of novel computer architectures, such as those from EPI. eFlows4HPC experts optimised kernels for specific heterogeneous architectures. In particular, with a strong emphasis on portability, they optimised kernels for ARM-based architectures equipped with “narrow” SIMD (single instruction, multiple data) arithmetic units, and for RISC-V architectures equipped with long vector processing units such as the one proposed in EPI.
In addition, BSC and UPV experts migrated some of the computational kernels in the eFlows4HPC workflows to EPI hardware. eFlows4HPC experts tested singular value decomposition (SVD) on EPI hardware and obtained a clear reduction in execution time when using the vector processing unit (VPU, vector version of the routines) compared with execution on the RISC-V core alone (scalar version).
One of the eFlows4HPC Key Exploitable Results (KER) is Convolution operators on multicore ARM and RISC-V architectures (ConvLIB), fully developed by UPV. ConvLIB is a library of high-performance implementations of convolution algorithms for multicore platforms with ARM and RISC-V architectures. It contains a driver routine that identifies the best values for four hyper-parameters (micro-kernel, cache configuration parameters, parallelisation loop, and algorithm), automatically adapting the call to the dimensions of the convolution operator. For EPAC-VEC, UPV researchers, in collaboration with BSC experts, migrated ConvLIB to exploit the VPU accelerator in this architecture. “These experiments show that the EPAC-VEC can significantly accelerate the performance of the convolution, but it does require a very careful implementation of the codes”, states Enrique S. Quintana, UPV professor and eFlows4HPC work package leader. The library is available in an open software repository: https://github.com/hpca-uji/ConvLIB
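To make the idea of a driver routine concrete, the following is a minimal Python sketch of how such a selection could work in principle. It is not ConvLIB's code or API: the class names, the candidate lists, the thread count, and the cost model (`toy_cost`) are all hypothetical and chosen only for illustration. The sketch enumerates combinations of the four hyper-parameters and keeps the one with the lowest estimated cost for the given convolution dimensions.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class ConvShape:
    """Dimensions of one convolution operator (hypothetical layout)."""
    batch: int
    in_ch: int
    out_ch: int
    out_h: int
    out_w: int
    k_h: int
    k_w: int


@dataclass(frozen=True)
class ConvConfig:
    """The four hyper-parameters named in the article."""
    micro_kernel: tuple   # (mr, nr) register tile
    cache_blocks: tuple   # (mc, nc, kc) cache block sizes
    parallel_loop: str    # loop that is split across threads
    algorithm: str        # convolution algorithm variant


# Candidate values: illustrative assumptions, not ConvLIB's real options.
MICRO_KERNELS = [(4, 4), (8, 4), (8, 8)]
CACHE_BLOCKS = [(64, 256, 128), (128, 512, 256)]
PARALLEL_LOOPS = ["batch", "out_channels"]
ALGORITHMS = ["im2col_gemm", "direct"]


def ceil_div(a, b):
    return -(-a // b)


def round_up(x, tile):
    return ceil_div(x, tile) * tile


def toy_cost(shape, cfg, threads=4):
    """Invented cost model: padded work plus overheads, divided by parallelism."""
    # View the convolution as an m x n x k matrix product.
    m = shape.out_ch
    n = shape.batch * shape.out_h * shape.out_w
    k = shape.in_ch * shape.k_h * shape.k_w
    mr, nr = cfg.micro_kernel
    mc, nc, kc = cfg.cache_blocks
    # Edge tiles are padded to the micro-kernel size, which wastes work.
    work = round_up(m, mr) * round_up(n, nr) * k
    # Assume a fixed packing overhead per cache block.
    packing = 1000 * ceil_div(m, mc) * ceil_div(n, nc) * ceil_div(k, kc)
    if cfg.algorithm == "im2col_gemm":
        extra = n * k          # cost of materialising the im2col matrix
    else:
        extra = 0
        work = work * 1.25     # assumed penalty for the direct variant
    # Only as many threads as the chosen loop has iterations can be used.
    extent = shape.batch if cfg.parallel_loop == "batch" else shape.out_ch
    return (work + packing + extra) / min(threads, extent)


def all_configs():
    return [ConvConfig(*c) for c in
            product(MICRO_KERNELS, CACHE_BLOCKS, PARALLEL_LOOPS, ALGORITHMS)]


def select_config(shape, threads=4):
    """Return the candidate configuration with the lowest estimated cost."""
    return min(all_configs(), key=lambda cfg: toy_cost(shape, cfg, threads))


if __name__ == "__main__":
    layer = ConvShape(batch=1, in_ch=3, out_ch=64, out_h=56, out_w=56, k_h=3, k_w=3)
    print(select_config(layer))
```

A real driver would rely on measured or analytically modelled performance for the target processor rather than on an invented cost function; the sketch only shows how the choice can adapt automatically to the operator's dimensions.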
The results of this research have also been published in the following peer-reviewed publication:
C. Ramírez, A. Castelló, and E. S. Quintana-Ortí, “A BLIS-like matrix multiplication for machine learning in the RISC-V ISA-based GAP8 processor,” The Journal of Supercomputing, 2022. https://doi.org/10.1007/s11227-022-04581-6
About eFlows4HPC
eFlows4HPC is a European-funded project with a budget of €7.6M that started on 1 January 2021 and lasted 3 years and 2 months. Coordinated by BSC (Spain), the project brings together a multidisciplinary consortium: CIMNE (Spain), FZJ (Germany), UPV (Spain), ATOS (France), DtoK Lab (Italy), CMCC (Italy), INRIA (France), SISSA (Italy), PSNC (Poland), UMA (Spain), AWI (Germany), INGV (Italy), ETHZ (Switzerland), Siemens (Germany), and NGI (Norway).
The eFlows4HPC project has received funding from the European High-Performance Computing Joint Undertaking (JU) under grant agreement No 955558. The JU receives support from the European Union’s Horizon 2020 research and innovation programme and Spain, Germany, France, Italy, Poland, Switzerland, and Norway. The project also received funding from MCIN/AEI/10.13039/501100011033 and the European Union NextGenerationEU/PRTR (PCI2021-121957).