On the 4th of July, I was in Toulouse, France, to present our work on GPU virtualization solutions for HPC at the French COMPAS conference on parallelism, architectures and systems. The presentation was about OpenCL accelerator API remoting for HPC and GPU hardware-assisted pass-through. The poster was about live and incremental checkpointing of virtual machines.
OpenCL API remoting and Qemu live and incremental checkpointing are part of our ExaNoDe activities.
GPU virtualization solutions for HPC
Kevin Pouget, Alvise Rigo, Daniel Raho (Virtual Open Systems)
- OpenCL API Remoting
- GPU Hardware-Assisted Pass-through
Today, my colleague Radoslav Dimitrov is in Ljubljana, Slovenia, at the eXdci European HPC Summit Week, to present our work on virtualisation at the second ExascaleHPC joint workshop between the ExaNoDe, ExaNeSt, ECOSCALE and EuroEXA projects.
The talk is entitled:
Virtualization technologies in modern HPC systems
It presents two aspects of our virtualization work:
- Software switches
- API Remoting in OpenCL and MPI
The Virtual Open Systems March newsletter was published today, with an article about my work:
Checkpointing for HPC: High performance live checkpointing
At the 2018 HiPEAC ExascaleHPC workshop, organized in the context of the ExaNoDe EC project, Virtual Open Systems presented the progress of its implementation of live and incremental checkpointing for Qemu-KVM.
The live aspect of this work reduces the virtual machine (VM) downtime to a few milliseconds, while the RAM is copied to disk in the background. The incremental aspect makes it possible to checkpoint only the pages actually modified since the previous checkpoint. Periodic virtual machine checkpointing improves the reliability of HPC and cloud-computing environments, as it prevents the loss of volatile data in case of hardware failure. The live aspect makes checkpointing virtually transparent for the user, whose VM keeps running unaltered while the checkpoint is taken. The incremental aspect further reduces the impact of checkpointing on the system, as only part of the RAM is saved, and it also reduces the footprint of the checkpoints on disk.
The challenges behind both aspects of the checkpointing are related to tracking and handling the memory pages modified by the guest system. Virtual Open Systems developed a novel approach to track these changes in Qemu, which guarantees the consistency of every checkpoint, regardless of the activity of the guest system.
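To make the incremental idea concrete, here is a minimal toy sketch in Python. It is not Qemu code and it does not reflect the tracking approach developed by Virtual Open Systems: the names `ToyVM`, `checkpoint` and `restore` are hypothetical, memory is a plain list of pages, and the live aspect (copying in the background while the guest keeps running) is not modelled at all. It only illustrates saving the pages modified since the previous checkpoint.

```python
# Toy model of incremental checkpointing. NOT Qemu code:
# ToyVM, checkpoint and restore are hypothetical names used for illustration.


class ToyVM:
    """Guest memory as a list of pages, plus the set of modified page indices."""

    def __init__(self, num_pages):
        self.pages = [b"\x00"] * num_pages
        # Before the first checkpoint, every page counts as modified.
        self.dirty = set(range(num_pages))

    def write(self, index, data):
        """Guest write: update the page and record it as modified."""
        self.pages[index] = data
        self.dirty.add(index)


def checkpoint(vm):
    """Save only the pages modified since the previous checkpoint."""
    saved = {i: vm.pages[i] for i in sorted(vm.dirty)}
    vm.dirty.clear()
    return saved


def restore(checkpoints, num_pages):
    """Rebuild memory by replaying checkpoints from oldest to newest."""
    pages = [b"\x00"] * num_pages
    for saved in checkpoints:
        for i, data in saved.items():
            pages[i] = data
    return pages


vm = ToyVM(num_pages=8)
full = checkpoint(vm)          # first checkpoint: all 8 pages
vm.write(2, b"A")
vm.write(5, b"B")
incremental = checkpoint(vm)   # second checkpoint: pages 2 and 5 only
restored = restore([full, incremental], num_pages=8)
```

In this toy model the second checkpoint holds two pages instead of eight, which is where the reduced disk footprint comes from. The hard part described above, keeping each checkpoint consistent while the guest keeps writing, is exactly what the sketch leaves out.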
On the 28th of January, I was in Manchester, UK, to present our work in the ExaNoDe and ExaNeSt projects at the ExascaleHPC workshop of the HiPEAC conference.
K. Pouget, A. Mouzakitis, R. Dimitrov, A. Rigo, D. Raho (Virtual Open Systems)
- Live and Incremental Checkpointing
- Unimem RDMA Para-Virtualization
The paper is entitled Paving the way towards a highly energy-efficient and highly integrated compute node for the Exascale revolution: the ExaNode approach:
Power consumption and high compute density are the key factors to be considered when building a compute node for the upcoming Exascale revolution. Current architectural design and manufacturing technologies are not able to provide the requested level of density and power efficiency to realise an operational Exascale machine. A disruptive change in the hardware design and integration process is needed in order to cope with the requirements of this forthcoming computing target. This paper presents the ExaNoDe H2020 research project aiming to design a highly energy efficient and highly integrated heterogeneous compute node targeting Exascale level computing, mixing low-power processors, heterogeneous co-processors and using advanced hardware integration technologies with the novel UNIMEM Global Address Space memory system.
The journal article I co-authored with B. Videau within the Mont-Blanc project is now published in The International Journal of High Performance Computing Applications.
- 5.3 Porting SPECFEM3D application kernels: From CUDA to OpenCL using BOAST
- 4.3 Non-regression testing using trace debugging
The portability of real high-performance computing (HPC) applications on new platforms is an open and very delicate problem. Especially, the performance portability of the underlying computing kernels is problematic as they need to be tuned for each and every platform the application encounters. This article presents BOAST, a metaprogramming framework dedicated to computing kernels. BOAST allows the description of a kernel and its possible optimizations using a domain-specific language. BOAST runtime will then compare the different versions' performance as well as verify their exactness. BOAST is applied to three use cases: a Laplace kernel in OpenCL and two HPC applications BigDFT (electronic density computation) and SPECFEM3D (seismic and wave propagation).
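The "compare performance and verify exactness" step can be illustrated with a small Python sketch. This is not BOAST and does not use its domain-specific language: the three hand-written variants of a 1D Laplace stencil and the `evaluate` helper are hypothetical stand-ins, chosen only to show the idea of timing several versions of a kernel while checking each one against a reference result.

```python
import timeit


def laplace_reference(u):
    """Reference 1D stencil: out[i] = u[i-1] - 2*u[i] + u[i+1], borders set to 0."""
    out = [0.0] * len(u)
    for i in range(1, len(u) - 1):
        out[i] = u[i - 1] - 2.0 * u[i] + u[i + 1]
    return out


def laplace_comprehension(u):
    """Variant 1: same stencil written as a list comprehension."""
    inner = [u[i - 1] - 2.0 * u[i] + u[i + 1] for i in range(1, len(u) - 1)]
    return [0.0] + inner + [0.0]


def laplace_zip(u):
    """Variant 2: same stencil using shifted slices and zip."""
    inner = [a - 2.0 * b + c for a, b, c in zip(u, u[1:], u[2:])]
    return [0.0] + inner + [0.0]


def evaluate(variants, u, repeats=5):
    """Return {name: (is_correct, best_time_in_seconds)} for each variant."""
    expected = laplace_reference(u)
    results = {}
    for name, kernel in variants.items():
        correct = kernel(u) == expected
        seconds = min(timeit.repeat(lambda: kernel(u), number=10, repeat=repeats))
        results[name] = (correct, seconds)
    return results


variants = {
    "reference": laplace_reference,
    "comprehension": laplace_comprehension,
    "zip": laplace_zip,
}
u = [float((i * i) % 7) for i in range(1000)]
results = evaluate(variants, u)

# Keep the fastest variant among those that reproduce the reference output.
best = min((n for n, (ok, _) in results.items() if ok),
           key=lambda n: results[n][1])
```

Which variant wins depends on the machine and the interpreter, which is the point the abstract makes about tuning kernels for each platform.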
The article I co-authored with R. Jakse got accepted at the 28th ISSRE conference. It will take place in Toulouse, France, on October 23-26th, 2017.
Monitoring is the study of a system at runtime, looking for input and output events to discover, check or enforce behavioral properties. Interactive debugging is the study of a system at runtime in order to discover and understand its bugs and fix them, inspecting interactively its internal state.
Interactive Runtime Verification (i-RV) combines monitoring and interactive debugging. We define an efficient and convenient way to check behavioral properties automatically on a program using a debugger. We aim at helping bug discovery while keeping the classical debugging techniques and interactivity, which allow understanding and fixing bugs.
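As an illustration of the monitoring half, here is a minimal Python sketch of a monitor that checks a behavioral property over a stream of events. It is not the i-RV tool: the property (a resource may only be read between `open` and `close`), the event names and the `Monitor` class are assumptions made for this example, and the events come from a hard-coded list rather than from a debugger.

```python
# Hypothetical property: "read" is only allowed between "open" and "close".
# Any (state, event) pair missing from this table is a violation.
TRANSITIONS = {
    ("closed", "open"): "opened",
    ("opened", "read"): "opened",
    ("opened", "close"): "closed",
}


class Monitor:
    """Finite-state monitor fed with one event at a time."""

    def __init__(self):
        self.state = "closed"
        self.violation = None   # (state, event) of the first violation, if any

    def observe(self, event):
        """Consume one event; return False once the property is violated."""
        if self.violation is not None:
            return False
        next_state = TRANSITIONS.get((self.state, event))
        if next_state is None:
            self.violation = (self.state, event)
            return False
        self.state = next_state
        return True


def check(trace):
    """Return the index of the first offending event, or None if the trace is fine."""
    monitor = Monitor()
    for position, event in enumerate(trace):
        if not monitor.observe(event):
            return position
    return None


good = check(["open", "read", "read", "close"])
bad = check(["open", "read", "close", "read"])
```

Here `check` only reports where the property first fails. The interest of combining such a monitor with a debugger, as described above, is that the program can then be inspected interactively at that point.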
On April 1st, 2017, I joined Virtual Open Systems, an SME in Grenoble, France.
Virtual Open Systems is an innovative, agile and dynamic start-up company operating in Linux, Android, SMP virtualization and cloud computing software solutions. The company delivers most efficient software architectures products and services for heterogeneous embedded multi-core platforms that increase value to customers, helps them lower costs and reduce time to market while improving control, security and meeting new business requirements. The company’s core business is on virtualization solutions and virtualization custom extensions for complex heterogeneous multi-core SoC spanning from embedded to HPC, including the exascale. Our team consists of talented engineers with strong technical skills on KVM virtualization and Linux.
In these projects, Virtual Open Systems is responsible for the Qemu virtualization layer, and in particular the creation of VM snapshots and checkpoints, as well as the virtualization of compute accelerators.
After three years of PhD thesis and two years on the DEMA project, the work on model-centric debugging and mcGDB is over! There is currently no plan of continuation, so the project will stay in its current state.
These presentations summarize the five years of work:
And these documents describe it in more detail: