NHR PerfLab Seminar: General-Purpose GPU Hashing Data Structures and their Application in Accelerated Genomics

Speakers: Daniel Jünger and Prof. Bertil Schmidt (University of Mainz)

Title: General-purpose GPU Hashing Data Structures and their Application in Accelerated Genomics

Date and time: Tuesday, November 23, 2 p.m. – 3 p.m.

Slides

Abstract:

A broad variety of applications relies on associative data structures that exclusively support insert, retrieve, and delete operations. Hash maps represent such a class of effective dictionary implementations. Properties such as amortized constant time complexity for these table operations as well as a compact memory layout make them versatile data structures with manifold applications in data analytics and artificial intelligence. The rapidly growing amount of data emerging in many scientific fields can often only be tackled with modern massively parallel accelerators such as GPUs. Numerous GPU hash table implementations have been proposed in recent years. However, most of these implementations either lack the flexibility to be used in existing analytics pipelines or suffer from significant performance degradation in certain application scenarios. As a more recent approach, the WarpCore framework aims to alleviate these restrictions by focusing on both versatility and performance. In this talk we reflect on the key concepts of the WarpCore library and provide a performance evaluation against the state-of-the-art. We further explore how WarpCore can be used to accelerate two bioinformatics applications (metagenomic classification and k-mer counting) with significant speedups. WarpCore is open-source software written in C++/CUDA-C and can be downloaded at https://github.com/sleeepyjack/warpcore.

Short Bios:

Daniel Jünger is a Ph.D. candidate in the Parallel and Distributed Architectures group at JGU Mainz in Germany. Daniel’s main focus is accelerating bioinformatics applications targeting massively parallel accelerators and designing associated sparse in-memory data structures such as Bloom filters and hash maps. Daniel’s research has been published in the Cluster Computing journal, the prestigious IEEE International Parallel & Distributed Processing Symposium (IPDPS), and the IEEE International Conference on High Performance Computing, Data, and Analytics (Best Paper Award winner 2020).

Bertil Schmidt is a full professor at the Institute of Computer Science at JGU Mainz. Prior to that he was a faculty member at Nanyang Technological University (NTU) in Singapore and at the University of New South Wales. His research group has designed and implemented a variety of massively parallel algorithms and tools focusing on the analysis of large-scale datasets on numerous platforms including GPUs, FPGAs, supercomputers, and systolic arrays. For his pioneering research work on GPU computing, he has received a CUDA Academic Partnership award, a CUDA Professor Partnership award, and three Best Paper Awards (IEEE ASAP 2009, IEEE ASAP 2015, and IEEE HiPC 2020). His active collaboration with Shandong University has led to various parallel methods for life science applications that can scale to many thousands of nodes on world-leading supercomputers (such as Sunway TaihuLight and Tianhe-2). His recent work also focuses on the AnyDSL system in collaboration with Saarland University. He serves as Associate Editor of the Journal of Parallel and Distributed Computing (JPDC) and the Journal of Computational Science. He also authored the textbook “Parallel Programming: Concepts and Practice” (published by Morgan Kaufmann), which provides an upper-level introduction to parallel programming.