PostgreSQL Conference Europe 2018 - Speaker Interview - KaiGai Kohei

Speaker Interview: KaiGai Kohei

GPU and NVME accelerates PostgreSQL - a challenge beyond 10GB/s for big-data query execution
Friday 11:50, Casablanca

Twitter: @kkaigai
Blog: kaigai.hatenablog.com
Facebook: kkaigai
Company website: heterodb.com

Could you briefly introduce yourself?

I have been part of the PostgreSQL community since 2006, contributing security features, foreign data wrappers, the custom-scan interface, and so on.

Until 2017, I spent most of my career on open-source development at NEC, where I contributed various features to the Linux kernel, PostgreSQL, and other projects.

After that, I founded a startup company (HeteroDB, Inc.) in Tokyo to deliver a big-data solution for PostgreSQL based on heterogeneous-computing technology.

How do you engage with the PostgreSQL Community?

We now live in a very exciting era. Moore’s law is reaching its limits, so processors are moving to heterogeneous architectures. The storage layer is being reshaped by flash storage and persistent memory.

I like to contribute to PostgreSQL so that it can take advantage of these advances in modern hardware and draw out their maximum capability for SQL workloads.

Have you enjoyed previous pgconf.eu or FOSDEM conferences, either as attendee or as speaker?

I attended PGConf.EU in 2012 (Prague) and 2015 (Vienna), as a speaker both times. In 2012 I presented the initial prototype of PG-Strom, based on FDW; in 2015 I showed its enhanced and more mature implementation.

What will your talk be about, exactly? Why this topic?

You might have the impression that a GPU is an accelerator for heavy computing workloads such as HPC, simulation, 3D gaming, or machine learning. That is right, but it is not the whole story.

We will show that a proper software architecture optimized for heterogeneous hardware can also utilize GPU devices for I/O-intensive workloads. Our benchmark results show 10GB/s query execution throughput on a single PostgreSQL instance using a GPU and NVMe SSDs.

What is the audience for your talk?

People who are interested in, or concerned about, big-data processing, especially of log data, including M2M scenarios.

What existing knowledge should the attendee have?

In addition to basic PostgreSQL knowledge, a brief understanding of x86_64 hardware architecture is preferable. But I will take care to keep the explanations concrete.

What is the one feature in PostgreSQL 11 which you like most?

Hash-based partitioning and Parallel Append.

They make it possible to distribute massive data across multiple storage devices. Without these features, we would be fundamentally bound by the I/O bus bandwidth, even if the GPU runs SQL workloads with thousands of cores in parallel.
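
As a rough illustration of the idea (not PG-Strom or PostgreSQL code), the sketch below assigns rows to partitions by taking a hash of the key modulo the number of partitions, which is the general principle behind hash partitioning. The partition count, the key format, and the use of SHA-256 are assumptions made for the example; PostgreSQL uses its own hash functions. With one partition per storage device, each device ends up holding a roughly equal share of the rows, so scans can read from all devices in parallel.

```python
import hashlib
from collections import Counter

# Hypothetical setup: one partition per storage device.
NUM_PARTITIONS = 4


def partition_for(key: str, modulus: int = NUM_PARTITIONS) -> int:
    """Pick a partition as (hash of key) mod (number of partitions).

    SHA-256 is only a stand-in here for a stable, well-mixed hash.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % modulus


def distribute(keys, modulus: int = NUM_PARTITIONS) -> Counter:
    """Count how many keys land in each partition."""
    counts = Counter()
    for key in keys:
        counts[partition_for(key, modulus)] += 1
    return counts


# Made-up log-style keys, purely for illustration.
counts = distribute(f"sensor-{i}" for i in range(100_000))
for part in sorted(counts):
    print(f"partition {part}: {counts[part]} rows")
```

Because the assignment depends only on the key, the same key always maps to the same partition, and a well-mixed hash keeps the partitions close to equal in size.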

Which other talk at this year’s conference would you like to see?

Partition a 10TB table (by Reinier Haasjes, Marco van Eck)
Pluggable Storage in PostgreSQL (by Andres Freund)
Towards more efficient query plans: PostgreSQL 11 and beyond (by Alexander Kuzmenkov)