Speaker Interview: Burak Yucesoy

The HyperLogLog Algorithm: How it works and why you will love it   Thursday 12:50   Casablanca

Twitter: @byucesoy
Blog: www.citusdata.com/blog/
LinkedIn: byucesoy
Company website: citusdata.com

Could you briefly introduce yourself?

I'm Burak Yucesoy. I work at Citus Data as a software engineer. In the past, I spent time on a wide range of topics, from flying robots to machine learning. Now I'm working on distributed databases and PostgreSQL.

How do you engage with the PostgreSQL Community?

Mostly by developing extensions on top of PostgreSQL and attending conferences. I was at PostgresOpen SV, pgconf.eu, and PGDay Istanbul, as either a speaker or an attendee. Apart from that, I'm part of the team that develops our open source PostgreSQL extension, Citus, which makes PostgreSQL a distributed database. I'm also the maintainer of the postgresql-hll extension.

Have you enjoyed previous pgconf.eu or FOSDEM conferences, either as attendee or as speaker?

I attended the previous pgconf.eu conference in Warsaw as a speaker. I enjoyed it a lot, thanks to the good talk selection and the opportunity to meet new people from the PostgreSQL community.

What will your talk be about, exactly? Why this topic?

I'll talk about the HyperLogLog algorithm for estimating the results of COUNT(DISTINCT) queries. Estimating COUNT(DISTINCT) sounds like a very specific thing with a small application area, but when you think about it, you realize that almost any application needs to run some sort of COUNT(DISTINCT) query.

HyperLogLog can estimate cardinality very quickly and with a very small memory footprint, but more importantly, it does this so elegantly, maybe as elegantly as binary search. The idea behind HyperLogLog is so simple that I was amazed by its accuracy when I first saw the algorithm.

What is the audience for your talk?

I'll talk about the HyperLogLog algorithm, its internals, and its real-world applications in production. Since the talk will be a self-contained, complete overview of HyperLogLog, from internals to real-world applications, I believe both application developers and algorithm enthusiasts can enjoy it.

What existing knowledge should the attendee have?

Not much, but some knowledge about aggregations in PostgreSQL would be nice.

What is the one feature in PostgreSQL 11 which you like most?

Improvements to partitioning and JIT.

Which other talk at this year’s conference would you like to see?

  • Advanced Logical Replication
  • Review of Patch Reviewing
  • Towards more efficient query plans: PostgreSQL 11 and beyond