Computer Science Seminar: ‘Distinct Elements in Streams: An Algorithm for the (Text) Book,’ Featuring Vinodchandran Variyam

Join us Thursday, June 6, for a lecture titled “Distinct Elements in Streams: An Algorithm for the (Text) Book” by speaker Vinodchandran Variyam, professor of computer science and engineering at University of Nebraska–Lincoln. The talk will take place from 12:45–1:45 p.m. in Stuart Building, room 111.

Abstract

The basic computational task of counting the number of distinct elements in datasets, known as the Distinct Elements Problem, is critical in numerous applications. Whether it’s identifying unique users on a website, distinct species in an ecological study, or different words in a large corpus of text, the ability to count distinct elements is crucial for data-driven decision making. The problem becomes challenging when the dataset is too large to be stored or processed in its entirety.

This talk will discuss recent algorithmic progress on the Distinct Elements Problem within the data streaming model. The data streaming model is popular for handling large amounts of streaming data. In this framework, data items arrive sequentially from a finite set, with the model’s parameters imposing strict limitations on storage and processing capabilities. While earlier approaches relied on hashing techniques, a purely sampling-based algorithm remained elusive for many years. We have recently discovered a simple sampling-based algorithm for the Distinct Elements Problem. This talk will present the algorithm and its generalization in a broader data streaming scenario.

The talk is based on joint work with Sourav Chakraborty of the Indian Statistical Institute and Kuldeep S. Meel of the University of Toronto.