Join us on Thursday, January 30, 2025, from 12:45 to 1:45 p.m. in the Stuart Building, room 113, for a Computer Science Seminar titled “New Models for Effective Data Sharing and Communication” featuring speaker Valerie Hayot-Sasson.
Abstract
With the increasing heterogeneity of computing infrastructure and the evolution of scientific applications from long-running simulations to artificial intelligence and machine learning workflows, new task-based computing frameworks (e.g., Ray, Dask, Globus Compute) have been created to take advantage of federated computing infrastructure. However, while these frameworks facilitate distributed execution, much of the responsibility for managing data remains with the user. For example, efficiently managing and transferring data in these federated environments is challenging, as data producers are unaware of their potential consumers and communication may be temporally and referentially decoupled. We developed ProxyStore, a Python library for just-in-time resolution of Python objects that enables fast federated communication through its Proxy abstraction. The ProxyStore Proxy object is a lightweight reference to the serialized data object located in one of the various connectors (e.g., Redis, local shared file system, P2P federated endpoints) supported by ProxyStore. Several high-level abstractions, such as futures, streaming, and ownership, build on the Proxy abstraction to facilitate the expression of complex data communication patterns commonly found in scientific computing. Benchmark results demonstrate that the use of ProxyStore in scientific applications reduces runtime and improves both CPU and GPU utilization. I will conclude by discussing important future challenges related to efficient communication in federated applications, specifically looking at the intersection of open data initiatives that have transformed scientific computing and the adoption of data streaming as a method for rapid and scalable data processing.
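For readers unfamiliar with the idea, the proxy pattern described in the abstract can be sketched in a few lines of standard-library Python. This is only an illustration of just-in-time resolution, not ProxyStore's actual API or implementation: the names `DictConnector`, `LazyProxy`, and `make_proxy` are hypothetical, and an in-memory dictionary stands in for a real connector such as Redis or a shared file system.

```python
import pickle
from typing import Any, Callable


class DictConnector:
    """Hypothetical in-memory stand-in for a storage connector."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}
        self.gets = 0  # counts how often data is actually fetched

    def put(self, key: str, blob: bytes) -> None:
        self._data[key] = blob

    def get(self, key: str) -> bytes:
        self.gets += 1
        return self._data[key]


class LazyProxy:
    """Lightweight reference that fetches its target on first use."""

    def __init__(self, factory: Callable[[], Any]) -> None:
        self._factory = factory
        self._target: Any = None
        self._resolved = False

    def _resolve(self) -> Any:
        # Fetch and deserialize only once, then reuse the cached object.
        if not self._resolved:
            self._target = self._factory()
            self._resolved = True
        return self._target

    def __getattr__(self, name: str) -> Any:
        # Called only for attributes not found on the proxy itself.
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(self._resolve(), name)

    # Special methods bypass __getattr__, so forward a few explicitly.
    # A fully transparent proxy would forward many more.
    def __len__(self) -> int:
        return len(self._resolve())

    def __iter__(self):
        return iter(self._resolve())

    def __getitem__(self, index: Any) -> Any:
        return self._resolve()[index]


def make_proxy(connector: DictConnector, key: str, obj: Any) -> LazyProxy:
    """Serialize obj into the connector and return a proxy that refers to it."""
    connector.put(key, pickle.dumps(obj))
    return LazyProxy(lambda: pickle.loads(connector.get(key)))


connector = DictConnector()
proxy = make_proxy(connector, "result-0", [1, 2, 3])

fetched_before_use = connector.gets  # nothing has been fetched yet
total = sum(proxy)                   # first use triggers resolution
length = len(proxy)                  # later uses reuse the cached object
fetched_after_use = connector.gets
```

The producer hands out only the small proxy; the consumer pays the transfer and deserialization cost at the moment it first touches the data, which is the decoupling the abstract refers to.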