Javierdlrm

Python and the AI Lakehouses

Event: PyData NYC | Date: Friday, November 8th 2024

Talk: Link

Abstract

The Lakehouse is a new set of open standards that separates data storage from data query engines, and it is becoming the dominant data platform for analytics. On its own, however, the Lakehouse is not sufficient for adding AI to applications to make them more intelligent. Enterprises that have adopted the Lakehouse still struggle to get AI systems into production. The current Lakehouse architecture lacks native support for Python and real-time AI systems.

In this talk, we describe the capabilities that need to be added to the Lakehouse to make it an AI Lakehouse that can support building and operating AI-enabled batch and real-time applications as well as LLM-powered applications. We will also cover Python support in the Lakehouse and our work on the Hopsworks Query Service, which provides high-speed access to Lakehouse tables using Apache Arrow, along with temporal join support for creating point-in-time correct training data from Lakehouse tables.
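
As a rough illustration of what "point-in-time correct" means for training data, the sketch below joins each label row to the most recent feature value recorded at or before the label's event time, so no future information leaks into the training set. It is a minimal pure-Python sketch and not the Hopsworks Query Service API: the function name `point_in_time_join`, the tuple layouts, and the sample account data are all hypothetical and chosen only for illustration.

```python
from bisect import bisect_right
from datetime import datetime


def point_in_time_join(labels, features):
    """Attach to each label the latest feature value known at its event time.

    labels:   iterable of (entity_key, event_time, label)       (hypothetical layout)
    features: iterable of (entity_key, feature_time, value)     (hypothetical layout)
    """
    # Group feature rows by entity key, with timestamps in ascending order.
    by_key = {}
    for key, ts, value in sorted(features, key=lambda r: (r[0], r[1])):
        times, values = by_key.setdefault(key, ([], []))
        times.append(ts)
        values.append(value)

    joined = []
    for key, event_time, label in labels:
        times, values = by_key.get(key, ([], []))
        # Number of feature rows with a timestamp at or before event_time.
        idx = bisect_right(times, event_time)
        # Use the last such row; None if no feature existed yet.
        feature = values[idx - 1] if idx > 0 else None
        joined.append((key, event_time, feature, label))
    return joined


# Illustrative data only.
features = [
    ("acct_1", datetime(2024, 1, 1), 10.0),
    ("acct_1", datetime(2024, 1, 5), 25.0),
    ("acct_2", datetime(2024, 1, 3), 7.5),
]
labels = [
    ("acct_1", datetime(2024, 1, 4), 0),
    ("acct_1", datetime(2024, 1, 6), 1),
    ("acct_2", datetime(2024, 1, 2), 0),
]

training_rows = point_in_time_join(labels, features)
for row in training_rows:
    print(row)
```

In this example, the label for `acct_1` on January 4 is paired with the January 1 feature value rather than the later January 5 one, and the label for `acct_2` gets no feature value because none existed yet at its event time. A query engine would perform the same kind of as-of join over Lakehouse tables at much larger scale.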