
Introducing Predictive Optimization: Quicker Queries, Cheaper Storage, No Sweat



We’re excited to announce the Public Preview of Databricks Predictive Optimization. This capability intelligently optimizes your table data layouts for improved performance and cost-efficiency.

Predictive Optimization leverages Unity Catalog and Lakehouse AI to determine the best optimizations to perform on your data, and then runs those operations on purpose-built serverless infrastructure. This significantly simplifies your lakehouse journey, freeing up your time to focus on getting business value from your data.

This capability is the latest in a long line of Databricks capabilities that harness AI to predictively perform actions based on your data and its access patterns. Previously, we launched Predictive I/O for reads and updates, which applies these techniques when executing read and update queries.


Lakehouse tables benefit greatly from background optimizations that improve their data layouts. These include compaction of files to ensure proper file sizes, and vacuuming to clean up unneeded data files. Proper optimization significantly improves performance while driving down costs.

However, this creates an ongoing challenge for data engineering teams, who need to figure out:

  • Which optimizations to run?
  • Which tables should be optimized?
  • How often to run these optimizations?

As lakehouse platforms grow in scale and become increasingly self-service, platform teams find it nearly impossible to answer these questions effectively. A recurring sentiment we have heard from our customers is that they cannot keep up with optimizing the number of tables created by all the new business use cases.

Furthermore, even once these thorny questions are answered, teams still have to deal with the operational burden of scheduling and running these optimizations: scheduling jobs, diagnosing failures, and managing the underlying infrastructure.

How Predictive Optimization works

With Predictive Optimization, Databricks tackles these thorny problems for you, freeing up your valuable time to focus on driving business value with your data. Predictive Optimization can be enabled with a single button click. From there, it does all the heavy lifting.

Databricks intelligently determines the best schedule of optimizations, runs those optimizations, and logs their impact in a system table for easy observability.

First, Predictive Optimization intelligently determines which optimizations to run, and how often to run them. Our AI model considers a range of inputs, including the usage patterns of your tables, and their existing data layout and performance characteristics. It then outputs the ideal optimization schedule, weighing the expected benefits of optimization against the expected compute costs.
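To make the cost-benefit idea concrete, here is a minimal sketch in Python. It is not the Predictive Optimization model, whose inputs and logic are not described here. The `Candidate` fields, the `select_optimizations` function, the `min_roi` threshold, and the table names are all hypothetical, chosen only to illustrate weighing an expected benefit against an expected compute cost.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A hypothetical optimization candidate; every field is illustrative."""
    table: str
    operation: str            # e.g. "OPTIMIZE" or "VACUUM"
    expected_benefit: float   # estimated savings, in arbitrary cost units
    expected_cost: float      # estimated compute cost, in the same units


def select_optimizations(candidates, min_roi=1.0):
    """Keep candidates whose benefit/cost ratio exceeds min_roi, best first."""
    scored = [
        (c.expected_benefit / c.expected_cost, c)
        for c in candidates
        if c.expected_cost > 0  # skip candidates with no usable cost estimate
    ]
    worthwhile = [(roi, c) for roi, c in scored if roi > min_roi]
    worthwhile.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in worthwhile]


# Made-up numbers: a hot table is worth optimizing, a scratch table is not.
candidates = [
    Candidate("sales.orders", "OPTIMIZE", expected_benefit=120.0, expected_cost=20.0),
    Candidate("sales.orders", "VACUUM", expected_benefit=20.0, expected_cost=5.0),
    Candidate("scratch.tmp_export", "OPTIMIZE", expected_benefit=2.0, expected_cost=15.0),
]

schedule = select_optimizations(candidates)
for c in schedule:
    print(c.operation, c.table)
```

In this toy version the scratch table is skipped because optimizing it is expected to cost more than it saves, which is the kind of trade-off the paragraph above describes.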

Once the schedule is generated, Predictive Optimization automatically runs these optimizations on purpose-built serverless infrastructure. It automatically handles spinning up the right number and size of machines, and ensures that optimization tasks are properly bin-packed and scheduled for optimal efficiency.
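For readers unfamiliar with bin packing, the sketch below shows one classic heuristic, first-fit decreasing, in Python. It is an illustration of the general technique only; the post does not say which packing algorithm the service uses. The `pack_tasks` function, the task names, the cost units, and the machine capacity are hypothetical.

```python
def pack_tasks(task_costs, capacity):
    """First-fit decreasing: place each task on the first machine with room.

    task_costs maps a task name to its estimated cost; capacity is the
    budget of a single machine, in the same (arbitrary) units.
    """
    machines = []  # each machine is a list of task names
    loads = []     # loads[i] is the total cost assigned to machines[i]
    # Consider the largest tasks first so they are placed before small ones.
    for name, cost in sorted(task_costs.items(), key=lambda kv: kv[1], reverse=True):
        if cost > capacity:
            raise ValueError(f"task {name!r} exceeds machine capacity")
        for i, load in enumerate(loads):
            if load + cost <= capacity:
                machines[i].append(name)
                loads[i] += cost
                break
        else:
            # No existing machine has room, so start a new one.
            machines.append([name])
            loads.append(cost)
    return machines


# Made-up task costs for five tables, packed onto machines of capacity 10.
tasks = {"orders": 6, "events": 5, "users": 4, "clicks": 3, "tmp": 2}
assignment = pack_tasks(tasks, capacity=10)
print(assignment)  # [['orders', 'users'], ['events', 'clicks', 'tmp']]
```

Here five tasks with a total cost of 20 fit exactly on two machines, rather than needing one machine per task.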

The whole system runs end-to-end without the need for manual tweaking and tuning, and learns from your organization’s usage over time, optimizing the tables that matter to your organization while deprioritizing those that don’t. You are billed only for the serverless compute required to perform the optimizations. Out of the box, all operations are logged in a system table, so you can easily audit and understand the impact and cost of the operations.


In the past few months, we have enrolled a number of customers in the private preview program for Predictive Optimization. Many have observed that it is able to find the sweet spot between two common extremes:

Side by side images show the tradeoffs between query performance and cost between no optimizations at all and daily, manual optimizations.

At one extreme, some organizations haven’t yet stood up sophisticated table optimization pipelines. With Predictive Optimization, they can immediately start optimizing their tables without figuring out the best optimization schedule or managing infrastructure.

At the other extreme, some organizations may be over-investing in optimization. For example, for a team automating their optimization pipelines, it’s tempting to run hourly or daily OPTIMIZE or VACUUM jobs. However, these run the risk of diminishing returns. Could the same performance gains be achieved with fewer optimization operations?

Predictive Optimization helps find the right balance, ensuring that optimizations are run only when they offer a high return on investment:

Side by side graphs show that for both query performance and cost, Predictive Optimization finds the right balance and only runs optimizations with high return on investment.

As a concrete example, the Data Engineering team at Anker enabled Predictive Optimization and quickly realized these benefits:


2x query speed-up

50% reduction in annual storage costs

graph of annual storage costs over time

“Databricks’ Predictive Optimizations intelligently optimized our Unity Catalog storage, which saved us 50% in annual storage costs while speeding up our queries by >2x. It learned to prioritize our largest and most-accessed tables. And it did all of this automatically, saving our team valuable time.”

- Shu Li, Data Engineering Lead, Anker

Get started

Starting today, Predictive Optimization is available in Public Preview. Enabling it should take less than five minutes. As an account admin, simply go to the account console > Settings > Feature Enablement tab, and toggle on the Predictive Optimization setting:

Set the Predictive optimization field in Account console > Settings > Feature Enablement

In just a click, you’ll get the power of AI-optimized data layouts across your Unity Catalog managed tables, making your data faster and cheaper. See the documentation for more information.

And we’re just getting started here. In the coming months, we will continue to add more optimizations to the capability. Stay tuned for much more to come.


