Implement model versioning with Amazon Redshift ML


Amazon Redshift ML allows data analysts, developers, and data scientists to train machine learning (ML) models using SQL. In previous posts, we demonstrated how you can use the automatic model training capability of Redshift ML to train classification and regression models. Redshift ML allows you to create a model using SQL and specify your algorithm, such as XGBoost. You can use Redshift ML to automate data preparation, preprocessing, and selection of your problem type (for more information, refer to Create, train, and deploy machine learning models in Amazon Redshift using SQL with Amazon Redshift ML). You can also bring a model previously trained in Amazon SageMaker into Amazon Redshift through Redshift ML for local inference. For local inference on models created in SageMaker, the ML model type must be supported by Redshift ML. However, remote inference is available for model types that aren’t natively available in Redshift ML.

Over time, ML models grow old, and even if nothing drastic happens, small changes accumulate. Common reasons why ML models need to be retrained or audited include:

  • Data drift – Because your data has changed over time, the prediction accuracy of your ML models may begin to decrease compared to the accuracy exhibited during testing
  • Concept drift – The ML algorithm that was initially used may need to be changed due to different business environments and other changing needs

You may need to refresh the model on a regular basis, automate the process, and reevaluate your model’s accuracy. As of this writing, Amazon Redshift doesn’t support versioning of ML models. In this post, we show how you can use the bring your own model (BYOM) functionality of Redshift ML to implement versioning of Redshift ML models.

We use local inference to implement model versioning as part of operationalizing ML models. We assume that you have a good understanding of your data and the problem type that’s most applicable for your use case, and have created and deployed models to production.

Solution overview

In this post, we use Redshift ML to build a regression model that predicts the number of people that may use the city of Toronto’s bike sharing service at any given hour of a day. The model accounts for various factors, including holidays and weather conditions, and because we need to predict a numerical outcome, we use a regression model. We use data drift as a reason for retraining the model, and use model versioning as part of the solution.

After a model is validated and is being used on a regular basis for running predictions, you can create versions of the model, which requires you to retrain the model using an updated training set and possibly a different algorithm. Versioning serves two main purposes:

  • You can refer to prior versions of a model for troubleshooting or audit purposes. This enables you to ensure that your model still retains high accuracy before switching to a newer model version.
  • You can continue to run inference queries on the current version of a model during the model training process of the new version.

At the time of this writing, Redshift ML doesn’t have native versioning capabilities, but you can still achieve versioning with a few simple SQL techniques that use the BYOM capability. BYOM was introduced to let pre-trained SageMaker models run your inference queries in Amazon Redshift. In this post, we use the same BYOM technique to create a version of an existing model built using Redshift ML.

The following figure illustrates this workflow.

In the following sections, we show you how to create a version from an existing model and then perform model retraining.


Prerequisites

As a prerequisite for implementing the example in this post, you need to set up a Redshift cluster or Amazon Redshift Serverless endpoint. For the preliminary steps to get started and set up your environment, refer to Create, train, and deploy machine learning models in Amazon Redshift using SQL with Amazon Redshift ML.

We use the regression model created in the post Build regression models with Amazon Redshift ML. We assume that it has already been deployed, and we use this model to create new versions and retrain the model.

Create a version from the existing model

The first step is to create a version of the existing model (which means saving developmental changes of the model) so that a history is maintained and the model is available for comparison later on.

The following code is the generic format of the CREATE MODEL command syntax; in the next step, you get the information needed to use this command to create a new version:

CREATE MODEL model_name
    FROM ('job_name' | 's3_path' )
    FUNCTION function_name ( data_type [, ...] )
    RETURNS data_type
    IAM_ROLE { default }
    [ SETTINGS (
      S3_BUCKET 'bucket', | --required
      KMS_KEY_ID 'kms_string') --optional
    ];

Next, we collect the input parameters and apply them to the preceding CREATE MODEL code. We need the job name and the data types of the model input and output values. We collect these by running the show model command on our existing model. Run the following command in Amazon Redshift Query Editor v2:

show model predict_rental_count;

Note the values for AutoML Job Name, Function Parameter Types, and the Target Column (trip_count) from the model output. We use these values in the CREATE MODEL command to create the version.
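If you script this step, the three values can be pulled out of the show model output programmatically. The following Python sketch is illustrative only: it assumes the output has already been fetched as (key, value) pairs, and the helper name extract_version_inputs and the assumption that parameter types are separated by spaces or commas are ours, not part of Redshift ML.

```python
import re

# Keys named in this post; the (key, value) row shape is an assumption.
REQUIRED_KEYS = ("AutoML Job Name", "Function Parameter Types", "Target Column")


def extract_version_inputs(rows):
    """Return the values needed for a BYOM CREATE MODEL from key/value rows.

    `rows` is a hypothetical iterable of (key, value) string pairs, for
    example the result of running SHOW MODEL through a SQL client.
    """
    found = {key.strip(): value.strip() for key, value in rows}
    missing = [key for key in REQUIRED_KEYS if key not in found]
    if missing:
        raise KeyError("show model output is missing: " + ", ".join(missing))
    # Assumption: types are separated by whitespace and/or commas.
    types = [t for t in re.split(r"[,\s]+", found["Function Parameter Types"]) if t]
    return {
        "job_name": found["AutoML Job Name"],
        "parameter_types": types,
        "target_column": found["Target Column"],
    }


if __name__ == "__main__":
    sample_rows = [
        ("AutoML Job Name", "redshiftml-20230706171639810624"),
        ("Function Parameter Types", "int4 int4 int4 int4 int4 int4 int4 numeric numeric int4"),
        ("Target Column", "trip_count"),
    ]
    print(extract_version_inputs(sample_rows))
```

Failing loudly on a missing key is deliberate: a version created with the wrong job name or parameter types would not match the model it is supposed to preserve.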

The following CREATE MODEL statement creates a version of the existing model using the values collected from our show model command. We append the date (the example format is YYYYMMDD) to the end of the model and function names to track when this new version was created.

CREATE MODEL predict_rental_count_20230706
FROM 'redshiftml-20230706171639810624'
FUNCTION predict_rental_count_20230706 (int4, int4, int4, int4, int4, int4, int4, numeric, numeric, int4)
RETURNS float8
IAM_ROLE default
SETTINGS (
S3_BUCKET '<<your S3 Bucket>>');

This command may take a few minutes to complete. When it’s complete, run the following command:

show model predict_rental_count_20230706;

We can observe the following in the output:

  • AutoML Job Name is the same as the original version of the model
  • Function Name shows the new name, as expected
  • Inference Type shows Local, which designates this is BYOM with local inference
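The date-suffix naming convention used for the version copy lends itself to a small helper. The following Python sketch builds the versioned model name and the matching BYOM statement as text; the function names versioned_name and build_version_statement are hypothetical, and the inputs are assumed to be trusted identifiers copied from the show model output, because they are interpolated directly into the SQL.

```python
from datetime import date


def versioned_name(base_name, created_on):
    """Append a YYYYMMDD suffix, for example predict_rental_count_20230706."""
    return f"{base_name}_{created_on:%Y%m%d}"


def build_version_statement(base_name, job_name, parameter_types,
                            return_type, s3_bucket, created_on):
    """Return a BYOM CREATE MODEL statement for a dated version copy.

    All arguments are assumed to be trusted values; nothing is escaped here.
    """
    name = versioned_name(base_name, created_on)
    params = ", ".join(parameter_types)
    return (
        f"CREATE MODEL {name}\n"
        f"FROM '{job_name}'\n"
        f"FUNCTION {name} ({params})\n"
        f"RETURNS {return_type}\n"
        "IAM_ROLE default\n"
        "SETTINGS (\n"
        f"S3_BUCKET '{s3_bucket}');"
    )


if __name__ == "__main__":
    print(build_version_statement(
        base_name="predict_rental_count",
        job_name="redshiftml-20230706171639810624",
        parameter_types=["int4"] * 7 + ["numeric", "numeric", "int4"],
        return_type="float8",
        s3_bucket="<<your S3 Bucket>>",
        created_on=date(2023, 7, 6),
    ))
```

Using the same suffix for the model and the function keeps the two names in step, which makes it easier to tell which function belongs to which version.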

You can run inference queries using both versions of the model to validate the inference outputs.

The following screenshot shows the output of the model inference using the original version.

The following screenshot shows the output of model inference using the version copy.

As you can see, the inference outputs are the same.
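Rather than comparing the two result sets by eye, you can check them programmatically. The following Python sketch assumes the predictions from each function have already been fetched into two lists in the same row order; the function name outputs_match and the tolerance value are illustrative choices, not part of Redshift ML.

```python
import math


def outputs_match(original, version_copy, rel_tol=1e-9):
    """Return True when both prediction lists agree row by row.

    Both lists are assumed to come from the same input rows in the same
    order. A small relative tolerance allows for float formatting noise.
    """
    if len(original) != len(version_copy):
        return False
    return all(
        math.isclose(a, b, rel_tol=rel_tol)
        for a, b in zip(original, version_copy)
    )


if __name__ == "__main__":
    # Hypothetical predictions, not values from the post's screenshots.
    print(outputs_match([120.5, 98.0, 310.25], [120.5, 98.0, 310.25]))
```

Because the version copy points at the same training job as the original, any mismatch here suggests the copy was created from the wrong job name or with the wrong parameter types.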

You have now learned how to create a version of a previously trained Redshift ML model.

Retrain your Redshift ML model

After you create a version of an existing model, you can retrain the existing model by simply creating a new model.

You can create and train a new model using the same CREATE MODEL command but with different input parameters, datasets, or problem types as applicable. For this post, we retrain the model on newer datasets. We append _new to the model name so it’s similar to the existing model for identification purposes.

In the following code, we use the CREATE MODEL command with a new dataset available in the training_data table:

CREATE MODEL predict_rental_count_new
FROM training_data
TARGET trip_count
FUNCTION predict_rental_count_new
IAM_ROLE 'arn:aws:iam::<accountid>:role/RedshiftML'
PROBLEM_TYPE regression
SETTINGS (s3_bucket 'redshiftml-<your-account-id>',
          s3_garbage_collect off,
          max_runtime 5000);

Run the following command to check the status of the new model:

show model predict_rental_count_new;
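Because training can run for a while, an automated workflow typically repeats this status check until the model is usable. The following Python sketch shows one way to structure that loop. It is a sketch under assumptions: fetch_state is a hypothetical callable that you supply to run show model and return the model state as text, and the state strings READY and FAILED are assumed values that you should confirm against your own show model output.

```python
import time


def wait_until_ready(fetch_state, poll_seconds=60, max_polls=120, sleep=time.sleep):
    """Poll a model's state until it is ready, fails, or polling runs out.

    fetch_state: hypothetical zero-argument callable returning the state text.
    Returns True when the state is READY, False if max_polls is exhausted.
    Raises RuntimeError if the state is FAILED (assumed state name).
    """
    for attempt in range(max_polls):
        state = fetch_state().strip().upper()
        if state == "READY":
            return True
        if state == "FAILED":
            raise RuntimeError("model training failed")
        # Skip the final sleep; there is no further poll after it.
        if attempt < max_polls - 1:
            sleep(poll_seconds)
    return False


if __name__ == "__main__":
    # Simulated states stand in for real show model calls.
    states = iter(["TRAINING", "TRAINING", "READY"])
    print(wait_until_ready(lambda: next(states), sleep=lambda seconds: None))
```

Passing the sleep function as a parameter keeps the loop testable without waiting in real time.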

Replace the existing Redshift ML model with the retrained model

The last step is to replace the existing model with the retrained model. We do this by dropping the original version of the model and recreating a model using the BYOM technique.

First, check your retrained model to ensure the MSE/RMSE scores are staying stable between model training runs. To validate the models, you can run inferences with each of the model functions on your dataset and compare the results. We use the inference queries provided in Build regression models with Amazon Redshift ML.
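The stability check described above can be expressed in a few lines. The following Python sketch computes MSE and RMSE from actual and predicted values and compares two RMSE scores; the function names and the 10% threshold are illustrative assumptions, and the right threshold depends on your use case.

```python
import math


def mse(actual, predicted):
    """Mean squared error between two equal-length, non-empty sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the target column."""
    return math.sqrt(mse(actual, predicted))


def is_stable(previous_rmse, retrained_rmse, max_relative_increase=0.10):
    """Return True if the retrained RMSE is no more than the allowed
    relative increase above the previous RMSE (threshold is an assumption)."""
    return retrained_rmse <= previous_rmse * (1 + max_relative_increase)


if __name__ == "__main__":
    # Hypothetical trip counts and predictions from the two model functions.
    actual = [100.0, 150.0, 80.0]
    old_predictions = [110.0, 140.0, 85.0]
    new_predictions = [105.0, 148.0, 78.0]
    print(is_stable(rmse(actual, old_predictions), rmse(actual, new_predictions)))
```

Evaluating both model functions against the same held-out rows keeps the comparison fair, because the scores reported at training time may come from different validation sets.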

After validation, you can replace your model.

Start by collecting the details of the predict_rental_count_new model.

Note the AutoML Job Name value, the Function Parameter Types values, and the Target Column name in the model output.

Replace the original model by dropping it and then creating the model again with the original model and function names, so that existing references to those names continue to work:

drop model predict_rental_count;
-- In FROM, use the AutoML Job Name noted from predict_rental_count_new
CREATE MODEL predict_rental_count
FROM 'redshiftml-20230706171639810624'
FUNCTION predict_rental_count(int4, int4, int4, int4, int4, int4, int4, numeric, numeric, int4)
RETURNS float8
IAM_ROLE default
SETTINGS (
S3_BUCKET '<<your S3 Bucket>>');

The model creation should complete in a few minutes. You can check the status of the model by running the following command:

show model predict_rental_count;

When the model status is ready, the newer version predict_rental_count of your existing model is available for inference and the original version of the ML model predict_rental_count_20230706 is available for reference if needed.
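When automating the replacement, it helps to refuse to drop the original model unless its dated version copy already exists. The following Python sketch builds the ordered statements for the swap with that guard. The function name build_replace_plan and the existing_models argument are hypothetical; obtaining the list of existing model names is left to your own tooling, and the inputs are assumed to be trusted values.

```python
def build_replace_plan(model_name, backup_name, existing_models,
                       new_job_name, parameter_types, return_type, s3_bucket):
    """Return [DROP MODEL, CREATE MODEL] statements, in execution order.

    Raises ValueError if the dated backup is not among existing_models,
    so that the original model is never dropped without a version copy.
    """
    if backup_name not in existing_models:
        raise ValueError(f"version copy {backup_name} not found; refusing to drop")
    params = ", ".join(parameter_types)
    create = (
        f"CREATE MODEL {model_name}\n"
        f"FROM '{new_job_name}'\n"  # job name of the retrained model
        f"FUNCTION {model_name}({params})\n"
        f"RETURNS {return_type}\n"
        "IAM_ROLE default\n"
        "SETTINGS (\n"
        f"S3_BUCKET '{s3_bucket}');"
    )
    return [f"DROP MODEL {model_name};", create]


if __name__ == "__main__":
    plan = build_replace_plan(
        model_name="predict_rental_count",
        backup_name="predict_rental_count_20230706",
        existing_models={"predict_rental_count", "predict_rental_count_20230706",
                         "predict_rental_count_new"},
        new_job_name="<<AutoML Job Name of predict_rental_count_new>>",
        parameter_types=["int4"] * 7 + ["numeric", "numeric", "int4"],
        return_type="float8",
        s3_bucket="<<your S3 Bucket>>",
    )
    for statement in plan:
        print(statement)
```

Between the drop and the completion of the new CREATE MODEL, the original function name is unavailable, so you may want to schedule the swap for a quiet period.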

Refer to this GitHub repository for sample scripts to automate model versioning.


Conclusion

In this post, we showed how you can use the BYOM feature of Redshift ML to do model versioning. This allows you to keep a history of your models so that you can compare model scores over time, respond to audit requests, and run inferences while training a new model.

For more information about building different models with Redshift ML, refer to Amazon Redshift ML.

About the Authors

Rohit Bansal is an Analytics Specialist Solutions Architect at AWS. He specializes in Amazon Redshift and works with customers to build next-generation analytics solutions using other AWS Analytics services.

Phil Bates is a Senior Analytics Specialist Solutions Architect at AWS. He has more than 25 years of experience implementing large-scale data warehouse solutions. He is passionate about helping customers through their cloud journey and using the power of ML within their data warehouse.


