
Efficiently Manage ML Models with Snowflake Model Registry

Authors: Dikshant Jopat, Sovan Saha, and Vijaysai Turai

Unlocking the Power of Machine Learning

The Snowflake Model Registry allows you to securely manage models and their metadata in Snowflake, regardless of their origin. The Model Registry stores machine learning models as first-class, schema-level objects in Snowflake so they can be easily found and used by others in your organization. You can create registries and store models in them using the Snowpark ML library classes. Models can have multiple versions, and you can set one version as the default version.

Once you have saved a model, you can call its methods (which correspond to functions or stored procedures) to perform model operations, such as inferences, in a Snowflake virtual warehouse.


The most important classes in the Snowflake Model Registry API are:

  • snowflake.ml.registry.Registry: Manages models within a schema.
  • snowflake.ml.model.Model: Represents a model.
  • snowflake.ml.model.ModelVersion: Represents a version of a model.

The Snowflake Model Registry supports the following types of models.

  • Snowpark ML Modeling
  • scikit-learn
  • XGBoost
  • LightGBM
  • CatBoost
  • PyTorch
  • TensorFlow
  • MLflow PyFunc
  • Sentence Transformer
  • Hugging Face pipeline

Availability

The Snowflake Model Registry is generally available as of version 1.5.0 of the snowflake-ml-python package. It is accessible from both Python and SQL.

How to Integrate Model Registry into the ML Workflow:

  • Creating a session
  • Creating a simple XGBoost regression model
  • Evaluating the model
  • Registering the model in the Model Registry
  • Inferencing with the registered model
  • Changing the default version from V0 to V1
  • Granting access to the model for other users

1. Creating a session

Snowflake Notebooks sets up a Snowpark session when the notebook is connected to the kernel. Let’s use this session object to verify our connection.
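The check can be sketched as follows, assuming the code runs in a Snowflake Notebook where a Snowpark session is already active. get_active_session() and the session's get_current_role, get_current_warehouse, get_current_database and get_current_schema accessors are Snowpark APIs; the helper names get_notebook_session and describe_connection are hypothetical.

```python
# Minimal sketch, assuming a Snowflake Notebook (or any environment that
# already has an active Snowpark session). Helper names are hypothetical.


def get_notebook_session():
    """Return the Snowpark session that Snowflake Notebooks creates for you."""
    # Imported lazily so this file can be loaded outside Snowflake.
    from snowflake.snowpark.context import get_active_session

    return get_active_session()


def describe_connection(session) -> dict:
    """Collect the role, warehouse, database and schema the session is using."""
    return {
        "role": session.get_current_role(),
        "warehouse": session.get_current_warehouse(),
        "database": session.get_current_database(),
        "schema": session.get_current_schema(),
    }
```

Calling describe_connection(get_notebook_session()) returns a dictionary you can print. The warehouse is worth checking here, because logged models later run their methods in it.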

2. Create a simple XGBoost regression model
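A minimal sketch of this step, assuming a synthetic dataset (y = 3*X1 - 2*X2 plus Gaussian noise) rather than the authors' original data. The helper names, column names and hyperparameters are illustrative assumptions. Training needs the xgboost and pandas packages, which are imported inside the function.

```python
# Illustrative sketch: a toy regression dataset and an XGBoost regressor.
# Data, column names and hyperparameters are assumptions, not the original notebook.
import random


def make_synthetic_data(n_rows: int = 500, seed: int = 42):
    """Return (rows, target): a list of feature dicts and a list of floats."""
    rng = random.Random(seed)  # local RNG so results are reproducible
    rows, target = [], []
    for _ in range(n_rows):
        x1, x2 = rng.uniform(0, 10), rng.uniform(0, 10)
        rows.append({"X1": x1, "X2": x2})
        target.append(3 * x1 - 2 * x2 + rng.gauss(0, 1))
    return rows, target


def train_xgb_regressor(rows, target, seed: int = 42):
    """Fit xgboost.XGBRegressor on a pandas DataFrame built from `rows`."""
    # Lazy imports: this function needs the pandas and xgboost packages.
    import pandas as pd
    from xgboost import XGBRegressor

    features = pd.DataFrame(rows)
    model = XGBRegressor(
        n_estimators=100, max_depth=3, learning_rate=0.1, random_state=seed
    )
    model.fit(features, target)
    return model, features
```

For example, rows, y = make_synthetic_data() followed by model, X_train = train_xgb_regressor(rows[:400], y[:400]) holds back the last 100 rows for evaluation. The upper-case column names are deliberate: Snowflake folds unquoted identifiers to upper case, so matching names tend to avoid quoting issues when the model is later called on Snowflake tables.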

3. Evaluate the model
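Evaluating a regressor usually means comparing held-out targets with the model's predictions. The sketch below computes RMSE, MAE and R² by hand so the arithmetic is visible; regression_report is a hypothetical helper, and sklearn.metrics provides mean_squared_error, mean_absolute_error and r2_score for the same job.

```python
# Sketch: regression metrics written out by hand. regression_report is a
# hypothetical helper; sklearn.metrics offers equivalent functions.
import math
from typing import Sequence


def regression_report(y_true: Sequence[float], y_pred: Sequence[float]) -> dict:
    """Return RMSE, MAE and R^2 for paired true/predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)             # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        # R^2 is undefined when every target value is identical.
        "r2": 1 - ss_res / ss_tot if ss_tot else float("nan"),
    }
```

The resulting dictionary is a natural candidate for the metrics argument of log_model, so the numbers travel with the model version.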

4. Registering the model in the Model Registry

Adding a model to the registry is called logging the model. Log a model by calling the log_model method of the registry. This method:

  • Serializes the model, a Python object, and creates a Snowflake model object from it.
  • Adds metadata, such as a description, to the model as specified in the log_model call.

A model can have multiple versions, up to the limit listed under Current Restrictions and Problems. To log an additional version of the model, call log_model again with the same model_name but a different version_name.
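A minimal sketch of logging, assuming snowflake-ml-python 1.5 or later and a fitted XGBoost model. Registry(session=..., database_name=..., schema_name=...) and the log_model keyword arguments shown are library APIs; open_registry, log_version and all database, schema, model and version names are hypothetical.

```python
# Minimal sketch: open a registry and log a fitted model as a named version.
# Helper names and all object names are hypothetical placeholders.
from typing import Optional


def open_registry(session, database_name: str, schema_name: str):
    """Return a snowflake.ml.registry.Registry bound to database.schema."""
    from snowflake.ml.registry import Registry  # needs snowflake-ml-python

    return Registry(
        session=session, database_name=database_name, schema_name=schema_name
    )


def log_version(
    registry,
    model,
    model_name: str,
    version_name: str,
    sample_input,
    metrics: Optional[dict] = None,
    comment: Optional[str] = None,
):
    """Serialize `model` into the registry and return the new ModelVersion."""
    return registry.log_model(
        model,
        model_name=model_name,
        version_name=version_name,
        # A few rows of training features let the registry infer the
        # input/output signature of the model's methods.
        sample_input_data=sample_input,
        conda_dependencies=["xgboost"],  # explicitly list the package inference needs
        metrics=metrics or {},
        comment=comment,
    )
```

For instance, log_version(open_registry(session, "ML_DB", "MODELS"), model, "XGB_REGRESSOR", "V0", sample_df), where sample_df is a few rows of training features, creates the model with its first version; repeating the call with "V1" adds a second version.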

5. Inferencing with the registered model
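Inference goes through a ModelVersion: fetch the model with get_model, pick a version with version(...) or the default property, and call run(...) with the name of the method to execute. The sketch assumes the logged model exposes a predict method (ModelVersion.show_functions() lists the available ones); predict_with_version and the names it receives are hypothetical.

```python
# Minimal sketch: run a logged model's predict method.
# `registry` is a snowflake.ml.registry.Registry; names are hypothetical.


def predict_with_version(registry, model_name: str, features, version_name=None):
    """Run `predict` on a specific version, or on the default version if None."""
    model = registry.get_model(model_name)
    mv = model.default if version_name is None else model.version(version_name)
    # `features` may be a pandas DataFrame or a Snowpark DataFrame; the
    # result has the same kind as the input.
    return mv.run(features, function_name="predict")
```

Passing a pandas DataFrame returns a pandas DataFrame of predictions; passing a Snowpark DataFrame keeps the computation in the warehouse.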

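6. Changing the default version from V0 to V1

To make V1 the default, set the model's default version: in Python, assign the version name to the default property of the Model object; in SQL, use ALTER MODEL ... SET DEFAULT_VERSION. You can also assign an alias to a model version by using the SQL command ALTER MODEL. You can use an alias wherever a version name is required, for example when retrieving a reference to a model version in Python or in SQL.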

In addition to the aliases you create, the following system aliases are available in all models.

  • DEFAULT refers to the default version of the model.
  • FIRST refers to the oldest version of the model, by creation time.
  • LAST refers to the newest version of the model, by creation time.

Alias names that you create must not match existing version names or aliases in the model, including system aliases.
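Both operations can be sketched in Python, assuming a Model object obtained with registry.get_model(...). Assigning to Model.default changes the default version from Python; the two SQL helpers only build statement strings, which you would run with session.sql(...).collect(). The helper names are hypothetical, and the case-insensitive collision check assumes unquoted identifiers.

```python
# Sketch: promote a version to default and build ALTER MODEL statements.
# Helper names are hypothetical; identifiers are assumed to be unquoted.

SYSTEM_ALIASES = {"DEFAULT", "FIRST", "LAST"}


def set_default_version(model, version_name: str) -> None:
    """Make `version_name` the default via the Model.default setter."""
    model.default = version_name


def default_version_statement(model_name: str, version_name: str) -> str:
    """SQL equivalent of set_default_version."""
    return f"ALTER MODEL {model_name} SET DEFAULT_VERSION = {version_name}"


def alias_statement(model_name, version_name, alias, existing_versions, existing_aliases):
    """Build ALTER MODEL ... SET ALIAS after checking for name collisions."""
    # Unquoted identifiers compare case-insensitively, so compare upper-cased.
    taken = {n.upper() for n in (*existing_versions, *existing_aliases)}
    taken |= SYSTEM_ALIASES
    if alias.upper() in taken:
        raise ValueError(f"alias {alias!r} clashes with an existing version or alias")
    return f"ALTER MODEL {model_name} VERSION {version_name} SET ALIAS = {alias}"
```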

7. Grant access to the model for other users

Here we create a role and a user, grant the role to the user, and give the role access to the database, schema, and model. The other user can then use the model.
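The setup can be expressed as a list of SQL statements, sketched below. Every role, user, warehouse, database, schema and model name is a hypothetical placeholder, and running the statements requires a role allowed to create roles and users and to grant access. The warehouse grant is an addition to the steps above: the model's methods run in a warehouse, so the receiving role needs one it can use.

```python
# Sketch: role/user setup and grants for model access.
# All object names passed in are hypothetical placeholders.


def model_access_statements(role, user, warehouse, database, schema, model):
    """Return the statements that let `user` (via `role`) call `model`."""
    fq_schema = f"{database}.{schema}"
    return [
        f"CREATE ROLE IF NOT EXISTS {role}",
        f"CREATE USER IF NOT EXISTS {user} DEFAULT_ROLE = {role}",
        f"GRANT ROLE {role} TO USER {user}",
        f"GRANT USAGE ON DATABASE {database} TO ROLE {role}",
        f"GRANT USAGE ON SCHEMA {fq_schema} TO ROLE {role}",
        f"GRANT USAGE ON MODEL {fq_schema}.{model} TO ROLE {role}",
        # Model methods execute in a warehouse, so the role needs one to use.
        f"GRANT USAGE ON WAREHOUSE {warehouse} TO ROLE {role}",
    ]


def apply_statements(session, statements) -> None:
    """Execute each statement in order with a Snowpark session."""
    for stmt in statements:
        session.sql(stmt).collect()
```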

Navigating the Registry

Show Versions

native_registry.get_model(model_name).show_versions()

Shows the versions of a machine learning model. A model can have multiple versions, one of which is set as the default version (see ALTER MODEL). The output returns the metadata and properties of the model’s versions, sorted lexicographically by database, schema, and model name.

Show Models

native_registry.show_models()

Lists the machine learning models in the registry’s schema to which you have access rights. The output returns the metadata and properties of the models, sorted lexicographically by database, schema, and model name.

DROP MODEL

Removes a machine learning model from the current or specified schema.

Syntax:

DROP MODEL <name>

Example (Python API):

native_registry.delete_model(model_name)
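Besides dropping a whole model, the Model object can delete a single version. A minimal sketch, assuming registry is a snowflake.ml.registry.Registry; the helper names are hypothetical, and the docstrings give the SQL commands with the same effect.

```python
# Minimal sketch: delete one version versus the whole model.
# Helper names are hypothetical.


def drop_version(registry, model_name: str, version_name: str) -> None:
    """Remove one version but keep the model (ALTER MODEL ... DROP VERSION)."""
    registry.get_model(model_name).delete_version(version_name)


def drop_model(registry, model_name: str) -> None:
    """Remove the model and every version it holds (DROP MODEL)."""
    registry.delete_model(model_name)
```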

Required Privileges

To create a model, you must either be the owner of the schema in which the model is created or you must have the CREATE MODEL privilege for this schema. To use a model, you must either be the owner of the model or have the USAGE privilege for the model. With the USAGE privilege, you can use the model for inference without being able to view its internals.

How Model Registry Can Simplify the ML Model Development Lifecycle

1. Automated training and inference pipeline

The Model Registry can be used to automate the training and inference pipeline for machine learning models. This includes training a model, storing it in the Model Registry, and then using the latest version of the model for inference on new data.
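One iteration of such a pipeline can be sketched as: work out the next version name, train, log the new version with its metrics, and make it the default. train_fn and evaluate_fn are hypothetical callables you supply, the V0, V1, ... naming follows this post, and the sketch assumes the model already has at least one version (for example, V0 logged by hand).

```python
# Sketch of one retrain-and-promote iteration. train_fn, evaluate_fn and
# the helper names are hypothetical; `registry` is a Registry instance.
import re


def next_version_name(existing) -> str:
    """Return 'V<n+1>' for names such as ['V0', 'V1']; 'V0' if none match."""
    numbers = []
    for name in existing:
        match = re.fullmatch(r"[Vv](\d+)", name)
        if match:
            numbers.append(int(match.group(1)))
    return f"V{max(numbers) + 1}" if numbers else "V0"


def retrain_and_promote(registry, model_name, train_fn, evaluate_fn, sample_input):
    """Train, log the next version, make it the default, and return its name."""
    reg_model = registry.get_model(model_name)  # assumes the model exists
    version = next_version_name(v.version_name for v in reg_model.versions())
    model = train_fn()
    registry.log_model(
        model,
        model_name=model_name,
        version_name=version,
        sample_input_data=sample_input,
        metrics=evaluate_fn(model),
    )
    reg_model.default = version  # later inference picks up the new version
    return version
```

In practice you would usually promote only when the new metrics beat the current default; the sketch promotes unconditionally to keep the flow visible.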

2. Parallelized Inference

The Model Registry can be used to perform parallelized inference on new data. Because model methods run inside a Snowflake virtual warehouse, the latest version of the model is applied to new data in parallel, close to where the data is stored, which can significantly speed up the inference process.
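A minimal sketch of scoring a whole table inside Snowflake, assuming the model exposes a predict method. session.table(...), ModelVersion.run(...) and DataFrame.write.save_as_table(...) are Snowpark and snowflake-ml-python calls; score_table and the table names are hypothetical. It uses the default version, which in a retrain-and-promote setup is normally the newest one.

```python
# Sketch: batch-score a table in the warehouse and persist the results.
# score_table and the table names are hypothetical.


def score_table(session, registry, model_name, source_table, target_table):
    """Run the default version's predict on `source_table`, save to `target_table`."""
    mv = registry.get_model(model_name).default
    features = session.table(source_table)  # lazy Snowpark DataFrame, no download
    predictions = mv.run(features, function_name="predict")
    predictions.write.save_as_table(target_table, mode="overwrite")
    return target_table
```

Because the input is a Snowpark DataFrame, the data never leaves Snowflake, and the warehouse, rather than the client, largely determines how much of the work runs in parallel.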

3. Model Management

The Model Registry can be used to manage multiple versions of a model. Each version is stored in the registry together with its own performance metrics, so versions can be compared before one is promoted to the default.
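Version comparison can be sketched with ModelVersion.set_metric and ModelVersion.show_metrics. The metric name rmse and the helper names record_metric and best_version are assumptions for illustration.

```python
# Sketch: attach metrics to versions and pick the best one.
# Helper names and the "rmse" metric name are illustrative assumptions.


def record_metric(model_version, name: str, value) -> None:
    """Store a metric on a ModelVersion."""
    model_version.set_metric(name, value)


def best_version(registry, model_name, metric="rmse", lower_is_better=True):
    """Return (version_name, value) for the version with the best `metric`."""
    scored = []
    for mv in registry.get_model(model_name).versions():
        metrics = mv.show_metrics()  # dict of metric name -> value
        if metric in metrics:
            scored.append((mv.version_name, metrics[metric]))
    if not scored:
        raise LookupError(f"no version of {model_name} has metric {metric!r}")
    pick = min if lower_is_better else max
    return pick(scored, key=lambda pair: pair[1])
```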

4. Model Serving

The Model Registry can be used to serve models for real-time inference. Once a model is logged in the registry, its methods can be called from Python or SQL to make predictions on new data as it arrives.

5. Collaboration and Reproducibility

The Model Registry can be used to collaborate with other data scientists and engineers. Models stored in the registry can be found and used by anyone who has been granted access, which can improve collaboration and reproducibility.

6. Model Verification and Compliance

The Model Registry can be used to check and track model performance and data lineage. Because every model version is stored in the registry together with its performance metrics, model behavior can be audited over time, which supports compliance.

Current Restrictions and Known Issues

The Snowflake Model Registry currently has the following limitations:

  • The registry cannot be used in Snowflake Native Apps.
  • Models cannot be shared or cloned, and are skipped during replication.

Versions 1.5.0 and 1.5.1 of the snowflake-ml-python package have the following known issues. Until they are resolved, use the workarounds provided.

  • In Snowflake version 8.23 and earlier, the library does not work in owner’s rights stored procedures. Use caller’s rights stored procedures instead.
  • In stored procedures, logging a model requires embedding a copy of the Snowpark ML local library into the model. Specify the embed_local_ml_library option in the log_model call as follows.

registry.log_model(..., options={"embed_local_ml_library": True})
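Both workarounds can be combined in one sketch: a training stored procedure that logs its model with embed_local_ml_library and is registered with caller’s rights through Snowpark’s session.sproc.register(..., execute_as="caller"). The procedure name, model and version names, toy data and package list are hypothetical.

```python
# Minimal sketch combining both workarounds. Procedure, model and version
# names, the toy data and the package list are hypothetical.

LOG_OPTIONS = {"embed_local_ml_library": True}


def train_and_log(session, model_name: str, version_name: str) -> str:
    """Procedure body: fit a toy XGBoost model and log it from inside Snowflake."""
    import pandas as pd
    from snowflake.ml.registry import Registry
    from xgboost import XGBRegressor

    features = pd.DataFrame({"X1": [1.0, 2.0, 3.0, 4.0], "X2": [4.0, 3.0, 2.0, 1.0]})
    target = [2.0, 4.0, 6.0, 8.0]  # toy data for illustration only
    model = XGBRegressor(n_estimators=10).fit(features, target)
    registry = Registry(session=session)  # session's current database and schema
    registry.log_model(
        model,
        model_name=model_name,
        version_name=version_name,
        sample_input_data=features,
        options=LOG_OPTIONS,  # embed the local Snowpark ML library in the model
    )
    return f"logged {model_name} {version_name}"


def register_training_proc(session):
    """Register train_and_log as a temporary, caller's-rights stored procedure."""
    from snowflake.snowpark.types import StringType

    return session.sproc.register(
        train_and_log,
        return_type=StringType(),
        input_types=[StringType(), StringType()],
        name="TRAIN_AND_LOG",  # hypothetical procedure name
        packages=["snowflake-snowpark-python", "snowflake-ml-python", "xgboost", "pandas"],
        execute_as="caller",  # avoids the owner's-rights limitation
        replace=True,
    )
```

After registration, session.call("TRAIN_AND_LOG", "XGB_REGRESSOR", "V2") runs the procedure.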

The following limits apply to models and model versions.

  • Models
    • Maximum of 50 versions
  • Model versions
    • Maximum of 10 methods
    • Maximum of 10 imports
    • Maximum of 500 arguments per method
    • Maximum metadata (including metrics) of 100 KB
    • Maximum total model size of 5 GB
    • Maximum configuration file size of 250 KB, including conda.yml and other manifest files that log_model generates internally. (For example, this limit may be exceeded if a model has many functions, all of which have many arguments.)
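A quick pre-flight check before calling log_model can catch these limits early. The sketch below encodes the limits listed above; limit_violations is a hypothetical helper, it assumes 1 KB = 1024 bytes and 1 GB = 1024³ bytes, and it approximates metadata size by the length of the JSON-encoded metadata. The configuration-file limit is not checked because log_model generates those manifests internally.

```python
# Sketch: check a planned model version against the documented limits.
# limit_violations is hypothetical; KB/GB are assumed to be binary units.
import json

LIMITS = {
    "versions_per_model": 50,
    "methods_per_version": 10,
    "imports_per_version": 10,
    "arguments_per_method": 500,
    "metadata_bytes": 100 * 1024,
    "model_bytes": 5 * 1024**3,
}


def limit_violations(existing_versions: int, method_arg_counts: dict,
                     n_imports: int, metadata: dict, model_bytes: int) -> list:
    """Return human-readable problems; an empty list means no limit is hit."""
    problems = []
    if existing_versions >= LIMITS["versions_per_model"]:
        problems.append(f"model already has {existing_versions} versions; "
                        f"the maximum is {LIMITS['versions_per_model']}")
    if len(method_arg_counts) > LIMITS["methods_per_version"]:
        problems.append(f"{len(method_arg_counts)} methods; "
                        f"the maximum is {LIMITS['methods_per_version']}")
    for name, n_args in method_arg_counts.items():
        if n_args > LIMITS["arguments_per_method"]:
            problems.append(f"method {name!r} has {n_args} arguments; "
                            f"the maximum is {LIMITS['arguments_per_method']}")
    if n_imports > LIMITS["imports_per_version"]:
        problems.append(f"{n_imports} imports; "
                        f"the maximum is {LIMITS['imports_per_version']}")
    # Approximation: stored metadata size is estimated from its JSON encoding.
    meta_size = len(json.dumps(metadata).encode("utf-8"))
    if meta_size > LIMITS["metadata_bytes"]:
        problems.append(f"metadata is about {meta_size} bytes; "
                        f"the maximum is {LIMITS['metadata_bytes']}")
    if model_bytes > LIMITS["model_bytes"]:
        problems.append(f"model size of {model_bytes} bytes exceeds "
                        f"the maximum of {LIMITS['model_bytes']}")
    return problems
```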

Cost Considerations

Using the Snowflake Model Registry incurs the usual Snowflake consumption-based costs. These include the cost of:

  • Storing model artifacts, metadata, and functions.
  • Copying files between stages in Snowflake.
  • Serverless model object operations using the Snowsight user interface or the SQL or Python interface, such as viewing models and model versions and modifying model comments, tags, and metrics.
  • Warehouse compute costs, which vary depending on the type of model and the amount of data used for inference. Warehouse compute costs are incurred for:
    • The creation of models and versions
    • Calling the methods of a model

Conclusion

The advantages of Snowflake’s Model Registry are summarized below:

  • Faster time-to-value: Automated model deployment and versioning allow data scientists to focus on building models instead of managing infrastructure.
  • Improved collaboration: Centralized model management enables seamless collaboration and knowledge sharing between teams.
  • Increased transparency and reproducibility: Model tracking and versioning ensure that models are transparent, reproducible, and verifiable.
  • Scalability and reliability: Snowflake’s cloud-based architecture ensures that models can be deployed at scale and with high reliability.

Snowflake’s Model Registry is a powerful tool that enables organizations to realize the full potential of their machine learning investments. By providing a centralized, automated and scalable platform for model management, Snowflake Model Registry is poised to revolutionize the way organizations approach machine learning and AI.

Get started with Snowflake’s Model Registry today and discover the power of optimized machine learning workflows!


August 20, 2024