Authors: Prashant Sharma, Gaurav Bhandari
Co-Author: Vivek Kanna
Purpose
As machine learning models grow in complexity—particularly those involving deep learning and GPU-based inference—organizations are looking for ways to operationalize these models without the burden of managing container infrastructure. Traditionally, deploying such models required building Docker containers, pushing them to registries, and orchestrating them in external environments—introducing friction, delays, and infrastructure overhead. With the advent of Model Serving in Snowflake via Snowpark Container Services (SPCS), teams can now deploy and scale externally trained models without ever writing Dockerfiles or managing containers manually. Snowflake handles the underlying container orchestration, letting data scientists focus solely on their models and inference logic. This blog walks through how Model Serving, in combination with the Model Registry, enables streamlined, GPU-accelerated inference entirely within Snowflake—eliminating data movement, reducing operational complexity, and accelerating time to insight.
In this blog, we demonstrate how to deploy and scale machine learning inference in Snowflake using externally trained ML models. We will explore how Snowflake Model Registry and Model Serving enable easy deployment of custom inference solutions on Snowpark Container Services (SPCS), abstracting away the complexity of container management.
Use Cases
With Model Serving in SPCS, we can now deploy models that require GPU compute or high inference parallelism directly within Snowflake, without having to worry about the underlying infrastructure. Some of these use cases are listed below:
- Deep Learning Models
- Image Processing: Models for tasks such as image classification and object detection.
- Natural Language Processing: Transformer-based models like BERT, GPT, and their variants for tasks like text summarization, sentiment analysis, or translation.
- Advanced Computer Vision
- Surveillance and Security: Real-time analysis of images for threat detection or facial recognition.
- Large-Scale Data Applications
- Big Data Analytics: Classification, pattern recognition, and clustering in large datasets.
- Predictive Maintenance: Forecast equipment failures by analyzing sensor data from industrial systems.
Introducing Model Serving in Snowpark Container Services (SPCS)
Inference is a big part of any Data Science / Machine Learning workflow.
Snowflake now offers two powerful options for running ML inference: traditional warehouse-based inferencing and the new Model Serving via SPCS. Let’s explore when to use each approach based on model complexity, performance needs, and integration requirements (a short code sketch of the two calling conventions follows the comparison below):
- Warehouse Inferencing
- Models are deployed as UDFs (User-Defined Functions) or Stored Procedures inside Snowflake.
- Inference runs within Snowflake’s virtual warehouses
- Best for:
- Lightweight models (e.g., linear regression, basic classifiers)
- Business intelligence & dashboarding scenarios
- Model Serving in SPCS
- Models run as containerized applications with access to specialized compute resources such as NVIDIA GPUs.
- Allows serving models via REST APIs, enabling external systems to interact with predictions in real-time.
- Best for:
- Large-scale, complex models (e.g., deep learning, NLP, time-series forecasting) that require GPUs or are larger than 5 GB
- Near Real-time, low-latency inference.
- Cases where models require additional dependencies beyond standard SQL/Python UDFs.
- Need integration with external applications
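To make the difference concrete, here is a minimal sketch of how the same registered model version can be invoked in each mode using snowflake-ml-python. The model, version, service, and table names below are illustrative assumptions, not part of the example later in this post; see the Appendix for a full walkthrough.

from snowflake.ml.registry import registry
from snowflake.snowpark.context import get_active_session

session = get_active_session()
reg = registry.Registry(session=session)

# Fetch a previously logged model version (illustrative names)
mv = reg.get_model("MY_MODEL").version("V1")
input_df = session.table("MY_DB.MY_SCHEMA.MY_INPUT_TABLE")  # hypothetical input data

# Warehouse inferencing: executes inside the current virtual warehouse
wh_predictions = mv.run(input_df, function_name="predict")

# Model Serving in SPCS: routed to a deployed inference service
spcs_predictions = mv.run(input_df, function_name="predict", service_name="MY_INFERENCE_SERVICE")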
Key benefits of Model Serving via SPCS
Model Serving via SPCS enables seamless model deployment, eliminating the need for external tools like VS Code or PyCharm for Python stored procedures. Developers can build, test, and deploy models directly in Snowflake without manually building and pushing Docker containers, reducing development cycles and operational overhead.
Other key benefits include:
- In-Snowflake Inference – Run models directly within Snowflake with no data movement.
- GPU-Powered Execution – Ideal for deep learning and NLP workloads.
- Custom Environments – Utilize PyPI packages and external dependencies effortlessly.
- Real-Time Inference – API-based model serving with low latency.
- Simplified Infrastructure – No need for separate model hosting or external compute resources.
SPCS also supports both batch and real-time inferencing, ensuring efficient scaling and minimal inference latency. By integrating model deployment within Snowflake, machine learning workloads become more efficient, streamlined, and scalable.
For customers, this results in faster and more reliable access to model-driven insights, as inferencing is directly integrated with their data in Snowflake. There is no need to move data between platforms, reducing security risks and operational overhead. Additionally, this simplifies compliance and governance, as all processing remains within Snowflake’s environment.
From a competitive perspective, SPCS provides an advantage over traditional cloud-based model deployment solutions that require additional infrastructure and complex integrations. By consolidating model deployment and inferencing within Snowflake, organizations can reduce costs and complexity while improving the performance and accessibility of AI-driven applications.
Let’s explore the end-to-end flow of model deployment and serving via SPCS (a condensed code sketch follows this list):
- Snowflake Notebooks for Development and Deployment: Snowflake Notebooks on Container Runtime provides an integrated environment for building, testing, and deploying models without requiring external development tools. This ensures consistency and efficiency throughout the development lifecycle.
- Image Registry: Before logging the model, the associated environment (including custom dependencies and packages) must be pushed to Snowflake’s Image Registry. This ensures that the model runs in a consistent and reproducible environment when deployed via SPCS. The registered image is referenced during model logging, linking the model to its runtime context.
- Logging the Model into the Model Registry: Once the model is trained and validated, it is logged into Snowflake’s Model Registry with a single command. This ensures proper version control and easy access for future use.
- Deployment on SPCS: Deploying the model for inferencing in Snowpark Container Services requires a single command, significantly simplifying the process.
- Inference Service Attachment to Model Versions: Each logged model version can have an inference service attached, enabling real-time and batch inferencing capabilities within Snowflake. After deployment, batch inferencing can be conducted within Snowflake, leveraging its optimized infrastructure for large-scale processing.
- Model Registry UI: Snowflake’s UI provides a structured interface to manage, monitor, and analyze model performance, ensuring ease of access and operational efficiency.
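As a condensed sketch of the log-and-deploy steps, assuming a Registry object reg and an externally trained model my_model with sample data sample_df from your own training code (all names here are illustrative), the two core operations each take a single call; the Appendix walks through a complete example.

from snowflake.ml.registry import registry
from snowflake.snowpark.context import get_active_session

session = get_active_session()
reg = registry.Registry(session=session)

# Log the trained model into the Model Registry (a single call)
mv = reg.log_model(
    my_model,                            # externally trained model object (assumed)
    model_name="MY_MODEL",
    sample_input_data=sample_df,         # sample data from your training code (assumed)
    pip_requirements=["scikit-learn"],   # illustrative custom dependencies
)

# Deploy that version as an SPCS inference service (a single call);
# compute pool and image repository names are illustrative
mv.create_service(
    service_name="MY_DB.MY_SCHEMA.MY_INFERENCE_SERVICE",
    service_compute_pool="MY_GPU_POOL",
    image_repo="MY_DB.MY_SCHEMA.MY_IMAGE_REPO",
    gpu_requests="1",
    max_instances=1,
)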
Let’s glance through the Model Registry UI
Navigate to the Models tab in the Snowsight UI:
- Displays a list of registered models.
- Shows metadata such as model name, inference service status, type, owner, and created date
- Below, we have a SENTENCE_TRANSFORMER model registered in the Model Registry via the snowflake-ml-python package; refer to an example notebook here (the equivalent SQL is sketched below)
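The same listing is also available through SQL from an active Snowpark session; a minimal sketch:

# SQL equivalent of the Models tab: list registered models
session.sql("SHOW MODELS").show()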
Inference Services tab in the Snowsight UI:
- Displays a list of inference services created.
- Shows metadata such as service name, status, served model, and updated date
- On selection, shows logs and other details such as the model being served, endpoint, and compute pool
- Below, we see an inference service KIPI_EMBEDDING_SERVICE created for the SENTENCE_TRANSFORMER model, which uses a compute pool named GPU_NV_S (the equivalent SQL is sketched below)
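These service details can also be pulled via SQL; a small sketch using the service name from the example above:

# SQL equivalent of the Inference Services tab
session.sql("SHOW SERVICES").show()
session.sql("DESCRIBE SERVICE KIPI_EMBEDDING_SERVICE").show()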
Below is the detailed view of a selected model from the registry (a small SQL sketch for working with versions follows this list).
- Displays metadata such as model name, description, and tags.
- Lists all available versions of the model with status (e.g., active, archived).
- Allows users to create a new version or navigate to version-specific details.
- Useful for managing multiple iterations of the same model.
- Each versioned model has a Lineage tab that shows the full data flow lineage information for the model, including any datasets that were used to train the model, any feature views from Feature Store, and the source data tables.
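As a small aside, model versions can also be listed and the default version switched via SQL; a sketch for the SENTENCE_TRANSFORMER model shown earlier (the version name V1 is illustrative):

# List versions of a registered model and promote one to default
session.sql("SHOW VERSIONS IN MODEL SENTENCE_TRANSFORMER").show()
session.sql("ALTER MODEL SENTENCE_TRANSFORMER SET DEFAULT_VERSION = V1").collect()  # V1 is illustrative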
Next, let’s check out the details of the inference service associated with a model version.
- Displays details such as the endpoint, compute pool, number of instances, and functions
- Shows the current status of the inference service, along with logs when the service is running
Finally, let’s take a look at the files associated with a model version.
- Displays artifacts such as serialized model files (e.g., .pkl, .pt, .onnx), config files, or supporting scripts.
- Allows users to upload, download, or delete files.
- Ensures version traceability by associating specific files with each model version.
- Below, we see the files present in the SENTENCE_TRANSFORMER model we registered via the snowflake-ml-python package.
The Model Registry UI is a friendly, one-stop shop for everything related to model management and its associated metadata.
Conclusion
By combining Snowpark Container Services and Model Registry, Snowflake ML enables seamless execution of GPU-based inference for externally trained ML models, regardless of where they were built. This approach:
- Simplifies code deployment.
- Reduces infrastructure management overhead (no Docker dependency).
- Ensures scalability for enterprise ML applications.
Leverage Snowflake’s capabilities to unlock the full potential of your machine learning workflows.
Have questions or want to explore how this solution can work for your organization?
Feel free to reach out to us at appsupport@kipi.ai — we’re here to help!
Appendix
Snowflake Notebooks code snippets for Model Serving
Deploy a Hugging Face sentence transformer for GPU-powered inference using a Snowflake Notebook [1]:
- Install dependencies using pip – this requires External Access Integrations for PyPI and for the external model source (Hugging Face) [2][3]; a rough sketch of the PyPI setup follows the install command.
!pip install sentence_transformers snowflake-ml-python==1.7.5
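The external access setup itself is a one-time, account-level step. Below is a rough sketch for PyPI access (rule and integration names are illustrative; see reference [2] for the exact procedure). The integration then needs to be enabled on the notebook before !pip can reach PyPI.

from snowflake.snowpark.context import get_active_session

session = get_active_session()

# Sketch: network rule + external access integration for PyPI (run with sufficient privileges)
session.sql("""
CREATE OR REPLACE NETWORK RULE pypi_network_rule
  MODE = EGRESS
  TYPE = HOST_PORT
  VALUE_LIST = ('pypi.org', 'pypi.python.org', 'pythonhosted.org', 'files.pythonhosted.org')
""").collect()

session.sql("""
CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION pypi_access_integration
  ALLOWED_NETWORK_RULES = (pypi_network_rule)
  ENABLED = TRUE
""").collect()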
- Import libraries – Snowflake Snowpark, Snowflake ML, and Sentence Transformers
from snowflake.ml.registry import registry
from sentence_transformers import SentenceTransformer
from snowflake.snowpark.context import get_active_session
- Define names for the model, image repository, compute pool, and service. In addition, the compute pool configuration needs to be defined, i.e. the number of nodes and the instance family. (The fully qualified names below use the active session established in the next step.)
model_name = "<model_name>"
image_repo_name = "<snowflake_image_repository_name>"
cp_name = "<compute_pool_name>"
num_spcs_nodes = "<number_of_nodes>"
spcs_instance_family = "<compute_pool_instance_family>"
service_name = "<service_name>"

# Fully qualified names (requires the active session created in the next step)
current_database = session.get_current_database().replace('"', '')
current_schema = session.get_current_schema().replace('"', '')
extended_image_repo_name = f"{current_database}.{current_schema}.{image_repo_name}"
extended_service_name = f"{current_database}.{current_schema}.{service_name}"
- Establish a Snowflake connection and create a Model Registry object using the session.
session = get_active_session()
reg = registry.Registry(session=session)
- Download the sentence transformer model from Hugging Face
embed_model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2", token=False)
- Prepare input data and test embeddings
input_data = [
    "This is the first sentence.",
    "Here's another sentence for testing.",
    "The quick brown fox jumps over the lazy dog.",
    "I love coding and programming.",
    "Machine learning is an exciting field.",
    "Python is a popular programming language.",
    "I enjoy working with data.",
    "Deep learning models are powerful.",
    "Natural language processing is fascinating.",
    "I want to improve my NLP skills.",
]

embeddings = embed_model.encode(input_data)
print(embeddings)
- Log the model to the registry.
_ = reg.log_model(
    embed_model,
    model_name=model_name,
    sample_input_data=input_data,
    pip_requirements=["sentence-transformers", "torch", "transformers"],
)
Required parameters are:
- model
- model_name
- version_name [optional]
- sample_input_data
- pip_requirements
- other parameters
- Get all available model versions
# Get the logged model
m = reg.get_model(model_name)
version_df = m.show_versions()
version_df.head(100)
- Use the latest version of the logged Model
# Select the model based on version
last_version_name = version_df["name"].iloc[-1]
pip_model = m.version(last_version_name)
pip_model
- Validate the compute pool – stop and drop any existing pool with the same name, then create a fresh one
session.sql("show compute pools").show()
session.sql(f"alter compute pool if exists {cp_name} stop all").collect()
session.sql(f"drop compute pool if exists {cp_name}").collect()
session.sql(f"create compute pool {cp_name} min_nodes={num_spcs_nodes} max_nodes={num_spcs_nodes} instance_family={spcs_instance_family} auto_resume=True auto_suspend_secs=300").collect()
session.sql(f"describe compute pool {cp_name}").show()
- Create Image Repository for storing the images built by the service
session.sql(f"create or replace image repository {extended_image_repo_name}").collect()
- Create service using the selected Model version.
pip_model.create_service(
    service_name=extended_service_name,
    service_compute_pool=cp_name,
    image_repo=extended_image_repo_name,
    gpu_requests="1",
    max_instances=int(num_spcs_nodes),
)
Required parameters are:
- service_name
- service_compute_pool
- image_repo
- ingress_enabled
- gpu_requests
- max_instances
- build_external_access_integration
- other parameters
- To list all services running in a model
pip_model.list_services()
- To get the service status
session.sql(f"SELECT VALUE:status::VARCHAR as SERVICESTATUS, VALUE:message::VARCHAR as SERVICEMESSAGE FROM TABLE(FLATTEN(input => parse_json(system$get_service_status('{service_name}')), outer => true)) f").show(100)
- To use the SQL-based service function
# Call the service function via SQL (service_name defined earlier)
session.sql(f"SELECT {service_name}!encode('This is a test sentence.')").show()
- Run model using the service and model version – batch inferencing.
# Run on SPCS
pip_model.run(input_data, function_name="encode", service_name=service_name)
Required parameters:
- input_data
- service_name
- function_name
- other parameters
References
- [1] Deploy a Hugging Face sentence transformer for GPU-powered inference
- [2] External access for Snowflake Notebooks – Create an external access integration for PyPI
- [3] External access for Snowflake Notebooks – Create an external access integration for HuggingFace
About kipi.ai
Kipi.ai, a WNS company, is a leading analytics and AI services provider, specializing in transforming data into actionable insights through advanced analytics, AI, and machine learning. As an Elite Snowflake Partner, we are committed to helping organizations optimize their data strategies, migrate to the cloud, and unlock the full potential of their data. Our deep expertise in the Snowflake AI Data Cloud enables us to drive seamless data migration, enhanced data governance, and scalable analytics solutions tailored to your business needs. At kipi.ai, we empower clients across industries to accelerate their data-driven transformation and achieve unprecedented business outcomes.
For more information, visit www.kipi.ai and www.wns.com.