Authors: Dikshant Dharamnarain Jopat, Sovan Saha, Vijaysai Turai, and Sridevi Jaddu
Streamlining Your Data Workflow
In today’s fast-paced data landscape, effective version control and seamless integration are paramount. Snowflake’s Git integration offers a secure and streamlined way to connect your Snowflake environment with a remote Git repository. By leveraging this integration, you can synchronize files from your Git repository to a specialized stage within Snowflake, known as a repository stage. This repository stage functions like a local Git repository, containing a full clone of your remote repository complete with branches, tags, and commits.
With this setup, Snowflake provides a single source of truth for all your SQL scripts, Snowpark functions, procedures, Native Apps, and Streamlit Apps, ensuring that your codebase is consistent and up-to-date across your entire team. By having the repository contents synchronized to your Snowflake account, you can reference and utilize these files directly within Snowflake, just as you would with any other staged file.
Currently, Snowflake supports Git integration with several major platforms, including GitHub, GitLab, BitBucket, and Azure DevOps. This broad compatibility ensures that no matter where your code is hosted, you can seamlessly integrate it into your Snowflake workflows, enhancing collaboration, efficiency, and version control in your data operations.
Steps to set up
Step 1: Create a secret, to contain credentials for authenticating with the repository.
CREATE OR REPLACE SECRET my_secret
TYPE = PASSWORD
USERNAME = ‘test’
PASSWORD = ‘github_sjhsdjkfdsdcbcuilbjdsskahcncmjkdlhfsjdcnmdlaskjkcxm’
COMMENT = ‘my git login’ ;
Step 2: Create an API integration to specify details about Snowflake interaction with the Git repository API.
CREATE OR REPLACE API INTEGRATION git_api_integration
API_PROVIDER = git_https_api
API_ALLOWED_PREFIXES = (‘https://github.com/test-repo’)
ALLOWED_AUTHENTICATION_SECRETS = (my_secret)
ENABLED = TRUE;
Step 3: Create a Snowflake Git repository stage to which you can synchronize files from the repository.
CREATE OR REPLACE GIT REPOSITORY git_repo
API_INTEGRATION = git_api_integration
ORIGIN = ‘https://github.com/test-repo/test-main.git’
GIT_CREDENTIALS = ‘my_secret’
COMMENT = ‘my git repo’ ;
Git Operations
-> Refresh a repository stage from the repository
alter git repository git_repo fetch;
-> View a list of repository branches or tags
show git branches in git_repo;
-> View a list of repository files
list by branch name: ls @git_repo/branches/adding-schema-call-block-to-ci-cd-main;
list by tag name: ls @repository_stage_name/tags/tag_name;
list by commit hash: ls @git_repo/commits/ac9aaa8a09755ffbf7543deb7656fb06cf604703;
-> View repository stage properties
describe git repository git_repo;
-> Execute code from a repository
execute immediate from ‘@git_repo/branches/main/Code Base/SPFUNCTIONCHANGES.sql’;
-> View available repositories
show git repositories;
-> View available tags in the repository
show git tags in git_repo;
-> To drop an existing repository
drop git repository git_repo;
Use Cases
SQL Scripts
With Snowflake’s git integration, we can have all the SQL scripts like stored procedures, tasks, table create statements, etc., maintained in git repo and can be used directly in Snowflake. With execute immediate statements, we can execute SQL scripts in git repository in Snowflake
EXECUTE IMMEDIATE FROM @snowflake_extensions/branches/main/sql/create-database.sql;
Snowpark Python
You can have your Snowflake stored procedure that imports a file from git repo and uses it as a source for the stored procedure. As and when you update the file in the git repo and refresh the git stage in Snowflake stored procedure will automatically update.
create or replace procedure filter_by_role(tablename varchar, role varchar)
returns table(id number, name varchar, role varchar)
language python
runtime_version = ‘3.8’
packages = (‘snowflake-snowpark-python’)
imports = (‘@example_db.example_schema.snowflake_extensions/branches/main/python-handlers/filter.py’)
handler = ‘filter.filter_by_role’;
Use a Git repository file to configure new accounts
With EXECUTE IMMEDIATE FROM, you can execute (from any Snowflake session) scripts you manage in your Git repository. For example, you might have a script that sets up every new Snowflake account in your organization. The script might contain statements to create users, roles, objects, and grant privileges on the account and objects.
- Create the file setup.sql with the following contents:
- CREATE ROLE analyst;
- CREATE USER gladys;
- GRANT ROLE analyst TO USER gladys;
- SHOW GRANTS TO USER gladys;
- Commit your SQL file to your Git repository. Use the git command line tool to commit the file to your Git repository:
- git add scripts/setup.sql
- git commit -m “Adding code to set up new accounts”
- git push
- In Snowflake, refresh the repository stage:
- alter git repository configuration_repo fetch;
- In Snowflake, execute the file in your repository stage:
- execute immediate from @configuration_repo/branches/main/scripts/setup.sql;
Streamlit
You can have streamlit code maintained in git repository and use it in your Snowflake account
create streamlit git_sample
root_location=’@snowflake_extensions/branches/main/streamlit’
main_file=’streamlitapp.py’;
Current limitations
->Interaction between Snowflake and a Git repository is currently read-only, in which repository files may be read from the repository, but not written to it.
->Access to Git repositories from Snowflake is supported only over the public internet. For example, access is not supported to repositories that are behind a private network.
->Sharing repository stages is not supported through data sharing or apps built on the Snowflake Native App Framework.
->Creating repository stages inside application packages is not supported and might be blocked in the future.
->Creating repository stages inside native applications on the consumer side is not supported.
Cost Implication
Storage cost: Incurs stage storage cost
Compute cost: All the git operations utilize compute of custom warehouse and compute cost depends on the time it takes for each git operation
Conclusion
This overview covers key scenarios, though it’s not exhaustive there are additional use cases, such as integrating git with Native apps packages. Hopefully, this guide provides a solid foundation to start leveraging Snowflake’s git integration. This blog represents the initial phase of a border integration between git and snowflake. Stay tuned for future updates and enhancements that will further streamline the management of your snowflake objects through source control.