GoogleCloudPlatform/nl2sql

NL2SQL - A framework to convert natural language questions into SQL queries

Introduction

NL2SQL is a library for building Natural Language to SQL workflows that are composable, explainable and extensible.

  • Composability : The NL2SQL library breaks down the process of translating a business question into a SQL query into smaller, atomic tasks, and provides specialised modules for each of these tasks, allowing you to create end-to-end NL2SQL flows that are fine-tuned and custom built for your data pipelines and your business requirements.

  • Explainability : All of the tasks provide Chain-of-Thought based options that let you see how the LLM is interpreting the problem and strategising a solution. These "thoughts" not only allow post-hoc optimisations to prompts and parameters, but can also be exposed to the end user to help them draft their questions better.

  • Extensibility : The tasks come with tested, well-performing default parameters, but also allow you to deeply customise them. Be it providing a new prompt template, a custom set of examples from your database, or a different LLM - each task is purpose built to accommodate diverse business needs. You can also build your own tasks and chain them with the rest of the workflow to extend your pipeline further.

Logging and Analytics

This framework does not collect any metrics or logs by default. However, you can collect request and execution logs locally by setting the NL2SQL_ENABLE_ANALYTICS environment variable to any truthy value. To have these logs sent to a GCS bucket of your choice, set the NL2SQL_LOG_BUCKET environment variable to the bucket name. To prevent these logs from containing information about the machine running the code, set the NL2SQL_DISABLE_SYSINFO environment variable to any value.
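As a minimal sketch, the three variables above could be set from Python before the library is used. The helper name `configure_nl2sql_logging`, the value `"1"` as a truthy setting, and the bucket name in the usage block are assumptions made for illustration; only the variable names come from this README. Whether the variables must be set before the library is imported is also not stated here, so setting them as early as possible is the cautious choice.

```python
import os
from typing import MutableMapping, Optional


def configure_nl2sql_logging(
    log_bucket: Optional[str] = None,
    hide_sysinfo: bool = True,
    env: Optional[MutableMapping[str, str]] = None,
) -> MutableMapping[str, str]:
    """Set the logging-related environment variables named in this README.

    `env` defaults to os.environ; passing a plain dict is handy for testing.
    """
    target = os.environ if env is None else env

    # Assumption: "1" counts as a truthy value for the framework.
    target["NL2SQL_ENABLE_ANALYTICS"] = "1"

    # Only request upload to GCS when a bucket name is supplied.
    if log_bucket:
        target["NL2SQL_LOG_BUCKET"] = log_bucket

    # The README says any value disables system information in the logs.
    if hide_sysinfo:
        target["NL2SQL_DISABLE_SYSINFO"] = "1"

    return target


if __name__ == "__main__":
    # "my-nl2sql-logs" is a hypothetical bucket name.
    configure_nl2sql_logging(log_bucket="my-nl2sql-logs")
```

The same effect can be had by exporting the variables in the shell that launches your program; the helper only keeps the settings in one place.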

Warnings

  • SQL Accuracy : The SQL generated by this tool may be inefficient, inaccurate or incomplete. Always review and test the generated code before using it. We also recommend setting up periodic audits of the generated results.

  • Data Sensitivity : Exercise caution when using this tool with sensitive or personal data. This framework can send information (sample rows, schema, comments etc.) from the database to LLMs, Vector Databases, etc. as part of the SQL generation pipeline. Ensure this does not violate your privacy policies and regulations. The framework may return improperly constructed SQL queries that can be exploited to gain unauthorized access or cause damage to your database. Always sanitize input parameters and validate generated SQL against known vulnerabilities.

  • Security Risks : Please follow the principle of least privilege while using this framework. It does not handle auth and relies on you to correctly configure access control mechanisms for the environment the code will be running in. Ensure sufficient access restrictions for the account used to run this framework to prevent unintended operations, bills etc. This framework may also auto-execute generated SQL queries for validation purposes, so always run it with read-only permissions to avoid accidental modifications to the database.
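One way to act on the warnings above is to add a conservative check in your own code before any generated query is executed. The sketch below is not part of the NL2SQL library; the function name `is_read_only_select` and the keyword list are assumptions chosen for illustration. It is a simple text heuristic that complements, and does not replace, read-only database permissions. It can reject harmless queries (for example, a string literal that contains the word "drop"), which is the safer direction to err in.

```python
import re

# Keywords that indicate a statement may modify data, schema, or permissions.
_FORBIDDEN = re.compile(
    r"\b(insert|update|delete|merge|drop|alter|create|truncate|grant|revoke|call|execute)\b",
    re.IGNORECASE,
)


def is_read_only_select(sql: str) -> bool:
    """Return True only for a single statement that looks like a plain query."""
    # Ignore surrounding whitespace and one trailing semicolon.
    statement = sql.strip().rstrip(";").strip()

    # Reject empty input and anything that still contains a semicolon,
    # which would suggest more than one statement.
    if not statement or ";" in statement:
        return False

    # The statement must begin with SELECT or WITH (a common table expression).
    if not re.match(r"(select|with)\b", statement, re.IGNORECASE):
        return False

    # Reject if any modifying keyword appears as a whole word.
    return _FORBIDDEN.search(statement) is None


if __name__ == "__main__":
    for candidate in (
        "SELECT name FROM users WHERE id = 1;",
        "SELECT 1; DROP TABLE users",
        "DELETE FROM users",
    ):
        print(candidate, "->", is_read_only_select(candidate))
```

A query that passes this check should still be reviewed and run under an account that cannot write to the database.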

Disclaimer

This is not an official Google product.
