NL2SQL is a library for building Natural Language to SQL workflows that are composable, explainable and extensible.
- Composability: The NL2SQL library breaks the translation of a business question into a SQL query down into smaller, atomic tasks and provides a specialised module for each one. This lets you create end-to-end NL2SQL flows that are fine-tuned and custom-built for your data pipelines and business requirements.
- Explainability: All of the tasks provide Chain-of-Thought-based options that let you see how the LLM is interpreting the problem and strategising a solution. These "thoughts" not only allow post-hoc optimisation of prompts and parameters, but can also be exposed to end users to help them draft better questions.
- Extensibility: The tasks come with tested, well-performing default parameters, but also allow deep customisation. Whether you supply a new prompt template, a custom set of examples from your database, or a different LLM, each task is purpose-built to accommodate diverse business needs. You can also build your own tasks and chain them with the rest of the workflow to extend your pipeline further.
This framework does not collect any metrics or logs by default. However, you can collect request and execution logs locally by setting the NL2SQL_ENABLE_ANALYTICS environment variable to any truthy value. To have these logs sent to a GCS bucket of your choice, set the NL2SQL_LOG_BUCKET environment variable to the bucket name. To prevent these logs from containing information about the machine running the code, set the NL2SQL_DISABLE_SYSINFO environment variable to any value.
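
The logging options above can be sketched in Python as follows. This is a minimal illustration, not part of the library: the helper `configure_nl2sql_logging` and the bucket name `my-nl2sql-logs` are hypothetical, `"1"` is assumed to count as a truthy value, and setting the variables before importing the library is also an assumption. Only the three variable names come from the text above.

```python
import os
from typing import Optional


def configure_nl2sql_logging(bucket: Optional[str] = None,
                             hide_sysinfo: bool = True) -> None:
    """Hypothetical helper: set the logging-related environment variables."""
    # Opt in to local collection of request and execution logs.
    # "1" is assumed to be accepted as a truthy value.
    os.environ["NL2SQL_ENABLE_ANALYTICS"] = "1"

    # Optionally name a GCS bucket that should receive the logs.
    if bucket is not None:
        os.environ["NL2SQL_LOG_BUCKET"] = bucket

    # Any value keeps machine information out of the logs.
    if hide_sysinfo:
        os.environ["NL2SQL_DISABLE_SYSINFO"] = "1"


# Assumption: call this before importing the library so the settings are
# already in place when it reads its configuration.
configure_nl2sql_logging(bucket="my-nl2sql-logs")  # hypothetical bucket name
```

The same variables can be exported from the shell or a deployment manifest; the Python form is only convenient in notebooks and scripts.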
- SQL Accuracy: The SQL generated by this tool may be inefficient, inaccurate or incomplete. Always review and test the generated code before using it. We also recommend setting up periodic audits of the generated results.
- Data Sensitivity: Exercise caution when using this tool with sensitive or personal data. This framework can send information (sample rows, schema, comments etc.) from the database to LLMs, Vector Databases, etc. as part of the SQL generation pipeline. Ensure this does not violate your privacy policies and regulations. The framework may return improperly constructed SQL queries that can be exploited to gain unauthorized access or cause damage to your database. Always sanitize input parameters and validate generated SQL against known vulnerabilities.
- Security Risks: Follow the principle of least privilege when using this framework. It does not handle authentication and relies on you to correctly configure access control for the environment the code runs in, so make sure the account used to run it is restricted enough to prevent unintended operations, unexpected bills and similar side effects. The framework may also auto-execute generated SQL queries for validation purposes. Always run it with read-only permissions to avoid accidental modifications to the database.
This is not an official Google product.