LLM isn't still following pydantic format for SQL #1906

Emekaborisama · 2024-12-08T13:41:10Z

I have made several attempts, but the inconsistency is concerning. Is there another way to ensure the LLM adheres to the guided pedantic format in Dspy?

import dspy
import os
from dotenv import load_dotenv
load_dotenv()

class SQLResponse(BaseModel):
    """Model for SQL query responses"""
    query: str = Field(description="The generated SQL query")
    is_safe: bool = Field(
        default=True,
        description="Whether the query is considered safe to execute"
    )

lm = dspy.LM(model=os.getenv("AZURE_LLAMA_MODEL_NAME", None), api_key=os.getenv("AZURE_LLAMA_API_KEY", None), api_base=os.getenv("AZURE_LLAMA_ENDPOINT", None))
dspy.settings.configure(lm=lm)

class DivideConquerAgent(dspy.Signature):
    """
    # SQL Query Generator

    You are a Divide-and-Conquer Senior SQL Programmer at Bombe (customer behavior analytics platform at postcode level), You decompose a question into a set of sub-questions. Your task is to transform natural language queries into simple optimized SQL while following these guidelines:

      ## Query Development Strategy
      - Break down complex queries into smaller components:
        - Keep your query simple.
        - Identify core tables and relationships.
        - Build base queries first.
        - Add filters and aggregations incrementally.

      ## Technical Standards.
        - Use indexes where available.
        - Minimize subqueries.
        - LIMIT 100 for all queries.
        - Use documented columns only.
        - Only use pattern match with LIKE.
        - No dynamic parameters.
        - Maintain proper key relationships in joins.

      ## JOIN RULES:
        - Default to LEFT JOIN to keep all left table records
        - Use INNER JOIN only for mandatory matches or to exclude NULLs
      
      RETURN RULE:
        - Just the sql query without any comments or markdown formatting.


      <examples>

      - question: Postcodes with the top number of conservative shoppers
      - SQL query:
      WITH shopper_base AS (
      -- Step 1: Get base shopper data
      SELECT
        external_id,
        value
      FROM persona_geography_data.geography_data_rows
      WHERE segment IN (
        SELECT code
        FROM personas
        WHERE label LIKE '%Conservative%'
      )
    ),
    postcode_aggregation AS (
      -- Step 2: Aggregate by postcode
      SELECT
        g.name AS postcode,
        SUM(sb.value) AS total_conservative
      FROM shopper_base sb
      LEFT JOIN geographies g ON sb.external_id = g.external_id
      GROUP BY g.name
    )
    -- Step 3: Final output with ranking
    SELECT
      postcode,
      total_conservative,
      RANK() OVER (ORDER BY total_conservative DESC) as conservative_rank
    FROM postcode_aggregation
    ORDER BY total_conservative DESC
    LIMIT 100;

      - question: Get persona data for postcode E11 3QA
      - query:
        SELECT *
        FROM postal_code_uks pc
        LEFT JOIN geography_data_rows gdr
        ON pc.pcd = 'E11 3QA'
        AND pc.oa01 = gdr.external_id
        LEFT JOIN personas p
        ON gdr.segment = p.code;
    </examples>
    """

    context: str = dspy.InputField(desc="Will contain the table names and its columns names, data types, can be multiple tables")
    question: str = dspy.InputField()
    answer: SQLResponse = dspy.OutputField(desc="SQL query that will work always, without syntax errors" )


class DivideConquerSqlProgrammerAgent(dspy.Module):
    def __init__(self):
        super().__init__()
        self.generate_answer = dspy.ChainOfThought(DivideConquerAgent)


    def forward(self, question: str, re_tables: list[str], relevant_table_data):
        context =f"""

        <relevant_tables>
        {re_tables}

        </relevant_tables>

        <schemas>

        {relevant_table_data}

        </schemas>

        <notes>:
        - Use date({str(datetime.now())}) for current date.

        </notes>
        """
        prediction = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=prediction.answer)

The text was updated successfully, but these errors were encountered:

okhat · 2024-12-08T14:23:26Z

Hey @Emekaborisama ! How are you setting up your LM and which LM is it?

What is the pydantic element here? I don't see one. Is it SQLResponse? What's the definition of this?

I don't really see much of DSPy in the example code above. This is more or less a hand-written prompt.

Emekaborisama · 2024-12-08T15:04:52Z

Hi @okhat Thanks alot for your response. really appreciate

just updated the code. interesting.

How would you do this differently?

thomasahle · 2024-12-10T15:40:14Z

Have you tried adding a pydantic validator for sql?
e.g.

class SQLValidator(BaseModel):
    sql_statement: str

    @validator('sql_statement')
    def validate_sql(cls, v):
        if not isinstance(v, str):
            raise ValueError('SQL statement must be a string')
            
        try:
            parsed = sqlparse.parse(v)
            if not parsed:
                raise ValueError('Empty SQL statement')
            return v
        except Exception as e:
            raise ValueError(f'Invalid SQL syntax: {str(e)}')

Emekaborisama · 2024-12-10T17:51:32Z

yes, i did

okhat changed the title ~~I am tired of using Dspy and the llm isn't still following the guided output pydantic format.~~ LLM isn't still following pydantic format for SQL Dec 8, 2024

okhat closed this as completed Jan 1, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LLM isn't still following pydantic format for SQL #1906

LLM isn't still following pydantic format for SQL #1906

Emekaborisama commented Dec 8, 2024 •

edited

Loading

okhat commented Dec 8, 2024

Emekaborisama commented Dec 8, 2024 •

edited

Loading

thomasahle commented Dec 10, 2024

Emekaborisama commented Dec 10, 2024

LLM isn't still following pydantic format for SQL #1906

LLM isn't still following pydantic format for SQL #1906

Comments

Emekaborisama commented Dec 8, 2024 • edited Loading

okhat commented Dec 8, 2024

Emekaborisama commented Dec 8, 2024 • edited Loading

thomasahle commented Dec 10, 2024

Emekaborisama commented Dec 10, 2024

Emekaborisama commented Dec 8, 2024 •

edited

Loading

Emekaborisama commented Dec 8, 2024 •

edited

Loading