Apply more fixes for Pydantic schema incompatibilities with OpenAI structured outputs #1659

Open
1 task done
mcantrell opened this issue Aug 17, 2024 · 2 comments

@mcantrell

Confirm this is a feature request for the Python library and not the underlying OpenAI API.

  • This is a feature request for the Python library

Describe the feature or improvement you're requesting

I noticed that you guys are doing some manipulation of Pydantic's generated schema to ensure compatibility with the API's schema validation. I found a few more instances that can be addressed:

Issues:

  • optional fields with pydantic defaults generate an unsupported 'default' field in the schema
  • datetime fields generate a format='date-time' keyword in the schema, which is not supported
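To make the two issues concrete, here is a minimal sketch that locates the offending keywords in a schema. The `EXAMPLE_SCHEMA` dict is hand-written to approximate what Pydantic v2 emits for an `Optional[datetime]` field with a `None` default, and `find_keywords` is a hypothetical helper, not part of the openai or pydantic libraries.

```python
from typing import Any, List, Set

# Hand-written approximation of a Pydantic v2 schema (an assumption for
# illustration, not captured library output).
EXAMPLE_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "published": {
            "anyOf": [{"type": "string", "format": "date-time"}, {"type": "null"}],
            "default": None,
        },
    },
    "required": ["title"],
}


def find_keywords(node: Any, keywords: Set[str], path: str = "$") -> List[str]:
    """Return the paths of every subschema keyword whose name is in `keywords`."""
    hits: List[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            child_path = f"{path}.{key}"
            if key in ("properties", "$defs") and isinstance(value, dict):
                # Keys in these mappings are user-chosen names, not schema
                # keywords, so only their values are inspected.
                for name, sub in value.items():
                    hits.extend(find_keywords(sub, keywords, f"{child_path}.{name}"))
                continue
            if key in keywords:
                hits.append(child_path)
                continue  # do not descend into the keyword's own value
            hits.extend(find_keywords(value, keywords, child_path))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            hits.extend(find_keywords(item, keywords, f"{path}[{index}]"))
    return hits


for hit in find_keywords(EXAMPLE_SCHEMA, {"default", "format"}):
    print(hit)
```

Running a check like this on the output of to_strict_json_schema shows where the unsupported keywords sit before the request is sent.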

The test case below builds on your to_strict_json_schema function and addresses these problematic fields with the remove_property_from_schema function:

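import json
import logging
from datetime import datetime
from typing import List, Optional

from openai import OpenAI
# Internal helper; the import path is an assumption and may change between releases.
from openai.lib._pydantic import to_strict_json_schema
from pydantic import BaseModel, Field

logger = logging.getLogger(__name__)


class Publisher(BaseModel):
    name: str = Field(description="The name of the publisher")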
    url: Optional[str] = Field(None, description="The URL of the publisher's website")
    class Config:
        json_schema_extra = {
            "additionalProperties": False
        }

class Article(BaseModel):
    title: str = Field(description="The title of the news article")
    published: Optional[datetime] = Field(None, description="The date the article was published. Use ISO 8601 to format this value.")
    publisher: Optional[Publisher] = Field(None, description="The publisher of the article")
    class Config:
        json_schema_extra = {
            "additionalProperties": False
        }
        
class NewsArticles(BaseModel):
    query: str = Field(description="The query used to search for news articles")
    articles: List[Article] = Field(description="The list of news articles returned by the query")
    class Config:
        json_schema_extra = {
            "additionalProperties": False
        }
    

def test_schema_compatible():
    client = OpenAI()
    
    # build on the internals that the openai client uses to clean up the pydantic schema for the openai API
    schema = to_strict_json_schema(NewsArticles)
    
    # optional fields with pydantic defaults generate an unsupported 'default' field in the schema
    remove_property_from_schema(schema, "default")
    # date fields generate a format='date-time' field in the schema which is not supported
    remove_property_from_schema(schema, "format")
        
    logger.info("Generated Schema: %s", json.dumps(schema, indent=2))
    completion = client.beta.chat.completions.parse(
        model="gpt-4o-2024-08-06",
        temperature=0,
        messages=[
            {
                "role": "user",
                "content": "What were the top headlines in the US for January 6th, 2021?",
            }
        ],
        response_format={
            "type": "json_schema",
            "json_schema": {
                "schema": schema,
                "name": "NewsArticles",
                "strict": True,
            }
        }
    )
    result = NewsArticles.model_validate_json(completion.choices[0].message.content)
    assert result is not None



def remove_property_from_schema(schema: dict, property_name: str) -> None:
    """Remove `property_name` from each field schema, in place."""
    if 'properties' in schema:
        for field in schema['properties'].values():
            # Inline nested objects carry their own 'properties'.
            if 'properties' in field:
                remove_property_from_schema(field, property_name)
            # Optional[...] fields are emitted as anyOf: [<type>, null].
            if 'anyOf' in field:
                for any_of in field['anyOf']:
                    any_of.pop(property_name, None)
            field.pop(property_name, None)
    # Referenced models live under '$defs'.
    if '$defs' in schema:
        for definition in schema['$defs'].values():
            remove_property_from_schema(definition, property_name)
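
The helper above covers the schema shapes in this test case, but it does not descend into array `items` or into objects nested inside `anyOf`. A more general variant might look like the sketch below. `strip_keyword` is a hypothetical name, and the traversal assumes the schema is a well-formed JSON Schema dict; it is not what the library does internally.

```python
from typing import Any


def strip_keyword(node: Any, keyword: str) -> Any:
    """Recursively remove `keyword` from every subschema, in place."""
    if isinstance(node, dict):
        node.pop(keyword, None)
        for key, value in node.items():
            if key in ("properties", "$defs") and isinstance(value, dict):
                # These mappings are keyed by user-chosen names, so a field
                # that happens to be called "format" must be kept; only the
                # subschemas stored as values are cleaned.
                for sub in value.values():
                    strip_keyword(sub, keyword)
            else:
                strip_keyword(value, keyword)
    elif isinstance(node, list):
        for item in node:
            strip_keyword(item, keyword)
    return node


schema = {
    "type": "object",
    "properties": {
        "format": {"type": "string"},  # a field literally named "format"
        "dates": {
            "type": "array",
            "items": {"type": "string", "format": "date-time"},
        },
    },
}
strip_keyword(schema, "format")
print(schema)
```

One caveat of this sketch: it also descends into `enum` and `const` values, so literal dict values that contain the keyword would be altered too.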

Additional context

No response

@micahstairs

@RobertCraigie Thanks for fixing one of the issues! Do you have an ETA on the fix for the "format" issue?

@RobertCraigie
Collaborator

There are currently no plans to automatically remove "format": "date-time" as it breaks .parse()'s promise that it will either generate valid data or refuse to generate any data.

We're considering opt-in flags to remove certain features that the API doesn't support yet but I don't have an ETA to share unfortunately.
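
To illustrate the concern about .parse()'s guarantee: once "format": "date-time" is stripped, the schema only requires a string, so the model can return text that satisfies the schema but fails datetime validation on the client. The sketch below uses the standard library's `datetime.fromisoformat` as a stand-in for Pydantic's own parser (an assumption; the two do not accept exactly the same inputs), and `parses_as_datetime` is a hypothetical helper.

```python
from datetime import datetime


def parses_as_datetime(value: str) -> bool:
    """Return True if `value` is accepted by datetime.fromisoformat."""
    try:
        datetime.fromisoformat(value)
    except ValueError:
        return False
    return True


# Both strings satisfy {"type": "string"}, but only the first one is a
# usable datetime once the format constraint is gone.
for candidate in ["2021-01-06T00:00:00", "January 6th, 2021"]:
    print(candidate, parses_as_datetime(candidate))
```

This is why the workaround in the issue leans on the field description ("Use ISO 8601 to format this value") and on model_validate_json to catch anything that slips through.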

3 participants