feat(wren-ai-service): column-based batch to generate semantics #923

Merged
@cyyeh merged 6 commits into main from feat/column-based-batch-generate-semantics on Nov 19, 2024


@paopa commented Nov 18, 2024

This PR enhances the semantics description pipeline with improved model handling, batching capabilities, and API updates.

Key Changes

1. Model Processing Improvements

  • Refactored the picked_models function for better column handling:
    • Added relation_filter to handle relationship columns
    • Introduced column_formatter for consistent column structure
    • Enhanced model property extraction with default description fields
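
The refactoring above can be pictured with a minimal sketch. The names picked_models, relation_filter, and column_formatter come from this PR, but the bodies below are assumptions, not the merged code. In particular, treating a column as a relationship column when it carries a "relationship" key is an assumption about the MDL shape.

```python
from typing import Any


def relation_filter(column: dict[str, Any]) -> bool:
    # Assumption: relationship columns are marked by a "relationship" key.
    return "relationship" not in column


def column_formatter(columns: list[dict[str, Any]]) -> list[dict[str, Any]]:
    # Give every kept column the same shape, with an empty default description.
    return [
        {
            "name": column["name"],
            "type": column["type"],
            "properties": {
                "description": column.get("properties", {}).get("description", ""),
            },
        }
        for column in columns
        if relation_filter(column)
    ]


def picked_models(mdl: dict[str, Any], selected_models: list[str]) -> list[dict[str, Any]]:
    # Keep only the selected models and reduce each to name, columns, description.
    return [
        {
            "name": model["name"],
            "columns": column_formatter(model.get("columns", [])),
            "properties": {
                "description": model.get("properties", {}).get("description", ""),
            },
        }
        for model in mdl.get("models", [])
        if model.get("name") in selected_models
    ]
```

With this shape, a model that has no description still yields a "description" field, so downstream prompts always see the same structure.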

2. Batching Implementation

  • Implemented a column-based batching strategy with a default batch size of 50 columns
  • Added chunking logic to process large models efficiently:
    • Models are processed in chunks while maintaining model integrity
    • Each chunk maintains the complete context for accurate description generation
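
A minimal sketch of the column-based chunking described above, assuming each chunk is a copy of the model that keeps its name and properties and carries a slice of at most 50 columns. The function name chunk_models is hypothetical and is not taken from the PR.

```python
from typing import Any


def chunk_models(models: list[dict[str, Any]], chunk_size: int = 50) -> list[dict[str, Any]]:
    """Split each model into chunks of at most chunk_size columns."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")

    chunks = []
    for model in models:
        columns = model.get("columns", [])
        for start in range(0, len(columns), chunk_size):
            # Copy the model's other fields so every chunk keeps its context.
            chunks.append({**model, "columns": columns[start:start + chunk_size]})
    return chunks
```

Under these assumptions, a model with 70 columns produces two chunks, one with 50 columns and one with 20, and the input model is left unmodified. A model with no columns produces no chunk in this sketch.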

3. API Enhancements

  • Added project_id field to request models for better tracking
  • Updated request/response structures in both semantics description and relationship recommendation endpoints
  • Improved error handling and response aggregation
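
To illustrate response aggregation, here is a minimal sketch of merging per-chunk results back into one entry per model. The result shape (a dict keyed by model name with "description" and "columns") and the function name aggregate are assumptions for illustration, not the service's actual response format.

```python
from typing import Any


def aggregate(chunk_results: list[dict[str, dict[str, Any]]]) -> dict[str, dict[str, Any]]:
    """Merge chunk-level results so each model appears once with all its columns."""
    merged: dict[str, dict[str, Any]] = {}
    for result in chunk_results:
        for model_name, payload in result.items():
            entry = merged.setdefault(model_name, {"description": "", "columns": []})
            # Keep the first non-empty model description seen across chunks.
            entry["description"] = entry["description"] or payload.get("description", "")
            entry["columns"].extend(payload.get("columns", []))
    return merged
```

Columns are appended in chunk order, so the merged list follows the original column order as long as the chunk results are passed in the order the chunks were created.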

4. Documentation Updates

  • Updated API documentation with new fields and examples
  • Simplified model structure in system prompts

Testing

  • Test in API
{
  "selected_models": [
    "example_model"
  ],
  "user_prompt": "",
  "mdl": "{\"models\": [{\"name\": \"example_model\", \"properties\": {\"description\": \"An example model with 70 columns.\"}, \"tableReference\": {\"catalog\": \"wrenai\", \"schema\": \"woocommerce\", \"table\": \"example_table\"}, \"columns\": [{\"name\": \"id\", \"type\": \"bigint\", \"notNull\": true, \"properties\": {\"displayName\": \"ID\", \"description\": \"Unique identifier for the resource.\"}}, {\"name\": \"name\", \"type\": \"varchar\", \"notNull\": true, \"properties\": {\"displayName\": \"Name\", \"description\": \"Name of the resource.\"}}, {\"name\": \"description\", \"type\": \"varchar\", \"notNull\": false, \"properties\": {\"displayName\": \"Description\", \"description\": \"Description of the resource.\"}}, {\"name\": \"created_at\", \"type\": \"date\", \"notNull\": true, \"properties\": {\"displayName\": \"Created At\", \"description\": \"Date when the resource was created.\"}}, {\"name\": \"updated_at\", \"type\": \"date\", \"notNull\": true, \"properties\": {\"displayName\": \"Updated At\", \"description\": \"Date when the resource was last updated.\"}}, {\"name\": \"status\", \"type\": \"varchar\", \"notNull\": false, \"properties\": {\"displayName\": \"Status\", \"description\": \"Current status of the resource.\"}}, {\"name\": \"price\", \"type\": \"decimal\", \"notNull\": true, \"properties\": {\"displayName\": \"Price\", \"description\": \"Price of the resource.\"}}, {\"name\": \"quantity\", \"type\": \"bigint\", \"notNull\": true, \"properties\": {\"displayName\": \"Quantity\", \"description\": \"Available quantity of the resource.\"}}, {\"name\": \"sku\", \"type\": \"varchar\", \"notNull\": false, \"properties\": {\"displayName\": \"SKU\", \"description\": \"Stock Keeping Unit identifier.\"}}, {\"name\": \"category\", \"type\": \"varchar\", \"notNull\": false, \"properties\": {\"displayName\": \"Category\", \"description\": \"Category of the resource.\"}}, {\"name\": \"tags\", \"type\": \"json\", \"notNull\": false, \"properties\": 
{\"displayName\": \"Tags\", \"description\": \"Tags associated with the resource.\"}}, {\"name\": \"image_url\", \"type\": \"varchar\", \"notNull\": false, \"properties\": {\"displayName\": \"Image URL\", \"description\": \"URL of the resource image.\"}}, {\"name\": \"is_active\", \"type\": \"boolean\", \"notNull\": true, \"properties\": {\"displayName\": \"Is Active\", \"description\": \"Indicates if the resource is active.\"}}, {\"name\": \"rating\", \"type\": \"decimal\", \"notNull\": false, \"properties\": {\"displayName\": \"Rating\", \"description\": \"Average rating of the resource.\"}}, {\"name\": \"review_count\", \"type\": \"bigint\", \"notNull\": false, \"properties\": {\"displayName\": \"Review Count\", \"description\": \"Number of reviews for the resource.\"}}, {\"name\": \"shipping_class\", \"type\": \"varchar\", \"notNull\": false, \"properties\": {\"displayName\": \"Shipping Class\", \"description\": \"Shipping class for the resource.\"}}, {\"name\": \"weight\", \"type\": \"decimal\", \"notNull\": false, \"properties\": {\"displayName\": \"Weight\", \"description\": \"Weight of the resource.\"}}, {\"name\": \"dimensions\", \"type\": \"json\", \"notNull\": false, \"properties\": {\"displayName\": \"Dimensions\", \"description\": \"Dimensions of the resource.\"}}, {\"name\": \"meta_data\", \"type\": \"json\", \"notNull\": false, \"properties\": {\"displayName\": \"Meta Data\", \"description\": \"Additional metadata for the resource.\"}}, {\"name\": \"created_by\", \"type\": \"bigint\", \"notNull\": false, \"properties\": {\"displayName\": \"Created By\", \"description\": \"ID of the user who created the resource.\"}}, {\"name\": \"updated_by\", \"type\": \"bigint\", \"notNull\": false, \"properties\": {\"displayName\": \"Updated By\", \"description\": \"ID of the user who last updated the resource.\"}}, {\"name\": \"visibility\", \"type\": \"varchar\", \"notNull\": false, \"properties\": {\"displayName\": \"Visibility\", \"description\": \"Visibility 
status of the resource.\"}}, {\"name\": \"is_featured\", \"type\": \"boolean\", \"notNull\": false, \"properties\": {\"displayName\": \"Is Featured\", \"description\": \"Indicates if the resource is featured.\"}}, {\"name\": \"meta_title\", \"type\": \"varchar\", \"notNull\": false, \"properties\": {\"displayName\": \"Meta Title\", \"description\": \"SEO title for the resource.\"}}, {\"name\": \"meta_description\", \"type\": \"varchar\", \"notNull\": false, \"properties\": {\"displayName\": \"Meta Description\", \"description\": \"SEO description for the resource.\"}}, {\"name\": \"meta_keywords\", \"type\": \"json\", \"notNull\": false, \"properties\": {\"displayName\": \"Meta Keywords\", \"description\": \"SEO keywords for the resource.\"}}, {\"name\": \"custom_field_1\", \"type\": \"varchar\", \"notNull\": false, \"properties\": {\"displayName\": \"Custom Field 1\", \"description\": \"Custom field for additional information.\"}}, {\"name\": \"custom_field_2\", \"type\": \"varchar\", \"notNull\": false, \"properties\": {\"displayName\": \"Custom Field 2\", \"description\": \"Another custom field for additional information.\"}}, {\"name\": \"custom_field_3\", \"type\": \"varchar\", \"notNull\": false, \"properties\": {\"displayName\": \"Custom Field 3\", \"description\": \"Another custom field for additional information.\"}}, {\"name\": \"custom_field_4\", \"type\": \"varchar\", \"notNull\": false, \"properties\": {\"displayName\": \"Custom Field 4\", \"description\": \"Another custom field for additional information.\"}}, {\"name\": \"custom_field_5\", \"type\": \"varchar\", \"notNull\": false, \"properties\": {\"displayName\": \"Custom Field 5\", \"description\": \"Another custom field for additional information.\"}}, {\"name\": \"custom_field_6\", \"type\": \"varchar\", \"notNull\": false, \"properties\": {\"displayName\": \"Custom Field 6\", \"description\": \"Another custom field for additional information.\"}}, {\"name\": \"custom_field_7\", \"type\": 
\"varchar\", \"notNull\": false, \"properties\": {\"displayName\": \"Custom Field 7\", \"description\": \"Another custom field for additional information.\"}}, {\"name\": \"custom_field_8\", \"type\": \"varchar\", \"notNull\": false, \"properties\": {\"displayName\": \"Custom Field 8\", \"description\": \"Another custom field for additional information.\"}}, {\"name\": \"custom_field_9\", \"type\": \"varchar\", \"notNull\": false, \"properties\": {\"displayName\": \"Custom Field 9\", \"description\": \"Another custom field for additional information.\"}}, {\"name\": \"custom_field_10\", \"type\": \"varchar\", \"notNull\": false, \"properties\": {\"displayName\": \"Custom Field 10\", \"description\": \"Another custom field for additional information.\"}}, {\"name\": \"custom_field_11\", \"type\": \"varchar\", \"notNull\": false, \"properties\": {\"displayName\": \"Custom Field 11\", \"description\": \"Another custom field for additional information.\"}}, {\"name\": \"custom_field_12\", \"type\": \"varchar\", \"notNull\": false, \"properties\": {\"displayName\": \"Custom Field 12\", \"description\": \"Another custom field for additional information.\"}}, {\"name\": \"custom_field_13\", \"type\": \"varchar\", \"notNull\": false, \"properties\": {\"displayName\": \"Custom Field 13\", \"description\": \"Another custom field for additional information.\"}}, {\"name\": \"custom_field_14\", \"type\": \"varchar\", \"notNull\": false, \"properties\": {\"displayName\": \"Custom Field 14\", \"description\": \"Another custom field for additional information.\"}}, {\"name\": \"custom_field_15\", \"type\": \"varchar\", \"notNull\": false, \"properties\": {\"displayName\": \"Custom Field 15\", \"description\": \"Another custom field for additional information.\"}}, {\"name\": \"custom_field_16\", \"type\": \"varchar\", \"notNull\": false, \"properties\": {\"displayName\": \"Custom Field 16\", \"description\": \"Another custom field for additional information.\"}}, {\"name\": 
\"custom_field_17\", \"type\": \"varchar\", \"notNull\": false, \"properties\": {\"displayName\": \"Custom Field 17\", \"description\": \"Another custom field for additional information.\"}}, {\"name\": \"custom_field_18\", \"type\": \"varchar\", \"notNull\": false, \"properties\": {\"displayName\": \"Custom Field 18\", \"description\": \"Another custom field for additional information.\"}}, {\"name\": \"custom_field_19\", \"type\": \"varchar\", \"notNull\": false, \"properties\": {\"displayName\": \"Custom Field 19\", \"description\": \"Another custom field for additional information.\"}}, {\"name\": \"custom_field_20\", \"type\": \"varchar\", \"notNull\": false, \"properties\": {\"displayName\": \"Custom Field 20\", \"description\": \"Another custom field for additional information.\"}}, {\"name\": \"custom_field_21\", \"type\": \"varchar\", \"notNull\": false, \"properties\": {\"displayName\": \"Custom Field 21\", \"description\": \"Another custom field for additional information.\"}}, {\"name\": \"custom_field_22\", \"type\": \"varchar\", \"notNull\": false, \"properties\": {\"displayName\": \"Custom Field 22\", \"description\": \"Another custom field for additional information.\"}}, {\"name\": \"custom_field_23\", \"type\": \"varchar\", \"notNull\": false, \"properties\": {\"displayName\": \"Custom Field 23\", \"description\": \"Another custom field for additional information.\"}}, {\"name\": \"custom_field_24\", \"type\": \"varchar\", \"notNull\": false, \"properties\": {\"displayName\": \"Custom Field 24\", \"description\": \"Another custom field for additional information.\"}}, {\"name\": \"custom_field_25\", \"type\": \"varchar\", \"notNull\": false, \"properties\": {\"displayName\": \"Custom Field 25\", \"description\": \"Another custom field for additional information.\"}}, {\"name\": \"custom_field_26\", \"type\": \"varchar\", \"notNull\": false, \"properties\": {\"displayName\": \"Custom Field 26\", \"description\": \"Another custom field for 
additional information.\"}}, {\"name\": \"custom_field_27\", \"type\": \"varchar\", \"notNull\": false, \"properties\": {\"displayName\": \"Custom Field 27\", \"description\": \"Another custom field for additional information.\"}}, {\"name\": \"custom_field_28\", \"type\": \"varchar\", \"notNull\": false, \"properties\": {\"displayName\": \"Custom Field 28\", \"description\": \"Another custom field for additional information.\"}}, {\"name\": \"custom_field_29\", \"type\": \"varchar\", \"notNull\": false, \"properties\": {\"displayName\": \"Custom Field 29\", \"description\": \"Another custom field for additional information.\"}}, {\"name\": \"custom_field_30\", \"type\": \"varchar\", \"notNull\": false, \"properties\": {\"displayName\": \"Custom Field 30\", \"description\": \"Another custom field for additional information.\"}}, {\"name\": \"custom_field_31\", \"type\": \"varchar\", \"notNull\": false, \"properties\": {\"displayName\": \"Custom Field 31\", \"description\": \"Another custom field for additional information.\"}}, {\"name\": \"custom_field_32\", \"type\": \"varchar\", \"notNull\": false, \"properties\": {\"displayName\": \"Custom Field 32\", \"description\": \"Another custom field for additional information.\"}}, {\"name\": \"custom_field_33\", \"type\": \"varchar\", \"notNull\": false, \"properties\": {\"displayName\": \"Custom Field 33\", \"description\": \"Another custom field for additional information.\"}}, {\"name\": \"custom_field_34\", \"type\": \"varchar\", \"notNull\": false, \"properties\": {\"displayName\": \"Custom Field 34\", \"description\": \"Another custom field for additional information.\"}}, {\"name\": \"custom_field_35\", \"type\": \"varchar\", \"notNull\": false, \"properties\": {\"displayName\": \"Custom Field 35\", \"description\": \"Another custom field for additional information.\"}}, {\"name\": \"custom_field_36\", \"type\": \"varchar\", \"notNull\": false, \"properties\": {\"displayName\": \"Custom Field 36\", 
\"description\": \"Another custom field for additional information.\"}}, {\"name\": \"custom_field_37\", \"type\": \"varchar\", \"notNull\": false, \"properties\": {\"displayName\": \"Custom Field 37\", \"description\": \"Another custom field for additional information.\"}}, {\"name\": \"custom_field_38\", \"type\": \"varchar\", \"notNull\": false, \"properties\": {\"displayName\": \"Custom Field 38\", \"description\": \"Another custom field for additional information.\"}}, {\"name\": \"custom_field_39\", \"type\": \"varchar\", \"notNull\": false, \"properties\": {\"displayName\": \"Custom Field 39\", \"description\": \"Another custom field for additional information.\"}}, {\"name\": \"custom_field_40\", \"type\": \"varchar\", \"notNull\": false, \"properties\": {\"displayName\": \"Custom Field 40\", \"description\": \"Another custom field for additional information.\"}}, {\"name\": \"custom_field_41\", \"type\": \"varchar\", \"notNull\": false, \"properties\": {\"displayName\": \"Custom Field 41\", \"description\": \"Another custom field for additional information.\"}}, {\"name\": \"custom_field_42\", \"type\": \"varchar\", \"notNull\": false, \"properties\": {\"displayName\": \"Custom Field 42\", \"description\": \"Another custom field for additional information.\"}}, {\"name\": \"custom_field_43\", \"type\": \"varchar\", \"notNull\": false, \"properties\": {\"displayName\": \"Custom Field 43\", \"description\": \"Another custom field for additional information.\"}}, {\"name\": \"custom_field_44\", \"type\": \"varchar\", \"notNull\": false, \"properties\": {\"displayName\": \"Custom Field 44\", \"description\": \"Another custom field for additional information.\"}}], \"primaryKey\": \"id\"}]}",
  "project_id": "123",
  "configuration": {
    "language": "English",
    "timezone": {
      "name": "Asia/Taipei"
    }
  }
}
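
Note that the mdl field in the payload above is a JSON-encoded string, not a nested object. A minimal sketch of building an equivalent request body in Python follows; the helper name build_request and the generated column names are hypothetical, and only the field names are taken from the payload above.

```python
import json
from typing import Any


def build_request(model_name: str, column_count: int, project_id: str) -> dict[str, Any]:
    """Build a request body whose mdl field is a JSON string, as in the payload above."""
    columns = [
        {"name": f"column_{i}", "type": "varchar", "notNull": False, "properties": {}}
        for i in range(column_count)
    ]
    mdl = {"models": [{"name": model_name, "properties": {}, "columns": columns}]}
    return {
        "selected_models": [model_name],
        "user_prompt": "",
        "mdl": json.dumps(mdl),  # serialized once here; the body is serialized again when sent
        "project_id": project_id,
        "configuration": {"language": "English", "timezone": {"name": "Asia/Taipei"}},
    }
```

Calling build_request("example_model", 70, "123") yields a body with 70 columns, which exceeds the default batch size of 50.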
  • Test in Service
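if __name__ == "__main__":
    import asyncio
    import json
    import uuid

    from langfuse.decorators import langfuse_context

    from src.config import settings
    from src.globals import create_service_container
    from src.providers import generate_components
    from src.utils import load_env_vars

    # Load environment variables before any components are built.
    env = load_env_vars()

    # Read the sample MDL and choose which models to describe.
    with open("sample/xxx_mdl.json", "r") as file:
        mdl = json.load(file)
    models = ["products"]

    # Build the pipeline components and fetch the service from the container.
    # SemanticsDescription is assumed to be defined in the module that holds this block.
    pipe_components = generate_components(settings.components)
    container = create_service_container(pipe_components, settings)
    service: SemanticsDescription = container.semantics_description

    # Register a resource under a fresh ID so the result can be read back later.
    resource_id = str(uuid.uuid4())
    service[resource_id] = SemanticsDescription.Resource(id=resource_id)

    request = SemanticsDescription.Input(
        id=resource_id,
        selected_models=models,
        user_prompt="",
        mdl=json.dumps(mdl),
    )

    # Run the async generation to completion, then print the stored result.
    asyncio.run(service.generate(request))
    print(service[resource_id])

    # Send any buffered Langfuse traces before the process exits.
    langfuse_context.flush()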


@paopa added the module/ai-service and ci/ai-service labels on Nov 18, 2024
@paopa requested a review from cyyeh on November 18, 2024 17:53
@paopa marked this pull request as ready for review on November 18, 2024 17:53
@cyyeh left a comment

left some comments

@cyyeh left a comment

overall LGTM

@cyyeh merged commit fde7b48 into main on Nov 19, 2024
8 checks passed
@paopa deleted the feat/column-based-batch-generate-semantics branch on November 20, 2024 02:43