Finalize spec for index schema and fields #103

Merged: 10 commits, merged Feb 1, 2024

@tylerhutcherson (Collaborator) commented Jan 30, 2024

Advanced search operations require the ability to define rules and configurations about what data should be indexed, how it should be indexed, and how the database should handle this data on read/write. Redis handles this declaratively, allowing developers to explicitly control each of these components of the search pattern.

An IndexSchema is what RedisVL uses to simplify this approach for users. The goal is not to replace the core search API, but rather to create an easier on-ramp for developers who are less familiar with Redis, such as AI-native developers.

This PR significantly overhauls the v0 schema implementation in RedisVL, with the additional goal of incorporating feedback from across the company, product, and users to establish a better long-term approach for the ecosystem.

Below is a quick example from the class docstring:

      from redisvl.schema import IndexSchema

      # From YAML
      schema = IndexSchema.from_yaml("schema.yaml")

      # From Dict
      schema = IndexSchema.from_dict({
          "index": {
              "name": "user-index",
              "prefix": "user",
              "storage_type": "json",
          },
          "fields": [
              {"name": "user", "type": "tag"},
              {"name": "credit_score", "type": "tag"},
              {
                  "name": "embedding",
                  "type": "vector",
                  "attrs": {
                      "algorithm": "flat",
                      "dims": 3,
                      "distance_metric": "cosine",
                      "datatype": "float32"
                  }
              }
          ]
      })
    Note:
        The `fields` attribute in the schema must contain unique field names to ensure
        correct and unambiguous field references.
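The uniqueness rule in the note above can be checked on dict-style field definitions before a schema is built. The following is a minimal plain-Python sketch; `find_duplicate_field_names` and `assert_unique_field_names` are hypothetical helpers, not part of RedisVL, and RedisVL's own validation may report the problem differently.

```python
from collections import Counter
from typing import Any, Dict, List


def find_duplicate_field_names(fields: List[Dict[str, Any]]) -> List[str]:
    """Return field names that occur more than once, in first-seen order."""
    counts = Counter(field["name"] for field in fields)
    # dict.fromkeys drops repeated names while preserving first-seen order
    unique_in_order = dict.fromkeys(field["name"] for field in fields)
    return [name for name in unique_in_order if counts[name] > 1]


def assert_unique_field_names(fields: List[Dict[str, Any]]) -> None:
    """Raise ValueError if any field name is repeated (hypothetical helper)."""
    duplicates = find_duplicate_field_names(fields)
    if duplicates:
        raise ValueError(f"Duplicate field names: {duplicates}")


fields = [
    {"name": "user", "type": "tag"},
    {"name": "credit_score", "type": "tag"},
    {"name": "embedding", "type": "vector"},
]
assert_unique_field_names(fields)  # passes: every name is distinct
```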

IndexSchema Components

Index Info

Each index has a set of attributes, such as the index name, the chosen key prefix, and the underlying storage_type. RedisVL also introduces a key_separator setting, which lets you customize the character used to separate the prefix from the identifier in the Redis key.
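To make the role of key_separator concrete, here is a minimal sketch assuming a key is simply the prefix, then the separator, then the record identifier. `make_redis_key` is a hypothetical helper; RedisVL's internal key construction may differ, for example in how it treats a prefix that already ends with the separator.

```python
def make_redis_key(prefix: str, key_separator: str, identifier: str) -> str:
    """Join prefix and identifier with the separator (hypothetical helper).

    Assumption of this sketch: if the prefix already ends with the separator,
    the separator is not added a second time.
    """
    if prefix.endswith(key_separator):
        return f"{prefix}{identifier}"
    return f"{prefix}{key_separator}{identifier}"


print(make_redis_key("user", ":", "123"))  # user:123
print(make_redis_key("rvl", "|", "abc"))   # rvl|abc
```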

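from enum import Enum

from pydantic import BaseModel


class StorageType(Enum):
    # Minimal stand-in for RedisVL's storage type enum so this snippet runs on its own
    HASH = "hash"
    JSON = "json"


class IndexInfo(BaseModel):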
    """
    Represents the basic configuration information for an index in Redis.

    This class includes the essential details required to define an index, such as
    its name, prefix, key separator, and storage type.
    """

    name: str
    """The unique name of the index."""
    prefix: str = "rvl"
    """The prefix used for Redis keys associated with this index."""
    key_separator: str = ":"
    """The separator character used in Redis keys."""
    storage_type: StorageType = StorageType.HASH
    """The storage type used in Redis (e.g., 'hash' or 'json')."""

Index info can be parsed from a YAML or dict-like representation:

index:
  name: user-index-v1
  prefix: user
  key_separator: ':'
  storage_type: json
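Once the `index:` block above is loaded from YAML into a Python dict, its keys map directly onto the IndexInfo attributes and defaults. The stdlib-only sketch below mirrors that mapping; `IndexInfoSketch`, its `from_dict` method, and the local `StorageType` enum are illustrative stand-ins (RedisVL itself uses the pydantic model shown earlier), and the YAML-loading step is skipped by writing the dict inline.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict


class StorageType(Enum):
    # Stand-in for RedisVL's storage type enum
    HASH = "hash"
    JSON = "json"


@dataclass
class IndexInfoSketch:
    """Stdlib stand-in for IndexInfo, using the same default values."""
    name: str
    prefix: str = "rvl"
    key_separator: str = ":"
    storage_type: StorageType = StorageType.HASH

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "IndexInfoSketch":
        """Build from a dict such as the parsed `index:` block."""
        if "name" not in data:
            raise ValueError("index info requires a 'name'")
        info = dict(data)
        if "storage_type" in info:
            # Look up the enum member by value; unknown values raise ValueError
            info["storage_type"] = StorageType(info["storage_type"])
        return cls(**info)


index_block = {  # same content as the YAML above
    "name": "user-index-v1",
    "prefix": "user",
    "key_separator": ":",
    "storage_type": "json",
}
info = IndexInfoSketch.from_dict(index_block)
print(info.storage_type.value)                          # json
print(IndexInfoSketch.from_dict({"name": "x"}).prefix)  # rvl
```

Unrecognized keys make the dataclass constructor raise a TypeError, which loosely parallels how a strict model rejects unexpected settings.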

Fields

Fields are a list of field definitions to include in the Redis index described by the index info above. Each field has a name, a type, an optional path (JSON indexes only), and optional attrs (per-field settings):

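from typing import Optional, Union

from pydantic import BaseModel

# BaseFieldAttributes and BaseVectorFieldAttributes are RedisVL's per-field
# attribute models, defined elsewhere in the schema code (not shown here).


class BaseField(BaseModel):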
    """Base field"""
    name: str
    """Field name"""
    type: str
    """Field type"""
    path: Optional[str] = None
    """Field path (within JSON object)"""
    attrs: Optional[Union[BaseFieldAttributes, BaseVectorFieldAttributes]] = None
    """Specified field attributes"""

Fields can be listed in either a dictionary or YAML representation like the following:

fields:
  - name: user
    type: tag
    path: '$.user'
  - name: credit_score
    type: tag
    path: '$.credit_score'
  - name: embedding
    type: vector
    path: '$.embedding'
    attrs:
      algorithm: flat
      dims: 3
      distance_metric: cosine
      datatype: float32

Version

The schema version is locked at 0.1.0. The pydantic model for IndexSchema prevents users from mistyping the version or using a schema version that does not match this release of the library; the value is fixed.

version: '0.1.0'
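The effect of a fixed version can be pictured as a guard that accepts only "0.1.0". In a pydantic model this kind of constant is commonly expressed with a `Literal["0.1.0"]` annotation; the stdlib sketch below shows the same behavior with a hypothetical `check_schema_version` helper, and treating a missing version as the supported default is an assumption of this sketch.

```python
SUPPORTED_SCHEMA_VERSION = "0.1.0"  # the fixed version described above


def check_schema_version(schema: dict) -> str:
    """Return the schema version, rejecting anything but the supported one.

    Hypothetical helper; assumes a missing version means the supported default.
    """
    version = schema.get("version", SUPPORTED_SCHEMA_VERSION)
    if version != SUPPORTED_SCHEMA_VERSION:
        raise ValueError(
            f"Unsupported schema version {version!r}; "
            f"expected {SUPPORTED_SCHEMA_VERSION!r}"
        )
    return version


print(check_schema_version({"version": "0.1.0"}))  # 0.1.0
```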

@tylerhutcherson tylerhutcherson added the enhancement New feature or request label Jan 30, 2024
@tylerhutcherson tylerhutcherson marked this pull request as ready for review January 30, 2024 15:58
@codecov-commenter commented Jan 30, 2024

Codecov Report

Attention: 10 lines in your changes are missing coverage. Please review.

Comparison: base (c3e036b) at 77.16% coverage vs. head (212083b) at 77.32%.

| File | Patch % | Lines |
|---|---|---|
| redisvl/schema/schema.py | 90.12% | 8 Missing ⚠️ |
| redisvl/index.py | 92.30% | 1 Missing ⚠️ |
| redisvl/llmcache/semantic.py | 93.33% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #103      +/-   ##
==========================================
+ Coverage   77.16%   77.32%   +0.15%     
==========================================
  Files          22       23       +1     
  Lines        1384     1429      +45     
==========================================
+ Hits         1068     1105      +37     
- Misses        316      324       +8     

@tylerhutcherson tylerhutcherson added the documentation Improvements or additions to documentation label Jan 31, 2024
@tylerhutcherson tylerhutcherson changed the title Harden index schema and fields Finalize spec for index schema and fields Jan 31, 2024
@tylerhutcherson tylerhutcherson added the breakingchange breaking change to API label Jan 31, 2024
@tylerhutcherson tylerhutcherson merged commit 8a55e35 into main Feb 1, 2024
18 checks passed
@tylerhutcherson tylerhutcherson deleted the harden-0.1-schema branch February 6, 2024 19:34