VectorIndexDefinition: Adds Support for Partitioned DiskANN #4792

Merged

@kundadebdatta kundadebdatta commented Oct 10, 2024

Pull Request Template

Description

This PR adds optional attributes to the VectorIndexDefinition class to support partitioned DiskANN. A typical container definition that uses the new attributes (quantizationByteSize, indexingSearchListSize, and vectorIndexShardKey) looks like this:

{
    "indexingPolicy": {
        "automatic": true,
        "indexingMode": "Consistent",
        "includedPaths": [
            {
                "path": "/*",
                "indexes": []
            }
        ],
        "excludedPaths": [],
        "compositeIndexes": [],
        "spatialIndexes": [],
        "vectorIndexes": [
            {
                "path": "/vector1",
                "type": "flat"
            },
            {
                "path": "/vector2",
                "type": "quantizedFlat",
                "quantizationByteSize": 3,
                "vectorIndexShardKey": [
                    "/Country"
                ]
            },
            {
                "path": "/vector3",
                "type": "diskANN",
                "quantizationByteSize": 2,
                "indexingSearchListSize": 100,
                "vectorIndexShardKey": [
                    "/ZipCode"
                ]
            }
        ]
    },
    "vectorEmbeddingPolicy": {
        "vectorEmbeddings": [
            {
                "path": "/vector1",
                "dataType": "int8",
                "dimensions": 1200,
                "distanceFunction": "dotproduct"
            },
            {
                "path": "/vector2",
                "dataType": "uint8",
                "dimensions": 3,
                "distanceFunction": "cosine"
            },
            {
                "path": "/vector3",
                "dataType": "float32",
                "dimensions": 400,
                "distanceFunction": "euclidean"
            }
        ]
    },
    "id": "test_binary_vector_container_6",
    "partitionKey": {
        "paths": [
            "/pk"
        ],
        "kind": "Hash"
    }
}
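
For anyone who wants to experiment with the payload shape outside the SDK, here is a minimal Python sketch of how the optional attributes fit together. It is not the SDK's API: `make_vector_index` and `unmatched_index_paths` are hypothetical helper names, and the check that every vector index path has an embedding at the same path is an assumption drawn from the example above, not a documented service rule. Only the JSON field names come from the definition in this description.

```python
import json


def make_vector_index(path, index_type, quantization_byte_size=None,
                      indexing_search_list_size=None, shard_key=None):
    """Build one vectorIndexes entry, leaving out optional fields that are unset."""
    entry = {"path": path, "type": index_type}
    if quantization_byte_size is not None:
        entry["quantizationByteSize"] = quantization_byte_size
    if indexing_search_list_size is not None:
        entry["indexingSearchListSize"] = indexing_search_list_size
    if shard_key is not None:
        entry["vectorIndexShardKey"] = list(shard_key)
    return entry


def unmatched_index_paths(container):
    """Return vector index paths that have no embedding declared at the same path."""
    embedded = {e["path"] for e in container["vectorEmbeddingPolicy"]["vectorEmbeddings"]}
    indexes = container["indexingPolicy"]["vectorIndexes"]
    return [i["path"] for i in indexes if i["path"] not in embedded]


# Values mirror the example definition in this description.
container = {
    "indexingPolicy": {
        "vectorIndexes": [
            make_vector_index("/vector1", "flat"),
            make_vector_index("/vector2", "quantizedFlat",
                              quantization_byte_size=3, shard_key=["/Country"]),
            make_vector_index("/vector3", "diskANN",
                              quantization_byte_size=2,
                              indexing_search_list_size=100,
                              shard_key=["/ZipCode"]),
        ]
    },
    "vectorEmbeddingPolicy": {
        "vectorEmbeddings": [
            {"path": "/vector1", "dataType": "int8",
             "dimensions": 1200, "distanceFunction": "dotproduct"},
            {"path": "/vector2", "dataType": "uint8",
             "dimensions": 3, "distanceFunction": "cosine"},
            {"path": "/vector3", "dataType": "float32",
             "dimensions": 400, "distanceFunction": "euclidean"},
        ]
    },
}

# Print the partitioned DiskANN entry and any index paths lacking an embedding.
print(json.dumps(container["indexingPolicy"]["vectorIndexes"][2], indent=4))
print(unmatched_index_paths(container))
```

Leaving unset optional fields out of the entry, rather than emitting nulls, keeps a plain flat index identical to the first entry in the example.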

Type of change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)

Closing issues

To automatically close an issue: closes #4628

- Code changes to add partitioned DiskANN support.

- Code changes to add vector index specs for partitioned DiskANN.

- Code changes to update vector indexing definition.

- Code changes to remove unsupported data types.
@kundadebdatta kundadebdatta marked this pull request as ready for review October 15, 2024 18:22
@kundadebdatta kundadebdatta self-assigned this Oct 15, 2024
@kirankumarkolli kirankumarkolli merged commit ac9d503 into master Oct 18, 2024
23 checks passed
@kirankumarkolli kirankumarkolli deleted the users/kundadebdatta/4628_add_partitioned_diskann_changes branch October 18, 2024 16:52
sourabh1007 pushed a commit that referenced this pull request Oct 22, 2024
Co-authored-by: Kiran Kumar Kolli <kirankk@microsoft.com>
microsoft-github-policy-service bot pushed a commit that referenced this pull request Oct 24, 2024
…nterfaces to Mark Them as Public for GA (#4845)

# Pull Request Template

## Description

The purpose of this PR is to mark the new `VectorEmbeddingPolicy` in
`ContainerProperties` as a public interface for the `GA` release, and to
introduce the new `VectorIndexes` in `IndexingPolicy` to enable vector
similarity search in the Cosmos DB ecosystem.

Relevant PRs for the vector similarity work: 

- [ContainerProperties: Adds Vector Embedding and Indexing
Policy](#4379)
- [ContainerProperties: Refactors Vector Embedding and Indexing Policy
Interfaces to Mark Them as Public for
Preview](#4486)
- [VectorIndexDefinition: Adds Support for Partitioned
DiskANN](#4792)

## Type of change

Please delete options that are not relevant.

- [x] New feature (non-breaking change which adds functionality)

## Closing issues

To automatically close an issue: closes #4825
github-merge-queue bot pushed a commit to microsoft/semantic-kernel that referenced this pull request Nov 18, 2024
…/dotnet (#9678)

Because the package dropped float16 support, the connector also needed a
change to drop float16, to comply with this latest breaking change.

- Azure/azure-cosmos-dotnet-v3#4792
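
As a rough illustration of what the breaking change means for callers, the Python sketch below flags embeddings whose `dataType` is no longer accepted. The supported set is an assumption: it lists only the three types that appear in the example definition of the referenced PR (`float32`, `uint8`, `int8`), and `unsupported_embeddings` is a hypothetical helper, not part of the connector or the SDK.

```python
# Assumption: the remaining data types are the ones used in the example
# definition of the referenced PR; the actual SDK enum may differ.
ASSUMED_SUPPORTED_DATA_TYPES = {"float32", "uint8", "int8"}


def unsupported_embeddings(vector_embeddings):
    """Return (path, dataType) pairs whose dataType is outside the assumed set."""
    return [
        (e["path"], e["dataType"])
        for e in vector_embeddings
        if e["dataType"].lower() not in ASSUMED_SUPPORTED_DATA_TYPES
    ]


# Illustrative input: one embedding still declared as float16.
embeddings = [
    {"path": "/vector1", "dataType": "float16",
     "dimensions": 1200, "distanceFunction": "dotproduct"},
    {"path": "/vector3", "dataType": "float32",
     "dimensions": 400, "distanceFunction": "euclidean"},
]

print(unsupported_embeddings(embeddings))
```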


Bumps
[Microsoft.Azure.Cosmos](https://github.com/Azure/azure-cosmos-dotnet-v3)
from 3.44.0-preview.1 to 3.45.2.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/Azure/azure-cosmos-dotnet-v3/releases">Microsoft.Azure.Cosmos's
releases</a>.</em></p>
<blockquote>
<h2>3.45.1</h2>
<h3><!-- raw HTML omitted --> <a
href="https://www.nuget.org/packages/Microsoft.Azure.Cosmos/3.45.1">3.45.1</a>
- 2024-11-11</h3>
<h4>Added</h4>
<ul>
<li><a
href="https://github.com/Azure/azure-cosmos-dotnet-v3/pull/4863">4863</a>
VectorIndexDefinition: Refactors Code to Remove Support for
VectorIndexShardKey from Preview Contract.</li>
</ul>
<h2>3.45.0</h2>
<h3><!-- raw HTML omitted --> <a
href="https://www.nuget.org/packages/Microsoft.Azure.Cosmos/3.45.0">3.45.0</a>
- 2024-10-25</h3>
<h4>Added</h4>
<ul>
<li>
<p><a
href="https://github.com/Azure/azure-cosmos-dotnet-v3/pull/4781">4781</a>
AppInsights: Adds classic attribute back to cosmos db to support
appinsights sdk.</p>
</li>
<li>
<p><a
href="https://github.com/Azure/azure-cosmos-dotnet-v3/pull/4709">4709</a>
Availability: Adds account-level read regions as effective preferred
regions when preferred regions is not set on client.</p>
</li>
<li>
<p><a
href="https://github.com/Azure/azure-cosmos-dotnet-v3/pull/4810">4810</a>
Package Upgrade: Refactors code to upgrade DiagnosticSource Library from
6.0.1 to 8.0.1</p>
</li>
<li>
<p><a
href="https://github.com/Azure/azure-cosmos-dotnet-v3/pull/4794">4794</a>
Query: Adds hybrid search query pipeline stage</p>
</li>
<li>
<p><a
href="https://github.com/Azure/azure-cosmos-dotnet-v3/pull/4819">4819</a>
Azurecore: Fixes upgrading azure core dependency to latest</p>
</li>
<li>
<p><a
href="https://github.com/Azure/azure-cosmos-dotnet-v3/pull/4814">4814</a>
DeleteAllItemsByPartitionKeyStreamAsync: Adds
DeleteAllItemsByPartitionKeyStreamAsync API to GA</p>
</li>
<li>
<p><a
href="https://github.com/Azure/azure-cosmos-dotnet-v3/pull/4845">4845</a>
ContainerProperties: Refactors Vector Embedding and Indexing Policy
Interfaces to Mark Them as Public for GA</p>
</li>
</ul>
<h4>Fixed</h4>
<ul>
<li>
<p><a
href="https://github.com/Azure/azure-cosmos-dotnet-v3/pull/4777">4777</a>
Regions: Fixes Removes decommissioned regions.</p>
</li>
<li>
<p><a
href="https://github.com/Azure/azure-cosmos-dotnet-v3/pull/4765">4765</a>
Open Telemetry: Fixes attribute name following otel convention</p>
</li>
</ul>
<h2>3.45.0-preview.1</h2>
<h3><!-- raw HTML omitted --> <a
href="https://www.nuget.org/packages/Microsoft.Azure.Cosmos/3.45.0-preview.1">3.45.0-preview.1</a>
- 2024-10-16</h3>
<h4>Fixed</h4>
<ul>
<li><a
href="https://github.com/Azure/azure-cosmos-dotnet-v3/pull/4799">4799</a>
Open Telemetry: Re-added deprecated attribute to support Application
Insights SDK by default. For OpenTelemetry attributes, set the
environment variable
OTEL_SEMCONV_STABILITY_OPT_IN=<code>database/dupe</code>.</li>
</ul>
<h2>3.45.0-preview.0</h2>
<h3><!-- raw HTML omitted --> <a
href="https://www.nuget.org/packages/Microsoft.Azure.Cosmos/3.45.0-preview.0">3.45.0-preview.0</a>
- 2024-10-07</h3>
<h4>Added</h4>
<ul>
<li><a
href="https://github.com/Azure/azure-cosmos-dotnet-v3/pull/4566">4566</a>
Container: Added support for IsFeedRangePartOfAsync, enabling precise
comparisons to determine relationships between FeedRanges.</li>
</ul>
<h2>3.44.1</h2>
<h3><!-- raw HTML omitted --> <a
href="https://www.nuget.org/packages/Microsoft.Azure.Cosmos/3.44.1">3.44.1</a>
- 2024-10-16</h3>
<h4>Fixed</h4>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/Azure/azure-cosmos-dotnet-v3/blob/master/changelog.md">Microsoft.Azure.Cosmos's
changelog</a>.</em></p>
<blockquote>
<h3><!-- raw HTML omitted --> <a
href="https://www.nuget.org/packages/Microsoft.Azure.Cosmos/3.45.2">3.45.2</a>
- 2024-11-12</h3>
<h4>Added</h4>
<ul>
<li><a
href="https://github.com/Azure/azure-cosmos-dotnet-v3/pull/4866">4866</a>
JSON Binary Encoding: Adds support for encoding uniform arrays.</li>
</ul>
<h3><!-- raw HTML omitted --> <a
href="https://www.nuget.org/packages/Microsoft.Azure.Cosmos/3.46.0-preview.1">3.46.0-preview.1</a>
- 2024-11-06</h3>
<h3><!-- raw HTML omitted --> <a
href="https://www.nuget.org/packages/Microsoft.Azure.Cosmos/3.45.1">3.45.1</a>
- 2024-11-06</h3>
<h4>Added</h4>
<ul>
<li><a
href="https://github.com/Azure/azure-cosmos-dotnet-v3/pull/4863">4863</a>
VectorIndexDefinition: Refactors Code to Remove Support for
VectorIndexShardKey from Preview Contract.</li>
</ul>
<h3><!-- raw HTML omitted --> <a
href="https://www.nuget.org/packages/Microsoft.Azure.Cosmos/3.46.0-preview.0">3.46.0-preview.0</a>
- 2024-10-25</h3>
<h4>Added</h4>
<ul>
<li>
<p><a
href="https://github.com/Azure/azure-cosmos-dotnet-v3/pull/4792">4792</a>
VectorIndexDefinition: Adds Support for Partitioned DiskANN</p>
</li>
<li>
<p><a
href="https://github.com/Azure/azure-cosmos-dotnet-v3/pull/4837">4837</a>
ContainerProperties: Adds Full Text Search and Indexing Policy.</p>
</li>
</ul>
<h3><!-- raw HTML omitted --> <a
href="https://www.nuget.org/packages/Microsoft.Azure.Cosmos/3.45.0">3.45.0</a>
- 2024-10-25</h3>
<h4>Added</h4>
<ul>
<li>
<p><a
href="https://github.com/Azure/azure-cosmos-dotnet-v3/pull/4781">4781</a>
AppInsights: Adds classic attribute back to cosmos db to support
appinsights sdk.</p>
</li>
<li>
<p><a
href="https://github.com/Azure/azure-cosmos-dotnet-v3/pull/4709">4709</a>
Availability: Adds account-level read regions as effective preferred
regions when preferred regions is not set on client.</p>
</li>
<li>
<p><a
href="https://github.com/Azure/azure-cosmos-dotnet-v3/pull/4810">4810</a>
Package Upgrade: Refactors code to upgrade DiagnosticSource Library from
6.0.1 to 8.0.1</p>
</li>
<li>
<p><a
href="https://github.com/Azure/azure-cosmos-dotnet-v3/pull/4794">4794</a>
Query: Adds hybrid search query pipeline stage</p>
</li>
<li>
<p><a
href="https://github.com/Azure/azure-cosmos-dotnet-v3/pull/4819">4819</a>
Azurecore: Fixes upgrading azure core dependency to latest</p>
</li>
<li>
<p><a
href="https://github.com/Azure/azure-cosmos-dotnet-v3/pull/4814">4814</a>
DeleteAllItemsByPartitionKeyStreamAsync: Adds
DeleteAllItemsByPartitionKeyStreamAsync API to GA</p>
</li>
<li>
<p><a
href="https://github.com/Azure/azure-cosmos-dotnet-v3/pull/4845">4845</a>
ContainerProperties: Refactors Vector Embedding and Indexing Policy
Interfaces to Mark Them as Public for GA</p>
</li>
</ul>
<h4>Fixed</h4>
<ul>
<li>
<p><a
href="https://github.com/Azure/azure-cosmos-dotnet-v3/pull/4777">4777</a>
Regions: Fixes Removes decommissioned regions.</p>
</li>
<li>
<p><a
href="https://github.com/Azure/azure-cosmos-dotnet-v3/pull/4765">4765</a>
Open Telemetry: Fixes attribute name following otel convention</p>
</li>
</ul>
<h3><!-- raw HTML omitted --> <a
href="https://www.nuget.org/packages/Microsoft.Azure.Cosmos/3.45.0-preview.1">3.45.0-preview.1</a>
- 2024-10-07</h3>
<h3><!-- raw HTML omitted --> <a
href="https://www.nuget.org/packages/Microsoft.Azure.Cosmos/3.44.1">3.44.1</a>
- 2024-10-16</h3>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li>See full diff in <a
href="https://github.com/Azure/azure-cosmos-dotnet-v3/commits">compare
view</a></li>
</ul>
</details>
<br />



---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Roger Barreto <19890735+RogerBarreto@users.noreply.github.com>
Successfully merging this pull request may close these issues.

[Vector Index] Remove Unsupported Data Types from Embedding Contract
4 participants