Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Primary key field is lost during upsert #399

Open
rakuzen25 opened this issue Dec 26, 2024 · 4 comments
Open

Primary key field is lost during upsert #399

rakuzen25 opened this issue Dec 26, 2024 · 4 comments

Comments

@rakuzen25
Copy link

Describe the bug:

When both auto ID and dynamic field are turned on in the collection, the primary key gets moved to the dynamic $meta field or lost along the way, which creates an issue for the upsert (and insert?) operation.

Steps to reproduce:

Do something like:

import { MilvusClient } from "@zilliz/milvus2-sdk-node";

const client = new MilvusClient("address");

const collection_name = "test";

const { data } = await client.query({
    collection_name,
    output_fields: ["*"],
    filter: 'source like "%test%"',
});

console.log(data);

// Do something to the data, or do nothing

const res = await client.upsert({
    collection_name,
    data,
});

console.log(res);

Output:

[
  {
    source: 'test.txt',
    langchain_primaryid: '1',     
    '$meta': {},
    langchain_vector: [
        // ...
    ],
    langchain_text: "...",       
    // ...
  }
]
{
  succ_index: [],
  err_index: [ 0 ],
  status: {
    extra_info: {},
    error_code: 'IllegalArgument',
    reason: 'fieldSchema(langchain_primaryid) has no corresponding fieldData pass in: invalid parameter',
    code: 1100,
    retriable: false,
    detail: 'fieldSchema(langchain_primaryid) has no corresponding fieldData pass in: invalid parameter'
  },
  IDs: null,
  acknowledged: false,
  insert_cnt: '0',
  delete_cnt: '0',
  upsert_cnt: '0',
  timestamp: '0'
}

I've tried to debug the source code a little. First, a field is not added to the field map if it's a primary key and auto ID is on:

const fieldMap = new Map<string, _Field>(
collectionInfo.schema.fields.reduce((acc, v) => {
if (v.is_function_output) {
functionOutputFields.push(v.name); // ignore function field
} else if (!v.is_primary_key || !v.autoID) {
acc.push([
v.name,

Then, I observed that the primary key gets dropped in rowData and added as a dynamic field (with a catch explained later):

// Loop through each row and set the corresponding field values in the Map.
data.fields_data.forEach((rowData, rowIndex) => {
// if support dynamic field, all field not in the schema would be grouped to a dynamic field
rowData = isDynamic
? buildDynamicRow(
rowData,
fieldMap,
DEFAULT_DYNAMIC_FIELD,
functionOutputFields
)
: rowData;

// console.log(Object.keys(rowData), rowData.$meta) before L186
[
  'source',
  'langchain_vector',
  'langchain_text',
  'langchain_primaryid',
  '$meta',
  // ...
]
{}

// console.log(Object.keys(rowData), rowData.$meta) after L193
[
  'source',
  'langchain_vector',
  'langchain_text',
  '$meta',
  // ...
]
{ 'langchain_primaryid': 1 }

Stepping into buildDynamicRow:

export const buildDynamicRow = (
rowData: RowData,
fieldMap: Map<string, _Field>,
dynamicFieldName: string,
functionOutputFields: string[]
) => {
const originRow = cloneObj(rowData);
const row: RowData = {};
// iterate through each key in the input data object
for (let key in originRow) {
row[dynamicFieldName] = row[dynamicFieldName] || ({} as JSON); // initialize the dynamic field object
if (fieldMap.has(key)) {
// if the key is in the fieldMap, add it to the non-dynamic fields
row[key] = originRow[key];
} else {
if (!functionOutputFields.includes(key)) {
const obj: JSON = row[dynamicFieldName] as JSON;
// otherwise, add it to the dynamic field
obj[key] = originRow[key];
}
}
}

// console.log("fieldMap has:", key) before L460
// and console.log("No fieldMap:", key) before L463
fieldMap has: $meta // This is problematic
fieldMap has: source
fieldMap has: langchain_text
No fieldMap: langchain_primaryid
fieldMap has: langchain_vector

Here I observed two issues. First, as described by the issue itself, langchain_primaryid is not present in the final fields passed to the gRPC client. Second, I noticed that one of the other runs gave me this output:

fieldMap has: source
fieldMap has: langchain_vector
fieldMap has: langchain_text
No fieldMap: langchain_primaryid
fieldMap has: $meta

// console.log(rowData.$meta)
{}

This is because $meta is registered in the field map:

// dynamic field is enabled, create $meta field
const isDynamic = collectionInfo.schema.enable_dynamic_field;
if (isDynamic) {
fieldMap.set(DEFAULT_DYNAMIC_FIELD, {
name: DEFAULT_DYNAMIC_FIELD,
type: 'JSON',
elementType: 'None',
data: [], // value container
nullable: false,
});
}

Hence, when key is $meta, row[key] = originRow[key] in L460 will be executed, which overwrites the existing state of row.$meta. In other words, if present in the original data, the $meta key can overwrite itself in the loop. Since the order of the loop does not seem to be guaranteed, the primary key can be completely lost if it's processed before the $meta key.

For this second issue, I am planning to open a PR to check for key === dynamicFieldName in case a user does something like the example, where they directly modify data from client.query (which includes the $meta field). But for the first issue, I think it might be related to the design of the !v.is_primary_key || !v.autoID condition.

Milvus-node-sdk version:

@zilliz/milvus2-sdk-node 2.5.2

Milvus version:

Zilliz Cloud Vector Database(Compatible with Milvus 2.4)

@shanghaikid
Copy link
Contributor

Ok, I will test it and make a fix soon.

rakuzen25 added a commit to rakuzen25/milvus-sdk-node that referenced this issue Dec 26, 2024
rakuzen25 added a commit to rakuzen25/milvus-sdk-node that referenced this issue Dec 26, 2024
…mic row

see milvus-io#399

Signed-off-by: Frank <25704248+rakuzen25@users.noreply.github.com>
@rakuzen25
Copy link
Author

Hi @shanghaikid - thank you for the fixes. Are there any plans on when the next version will be published?

@shanghaikid
Copy link
Contributor

Hi @shanghaikid - thank you for the fixes. Are there any plans on when the next version will be published?

tomorrow:)

@shanghaikid
Copy link
Contributor

New version released
if you are using milvus 2.5.x, https://github.com/milvus-io/milvus-sdk-node/releases/tag/v2.5.3
if you are using milvus 2.4.x, https://github.com/milvus-io/milvus-sdk-node/releases/tag/v2.4.10

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants