This repository has been archived by the owner on Mar 8, 2024. It is now read-only.

Process immense terms during database initialisation #17

Closed
To-om opened this issue Dec 14, 2021 · 0 comments
Labels: bug (something isn't working), contains-docs, enhancement (new feature or request)

To-om commented Dec 14, 2021

Request Type

Bug

Problem Description

The index engine fails to process a document if it contains a "non full-text" field whose value is larger than 32766 bytes.

During document creation, the document is not indexed and becomes invisible (even though it is stored in the database).
During a data reindex, the process stops and part of the data is left unindexed.

A single oversized value can therefore break the application.
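
To illustrate the failure condition, here is a minimal sketch of a size check. It assumes the 32766-byte limit applies to the UTF-8 encoding of the field value; the function name `is_immense` is hypothetical and is not part of Scalligraph.

```python
# Limit quoted in this issue for "non full-text" fields.
MAX_TERM_BYTES = 32766


def is_immense(value: str, limit: int = MAX_TERM_BYTES) -> bool:
    """Return True if the value needs more than `limit` bytes in UTF-8.

    Assumption: the index engine measures the term on its UTF-8 bytes,
    so the character count alone is not enough to decide.
    """
    return len(value.encode("utf-8")) > limit


if __name__ == "__main__":
    ascii_value = "a" * 30000      # 1 byte per character
    accented_value = "é" * 30000   # 2 bytes per character in UTF-8
    print(is_immense(ascii_value))
    print(is_immense(accented_value))
```

Both strings have the same number of characters, but only the second one exceeds the byte limit, which is why a threshold expressed in characters has to leave room for multi-byte characters.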

Solution

During database initialisation, add a process that finds immense terms and fixes them. Several strategies can be applied:

  • truncate: truncate the data
  • delete: remove the document
  • log: show the document in logs
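
The three strategies above can be sketched as follows. This is an illustration only, not the Scalligraph implementation: the document is modelled as a plain dictionary, the size threshold is assumed to be counted in characters, and the names `fix_document` and `immense_terms` are hypothetical.

```python
import logging
from typing import Optional

logger = logging.getLogger("immense_terms")  # hypothetical logger name


def fix_document(doc: dict, field: str, action: str, threshold: int) -> Optional[dict]:
    """Apply one strategy to one field of a document.

    Returns the (possibly modified) document, or None when the
    document should be removed.
    """
    value = doc.get(field)
    # Nothing to do if the field is absent, not text, or small enough.
    if not isinstance(value, str) or len(value) <= threshold:
        return doc

    if action == "truncate":
        # Keep only the first `threshold` characters.
        return {**doc, field: value[:threshold]}
    if action == "delete":
        # Signal to the caller that the document must be removed.
        return None
    if action == "log":
        # Report the document and leave it unchanged.
        logger.warning("immense term in field %r (%d characters)", field, len(value))
        return doc
    raise ValueError(f"unknown strategy: {action!r}")
```

Returning `None` for `delete` keeps the function free of database access, so the caller running the scan decides how the removal is actually performed.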

A custom strategy (for example, storing the data in a file storage) could also be considered, but it cannot be implemented in Scalligraph.

The process requires a full scan of the database (because the index cannot be used). It is triggered only if the configuration is present. The configuration consists of field names and the strategy to apply to each of them. A strategy can take an optional parameter which defines the size threshold in characters (a character may occupy up to 4 bytes in UTF-8).

db.janusgraph {
  immenseTermProcessing: {
    data:  "delete(2048)"   // Delete documents that contain a field "data" with a size greater than 2048
    title: "truncate(4096)" // Truncate the field "title"
    name:  "truncate"       // Truncate the field "name" (default threshold is 8191)
  }
}
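
The strategy strings used in this configuration could be parsed as in the sketch below. The names `Strategy` and `parse_strategy` are hypothetical, and the accepted grammar (a strategy name optionally followed by an integer in parentheses) is inferred from the three examples only. The default threshold of 8191 comes from this issue; it matches 32766 bytes divided by 4 bytes per character, rounded down, which is assumed to be the reasoning behind it.

```python
import re
from typing import NamedTuple

MAX_TERM_BYTES = 32766                   # limit quoted in this issue
DEFAULT_THRESHOLD = MAX_TERM_BYTES // 4  # 8191 characters


class Strategy(NamedTuple):
    action: str      # "truncate", "delete" or "log"
    threshold: int   # size threshold in characters


# A strategy name, optionally followed by "(<integer>)".
_SPEC = re.compile(r"^\s*(truncate|delete|log)\s*(?:\(\s*(\d+)\s*\))?\s*$")


def parse_strategy(spec: str) -> Strategy:
    """Parse a value such as "delete(2048)" or "truncate"."""
    match = _SPEC.match(spec)
    if match is None:
        raise ValueError(f"invalid strategy: {spec!r}")
    action, threshold = match.groups()
    # Fall back to the default threshold when no parameter is given.
    return Strategy(action, int(threshold) if threshold else DEFAULT_THRESHOLD)
```

With this sketch, `parse_strategy("delete(2048)")` yields a `delete` strategy with a threshold of 2048, and `parse_strategy("truncate")` falls back to the default threshold.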

IMPORTANT: The configuration should be present for only one startup, in order to fix the data. It should be removed as soon as the process is finished.
