Can't upload documents with different languages #352

Open
2 of 6 tasks
thomashacker opened this issue Dec 14, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@thomashacker
Collaborator

Description

Import fails at the tokenization step when the document contains multiple languages. This is due to merging spaCy docs that were created with different vocab objects.
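
A minimal sketch of the failure mode, not Verba's actual code: it assumes the batches are merged with spaCy's `Doc.from_docs`, which raises a `ValueError` when the docs do not share one `Vocab`. The blank `en`, `de` and `xx` pipelines, the sample sentences and the `try_merge` helper are illustrative only, and the shared multi-language pipeline is one possible workaround rather than the fix planned for v2.2.

```python
import spacy
from spacy.tokens import Doc

# Two blank pipelines; each one owns a separate Vocab object.
nlp_en = spacy.blank("en")
nlp_de = spacy.blank("de")

doc_en = nlp_en("This part is English.")
doc_de = nlp_de("Dieser Teil ist Deutsch.")


def try_merge(docs):
    """Hypothetical helper: return the merged Doc, or None if spaCy refuses."""
    try:
        return Doc.from_docs(docs)
    except ValueError as err:
        print(f"merge failed: {err}")
        return None


# Docs from different pipelines do not share a Vocab, so the merge is rejected.
merged_mixed = try_merge([doc_en, doc_de])

# Possible workaround (assumption): tokenize every batch with one pipeline,
# here the blank multi-language pipeline "xx", so all docs share one Vocab.
nlp_multi = spacy.blank("xx")
batches = ["This part is English.", "Dieser Teil ist Deutsch."]
shared_docs = [nlp_multi(text) for text in batches]
merged_shared = try_merge(shared_docs)
```

With a single pipeline the merge goes through, at the cost of losing language-specific tokenization rules for each batch.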

Installation

  • pip install goldenverba
  • pip install from source
  • Docker installation

If you installed via pip, please specify the version:

Weaviate Deployment

  • Local Deployment
  • Docker Deployment
  • Cloud Deployment

Configuration

Reader: Any
Chunker: /
Embedder: /
Retriever: /
Generator: /

Steps to Reproduce

Upload a long document (over 500,000 tokens) that contains multiple languages.
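
A test document for the step above could be generated with a sketch like this. The function name and the sample sentences are made up for illustration, and whitespace-separated words are used as a rough stand-in for tokens, so the count seen by the tokenizer will differ somewhat.

```python
def build_mixed_document(min_tokens: int) -> str:
    """Return text that cycles through several languages until it holds
    more than `min_tokens` whitespace-separated words."""
    # Illustrative sentences only; any multilingual text should do.
    samples = [
        "The quick brown fox jumps over the lazy dog.",
        "Der schnelle braune Fuchs springt über den faulen Hund.",
        "Le renard brun rapide saute par-dessus le chien paresseux.",
    ]
    lines = []
    token_count = 0
    index = 0
    while token_count <= min_tokens:
        sentence = samples[index % len(samples)]
        lines.append(sentence)
        token_count += len(sentence.split())  # rough token estimate
        index += 1
    return "\n".join(lines)
```

Calling `build_mixed_document(500_000)` and saving the result as a plain text file should give an upload large enough to match the description above.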

Additional context

This will be fixed with v2.2

@thomashacker thomashacker added the bug Something isn't working label Dec 14, 2024