Automation of file upload and embedding generation for MyData collection through api #1897

Open
llmwesee opened this issue Nov 8, 2024 · 0 comments


llmwesee commented Nov 8, 2024

When I upload files through `/upload_api` and `/add_file_api`, they are saved to the user_path and the embeddings are generated automatically in the vector store. However, I am not able to replicate this for the MyData collection.

Below is the sample code for uploading files to UserData through the API:

```python
import os
import time
from tqdm import tqdm
from gradio_client import Client


class DocumentUploader:
    def __init__(self, host_url: str, api_key: str = 'EMPTY'):
        self.client = Client(host_url)
        self.api_key = api_key

    def upload_document(self, local_file_path: str) -> str:
        """Uploads a document to the server and returns the server file path."""
        with tqdm(total=100, desc=f"Uploading {os.path.basename(local_file_path)}", unit='%', ncols=80) as pbar:
            _, server_file_path = self.client.predict(local_file_path, api_name='/upload_api')
            pbar.update(100)
        return server_file_path

    def add_document_and_ocr(self, server_file_path: str, loaders: list) -> dict:
        """Adds the document to the server with OCR processing."""
        with tqdm(total=100, desc=f"Processing {os.path.basename(server_file_path)}", unit='%', ncols=80) as pbar:
            # The second positional argument is the collection name
            res = self.client.predict(
                server_file_path, "UserData", True, 512, True, *loaders, api_name='/add_file_api'
            )
            pbar.update(100)
        print("Document processing completed and is ready for querying.")
        return res

    def process_all_files_in_folder(self, folder_path: str):
        """Process all PDF files in the specified folder."""
        pdf_files = [f for f in os.listdir(folder_path) if f.endswith('.pdf')]
        if not pdf_files:
            print("No PDF files found in the folder.")
            return

        print(f"Found {len(pdf_files)} PDF files. Starting upload and processing...")

        loaders = [None] * 6  # Adjust loaders as needed
        for pdf_file in pdf_files:
            file_path = os.path.join(folder_path, pdf_file)
            server_file_path = self.upload_document(file_path)
            self.add_document_and_ocr(server_file_path, loaders)
```
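
The only collection-specific part of the code above is the hard-coded "UserData" argument in `add_document_and_ocr`. As a minimal sketch, the positional arguments could be built by a small helper that takes the collection name as a parameter, so the same flow can be tried against "MyData". The helper name `build_add_file_args` is hypothetical, the remaining values are copied unchanged from the call above, and whether the server accepts "MyData" through `/add_file_api` is an untested assumption (it is exactly what this issue is asking about).

```python
from typing import Optional, Sequence, Tuple


def build_add_file_args(
    server_file_path: str,
    collection: str = "UserData",
    loaders: Optional[Sequence] = None,
) -> Tuple:
    """Build the positional arguments for the '/add_file_api' call.

    The values True, 512, True are copied as-is from the original call;
    only the collection name is made configurable (hypothetical helper).
    """
    if loaders is None:
        loaders = [None] * 6  # same placeholder loaders as in the original code
    return (server_file_path, collection, True, 512, True, *loaders)


# Example: same arguments as before, but targeting "MyData" (untested assumption)
args = build_add_file_args("/tmp/example.pdf", collection="MyData")
print(args)

# The call would then look like:
# client.predict(*args, api_name='/add_file_api')
```

With this, `add_document_and_ocr` could take a `collection` argument and forward it, keeping "UserData" as the default so the existing behaviour does not change.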

Please help me automate the same for the MyData collection!
