Update pdf2parquet to Docling v2 #756

Merged: 16 commits, Oct 31, 2024

Commits on Oct 29, 2024

  1. update to docling v2 and expose new parameters

    Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
    dolfim-ibm committed Oct 29, 2024
    f90f134
  2. update to docling v2

    new parameters and input formats
    faster backend
    revalidated the test results
    
    Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
    dolfim-ibm committed Oct 29, 2024
    1436480

Commits on Oct 30, 2024

  1. add lock

    Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
    dolfim-ibm committed Oct 30, 2024
    8c22f0d
  2. add batch_size

    Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
    dolfim-ibm committed Oct 30, 2024
    d55e6bd
  3. update parameter in README

    Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
    dolfim-ibm committed Oct 30, 2024
    261230c
  4. fix multilock with default parameters

    Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
    dolfim-ibm committed Oct 30, 2024
    e396e16
  5. use multilock with fix

    Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
    dolfim-ibm committed Oct 30, 2024
    5095c1b
  6. propagate new param to kfp

    Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
    dolfim-ibm committed Oct 30, 2024
    f62e6de
  7. update new models download in Dockerfile

    Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
    dolfim-ibm committed Oct 30, 2024
    7e5ea90
  8. update doc_chunk with new docling v2

    Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
    dolfim-ibm committed Oct 30, 2024
    e929903
  9. update to 2.3.1 with initialize_pipeline

    Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
    dolfim-ibm committed Oct 30, 2024
    b4eb978
  10. 622ea4c
  11. remove debug log

    Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
    dolfim-ibm committed Oct 30, 2024
    8395a25

Commits on Oct 31, 2024

  1. improve parsing of metadata

    Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
    dolfim-ibm committed Oct 31, 2024
    4c693f9
  2. add test case for batch_size

    Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
    dolfim-ibm committed Oct 31, 2024
    26b429a
  3. notify users about the deprecated argument

    Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
    dolfim-ibm committed Oct 31, 2024
    269d732