Numpy import error on tests #1467

Closed
liquidsec opened this issue Jun 17, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@liquidsec
Collaborator

All tests are currently failing with the following error, which suggests that a dependency update broke this import:

[ERRR] Error in unstructured.handle_event(FILESYSTEM("{'path': '/tmp/.bbot_test/scans/testexcavaterawdata_test_g2ykldx164/filedownload...", module=filedownload, tags={'in-scope', 'filedownload', 'file'})): /home/runner/work/bbot/bbot/bbot/modules/unstructured.py:101:handle_event(): numpy.core.multiarray failed to import
[TRCE] concurrent.futures.process._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/concurrent/futures/process.py", line 246, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/home/runner/work/bbot/bbot/bbot/modules/unstructured.py", line 123, in extract_text
    from unstructured.partition.auto import partition
  File "/home/runner/.cache/pypoetry/virtualenvs/bbot-pd-UZ8Fz-py3.9/lib/python3.9/site-packages/unstructured/partition/auto.py", line 83, in <module>
    from unstructured.partition.pdf import partition_pdf
  File "/home/runner/.cache/pypoetry/virtualenvs/bbot-pd-UZ8Fz-py3.9/lib/python3.9/site-packages/unstructured/partition/pdf.py", line 50, in <module>
    from unstructured.partition.pdf_image.pdf_image_utils import (
  File "/home/runner/.cache/pypoetry/virtualenvs/bbot-pd-UZ8Fz-py3.9/lib/python3.9/site-packages/unstructured/partition/pdf_image/pdf_image_utils.py", line 13, in <module>
    import cv2
ImportError: numpy.core.multiarray failed to import
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/runner/work/bbot/bbot/bbot/scanner/scanner.py", line 1062, in _acatch
    yield
  File "/home/runner/work/bbot/bbot/bbot/modules/base.py", line 629, in _worker
    await handle_event_task
  File "/home/runner/work/bbot/bbot/bbot/modules/unstructured.py", line 101, in handle_event
    content = await self.scan.run_in_executor_mp(extract_text, file_path)
ImportError: numpy.core.multiarray failed to import
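
The traceback above ends at `import cv2`, and this message is a typical symptom of a compiled extension that was built against a different NumPy major version than the one installed. The thread does not confirm that as the cause here. As a rough diagnostic sketch, the following lists which versions were actually resolved in the failing environment. The function name `installed_versions` is hypothetical, and the distribution names in the demo (in particular `opencv-python` as the provider of `cv2`) are assumptions that may differ per environment.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Distribution is not installed under this name.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumed distribution names; adjust to whatever the environment uses.
    for name, version in installed_versions(
        ["numpy", "opencv-python", "unstructured"]
    ).items():
        print(f"{name}: {version or 'not installed'}")
```

Running this in both a passing and a failing environment and comparing the output is one way to spot which dependency moved.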
@liquidsec added the bug (Something isn't working) label Jun 17, 2024
@TheTechromancer
Collaborator

TheTechromancer commented Jun 17, 2024

unstructured has a lot of dependencies, which makes it unwieldy. However, its functionality is really important to BBOT, so we need a way to keep upstream dependency updates from breaking it like this.

As we keep adding BBOT modules, there is more and more of a need for some kind of system that will let us cleanly package these bigger tools.

Docker is one solution, but we should keep an eye out for something more lightweight that doesn't require a running daemon. Something like zipapp, but better?

Ideally, this solution would not rely on the tests of the upstream package maintainer. Instead, it would cache a known-working version of the tool (including all its dependencies), and only upgrade it if all of our tests passed.

Getting a system like this in place will help us package/deploy these things in a reproducible way across multiple Linux distros, and make sure they don't break unexpectedly when an upstream dependency collapses.
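
The "only upgrade if all of our tests passed" idea can be sketched as a small gate: a candidate set of pinned dependencies replaces the known-working set only when the test command exits successfully. This is a minimal illustration and not anything BBOT implements. The function name `promote_if_tests_pass` and the file names in the usage note are hypothetical.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def promote_if_tests_pass(candidate: Path, known_good: Path, test_cmd: list) -> bool:
    """Copy `candidate` over `known_good` only if `test_cmd` exits with status 0.

    Returns True when the candidate was promoted, False when the tests
    failed and the known-good file was left untouched.
    """
    result = subprocess.run(test_cmd)
    if result.returncode != 0:
        return False
    shutil.copyfile(candidate, known_good)
    return True
```

A call might look like `promote_if_tests_pass(Path("candidate.lock"), Path("known_good.lock"), [sys.executable, "-m", "pytest"])`, assuming the test run installs from the candidate lock file first. Caching the tool together with its whole dependency tree would need more than this, but the gate itself stays the same.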

@liquidsec
Collaborator Author

Closing the issue as it seems to have resolved itself. Someone must have fixed their upstream oopsie ⚡
