[MRG] add support for file_bytes argument with managed_file_context() #270

cscanlin · 2021-10-08T07:17:05Z

cscanlin · 2021-10-08T07:18:27Z

camelot/handlers.py

@@ -24,19 +27,33 @@ class PDFHandler(object):

    Parameters
    ----------
-    filepath : str
-        Filepath or URL of the PDF file.
+    filepath : str | pathlib.Path, optional (default: None)


I elected to keep the arguments separate instead of combining them the way pandas.read_csv (and similar readers) does, mostly to preserve the existing API's keyword arguments.

That is, I did not want to rename this argument to file_path_or_bytes.
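
To illustrate the trade-off, here is a minimal sketch of how two separate keyword arguments might be validated so that exactly one source is supplied. The helper name `resolve_source`, its return shape, and its error message are hypothetical and not taken from this PR; only the argument names `filepath` and `file_bytes` come from the change itself.

```python
import os
from io import BytesIO


def resolve_source(filepath=None, file_bytes=None):
    """Return ("path", str) or ("bytes", stream). Hypothetical helper."""
    # Exactly one of the two arguments must be given.
    if (filepath is None) == (file_bytes is None):
        raise ValueError("Pass exactly one of filepath or file_bytes")
    if filepath is not None:
        # os.fspath accepts both str and pathlib.Path and leaves URLs untouched.
        return "path", os.fspath(filepath)
    return "bytes", file_bytes


# Existing callers keep using filepath unchanged; new callers opt in to file_bytes.
kind, source = resolve_source(file_bytes=BytesIO(b"%PDF-1.4"))
```

Keeping the arguments separate means existing calls that pass `filepath` by keyword continue to work, at the cost of one extra validation step.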

cscanlin · 2021-10-08T07:19:19Z

camelot/handlers.py

@@ -49,6 +66,28 @@ def __init__(self, filepath, pages="1", password=None):
                self.password = self.password.encode("ascii")
        self.pages = self._get_pages(pages)

+    @contextmanager


This is the core of the change. It either opens a file handle or passes the bytes through, depending on which input was provided.
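
A minimal sketch of what such a context manager could look like follows. It is not the PR's actual implementation: the class `FileSource` is a hypothetical stand-in for the handler, and the decision to rewind the stream and leave it open is an assumption made for illustration. Only the method name `managed_file_context` and the `filepath` / `file_bytes` arguments appear in the PR.

```python
from contextlib import contextmanager
from io import BytesIO


class FileSource:
    """Hypothetical stand-in for the handler; not camelot's PDFHandler."""

    def __init__(self, filepath=None, file_bytes=None):
        self.filepath = filepath
        self.file_bytes = file_bytes

    @contextmanager
    def managed_file_context(self):
        if self.file_bytes is not None:
            # The caller owns the stream: rewind it and hand it over,
            # but do not close it on exit (assumption for this sketch).
            self.file_bytes.seek(0)
            yield self.file_bytes
        else:
            # We own the handle: open it here and close it on exit.
            with open(self.filepath, "rb") as fileobj:
                yield fileobj


source = FileSource(file_bytes=BytesIO(b"%PDF-1.4 demo"))
with source.managed_file_context() as fileobj:
    header = fileobj.read(4)
```

With this shape, call sites such as `_save_page` can use one `with` statement regardless of where the PDF data came from.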

cscanlin · 2021-10-08T07:26:10Z

camelot/utils.py

-def download_url(url):
-    """Download file from specified URL.
+def get_url_bytes(url):
+    """Get a stream of bytes for url

    Parameters
    ----------
    url : str or unicode

    Returns
    -------


This change is only loosely related to my feature, but I consider the existing code an anti-pattern:

    with tempfile.NamedTemporaryFile("wb", delete=False) as f:
        ...
    filepath = os.path.join(os.path.dirname(f.name), filename)
    shutil.move(f.name, filepath)

Keeping this file alive outside the context manager that tempfile provides is, at best, not how the module is intended to be used. Using BytesIO will consume somewhat more memory in these cases, but I could easily see the existing implementation causing bugs (either now or in the future).
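
For comparison, an in-memory variant might look like the sketch below. The function name `get_url_bytes` comes from the diff, but the body here is an assumption: the User-Agent header, the content-type check, and the error type are illustrative choices, not confirmed details of the PR.

```python
from io import BytesIO
from urllib.request import Request, urlopen


def get_url_bytes(url):
    """Fetch url and return its body as an in-memory BytesIO stream."""
    # The header and the content-type check are assumptions for this sketch.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request) as response:
        content_type = response.info().get_content_type()
        if content_type != "application/pdf":
            raise NotImplementedError("File format not supported")
        # The whole body is held in memory; nothing is written to disk.
        return BytesIO(response.read())
```

The returned stream has no temporary file to clean up, which removes the lifetime question entirely; the trade-off is that large PDFs are fully buffered in memory.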

cscanlin · 2021-10-08T09:55:51Z

camelot/handlers.py

@@ -107,7 +146,7 @@ def _save_page(self, filepath, page, temp):
            Tmp directory.

        """
-        with open(filepath, "rb") as fileobj:
+        with self.managed_file_context() as fileobj:


This looks like a bug

chris-decker · 2021-12-21T18:10:56Z

I needed this functionality, and since the original author hasn't merged it yet, I cloned the repo and merged it locally. It seems to be working as intended. Thanks for contributing.

clcarver1130 · 2022-04-19T20:47:25Z

Is this still stalled? I would love to have this functionality if it is still possible.

cscanlin · 2022-04-21T01:03:35Z

Unfortunately this repo looks to be somewhat abandoned, @vinayak-mehta has not merged any code since July 2021.

It's a really useful tool, so it would be a shame to let it decay. Does anybody have interest in being a maintainer?

ramSeraph · 2022-04-21T11:32:34Z

Consider starting a discussion on gitter?

bosd

Thanks for this contribution!

Quick code review, without functionally testing it.
Would love to see if the tests are green.

bosd · 2023-07-15T07:14:43Z

@foarsitter Can you trigger the tests? :Pray:

sisrfeng · 2023-08-30T14:33:13Z

Any update?

foarsitter · 2023-10-06T05:56:07Z

@cscanlin, by any chance, do you have time to rebase this and run black/isort?

MartinThoma · 2024-02-25T11:13:39Z

Hey!

As camelot is dead, we are trying to build a maintained fork at pypdf_table_extraction.

Do you want to open the PR against that branch so that we can merge your improvement?

Johnmaras · 2024-03-22T13:18:14Z

@MartinThoma Hi. Has anyone merged this PR into your fork, or does anyone plan to? I could use this feature. I guess I could clone it and open the PR in your project.

bosd · 2024-03-22T14:09:32Z

@Johnmaras Please go ahead and open a PR there. 🙂

Johnmaras · 2024-03-26T11:55:41Z

Hi @bosd.
I can open a PR but it looks like there are many conflicts between cscanlin:file-bytes-support and pypdf_table_extraction:main.
If I open the PR do you think a contributor of pypdf_table_extraction could work on resolving the conflicts? I can't currently work on it myself.

Let me know.
Thank you

bosd · 2024-03-26T12:50:24Z

I'm a contributor/maintainer there. I can take a look at resolving the conflicts. (It may take some time; I've been busy lately.)

Johnmaras · 2024-03-26T13:03:36Z

Understood. I'll proceed with the PR as soon as possible.

add support for file_bytes argument with managed_file_context()

3d27547

bosd approved these changes Jul 14, 2023

Merge branch 'master' into file-bytes-support

be4dde9

bosd mentioned this pull request Jul 15, 2023

Fix GH Actions: apt-get update #387

Closed

Merge branch 'master' into file-bytes-support

ab51f3c

This was referenced Mar 26, 2024

add support for file_bytes argument with managed_file_context() Johnmaras/pypdf_table_extraction#1

Closed

add support for file_bytes argument with managed_file_context() py-pdf/pypdf_table_extraction#15

Closed
