Permission Error with PDF loader #2698

Closed
realglyph123 opened this issue Apr 11, 2023 · 6 comments · Fixed by #6170

@realglyph123

realglyph123 commented Apr 11, 2023

I was testing OnlinePDFLoader yesterday iirc and it was working fine. Today I tried experimenting and I keep getting this error

PermissionError: [Errno 13] Permission denied: 'C:\\Users\\REALGL~1\\AppData\\Local\\Temp\\tmp3chr08y0

It may be occurring because the file created by tempfile.NamedTemporaryFile() in pdf.py is still open when the PDF partitioning function tries to access it.
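
A minimal sketch of the pattern suspected here, using only the standard library. `write_then_reopen` is a hypothetical helper name for illustration, not part of langchain. It writes bytes to a NamedTemporaryFile created with `delete=False`, closes the handle, and only then reopens the file by name, which is the order that should avoid the Windows restriction on reopening a still-open temporary file.

```python
import os
import tempfile


def write_then_reopen(data: bytes) -> bytes:
    """Write data to a temp file, close it, then read it back by name.

    Hypothetical helper for illustration; not a langchain function.
    """
    # delete=False keeps the file on disk after close(), so it can be
    # reopened by name afterwards.
    tmp = tempfile.NamedTemporaryFile(delete=False)
    try:
        tmp.write(data)
    finally:
        # Close the handle before anything else opens the file by name.
        tmp.close()

    try:
        with open(tmp.name, "rb") as fh:
            return fh.read()
    finally:
        # With delete=False the caller is responsible for removing the file.
        os.unlink(tmp.name)


print(write_then_reopen(b"%PDF-1.4 example bytes"))
```

The same ordering (write, close, then hand the path to the reader) is what the workaround further down in this thread relies on.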

@tmtsmrsl

I have the same problem when running the PDF loader in a local environment (no problem when running on Colab). Do you have a solution yet?

@realglyph123

realglyph123 commented Apr 15, 2023

> I have the same problem when running PDF loader of local environment (no problem when running on Colab). Do you have any solution yet?

Go to pdf.py and, in the BasePDFLoader class, replace the attributes and the `__init__` and `__del__` methods with the code below. Note that my current version of langchain is .137

    file_path: str
    web_path: Optional[str] = None
    temp_file: Optional[tempfile.NamedTemporaryFile] = None

    def __init__(self, file_path: str):
        """Initialize with file path."""
        self.file_path = file_path
        if "~" in self.file_path:
            self.file_path = os.path.expanduser(self.file_path)

        if not os.path.isfile(self.file_path) and self._is_valid_url(self.file_path):
            r = requests.get(self.file_path)

            if r.status_code != 200:
                raise ValueError(
                    "Check the url of your file; returned status code %s"
                    % r.status_code
                )

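            self.web_path = self.file_path
            # delete=False keeps the file on disk after close(); closing it
            # before use lets the PDF parser reopen it by name on Windows.
            self.temp_file = tempfile.NamedTemporaryFile(delete=False)
            self.temp_file.write(r.content)
            self.temp_file.close()
            self.file_path = self.temp_file.name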
        elif not os.path.isfile(self.file_path):
            raise ValueError("File path %s is not a valid file or url" % self.file_path)

    def __del__(self) -> None:
        # With delete=False the file must be removed manually.
        if self.temp_file is not None:
            self.temp_file.close()
            os.unlink(self.temp_file.name)

@nickmuchi87

I've been facing the same error for a while as well.

@luca-git

luca-git commented May 27, 2023

I still have this issue with 0.0.1.81 on Windows 11 with Bitdefender AV.
The issue is discussed here: https://stackoverflow.com/questions/76200691/issue-with-loading-online-pdf-in-python-notebook-using-langchain-pypdfloader
I hope there will be a cross-platform solution.

@smharvey

@realglyph123 Your fix worked on my local Windows system. Thank you.

hwchase17 pushed a commit that referenced this issue Jun 18, 2023
Fixed a PermissionError that occurred when downloading PDF files via HTTP
in BasePDFLoader on Windows.

When downloading PDF files via HTTP, BasePDFLoader uses NamedTemporaryFile.
On **Windows**, a file created this way cannot be opened a second time by
name while it is still open. [Python Doc](https://docs.python.org/3.9/library/tempfile.html#tempfile.NamedTemporaryFile)

So, we created a **temporary directory** with TemporaryDirectory and
placed the downloaded file there.
The temporary directory is deleted in the destructor.

Fixes #2698

#### Who can review?

Tag maintainers/contributors who might be interested:

  - @eyurtsev
  - @hwchase17
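
The approach described in the commit message above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code merged in #6170: the class name `TempDownload`, the default filename `tmp.pdf`, and passing the bytes in directly (instead of downloading them) are all hypothetical. The idea shown is that the file lives inside a TemporaryDirectory, is written and closed normally, and is removed together with the directory when the object is destroyed.

```python
import os
import tempfile


class TempDownload:
    """Hypothetical holder for downloaded bytes kept in a temporary directory."""

    def __init__(self, content: bytes, filename: str = "tmp.pdf") -> None:
        # The directory, not the file, is the temporary object, so the file
        # itself is an ordinary file that can be closed and reopened freely.
        self.temp_dir = tempfile.TemporaryDirectory()
        self.file_path = os.path.join(self.temp_dir.name, filename)
        with open(self.file_path, "wb") as fh:
            fh.write(content)

    def __del__(self) -> None:
        # Guard in case __init__ failed before temp_dir was assigned.
        if hasattr(self, "temp_dir"):
            self.temp_dir.cleanup()


holder = TempDownload(b"%PDF-1.4 example bytes")
with open(holder.file_path, "rb") as fh:
    print(fh.read())
```

Because the file is closed as soon as the `with` block in `__init__` ends, any later reader can open `file_path` by name on Windows as well as on other platforms.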
@M9S9D

M9S9D commented Jan 27, 2024

I want to save my histogram with this code:

    # Assumed import: numpy's random module, since uniform() is given a size argument.
    from numpy import random
    import matplotlib.pyplot as plt

    big_data = random.uniform(0.0, 10.0, 10000)
    print(big_data)

    plt.hist(big_data, 500)
    plt.savefig("big_data.pdf", format='pdf')

But I get this error:

    PermissionError: [Errno 13] Permission denied: 'big_data.pdf'

What should I do?
