Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create temporary directory in a more safe manner. #88

Merged
merged 3 commits into from
Jul 8, 2021
Merged

Conversation

jerri
Copy link
Contributor

@jerri jerri commented May 1, 2021

Also make sure the temporary directory gets removed at the end of the script, unless prohibited.

Fixes #87

The code should still be compatible with python2 and python3.

Also make sure the temporary directory gets removed at the end of the script, unless prohibited.
@ickc
Copy link
Collaborator

ickc commented May 1, 2021

Default to be backward incompatible might not be a good idea. Although temp dir should be temporary and therefore should not be a problem when removed, some could rely this side effect as a cache. Removing this by default could means unnecessary work each time it is invoked.

This is not spoken from experience, if you believe people using this expected the temp to be cleared, then it’d be fine.

@jgm
Copy link
Owner

jgm commented May 2, 2021

That's a good point: I believe many people may want the temp dir to persist. Suppose you're converting tikz diagrams to images, for example. If the temp dir disappears after every pandoc run, then every time you run pandoc the filter will need to regenerate every diagram. With the persisting temp directory, only diagrams that are changed (or new) need be regenerated.

@jerri
Copy link
Contributor Author

jerri commented May 2, 2021

I understand. Problematic situation. I don't have too much experience with a lot of filters. Too be honest I just have experience with a filter for mermaid. The directory created for the conversion of the files caught me by surprise and I immediately thought there was a bug in the filter I used. Right now I added an additional script in all my automations to remove those files after use. What is odd, in my opinion, is that the file is generated in the directory where the script is executed. In the worst case, you might not even have permissions to create the directory and would just get a strange error message, that the directory couldn't be created.
If it is expected functionality, that the directory gets created and is used, I guess, this PR doesn't make a lot of sense. On the other hand I wonder, is there really a caching effect? Who is checking if the file already exists and prevents the filter to recreate the file? Is this part of pandocfilter library? Haven't checked right now, I have to admit.

@jerri
Copy link
Contributor Author

jerri commented May 2, 2021

Just checked. So tikz.py itself is doing the caching. Mhm... That would mean one would have to add a boolean to allow for the old behaviour of creating the file in the "predictable" location. Mhm... I understand the reason, but still am somehow unsure if I like the approach. The original name of the directory is not very special and could easily clash with an existing directory, which would be filled up by strange files.
And worse, if you are using several filters, you get several directories filled with files that might not be necessary in some cases.
I wonder if it would make more sense to add an environment variable that could enable the new behaviour with the real temp directories and could be set by people like me to fully avoid the generation of those directories. In that case it even would be fully transparent for the filters. What do you think?

@ickc
Copy link
Collaborator

ickc commented May 2, 2021

The world of pandoc filters is fragmented. The easiest way probably is to have this function defaulted to the old behavior, then submit a PR To the filter that has buggy behavior, in this case a mermaid filter and hope they merge it, if abandoned, fork it.

Also, if you have pandoc filter in a location that it has no power to create a directory, it’s probably not correct. Eg you might have accidentally install it using sudo pip. It is often even recommended, but make sure you only sudo install something you trust very much. A better approach is pip install --user or have virtual environments like conda.

@jerri
Copy link
Contributor Author

jerri commented May 3, 2021

Hello ickc: Maybe a misunderstanding regarding the permission topic. The pandocfilter-librarys code creates the directory in the current directory you are in, when you start the filter. So, if you try to use pandoc while you are e.g. in /usr/bin, it will not work, as users usually don't have permission to create files or directories in that environment.
This doesn't have anything to do with where the library has been installed.
In any case, I will think about the idea with the environment variable, because this probably fixes all the problems while still being completely backwards compatible.

@ickc
Copy link
Collaborator

ickc commented May 4, 2021

A way to control the behavior is definitely needed, env var is easiest among all options.

if you believe the upstream filter should have cleared the temp, a PR will be useful to benefit other people. But if it is subjective (Eg some want them be persistent as a cache) then of course nothing can be done there.

With the environment variable `PANDOCFILTER_CLEANUP` set to `true` the
user can force pandoc-filter to create a real temporary directory and
clean it up after the run instead of leaving `...-images`-directories
behind.
@jerri
Copy link
Contributor Author

jerri commented May 5, 2021

I implemented the environment variable idea with the env variable PANDOCFILTER_CLEANUP, that a user could e.g. set in his .bashrc. This variable will prevent the creation of the images-directories and will therefore also disable any possible caching.

@jerri
Copy link
Contributor Author

jerri commented May 5, 2021

Sorry, I forgot to add documentation to the README.

@ickc
Copy link
Collaborator

ickc commented Jul 8, 2021

There's a minor problem that the error is raised instead of passed. I think it follows the design of pandoc that garbage in, garbage out, but not "error out". So it tries to finish performing the job even if there's an error, and in that case it would be leaving as is. I can change that behavior after merging since that the PR is already some time ago.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

get_filename4code should have an accompanying function to get rid of the temporary directories
3 participants