Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clear tmp dir #344

Closed
wants to merge 18 commits into from
Closed

Clear tmp dir #344

wants to merge 18 commits into from

Conversation

obasekiosa
Copy link
Contributor

@obasekiosa obasekiosa commented Aug 9, 2021

Description

This creates a scheduled process to clear the /tmp dir of files older than a day every single day
fixes #200 #279

Motivation and Context

Previous to this the issue of large files(pull workflow repositories) being stuck in the /tmp dir caused issues on the server and were cleared manually or with a system cron job. This PR moves that process within the CWL viewer application.

How Has This Been Tested?

Running the application and checking that the /tmp files get cleared as scheduled

Screenshots (if appropriate):

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@tetron tetron self-requested a review August 16, 2021 16:08
// wipe tmp dir and log info
try {
// Run command
String[] command = {"find", "/tmp", "-ctime", "+" + TMP_DIR_AGE_LIMIT_DAYS, "-exec", "rm", "-rf", "{}", "+"};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, I suggest investigating a java-based solution instead of using a subprocess.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1


@Scheduled(cron = "${cron.clearTmpDir}")
public void clearTmpDir() {
// TODO: find source of tmp dir creation and delete from there
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 we can change the PR to draft until the TODOs are resolved.

// wipe tmp dir and log info
try {
// Run command
String[] command = {"find", "/tmp", "-ctime", "+" + TMP_DIR_AGE_LIMIT_DAYS, "-exec", "rm", "-rf", "{}", "+"};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

if (exitCode == 0) {
inputGobbler.join();
logger.info(inputGobbler.getContent());
logger.info("Successfully Cleared /tmp directory");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/Cleared/cleared for consistency with other messages? 😬

Also, once the TODO above about locating the tmp dir is solved, we should probably include the location in the logs to help in case a sysadmin needs to check the directory permissions, modes, etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/Cleared/cleared for consistency with other messages? 😬

Also, once the TODO above about locating the tmp dir is solved, we should probably include the location in the logs to help in case a sysadmin needs to check the directory permissions, modes, etc.

Hi @kinow ,
The idea was to delete the file within the process that actually creates these files.
But that would be a mistake since the file may still be in use.

Another idea was to delete the file the moment it is no longer in use, but an exact time would be difficult to determine

Hence a solution that works would be to delete all files above a certain age.

so I would be removing the TODO remark.
But before I do that, do you have any comments on a much better approach?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me. The other comment about using a Java native solution is still valid I think.

So instead of forking a new process we could try first using Java to iterate/walk the temp dir, filter for files/dirs older than the threshold, and finally delete or mark them for deletion (I assume we will try to delete it and in case it fails, call deleteOnExit or just log it... maybe Commons IO has something that we can use... just food for thought 👍 )

@@ -48,6 +48,7 @@ spring.data.mongodb.port = 27017
sparql.endpoint = http://localhost:3030/cwlviewer/



Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit-picking, but unnecessary blank line 😬


String[] command = {"find", directoryPath, "-ctime", "+" + days, "-exec", "rm", "-rf", "{}", "+"};
ProcessBuilder clearProcess = new ProcessBuilder(command);
Process process = clearProcess.start();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code was in Scheduler.java before in this PR, and then was moved here, so it's not showing @tetron 's review comment. The comment is still visible in the GitHub UI, but will copy it here just in case that disappears:

As discussed, I suggest investigating a java-based solution instead of using a subprocess.


}

private void deleteWithinDirectoryCMD(String directoryPath, int days) throws IOException, InterruptedException {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably CMD -> Cmd or Command? I don't have a strong preference for CMD or Cmd, but given we have classes like CwlViewerApplication and not CWLViewerApplication, I take it the project is using the convention of keeping only the first letter of an abbreviation upper-case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I searched this file in GitHub UI for where it was used... but apparently it's not used anymore?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! Got it! We are now using deleteWithinDirectory. IMO we probably don't need/want this method (and the one above if not used elsewhere) now that we have deleteWithinDirectory. So probably delete deleteWithinDirectoryCMD and other private functions that are used only in deleteWithinDirectoryCMD.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, will do that. Left them both for the review.

if (files != null) {
for (File subfile : files) {
if (subfile.isDirectory()) {
deleteWithinDirectory(subfile, days);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrong indentation.

subfile.delete();
logger.info("Deleted directory " + subfile.getPath());
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The if/for/if`... Could possibly be rewritten in Java 8's streams and filters... but not a blocker, can be done in a follow-up.


public FileUtils(Logger logger) {
this.logger = logger;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessary? ☝️

Normally utility classes have private constructors and static public methods. But since this is not using static methods, I think we should delete this constructor, and create a logger here too, as it's done in other files?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I struggled a bit with this decision because

  1. I wanted the ability to log the individual steps in a way that should show the scheduled process
  2. I didn't want the schedule function to deal with exceptions.

but looking at your comment. I guess I could have different logs at the scheduled function level and also the file utils level. That way the util logs would be nested and surrounded by the scheduled functions logs


public class FileUtils {

private final Logger logger;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private final Logger logger;
private final Logger logger = LoggerFactory.getLogger(this.getClass());

public void clearTmpDir() {
// wipe tmp dir and log info
FileUtils utils = new FileUtils(logger);
utils.clearDirectory("/tmp", TMP_DIR_AGE_LIMIT_DAYS);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I run cwlviewer in my laptop, directly from IntelliJ, executing the SpringBoot application. If I understood correctly, here if I were to leave the process running until midnight, then it would go over my /tmp directory removing any files older than the age limit.

I have other applications running that also use /tmp to store temporary files. That would probably cause issues for me. So in this case, I would probably store temporary data for cwlviewer elsewhere, and expect the Scheduler to be able to cope with that too.

So I think we should:

  • confirm cwlviewer is using /tmp (not sure if anyone uses Windows with cwlviewer, but someone could be using a different TMPDIR in linux too, normally in Java we rely on java.io.tmpdir)
  • if we are using /tmp, or something that resolves /tmp, then we need to make sure it doesn't affect other processes running too, so we either
    • delete only cwlviewer files
    • move the temporary directory to another configurable location

kinow added a commit to kinow/cwlviewer that referenced this pull request Jun 21, 2022
… and Commons IO to traverse and delete old files and directories.
kinow added a commit to kinow/cwlviewer that referenced this pull request Jul 11, 2022
… and Commons IO to traverse and delete old files and directories.
kinow added a commit to kinow/cwlviewer that referenced this pull request Aug 14, 2022
… and Commons IO to traverse and delete old files and directories.
mr-c pushed a commit to kinow/cwlviewer that referenced this pull request Dec 30, 2022
… and Commons IO to traverse and delete old files and directories.
mr-c pushed a commit that referenced this pull request Dec 30, 2022
…rse and delete old files and directories.
@mr-c mr-c closed this in #426 Dec 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

/tmp fills up unecessarily
4 participants