Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clear tmp dir #344

Closed
wants to merge 18 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
18 changes: 18 additions & 0 deletions src/main/java/org/commonwl/view/Scheduler.java
Original file line number Diff line number Diff line change
@@ -1,6 +1,8 @@
package org.commonwl.view;


import org.commonwl.view.util.StreamGobbler;
import org.commonwl.view.util.FileUtils;
import org.commonwl.view.workflow.QueuedWorkflowRepository;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
Expand All @@ -9,6 +11,11 @@
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

import java.io.File;


import java.io.IOException;

import java.util.Calendar;
import java.util.Date;

Expand All @@ -24,6 +31,9 @@ public class Scheduler {
@Value("${queuedWorkflowAgeLimitHours}")
private Integer QUEUED_WORKFLOW_AGE_LIMIT_HOURS;

@Value("${tmpDirAgeLimitDays}")
private Integer TMP_DIR_AGE_LIMIT_DAYS;

@Autowired
public Scheduler(QueuedWorkflowRepository queuedWorkflowRepository) {
this.queuedWorkflowRepository = queuedWorkflowRepository;
Expand Down Expand Up @@ -55,4 +65,12 @@ public void removeOldQueuedWorkflowEntries() {
logger.info(queuedWorkflowRepository.deleteByTempRepresentation_RetrievedOnLessThanEqual(removeTime)
+ " Old queued workflows removed");
}


@Scheduled(cron = "${cron.clearTmpDir}")
public void clearTmpDir() {
// wipe tmp dir and log info
FileUtils utils = new FileUtils(logger);
utils.clearDirectory("/tmp", TMP_DIR_AGE_LIMIT_DAYS);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I run cwlviewer in my laptop, directly from IntelliJ, executing the SpringBoot application. If I understood correctly, here if I were to leave the process running until midnight, then it would go over my /tmp directory removing any files older than the age limit.

I have other applications running that also use /tmp to store temporary files. That would probably cause issues for me. So in this case, I would probably store temporary data for cwlviewer elsewhere, and expect the Scheduler to be able to cope with that too.

So I think we should:

  • confirm cwlviewer is using /tmp (not sure if anyone uses Windows with cwlviewer, but someone could be using a different TMPDIR in linux too, normally in Java we rely on java.io.tmpdir)
  • if we are using /tmp, or something that resolves /tmp, then we need to make sure it doesn't affect other processes running too, so we either
    • delete only cwlviewer files
    • move the temporary directory to another configurable location

}
}
105 changes: 105 additions & 0 deletions src/main/java/org/commonwl/view/util/FileUtils.java
Original file line number Diff line number Diff line change
@@ -0,0 +1,105 @@
package org.commonwl.view.util;


import java.io.File;
import java.nio.file.Files;
import java.nio.file.attribute.FileTime;
import java.io.IOException;
import java.time.Instant;
import java.time.Clock;
import java.time.Duration;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.commonwl.view.util.StreamGobbler;

public class FileUtils {

private final Logger logger;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private final Logger logger;
private final Logger logger = LoggerFactory.getLogger(this.getClass());


public FileUtils(Logger logger) {
this.logger = logger;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessary? ☝️

Normally utility classes have private constructors and static public methods. But since this is not using static methods, I think we should delete this constructor, and create a logger here too, as it's done in other files?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I struggled a bit with this decision because

  1. I wanted the ability to log the individual steps in a way that should show the scheduled process
  2. I didn't want the schedule function to deal with exceptions.

but looking at your comment. I guess I could have different logs at the scheduled function level and also the file utils level. That way the util logs would be nested and surrounded by the scheduled functions logs



public void clearDirectory(String directoryPath, long days) {
try {
logger.info("Clearing "+ directoryPath + " directory for content older than " + days + " day" + (days > 1 ? "s" : "") + "...");
File file = new File(directoryPath);
deleteWithinDirectory(file, days);
logger.info("Successfully cleared " + directoryPath + " directory");
} catch(IOException e) {
logger.warn("Could not clear " + directoryPath + " directory");
logger.error("Error running clear " + directoryPath + " dir process", e);
}

}


private void deleteWithinDirectory(File file, long days) throws IOException{
File[] files = file.listFiles();

if (files != null) {
for (File subfile : files) {
if (subfile.isDirectory()) {
deleteWithinDirectory(subfile, days);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrong indentation.

}

long daysOld = fileAge(subfile);

if (daysOld >= days ) {
if (!subfile.isDirectory()) {
logger.info("Deleting file " + subfile.getPath());
subfile.delete();
logger.info("Deleted file " + subfile.getPath());
} else {
File[] contents = subfile.listFiles();
if (contents != null && contents.length == 0) {
logger.info("Deleting directory " + subfile.getPath());
subfile.delete();
logger.info("Deleted directory " + subfile.getPath());
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The if/for/if`... Could possibly be rewritten in Java 8's streams and filters... but not a blocker, can be done in a follow-up.

}
}
}
}

private long fileAge(File file) throws IOException {

FileTime t = Files.getLastModifiedTime(file.toPath());
Instant fileInstant = t.toInstant();
Instant now = (Clock.systemUTC()).instant();
Duration difference = Duration.between(fileInstant, now);
long days = difference.toDays();
return days;

}

private void deleteWithinDirectoryCMD(String directoryPath, int days) throws IOException, InterruptedException {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably CMD -> Cmd or Command? I don't have a strong preference for CMD or Cmd, but given we have classes like CwlViewerApplication and not CWLViewerApplication, I take it the project is using the convention of keeping only the first letter of an abbreviation upper-case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I searched this file in GitHub UI for where it was used... but apparently it's not used anymore?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! Got it! We are now using deleteWithinDirectory. IMO we probably don't need/want this method (and the one above if not used elsewhere) now that we have deleteWithinDirectory. So probably delete deleteWithinDirectoryCMD and other private functions that are used only in deleteWithinDirectoryCMD.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, will do that. Left them both for the review.


String[] command = {"find", directoryPath, "-ctime", "+" + days, "-exec", "rm", "-rf", "{}", "+"};
ProcessBuilder clearProcess = new ProcessBuilder(command);
Process process = clearProcess.start();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code was in Scheduler.java before in this PR, and then was moved here, so it's not showing @tetron 's review comment. The comment is still visible in the GitHub UI, but will copy it here just in case that disappears:

As discussed, I suggest investigating a java-based solution instead of using a subprocess.


// Read output from the process using threads
StreamGobbler inputGobbler = new StreamGobbler(process.getInputStream());
StreamGobbler errorGobbler = new StreamGobbler(process.getErrorStream());
errorGobbler.start();
inputGobbler.start();

// Wait for process to complete
int exitCode = process.waitFor();
if (exitCode == 0) {
inputGobbler.join();
logger.info(inputGobbler.getContent());
logger.info("Successfully cleared " + directoryPath + " directory");
} else {
errorGobbler.join();
logger.warn("Could not clear " + directoryPath + " directory");
logger.warn(errorGobbler.getContent());

}
}

}
2 changes: 1 addition & 1 deletion src/main/java/org/commonwl/view/util/StreamGobbler.java
Original file line number Diff line number Diff line change
Expand Up @@ -32,7 +32,7 @@ public class StreamGobbler extends Thread {
private final String lineSeparator = System.getProperty("line.separator");

private InputStream is;
private String content = "";
private String content = ""; // TODO: optimize - update to string builder

public StreamGobbler(InputStream is) {
this.is = is;
Expand Down
10 changes: 9 additions & 1 deletion src/main/resources/application.properties
Original file line number Diff line number Diff line change
Expand Up @@ -48,6 +48,7 @@ spring.data.mongodb.port = 27017
sparql.endpoint = http://localhost:3030/cwlviewer/



Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit-picking, but unnecessary blank line 😬

#=======================
# Scheduler settings
#=======================
Expand All @@ -57,5 +58,12 @@ sparql.endpoint = http://localhost:3030/cwlviewer/
# The expression below implies every hour at the 0th second and 0th minute i.e (01:00:00, 02:00::00,... etc)
cron.deleteOldQueuedWorkflows = 0 0 * * * ?

# The expression below implies every day at the 0th second, 0th minute and 24th(0th) hour i.e ( time 00:00:00, every day)
cron.clearTmpDir = 0 0 0 * * ?

# Age limit for queued workflows in hours.
queuedWorkflowAgeLimitHours = 24
queuedWorkflowAgeLimitHours = 24

# Age limit for tmp directories in days.
tmpDirAgeLimitDays = 1