
Fix: Inference script logging #194

Merged
merged 4 commits into Deep-MI:feature/vinn on Sep 26, 2022

Conversation

af-a
Contributor

af-a commented Sep 22, 2022

Description

In the current feature/vinn implementation, run_prediction.py (inference) logs are saved in ./FastSurferVINN/logs/expr_fastsurfer.log. The main directory (FastSurferVINN) can only be changed by modifying the LOG_DIR variable in the model config files (such as this one) or by setting the out_dir argument. If FastSurfer is run multiple times with the same config, the same file is appended to.
In addition, a separate log file is already saved within run_fastsurfer.sh under $SUBJECT_DIR/scripts/deep-seg.log.

Following some discussions with @m-reuter, we decided it would be better to log to one file within run_prediction.py, whose path should be specified through an argument: log_name. If not provided, the logs will not be written to a file; this will be the default behaviour.
In the call to run_prediction.py in run_fastsurfer.sh, this argument will be set to $SUBJECT_DIR/scripts/deep-seg.log.
One advantage of writing the logs within the Python script instead of dumping the output of its command within the shell script (run_fastsurfer.sh) is that the tqdm output is skipped. In the latter case, the output is included but is not formatted properly (see the screenshot below).

Changes

This PR applies the following changes:

  • In logging.py: Modifying the setup_logging function to accept a single argument: log_file_path. If an empty string is provided, logging messages are only printed on screen and not saved to a file. The LOG_DIR variable is not used in this function anymore. (A minimal sketch follows after this list.)
  • In run_prediction.py: Implementing the optional arg log_name and adapting the setup_logging call.
  • In run_fastsurfer.sh: Adding the log_name arg to the run_prediction.py call and removing the piping of its output to the log file.
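For illustration, a minimal sketch of what the updated setup_logging could look like under these changes; only the single log_file_path argument and the empty-string behaviour are taken from the description above, while the handler setup and message format are assumptions:

```python
import logging
import sys


def setup_logging(log_file_path: str) -> None:
    """Configure logging; an empty log_file_path means stdout-only logging."""
    handlers = [logging.StreamHandler(stream=sys.stdout)]
    if log_file_path:
        # mode="w" overwrites an existing log file, as noted below
        handlers.append(logging.FileHandler(log_file_path, mode="w"))

    logging.basicConfig(
        level=logging.INFO,
        format="[%(levelname)s: %(filename)s: %(lineno)4d]: %(message)s",
        handlers=handlers,
    )
```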

Note:

  • If multiple subjects are processed using run_prediction.py in a single run, all logs are saved to the same file (if specified).
  • If the specified log_name file exists, it is overwritten.

Extra

tqdm output in deep-seg.log when written from run_fastsurfer.sh

[Screenshot from 2022-09-22 15-24-17]

The function now accepts a single argument: log_file_path.
If an empty string is provided, logging messages are only printed on stdout
and not saved to a file.
@LeHenschel
Member

LeHenschel commented Sep 22, 2022

If multiple subjects are processed using run_prediction.py in a single run, all logs are saved to the same file (if specified).

This does not make sense to me. Let's say I have a GPU and a CPU cluster. I create all my subjects' segmentations on the GPU cluster first in FreeSurfer/FastSurfer-conform output (i.e. storing the output in $sid/mri/aparc.DKTatlas+aseg.mgz). In this case I want the log files for the segmentation in $sid/scripts/deep-seg.log and not all appended to one file. Depending on how many subjects I have, this might get massive, and it is really hard to find a single subject that might have failed. What is the point of appending all outputs to one file?

@af-a
Contributor Author

af-a commented Sep 22, 2022

This does not make sense to me. Let's say I have a GPU and a CPU cluster. I create all my subjects' segmentations on the GPU cluster first in FreeSurfer/FastSurfer-conform output (i.e. storing the output in $sid/mri/aparc.DKTatlas+aseg.mgz). In this case I want the log files for the segmentation in $sid/scripts/deep-seg.log and not all appended to one file. Depending on how many subjects I have, this might get massive, and it is really hard to find a single subject that might have failed. What is the point of appending all outputs to one file?

One issue may be that, since we now have to specify log_name when calling run_prediction.py, we would need to somehow define the log paths for each subject in a multiple-subject case.
This could be solved for example by assuming that the input directory has the $DIR/$SID/scripts structure in any multiple subject case, and saving to deep-seg.log in each.

@LeHenschel
Member

Yes, the output directory is known (passed via --outdir or --pred_name), same with the subject ID, so why not just use this for the logging? I.e. create a separate scripts directory under the subject ID as before. We could just have a log flag to indicate whether you want to generate the log file or not.

@dkuegler
Member

dkuegler commented Sep 22, 2022

$SUBJECT_DIR/scripts/deep-seg.log

Why is the path scripts and not logs? And similarly:

One issue may be that, since we now have to specify log_name when calling run_prediction.py, we would need to somehow define the log paths for each subject in a multiple-subject case.

In any case, with these replacements, the user can have either per-subject or general logging; it is just a question of what the default should be.
As per @LeHenschel, I think the per-subject logging is better for accessibility of logs.

whose path should be specified through an argument: log_name.

I understand your reasoning, but I would prefer this to be an environment variable; or I would at least additionally make it an environment variable. Also, maybe you even want to allow "$SUBJECT_DIR" to be replaced by the current subject directory within the script...

within the shell script (run_fastsurfer.sh) is that the tqdm output is skipped

I don't quite understand; tqdm has options to work with logging... that is why you do the

with logging_redirect_tqdm():

What you are doing here is not logging, but just "printing"; use tqdm.write() (https://tqdm.github.io/docs/tqdm/#write) instead of print, e.g. from tqdm import tqdm; print = tqdm.write
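For reference, a small self-contained sketch of the two options mentioned here (illustrative only, not code from this PR): logging_redirect_tqdm temporarily routes log records through tqdm so they do not break an active progress bar, and tqdm.write prints plain messages safely while a bar is running.

```python
import logging

from tqdm import tqdm
from tqdm.contrib.logging import logging_redirect_tqdm

LOGGER = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)

# Option 1: redirect log records through tqdm while the progress bar is active
with logging_redirect_tqdm():
    for i in tqdm(range(3), desc="batches"):
        LOGGER.info("processed batch %d", i)

# Option 2: plain printing that does not corrupt the progress bar
for i in tqdm(range(3), desc="batches"):
    tqdm.write(f"processed batch {i}")
```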

@dkuegler
Member

dkuegler left a comment

see comments

@af-a
Contributor Author

af-a commented Sep 22, 2022

Yes, the output directory is known (passed via --outdir or --pred_name), same with the subject ID, so why not just use this for the logging? I.e. create a separate scripts directory under the subject ID as before. We could just have a log flag to indicate whether you want to generate the log file or not.

Right, but to keep with the FreeSurfer directory structure this would have to be handled differently for the two cases (see the sketch below):
i) out_dir: log in out_dir/$SID/scripts/deep-seg.log
ii) pred_name: log in pred_name/../scripts/deep-seg.log (assuming segmentations are usually saved in $SID/mri/pred.mgz).
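A purely hypothetical sketch of that two-case resolution (the function and variable names are illustrative, not taken from the repository):

```python
import os


def resolve_log_path(out_dir: str, pred_name: str, sid: str) -> str:
    """Derive a per-subject deep-seg.log path for the two output modes."""
    if out_dir:
        # i) out_dir given: out_dir/$SID/scripts/deep-seg.log
        return os.path.join(out_dir, sid, "scripts", "deep-seg.log")
    # ii) pred_name given: assume $SID/mri/pred.mgz, so the subject directory
    #     is two levels up from the prediction file
    subject_dir = os.path.dirname(os.path.dirname(pred_name))
    return os.path.join(subject_dir, "scripts", "deep-seg.log")
```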

The original idea was a boolean --log flag, but we later decided to infer whether to save the file from whether log_name was provided (@m-reuter).

@af-a
Contributor Author

af-a commented Sep 22, 2022

@dkuegler

Why is the path scripts and not logs?

To be consistent with the FreeSurfer directory structure (recon-all.log and recon-surf.log are in scripts).

As per @LeHenschel, I think the per-subject logging is better for accessibility of logs.

I see your points.

I understand your reasoning, but I would prefer this to be an environment variable; or I would at least additionally make it an environment variable. Also, maybe you even want to allow "$SUBJECT_DIR" to be replaced by the current subject directory within the script...

Maybe it would be less convenient to have to set the log file path (if the user wants a custom path) with an environment variable? I think a script argument may be better especially if run_prediction.py is used directly.

About tqdm: the idea is that the progress bar is probably more useful on the GUI, and maybe not so much in the log file and so it can be excluded (if we log only from run_prediction.py, it already is, since LOGGER does not register its prints).

@LeHenschel
Member

LeHenschel commented Sep 22, 2022

I agree with @AhmedFaisal95. Command line arguments are probably easier for most people than ENV variables.

For the logging, we could also do the same as for the --pred_name/orig_name routine (see the sketch after this list):

  1. If --log points to a file, write everything to it
  2. If --log is a directory, append sbjid/scripts/deep-seg.log for each individual case
  3. If --log is empty (i.e. not specified), do not log to file (only stdout)
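A minimal sketch of how such a --log dispatch could look; the argument handling, the directory check, and the helper name are assumptions for illustration only:

```python
import os


def resolve_log_target(log_arg: str, sbj_id: str) -> str:
    """Return the log file to use for one subject, or "" for stdout-only logging."""
    if not log_arg:
        # 3. not specified: do not log to a file
        return ""
    if os.path.isdir(log_arg):
        # 2. directory: per-subject log under sbjid/scripts/deep-seg.log
        return os.path.join(log_arg, sbj_id, "scripts", "deep-seg.log")
    # 1. points to a file: write everything to it
    return log_arg
```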

@dkuegler
Member

About tqdm: the idea is that the progress bar is probably more useful on the GUI, and maybe not so much in the log file and so it can be excluded (if we log only from run_prediction.py, it already is, since LOGGER does not register its prints).

I thought tqdm was superior on both command line and logfile level.

@AhmedFaisal95 Is there any question left on why the stdout with tqdm is "broken" or how to fix it, or is this resolved?

@af-a
Contributor Author

af-a commented Sep 22, 2022

@dkuegler

I thought tqdm was superior on both command line and logfile level.

@AhmedFaisal95 Is there any question left on why the stdout with tqdm is "broken" or how to fix it, or is this resolved?

No technical question; if its output should still be included in the log file, I can figure out how to fix the formatting in the written file (e.g. using what you mentioned above). (Note: the stdout output in the terminal is fine.)

I think a question from an earlier discussion is whether its output should be included in the log file at all.

@m-reuter
Member

Maybe we should have a meeting about these points. I don't really like run_prediction to process multiple files at once. For the use case described above (GPU cluster for segmentation, followed by CPU cluster for recon-surf), one might as well loop through subjects and call run_fastsurfer --segonly etc. That will guarantee that outputs go into FreeSurfer directory structures and log files will be there. Batch processing inside run_prediction is more targeted at users who want only the segmentation on lots of files. Since we do not provide any statistics (volumes of structures, etc.), they need their own downstream processing anyway. They know what they are doing and they can get a single log file. Or they can loop over their subjects and call run_prediction individually. In fact, I see no real reason to further support batch processing inside run_prediction.

@dkuegler
Member

I think a question from an earlier discussion is whether its output should be included in the log file at all.

I think there was a confusion about log files and logging in shell and python scripts. It should be resolved by your changes that move logging into the python script instead of doing it in the shell script with tee.

Further, the progress bar should not be part of the logfile at all in my opinion.

With respect to processing, this really is a scheduling problem. It would probably be easier if we keep support for multiple subjects in segmentation-only mode, just because they are individually so fast that it would be extra hassle to involve a cluster scheduler... (i.e. better usability)

@m-reuter
Member

Yes, that is what we decided yesterday in our meeting also. @LeHenschel will work on logging for batch mode.

@af-a
Contributor Author

af-a commented Sep 26, 2022

Just for reference, this is a sample deep-seg.log, showing what it looks like at the current state of this PR:

Log file for fastsurfercnn run_prediction.py
Mon 26 Sep 10:28:53 CEST 2022

python3.8 run_prediction.py --orig_name $SUBJECT_DIR/mri/orig.mgz --pred_name $OUTPUT_DIR/$SUBJECT_ID/mri/aparc.DKTatlas+aseg.deep.mgz --conf_name $OUTPUT_DIR/$SUBJECT_ID/mri/orig.mgz --log_name $OUTPUT_DIR/$SUBJECT_ID/scripts/deep-seg.log --hires --ckpt_sag $FASTSURFER_DIR/checkpoints/FastSurferVINN_training_state_sagittal.pkl --ckpt_ax $FASTSURFER_DIR/checkpoints/FastSurferVINN_training_state_axial.pkl --ckpt_cor $FASTSURFER_DIR/checkpoints/FastSurferVINN_training_state_coronal.pkl --batch_size 1 --cfg_cor $FASTSURFER_DIR/FastSurferCNN/config/FastSurferVINN_coronal.yaml --cfg_ax $FASTSURFER_DIR/FastSurferCNN/config/FastSurferVINN_axial.yaml --cfg_sag $FASTSURFER_DIR/FastSurferCNN/config/FastSurferVINN_sagittal.yaml --run_viewagg_on check --device cuda
[INFO: run_prediction.py:  358]: Checking or downloading default checkpoints ...
[INFO: run_prediction.py:  130]: Output will be stored in: $OUTPUT_DIR
[INFO: run_prediction.py:  105]: Running view aggregation on GPU
[INFO: eval.py:   63]: Using device: cuda
[INFO: run_prediction.py:  376]: Analyzing single subject $SUBJECT_DIR/mri/orig.mgz
[INFO: run_prediction.py:  235]: Successfully saved image as $OUTPUT_DIR/$SUBJECT_ID/mri/orig/001.mgz
[INFO: run_prediction.py:  235]: Successfully saved image as $OUTPUT_DIR/$SUBJECT_ID/mri/orig.mgz
[INFO: run_prediction.py:  188]: Run coronal prediction
[INFO: eval.py:  109]: Loading checkpoint $FASTSURFER_DIR/checkpoints/FastSurferVINN_training_state_coronal.pkl
[INFO: dataset.py:   57]: Loading Coronal with input voxelsize (0.7, 0.7)
[INFO: dataset.py:   66]: Successfully loaded Image from $SUBJECT_DIR/mri/orig.mgz
[INFO: eval.py:  188]: Inference on 310 batches for coronal successful
[INFO: eval.py:  204]: coronal-Inference on $SUBJECT_DIR/mri/orig.mgz finished in 9.1823 seconds
[INFO: run_prediction.py:  194]: Run axial view agg
[INFO: eval.py:  109]: Loading checkpoint $FASTSURFER_DIR/checkpoints/FastSurferVINN_training_state_axial.pkl
[INFO: dataset.py:   53]: Loading Axial with input voxelsize (0.7, 0.7)
[INFO: dataset.py:   66]: Successfully loaded Image from $SUBJECT_DIR/mri/orig.mgz
[INFO: eval.py:  188]: Inference on 310 batches for axial successful
[INFO: eval.py:  204]: axial-Inference on $SUBJECT_DIR/mri/orig.mgz finished in 9.0970 seconds
[INFO: run_prediction.py:  200]: Run sagittal view agg
[INFO: eval.py:  109]: Loading checkpoint $FASTSURFER_DIR/checkpoints/FastSurferVINN_training_state_sagittal.pkl
[INFO: dataset.py:   48]: Loading Sagittal with input voxelsize (0.7, 0.7)
[INFO: dataset.py:   66]: Successfully loaded Image from $SUBJECT_DIR/mri/orig.mgz
[INFO: eval.py:  188]: Inference on 310 batches for sagittal successful
[INFO: eval.py:  204]: sagittal-Inference on $SUBJECT_DIR/mri/orig.mgz finished in 10.8106 seconds
[INFO: run_prediction.py:  235]: Successfully saved image as $OUTPUT_DIR/$SUBJECT_ID/mri/aparc.DKTatlas+aseg.deep.mgz

af-a merged commit 9b1f350 into Deep-MI:feature/vinn on Sep 26, 2022
@af-a
Contributor Author

af-a commented Sep 26, 2022

We have decided to merge these changes for now, especially to fix the current logging behaviour (logging inside the program directory and always appending to the same file).

However, run_prediction.py will currently write to a single log file if multiple subjects are processed.
Better handling of the multiple-subject case will be added in a future extension.
