[BUG]: Overwrite of Tensorboard logging file does not work #761

Closed
BrotherHa opened this issue Dec 3, 2024 · 1 comment · Fixed by #763
Labels
bug Something isn't working

Comments

@BrotherHa
Contributor

BrotherHa commented Dec 3, 2024

What happened?

Hello,

thank you very much for the new version 1.0.0 supporting Tensorboard logging. It's a great help to my work!
On my first attempts I ran into an issue when the overwrite parameter of TensorBoardLoggerSpec is set to True. The second run fails with the error shown below, because the handle on the logging file from the first run is still held open, so the file cannot be deleted and overwritten.
As a work-around I included the following function into TBLogger.jl:

function close_files!(lg::TBLogger)
    # Close every open event-file stream so the files can be removed later
    for file in values(lg.all_files)
        close(file)
    end
end

I call this function in logger_specs.py at the end of the write_hparams function to release the handle and it seems to work fine.
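
To illustrate the idea behind the workaround, here is a minimal Python sketch. It does not use TensorBoardLogger.jl or PySR; `FakeLogger`, `close_files`, and `run_once` are hypothetical names made up for this example. It only shows the pattern: every open handle is released at the end of a run, so the next run with overwrite enabled can delete the log directory.

```python
import shutil
import tempfile
from pathlib import Path


class FakeLogger:
    """Hypothetical stand-in for a logger that keeps its event files open."""

    def __init__(self, log_dir: Path) -> None:
        log_dir.mkdir(parents=True, exist_ok=True)
        # Mirrors the `all_files` mapping used in the workaround above.
        self.all_files = {"events": open(log_dir / "events.out.tfevents", "wb")}

    def close_files(self) -> None:
        # Release every open handle so the files can be deleted afterwards.
        for stream in self.all_files.values():
            stream.close()


def run_once(log_dir: Path, overwrite: bool) -> FakeLogger:
    if overwrite and log_dir.exists():
        # On Windows this typically fails if a handle from a previous
        # run is still open; on Linux the removal usually succeeds anyway.
        shutil.rmtree(log_dir)
    logger = FakeLogger(log_dir)
    logger.all_files["events"].write(b"hparams")
    logger.close_files()  # the step that was missing
    return logger


base = Path(tempfile.mkdtemp())
log_dir = base / "logs"
first = run_once(log_dir, overwrite=True)
second = run_once(log_dir, overwrite=True)  # second run can overwrite
all_closed = all(f.closed for f in second.all_files.values())
shutil.rmtree(base)  # clean up the temporary directory
```

Without the `close_files()` call, the second `run_once` would be expected to fail on Windows in the same way as the `unlink` error in the log output below.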

Best regards

Version

1.0.0

Operating System

Windows

Package Manager

Conda

Interface

Script (i.e., python my_script.py)

Relevant log output

  File "C:\Users\x\AppData\Local\miniforge3\envs\masterarbeit_sr\Lib\site-packages\pysr\sr.py", line 2240, in fit
    self._run(X, y, runtime_params, weights=weights, seed=seed, category=category)
  File "C:\Users\x\AppData\Local\miniforge3\envs\masterarbeit_sr\Lib\site-packages\pysr\sr.py", line 1920, in _run
    logger = self.logger_spec.create_logger() if self.logger_spec else None
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\x\AppData\Local\miniforge3\envs\masterarbeit_sr\Lib\site-packages\pysr\logger_specs.py", line 56, in create_logger
    return make_logger(log_dir, self.overwrite, self.log_interval)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\x\.julia\packages\PythonCall\Nr75f\src\JlWrap\any.jl", line 258, in __call__
    return self._jl_callmethod($(pyjl_methodnum(pyjlany_call)), args, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
juliacall.JuliaError: IOError: unlink("logs/log\\events.out.tfevents.1.733232925065e9.LY-000006"): resource busy or locked (EBUSY)
Stacktrace:
  [1] uv_error
    @ .\libuv.jl:106 [inlined]
  [2] unlink(p::String)
    @ Base.Filesystem .\file.jl:1105
  [3] rm(path::String; force::Bool, recursive::Bool)
    @ Base.Filesystem .\file.jl:283
  [4] rm(path::String; force::Bool, recursive::Bool)
    @ Base.Filesystem .\file.jl:294
  [5] rm
    @ .\file.jl:273 [inlined]
  [6] init_logdir(logdir::String, overwrite::TensorBoardLogger.InitPolicy)
    @ TensorBoardLogger C:\Users\x\.julia\packages\TensorBoardLogger\0nEI0\src\TBLogger.jl:81
  [7] TensorBoardLogger.TBLogger(logdir::String, overwrite::TensorBoardLogger.InitPolicy; time::Float64, prefix::String, purge_step::Nothing, step_increment::Int64, min_level::Base.CoreLogging.LogLevel)
    @ TensorBoardLogger C:\Users\x\.julia\packages\TensorBoardLogger\0nEI0\src\TBLogger.jl:60
  [8] TensorBoardLogger.TBLogger(logdir::String, overwrite::TensorBoardLogger.InitPolicy)
    @ TensorBoardLogger C:\Users\x\.julia\packages\TensorBoardLogger\0nEI0\src\TBLogger.jl:53
  [9] make_logger(log_dir::String, overwrite::Bool, log_interval::Int64)
    @ Main .\none:2
 [10] pyjlany_call(self::typeof(make_logger), args_::Py, kwargs_::Py)
    @ PythonCall.JlWrap C:\Users\x\.julia\packages\PythonCall\Nr75f\src\JlWrap\any.jl:43
 [11] _pyjl_callmethod(f::Any, self_::Ptr{PythonCall.C.PyObject}, args_::Ptr{PythonCall.C.PyObject}, nargs::Int64)
    @ PythonCall.JlWrap C:\Users\x\.julia\packages\PythonCall\Nr75f\src\JlWrap\base.jl:73
 [12] _pyjl_callmethod(o::Ptr{PythonCall.C.PyObject}, args::Ptr{PythonCall.C.PyObject})
    @ PythonCall.JlWrap.Cjl C:\Users\x\.julia\packages\PythonCall\Nr75f\src\JlWrap\C.jl:63

Extra Info

No response

@BrotherHa BrotherHa added the bug Something isn't working label Dec 3, 2024
@MilesCranmer
Owner

MilesCranmer commented Dec 3, 2024

Thanks for this solution! Indeed, it seems the logger is not being closed. Do you want to add your workaround in a pull request? I think we can add a new method:

    @abstractmethod
    def close(self, logger):
        pass

within the AbstractLoggerSpec. Then in TensorBoardLoggerSpec, you would put the Julia code (see the other Julia code in logger_specs.py and expression_specs.py to see examples).

And in PySRRegressor we would just add it right after this line (PySR/pysr/sr.py, line 2060 in e11f824):

    self.logger_spec.write_hparams(logger, self.get_params())

Like:

    self.logger_spec.write_hparams(logger, self.get_params())
    self.logger_spec.close(logger)

Then it will also work for future logger types (e.g., if we ever add Wandb.jl).
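
A rough, self-contained sketch of how the pieces proposed above could fit together. This is not the actual PySR code: `ListLoggerSpec` and `finish_run` are hypothetical names invented for illustration, the logger is a plain dict instead of a Julia object, and the method signatures are assumptions based on the calls quoted in this thread.

```python
from abc import ABC, abstractmethod
from typing import Any


class AbstractLoggerSpec(ABC):
    """Sketch of the abstract interface with the proposed `close` method."""

    @abstractmethod
    def create_logger(self) -> Any:
        ...

    @abstractmethod
    def write_hparams(self, logger: Any, hparams: dict) -> None:
        ...

    @abstractmethod
    def close(self, logger: Any) -> None:
        ...


class ListLoggerSpec(AbstractLoggerSpec):
    """Hypothetical spec that records to a dict instead of TensorBoard."""

    def create_logger(self) -> dict:
        return {"records": [], "closed": False}

    def write_hparams(self, logger: dict, hparams: dict) -> None:
        logger["records"].append(dict(hparams))

    def close(self, logger: dict) -> None:
        # A TensorBoard-backed spec would close its file streams here.
        logger["closed"] = True


def finish_run(spec: AbstractLoggerSpec, logger: Any, params: dict) -> None:
    # Same order as suggested above: write hyperparameters, then close.
    spec.write_hparams(logger, params)
    spec.close(logger)


spec = ListLoggerSpec()
logger = spec.create_logger()
finish_run(spec, logger, {"niterations": 5})
```

Because `close` is abstract, any new logger spec has to decide explicitly how to release its resources, which is the point of putting it on the base class.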


2 participants