🐕 Batch: Revisiting Session Management #1511

Open
1 of 3 tasks
DhanshreeA opened this issue Jan 15, 2025 · 2 comments
Labels
medium Useful issue or future roadmap, and needs attention

Comments

@DhanshreeA
Member

DhanshreeA commented Jan 15, 2025

Summary

Session management in the Ersilia CLI makes it convenient to run multiple models simultaneously across different terminal "sessions", which increases user productivity. However, Ersilia currently ties each session to a terminal process. This creates several complications, especially in the following situations:

  1. Session directories (typically found in ~/eos/sessions) are not cleaned up if they contain files or folders that the current user lacks permission to delete. Those entries cannot be removed, so the session directory itself is left behind.
  2. If a terminal is accidentally closed, the user's system crashes, or the Docker engine crashes (when using Dockerized models), the session directories tend to remain on the system.
  3. It is not possible to run more than one session with the same model simultaneously (see 🐕 Batch: Running the same model in parallel #1223).

For the first case, we have identified that log files and temporary directories written by model containers to volume-mounted storage are owned by the container's root user and have 700 permissions. A regular user cannot delete these files without sudo. As a result they are never removed, and the entire session directory remains on disk.
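To see which entries are blocking cleanup, a small diagnostic can help. The sketch below assumes a POSIX system. It walks a session directory and lists every path that is not owned by the current user. Per the analysis above, these would be the container-written, root-owned files. The function name `find_foreign_owned` is hypothetical and not part of the Ersilia codebase.

```python
import os


def find_foreign_owned(root: str) -> list[str]:
    """Return paths under `root` whose owner is not the current user (POSIX only).

    In a session directory these are likely files or folders written to a
    volume mount by a container running as root. A regular user may be
    unable to delete them, or their contents, without sudo.
    """
    my_uid = os.getuid()
    foreign: list[str] = []
    # os.walk's default onerror=None silently skips directories it cannot
    # list (e.g. root-owned folders with mode 700). Such a folder is still
    # checked itself, because it appears in its parent's listing.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.lstat(path).st_uid != my_uid:
                    foreign.append(path)
            except OSError:
                foreign.append(path)  # cannot even stat it: treat it as a blocker
    return sorted(foreign)
```

For example, `find_foreign_owned(os.path.expanduser("~/eos/sessions"))` would list the candidates to inspect before deciding whether sudo is needed.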

Objective(s)

  • Clean up temporary log files from volume-mounted Docker containers (addressed in Clean up temp files from model containers #1512).
  • Figure out how to remove orphaned session folders present on the system.
  • Run the same model in parallel sessions on the same system.
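The sketch below shows one possible way to detect orphaned session folders. It rests on an assumption that this issue does not confirm: each `session_<N>` folder is named after the PID of the process the session is tied to, as names like session_3226 and session_17939 suggest. Under that assumption, a folder whose PID no longer exists can be treated as orphaned. `pid_is_alive` and `find_orphaned_sessions` are hypothetical helpers. The liveness check is POSIX-only, because on Windows `os.kill` terminates the target process.

```python
import os
import re
from pathlib import Path
from typing import Callable

# Assumption: session folders are named "session_<pid>".
SESSION_DIR_RE = re.compile(r"^session_(\d+)$")


def pid_is_alive(pid: int) -> bool:
    """Check whether a process exists (POSIX only; do not use on Windows)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 performs error checking only; nothing is sent
    except (ProcessLookupError, OverflowError):
        return False  # no such process, or a PID too large to be valid
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def find_orphaned_sessions(
    sessions_dir: str,
    is_alive: Callable[[int], bool] = pid_is_alive,
) -> list[Path]:
    """Return session folders whose associated process no longer exists."""
    root = Path(sessions_dir)
    if not root.is_dir():
        return []
    orphans: list[Path] = []
    for entry in sorted(root.iterdir()):
        match = SESSION_DIR_RE.match(entry.name)
        if match and entry.is_dir() and not is_alive(int(match.group(1))):
            orphans.append(entry)
    return orphans
```

PIDs can be reused, so a dead session may occasionally look alive. This check therefore errs on the side of keeping folders. The `is_alive` parameter lets the liveness policy be swapped out or stubbed in tests.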

Documentation

No response

@DhanshreeA DhanshreeA added the medium Useful issue or future roadmap, and needs attention label Jan 15, 2025
@Abellegese
Contributor

Hey @DhanshreeA, here is what I encountered when closing a session:

  • First, I serve a model and a session folder is created, for instance session_3226. The folder contains:
    - console.log current.log _logs logs eos-id.pid session.json

  • Then, when I run the close command, only eos-id.pid and session.json are removed; the other files and the folder itself remain.

  • I expected the close command to remove session_3226 completely. Serving again creates a new folder, so this one is left behind, which is not good.
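Here is a minimal sketch of what a complete close could look like. It assumes the goal is to delete the whole session folder rather than only eos-id.pid and session.json. `remove_session_dir` is a hypothetical helper. It deletes everything the current user is allowed to delete and returns the paths it could not remove, such as root-owned container files, instead of stopping at the first `PermissionError`.

```python
import os


def remove_session_dir(session_dir: str) -> list[str]:
    """Remove a session directory as completely as the current user can.

    Returns the paths that could not be deleted, so the caller can report
    them (e.g. suggest manual removal with sudo) instead of crashing.
    """
    failed: list[str] = []
    # Bottom-up walk: children are visited before their parent directories.
    for dirpath, dirnames, filenames in os.walk(session_dir, topdown=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                os.remove(path)
            except OSError:
                failed.append(path)
        for name in dirnames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.islink(path):
                    os.unlink(path)  # symlink to a directory: remove only the link
                else:
                    os.rmdir(path)  # fails if anything inside could not be removed
            except OSError:
                failed.append(path)
    try:
        os.rmdir(session_dir)
    except FileNotFoundError:
        pass  # already gone
    except OSError:
        failed.append(session_dir)
    return failed
```

A caller could print the returned paths and suggest removing them manually. That is more useful than silently leaving the rest of the folder behind.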

@GemmaTuron
Member

GemmaTuron commented Feb 4, 2025

I confirm that the session folders are never deleted, and I can only resort to deleting them manually from time to time. I believe this should be easy to fix. I am referring to errors like this one:

PermissionError: [Errno 13] Permission denied: '/home/gturon/eos/sessions/session_17939/_logs/tmp/ersilia-o9wneg9i'

A note on running several models in parallel sessions: this is not documented anywhere.

Projects
Status: On Hold
Development

No branches or pull requests

3 participants