
Check if files exist before sorting them into a list #784

Merged

merged 4 commits into containers:main on Feb 11, 2025

Conversation

Contributor

@kush-gupt kush-gupt commented Feb 11, 2025

A dangling symlink can cause ramalama ls to fail with:

$ ramalama ls
Error: [Errno 2] No such file or directory: 'ollama/qwen:1.8b'

Checking that the file exists before appending it to the list to be sorted fixes this:

$ ramalama ls
NAME                       MODIFIED     SIZE    
ollama://starcoder2:3b     1 hour ago   1.59 GB 
ollama://deepseek-r1:32b   12 hours ago 18.49 GB
ollama://qwen2.5-coder:32b 12 hours ago 18.49 GB

The root cause could be addressed by creating a hard link, instead of a symbolic link, from the ollama model blob to the ramalama blob store.
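A minimal sketch of the fix, assuming a hypothetical list_models helper (the actual change lives in ramalama/cli.py):

```python
import os


def list_models(paths):
    """Return paths sorted by modification time, skipping missing files.

    os.path.exists() follows symlinks, so a dangling symlink reports
    False and is skipped instead of crashing the later sort/stat calls.
    """
    models = []
    for path in paths:
        if os.path.exists(path):
            models.append(path)
    return sorted(models, key=os.path.getmtime)
```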

Summary by Sourcery

Bug Fixes:

  • Fix an error where a dangling symlink could cause ramalama ls to fail.

Contributor

sourcery-ai bot commented Feb 11, 2025

Reviewer's Guide by Sourcery

This pull request introduces a safety check before adding file paths to the list used for sorting by modification date, preventing errors caused by dangling symlinks or non-existent files.

Flow Diagram for File Existence Check in List Files Operation

flowchart TD
    A[Start: Iterate over file paths] --> B{Does file exist?}
    B -- Yes --> C[Append file path to list]
    B -- No --> D[Output error: path does not exist]
    C --> E[Continue processing]
    D --> E
    E --> F[Sort list by modification date]
    F --> G[Return sorted list]

File-Level Changes

Change: Added a file existence check before appending to the list.
Details:
  • Introduced a conditional statement to verify that a file exists using os.path.exists.
  • Prevented the appending of nonexistent file paths to the models list.
  • Eliminated failures due to dangling symlinks during the file listing process.
Files: ramalama/cli.py


Contributor

@sourcery-ai sourcery-ai bot left a comment


Hey @kush-gupt - I've reviewed your changes - here's some feedback:

Overall Comments:

  • Consider logging when a file is skipped due to not existing, to aid in debugging.
Here's what I looked at during the review
  • 🟢 General issues: all looks good
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟢 Complexity: all looks good
  • 🟢 Documentation: all looks good


@kush-gupt
Contributor Author

Fixes the issue reproduced here: #782 (comment)

@ericcurtin
Collaborator

ericcurtin commented Feb 11, 2025

Btw, if we get to the symlink -> hardlink change, that's gonna require a significant rewrite. The logic relies on following symlinks to remove all occurrences of a model.

I wonder if we should auto-clean up broken symlinks somewhere to stop them from building up.

The whole :latest tag thing being an alias to another tag needs to be taken into account. If we delete one tag, we shouldn't necessarily delete the backing file.

We can change to hardlinks, but hardlinks don't point at each other; we would have to search the filesystem for the other links to the same inode.

@kush-gupt
Contributor Author

In the context of where the other inodes are located, it should just be the ollama and hf caches, right? So the user would have to ollama/hf rm them separately to completely wipe the model from their filesystem, but that's also what we'd expect, right?

The scenario I see is: a user pulls some model with ramalama; it's nice to load it from the cache, but it's still up to the source cache/tooling to manage the original model.

@rhatdan
Member

rhatdan commented Feb 11, 2025

I was thinking of creating a hard link to a file in the same destination where ramalama pull ollama:foobar puts it now.

So you would still have a symlink, but instead of a symlink to the file under OLLAMA, it would be a symlink to a hardlinked file under $HOME/.local/share/ramalama

@rhatdan
Member

rhatdan commented Feb 11, 2025

With this change, does ramalama rm MODEL with a broken symlink report that MODEL does not exist? I want ramalama rm MODEL to remove the broken symlink.

@kush-gupt
Contributor Author

ramalama rm works to remove the broken symlink:

$ ramalama rm qwen:1.8b
Untagged: qwen:1.8b
Deleted: sha256:1296b084ed6bc4c6eaee99255d73e9c715d38e0087b6467fd1c498b908180614
Deleted: sha256:9ece4a97bfb61bdb539531db5584fa119ad55684281d8a2d864339ae3fdd6c15

$ ls $HOME/.local/share/ramalama/models/ollama/
deepseek-r1:32b		qwen2.5-coder:32b	starcoder2:3b

@ericcurtin
Collaborator

ericcurtin commented Feb 11, 2025

With this change, does ramalama rm MODEL with a broken symlink report that MODEL does not exist? I want ramalama rm MODEL to remove the broken symlink.

I'm not sure there is a case where a broken symlink happens. I suspect @kush-gupt just created a dangling symlink.

I just want to take a moment to describe why hardlinks have different disadvantages. The problem with hardlinks is the Ollama structure:

                  ----> modela:latest
sha256:somechecksum 
                  ----> modela:8b

8b and latest are pointing to the same file, both just symlinks for de-duplication purposes.

If we make things hardlinks, we have to do a lot of searching for the other linked files, which will be expensive. We also don't really gain the automatic reference-counting cleanup of hardlinks, because we don't refer to models by their checksum, and we would have to find that checksum somehow. With a symlink it's easy, because the link points at that SHA; with a hardlink we aren't pointed at any SHA.
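This asymmetry can be illustrated with a small sketch (hypothetical helper names, not ramalama code): reading a symlink recovers the blob's checksum in one step, while finding the other names of a hardlinked blob means walking the tree and comparing inodes.

```python
import os


def checksum_from_symlink(tag_path):
    # A symlink points at its target, so the sha256 blob name is
    # recovered directly by reading the link.
    return os.path.basename(os.readlink(tag_path))


def links_to_same_inode(path, search_root):
    # A hardlink carries no pointer to its siblings: finding the other
    # names for the same blob means scanning for matching (dev, inode).
    st = os.stat(path)
    matches = []
    for root, _dirs, files in os.walk(search_root):
        for name in files:
            candidate = os.path.join(root, name)
            cst = os.stat(candidate)
            if (cst.st_dev, cst.st_ino) == (st.st_dev, st.st_ino):
                matches.append(candidate)
    return matches
```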

@ericcurtin
Collaborator

ericcurtin commented Feb 11, 2025

IIRC "ramalama rm" does a decent job of trying to delete orphaned things, which is why here we see two sha256s deleted in @kush-gupt's output

@kush-gupt
Contributor Author

Indeed, I ended up with that broken symlink after I did ollama rm qwen:1.8b when I had already done a ramalama pull qwen:1.8b

@rhatdan
Member

rhatdan commented Feb 11, 2025

Ok, I think I am fine with just a symlink as long as RamaLama handles it. I think we also create a symlink for a model stored as a file.

file:///tmp/model.gguf for example.

@rhatdan
Member

rhatdan commented Feb 11, 2025

I would like to see a test for this, though.

Create a model with file://$TEMPDIR/model.gguf and show it in the list. Remove $TEMPDIR/model.gguf.
Now check whether the list still handles it. Then attempt to remove it and make sure the rm succeeds.
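The behavior that test would exercise can be sketched in Python (a standalone illustration, with made-up paths, not the actual system test):

```python
import os
import tempfile

# Simulate the suggested scenario: a model file is symlinked into the
# store, then the original file is removed, leaving a dangling link.
tmpdir = tempfile.mkdtemp()
model = os.path.join(tmpdir, "model.gguf")
open(model, "w").close()

store_entry = os.path.join(tmpdir, "store-entry")
os.symlink(model, store_entry)
assert os.path.exists(store_entry)      # resolves while the target exists

os.remove(model)                        # rm $TEMPDIR/model.gguf
assert os.path.islink(store_entry)      # the link itself remains
assert not os.path.exists(store_entry)  # ...but it no longer resolves
```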

@kush-gupt
Contributor Author

Added a call to remove the model if a broken symlink is detected when listing models:

$ ollama ls
NAME                                   ID              SIZE      MODIFIED     
nomic-embed-text:latest                0a109f422b47    274 MB    8 months ago  
  
$ ramalama ls 
NAME                       MODIFIED     SIZE    
ollama://starcoder2:3b     7 hours ago  1.59 GB 
ollama://deepseek-r1:32b   17 hours ago 18.49 GB
ollama://qwen2.5-coder:32b 17 hours ago 18.49 GB

$ ramalama pull nomic-embed-text:latest 

$ ollama rm nomic-embed-text:latest 
deleted 'nomic-embed-text:latest'

$ ramalama ls
Broken symlink found in: /Users/kugupta/.local/share/ramalama/models/ollama/nomic-embed-text:latest 
Attempting removal
Untagged: nomic-embed-text:latest
Deleted: sha256:970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6
NAME                       MODIFIED     SIZE    
ollama://starcoder2:3b     7 hours ago  1.59 GB 
ollama://deepseek-r1:32b   17 hours ago 18.49 GB
ollama://qwen2.5-coder:32b 18 hours ago 18.49 GB
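The cleanup shown above can be sketched as follows, with a hypothetical remove_model callback standing in for the actual ramalama rm logic:

```python
import os


def prune_dangling(model_paths, remove_model):
    """List existing models, removing dangling symlinks along the way."""
    kept = []
    for path in model_paths:
        if os.path.exists(path):
            kept.append(path)
        elif os.path.islink(path):
            # The target is gone but the link remains: clean it up.
            print(f"Broken symlink found in: {path}")
            print("Attempting removal")
            remove_model(path)
    return kept
```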

@rhatdan
Member

rhatdan commented Feb 11, 2025

LGTM

Collaborator

@maxamillion maxamillion left a comment


LGTM

@maxamillion maxamillion merged commit d9e6630 into containers:main Feb 11, 2025
16 checks passed