Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add symlinks to data files instead of copying #583

Merged
merged 11 commits into from
Feb 28, 2023

Conversation

suryatejreddy
Copy link
Collaborator

@suryatejreddy suryatejreddy commented Feb 3, 2023

This PR adds a major change (with little code).

Current Behavior

We copy the dataset file that the user loads into the eva_datasets folder.

Change

This PR modifies this behavior to add a symlink from eva_datasets to the file on the user's filesystem. In case the user later deletes the file, an appropriate error is thrown.

Testing

  • Added test cases for individual and multiple file loads.
  • Added test case for testing correct exception thrown if the file is removed from user's system.

@suryatejreddy suryatejreddy changed the title feat: add symlinks instead of copying feat: add symlinks instead of copying videos Feb 3, 2023
@suryatejreddy suryatejreddy changed the title feat: add symlinks instead of copying videos feat: Add symlinks to data files instead of copying Feb 23, 2023
@suryatejreddy suryatejreddy marked this pull request as ready for review February 23, 2023 07:30
Copy link
Member

@gaurav274 gaurav274 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you verify this also fixes #563? Thanks!

@suryatejreddy
Copy link
Collaborator Author

suryatejreddy commented Feb 23, 2023

@gaurav274 This doesn't throw an error while trying to load a missing file. But it throws the dataset missing file exception when we try to run a SELECT on a missing file.

file_name = "dog.png"
cursor.execute('LOAD IMAGE "' + file_name + '" INTO MyImages;')
response = cursor.fetch_all()
print(response)
>>>
@status: ResponseStatus.SUCCESS
@batch: 
                            0
0  Number of loaded IMAGE: 0
@query_time: 0.04345816600000063
cursor.execute("""SELECT data
                FROM MyImages""")
response = cursor.fetch_all()
print(response)
>>>
@status: ResponseStatus.FAIL
@batch: 
 None
@error: Dataset file not found. Please check the files still exists at the original path.

I can maybe add a similar exception when you load a missing file.

@gaurav274
Copy link
Member

Okay, so it does say 0 files loaded. We can discuss, what the behavior should be if some files are missing while loading and selecting. Btw, in this case if no file was loaded, why does it say missing file? Shouldn't it just execute with empty output?

@suryatejreddy
Copy link
Collaborator Author

suryatejreddy commented Feb 24, 2023

Okay, so it does say 0 files loaded. We can discuss, what the behavior should be if some files are missing while loading and selecting. Btw, in this case if no file was loaded, why does it say missing file? Shouldn't it just execute with empty output?

It says missing file during select because the original dataset file doesn't exist. That is the check that I have implemented as of now. I think if files are missing while loading we should raise an immediate error there.

@gaurav274
Copy link
Member

Can we merge this PR, or is there anything remaining?

@suryatejreddy suryatejreddy merged commit 0767047 into georgia-tech-db:master Feb 28, 2023
@jarulraj jarulraj mentioned this pull request Apr 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants