Migrated external files (from dj 0.11.x) are not accessible with newer dj #1037

Closed
ecobost opened this issue Jun 19, 2022 · 2 comments · Fixed by #1186

ecobost commented Jun 19, 2022

I migrated the external store of an older schema to dj 0.12.0, but was not able to fetch any of the data. By default, it looks for the blob in the new folder structure (/xx/xx/uuid and so on), but files in the older style all sit in a single folder.
I believe this is supposed to be handled by the try/except here:

blob = self._download_buffer(self._make_uuid_path(uuid))
but when _download_buffer fails to find a file it raises a FileNotFoundError rather than a MissingExternalFile exception, so the except block does not catch it and the code that reads the blob from the filepath as in 0.11.x (l 205 - l 218) is never executed.
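
To make the mismatch concrete, here is a minimal standalone sketch of the behaviour described above. It does not import datajoint: `MissingExternalFile` is a local stand-in (assumed, like the real one, not to derive from `OSError`), and `download_buffer` and `get` are hypothetical simplifications of `_download_buffer` and `ExternalTable.get`.

```python
import tempfile
from pathlib import Path


class MissingExternalFile(Exception):
    """Local stand-in for datajoint's exception (assumption: unrelated to OSError)."""


def download_buffer(external_path):
    # Mirrors the "file" protocol branch seen in the trace: a plain read,
    # with no translation of the built-in FileNotFoundError.
    return Path(external_path).read_bytes()


def get(external_path):
    try:
        return download_buffer(external_path)
    except MissingExternalFile:
        # Hypothetical placeholder for the 0.11.x fallback branch.
        return b"legacy fallback"


with tempfile.TemporaryDirectory() as store:
    # New-style nested path that does not exist in a freshly created store.
    missing = Path(store) / "1b" / "70" / "1b70bd9b"
    try:
        result = get(missing)
    except FileNotFoundError as err:
        # The except clause in get() never sees this exception type.
        result = "escaped: " + type(err).__name__

print(result)
```

The fallback branch is skipped because `except MissingExternalFile` only matches that class and its subclasses, and `FileNotFoundError` is neither.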

Here's what the error trace looks like:

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
Input In [4], in <module>
----> 1 (data.Responses.PerImage() & k).fetch('response')

File /usr/local/lib/python3.8/dist-packages/datajoint/fetch.py:229, in Fetch.__call__(self, offset, limit, order_by, format, as_dict, squeeze, download_path, *attrs)
    227 attributes = [a for a in attrs if not is_key(a)]
    228 ret = self._expression.proj(*attributes)
--> 229 ret = ret.fetch(
    230     offset=offset,
    231     limit=limit,
    232     order_by=order_by,
    233     as_dict=False,
    234     squeeze=squeeze,
    235     download_path=download_path,
    236     format="array",
    237 )
    238 if attrs_as_dict:
    239     ret = [
    240         {k: v for k, v in zip(ret.dtype.names, x) if k in attrs}
    241         for x in ret
    242     ]

File /usr/local/lib/python3.8/dist-packages/datajoint/fetch.py:289, in Fetch.__call__(self, offset, limit, order_by, format, as_dict, squeeze, download_path, *attrs)
    286     raise e
    287 for name in heading:
    288     # unpack blobs and externals
--> 289     ret[name] = list(map(partial(get, heading[name]), ret[name]))
    290 if format == "frame":
    291     ret = pandas.DataFrame(ret).set_index(heading.primary_key)

File /usr/local/lib/python3.8/dist-packages/datajoint/fetch.py:111, in _get(connection, attr, data, squeeze, download_path)
    103         safe_write(local_filepath, data.split(b"\0", 1)[1])
    104     return adapt(str(local_filepath))  # download file from remote store
    106 return adapt(
    107     uuid.UUID(bytes=data)
    108     if attr.uuid
    109     else (
    110         blob.unpack(
--> 111             extern.get(uuid.UUID(bytes=data)) if attr.is_external else data,
    112             squeeze=squeeze,
    113         )
    114         if attr.is_blob
    115         else data
    116     )
    117 )

File /usr/local/lib/python3.8/dist-packages/datajoint/external.py:203, in ExternalTable.get(self, uuid)
    201 if blob is None:
    202     try:
--> 203         blob = self._download_buffer(self._make_uuid_path(uuid))
    204     except MissingExternalFile:
    205         if not SUPPORT_MIGRATED_BLOBS:

File /usr/local/lib/python3.8/dist-packages/datajoint/external.py:144, in ExternalTable._download_buffer(self, external_path)
    142     return self.s3.get(external_path)
    143 if self.spec["protocol"] == "file":
--> 144     return Path(external_path).read_bytes()
    145 assert False

File /usr/lib/python3.8/pathlib.py:1207, in Path.read_bytes(self)
   1203 def read_bytes(self):
   1204     """
   1205     Open the file in bytes mode, read it, and close the file.
   1206     """
-> 1207     with self.open(mode='rb') as f:
   1208         return f.read()

File /usr/lib/python3.8/pathlib.py:1200, in Path.open(self, mode, buffering, encoding, errors, newline)
   1198 if self._closed:
   1199     self._raise_closed()
-> 1200 return io.open(self, mode, buffering, encoding, errors, newline,
   1201                opener=self._opener)

File /usr/lib/python3.8/pathlib.py:1054, in Path._opener(self, name, flags, mode)
   1052 def _opener(self, name, flags, mode=0o666):
   1053     # A stub for the opener argument to built-in open()
-> 1054     return self._accessor.open(self, flags, mode)

FileNotFoundError: [Errno 2] No such file or directory: '/external/neuro-static/neurostatic_dec_data/1b/70/1b70bd9bdadc2cee3a0e1532e7ea5b69'

ecobost commented Jun 20, 2022

Here's a straightforward way to solve it:
atlab@6a0f2c1
They catch the FileNotFoundError and raise MissingExternalFile instead. Works well.
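
For readers who do not want to open the commit, the idea can be sketched roughly as follows. This is not the code from that commit or from datajoint: `MissingExternalFile` is a local stand-in, and `download_buffer` and `get_blob` are hypothetical names, with the legacy lookup reduced to "try a second path" rather than the actual 0.11.x logic.

```python
import tempfile
from pathlib import Path


class MissingExternalFile(Exception):
    """Local stand-in for datajoint's exception of the same name."""


def download_buffer(external_path):
    try:
        return Path(external_path).read_bytes()
    except FileNotFoundError as err:
        # Translate the built-in error into the one the caller expects.
        raise MissingExternalFile(f"Missing external file {external_path}") from err


def get_blob(new_style_path, legacy_path):
    # Hypothetical caller: try the nested layout first, then the flat one.
    try:
        return download_buffer(new_style_path)
    except MissingExternalFile:
        return download_buffer(legacy_path)


with tempfile.TemporaryDirectory() as store:
    legacy = Path(store) / "1b70bd9b"  # old style: everything in one folder
    legacy.write_bytes(b"payload")
    new_style = Path(store) / "1b" / "70" / "1b70bd9b"  # new style: /xx/xx/uuid
    blob = get_blob(new_style, legacy)

    # If neither location has the file, the caller now sees MissingExternalFile.
    try:
        get_blob(new_style, Path(store) / "absent")
        both_missing = None
    except MissingExternalFile as err:
        both_missing = type(err).__name__

print(blob, both_missing)
```

Chaining with `from err` keeps the original `FileNotFoundError` in the traceback, so the missing path is still visible when debugging.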

dimitri-yatsenko (Member) commented

@ecobost Thanks for finding this solution. Would you like to submit a PR?

We do not know of many groups that need to migrate from 0.11, and we may not be able to add functionality and tests to handle all corner cases.
