issues with core data #166

Closed
knaaptime opened this issue Nov 19, 2019 · 19 comments

@knaaptime
Member

this is odd!

geosnap depends on census data stored in our quilt bucket. Currently, our CI can grab this data just fine. If you try to collect data from a fresh install of geosnap, however, you won't be able to pull any data down:

In [3]: t = quilt3.Package.browse(
   ...:                     "census/tracts_cartographic", "s3://quilt-cgs"
   ...:                 )
---------------------------------------------------------------------------
QuiltException                            Traceback (most recent call last)
~/anaconda3/envs/geosnap/lib/python3.7/site-packages/quilt3/session.py in _create_auth(timeout)
    100             try:
--> 101                 auth = _update_auth(auth['refresh_token'], timeout)
    102             except QuiltException as ex:

~/anaconda3/envs/geosnap/lib/python3.7/site-packages/quilt3/session.py in _update_auth(refresh_token, timeout)
     63     if response.status_code != requests.codes.ok:
---> 64         raise QuiltException("Authentication error: %s" % response.status_code)
     65

QuiltException: Authentication error: 401

During handling of the above exception, another exception occurred:

QuiltException                            Traceback (most recent call last)
<ipython-input-3-9e8a81e76a1d> in <module>
      1 t = quilt3.Package.browse(
----> 2                     "census/tracts_cartographic", "s3://quilt-cgs"
      3                 )

~/anaconda3/envs/geosnap/lib/python3.7/site-packages/quilt3/packages.py in browse(cls, name, registry, top_hash)
    386         else:
    387             pkg_timestamp_file = f'{registry}/.quilt/named_packages/{name}/latest'
--> 388             latest_pkg_hash, _ = get_bytes(pkg_timestamp_file)
    389             latest_pkg_hash = latest_pkg_hash.decode('utf-8').strip()
    390             pkg_manifest_uri = fix_url(f'{registry}/.quilt/packages/{quote(latest_pkg_hash)}')

~/anaconda3/envs/geosnap/lib/python3.7/site-packages/quilt3/data_transfer.py in get_bytes(src)
    656             params.update(dict(VersionId=src_version_id))
    657         s3_client = create_s3_client()
--> 658         resp = s3_client.get_object(**params)
    659         data = resp['Body'].read()
    660         meta = _parse_metadata(resp)

~/anaconda3/envs/geosnap/lib/python3.7/site-packages/botocore/client.py in _api_call(self, *args, **kwargs)
    355                     "%s() only accepts keyword arguments." % py_operation_name)
    356             # The "self" in this scope is referring to the BaseClient.
--> 357             return self._make_api_call(operation_name, kwargs)
    358
    359         _api_call.__name__ = str(py_operation_name)

~/anaconda3/envs/geosnap/lib/python3.7/site-packages/botocore/client.py in _make_api_call(self, operation_name, api_params)
    646         else:
    647             http, parsed_response = self._make_request(
--> 648                 operation_model, request_dict, request_context)
    649
    650         self.meta.events.emit(

~/anaconda3/envs/geosnap/lib/python3.7/site-packages/botocore/client.py in _make_request(self, operation_model, request_dict, request_context)
    665     def _make_request(self, operation_model, request_dict, request_context):
    666         try:
--> 667             return self._endpoint.make_request(operation_model, request_dict)
    668         except Exception as e:
    669             self.meta.events.emit(

~/anaconda3/envs/geosnap/lib/python3.7/site-packages/botocore/endpoint.py in make_request(self, operation_model, request_dict)
    100         logger.debug("Making request for %s with params: %s",
    101                      operation_model, request_dict)
--> 102         return self._send_request(request_dict, operation_model)
    103
    104     def create_request(self, params, operation_model=None):

~/anaconda3/envs/geosnap/lib/python3.7/site-packages/botocore/endpoint.py in _send_request(self, request_dict, operation_model)
    130     def _send_request(self, request_dict, operation_model):
    131         attempts = 1
--> 132         request = self.create_request(request_dict, operation_model)
    133         context = request_dict['context']
    134         success_response, exception = self._get_response(

~/anaconda3/envs/geosnap/lib/python3.7/site-packages/botocore/endpoint.py in create_request(self, params, operation_model)
    114                 op_name=operation_model.name)
    115             self._event_emitter.emit(event_name, request=request,
--> 116                                      operation_name=operation_model.name)
    117         prepared_request = self.prepare_request(request)
    118         return prepared_request

~/anaconda3/envs/geosnap/lib/python3.7/site-packages/botocore/hooks.py in emit(self, event_name, **kwargs)
    354     def emit(self, event_name, **kwargs):
    355         aliased_event_name = self._alias_event_name(event_name)
--> 356         return self._emitter.emit(aliased_event_name, **kwargs)
    357
    358     def emit_until_response(self, event_name, **kwargs):

~/anaconda3/envs/geosnap/lib/python3.7/site-packages/botocore/hooks.py in emit(self, event_name, **kwargs)
    226                  handlers.
    227         """
--> 228         return self._emit(event_name, kwargs)
    229
    230     def emit_until_response(self, event_name, **kwargs):

~/anaconda3/envs/geosnap/lib/python3.7/site-packages/botocore/hooks.py in _emit(self, event_name, kwargs, stop_on_response)
    209         for handler in handlers_to_call:
    210             logger.debug('Event %s: calling handler %s', event_name, handler)
--> 211             response = handler(**kwargs)
    212             responses.append((handler, response))
    213             if stop_on_response and response is not None:

~/anaconda3/envs/geosnap/lib/python3.7/site-packages/botocore/signers.py in handler(self, operation_name, request, **kwargs)
     88         # this method is invoked to sign the request.
     89         # Don't call this method directly.
---> 90         return self.sign(operation_name, request)
     91
     92     def sign(self, operation_name, request, region_name=None,

~/anaconda3/envs/geosnap/lib/python3.7/site-packages/botocore/signers.py in sign(self, operation_name, request, region_name, signing_type, expires_in, signing_name)
    147
    148             try:
--> 149                 auth = self.get_auth_instance(**kwargs)
    150             except UnknownSignatureVersionError as e:
    151                 if signing_type != 'standard':

~/anaconda3/envs/geosnap/lib/python3.7/site-packages/botocore/signers.py in get_auth_instance(self, signing_name, region_name, signature_version, **kwargs)
    227         frozen_credentials = None
    228         if self._credentials is not None:
--> 229             frozen_credentials = self._credentials.get_frozen_credentials()
    230         kwargs['credentials'] = frozen_credentials
    231         if cls.REQUIRES_REGION:

~/anaconda3/envs/geosnap/lib/python3.7/site-packages/botocore/credentials.py in get_frozen_credentials(self)
    589
    590         """
--> 591         self._refresh()
    592         return self._frozen_credentials
    593

~/anaconda3/envs/geosnap/lib/python3.7/site-packages/botocore/credentials.py in _refresh(self)
    484                 is_mandatory_refresh = self.refresh_needed(
    485                     self._mandatory_refresh_timeout)
--> 486                 self._protected_refresh(is_mandatory=is_mandatory_refresh)
    487                 return
    488             finally:

~/anaconda3/envs/geosnap/lib/python3.7/site-packages/botocore/credentials.py in _protected_refresh(self, is_mandatory)
    500         # the self._refresh_lock.
    501         try:
--> 502             metadata = self._refresh_using()
    503         except Exception as e:
    504             period_name = 'mandatory' if is_mandatory else 'advisory'

~/anaconda3/envs/geosnap/lib/python3.7/site-packages/quilt3/session.py in _refresh_credentials()
    221
    222 def _refresh_credentials():
--> 223     session = get_session()
    224     creds = session.get(
    225         "{url}/api/auth/get_credentials".format(

~/anaconda3/envs/geosnap/lib/python3.7/site-packages/quilt3/session.py in get_session(timeout)
    138     global _session
    139     if _session is None:
--> 140         auth = _create_auth(timeout)
    141         _session = _create_session(auth)
    142

~/anaconda3/envs/geosnap/lib/python3.7/site-packages/quilt3/session.py in _create_auth(timeout)
    102             except QuiltException as ex:
    103                 raise QuiltException(
--> 104                     "Failed to update the access token (%s). Run `quilt login` again." % ex
    105                 )
    106             contents[url] = auth

QuiltException: Failed to update the access token (Authentication error: 401). Run `quilt login` again.

I think this is because quilt has moved over to a different infrastructure and I need to migrate our packages. But, again, it's super strange that Travis has no problem (and part of the reason I didn't notice for so long)

I haven't touched our quilt bucket in several months, so I'll need to investigate a bit. Though, I just logged into the quilt slack for the first time in months and, judging from the conversations I missed with @akarve and @kevinemoore, it looks like I just need to move our packages into the newer opendata bucket

@knaaptime knaaptime added the bug Something isn't working label Nov 19, 2019
@knaaptime knaaptime self-assigned this Nov 19, 2019
@akarve

akarve commented Nov 19, 2019

Is this client logged in? I am unable to reproduce the bug. The s3://quilt-cgs bucket is up and running (we would very much like to get you moved over to s3://spatial-ucr, but I don't think that's the root cause here).

This call works for me (with no credentials):

p = quilt3.Package.browse("census/tracts_cartographic", "s3://quilt-cgs") 

Is there something else I can try to repro this bug?

@akarve

akarve commented Nov 20, 2019

This could be related to AWS credentials on your machine, assuming that's where you ran it. I would also try quilt3.logout. Since your bucket is public, anonymous reads should just work.

@knaaptime
Member Author

thank you @akarve!

that worked on my machine, but the issue was brought to my attention by a student, so I'll wait for final confirmation.

Like I said, the CI worked fine, and the package was cached on my machine so I never noticed, but was admittedly very confused when I couldn't manage to solve the issue on the student's machine, then encountered the same issue in a fresh environment on two of my own.

quilt3.logout looks like it may work. Thanks again

This was referenced Nov 20, 2019
@knaaptime knaaptime reopened this Nov 20, 2019
@knaaptime
Member Author

hm. looks like her issue is slightly different. Here's the trace

>>> import quilt3
>>> quilt3.logout()
Already logged out.
>>> import geosnap
/Users/suchitrap/geosnap/geosnap/_data.py:118: UserWarning: Unable to locate local census data. Streaming instead.
If you plan to use census data repeatedly you can store it locally with the data.store_census function for better performance
  "Unable to locate local census data. Streaming instead.\n"
Traceback (most recent call last):
  File "/Users/suchitrap/geosnap/geosnap/_data.py", line 115, in __init__
    from quilt3.data.census import tracts_cartographic, administrative
ImportError: cannot import name 'tracts_cartographic' from 'quilt3.data.census' (unknown location)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/suchitrap/geosnap/geosnap/__init__.py", line 28, in <module>
    from . import analyze
  File "/Users/suchitrap/geosnap/geosnap/analyze/__init__.py", line 1, in <module>
    from .analytics import cluster, cluster_spatial
  File "/Users/suchitrap/geosnap/geosnap/analyze/analytics.py", line 13, in <module>
    from .._data import _Map
  File "/Users/suchitrap/geosnap/geosnap/_data.py", line 469, in <module>
    datasets = DataStore()
  File "/Users/suchitrap/geosnap/geosnap/_data.py", line 124, in __init__
    "census/tracts_cartographic", "s3://quilt-cgs"
  File "/Users/suchitrap/opt/anaconda3/envs/geosnap/lib/python3.7/site-packages/quilt3/packages.py", line 398, in browse
    latest_pkg_hash, _ = get_bytes(pkg_timestamp_file)
  File "/Users/suchitrap/opt/anaconda3/envs/geosnap/lib/python3.7/site-packages/quilt3/data_transfer.py", line 680, in get_bytes
    resp = s3_client.get_object(**params)
  File "/Users/suchitrap/opt/anaconda3/envs/geosnap/lib/python3.7/site-packages/botocore/client.py", line 357, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/Users/suchitrap/opt/anaconda3/envs/geosnap/lib/python3.7/site-packages/botocore/client.py", line 661, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (InvalidAccessKeyId) when calling the GetObject operation: The AWS Access Key Id you provided does not exist in our records.

@akarve

akarve commented Nov 20, 2019

I would guess that the user has a bad credentials file. You could try moving or renaming that file as an isolation. i.e. all calls to a public bucket should just work if the user is fully anonymous. The hard thing about the issue you're encountering is that boto3 (underneath Quilt) has a complex fallback pattern and keeps looking for credentials, and sometimes those credentials are bad (what I'm seeing above) or don't have access to the public bucket.
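To make that isolation step concrete, here is a minimal sketch that lists the places botocore commonly picks credentials up from: the standard `AWS_*` environment variables and the shared files under `~/.aws`. The helper name `find_credential_sources` is hypothetical (it is not part of quilt3 or boto3), and the list is not exhaustive; for example, it does not look at any login token that quilt3 itself may have cached.

```python
import os
from pathlib import Path

# Environment variables that botocore reads for credentials, profiles,
# or the location of the shared credential/config files.
_ENV_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_SESSION_TOKEN",
    "AWS_PROFILE",
    "AWS_SHARED_CREDENTIALS_FILE",
    "AWS_CONFIG_FILE",
)


def find_credential_sources(env=None, home=None):
    """Hypothetical helper: report credential sources present on this machine."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)

    # Any of these being set means requests may be signed instead of anonymous.
    found = [f"env:{name}" for name in _ENV_VARS if env.get(name)]

    # Default locations of the shared credentials and config files.
    for rel in (".aws/credentials", ".aws/config"):
        path = home / rel
        if path.is_file():
            found.append(f"file:{path}")
    return found


if __name__ == "__main__":
    sources = find_credential_sources()
    print(sources if sources else "no AWS credential sources found")
```

If this reports nothing and the request still fails with InvalidAccessKeyId, the bad key is probably coming from a source outside this list.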

@suchitrapithavath

I did not find an ".aws" file in my home directory. I am using Python 3 and am still facing the issue below.

(geosnap) suchitras-mbp:geosnap suchitrap$ python
Python 3.7.3 | packaged by conda-forge | (default, Jul 1 2019, 14:38:56)
[Clang 4.0.1 (tags/RELEASE_401/final)] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.

>>> import geosnap
/Users/suchitrap/geosnap/geosnap/_data.py:118: UserWarning: Unable to locate local census data. Streaming instead.
If you plan to use census data repeatedly you can store it locally with the data.store_census function for better performance
  "Unable to locate local census data. Streaming instead.\n"
Traceback (most recent call last):
  File "/Users/suchitrap/geosnap/geosnap/_data.py", line 115, in __init__
    from quilt3.data.census import tracts_cartographic, administrative
ImportError: cannot import name 'tracts_cartographic' from 'quilt3.data.census' (unknown location)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/suchitrap/geosnap/geosnap/__init__.py", line 28, in <module>
    from . import analyze
  File "/Users/suchitrap/geosnap/geosnap/analyze/__init__.py", line 1, in <module>
    from .analytics import cluster, cluster_spatial
  File "/Users/suchitrap/geosnap/geosnap/analyze/analytics.py", line 13, in <module>
    from .._data import _Map
  File "/Users/suchitrap/geosnap/geosnap/_data.py", line 469, in <module>
    datasets = DataStore()
  File "/Users/suchitrap/geosnap/geosnap/_data.py", line 124, in __init__
    "census/tracts_cartographic", "s3://quilt-cgs"
  File "/Users/suchitrap/opt/anaconda3/envs/geosnap/lib/python3.7/site-packages/quilt3/packages.py", line 398, in browse
    latest_pkg_hash, _ = get_bytes(pkg_timestamp_file)
  File "/Users/suchitrap/opt/anaconda3/envs/geosnap/lib/python3.7/site-packages/quilt3/data_transfer.py", line 680, in get_bytes
    resp = s3_client.get_object(**params)
  File "/Users/suchitrap/opt/anaconda3/envs/geosnap/lib/python3.7/site-packages/botocore/client.py", line 357, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/Users/suchitrap/opt/anaconda3/envs/geosnap/lib/python3.7/site-packages/botocore/client.py", line 661, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (InvalidAccessKeyId) when calling the GetObject operation: The AWS Access Key Id you provided does not exist in our records.

Can anyone please help me with this?

@akarve

akarve commented Nov 20, 2019

@knaaptime Is ImportError: cannot import name 'tracts_cartographic' from 'quilt3.data.census' expected in streaming mode? i.e. is it expected that streaming mode will also install the package?

@suchitrapithavath which operating system and python version are you using?

@knaaptime
Member Author

it looks for the package locally, then falls back to streaming. I think the answer is no; in this case that wouldn't be expected to install the package.

From there I can add another handler for a QuiltException that raises a warning telling the user to check for the ~/.aws directory
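A rough sketch of what such a handler could look like, under stated assumptions: `call_with_credential_hint` is a hypothetical wrapper, not geosnap code, and the exception types to catch are passed in by the caller (in geosnap that would presumably be `QuiltException` and botocore's `ClientError`, neither of which is imported here so the sketch stays self-contained).

```python
import warnings


def call_with_credential_hint(fetch, *args, errors=(Exception,), **kwargs):
    """Hypothetical wrapper: run fetch(); on failure, warn about stale credentials."""
    try:
        return fetch(*args, **kwargs)
    except errors as err:
        # Point the user at the usual suspects, then re-raise so the
        # original traceback is not hidden.
        warnings.warn(
            f"Data request failed ({err}). The bucket is public, so this may be "
            "caused by stale credentials: check the ~/.aws directory and any "
            "AWS_* environment variables, or try quilt3.logout().",
            UserWarning,
            stacklevel=2,
        )
        raise
```

Usage would be along the lines of `call_with_credential_hint(quilt3.Package.browse, "census/tracts_cartographic", "s3://quilt-cgs", errors=(QuiltException,))`. Re-raising keeps the failure visible instead of swallowing it.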

@akarve

akarve commented Nov 21, 2019

Sounds like ~/.aws is not the problem, per the user report. I think I'm still confused why the package import was tried at all in the streaming case? Because it would always fail?

@knaaptime
Member Author

for convenience we have essentially a dataset class that provides access to our quilt datasets as methods

so if you call datasets.tracts_2000() and you have already installed the package, you'll get the local version, otherwise you get the Package.browse version.

the idea being that power users probably want the data available locally, so they have the option to cache the census data to their local machine (using the store_census function). On package import, we don't know ahead of time whether a user has stored the spatialucr package, so our dataset class first tries to import the local quilt package, but will fall back to streaming if it hasn't been installed.

On a fresh install, the first import attempt would always fail, but we don't know whether users have called store_census later.
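The local-then-streaming lookup described above can be sketched roughly as follows. This is an illustration of the pattern only, not the actual geosnap `DataStore` code: `load_local_or_stream` is a hypothetical name, and the `stream` callable stands in for something like `quilt3.Package.browse(...)`.

```python
import importlib


def load_local_or_stream(module_name, attr, stream):
    """Hypothetical helper: prefer a locally installed package, else stream.

    Returns a (source, obj) pair so the caller can tell which path was taken.
    """
    try:
        # Equivalent in spirit to `from <module_name> import <attr>`.
        module = importlib.import_module(module_name)
        return "local", getattr(module, attr)
    except (ImportError, AttributeError):
        # Expected on a fresh install: nothing has been stored locally yet,
        # so fall back to the remote (streaming) source.
        return "stream", stream()
```

Because the ImportError is the expected path on a fresh install, it appears in the tracebacks above only as the "During handling of the above exception" context; the failure that actually matters is the one raised inside the streaming fallback.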

@sjsrey
Collaborator

sjsrey commented Nov 21, 2019

On a fresh install of geosnap, when working through the guide I'm hitting:

In [5]: datasets.tracts_2010()
---------------------------------------------------------------------------
QuiltException                            Traceback (most recent call last)
<ipython-input-5-62c0d1e98261> in <module>
----> 1 datasets.tracts_2010()

~/Dropbox/g/geosnap/git/geosnap/geosnap/_data.py in tracts_2010(self, states, convert)
    338 
    339         """
--> 340         t = self.tracts_cartographic["tracts_2010_500k.parquet"]()
    341         if states:
    342             t = t[t.geoid.str[:2].isin(states)]

~/anaconda3/envs/geosnap/lib/python3.7/site-packages/quilt3/packages.py in __call__(self, func, **kwargs)
    226         Shorthand for self.deserialize()
    227         """
--> 228         return self.deserialize(func=func, **kwargs)
    229 
    230 

~/anaconda3/envs/geosnap/lib/python3.7/site-packages/quilt3/packages.py in deserialize(self, func, **format_opts)
    190 
    191         # Verify hash before deserializing..
--> 192         self._verify_hash(data)
    193 
    194         return formats[0].deserialize(data, self._meta, pkey_ext, **format_opts)

~/anaconda3/envs/geosnap/lib/python3.7/site-packages/quilt3/packages.py in _verify_hash(self, read_bytes)
    129         digest = hashlib.sha256(read_bytes).hexdigest()
    130         if digest != self.hash.get('value'):
--> 131             raise QuiltException("Hash validation failed")
    132 
    133     def set(self, path=None, meta=None):

QuiltException: Hash validation failed
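For readers unfamiliar with this check: the `_verify_hash` frame in the traceback computes a SHA-256 digest of the downloaded bytes and compares it with the value recorded for that entry. A standalone sketch of the same idea, using only the standard library (`verify_sha256` is a hypothetical name, and `ValueError` stands in for `QuiltException`):

```python
import hashlib


def verify_sha256(data: bytes, expected_hex: str) -> None:
    """Raise ValueError if the SHA-256 of `data` differs from `expected_hex`."""
    digest = hashlib.sha256(data).hexdigest()
    if digest != expected_hex:
        raise ValueError(
            f"Hash validation failed: expected {expected_hex}, got {digest}"
        )
```

A failure here means the bytes that were read do not match what the package manifest recorded. That could happen, for instance, if the object in the bucket changed after the manifest was written or if the download was incomplete; the thread does not establish which applies.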

@akarve

akarve commented Nov 21, 2019

I would like to repro but, at the start of the User Guide, this line is crashing the Jupyter Kernel.

from geosnap.data import Community
dc = Community.from_census(state_fips='11')

I am running the recommended installation procedure on conda from environment.yml.

I do notice an old-ish version of Quilt in conda list: quilt3 3.0.7 py37_0 conda-forge.

This is what the kernel bombing looks like:

[I 16:51:40.636 NotebookApp] Copying user-guide.ipynb to 
[I 16:51:41.241 NotebookApp] Kernel started: 85a72fcc-f50e-4dff-b3d3-02e32349ac6b
/Users/karve/anaconda3/envs/geosnap/bin/python: No module named plotkernel
[I 16:51:44.238 NotebookApp] KernelRestarter: restarting kernel (1/5), new random ports
/Users/karve/anaconda3/envs/geosnap/bin/python: No module named plotkernel
[I 16:51:47.248 NotebookApp] KernelRestarter: restarting kernel (2/5), new random ports
/Users/karve/anaconda3/envs/geosnap/bin/python: No module named plotkernel
[I 16:51:50.259 NotebookApp] KernelRestarter: restarting kernel (3/5), new random ports
/Users/karve/anaconda3/envs/geosnap/bin/python: No module named plotkernel
[I 16:51:53.266 NotebookApp] KernelRestarter: restarting kernel (4/5), new random ports
/Users/karve/anaconda3/envs/geosnap/bin/python: No module named plotkernel
[W 16:51:56.276 NotebookApp] KernelRestarter: restart failed
[W 16:51:56.277 NotebookApp] Kernel 85a72fcc-f50e-4dff-b3d3-02e32349ac6b died, removing from map.
[W 16:52:41.649 NotebookApp] Timeout waiting for kernel_info reply from 85a72fcc-f50e-4dff-b3d3-02e32349ac6b
[E 16:52:41.651 NotebookApp] Error opening stream: HTTP 404: Not Found (Kernel does not exist: 85a72fcc-f50e-4dff-b3d3-02e32349ac6b)
[W 16:52:43.826 NotebookApp] 404 GET /api/kernels/85a72fcc-f50e-4dff-b3d3-02e32349ac6b/channels?session_id=9be96ee5dd0445ae8de2f3f58e8e3154 (127.0.0.1): Kernel does not exist: 85a72fcc-f50e-4dff-b3d3-02e32349ac6b
[W 16:52:43.827 NotebookApp] 404 GET /api/kernels/85a72fcc-f50e-4dff-b3d3-02e32349ac6b/channels?session_id=9be96ee5dd0445ae8de2f3f58e8e3154 (127.0.0.1) 4.62ms referer=None
[W 16:52:46.820 NotebookApp] Replacing stale connection: 85a72fcc-f50e-4dff-b3d3-02e32349ac6b:9be96ee5dd0445ae8de2f3f58e8e3154
[I 16:53:41.529 NotebookApp] Saving file at /user-guide-Copy1.ipynb
[W 16:53:41.865 NotebookApp] Replacing stale connection: 07ac7c10-0ae5-4474-bc15-f94b0275862e:e2b06e54ff2b40ad82035461f84a5c31
[W 16:56:51.826 NotebookApp] Replacing stale connection: 85a72fcc-f50e-4dff-b3d3-02e32349ac6b:9be96ee5dd0445ae8de2f3f58e8e3154

@knaaptime
Member Author

we just moved that up to the top level and I forgot to edit that section of the guide. You can do from geosnap import Community

@knaaptime
Member Author

knaaptime commented Nov 21, 2019

I just updated the quilt version on conda-forge today as well, so new installs should pull quilt 3.1.5 from here on, but it might take an hour or two

@akarve

akarve commented Nov 21, 2019

OK can you point me to the code sections where you specify the Quilt packages and versions that you interact with?

@akarve

akarve commented Nov 21, 2019

also bombing for me: from geosnap import Community

@knaaptime
Member Author

everything happens in this class. I'll need to look into why that import fails

@knaaptime
Member Author

(again, the weird thing is CI is passing fine, and it tests both streaming and local versions of the packages)

@knaaptime knaaptime mentioned this issue Nov 27, 2019
4 tasks
@knaaptime
Member Author

closing this as I've never been able to reproduce and the student who raised it hasn't run into it again. For good measure, we've also moved everything over to https://open.quiltdata.com/b/spatial-ucr anyway
