📋 Logs/tracebacks
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/steve/venv/lib/python3.8/site-packages/pandas/io/parquet.py", line 317, in read_parquet
    return impl.read(path, columns=columns, **kwargs)
  File "/Users/steve/venv/lib/python3.8/site-packages/pandas/io/parquet.py", line 141, in read
    result = self.api.parquet.read_table(
  File "/Users/steve/venv/lib/python3.8/site-packages/pyarrow/parquet.py", line 1607, in read_table
    dataset = _ParquetDatasetV2(
  File "/Users/steve/venv/lib/python3.8/site-packages/pyarrow/parquet.py", line 1439, in __init__
    if filesystem.get_file_info(path).is_file:
  File "pyarrow/_fs.pyx", line 438, in pyarrow._fs.FileSystem.get_file_info
  File "pyarrow/error.pxi", line 122, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/_fs.pyx", line 1004, in pyarrow._fs._cb_get_file_info
  File "/Users/steve/venv/lib/python3.8/site-packages/pyarrow/fs.py", line 195, in get_file_info
    info = self.fs.info(path)
  File "/Users/steve/venv/lib/python3.8/site-packages/fsspec/asyn.py", line 121, in wrapper
    return maybe_sync(func, self, *args, **kwargs)
  File "/Users/steve/venv/lib/python3.8/site-packages/fsspec/asyn.py", line 100, in maybe_sync
    return sync(loop, func, *args, **kwargs)
  File "/Users/steve/venv/lib/python3.8/site-packages/fsspec/asyn.py", line 71, in sync
    raise exc.with_traceback(tb)
  File "/Users/steve/venv/lib/python3.8/site-packages/fsspec/asyn.py", line 55, in f
    result[0] = await future
  File "/Users/steve/venv/lib/python3.8/site-packages/gcsfs/core.py", line 781, in _info
    return await self._get_object(path)
  File "/Users/steve/venv/lib/python3.8/site-packages/gcsfs/core.py", line 576, in _get_object
    bucket, await self._call("GET", "b/{}/o/{}", bucket, key, json_out=True)
  File "/Users/steve/venv/lib/python3.8/site-packages/gcsfs/core.py", line 487, in _call
    async with self.session.request(
  File "/Users/steve/venv/lib/python3.8/site-packages/aiohttp/client.py", line 1083, in __aenter__
    self._resp = await self._coro
  File "/Users/steve/venv/lib/python3.8/site-packages/aiohttp/client.py", line 490, in _request
    conn = await self._connector.connect(
  File "/Users/steve/venv/lib/python3.8/site-packages/aiohttp/connector.py", line 528, in connect
    proto = await self._create_connection(req, traces, timeout)
  File "/Users/steve/venv/lib/python3.8/site-packages/aiohttp/connector.py", line 868, in _create_connection
    _, proto = await self._create_direct_connection(
  File "/Users/steve/venv/lib/python3.8/site-packages/aiohttp/connector.py", line 1023, in _create_direct_connection
    raise last_exc
  File "/Users/steve/venv/lib/python3.8/site-packages/aiohttp/connector.py", line 999, in _create_direct_connection
    transp, proto = await self._wrap_create_connection(
  File "/Users/steve/venv/lib/python3.8/site-packages/aiohttp/connector.py", line 948, in _wrap_create_connection
    raise ClientConnectorCertificateError(
aiohttp.client_exceptions.ClientConnectorCertificateError: Cannot connect to host www.googleapis.com:443 ssl:True [SSLCertVerificationError: (1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate (_ssl.c:1108)')]
📋 Your version of the Python
$ python --version
Python 3.8.2
📋 Your version of the aiohttp/yarl/multidict distributions
🐞 Describe the bug
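When reading parquet files from Google Cloud Storage using Pandas and aiohttp==3.7.0, the read fails with aiohttp.client_exceptions.ClientConnectorCertificateError (certificate verify failed: self signed certificate); the full traceback is under Logs/tracebacks.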
💡 To Reproduce
With aiohttp==3.7.0 installed, attempt to download any file from GCS using pd.read_parquet("gs://..").
For example, the file gs://gcp-public-data-landsat/LC08/01/044/034/LC08_L1GT_044034_20130330_20170310_01_T2/LC08_L1GT_044034_20130330_20170310_01_T2_ANG.txt is publicly available, so we can use it as a test case. Even though it is not valid parquet, the call crashes with the reported error before it gets as far as complaining about the file format.
💡 Expected behavior
The file should download successfully (or, in the above test case, the call should fail with OSError: Could not open parquet input source ... Either the file is corrupted or this is not a parquet file.)
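A minimal sketch of a reproduction script that tells the reported certificate failure apart from the expected parquet-format OSError. The helper names find_cert_error and try_read are hypothetical, and the script assumes the SSL error is reachable through the exception chain (`__cause__` / `__context__`) of whatever pd.read_parquet raises; it needs pandas, pyarrow and gcsfs installed to actually perform the read.

```python
import ssl


def find_cert_error(exc):
    """Return the first ssl.SSLCertVerificationError in an exception chain, or None."""
    seen = set()  # guard against cyclic chains
    while exc is not None and id(exc) not in seen:
        seen.add(id(exc))
        if isinstance(exc, ssl.SSLCertVerificationError):
            return exc
        exc = exc.__cause__ or exc.__context__
    return None


def try_read(path):
    """Attempt the read from this report and label the outcome (hypothetical helper)."""
    import pandas as pd  # imported lazily; requires pandas, pyarrow and gcsfs

    try:
        pd.read_parquet(path)
    except Exception as exc:
        # Check for the certificate error first: SSL errors are themselves
        # OSError subclasses, so the order of these two checks matters.
        if find_cert_error(exc) is not None:
            return "certificate verification failed (the reported bug)"
        if isinstance(exc, OSError):
            return "OSError (expected for a file that is not parquet)"
        raise
    return "read succeeded"
```

Calling try_read with the public Landsat path above should, under these assumptions, report the certificate failure on an affected install and the OSError case otherwise.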
📋 Additional context
Downgrading aiohttp to 3.6.3 fixes the issue.
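To confirm which versions are actually in effect before and after the downgrade, something like the following sketch may help. It uses the standard-library importlib.metadata (available from Python 3.8); the helper name installed_versions and the list of distributions checked are illustrative choices, not part of the original report.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distributions that appear in the traceback, plus aiohttp's companions.
    names = ["aiohttp", "yarl", "multidict", "gcsfs", "fsspec", "pyarrow", "pandas"]
    for name, version in installed_versions(names).items():
        print(f"{name}: {version or 'not installed'}")
```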
Gives