Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bc2pg fails accessing bc-environmental-monitoring-locations due to UnboundLocalError #146

Closed
NathanEne opened this issue Oct 11, 2023 · 6 comments

Comments

@NathanEne
Copy link
Contributor

When running the following command: bcdata bc2pg bc-environmental-monitoring-locations using --db_url,--schema, --table options the following error is returned:

Traceback (most recent call last):
  File "/opt/venvs/maintenance/lib/python3.10/site-packages/tenacity/__init__.py", line 382, in __call__
    result = fn(*args, **kwargs)
  File "/opt/venvs/maintenance/lib/python3.10/site-packages/bcdata/bc2pg.py", line 43, in _download
    return data
UnboundLocalError: local variable 'data' referenced before assignment

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/venvs/maintenance/bin/bcdata", line 8, in <module>
    sys.exit(cli())
  File "/opt/venvs/maintenance/lib/python3.10/site-packages/click/core.py", line 1126, in __call__
    return self.main(*args, **kwargs)
  File "/opt/venvs/maintenance/lib/python3.10/site-packages/click/core.py", line 1051, in main
    rv = self.invoke(ctx)
  File "/opt/venvs/maintenance/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/venvs/maintenance/lib/python3.10/site-packages/click/core.py", line 1393, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/venvs/maintenance/lib/python3.10/site-packages/click/core.py", line 752, in invoke
    return __callback(*args, **kwargs)
  File "/opt/venvs/maintenance/lib/python3.10/site-packages/bcdata/cli.py", line 378, in bc2pg
    out_table = bcdata.bc2pg(
  File "/opt/venvs/maintenance/lib/python3.10/site-packages/bcdata/bc2pg.py", line 144, in bc2pg
    df = _download(url)
  File "/opt/venvs/maintenance/lib/python3.10/site-packages/tenacity/__init__.py", line 289, in wrapped_f
    return self(f, *args, **kw)
  File "/opt/venvs/maintenance/lib/python3.10/site-packages/tenacity/__init__.py", line 379, in __call__
    do = self.iter(retry_state=retry_state)
  File "/opt/venvs/maintenance/lib/python3.10/site-packages/tenacity/__init__.py", line 326, in iter
    raise retry_exc from fut.exception()
tenacity.RetryError: RetryError[<Future at 0x7fe5f9c4d660 state=finished raised UnboundLocalError>]

this issue is encountered on version 0.8.1.post1:

$ bcdata --version
0.8.1.post1
@smnorris
Copy link
Owner

smnorris commented Oct 11, 2023

I expect this error is due to requests timing out. Testing now and the service is very sluggish and I get the same error.

#143 notes how error messages need to be more informative when requests time out.

@smnorris
Copy link
Owner

smnorris commented Oct 11, 2023

I'm not sure what is going on, requests for other layers with >100k records currently work fine.

But this one (only ~40k records) is very slow, and consistently times out / returns a 502 error on the second request.
Is there anything unique about this table?

$ bcdata cat WHSE_ENVIRONMENTAL_MONITORING.EMS_MONITORING_LOCN_TYPES_SVW -vv > /dev/null
2023-10-11 15:11:35,036:DEBUG:root: building WFS http://openmaps.gov.bc.ca/geo/pub/ows
2023-10-11 15:11:35,110:DEBUG:root: building WFS http://openmaps.gov.bc.ca/geo/pub/ows
2023-10-11 15:11:35,192:DEBUG:root: building WFS http://openmaps.gov.bc.ca/geo/pub/ows
2023-10-11 15:11:35,263:DEBUG:root: building WFS http://openmaps.gov.bc.ca/geo/pub/ows
2023-10-11 15:11:35,341:DEBUG:root: building WFS http://openmaps.gov.bc.ca/geo/pub/ows
2023-10-11 15:11:35,399:DEBUG:root: building WFS http://openmaps.gov.bc.ca/geo/pub/ows
2023-10-11 15:11:35,489:DEBUG:urllib3.connectionpool: Starting new HTTPS connection (1): openmaps.gov.bc.ca:443
2023-10-11 15:11:37,855:DEBUG:urllib3.connectionpool: https://openmaps.gov.bc.ca:443 "GET /geo/pub/wfs?service=WFS&version=2.0.0&request=GetFeature&typeName=WHSE_ENVIRONMENTAL_MONITORING.EMS_MONITORING_LOCN_TYPES_SVW&resultType=hits&outputFormat=json HTTP/1.1" 200 None
2023-10-11 15:11:37,861:DEBUG:bcdata.wfs: https://openmaps.gov.bc.ca/geo/pub/wfs?service=WFS&version=2.0.0&request=GetFeature&typeName=WHSE_ENVIRONMENTAL_MONITORING.EMS_MONITORING_LOCN_TYPES_SVW&resultType=hits&outputFormat=json
2023-10-11 15:11:37,862:DEBUG:bcdata.wfs: {'Date': 'Wed, 11 Oct 2023 22:11:36 GMT', 'Server': 'Apache', 'X-Frame-Options': 'allow-from (null)', 'X-Control-flow-delay-ms': '0', 'Content-Disposition': 'inline; filename=geoserver-GetFeature.text', 'Content-Type': 'text/xml', 'Set-Cookie': 'GS_FLOW_CONTROL=GS_CFLOW_6a64218f:18b20ca91ea:-8a4', 'Vary': 'Accept-Encoding', 'Access-Control-Allow-Origin': '(null)', 'Access-Control-Allow-Credentials': 'true', 'Access-Control-Allow-Methods': 'POST, GET, OPTIONS, HEAD', 'Access-Control-Allow-Headers': 'X-Requested-With, Referer, Origin, Content-Type, SOAPAction, Authorization, Accept', 'Access-Control-Max-Age': '1000', 'X-Geo-Cache': 'miss', 'Keep-Alive': 'timeout=5, max=100', 'Connection': 'Keep-Alive', 'Transfer-Encoding': 'chunked'}
2023-10-11 15:11:37,866:INFO:bcdata.wfs: {'start_time': 0.911322208, 'attempt_number': 1, 'idle_for': 0}
2023-10-11 15:11:37,866:INFO:bcdata.wfs: Total features requested: 41452
2023-10-11 15:11:37,874:DEBUG:urllib3.connectionpool: Starting new HTTPS connection (1): openmaps.gov.bc.ca:443
2023-10-11 15:11:42,797:DEBUG:urllib3.connectionpool: https://openmaps.gov.bc.ca:443 "GET /geo/pub/wfs?service=WFS&version=2.0.0&request=GetFeature&typeName=WHSE_ENVIRONMENTAL_MONITORING.EMS_MONITORING_LOCN_TYPES_SVW&outputFormat=json&SRSNAME=epsg%3A4326&sortby=OBJECTID&startIndex=0&count=10000 HTTP/1.1" 200 None
2023-10-11 15:14:07,092:INFO:bcdata.wfs: https://openmaps.gov.bc.ca/geo/pub/wfs?service=WFS&version=2.0.0&request=GetFeature&typeName=WHSE_ENVIRONMENTAL_MONITORING.EMS_MONITORING_LOCN_TYPES_SVW&outputFormat=json&SRSNAME=epsg%3A4326&sortby=OBJECTID&startIndex=0&count=10000
2023-10-11 15:14:07,093:DEBUG:bcdata.wfs: {'Date': 'Wed, 11 Oct 2023 22:11:42 GMT', 'Server': 'Apache', 'X-Frame-Options': 'allow-from (null)', 'X-Control-flow-delay-ms': '0', 'Content-Disposition': 'inline; filename=geoserver-GetFeature.application', 'Content-Type': 'application/json;charset=UTF-8', 'Set-Cookie': 'GS_FLOW_CONTROL=GS_CFLOW_4d2859f9:18b20caa96d:-47af', 'Access-Control-Allow-Origin': '(null)', 'Access-Control-Allow-Credentials': 'true', 'Access-Control-Allow-Methods': 'POST, GET, OPTIONS, HEAD', 'Access-Control-Allow-Headers': 'X-Requested-With, Referer, Origin, Content-Type, SOAPAction, Authorization, Accept', 'Access-Control-Max-Age': '1000', 'X-Geo-Cache': 'miss', 'Keep-Alive': 'timeout=5, max=100', 'Connection': 'Keep-Alive', 'Transfer-Encoding': 'chunked'}
2023-10-11 15:14:07,337:INFO:bcdata.wfs: {'start_time': 3.300790125, 'attempt_number': 1, 'idle_for': 0}
2023-10-11 15:14:07,340:DEBUG:urllib3.connectionpool: Starting new HTTPS connection (1): openmaps.gov.bc.ca:443
2023-10-11 15:15:07,887:DEBUG:urllib3.connectionpool: https://openmaps.gov.bc.ca:443 "GET /geo/pub/wfs?service=WFS&version=2.0.0&request=GetFeature&typeName=WHSE_ENVIRONMENTAL_MONITORING.EMS_MONITORING_LOCN_TYPES_SVW&outputFormat=json&SRSNAME=epsg%3A4326&sortby=OBJECTID&startIndex=10000&count=10000 HTTP/1.1" 502 341
2023-10-11 15:15:07,900:INFO:bcdata.wfs: https://openmaps.gov.bc.ca/geo/pub/wfs?service=WFS&version=2.0.0&request=GetFeature&typeName=WHSE_ENVIRONMENTAL_MONITORING.EMS_MONITORING_LOCN_TYPES_SVW&outputFormat=json&SRSNAME=epsg%3A4326&sortby=OBJECTID&startIndex=10000&count=10000
2023-10-11 15:15:07,900:DEBUG:bcdata.wfs: {'Date': 'Wed, 11 Oct 2023 22:14:07 GMT', 'Server': 'Apache', 'Content-Length': '341', 'Keep-Alive': 'timeout=5, max=100', 'Connection': 'Keep-Alive', 'Content-Type': 'text/html; charset=iso-8859-1'}
2023-10-11 15:15:07,900:DEBUG:bcdata.wfs: WFS/network error
2023-10-11 15:15:08,307:DEBUG:urllib3.connectionpool: Starting new HTTPS connection (1): openmaps.gov.bc.ca:443
2023-10-11 15:16:08,898:DEBUG:urllib3.connectionpool: https://openmaps.gov.bc.ca:443 "GET /geo/pub/wfs?service=WFS&version=2.0.0&request=GetFeature&typeName=WHSE_ENVIRONMENTAL_MONITORING.EMS_MONITORING_LOCN_TYPES_SVW&outputFormat=json&SRSNAME=epsg%3A4326&sortby=OBJECTID&startIndex=10000&count=10000 HTTP/1.1" 502 341
2023-10-11 15:16:08,905:INFO:bcdata.wfs: https://openmaps.gov.bc.ca/geo/pub/wfs?service=WFS&version=2.0.0&request=GetFeature&typeName=WHSE_ENVIRONMENTAL_MONITORING.EMS_MONITORING_LOCN_TYPES_SVW&outputFormat=json&SRSNAME=epsg%3A4326&sortby=OBJECTID&startIndex=10000&count=10000
2023-10-11 15:16:08,905:DEBUG:bcdata.wfs: {'Date': 'Wed, 11 Oct 2023 22:15:08 GMT', 'Server': 'Apache', 'Content-Length': '341', 'Keep-Alive': 'timeout=5, max=100', 'Connection': 'Keep-Alive', 'Content-Type': 'text/html; charset=iso-8859-1'}
2023-10-11 15:16:08,905:DEBUG:bcdata.wfs: WFS/network error
Traceback (most recent call last):
  File "/Users/snorris/Projects/repo/bcdata/bcdata_env/lib/python3.9/site-packages/tenacity/__init__.py", line 382, in __call__
    result = fn(*args, **kwargs)
  File "/Users/snorris/Projects/repo/bcdata/bcdata/wfs.py", line 171, in _request_features
    return features
UnboundLocalError: local variable 'features' referenced before assignment

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/snorris/Projects/repo/bcdata/bcdata_env/bin/bcdata", line 8, in <module>
    sys.exit(cli())
  File "/Users/snorris/Projects/repo/bcdata/bcdata_env/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/Users/snorris/Projects/repo/bcdata/bcdata_env/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/Users/snorris/Projects/repo/bcdata/bcdata_env/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/snorris/Projects/repo/bcdata/bcdata_env/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/snorris/Projects/repo/bcdata/bcdata_env/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/Users/snorris/Projects/repo/bcdata/bcdata/cli.py", line 278, in cat
    for feat in bcdata.get_features(
  File "/Users/snorris/Projects/repo/bcdata/bcdata/wfs.py", line 440, in get_features
    for feature in self._request_features(url):
  File "/Users/snorris/Projects/repo/bcdata/bcdata_env/lib/python3.9/site-packages/tenacity/__init__.py", line 289, in wrapped_f
    return self(f, *args, **kw)
  File "/Users/snorris/Projects/repo/bcdata/bcdata_env/lib/python3.9/site-packages/tenacity/__init__.py", line 379, in __call__
    do = self.iter(retry_state=retry_state)
  File "/Users/snorris/Projects/repo/bcdata/bcdata_env/lib/python3.9/site-packages/tenacity/__init__.py", line 326, in iter
    raise retry_exc from fut.exception()
tenacity.RetryError: RetryError[<Future at 0x132273dc0 state=finished raised UnboundLocalError>]

@smnorris
Copy link
Owner

Downloading manually as gpkg via the download service is near instant, all 41452 records returned, all are points.
It is a simple dataset - I don't know what the issue is.

$ ogrinfo EMS_MONITORING_LOCN_TYPES_SVW.gpkg WHSE_ENVIRONMENTAL_MONITORING.EMS_MONITORING_LOCN_TYPES_SVW -so
INFO: Open of `EMS_MONITORING_LOCN_TYPES_SVW.gpkg'
      using driver `GPKG' successful.

Layer name: WHSE_ENVIRONMENTAL_MONITORING.EMS_MONITORING_LOCN_TYPES_SVW
Geometry: Point
Feature Count: 41452

@smnorris
Copy link
Owner

same result with curl:

$ curl -v "https://openmaps.gov.bc.ca/geo/pub/wfs?service=WFS&version=2.0.0&request=GetFeature&typeName=WHSE_ENVIRONMENTAL_MONITORING.EMS_MONITORING_LOCN_TYPES_SVW&outputFormat=json&SRSNAME=epsg%3A4326&sortby=OBJECTID&startIndex=10000&count=10000"
*   Trying 142.34.140.41:443...
* Connected to openmaps.gov.bc.ca (142.34.140.41) port 443 (#0)
* ALPN: offers h2,http/1.1
* (304) (OUT), TLS handshake, Client hello (1):
*  CAfile: /etc/ssl/cert.pem
*  CApath: none
* (304) (IN), TLS handshake, Server hello (2):
* (304) (IN), TLS handshake, Unknown (8):
* (304) (IN), TLS handshake, Certificate (11):
* (304) (IN), TLS handshake, CERT verify (15):
* (304) (IN), TLS handshake, Finished (20):
* (304) (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / AEAD-AES256-GCM-SHA384
* ALPN: server accepted http/1.1
* Server certificate:
*  subject: C=CA; ST=British Columbia; L=Victoria; O=Government of the Province of British Columbia; CN=*.data.gov.bc.ca
*  start date: Sep 28 22:33:08 2022 GMT
*  expire date: Oct 16 22:33:08 2023 GMT
*  subjectAltName: host "openmaps.gov.bc.ca" matched cert's "openmaps.gov.bc.ca"
*  issuer: C=US; O=Entrust, Inc.; OU=See www.entrust.net/legal-terms; OU=(c) 2012 Entrust, Inc. - for authorized use only; CN=Entrust Certification Authority - L1K
*  SSL certificate verify ok.
* using HTTP/1.1
> GET /geo/pub/wfs?service=WFS&version=2.0.0&request=GetFeature&typeName=WHSE_ENVIRONMENTAL_MONITORING.EMS_MONITORING_LOCN_TYPES_SVW&outputFormat=json&SRSNAME=epsg%3A4326&sortby=OBJECTID&startIndex=10000&count=10000 HTTP/1.1
> Host: openmaps.gov.bc.ca
> User-Agent: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)
> Accept: */*
>
* HTTP 1.0, assume close after body
< HTTP/1.0 502 Proxy Error
< Date: Wed, 11 Oct 2023 22:33:25 GMT
< Server: Apache
< Content-Length: 341
< Connection: close
< Content-Type: text/html; charset=iso-8859-1
<
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>502 Proxy Error</title>
</head><body>
<h1>Proxy Error</h1>
<p>The proxy server received an invalid
response from an upstream server.<br />
The proxy server could not handle the request<p>Reason: <strong>Error reading from remote server</strong></p></p>
</body></html>
* Closing connection 0

@smnorris
Copy link
Owner

Sorting by MONITORING_LOCATION_ID resolves the issue.

Because there is no interface to determine primary key, bcdata makes some guesses and will default to OBJECT_ID. Something weird seems to be going on at the server when trying to sort that dataset by OBJECT_ID.

This works:
bcdata bc2pg --sortby monitoring_location_id bc-environmental-monitoring-locations

I'd recommend taking a look at the schema before trying downloads:
bcdata info <dataset> --indent 2

There will generally (but definitely not always) be a unique key noted. When using bc2pg, you can also set that column as the primary key in the output table with the --primary_key option.

@NathanEne
Copy link
Contributor Author

Thank you that worked excellently. I appreciate the quick response!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants