CI/TST: Fix bbox tests to avoid breaking when GEOS is not available (#…
brendan-ward authored Sep 30, 2022
1 parent d171404 commit 04d5da8
Showing 9 changed files with 96 additions and 43 deletions.
1 change: 1 addition & 0 deletions .github/workflows/release.yml
@@ -221,6 +221,7 @@ jobs:
python -m pip install --pre --find-links wheelhouse/artifact pyogrio
python -m pip list
# NOTE: GEOS is not available on macOS / Linux runners
- name: Run tests
shell: bash
run: |
63 changes: 35 additions & 28 deletions docs/source/introduction.md
@@ -177,50 +177,57 @@ with the bbox.

Note: the `bbox` values must be in the same CRS as the dataset.

Note: if GEOS is present and used by GDAL, only geometries that intersect `bbox`
will be returned; if GEOS is not available or not used by GDAL, all geometries
with bounding boxes that intersect this bbox will be returned.
`pyogrio.__gdal_geos_version__` will be `None` if GEOS is not detected.
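
To illustrate the distinction drawn in this note, here is a minimal pure-Python sketch that compares an envelope (bounding box) overlap test with an exact intersection test for a convex polygon. It is not pyogrio or GDAL code: the function names `envelope`, `envelopes_overlap`, and `convex_intersects_bbox` and the sample triangle are hypothetical, and the exact test uses the separating axis theorem as a stand-in for what GEOS would compute.

```python
def envelope(coords):
    """Return (xmin, ymin, xmax, ymax) for a sequence of (x, y) points."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def envelopes_overlap(a, b):
    """True if two (xmin, ymin, xmax, ymax) boxes share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def convex_intersects_bbox(coords, bbox):
    """Exact intersection test between a convex polygon and a bbox.

    Uses the separating axis theorem: two convex shapes are disjoint
    if and only if their projections onto some edge normal do not overlap.
    """
    xmin, ymin, xmax, ymax = bbox
    box = [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax)]
    for poly in (coords, box):
        n = len(poly)
        for i in range(n):
            (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
            # Normal vector of the current edge
            nx, ny = y2 - y1, x1 - x2
            proj_a = [nx * x + ny * y for x, y in coords]
            proj_b = [nx * x + ny * y for x, y in box]
            if max(proj_a) < min(proj_b) or max(proj_b) < min(proj_a):
                return False  # found a separating axis
    return True


# Hypothetical triangle; its envelope is (0, 0, 10, 10)
triangle = [(0, 0), (10, 0), (0, 10)]
# This bbox sits in the corner of the envelope that the triangle does not cover
bbox = (8, 8, 9, 9)

print(envelopes_overlap(envelope(triangle), bbox))  # envelope-based filter
print(convex_intersects_bbox(triangle, bbox))       # geometry-based filter
```

Under these assumptions the envelope test keeps the triangle while the exact test rejects it, which is the kind of extra record a read without GEOS may return.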

## Execute a sql query

You can use the `sql` parameter to execute a sql query on a dataset.
You can use the `sql` parameter to execute a sql query on a dataset.

Depending on the dataset, you can use different sql dialects. By default, if
the dataset natively supports sql, the sql statement will be passed through
Depending on the dataset, you can use different sql dialects. By default, if
the dataset natively supports sql, the sql statement will be passed through
as such. Hence, the sql query should be written in the relevant native sql
dialect (e.g. [GeoPackage](https://gdal.org/drivers/vector/gpkg.html)/
[Sqlite](https://gdal.org/drivers/vector/sqlite.html),
[PostgreSQL](https://gdal.org/drivers/vector/pg.html)). If the datasource
doesn't natively support sql (e.g.
[ESRI Shapefile](https://gdal.org/drivers/vector/shapefile.html),
[FlatGeobuf](https://gdal.org/drivers/vector/flatgeobuf.html)), you can choose
between '[OGRSQL](https://gdal.org/user/ogr_sql_dialect.html#ogr-sql-dialect)'
(the default) and
'[SQLITE](https://gdal.org/user/sql_sqlite_dialect.html#sql-sqlite-dialect)'.
For SELECT statements the 'SQLITE' dialect tends to provide more spatial
features as all
[spatialite](https://www.gaia-gis.it/gaia-sins/spatialite-sql-latest.html)
[Sqlite](https://gdal.org/drivers/vector/sqlite.html),
[PostgreSQL](https://gdal.org/drivers/vector/pg.html)). If the datasource
doesn't natively support sql (e.g.
[ESRI Shapefile](https://gdal.org/drivers/vector/shapefile.html),
[FlatGeobuf](https://gdal.org/drivers/vector/flatgeobuf.html)), you can choose
between '[OGRSQL](https://gdal.org/user/ogr_sql_dialect.html#ogr-sql-dialect)'
(the default) and
'[SQLITE](https://gdal.org/user/sql_sqlite_dialect.html#sql-sqlite-dialect)'.
For SELECT statements the 'SQLITE' dialect tends to provide more spatial
features as all
[spatialite](https://www.gaia-gis.it/gaia-sins/spatialite-sql-latest.html)
functions can be used. If gdal is not built with spatialite support in SQLite,
you can use ``sql_dialect="INDIRECT_SQLITE"`` to be able to use spatialite
functions on native SQLite files like Geopackage.
you can use `sql_dialect="INDIRECT_SQLITE"` to be able to use spatialite
functions on native SQLite files like Geopackage.

You can combine a sql query with other parameters that will filter the
dataset. When using ``columns``, ``skip_features``, ``max_features``, and/or
``where`` it is important to note that they will be applied AFTER the sql
You can combine a sql query with other parameters that will filter the
dataset. When using `columns`, `skip_features`, `max_features`, and/or
`where` it is important to note that they will be applied AFTER the sql
statement, so these are some things you need to be aware of:
- if you specify an alias for a column in the sql statement, you need to
specify this alias when using the ``columns`` keyword.
- ``skip_features`` and ``max_features`` will be applied on the rows returned
by the sql query, not on the original dataset.

For the ``bbox`` parameter, depending on the combination of the dialect of the
- if you specify an alias for a column in the sql statement, you need to
specify this alias when using the `columns` keyword.
- `skip_features` and `max_features` will be applied on the rows returned
by the sql query, not on the original dataset.

For the `bbox` parameter, depending on the combination of the dialect of the
sql query and the dataset, a spatial index will be used or not, e.g.:
- ESRI Shapefile: spatial index is used with 'OGRSQL', not with 'SQLITE'.
- Geopackage: spatial index is always used.

The following sql query returns the 5 Western European countries with the most
- ESRI Shapefile: spatial index is used with 'OGRSQL', not with 'SQLITE'.
- Geopackage: spatial index is always used.

The following sql query returns the 5 Western European countries with the most
neighbours:

```python
>>> sql = """
SELECT geometry, name,
(SELECT count(*)
(SELECT count(*)
FROM ne_10m_admin_0_countries layer_sub
WHERE ST_Intersects(layer.geometry, layer_sub.geometry)) AS nb_neighbours
FROM ne_10m_admin_0_countries layer
5 changes: 4 additions & 1 deletion pyogrio/core.py
@@ -115,7 +115,10 @@ def read_bounds(
Examples: ``"ISO_A3 = 'CAN'"``, ``"POP_EST > 10000000 AND POP_EST < 100000000"``
bbox : tuple of (xmin, ymin, xmax, ymax), optional (default: None)
If present, will be used to filter records whose geometry intersects this
box. This must be in the same CRS as the dataset.
box. This must be in the same CRS as the dataset. If GEOS is present
and used by GDAL, only geometries that intersect this bbox will be
returned; if GEOS is not available or not used by GDAL, all geometries
with bounding boxes that intersect this bbox will be returned.
Returns
-------
5 changes: 4 additions & 1 deletion pyogrio/geopandas.py
@@ -76,7 +76,10 @@ def read_dataframe(
Examples: ``"ISO_A3 = 'CAN'"``, ``"POP_EST > 10000000 AND POP_EST < 100000000"``
bbox : tuple of (xmin, ymin, xmax, ymax) (default: None)
If present, will be used to filter records whose geometry intersects this
box. This must be in the same CRS as the dataset.
box. This must be in the same CRS as the dataset. If GEOS is present
and used by GDAL, only geometries that intersect this bbox will be
returned; if GEOS is not available or not used by GDAL, all geometries
with bounding boxes that intersect this bbox will be returned.
fids : array-like, optional (default: None)
Array of integer feature id (FID) values to select. Cannot be combined
with other keywords to select a subset (``skip_features``, ``max_features``,
5 changes: 4 additions & 1 deletion pyogrio/raw.py
@@ -83,7 +83,10 @@ def read(
Examples: "ISO_A3 = 'CAN'", "POP_EST > 10000000 AND POP_EST < 100000000"
bbox : tuple of (xmin, ymin, xmax, ymax), optional (default: None)
If present, will be used to filter records whose geometry intersects this
box. This must be in the same CRS as the dataset.
box. This must be in the same CRS as the dataset. If GEOS is present
and used by GDAL, only geometries that intersect this bbox will be
returned; if GEOS is not available or not used by GDAL, all geometries
with bounding boxes that intersect this bbox will be returned.
fids : array-like, optional (default: None)
Array of integer feature id (FID) values to select. Cannot be combined
with other keywords to select a subset (`skip_features`, `max_features`,
4 changes: 4 additions & 0 deletions pyogrio/tests/conftest.py
@@ -36,6 +36,10 @@ def prepare_testfile(testfile_path, dst_dir, ext):
# allow mixed Polygons/MultiPolygons type
meta["geometry_type"] = "Unknown"

elif ext == ".gpkg":
# For .gpkg, spatial_index=False to avoid the rows being reordered
meta["spatial_index"] = False

write(dst_path, geometry, field_data, **meta)
return dst_path

41 changes: 36 additions & 5 deletions pyogrio/tests/test_core.py
@@ -2,6 +2,7 @@
import pytest

from pyogrio import (
__gdal_version__,
__gdal_geos_version__,
list_drivers,
list_layers,
@@ -138,26 +139,56 @@ def test_read_bounds_bbox(naturalearth_lowres_all_ext):
assert fids.shape == (0,)
assert bounds.shape == (4, 0)

fids, bounds = read_bounds(naturalearth_lowres_all_ext, bbox=(-140, 20, -100, 45))
fids, bounds = read_bounds(naturalearth_lowres_all_ext, bbox=(-85, 8, -80, 10))

assert fids.shape == (2,)
if naturalearth_lowres_all_ext.suffix == ".gpkg":
# fid in gpkg is 1-based
assert array_equal(fids, [5, 28]) # USA, MEX
assert array_equal(fids, [34, 35]) # PAN, CRI
else:
# fid in other formats is 0-based
assert array_equal(fids, [4, 27]) # USA, MEX
assert array_equal(fids, [33, 34]) # PAN, CRI

assert bounds.shape == (4, 2)
assert allclose(
bounds.T,
[
[-171.791111, 18.916190, -66.964660, 71.357764],
[-117.127760, 14.538829, -86.811982, 32.720830],
[-82.96578305, 7.22054149, -77.24256649, 9.61161001],
[-85.94172543, 8.22502798, -82.54619626, 11.21711925],
],
)


@pytest.mark.skipif(
__gdal_version__ < (3, 4, 0),
reason="Cannot determine if GEOS is present or absent for GDAL < 3.4",
)
def test_read_bounds_bbox_intersects_vs_envelope_overlaps(naturalearth_lowres_all_ext):
# If GEOS is present and used by GDAL, bbox filter will be based on intersection
# of bbox and actual geometries; if GEOS is absent or not used by GDAL, it
# will be based on overlap of bounding boxes instead
fids, _ = read_bounds(naturalearth_lowres_all_ext, bbox=(-140, 20, -100, 45))

if __gdal_geos_version__ is None:
# bboxes for CAN, RUS overlap but do not intersect geometries
assert fids.shape == (4,)
if naturalearth_lowres_all_ext.suffix == ".gpkg":
# fid in gpkg is 1-based
assert array_equal(fids, [4, 5, 19, 28]) # CAN, USA, RUS, MEX
else:
# fid in other formats is 0-based
assert array_equal(fids, [3, 4, 18, 27]) # CAN, USA, RUS, MEX

else:
assert fids.shape == (2,)
if naturalearth_lowres_all_ext.suffix == ".gpkg":
# fid in gpkg is 1-based
assert array_equal(fids, [5, 28]) # USA, MEX
else:
# fid in other formats is 0-based
assert array_equal(fids, [4, 27]) # USA, MEX


def test_read_info(naturalearth_lowres):
meta = read_info(naturalearth_lowres)

11 changes: 6 additions & 5 deletions pyogrio/tests/test_geopandas_io.py
@@ -195,9 +195,10 @@ def test_read_bbox(naturalearth_lowres_all_ext):
df = read_dataframe(naturalearth_lowres_all_ext, bbox=(0, 0, 0.00001, 0.00001))
assert len(df) == 0

df = read_dataframe(naturalearth_lowres_all_ext, bbox=(-140, 20, -100, 45))
df = read_dataframe(naturalearth_lowres_all_ext, bbox=(-85, 8, -80, 10))
assert len(df) == 2
assert np.array_equal(df.iso_a3, ["USA", "MEX"])

assert np.array_equal(df.iso_a3, ["PAN", "CRI"])


def test_read_fids(naturalearth_lowres_all_ext):
@@ -317,12 +318,12 @@ def test_read_sql_columns_where_bbox(naturalearth_lowres_all_ext):
sql=sql,
sql_dialect="OGRSQL",
columns=["iso_a3_renamed", "name"],
where="iso_a3_renamed IN ('CAN', 'USA', 'MEX')",
bbox=(-140, 20, -100, 45),
where="iso_a3_renamed IN ('CRI', 'PAN')",
bbox=(-85, 8, -80, 10),
)
assert len(df.columns) == 3
assert len(df) == 2
assert df.iso_a3_renamed.tolist() == ["USA", "MEX"]
assert df.iso_a3_renamed.tolist() == ["PAN", "CRI"]


def test_read_sql_skip_max(naturalearth_lowres_all_ext):
4 changes: 2 additions & 2 deletions pyogrio/tests/test_raw_io.py
@@ -182,10 +182,10 @@ def test_read_bbox(naturalearth_lowres_all_ext):

assert len(geometry) == 0

geometry, fields = read(naturalearth_lowres_all_ext, bbox=(-140, 20, -100, 45))[2:]
geometry, fields = read(naturalearth_lowres_all_ext, bbox=(-85, 8, -80, 10))[2:]

assert len(geometry) == 2
assert np.array_equal(fields[3], ["USA", "MEX"])
assert np.array_equal(fields[3], ["PAN", "CRI"])


def test_read_fids(naturalearth_lowres):