Releases · geopandas/pyogrio
Version 0.10.0
Improvements
- Add support to read, write, list, and remove `/vsimem/` files (#457); see the sketch below.
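A minimal sketch of the new `/vsimem/` support, assuming hypothetical data and path; `/vsimem/` is GDAL's in-memory filesystem, so nothing touches disk:

```python
import geopandas as gpd
from shapely.geometry import Point
import pyogrio

# hypothetical data for illustration
gdf = gpd.GeoDataFrame({"name": ["a"]}, geometry=[Point(0, 0)], crs="EPSG:4326")

# write to and read back from GDAL's in-memory filesystem
pyogrio.write_dataframe(gdf, "/vsimem/example.gpkg")
roundtrip = pyogrio.read_dataframe("/vsimem/example.gpkg")
```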
Bug fixes
- Silence warning from `write_dataframe` with `GeoSeries.notna()` (#435).
- Enable mask & bbox filter when geometry column not read (#431).
- Raise `NotImplementedError` when user attempts to write to an open file handle (#442).
- Prevent seek on read from compressed inputs (#443).
Packaging
- For the conda-forge package, change the dependency from `libgdal` to `libgdal-core`. This package is significantly smaller as it doesn't contain some large GDAL plugins. Extra plugins can be installed as separate conda packages if needed: more info here. This also leads to `pyproj` becoming an optional dependency; you will need to install `pyproj` in order to support spatial reference systems (#452).
- The GDAL library included in the wheels is updated from 3.8.5 to GDAL 3.9.2 (#466).
- pyogrio now requires a minimum version of Python >= 3.9 (#473).
- Wheels are now available for Python 3.13.
Version 0.9.0
Version 0.8.0
Improvements
- Support for writing based on Arrow as the transfer mechanism of the data from Python to GDAL (requires GDAL >= 3.8). This is provided through the new `pyogrio.raw.write_arrow` function, or by using the `use_arrow=True` option in `pyogrio.write_dataframe` (#314, #346).
- Add support for `fids` filter to `read_arrow` and `open_arrow`, and to `read_dataframe` with `use_arrow=True` (#304).
- Add some missing properties to `read_info`, including layer name, geometry name, and FID column name (#365).
- `read_arrow` and `open_arrow` now provide GeoArrow-compliant extension metadata, including the CRS, when using GDAL 3.8 or higher (#366).
- The `open_arrow` function can now be used without a `pyarrow` dependency. By default, it will now return a stream object implementing the Arrow PyCapsule Protocol (i.e. having an `__arrow_c_stream__` method). This object can then be consumed by your Arrow implementation of choice that supports this protocol. To keep the previous behaviour of returning a `pyarrow.RecordBatchReader`, specify `use_pyarrow=True` (#349). See the sketch after this list.
- Warn when reading from a multilayer file without specifying a layer (#362).
- Allow writing to a new in-memory datasource using an `io.BytesIO` object (#397).
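A hedged sketch of consuming the PyCapsule stream with pyarrow; the file name is hypothetical, and `RecordBatchReader.from_stream` assumes pyarrow >= 14:

```python
import pyarrow as pa
from pyogrio.raw import open_arrow

# default: yields metadata plus a stream object exposing __arrow_c_stream__
with open_arrow("data.gpkg") as (meta, stream):
    # hand the stream to pyarrow (from_stream requires pyarrow >= 14)
    reader = pa.RecordBatchReader.from_stream(stream)
    table = reader.read_all()
print(meta["crs"], table.num_rows)
```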
Bug fixes
- Fix error in `write_dataframe` if input has a date column and non-consecutive index values (#325).
- Fix encoding issues on Windows for some formats (e.g. ".csv") and always write ESRI Shapefiles using UTF-8 by default on all platforms (#361).
- Raise exception in `read_arrow` or `read_dataframe(..., use_arrow=True)` if a boolean column is detected due to error in GDAL reading boolean values for FlatGeobuf / GPKG drivers (#335, #387); this has been fixed in GDAL >= 3.8.3.
- Properly ignore fields not listed in the `columns` parameter when reading from the data source not using the Arrow API (#391).
- Properly handle decoding of ESRI Shapefiles with a user-provided `encoding` option for `read`, `read_dataframe`, and `open_arrow`, and correctly encode Shapefile field names and text values to the user-provided `encoding` for `write` and `write_dataframe` (#384). See the sketch below.
- Fixed bug preventing reading from bytes or file-like in `read_arrow` / `open_arrow` (#407).
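A minimal sketch of the `encoding` option; the file name and codepage are hypothetical:

```python
import pyogrio

# hypothetical legacy shapefile whose attribute table was written in cp1251
df = pyogrio.read_dataframe("legacy.shp", encoding="cp1251")
```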
Packaging
- The GDAL library included in the wheels is updated from 3.7.2 to GDAL 3.8.5.
Potentially breaking changes
- Using a `where` expression combined with a list of `columns` that does not include the column referenced in the expression is not recommended and will now return results based on driver-dependent behavior, which may include returning empty results (even if non-empty results are expected from the `where` parameter) or raising an exception (#391). Previous versions of pyogrio incorrectly set ignored fields against the data source, allowing it to return non-empty results in these cases. The pitfall is sketched below.
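A hedged illustration of the pitfall; the file and column names are hypothetical:

```python
import pyogrio

# "population" is referenced in `where` but excluded from `columns`:
# depending on the driver this may now return empty results or raise
df = pyogrio.read_dataframe(
    "places.gpkg",
    columns=["name"],
    where="population > 1000",
)
```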
Version 0.7.2
Version 0.7.1
Bug fixes
- Fix unspecified dependency on `packaging` (#318).
Version 0.7.0
Improvements
- Support reading and writing datetimes with timezones (#253).
- Support writing dataframes without geometry column (#267).
- Calculate feature count by iterating over features if GDAL returns an unknown count for a data layer (e.g., OSM driver); this may have significant performance impacts for some data sources that would otherwise return an unknown count (count is used in `read_info`, `read`, `read_dataframe`) (#271).
- Add `arrow_to_pandas_kwargs` parameter to `read_dataframe` + reduce memory usage with `use_arrow=True` (#273).
- In `read_info`, the result now also contains the `total_bounds` of the layer as well as some extra `capabilities` of the data source driver (#281).
- Raise error if `read` or `read_dataframe` is called with parameters to read no columns, geometry, or fids (#280).
- Automatically detect supported driver by extension for all available write drivers and addition of `detect_write_driver` (#270).
- Addition of `mask` parameter to `open_arrow`, `read`, `read_dataframe`, and `read_bounds` functions to select only the features in the dataset that intersect the mask geometry (#285); see the sketch after this list. Note: GDAL < 3.8.0 returns features that intersect the bounding box of the mask when using the Arrow interface for some drivers; this has been fixed in GDAL 3.8.0.
- Removed warning when no features are read from the data source (#299).
- Add support for `force_2d=True` with `use_arrow=True` in `read_dataframe` (#300).
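A minimal sketch of the `mask` parameter, assuming a hypothetical file and a Shapely geometry as the mask:

```python
import pyogrio
from shapely.geometry import box

# only features intersecting this area of interest are returned
aoi = box(4.0, 51.0, 5.0, 52.0)
df = pyogrio.read_dataframe("data.gpkg", mask=aoi)
```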
Other changes
- test suite requires Shapely >= 2.0
- using `skip_features` greater than the number of features available in a data layer now returns empty arrays for `read` and an empty DataFrame for `read_dataframe` instead of raising a `ValueError` (#282).
- enabled `skip_features` and `max_features` for `read_arrow` and `read_dataframe(path, use_arrow=True)`; see the sketch after this list. Note that this incurs overhead because all features up to the next batch size above `max_features` (or size of data layer) will be read prior to slicing out the requested range of features (#282).
- The `use_arrow=True` option can be enabled globally for testing using the `PYOGRIO_USE_ARROW=1` environment variable (#296).
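A minimal sketch of paging with `skip_features` and `max_features`; the file name is hypothetical:

```python
import pyogrio

# read features 100..199 (zero-based offset); also works with use_arrow=True
df = pyogrio.read_dataframe("data.gpkg", skip_features=100, max_features=100)
```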
Bug fixes
- Fix int32 overflow when reading int64 columns (#260)
- Fix `fid_as_index=True` not setting FID as index when using `read_dataframe` with `use_arrow=True` (#265)
- Fix errors reading OSM data due to invalid feature count and incorrect reading of OSM layers beyond the first layer (#271)
- Always raise an exception if there is an error when writing a data source (#284)
Potentially breaking changes
- In `read_info` (#281):
  - the `features` property in the result will now be -1 if calculating the feature count is an expensive operation for this driver. You can force it to be calculated using the `force_feature_count` parameter; see the sketch below.
  - for boolean values in the `capabilities` property, the values will now be booleans instead of 1 or 0.
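A hedged sketch of forcing the feature count; the file name is hypothetical:

```python
import pyogrio

# force an exact count even when it is expensive for this driver
info = pyogrio.read_info("data.gpkg", force_feature_count=True)
print(info["features"], info["capabilities"])
```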
Packaging
- The GDAL library included in the wheels is updated from 3.6.4 to GDAL 3.7.2.
Version 0.6.0
Improvements
- Add automatic detection of 3D geometries in `write_dataframe` (#223, #229)
- Add "driver" property to `read_info` result (#224)
- Add support for dataset open options to `read`, `read_dataframe`, and `read_info` (#233)
- Add support for pandas' nullable data types in `write_dataframe`, or specifying a mask manually for missing values in `write` (#219); see the sketch after this list
- Standardized 3-dimensional geometry type labels from "2.5D <type>" to "<type> Z" for consistency with well-known text (WKT) formats (#234)
- Failure and warning error messages from GDAL are no longer printed to stderr: failures were already translated into Python exceptions and warning messages are now translated into Python warnings (#236, #242).
- Add access to low-level pyarrow `RecordBatchReader` via `pyogrio.raw.open_arrow`, which allows iterating over batches of Arrow tables (#205).
- Add support for writing dataset and layer metadata (where supported by driver) to `write` and `write_dataframe`, and add support for reading dataset and layer metadata in `read_info` (#237).
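A minimal sketch of writing a pandas nullable dtype, with hypothetical data and output path:

```python
import geopandas as gpd
import pandas as pd
import pyogrio
from shapely.geometry import Point

# nullable Int64 column containing a missing value
gdf = gpd.GeoDataFrame(
    {"value": pd.array([1, None], dtype="Int64")},
    geometry=[Point(0, 0), Point(1, 1)],
    crs="EPSG:4326",
)
pyogrio.write_dataframe(gdf, "out.gpkg")
```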
Packaging
- The GDAL library included in the wheels is updated from 3.6.2 to GDAL 3.6.4.
- Wheels are now available for Linux aarch64 / arm64.
Version 0.5.1
Version 0.5.0
Major enhancements
- Support for reading based on Arrow as the transfer mechanism of the data from GDAL to Python (requires GDAL >= 3.6 and `pyarrow` to be installed). This can be enabled by passing `use_arrow=True` to `pyogrio.read_dataframe` (or by using `pyogrio.raw.read_arrow` directly), and provides a further speed-up (#155, #191).
- Support for appending to an existing data source when supported by GDAL by passing `append=True` to `pyogrio.write_dataframe` (#197). See the sketch after this list.
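A minimal sketch of appending, assuming hypothetical data and an output format whose driver supports append:

```python
import geopandas as gpd
import pyogrio
from shapely.geometry import Point

gdf = gpd.GeoDataFrame({"name": ["a"]}, geometry=[Point(0, 0)], crs="EPSG:4326")
pyogrio.write_dataframe(gdf, "out.gpkg")               # create the data source
pyogrio.write_dataframe(gdf, "out.gpkg", append=True)  # append more features
```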
Potentially breaking changes
- In floating point columns, NaN values are now by default written as "null" instead of NaN, but with an option to control this (pass `nan_as_null=False` to keep the previous behaviour) (#190).
Improvements
- It is now possible to pass GDAL's dataset creation options in addition to layer creation options in `pyogrio.write_dataframe` (#189).
- When specifying a subset of `columns` to read, unnecessary IO or parsing is now avoided (#195).
Packaging
- The GDAL library included in the wheels is updated from 3.4 to GDAL 3.6.2, and is now built with GEOS and SQLite with R-tree support enabled (which allows writing a spatial index for GeoPackage).
- Wheels are now available for Python 3.11.
- Wheels are now available for macOS arm64.
Version 0.4.2
Improvements
- new `get_gdal_data_path()` utility function to check the path of the data directory detected by GDAL (#160)
Bug fixes
- register GDAL drivers during initial import of pyogrio (#145)
- support writing "not a time" (NaT) values in a datetime column (#146)
- fixes an error when reading GPKG with bbox filter (#150)
- properly raises error when invalid where clause is used on a GPKG (#150)
- avoid duplicate count of available features (#151)