Update documentation
Kontinuation committed Aug 22, 2024
1 parent 034ce1f commit 2a39e34
Showing 2 changed files with 67 additions and 48 deletions.
46 changes: 0 additions & 46 deletions docs/api/sql/Constructor.md
@@ -1,49 +1,3 @@
## Read ESRI Shapefile

Introduction: Construct a DataFrame from a Shapefile

Since: `v1.0.0`

SparkSQL example:

```scala
var spatialRDD = new SpatialRDD[Geometry]
spatialRDD.rawSpatialRDD = ShapefileReader.readToGeometryRDD(sparkSession.sparkContext, shapefileInputLocation)
var rawSpatialDf = Adapter.toDf(spatialRDD,sparkSession)
rawSpatialDf.createOrReplaceTempView("rawSpatialDf")
var spatialDf = sparkSession.sql("""
| SELECT ST_GeomFromWKT(rddshape), _c1, _c2
| FROM rawSpatialDf
""".stripMargin)
spatialDf.show()
spatialDf.printSchema()
```

!!!note
The path to the shapefile is the path to the folder that contains the .shp file, not the path to the .shp file itself. The .shp, .shx, and .dbf file extensions must be lowercase. Assuming you have a shapefile called ==myShapefile==, the path should be `XXX/myShapefile`. The file structure should look like this:
```
- shapefile1
- shapefile2
- myshapefile
    - myshapefile.shp
    - myshapefile.shx
    - myshapefile.dbf
    - myshapefile...
    - ...
```

!!!warning
Please make sure you use ==ST_GeomFromWKT== to create a Geometry-type column; otherwise that column cannot be used in SedonaSQL.

If the file you are reading contains non-ASCII characters, you need to set the Spark config explicitly before initializing the SparkSession; then you can use `ShapefileReader.readToGeometryRDD`.

Example:

```
spark.driver.extraJavaOptions -Dsedona.global.charset=utf8
spark.executor.extraJavaOptions -Dsedona.global.charset=utf8
```
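
For illustration (not part of the original text), these options could be wired in roughly as follows; the app name is a placeholder, and in client mode the driver-side option normally has to be passed on the `spark-submit` command line because the driver JVM is already running when this code executes:

```scala
import org.apache.spark.sql.SparkSession

// Illustrative sketch: set the charset options before the SparkSession is created.
val sparkSession = SparkSession.builder()
  .appName("sedona-shapefile") // placeholder name
  .config("spark.driver.extraJavaOptions", "-Dsedona.global.charset=utf8")
  .config("spark.executor.extraJavaOptions", "-Dsedona.global.charset=utf8")
  .getOrCreate()
```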

## ST_GeomCollFromText

Introduction: Constructs a GeometryCollection from the WKT with the given SRID. If SRID is not provided then it defaults to 0. It returns `null` if the WKT is not a `GEOMETRYCOLLECTION`.
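
As an illustrative call (not taken from the docs), assuming a `sedona` SparkSession with Sedona registered:

```scala
// Illustrative only: the optional second argument is the SRID (defaults to 0).
val gc = sedona.sql(
  "SELECT ST_GeomCollFromText('GEOMETRYCOLLECTION (POINT (1 2), LINESTRING (1 2, 3 4))', 4326) AS geom")
gc.show(truncate = false)
```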
69 changes: 67 additions & 2 deletions docs/tutorial/sql.md
@@ -459,9 +459,74 @@ root
|-- prop0: string (nullable = true)
```

## Load Shapefile using SpatialRDD
## Load Shapefile

Shapefiles can be loaded into a SpatialRDD and converted to a DataFrame using the Adapter. Please read [Load SpatialRDD](rdd.md#create-a-generic-spatialrdd) and [DataFrame <-> RDD](#convert-between-dataframe-and-spatialrdd).
Since v`1.7.0`, Sedona supports loading shapefiles as DataFrames.

=== "Scala"

```scala
val df = sedona.read.format("shapefile").load("/path/to/shapefile")
```

=== "Java"

```java
Dataset<Row> df = sedona.read().format("shapefile").load("/path/to/shapefile");
```

=== "Python"

```python
df = sedona.read.format("shapefile").load("/path/to/shapefile")
```

The input path can be a directory containing one or more shapefiles, or the path to a single `.shp` file.

- When the input path is a directory, all shapefiles under the directory will be loaded.
- When the input path is a `.shp` file, that shapefile will be loaded. Sedona will look for sibling files (`.dbf`, `.shx`, etc.) with the same base name and load them automatically, as sketched below.
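
As a hedged illustration of the `.shp` case (the path below is hypothetical):

```scala
// Illustrative only: point the reader at a single .shp file; sibling files such as
// myshapefile.dbf and myshapefile.shx next to it are picked up automatically.
val df = sedona.read.format("shapefile").load("/data/shapefiles/myshapefile.shp")
df.printSchema()
```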

The name of the geometry column is `geometry` by default. You can change it with the `geometry.name` option. If one of the non-spatial attributes is named "geometry", `geometry.name` must be set to a different name to avoid the conflict.

=== "Scala"

```scala
val df = sedona.read.format("shapefile").option("geometry.name", "geom").load("/path/to/shapefile")
```

=== "Java"

```java
Dataset<Row> df = sedona.read().format("shapefile").option("geometry.name", "geom").load("/path/to/shapefile");
```

=== "Python"

```python
df = sedona.read.format("shapefile").option("geometry.name", "geom").load("/path/to/shapefile")
```

Each record in a shapefile has a unique record number, but that number is not loaded by default. To include it in the loaded DataFrame, set the `key.name` option to the desired name of the record number column:

=== "Scala"

```scala
val df = sedona.read.format("shapefile").option("key.name", "FID").load("/path/to/shapefile")
```

=== "Java"

```java
Dataset<Row> df = sedona.read().format("shapefile").option("key.name", "FID").load("/path/to/shapefile");
```

=== "Python"

```python
df = sedona.read.format("shapefile").option("key.name", "FID").load("/path/to/shapefile")
```
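
As a small usage illustration (not from the original docs), the record number column then behaves like any other attribute column:

```scala
// Illustrative: select the record number alongside the default geometry column.
val df = sedona.read.format("shapefile").option("key.name", "FID").load("/path/to/shapefile")
df.select("FID", "geometry").show(5)
```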

If you are using a Sedona release earlier than v`1.7.0`, you can load shapefiles as a SpatialRDD and convert it to a DataFrame using the Adapter. Please read [Load SpatialRDD](rdd.md#create-a-generic-spatialrdd) and [DataFrame <-> RDD](#convert-between-dataframe-and-spatialrdd).
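
A minimal sketch of that older approach, adapted from the section removed from `docs/api/sql/Constructor.md` in this commit (the input path is a placeholder and `sparkSession` is your existing SparkSession):

```scala
import org.apache.sedona.core.formatMapper.shapefileParser.ShapefileReader
import org.apache.sedona.sql.utils.Adapter
import org.apache.spark.api.java.JavaSparkContext

// Read the shapefile folder into a SpatialRDD, then convert it to a DataFrame.
val jsc = JavaSparkContext.fromSparkContext(sparkSession.sparkContext)
val spatialRDD = ShapefileReader.readToGeometryRDD(jsc, "/path/to/shapefile")
val df = Adapter.toDf(spatialRDD, sparkSession)
df.printSchema()
```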

## Load GeoParquet

