Geoparquet file reader in Spark #56

Answered by pomadchin
ashar236 asked this question in Q&A
Hey @ashar236, there should be no problems: Spark will read the geometry column as BinaryType. You'd need UDFs or Catalyst expressions to interpret it as a Geometry if needed (you can use JTS for this). The usage looks something like this, via a simple UDF:

import org.apache.spark.sql.functions.udf
import org.locationtech.jts.geom.Geometry
import org.locationtech.jts.io._

// I feel like you'd need a Geometry ExpressionEncoder to keep it in this shape 
// another option is to add a Spark UDT to make it a typed column
val geomFromWKB = udf { arr: Array[Byte] => 
  // could be improved by reusing the WKBReader instead of creating a new one on every call
  new WKBReader().read(arr)
}

val df = spark.read.parquet("/path/to/…
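
To make the two comments above concrete, here is a minimal sketch of one way to reuse the reader and sidestep the encoder question. It is not the answer's original code: the object name `WkbUdfSketch`, the helper `wkbToWkt`, the column name `geometry`, and the in-memory sample row are all assumptions made for illustration. It keeps one `WKBReader` per thread (JTS documents the reader as not thread-safe) and returns WKT text, since a plain UDF returning a JTS `Geometry` would need a UDT or custom encoder.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}
import org.locationtech.jts.geom.{Coordinate, GeometryFactory}
import org.locationtech.jts.io.{WKBReader, WKBWriter}

// Hypothetical helper object, named here only for the example.
object WkbUdfSketch {
  // One WKBReader per thread, created lazily and then reused across rows.
  private val reader = new ThreadLocal[WKBReader] {
    override def initialValue(): WKBReader = new WKBReader()
  }

  // Decode WKB bytes and return WKT text; null input stays null.
  def wkbToWkt(arr: Array[Byte]): String =
    if (arr == null) null else reader.get().read(arr).toText

  // Returning a String avoids needing a Geometry encoder or UDT.
  val geomAsWkt = udf((arr: Array[Byte]) => wkbToWkt(arr))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("wkb-udf-sketch")
      .getOrCreate()
    import spark.implicits._

    // Stand-in for a GeoParquet read: a single WKB-encoded point in a
    // binary column. The column name "geometry" is an assumption.
    val wkb = new WKBWriter().write(
      new GeometryFactory().createPoint(new Coordinate(1.0, 2.0)))
    val df = Seq(wkb).toDF("geometry")

    df.withColumn("wkt", geomAsWkt(col("geometry"))).show(truncate = false)
    spark.stop()
  }
}
```

With a real file, the same `geomAsWkt` column expression would be applied to whichever binary column holds the WKB geometry. If you need actual `Geometry` values downstream rather than text, the UDT or ExpressionEncoder route mentioned in the comments is still required.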

Answer selected by ashar236