How to run the NYC Taxicab analysis notebook with Azure DataBricks #236

gadgetman4u · 2019-02-06T20:22:53Z

I would like to run the NYC Taxicab analysis notebook with Azure Databricks, but the data is in S3. How do I copy the data into Azure? Should I save it to Azure Data Lake Store and then mount it in Databricks?

Thanks.

gadgetman4u · 2019-02-06T21:33:31Z

I have already saved the neighborhoods.geojson file to Azure Data Lake Store and passed its path to dbutils.fs.mount. How do I load the neighborhoods and trips as in the code here?

val trips = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("comment", "V")
  .option("mode", "DROPMALFORMED")
  .schema(schema)
  .load("/mnt/nyctaxicabanalysis/trips/*")
  .withColumn("point",
    point($"pickup_longitude", $"pickup_latitude"))
  .cache()

val neighborhoods = sqlContext.read
  .format("magellan")
  .option("type", "geojson")
  .load("/mnt/nyctaxicabanalysis/neighborhoods/")
  .select($"polygon",
    $"metadata"("neighborhood").as("neighborhood"))
  .cache()

Thanks.

gadgetman4u · 2019-02-11T16:00:13Z

Does anybody know how I can upload the data into Azure so I can extract the neighborhoods and trips?

guiferviz · 2019-02-13T14:26:11Z

I found this today and it may be useful to you (I'm not the author): https://lamastex.github.io/scalable-data-science/sds/2/2/db/032_NYtaxisInMagellan.html
It only works for me on the Databricks runtime with Spark 2.1.1.
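
For the mounting question above, here is a minimal sketch of a Databricks Scala notebook cell that mounts an Azure Data Lake Store (Gen1) directory at the path the notebook code expects. It is not taken from the notebook itself. It assumes a service principal with access to the store, and that the mounted directory contains trips/ and neighborhoods/ subfolders. Every value in angle brackets is a placeholder, and the configuration key names are the ones I recall from the Databricks documentation for ADLS Gen1 at the time, so please check them against the docs for your runtime version.

```scala
// Runs only inside a Databricks notebook, where `dbutils` and `display` are predefined.
// All <...> values are placeholders (assumptions), not real identifiers.
val configs = Map(
  "dfs.adls.oauth2.access.token.provider.type" -> "ClientCredential",
  "dfs.adls.oauth2.client.id" -> "<application-id>",
  // Read the service principal secret from a secret scope instead of hard-coding it.
  "dfs.adls.oauth2.credential" -> dbutils.secrets.get(scope = "<scope-name>", key = "<key-name>"),
  "dfs.adls.oauth2.refresh.url" -> "https://login.microsoftonline.com/<directory-id>/oauth2/token"
)

// Mount the ADLS directory so that /mnt/nyctaxicabanalysis/trips/ and
// /mnt/nyctaxicabanalysis/neighborhoods/ resolve as in the notebook code.
dbutils.fs.mount(
  source = "adl://<store-name>.azuredatalakestore.net/<directory-name>",
  mountPoint = "/mnt/nyctaxicabanalysis",
  extraConfigs = configs
)

// Sanity check: list the mounted directory; trips/ and neighborhoods/ should appear.
display(dbutils.fs.ls("/mnt/nyctaxicabanalysis"))
```

If the listing shows both subfolders, the two sqlContext.read calls quoted earlier in the thread should be able to load from those paths unchanged, provided the Magellan library is attached to the cluster.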
