Integration with geopandas #588 #818

Closed
wants to merge 24 commits into from

Contributor

@iliatimofeev iliatimofeev commented May 6, 2018

Integration with geopandas (fixes #588).
A GeoDataFrame is now a valid data type for alt.Chart:

import altair as alt
import geopandas as gpd

alt.renderers.enable('notebook')

world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))

# A GeoDataFrame can be passed like a regular pd.DataFrame
alt.Chart(world[world.continent!='Antarctica']).mark_geoshape(
).project(
).encode(
    # for GeoDataFrame fields, shorthand infers types as for a regular pd.DataFrame
    color='pop_est',
    tooltip='id:Q'  # GeoDataFrame.index is accessible as id
).properties( 
    width=500,
    height=300
)

visualization 2

Any object with a __geo_interface__ can be used as data, but in that case it is interpreted as a single shape:

alt.Chart(world[world.continent=='Africa'].geometry).mark_geoshape(
).project(
).properties( 
    width=500,
    height=300
)

visualization 4

To split a FeatureCollection into separate shapes, use alt.geojson_feature (76a2af8):

alt.Chart(
    alt.geojson_feature(world[world.continent=='Africa'].geometry,'features')
   ).mark_geoshape( 
).project(
).encode(
    fill = alt.Color('id:N',legend=None)
).properties( 
    width=500,
    height=300
)

visualization 5

You can also use to_json or to_values on a GeoDataFrame directly:

alt.Chart(alt.to_json(world[world.continent=='Africa'])
   ).mark_geoshape( 
).project(
).encode(
    fill = 'pop_est:Q'
).properties( 
    width=500,
    height=300
)

visualization 6

TODO:

  • initial GeoDataFrame support
  • add test coverage
  • update documentation
  • [n/a] create a demo notebook (notebooks are in a different repository)

Changes summary

All three of to_values, to_json, and to_csv behave differently in two new use cases.

  1. GeoDataFrame case (from GeoPandas), i.e. a pd.DataFrame with __geo_interface__:

    • to_csv(): I don't know of any way to use geometry in this format from Vega-Lite, so it raises NotImplementedError to avoid misleading users.
    • to_values() and to_json() save GeoDataFrame.geometry through __geo_interface__ as GeoJSON, enriched with the values from the data. This flattening simplifies usage by avoiding nested properties. The idea is inspired by vega/vega-lite#3341.
  2. geo_interface-only case, for objects with __geo_interface__ that are not pd.DataFrame instances (and so are not GeoDataFrames):

    • to_csv() raises NotImplementedError. The GeoJSON tree structure cannot be saved in plain CSV format.
    • to_values() and to_json() save the object through __geo_interface__ as GeoJSON.
    • The new function geojson_feature, a clone of topojson_feature, wraps data in the appropriate alt.Data and allows splitting a GeoJSON collection into separate shapes.

@iliatimofeev
Contributor Author

Do we need alt.to_geojson_values version?
I think GeoDataFrame support plus an example of how to work with other sources is enough.

from geojson import FeatureCollection,Feature
from geojson.utils import generate_random
import altair as alt

alt.renderers.enable('notebook')

geo_data = FeatureCollection([Feature(geometry=generate_random('LineString'),
                                      properties={"prop": i+1}) for i in range(5)])

alt.Chart(alt.InlineData(geo_data.__geo_interface__,{'type':'json','property':'features'})
).mark_geoshape(
    fillOpacity=0.8
).project(
).encode(
    fill='properties.prop:O'
).properties( width=500,height=300)

visualization 3

@jakevdp
Collaborator

jakevdp commented May 7, 2018

This is great – thanks for tackling this!

One comment: I think the cleanest approach would be to make it so the to_values data transformer recognizes any input object with a geo interface, and transforms it appropriately. That will make things as general as possible, not require any additional boilerplate from the user, and not necessitate any new package dependencies.

In particular, I think we should avoid explicit dependence on geojson within Altair's test suite if at all possible.

Please let me know when this is ready for review!

@iliatimofeev
Contributor Author

Is it ok to use gpd = pytest.importorskip('geopandas') for tests?

@jakevdp
Collaborator

jakevdp commented May 7, 2018

I'd prefer a test using a custom class that provides a simple geo interface output, if that's at all possible, so we have some test coverage of the behavior even without geo libraries installed.

We could also use an importorskip on a few tests as well.
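A test along these lines could rely on a tiny stand-in class instead of a geo library. This is only a sketch: SimplePoint and geo_to_values are hypothetical names that do not come from Altair, and the returned dict layout is an assumption about what a transformer might emit.

```python
import json


class SimplePoint:
    """Hypothetical stand-in for a geo library object (not part of Altair)."""

    def __init__(self, lon, lat, **properties):
        self.lon = lon
        self.lat = lat
        self.properties = properties

    @property
    def __geo_interface__(self):
        # Return a GeoJSON-like mapping, as the __geo_interface__ convention expects.
        return {
            "type": "Feature",
            "properties": dict(self.properties),
            "geometry": {"type": "Point", "coordinates": [self.lon, self.lat]},
        }


def geo_to_values(data):
    """Illustrative transformer: wrap any geo-interface object as inline data."""
    if not hasattr(data, "__geo_interface__"):
        raise TypeError("expected an object with __geo_interface__")
    return {"values": data.__geo_interface__, "format": {"type": "json"}}


point = SimplePoint(2.35, 48.85, name="Paris")
spec_data = geo_to_values(point)

# The result has to be JSON-serializable to be embedded in a chart specification.
serialized = json.dumps(spec_data)
```

Because the class only needs one property, a test like this exercises the geo-interface branch even when neither geopandas nor geojson is installed.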

@iliatimofeev
Contributor Author

I think the code and tests are ready for review.
Could you give me some hints on the documentation updates?

Collaborator

@jakevdp jakevdp left a comment

One comment below, but overall I'm a bit confused. It looks like this has three broad changes:

  1. to_json() will act differently if the dataframe has a __geo_interface__
  2. to_csv() will act differently if the dataframe has a __geo_interface__
  3. There is a new to_geojson_values data transformer that the user will have to somewhere explicitly enable in order to use geojson inputs.

Is that correct?

I'm a bit confused regarding the goal of these changes, and what your intent is with regard to the API and the user experience with Geojson data.

My thought would be that we should make it as seamless as possible: i.e. a user can pass a geojson object as they would any dataframe, and it should work without any additional configuration.

attrs['type'] = infer_vegalite_type(data[attrs['field']])
if hasattr(data,'__geo_interface__'): #TODO: Add descripion in shorthand
attrs['field']='properties.'+attrs['field']

Collaborator

Something strange is going on here: is not checking for type in attrs necessary for this PR? And why is this block doubly indented?

Contributor Author

@iliatimofeev iliatimofeev May 8, 2018

'type' not in attrs is still checked, but on the line below (232).
The idea behind it: let the user work with a GeoDataFrame like a regular DataFrame, but showing the geoshapes attached to its rows.
Implementation details:
First of all, GeoDataFrame is a subclass of pd.DataFrame with __geo_interface__, so it is an instance of pd.DataFrame that has the attribute __geo_interface__. A GeoDataFrame is stored as a GeoJSON FeatureCollection, with each row as a Feature object whose column values are placed in the properties object. A sample is more informative:

{ /* Vega-Lite data */
    "format": {
        "property": "features", /* generate a geoshape for each row */
        "type": "json"
    },
    "values": { /* valid GeoJSON for all rows */
        "type": "FeatureCollection",
        "features": [
            { /* valid GeoJSON for each row */
                "type": "Feature",
                "properties": { /* column values */
                    "pop_est": 12799293.0,
                    "continent": "Africa"
                },
                "geometry": { /* geometry of the row  */
                    "type": "MultiPolygon",
                    "coordinates": [ /* a lot of numbers*/]
                }
            }
        ]
    }
}

So the first step is to add "property": "features" to the Vega-Lite data format description. That splits the valid GeoJSON stored in values back into the rows of the GeoDataFrame (this step could be replaced by storing the content of "features" directly in "values", but that would make "values" invalid GeoJSON).
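As a rough illustration of that first step, assuming a plain dict that holds a FeatureCollection (the helper name feature_collection_data is made up for this sketch and is not part of Altair; the coordinates are placeholders):

```python
def feature_collection_data(geojson):
    """Wrap a GeoJSON FeatureCollection dict as Vega-Lite inline data.

    The "property" entry asks Vega-Lite to read the rows from the
    "features" array, so each feature becomes one geoshape.
    """
    if geojson.get("type") != "FeatureCollection":
        raise ValueError("expected a GeoJSON FeatureCollection")
    return {
        "values": geojson,
        "format": {"type": "json", "property": "features"},
    }


collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"pop_est": 12799293.0, "continent": "Africa"},
            # Placeholder geometry, only here to keep the example small.
            "geometry": {"type": "Point", "coordinates": [17.5, -12.3]},
        }
    ],
}

data = feature_collection_data(collection)
```

Here "values" stays a valid GeoJSON document, which is the trade-off described above: the column values remain nested under "properties".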

The next step is access to the column values of the GeoDataFrame. Values are accessible from the chart as "properties.column_name". I hoped to simplify the user experience by adding "properties." in shorthand. Now, if users use shorthand, they get the same behavior for a GeoDataFrame as for a DataFrame (take a look at the updated PR description).

Maybe I should check whether the field name already starts with "properties." to avoid doubling it?
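The check being asked about could look roughly like this. It is a sketch only: qualify_field and its is_geo flag are hypothetical and do not mirror the actual shorthand code in the PR.

```python
PREFIX = "properties."


def qualify_field(field, is_geo):
    """Prefix a field name with "properties." for geo data, at most once."""
    if is_geo and not field.startswith(PREFIX):
        return PREFIX + field
    return field


plain = qualify_field("pop_est", is_geo=True)
already = qualify_field("properties.pop_est", is_geo=True)
untouched = qualify_field("pop_est", is_geo=False)
```

One caveat with this approach: a column whose own name starts with "properties." would be treated as already qualified, so the check is ambiguous for such names.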

Collaborator

OK, I see.

We can think about that. In the meantime can you fix the indentation? 😄

Contributor Author

indentation fixed

Contributor Author

No longer needed because of the new save format.

@iliatimofeev
Contributor Author

One comment below, but overall I'm a bit confused. It looks like this has three broad changes:

Yes, but I see it a little differently.

  1. There is a new to_geojson_values data transformer that the user will have to somewhere explicitly enable in order to use geojson inputs.

No, it was removed by 2cd9fde after your proposal to use to_values. Maybe I could open another PR without the history to simplify the review?

  1. to_json() will act differently if the dataframe has a __geo_interface__
  2. to_csv() will act differently if the dataframe has a __geo_interface__

Yes, all three of to_values, to_json, and to_csv behave differently in two new use cases.

  1. GeoDataFrame case (from GeoPandas), i.e. a pd.DataFrame with __geo_interface__:
    • to_csv(): I don't know of any way to use geometry in this format from Vega-Lite, so it raises NotImplementedError to avoid misleading users.
    • to_values() and to_json() save the GeoDataFrame through __geo_interface__ as GeoJSON with "format": {"property": "features"}.
    • shorthand is updated to support type inference from a GeoDataFrame and to simplify access to values in the GeoJSON format (see the details in my reply to your change request).
  2. geo_interface-only case, for objects with __geo_interface__ that are not pd.DataFrame instances (and so are not GeoDataFrames):
    • to_csv() raises NotImplementedError. The GeoJSON tree structure cannot be saved in plain CSV format.
    • to_values() and to_json() save the object through __geo_interface__ as GeoJSON.
    • In that case we cannot make assumptions about what is inside. It could be a simple polygon, which will be drawn properly. We also have no type information, so no shorthand magic can be provided.
      (Strictly, we could check whether it is some kind of collection and split it, but I see that as too magical to be expected.)

@jakevdp
Collaborator

jakevdp commented May 8, 2018

No it was removed by 2cd9fde

Strange... when I looked at changes a few hours ago, I didn't see the updated commits.

I see that the changes are entirely different now. Let me take another look.

@@ -33,7 +33,7 @@ class DataTransformerRegistry(PluginRegistry[DataTransformerType]):
# form.
#
# A data model transformer has the following type signature:
-# DataModelType = Union[dict, pd.DataFrame]
+# DataModelType = Union[dict, pd.DataFrame, gpd.GeoDataFrame, geojson interface object]
Collaborator

Is it true that any Python object can expose a geojson interface? In that case, I don't think the typing-style specification is all that useful anymore. (or just needs to have Any as an entry)

Contributor Author

According to Sean Gillies, there is a growing community of adopters :). In fact there is a list of classes that are unknown to me. In my opinion there is no crucial value in supporting the general interface; it is just a fun feature. If it breaks some concept, maybe we should remove the interface support.

Or we can declare only geojson, so all the others will be supported incidentally.

Union[dict, pd.DataFrame, gpd.GeoDataFrame, geojson.GeoJSON]? What do you think?

Collaborator

Hmm.. I'm not certain. I don't know the geo space well enough.

Contributor Author

Well, I think Union[dict, pd.DataFrame, gpd.GeoDataFrame, geojson.GeoJSON] is better for now :) I'll leave it until further instructions.


elif hasattr(data,'__geo_interface__'): # geojson object
    with open(filename,'w') as f:
        json.dump(data.__geo_interface__, f)
Collaborator

I would put all the geo_interface logic in one block... e.g.

if hasattr(data, '__geo_interface__'):
    # do geo stuff
elif isinstance(data, pd.DataFrame):
    # do dataframe stuff
elif isinstance(data, dict):
   # do dict stuff

Contributor Author

Well, GeoPandas needs an extra line of code compared with a plain __geo_interface__ object. I had something like:

if isinstance(data, pd.DataFrame) and hasattr(data, '__geo_interface__'):
    # do GeoPandas stuff
elif hasattr(data, '__geo_interface__'):
    # do geo_interface stuff
elif isinstance(data, pd.DataFrame):
    # do dataframe stuff
elif isinstance(data, dict):
   # do dict stuff

That is simple and clear. But I was concerned that it could lead to errors if someone modifies the Pandas block without mirroring the change in the GeoPandas block. That is why the logic in the final version looks so strange.

Should I change it back?

Collaborator

@jakevdp jakevdp May 8, 2018

Ah, I see – there's overlap between geo and dataframe.

I think it would be best to put all the geo logic under one if statement, so something like:

if hasattr(data, '__geo_interface__'):
    if dataframe:
        #stuff
    else:
        # other stuff
elif isinstance(data, pd.DataFrame):
    # do dataframe stuff
elif isinstance(data, dict):
   # do dict stuff
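For reference, the nested layout suggested above can be written as a small runnable sketch. TableLike and GeoTable are stand-ins for pd.DataFrame and gpd.GeoDataFrame so that the example runs without pandas; classify and Shape are likewise hypothetical names used only here.

```python
class TableLike:
    """Stand-in for pd.DataFrame (assumption: used only to keep the sketch stdlib-only)."""


class GeoTable(TableLike):
    """Stand-in for gpd.GeoDataFrame: a table that also exposes __geo_interface__."""

    @property
    def __geo_interface__(self):
        return {"type": "FeatureCollection", "features": []}


class Shape:
    """Stand-in for any non-dataframe object with __geo_interface__."""

    @property
    def __geo_interface__(self):
        return {"type": "Point", "coordinates": [0, 0]}


def classify(data):
    """Return which branch of the transformer would handle `data`."""
    if hasattr(data, "__geo_interface__"):
        # All geo logic lives under this single check.
        if isinstance(data, TableLike):
            return "geodataframe"
        return "geo_interface"
    elif isinstance(data, TableLike):
        return "dataframe"
    elif isinstance(data, dict):
        return "dict"
    raise TypeError("unsupported data type: %r" % type(data))


branches = [classify(GeoTable()), classify(Shape()), classify(TableLike()), classify({})]
```

Checking for the geo interface first matters because a GeoDataFrame satisfies both tests; with the dataframe check first, it would never reach the geo branch.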

Contributor Author

Done

@@ -0,0 +1,3 @@
{
"python.pythonPath": "/Users/tim/anaconda/anaconda/envs/altair_test/bin/python"
}
Collaborator

please remove this file

Contributor Author

done

]
}
]
}
Collaborator

please remove this file

Contributor Author

done

@@ -57,6 +57,8 @@ def limit_rows(data, max_rows=5000):
values = data['values']
else:
return data
else:
return data
Collaborator

Now that there are three return statements in this code, the logic is pretty opaque (it took me a bit to read this and figure out what it was doing).

I think the function should be refactored for clarity.
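One possible shape for such a refactoring, with a single return at the end, is sketched below. It is not the code from the PR: MaxRowsError is defined here only for the example, and the sketch handles just dicts and lists so that it runs without pandas.

```python
class MaxRowsError(Exception):
    """Error type defined for this sketch only."""


def limit_rows(data, max_rows=5000):
    """Return data unchanged, or raise if it holds more than max_rows rows."""
    if isinstance(data, dict):
        # Dicts without inline 'values' (for example URL data) have nothing to count.
        rows = data.get("values")
    else:
        rows = data

    # Only sequences of records are counted; anything else passes through unchecked.
    countable = isinstance(rows, (list, tuple))
    if countable and max_rows is not None and len(rows) > max_rows:
        raise MaxRowsError(
            "The number of rows is greater than the maximum allowed (%d)." % max_rows
        )
    return data


small = limit_rows({"values": [{"a": 1}, {"a": 2}]}, max_rows=5)
url_data = limit_rows({"url": "data.csv"}, max_rows=1)
```

Collecting the rows first and checking the limit once keeps the pass-through cases and the error case visible in one place.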

Contributor Author

done

data = sanitize_dataframe(data)
data.to_csv(filename)
return {
'url': filename,
'format': {'type': 'csv'}
}
elif hasattr(data,'__geo_interface__'):#GeoJSON
raise NotImplementedError('to_csv only works with Pandas DataFrame objects.')
Collaborator

Same comment here: please do the geo_interface logic all in one place, rather than checking it twice.

Also, four spaces for indentation please.

Contributor Author

Done

@mattijn
Contributor

mattijn commented May 8, 2018

What does it take to include (potential) support for writing to topojson format?
Since vega-lite supports topojson format already and there is an open issue and PR at geopandas for writing topojson (geopandas/geopandas#610, geopandas/geopandas#645).

I’m not saying it should be implemented now, but once it is supported in geopandas it would be good to have an option to switch without refactoring the codebase of altair completely.

@iliatimofeev
Contributor Author

@mattijn It would be tricky right now because the interface is not defined yet, but as soon as GeoPandas releases it (I hope we will see it), we should switch to TopoJSON as the default behavior.

Contributor Author

@iliatimofeev iliatimofeev left a comment

I think all requested changes are done.

@jakevdp
Collaborator

jakevdp commented May 9, 2018

Thanks – I'm traveling to PyCon tomorrow and speaking the next two days, so it may be a few days before I get back to reviewing this.

@iliatimofeev
Contributor Author

While Travis was fighting my Python 3.5 code, it skipped the main part of the tests, which is based on Selenium.
I decided to check that I can get the same result using GeoPandas instead of plain Pandas. So I hacked tests_examples to substitute pd.DataFrame with my version of GeoDataFrame and then compared the images with the original examples. I'm really not sure that test_geopandas_examples uses the best implementation. It does give a lot of information for now, though, and it could be excluded from the final result.

outcome   | example count | notes
different | 26            | chart images are different; in most cases there is no image
fail      | 3             | Vega-Lite falls into Error('It should never reach here')
identical | 15            | good, but one crashes on hover
no_pandas | 37            | Pandas was not involved, so we can't count it as a success

Detailed report is here.

It will take some time to analyze, but my preliminary conclusion is that storing a GeoDataFrame as GeoJSON was not the best idea :(

I'll think about another format for storing the data.

@jakevdp
Collaborator

jakevdp commented May 14, 2018

Yes, selenium is not particularly easy to set up or install in Travis, so for the time being those tests are skipped in the CI.

I'm not certain I understand the issue with GeoDataFrame... are you trying to use GeoDataFrame for every example? What's the purpose of that?

@iliatimofeev
Contributor Author

I was working on an example for the documentation:

import altair as alt
import geopandas as gpd

alt.renderers.enable('notebook')

world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
data = world[world.continent=='Africa']

alt.hconcat(
    alt.Chart().mark_bar(
    ).encode( 
        x='pop_est', 
        # SortField does not use shorthand, so the full field name is needed.
        # \\. works around Vega-Lite bug 3727
        y=alt.Y('name', sort=alt.SortField(field='properties\\.pop_est', 
                                           op='sum', order='descending')),
        tooltip='pop_est',
        color='pop_est',
    ).properties( 
        width=300,
        height=400,
    ),
    alt.Chart().mark_geoshape(
        ).project(
        ).encode( 
            # for GeoDataFrame fields, shorthand adds "properties." and infers types
            color='pop_est',
            tooltip='name'
        ).properties( 
            width=400,
            height=400,
            title='Africa population'
        ),
    
    data=data
)         

visualization 7

Now shorthand adds "properties." to field names and adds a "title" property if it is not set. But I realized that there are a lot of field definitions that don't use shorthand, so we need to add "properties." in all those other cases and then test it somehow.

Then I bumped into vega-lite #3727 and then #3729, so I started thinking about a systematic approach to testing all of it. I found "test_examples", which does the kind of complex test I needed, and decided that if I just use data from a GeoDataFrame in the same code, I should get the same charts as with a DataFrame, only with the data stored another way.

I expected to find most of the places where "properties." should be added and come up with a solution for them, but things went wrong.

@iliatimofeev
Contributor Author

Analysis of the first six charts, as a table:

Example                      | Vega-Lite                      | Altair
error_bars_with_ci           | Vega Editor shows it correctly | ok
diverging_stacked_bar_chart  | vega/vega-lite#3742            | ok
beckers_barley_trellis_plot  | vega/vega-lite#3727            | alt.SortField
step_chart                   | ok                             | .transform_filter( datum.symbol == 'GOOG' )
horizontal_stacked_bar_chart | vega/vega-lite#3744            | ok
scatter_marginal_hist        | vega/vega-lite#3744            | ok

Based on that table, I conclude that Vega-Lite has issues with nested data that will block all efforts to support GeoJSON in Altair. On the other hand, hiding from the user the fact that a GeoDataFrame is stored in nested structures would take real effort, possibly including a new mixin for transformers and so on.

New approach
If the goal is to show geospatial data from a GeoDataFrame with minimal effort from the user, why not just store it in a format that works? I can mix geometry and data in one record, and everything will work without any extra effort from Altair or Vega-Lite, with one acceptable exception: fields cannot be named 'type' or 'geometry'.

What do you think about it?

Proof of concept
It works in the current release version:

import altair as alt
import geopandas as gpd

alt.renderers.enable('notebook')

world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
data = world[world.continent=='Africa']

alt_data = dict(values =[dict(d,type = g['type'],geometry=g['geometry'])
                         for d,g in zip(
                            data.drop('geometry',axis=1).to_dict('records'),
                            data.geometry.__geo_interface__['features']
                         )] )

alt.Chart(alt_data).mark_geoshape(
).project(
).encode( 
    color='pop_est:Q',
    tooltip='name:N' 
).properties( 
    width=500,
    height=300
)

visualization 8

@iliatimofeev iliatimofeev mentioned this pull request May 16, 2018
@iliatimofeev
Contributor Author

Is it OK for an example to depend on geopandas? Or is a notebook with examples preferable?

elif isinstance(data, dict) and ('values' in data):
values = data['values']
else:
return data
Collaborator

As currently written the function will never progress beyond this line, and the max_rows check will never happen

Contributor Author

@iliatimofeev iliatimofeev May 17, 2018

if so, how could it pass test_limit_rows()?

Contributor Author

Or do you mean that it lets unknown data types pass through unchecked? Then yes, like #887.

@jakevdp
Collaborator

jakevdp commented May 17, 2018

Thanks for the continued work on this.

I have to admit I'm a bit apprehensive about merging this at this point: this ends up being much more complicated than I anticipated in the initial discussion, and there seem to be a lot of details and subtleties here. Admittedly geojson is not something I have a deep understanding of... I would not be confident in my ability to maintain this code, or to answer user questions that will inevitably come up.

@iliatimofeev
Contributor Author

How can I support you in that decision?

With the flat data version, in my opinion it now looks much simpler than it did along the way.
Almost everything is located in utils.data.py, most of it in _geopandas_to_dict(data). The one exception is geojson_feature, which sits next to its sister topo_feature in api.py.

If you really don't want to include it in Altair, could I publish a "plugin" connector based on your 'utils.data' that overrides the pipeline with my version?

@iliatimofeev iliatimofeev changed the title [WIP] Integration with geopandas #588 Integration with geopandas #588 May 18, 2018
@mattijn
Contributor

mattijn commented Jun 9, 2018

Hm, this PR is still open. Let me be brave and add a comment in the hope we might go forward (or backward, both is fine).

I played a bit with the way you create a flat data version of the pandas dataframe including the geometry column, and it seems to follow the 'normal' row-oriented JSON more closely. I really liked the example you wrote of the population of continental Africa, where you combined a bar chart with a spatial chart*1.

I'm worried, though, that if this flat data version is written on the Altair side, the usage becomes static. We often create the Vega(-Lite) specification (e.g. through Altair) and then make the data dynamic using other data sources (sometimes out of our control). Since Vega-Lite has geospatial support using standard GeoJSON, and other data sources also adopt the GeoJSON standard, I'd prefer that this flattening happen on the Vega-Lite side, using GeoJSON as the data input.

So even though I will use your flat data version for the time being, I think Vega-Lite should try to fix the issues regarding nested data (or drop support for GeoJSON). Your flat data version might be that fix.

Regarding plugins, I hope that we won't get a plugin for every mark that Jake doesn't have a deep understanding of (which I doubt is the case here). I'd prefer having more maintainers who can answer user questions.

*1 The flat data version creates a dictionary from the (geo)dataframe, so there is no dtype checking/guessing anymore, and your example has to be specified as follows (I included a brush on the bar chart to filter countries):

import altair as alt
import geopandas as gpd

world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
data = world[world.continent=='Africa']

alt_data = dict(values =[dict(d,type = g['type'],geometry=g['geometry'])
                         for d,g in zip(
                            data.drop('geometry',axis=1).to_dict('records'),
                            data.geometry.__geo_interface__['features']
                         )])

brush = alt.selection_interval(encodings=["y"])

alt.hconcat(
    alt.Chart().mark_bar().encode(
        x=alt.X('pop_est:Q', scale=alt.Scale(nice=False)),
        y=alt.Y('name:N', sort=alt.SortField(field='pop_est', 
                                             op='sum', order='descending')),
        tooltip=['name:N','pop_est:Q'],
        color=alt.condition(brush, 'pop_est:Q', alt.value('lightgray'))
        ).add_selection(
            brush
        ).properties( 
            width=300,
            height=400
        ),
    alt.Chart().mark_geoshape().project().encode( 
        color=alt.condition(brush, 'pop_est:Q', alt.value('lightgray')),
        tooltip=['name:N']
        ).properties( 
            width=400,
            height=400,
            title='Africa population'
        ),
    data=alt_data
)   

@iliatimofeev
Contributor Author

iliatimofeev commented Jun 10, 2018

@jakevdp I've rebased to resolve merge conflicts, but this is a good point to make a final decision on this branch.

@mattijn thank you for your feedback.
First of all, I improved geojson_feature to support GeoDataFrame; it is now "sanitized" before storing, so all types are JSON compatible. You can use that feature if you need canonical GeoJSON with data stored under properties, but in that case you lose automatic type inference in shorthand.
I treat a GeoDataFrame as a DataFrame first, so from the user's perspective it should behave exactly like a pd.DataFrame. Storing DataFrame columns under the nested "properties" object requires referencing them as "properties.column_name", which I find a little uncomfortable and confusing for users who don't know the GeoJSON specification. The idea of somehow hiding that fact from the user became too tricky to implement and in fact has a lot of potential for magical behavior.

Fortunately, the GeoJSON specification (RFC 7946) allows "foreign members" in objects. That is exactly what I use during "flattening". So it now generates valid GeoJSON, with one exception: it stores an array of GeoJSON objects, which is acceptable to Vega but not allowed by the specification. If you think that "data.values" itself should be exactly one GeoJSON object, I can wrap it in a collection and then split it back with the value of "data.format.property".
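The flattening described here can be sketched on plain dicts as follows. The function name flatten_features is hypothetical and this is not the PR's _geopandas_to_dict; it only illustrates moving each feature's properties to the top level as foreign members.

```python
def flatten_features(feature_collection):
    """Turn a FeatureCollection dict into flat records with foreign members."""
    records = []
    for feature in feature_collection["features"]:
        properties = feature.get("properties") or {}
        # Columns named 'type' or 'geometry' would overwrite GeoJSON members.
        clashes = sorted({"type", "geometry"} & set(properties))
        if clashes:
            raise ValueError("column names clash with GeoJSON members: %s" % clashes)
        record = dict(properties)  # column values become top-level foreign members
        record["type"] = feature["type"]
        record["geometry"] = feature["geometry"]
        records.append(record)
    # An array of objects: fine for Vega, but not a single GeoJSON object.
    return {"values": records}


collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"name": "A", "pop_est": 10},
            "geometry": {"type": "Point", "coordinates": [1.0, 2.0]},
        }
    ],
}

flat = flatten_features(collection)
```

With records shaped like this, a chart can refer to 'pop_est' or 'name' directly instead of 'properties.pop_est', at the cost of reserving the names 'type' and 'geometry'.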

Regarding maintenance and user support, I understand Jake's scepticism. GeoJSON is a bit bigger than just a mark type, so users could have a lot of not directly related questions that would need to be redirected somewhere, and that could take time. For my part, I plan to make a notebook with examples of making charts and maps using geopandas and altair, and I can support that area for now if that helps.

@jakevdp
Collaborator

jakevdp commented Jun 10, 2018

I think it would be good to move forward with this.

But I think that #887 offers a much cleaner framework in which to add this sort of functionality. I thought we could merge that relatively quickly a few weeks ago, but review has been slow.

@iliatimofeev
Contributor Author

@jakevdp I've reviewed #887; it is a really good idea to generalize the pipeline to work with different classes. But geojson and geopandas introduce some new cases for that abstraction (I commented on them in the PR). That's why I would prefer, if possible, to rebase #887 on my code to make a useful framework. Otherwise I'll need to somehow update its mechanism for handling invalid pipelines.

} for item in dct['data']['values']]
})

assert (data2[data.columns] == data).all().all()
Contributor

Would data2[data.columns].equals(data) not work here?

Contributor Author

should work :) and looks more readable

@iliatimofeev
Contributor Author

@jakevdp I think it is time to close this branch in favour of a separate connection module. Or would you prefer to use this branch somehow?

Please take a look at https://iliatimofeev.github.io/gpdvega/. What do you think of the idea of moving it under the altair-viz umbrella?

@jakevdp
Collaborator

jakevdp commented Aug 15, 2018

Sorry, this got delayed because I'm still waiting on a review of #887.

But yes, I think a separate module is probably the way to go for this, since it adds such a large amount of very specific processing logic.

But once #887 is merged, adding that logic to Altair will be much cleaner.

@mattijn
Contributor

mattijn commented Dec 13, 2019

Thanks for prototyping the __geo_interface__ approach @iliatimofeev! A slightly different approach was merged in #1664. This PR can be closed.

@jakevdp jakevdp closed this Dec 13, 2019