Optimize Dataset df/gdf #704
Comments
I believe this was handled in #741.
When I created this issue, I was thinking about starting in the from_dataframe method and removing one of _df or _gdf, leaving only one way to work locally, so we avoid holding the same data twice, and probably doing some work after downloading a table. I think this still makes sense. We will have less to do thanks to #741, but I am not sure about the scope.
ei @andy-esch & @alrocar how do you see adding the We could handle this issue from 2 sides:
We have talked about it in the past, and the main reason was the size of the
+1 to remove internal duplication of
I don't have a strong opinion about this, so I prefer to listen to Andy's thoughts.
(I have deleted my previous comment.) Yes, we are creating 2 different objects. The second one, the GeoDataFrame, is not saved in the Dataset, so we are using 2x memory. Looking deeper into the code, I am afraid that if we keep only one object, we could end up transforming the dataframe during the Map render process.
I am going to focus on the scenario without geopandas installed. We could go further if we find it useful in the future.
We have 2 properties to “support” DataFrame and GeoDataFrame. Furthermore, we use DataFrames and cast them to GeoDataFrame (compute_geodataframe method) when we want to create a CARTO VL map. In the end, we have the same thing twice, and in some cases we even hold the same data twice in memory. In the geojson case, we are already converting it into a GeoDataFrame.
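To make the duplication concrete, here is a rough sketch using plain Python stand-ins rather than the real pandas/geopandas classes. `FakeDataFrame`, `FakeGeoDataFrame`, and the body of `compute_geodataframe` are assumptions made up for illustration, not the actual cartoframes code; the only point is that casting at render time produces a second object holding the same rows.

```python
import copy


class FakeDataFrame:
    """Hypothetical stand-in for pandas.DataFrame: a list of row dicts."""

    def __init__(self, rows):
        self.rows = rows


class FakeGeoDataFrame(FakeDataFrame):
    """Hypothetical stand-in for geopandas.GeoDataFrame: rows plus a geometry column name."""

    def __init__(self, rows, geometry="geometry"):
        super().__init__(rows)
        self.geometry = geometry


def compute_geodataframe(df):
    # Assumed behaviour: the cast copies the rows, so after this call
    # the same data lives in two separate objects.
    return FakeGeoDataFrame(copy.deepcopy(df.rows))


df = FakeDataFrame([{"name": "a", "geometry": "POINT (0 0)"}])
gdf = compute_geodataframe(df)

same_content = gdf.rows == df.rows  # equal data
same_object = gdf.rows is df.rows   # but held in a different object
```

Here `same_content` is true while `same_object` is false, which is the "same thing twice in memory" situation described above.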
The proposal is to use only GeoDataFrame by default, with a single property and a single creation method that tries to build a GeoDataFrame from the beginning.
PR https://github.com/CartoDB/cartoframes/pull/741/files already covers a big part of this work.
We would still need to solve the case where a user creates the Dataset from a DataFrame without geometry and wants to add a geometry afterwards. We will probably need a specific method for that.
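One possible shape for this, sketched with plain Python rows instead of real frames. Everything here is hypothetical: the `from_rows` constructor, the `has_geometry` property, and the `set_geometry` method are names invented for this sketch, not an existing cartoframes API. It only shows a single internal object, a single creation path that detects geometry up front, and a dedicated method for adding geometry later without making a second copy.

```python
class Dataset:
    """Hypothetical sketch: one internal frame, one creation path."""

    def __init__(self, rows, geometry_column=None):
        self._rows = rows                      # the only copy of the data
        self._geometry_column = geometry_column

    @classmethod
    def from_rows(cls, rows, geometry_column="geometry"):
        # Detect geometry once, at creation time, instead of casting later.
        has_geometry = bool(rows) and all(geometry_column in row for row in rows)
        return cls(rows, geometry_column if has_geometry else None)

    @property
    def has_geometry(self):
        return self._geometry_column is not None

    def set_geometry(self, values, column="geometry"):
        # Dedicated method for the "geometry added afterwards" case.
        if len(values) != len(self._rows):
            raise ValueError("expected one geometry value per row")
        for row, value in zip(self._rows, values):
            row[column] = value                # updated in place, no second copy
        self._geometry_column = column
        return self


# Created without geometry, then geometry is attached later.
ds = Dataset.from_rows([{"name": "a"}, {"name": "b"}])
before = ds.has_geometry
ds.set_geometry(["POINT (0 0)", "POINT (1 1)"])
after = ds.has_geometry
```

Updating in place keeps memory flat, at the cost of mutating the rows the caller passed in; whether that trade-off is acceptable is part of the scope question.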