Bug: internal state of Dataset changes after Map operation #861
This is a simplified case:
ds = Dataset('table')
ds.download()  # returns a DataFrame
ds.download()  # returns None
Why?
ds = Dataset('table')
ds.download()
Of course, I understand the problem. Following a functional approach (pure functions), both calls on the same object should return the same result.
Proposal
I have 2 approaches for the result of the second
What do you think? @cmongut @andy-esch
I would go with the second one, without a message.
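To make the "same call, same result" expectation concrete, here is a minimal sketch using a toy stand-in for Dataset. It is not the cartoframes implementation: `ToyDataset`, `FAKE_TABLES` and the list-of-dicts return value are assumptions made purely for illustration. The idea is that download() reads from the table every time and never clears the table name, so a second call cannot return None.

```python
# Hypothetical stand-in for the remote database: table name -> rows.
FAKE_TABLES = {"table": [{"id": 1}, {"id": 2}]}


class ToyDataset:
    """Toy model of a table-backed dataset (illustration only)."""

    def __init__(self, table_name):
        self.table_name = table_name

    def download(self):
        # Always fetch from the table and never reset self.table_name,
        # so repeated calls on the same object behave the same way.
        return [dict(row) for row in FAKE_TABLES[self.table_name]]


ds = ToyDataset("table")
first = ds.download()
second = ds.download()
```

Because every call goes back to the table, a later download() would also pick up rows inserted in the meantime, which matches the "fetch the latest version" use case mentioned in this thread.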
Something similar happens with the upload method when working with a query:
ds = Dataset('SELECT * FROM my_table WHERE ...')
ds.upload(table_name='my_table')  # creates table 'my_table' from the query
ds.upload(table_name='my_table')  # raises error: 'Nothing to upload. Dataset needs a DataFrame, a GeoDataFrame or a query to upload data to CARTO.'
In the first
In this case, it is more difficult to detect the situation and give the user better advice or better behavior. I think this is a special use case and not a very common one, but maybe we should take it into account.
Let's wait for @andy-esch's opinion, but here is mine. When calling
A bit of context about the state of the Dataset class.
In the beginning, we wanted to solve every situation with the same class. But we realized that it was a real pain, full of corner cases. In the end, we only supported 2 specific "state" changes:
download case:
For me, downloading again only makes sense in this case: you have already downloaded a table and the table then receives some changes (inserts, updates or deletes). You know that and you want to download it again. In any other case, it makes no sense. And following our approach to the "dataset state", it could (or should?) be solved by creating a new Dataset.
I like proposal 2 as well. One main reason is that the table could change between operations and the user wants to fetch the latest version.
I vote for
By the way, I would expect
Overall, I thought we decided a while back to make the Dataset objects immutable. With this approach, the Dataset instances don't change, but operations on them return new class instances.
ds_table = Dataset('tablename')
# download the table dataset into a DataframeDataset
ds_dataframe = ds_table.download()
ds_query = Dataset('select * from table where x > 50')
# 'uploading' a query to create a table returns a
# TableDataset
ds_table_from_query = ds_query.upload(table_name='tablename')
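The immutable approach described above could look roughly like the following sketch. Everything here is hypothetical and not part of cartoframes: the class layout, the `FAKE_DB` dictionary that replaces a real CARTO account, and the Python predicate that stands in for a SQL query. Frozen dataclasses are just one convenient way to get instances that cannot be modified after creation.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Callable

# Hypothetical in-memory "database" used instead of a real CARTO account.
FAKE_DB = {"tablename": ((1, 10), (2, 60), (3, 90))}


@dataclass(frozen=True)
class DataFrameDataset:
    # Rows stand in for a DataFrame to keep the sketch dependency-free.
    rows: tuple


@dataclass(frozen=True)
class TableDataset:
    table_name: str

    def download(self):
        # Returns a new object; self is left untouched.
        return DataFrameDataset(rows=FAKE_DB[self.table_name])


@dataclass(frozen=True)
class QueryDataset:
    # A real query would be SQL text; a predicate keeps the toy runnable.
    source_table: str
    predicate: Callable

    def upload(self, table_name):
        # "Uploading" a query creates a table and returns a TableDataset.
        rows = tuple(r for r in FAKE_DB[self.source_table] if self.predicate(r))
        FAKE_DB[table_name] = rows
        return TableDataset(table_name)


ds_table = TableDataset("tablename")
ds_dataframe = ds_table.download()
ds_query = QueryDataset("tablename", lambda row: row[1] > 50)
ds_table_from_query = ds_query.upload(table_name="filtered")
```

With this shape, calling download() or upload() twice on the same instance gives the same kind of result each time, because no call changes the instance it is called on.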
It is what we are doing right now if you don't use
I see you both think the 2 exceptions are a bad solution, and I agree. I now see what I explained in the previous comment (the reasons for having 2 exceptions) as a user responsibility or decision. And seen this way, the Dataset class probably becomes easier to understand.
Example: dataframe - download
Example: query - upload
It is another possibility, but I am not sure about it. For example, in the download case, I think we should return a DataFrame, because it is the object people want to work with.
After using a Dataset in the following workflow, the Dataset seems to have forgotten its tablename.
This is using the latest version of develop.