You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
To give users better control over memory consumption, particularly when executing queries that produce large search results (hundreds of thousands, or even millions of granules), query results should not be obtained in their entirety and put into a list all at once before returning the results to the user. This can easily lead to "out of memory" errors for large results. Instead, an iterator (generator) should be returned so the user has the option to limit the number of results held in memory at any given moment.
Since this would be a breaking change to the existing get and get_all query methods, it might be worth defining a new method. However, given that this library has not reached a 1.0 release, making such a breaking change would not be unreasonable. Alternatively, a new method could be added while deprecating the existing ones. Either way, I recommend that returning an iterator being the only option, not an additional option, otherwise it can be confusing to users as to why there are multiple options. This also follows the Zen of Python's "There should be one-- and preferably only one --obvious way to do it." Further, returning an iterator does not prevent the user from populating a list themselves, if they wish.
There has been discussion amongst the users and maintainers of the earthaccess library about this very topic, and a number of us agreed that it makes more sense to implement this here rather than in earthaccess, which simply calls through to this library.
The text was updated successfully, but these errors were encountered:
Agreed, generator return type would be preferred implementation.
I would concur with the plan to add new functions, mark the existing ones as deprecated indicating that the list-type functions will be removed in the 1.0.0 release. I know this library is used in a few different places but I don't have a good handle on everywhere it gets used; so I think it is important to have at least 1, maybe more, minor versions with the existing functions marked as deprecated before completely removing them.
We (GES DISC) are using this library for some operational scripts so would also vote for minor versions before completely removing. Was going to use today's community for earthaccess to discuss this.
To give users better control over memory consumption, particularly when executing queries that produce large search results (hundreds of thousands, or even millions of granules), query results should not be obtained in their entirety and put into a list all at once before returning the results to the user. This can easily lead to "out of memory" errors for large results. Instead, an iterator (generator) should be returned so the user has the option to limit the number of results held in memory at any given moment.
Since this would be a breaking change to the existing
get
andget_all
query methods, it might be worth defining a new method. However, given that this library has not reached a 1.0 release, making such a breaking change would not be unreasonable. Alternatively, a new method could be added while deprecating the existing ones. Either way, I recommend that returning an iterator being the only option, not an additional option, otherwise it can be confusing to users as to why there are multiple options. This also follows the Zen of Python's "There should be one-- and preferably only one --obvious way to do it." Further, returning an iterator does not prevent the user from populating a list themselves, if they wish.There has been discussion amongst the users and maintainers of the earthaccess library about this very topic, and a number of us agreed that it makes more sense to implement this here rather than in earthaccess, which simply calls through to this library.
The text was updated successfully, but these errors were encountered: