MODISTools onboarding #246
Comments
Thanks for this submission @khufkens! I will follow up shortly with next steps. |
No rush, I promised both @sckott and @seantuck12 to push this back to rOpensci back in June but the summer got in the way. |
Editor checks:
Editor comments
👋 @khufkens The package looks great, and my preliminary checks with goodpractice don't raise anything significant. Your test coverage is already high, and you might consider fixing the other two suggestions while I line up reviewers.
Reviewer 1: @pmnatural
|
I have reached out to 3 potential reviewers on Sep 4th. Will update thread when I hear back. |
I fixed most of the notes above, but haven't pushed them. Keep me posted on progress. |
Thanks for agreeing to review @pmnatural 🙏 Please reach out if you have any questions. |
Tagged @etiennebr as the second reviewer. |
I don't think I'll update the CRAN package before review. I'm not sure how this influences the process. I've mainly been focussing on addressing the issues above, as well as interoperability and docs. I still have to go through this: https://cran.r-project.org/web/packages/httr/vignettes/api-packages.html to fix some issues (e.g. create an ornl_api function etc). |
Ok, I won't worry about the CRAN version; are the fixes pushed to the GitHub master branch? |
Yes, everything is in the master branch. Larger additions or refactoring are done on a separate branch (but that is not applicable now). |
Package Review
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
Documentation
The package includes all the following forms of documentation:
Functionality
Final approval (post-review)
Estimated hours spent reviewing: 10h (including reading all the material available to new reviewers)
Review Comments
Thanks @khufkens for your contribution. I believe the package can be useful to users looking for MODIS time series in specific locations, and it really simplifies the usage of the web API. I was surprised at first that it did not provide a I have some miscellaneous comments and improvements to suggest; I think they could make the package even better by improving the user experience and facilitating maintenance.
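library(MODISTools)  # mt_products()
library(dplyr)       # mutate(), select(), %>%
library(lubridate)   # duration()

# Normalise the frequency labels so that duration() can parse them
products <- mt_products()
products %>%
  mutate(frequency = gsub("-", " ", frequency)) %>%
  mutate(frequency = gsub("Day", "day", frequency)) %>%
  mutate(frequency = gsub("Daily", "1 day", frequency)) %>%
  mutate(frequency = gsub("Yearly", "1 year", frequency)) %>%
  mutate(freq = duration(frequency)) %>%
  select(freq, frequency)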
#> freq frequency
#> 1 86400s (~1 days) 1 day
#> 2 31557600s (~1 years) 1 year
#> 3 691200s (~1.14 weeks) 8 day
#> 4 345600s (~4 days) 4 day
#> 5 691200s (~1.14 weeks) 8 day
#> 6 691200s (~1.14 weeks) 8 day
#> 7 1382400s (~2.29 weeks) 16 day
#> 8 691200s (~1.14 weeks) 8 day
#> 9 691200s (~1.14 weeks) 8 day
#> 10 691200s (~1.14 weeks) 8 day
|
Hi @etiennebr, thank you for reviewing the package and for the contributions to improve it. I've been chipping away at implementing some of your comments, but I was wondering if I could get some more information on some of your remarks. In particular, I was wondering how to restructure the data. You mentioned that, indeed, I can reformat things into a tidy data frame (with a duplication cost for some data, but reducing the overall complexity of the data structure; a fair trade-off given that space is probably cheaper than CPU cycles). However, this might be at odds with your request for a true bounding box using the ll-tr offsets as an sf polygon. The only other option in this context would be to formulate a function which translates the location data in the tidy data frame into an sf polygon. Does this sound like a good middle ground, balancing simplicity and compatibility? |
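You've probably been thinking about this problem longer than I have, so you might have a better solution or have run into issues I haven't seen with my proposal. Here's one way to do it (I'm starting from the vignette):
library(dplyr)  # as_tibble(), mutate(), %>%
library(tidyr)  # unnest()

# Turn the header into a one-row tibble and keep the data as a list column
mt_tidy <- function(x) {
  as_tibble(x$header) %>%
    mutate(data = list(x$data))
}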
mt_tidy(subset)
#> # A tibble: 1 x 16
#> xllcorner yllcorner cellsize nrows ncols band units scale latitude longitude site product
#> <chr> <chr> <chr> <int> <int> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr>
#> 1 -9370036… 4447802.… 926.625… 1 1 LST_… Kelv… 0.02 40 -110 test… MOD11A2
#> # ... with 4 more variables: start <chr>, end <chr>, complete <lgl>, data <list>
mt_tidy(subset) %>% unnest()
#> # A tibble: 20 x 22
#> xllcorner yllcorner cellsize nrows ncols band units scale latitude longitude site product
#> <chr> <chr> <chr> <int> <int> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr>
#> 1 -9370036… 4447802.… 926.625… 1 1 LST_… Kelv… 0.02 40 -110 test… MOD11A2
#> 2 -9370036… 4447802.… 926.625… 1 1 LST_… Kelv… 0.02 40 -110 test… MOD11A2
#> 3 -9370036… 4447802.… 926.625… 1 1 LST_… Kelv… 0.02 40 -110 test… MOD11A2
#> 4 -9370036… 4447802.… 926.625… 1 1 LST_… Kelv… 0.02 40 -110 test… MOD11A2
#> 5 -9370036… 4447802.… 926.625… 1 1 LST_… Kelv… 0.02 40 -110 test… MOD11A2
#> 6 -9370036… 4447802.… 926.625… 1 1 LST_… Kelv… 0.02 40 -110 test… MOD11A2
#> 7 -9370036… 4447802.… 926.625… 1 1 LST_… Kelv… 0.02 40 -110 test… MOD11A2
#> 8 -9370036… 4447802.… 926.625… 1 1 LST_… Kelv… 0.02 40 -110 test… MOD11A2
#> 9 -9370036… 4447802.… 926.625… 1 1 LST_… Kelv… 0.02 40 -110 test… MOD11A2
#> 10 -9370036… 4447802.… 926.625… 1 1 LST_… Kelv… 0.02 40 -110 test… MOD11A2
#> 11 -9370036… 4447802.… 926.625… 1 1 LST_… Kelv… 0.02 40 -110 test… MOD11A2
#> 12 -9370036… 4447802.… 926.625… 1 1 LST_… Kelv… 0.02 40 -110 test… MOD11A2
#> 13 -9370036… 4447802.… 926.625… 1 1 LST_… Kelv… 0.02 40 -110 test… MOD11A2
#> 14 -9370036… 4447802.… 926.625… 1 1 LST_… Kelv… 0.02 40 -110 test… MOD11A2
#> 15 -9370036… 4447802.… 926.625… 1 1 LST_… Kelv… 0.02 40 -110 test… MOD11A2
#> 16 -9370036… 4447802.… 926.625… 1 1 LST_… Kelv… 0.02 40 -110 test… MOD11A2
#> 17 -9370036… 4447802.… 926.625… 1 1 LST_… Kelv… 0.02 40 -110 test… MOD11A2
#> 18 -9370036… 4447802.… 926.625… 1 1 LST_… Kelv… 0.02 40 -110 test… MOD11A2
#> 19 -9370036… 4447802.… 926.625… 1 1 LST_… Kelv… 0.02 40 -110 test… MOD11A2
#> 20 -9370036… 4447802.… 926.625… 1 1 LST_… Kelv… 0.02 40 -110 test… MOD11A2
#> # ... with 10 more variables: start <chr>, end <chr>, complete <lgl>, modis_date <chr>,
#> #   calendar_date <chr>, band1 <chr>, tile <chr>, proc_date <chr>, pixel <chr>, data <int>
It doesn't prevent an I hope this helps, but don't hesitate if that's not the case. As I said above, maybe I'm overlooking some limitations. |
Thanks @etiennebr for the code contribution! Sorry for my slow response, I've a grant deadline coming up and at night I have a connection which is basically enough for an email but not much more. From reading through your post (and code) I think this is a good solution. It makes things easier for sure, and indeed provides the sf mapping compatibility. I'm on the slow connection, so I'll try to give things a go tomorrow. |
Hi @etiennebr, Below are the corrections I made to the package. Major changes include:
Other comments are below, let me know how you feel about this implementation.
I've implemented the conversion of the standard output to a more readable format. I've not included the conversion to seconds as this meta-data is rarely used in analysis and implicit when downloading the time series (which is always continuous, no individual dates can be queried).
This is indeed a valuable addition, if not for visualization purposes alone. I included adapted code you supplied (thanks again for this). I do use a base R implementation in order to limit direct dependencies.
See the above. This is implemented through two functions. One converts the xll / yll corners to lat / lon; a second converts this coordinate, together with other meta-data, to an sf polygon. This data is not tagged onto the original data frame, to limit complexity and preserve ease of use (saving data as a flat file to disk).
Fixed the typo. Thanks for spotting this.
I implemented
In the end I decided to do away with the nested structure. The output is now a plain data frame which can easily be saved as a flat file to disk. Consequently the
I fixed the
I updated the testing routines, removing the
I let the errors surface, without custom error messages. I think that, depending on the circumstances, these messages can be good for advanced users but often confuse beginners.
Used missing() throughout, converting other packages slowly as well. Thanks for pointing me toward this function.
This is mostly a documentation issue I think. The input can either be a data frame, or a csv file using a particular structure (which indeed is not mentioned). I'll hold off on
This is a limitation of the API on the ORNL side, not of the package itself. The use case is also rare. Given the relatively small datasets, the overhead of extracting certain dates afterward is also minor.
This is covered. |
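As a rough illustration of the corner conversion described in the reply above, here is a minimal base R sketch. It assumes the header fields shown earlier in the thread (`xllcorner`, `yllcorner`, `cellsize`, `nrows`, `ncols`) are in metres on the MODIS sinusoidal grid, and it uses the sphere radius of that grid for the inverse projection. The function names `sinu_to_lonlat()` and `header_to_ring()` and the input values are hypothetical and are not the functions in the package.

```r
# Sphere radius (metres) used by the MODIS sinusoidal grid
modis_radius <- 6371007.181

# Inverse sinusoidal projection (central meridian 0): metres -> degrees
sinu_to_lonlat <- function(x, y, radius = modis_radius) {
  lat <- y / radius
  lon <- x / (radius * cos(lat))
  data.frame(lon = lon * 180 / pi, lat = lat * 180 / pi)
}

# Hypothetical helper: build a closed ring (5 corner points) for the
# extent described by the header fields, which arrive as character strings
header_to_ring <- function(xllcorner, yllcorner, cellsize, nrows, ncols) {
  xll <- as.numeric(xllcorner)
  yll <- as.numeric(yllcorner)
  size <- as.numeric(cellsize)
  xur <- xll + as.numeric(ncols) * size  # upper-right x
  yur <- yll + as.numeric(nrows) * size  # upper-right y
  sinu_to_lonlat(
    x = c(xll, xur, xur, xll, xll),
    y = c(yll, yll, yur, yur, yll)
  )
}

# Illustrative values only (not taken from a real download)
ring <- header_to_ring("-9370000", "4447800", "926.625", nrows = 1, ncols = 1)
print(ring)
```

Projecting only the four corners is an approximation, since a rectangle on the sinusoidal grid is not a rectangle in longitude and latitude. If sf is installed, the ring could be passed on as `sf::st_sfc(sf::st_polygon(list(as.matrix(ring))), crs = 4326)`.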
I'm sorry for not getting back earlier on this. @karthik what are your expectations for the next steps? I have several engagements in the next few weeks, and I would like to devote enough time to appropriately review @khufkens's work. I think I can do it before the end of the year, if that sounds right for both of you. |
@etiennebr Thanks for the thoughtful response. I think the next steps would be to see if @khufkens has adequately addressed the issues that you've raised. Otherwise you can point out the ones that remain unresolved. If it's easier, you can also open issues on his repo and tag |
@pmnatural Have you had a chance to complete your review yet? Please update on your progress. |
@etiennebr I've fixed both the cell size issue and the error statements (both in mt_dates and mt_subset). I also included references to the sinusoidal projection on both the LP DAAC and Wikipedia. Both explain the projection and mention the required parameters. Build is still clean, all should be good. Thanks for all the feedback! |
Good job @khufkens! @karthik I would recommend approving this package; @khufkens addressed the issues I raised. I think there is still room for a higher test coverage rate to help the package in the long run, and |
Thanks for the vote of confidence @etiennebr. I acknowledge that New features can be added incrementally without breaking things. I think with the feedback the package is also better structured which makes all the difference when moving forward. |
Thank you for your review and recommendation @etiennebr! I'll need a few more days to look everything over but I will make a decision early next week and advise next steps for @khufkens |
Thanks. Take your time, no rush. Previous version is still up and things only got better. |
Package Review
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
Documentation
Functionality
Review Comments
|
Hi @pmnatural, Below are my responses and references to the code changes.
I'm not sure which exact version you reviewed, but both the vignette and the DESCRIPTION file now refer to the ORNL API. Although the focus is on point-based extractions, the querying of regional data prevents me from explicitly stating that it is a "point based" approach (the upper limit is 200 x 200 km, a sizable footprint, roughly covering Belgium or the state of Massachusetts). The statement can be read here:
https://khufkens.github.io/MODISTools/reference/mt_subset.html
By default, kilometers are now rounded to prevent errors in this case.
This has currently been implemented, highlighting the steps needed to gather all required information.
On the suggestion of a previous reviewer I switched from a nested list to a tidy data frame. Given the current data size restrictions and generally small datasets, I don't see this as an issue. Tidy data allows for quick plotting of data across bands, locations and pixels, without looping over complex list structures. It also facilitates concatenating multiple files / bands without the need for nested lists. I would argue that convenience trumps data duplication in this case (as is often true of long rather than wide or nested formats). The vignette demonstrates some of the power of this tidy data and the ease with which spatial or other subsets and aggregated analyses can be generated. More recent features also include functionality to convert the tidy data frame to raster stacks. I refer to the vignette to describe this feature and to the teaching material by colleague Prof. dr. Jan Verbesselt in his time series analysis course. A shorter version is provided in my github repo. Both are too detailed to serve as general examples.
Support for both data frames and CSV files is already in place, as mentioned in the documentation. An example is also included in the documentation and tested for in unit checks.
A new analysis is provided in the vignette using a spatial analysis around the Arcachon Bay in south-west France (random example of a rather varied landscape on a small area). I hope this addresses all the issues raised! Kind regards, |
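To make the point about tidy output concrete, here is a small base R sketch of an aggregation across pixels. The toy data frame is invented for illustration; its column names (`band`, `scale`, `calendar_date`, `pixel`, `data`) follow the printed output earlier in this thread and may differ from the columns the released package returns.

```r
# Invented toy subset: two dates, two pixels, raw integer values
subset_df <- data.frame(
  band = "LST_Day_1km",
  scale = "0.02",
  calendar_date = rep(c("2004-01-01", "2004-01-09"), each = 2),
  pixel = rep(c("1", "2"), times = 2),
  data = c(13100L, 13200L, 13300L, 13500L),
  stringsAsFactors = FALSE
)

# Apply the scale factor (stored as a string) to get values in Kelvin
subset_df$kelvin <- subset_df$data * as.numeric(subset_df$scale)

# Mean across pixels for every date and band, with no nested lists to loop over
daily_mean <- aggregate(kelvin ~ calendar_date + band, data = subset_df, FUN = mean)
print(daily_mean)
```

Because every row carries its own band, date and pixel, stacking several downloads with `rbind()` before aggregating would work the same way.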
Final approval (post-review)
Estimated hours spent reviewing: |
Congrats @khufkens , your submission has been approved! 🎉 Thank you for submitting and @etiennebr and @pmnatural for thorough and timely reviews. To-dos:
Welcome aboard! We'd also love a blog post about your package, either a short-form intro to it (https://ropensci.org/technotes/) or a long-form post with more narrative about its development (https://ropensci.org/blog/). If you are interested, @stefaniebutland will be in touch about content and timing. |
Congratulations @khufkens 🎉. We would indeed love to have a post about MODISTools. Guidelines for submitting a blog post or tech note are here: https://github.com/ropensci/roweb2#contributing-a-blog-post. We currently have publication slots available June 11 or later. Happy to answer any questions |
Good job @khufkens! 🎉 |
I can be 100% flexible on publication date, since the most important thing is for you to get what you want out of it. Short or substantial is your choice, based on the message you want to convey. Do you want to propose a date to submit a draft? |
Hi @stefaniebutland, can we peg it on mid June then? This should give me some time to frame things properly with some graphs etc. |
Should be fixed now @khufkens |
Thanks! |
Hi @karthik I was notified by someone that the website link of the package is broken. I forgot to change the link to an ropensci account one. It currently reads (which leads nowhere): but should be Is there someone at your end who can update this so that people can access examples etc? |
Summary
Programmatic interface to the 'MODIS Land Products Subsets' web services (https://modis.ornl.gov/data/modis_webservice.html). Allows for easy downloads of 'MODIS' time series directly to your R workspace or your computer.
https://github.com/khufkens/MODISTools
Please indicate which category or categories from our package fit policies this package falls under *and why*? (e.g., data retrieval, reproducibility. If you are unsure, we suggest you make a pre-submission inquiry.):
Who is the target audience and what are scientific applications of this package?
MODIS data subsets have a wide range of applications in tracking state changes of the environment, which inform, among others, ecological and hydrological models. For an in-depth rationale I refer to Tuck et al. 2014 (https://onlinelibrary.wiley.com/doi/full/10.1002/ece3.1273).
Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category?
Not to my knowledge. It is the only community contribution listed at the bottom of the ORNL page. Although tiled data processing packages exist, this package has a particular focus on subsets of these tiles (from a single pixel to a maximum extent of a 200x200km window), mostly for rapid model development and localized environmental assessment. Tiled data download packages focus on wall-to-wall remote sensing image coverage, often after using subsets for model development.
None made.
Requirements
Confirm each of the following by checking the box. This package:
Publication options
[Note: already on CRAN]
paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
This package is a refactored version of the original MODISTools package by Tuck et al. 2014. In order to limit fragmentation I kept this name although the functionality changed (due in part to changes on the backend API). In order to respect the original authors' contribution I would like to keep this reference in place. I also asked approval of both the first and second author of the original package to do so. However, they did not want to be involved in new development due to time constraints.
Detail
Does R CMD check (or devtools::check()) succeed?
Does the package conform to rOpenSci packaging guidelines?
Yes, but I might have missed something.
If this is a resubmission following rejection, please explain the change in circumstances:
If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names:
Please contact the previous package author for review:
@seantuck12