Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework timeseries creation to contain species with observation after 1950 #199

Closed
SanderDevisscher opened this issue Jul 1, 2024 · 9 comments · Fixed by #202
Closed
Assignees
Labels
automated workflow enhancement New feature or request

Comments

@SanderDevisscher
Copy link
Collaborator

Currently species introduced prior to 1950 are excluded from the GAM timeseries this should be reworked to match option 2 of inbo/alien-species-portal#74 (comment). ea species should be included in the timeseries if they have at least 1 observation after 1950 independent of their year of introduction. This means the logic discribed here should be reworked.

year_of_introduction <- 1950

@soriadelva
Copy link
Contributor

soriadelva commented Jul 3, 2024

I've altered the code but @SanderDevisscher could you run the upload of the timeseries to the S3-bucket because I don't get permission to do that (still the same error as before)? Also I changed the name of one of the output files in the data/output folder from taxa_introduced_in_BE_before_1950.tsv to taxa_last_observed_in_BE_before_1950.tsv, but I'm not sure whether this may be throwing errors downstream. According to #135 this file has to be placed in the interim folder so it may be fine.

@mvarewyck
Copy link
Collaborator

We don't use that file taxa_introduced_in_BE_before_1950.tsv, so for me renaming is fine.

@SanderDevisscher
Copy link
Collaborator Author

SanderDevisscher commented Jul 4, 2024

We don't use that file taxa_introduced_in_BE_before_1950.tsv, so for me renaming is fine.

=> #135 remains TRUE

@SanderDevisscher
Copy link
Collaborator Author

I've altered the code but @SanderDevisscher could you run the upload of the timeseries to the S3-bucket because I don't get permission to do that (still the same error as before)? Also I changed the name of one of the output files in the data/output folder from taxa_introduced_in_BE_before_1950.tsv to taxa_last_observed_in_BE_before_1950.tsv, but I'm not sure whether this may be throwing errors downstream. According to #135 this file has to be placed in the interim folder so it may be fine.

@soriadelva you can allways trigger the workflow from a branch, see https://github.com/inbo/aspbo/actions/workflows/update_indicators_preprocessing.yaml. As long as you run it from uat afterwards.

@soriadelva
Copy link
Contributor

I've altered the code but @SanderDevisscher could you run the upload of the timeseries to the S3-bucket because I don't get permission to do that (still the same error as before)? Also I changed the name of one of the output files in the data/output folder from taxa_introduced_in_BE_before_1950.tsv to taxa_last_observed_in_BE_before_1950.tsv, but I'm not sure whether this may be throwing errors downstream. According to #135 this file has to be placed in the interim folder so it may be fine.

@soriadelva you can allways trigger the workflow from a branch, see https://github.com/inbo/aspbo/actions/workflows/update_indicators_preprocessing.yaml. As long as you run it from uat afterwards.

@SanderDevisscher ik krijg die workflow precies niet te zien als ik onder actions kijk 🤔

@soriadelva
Copy link
Contributor

This Github Actions keeps on throwing errors and it seems to go wrong after the timeseries (df_ts) is uploaded to the S3 bucket:
action_cancellation This step cannot be executed:
alienSpecies::createTimeseries( shapeData = alienSpecies::loadShapeData("grid.RData")$utm1_bel_with_regions, bucket = UAT_bucket)
The changes I did on this branch cause the timeseries file to be much larger than before (51 397 730 rows instead of 16 601 399). Could it be that this file is too heavy and causes the workflow to get cancelled?

@soriadelva
Copy link
Contributor

@mvarewyck I did a test this morning, rewriting the createTimeseries function (within the 05_occurrence_indicators... .Rmd , not the original function in the alien-species-portal repo) so it would merge the df_ts data with shapeData in chunks of rows instead of all at once and this fixed the issue. Could you rewrite the original function in the way you'd prefer so that it could efficiently do this merge for a large file?

@mvarewyck
Copy link
Collaborator

@mvarewyck I did a test this morning, rewriting the createTimeseries function (within the 05_occurrence_indicators... .Rmd , not the original function in the alien-species-portal repo) so it would merge the df_ts data with shapeData in chunks of rows instead of all at once and this fixed the issue. Could you rewrite the original function in the way you'd prefer so that it could efficiently do this merge for a large file?

@soriadelva I've implemented a fix for efficient joining in the alienspecies v1.0.0 which works for me locally. Can you check whether it works on your side? Otherwise can you send me the code to do it chunk-wise as I don't know how large the chunks can be to make it work.

@soriadelva
Copy link
Contributor

@mvarewyck I did a test this morning, rewriting the createTimeseries function (within the 05_occurrence_indicators... .Rmd , not the original function in the alien-species-portal repo) so it would merge the df_ts data with shapeData in chunks of rows instead of all at once and this fixed the issue. Could you rewrite the original function in the way you'd prefer so that it could efficiently do this merge for a large file?

@soriadelva I've implemented a fix for efficient joining in the alienspecies v1.0.0 which works for me locally. Can you check whether it works on your side? Otherwise can you send me the code to do it chunk-wise as I don't know how large the chunks can be to make it work.

@mvarewyck I tested your fix and it works, thanks a lot! I'll rewrite the code in the 05_occurrence_indicators... .Rmd so it calls upon this function again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
automated workflow enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants