Code for predicting probabilities of threat for Data Deficient species of the IUCN Red List of Threatened Species

jannebor/dd_forecast

Extinction Risk of Data Deficient Species

Many species on the IUCN Red List of Threatened Species are classified as Data Deficient. This code was used to predict the probability of being threatened by extinction for Data Deficient species that have range map data available from the IUCN spatial data download. The classifier can be applied to individual species using our web application (alpha version).

[Figure: Number of threatened DD species]

Predictor data

Note: To reproduce the study, the following datasets need to be downloaded individually from third-party sources; otherwise, skip to Model preparation:

Scripts for data pre-processing (e.g., calculating land-use fractions) and for stacking all spatial layers are stored in workflow/preparation/raster_preparation and need to be adjusted individually.

The underlying function for retrieving predictor data for a single species from tables, web sources (i.e., IUCN, GBIF & OBIS), and the spatial datasets downloaded above is workflow/preparation/data_extraction.R. We applied this function to entire spatial datasets in workflow/preparation/data_extraction_batch.R. The resulting full dataframe (df_ml_v2) is stored as an R object in dataframes/full_data.

Model preparation

Full reproducibility (based on code only) is given from this point onwards:

Training (75%) and testing (25%) data were prepared (workflow/preparation/model_prep.R) for each partition (partition 1: all species; partition 2: marine and non-marine species separately) and stored as R objects in dataframes/Partition 1 and dataframes/Partition 2. For each of the partition-specific dataframes, features were selected (workflow/preparation/feature_selection.R) using the Boruta algorithm (Kursa & Rudnicki 2010). Only features identified as relevant were considered during model building.
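The 75/25 split per partition can be illustrated with a minimal sketch. It is written in Python for illustration only (the repository's scripts are in R), and the record layout, the `split_train_test` helper, and the seed are assumptions, not the logic of model_prep.R. Whether the original split was stratified is not stated, so this sketch uses a plain random shuffle.

```python
import random

def split_train_test(rows, train_fraction=0.75, seed=42):
    """Shuffle rows reproducibly and split them into training and testing sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical records: (species_id, is_marine). Real rows would carry predictors.
species = [(i, i % 3 == 0) for i in range(100)]

# Partition 1: all species together.
p1_train, p1_test = split_train_test(species)

# Partition 2: marine and non-marine species are split separately.
marine = [s for s in species if s[1]]
non_marine = [s for s in species if not s[1]]
p2_marine_train, p2_marine_test = split_train_test(marine)
p2_non_marine_train, p2_non_marine_test = split_train_test(non_marine)

print(len(p1_train), len(p1_test))
```

Fixing the seed keeps the split reproducible across runs, which matters when models are later ranked on the held-out 25%.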

Model building

In total, 510 models were fitted using AutoML in H2O. Of these, 222 were fitted using all species (workflow/training/model_partition 1.R), 134 using only marine species, and 154 using only non-marine species (workflow/training/model_partition 2.R). All models were calibrated using 10-fold cross-validation and ranked by AUC on the set-aside testing data (25%), e.g., for partition 1:

| model_id | auc | logloss | aucpr | mean_per_class_error | rmse | mse |
|---|---|---|---|---|---|---|
| StackedEnsemble_AllModels_3_AutoML_1 | 0.912 | 0.314 | 0.795 | 0.174 | 0.311 | 0.097 |
| StackedEnsemble_AllModels_6_AutoML_1 | 0.912 | 0.315 | 0.795 | 0.175 | 0.311 | 0.097 |
| StackedEnsemble_AllModels_4_AutoML_1 | 0.912 | 0.315 | 0.795 | 0.175 | 0.311 | 0.097 |
| StackedEnsemble_AllModels_5_AutoML_1 | 0.910 | 0.318 | 0.791 | 0.176 | 0.313 | 0.098 |
| StackedEnsemble_BestOfFamily_4_AutoML_1 | 0.909 | 0.318 | 0.793 | 0.184 | 0.313 | 0.098 |
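To make the ranking step concrete, the sketch below computes AUC on held-out labels and sorts models by it. It is a Python illustration with made-up labels, scores, and model names (`model_a`, `model_b`), not the H2O leaderboard code. AUC is computed here as the probability that a randomly chosen threatened species receives a higher score than a randomly chosen non-threatened one.

```python
def auc(labels, scores):
    """AUC via pairwise comparison of positives and negatives (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical test labels (1 = threatened) and predicted probabilities.
y_test = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = {
    "model_a": [0.9, 0.2, 0.8, 0.6, 0.4, 0.1, 0.7, 0.3],
    "model_b": [0.6, 0.5, 0.9, 0.3, 0.4, 0.2, 0.7, 0.8],
}

# Rank models from highest to lowest AUC on the held-out data.
leaderboard = sorted(
    ((auc(y_test, p), name) for name, p in predictions.items()),
    reverse=True,
)
for score, name in leaderboard:
    print(f"{name}\t{score:.4f}")
```

The pairwise form is quadratic in the number of test rows, which is fine for a sketch; library implementations typically use a rank-based equivalent.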

Model evaluation

Performance metrics were calculated on the testing data (workflow/evaluation/model_performance.R) and on reclassified Data Deficient species (workflow/evaluation/dd_performance.R). Permutation variable importance was calculated as the loss in performance after a feature was permuted, relative to performance on the unpermuted data (workflow/evaluation/variable_importance.R).
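The idea behind permutation variable importance can be sketched as follows. This is a Python illustration under stated assumptions: the toy model, the accuracy metric, and the `permutation_importance` helper are hypothetical and do not reproduce variable_importance.R, which may use a different metric and repeated permutations.

```python
import random

def accuracy(model, X, y):
    """Fraction of rows for which the model's prediction matches the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(model, X, y, feature, metric, seed=0):
    """Performance loss after shuffling one feature column across rows."""
    baseline = metric(model, X, y)
    column = [row[feature] for row in X]
    random.Random(seed).shuffle(column)
    X_perm = [row[:feature] + [v] + row[feature + 1:] for row, v in zip(X, column)]
    return baseline - metric(model, X_perm, y)

# Toy data: the label depends only on feature 0; feature 1 is noise.
X = [[i / 20, (i * 7 % 20) / 20] for i in range(20)]
y = [1 if row[0] > 0.5 else 0 for row in X]

# Toy model that uses feature 0 only (stands in for a fitted classifier).
def toy_model(row):
    return 1 if row[0] > 0.5 else 0

importance = {
    f: permutation_importance(toy_model, X, y, f, accuracy) for f in (0, 1)
}
print(importance)
```

A feature the model ignores yields an importance of zero, because shuffling it cannot change any prediction; a feature the model relies on yields a positive loss.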

[Figure: Permutation variable importance]

Predictions

The generated predictions for Data Deficient species are stored in dd_predictions.csv and show the probability of being threatened by extinction for each species:

| Species | Last Assessed | Taxonomic class | Red List Category | Probability of being threatened |
|---|---|---|---|---|
| Chirostoma grandocule | 2018 | Actinopterygii | Data Deficient | 95.8% |
| Sarcohyla miahuatlanensis | 2019 | Amphibia | Data Deficient | 95.8% |
| Crossodactylus dantei | 2008 | Amphibia | Data Deficient | 95.4% |
| Nyctibatrachus sholai | 2008 | Amphibia | Data Deficient | 95.2% |
| Colostethus alacris | 2016 | Amphibia | Data Deficient | 95.2% |
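A table in this shape could be read and summarised roughly as below. This Python sketch uses the five rows shown above as inline data; the column names are hypothetical, and the actual headers and value format in dd_predictions.csv may differ.

```python
import csv
import io

# Inline sample using the rows shown above; header names are assumptions.
sample = """species,last_assessed,taxonomic_class,category,probability_threatened
Chirostoma grandocule,2018,Actinopterygii,Data Deficient,95.8%
Sarcohyla miahuatlanensis,2019,Amphibia,Data Deficient,95.8%
Crossodactylus dantei,2008,Amphibia,Data Deficient,95.4%
Nyctibatrachus sholai,2008,Amphibia,Data Deficient,95.2%
Colostethus alacris,2016,Amphibia,Data Deficient,95.2%
"""

rows = list(csv.DictReader(io.StringIO(sample)))

# Convert percentage strings such as "95.8%" to probabilities in [0, 1].
for row in rows:
    row["probability_threatened"] = float(row["probability_threatened"].rstrip("%")) / 100

# Count species per taxonomic class.
by_class = {}
for row in rows:
    by_class[row["taxonomic_class"]] = by_class.get(row["taxonomic_class"], 0) + 1

print(by_class)
```

To run this against the real file, replace `io.StringIO(sample)` with an opened file handle and adjust the column names to match its header.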