Hello -- while using pow_lookup() and get_pow() I noticed that the taxonomy returned in my outputs is incorrect.
For example:
All 3500 species in my list returned "Equisetopsida" for "clazz".
I believe Magnoliidae is also incorrectly designated as the subclass for a number of species.
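A tally of the returned class values makes this easy to see. The following is only a minimal sketch: `example_output` is a hypothetical three-row stand-in with illustrative values, not the real output of the script below.

```r
# Hypothetical stand-in for the final output table; values are illustrative only.
example_output <- data.frame(
  family = c("Poaceae", "Rosaceae", "Pinaceae"),
  class = c("Equisetopsida", "Equisetopsida", "Equisetopsida"),
  stringsAsFactors = FALSE
)

# Count how often each class value occurs, including missing values.
class_counts <- table(example_output$class, useNA = "ifany")
print(class_counts)

# A single distinct value across unrelated families is the symptom reported here.
n_distinct_classes <- length(unique(example_output$class))
```

Running the same two lines against the `class` column of the real output would show whether every row carries the same value.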
Here is my input data:
finalfinal_checklist_2024March03.xlsx
Here is my script:
# Read in the data frame
library(openxlsx) # assumed source of read.xlsx()
library(taxize)   # provides get_pow() and pow_lookup()
plant_data <- read.xlsx("~/Desktop/update_flora_final/output/2024March03/finalfinal_checklist_2024March03.xlsx")
# Function to save intermediate results
save_checkpoint <- function(obj, filename) {
  saveRDS(obj, file = filename)
}
# Function to load intermediate results (returns NULL if no checkpoint exists)
load_checkpoint <- function(filename) {
  if (file.exists(filename)) {
    return(readRDS(filename))
  } else {
    return(NULL)
  }
}
##########
# Load the previous checkpoint if it exists
powoID <- load_checkpoint("powoID_checkpoint.rds")
# Initialize powoID if it is NULL (no checkpoint found)
if (is.null(powoID)) {
  powoID <- list()
}
# Determine the starting point
start_idx <- length(powoID) + 1
# Rows still to process; empty once every row is done
# (start_idx:n would count backwards if start_idx > n)
n_rows <- length(plant_data$family)
remaining <- which(seq_len(n_rows) >= start_idx)
# Iterate through the plant_data$family vector
for (i in remaining) {
  family_name <- plant_data$family[i]
  # Retrieve POW ID for the current family name
  powoID[[i]] <- get_pow(sci_com = family_name)
  # Save the intermediate results after each iteration
  save_checkpoint(powoID, "powoID_checkpoint.rds")
}
# With this, I can hit stop at any time and the results are saved where I
# stopped. If I go back to row 43 (load_checkpoint), it picks up where it left off.
# Optionally, combine the results into a single data frame if needed
powoID_df <- do.call(rbind, powoID)
taxonomy <- vector("list", length(powoID)) # Create empty list to store lookup results
for (i in seq_along(powoID)) { # start for loop
  taxonomy[i] <- pow_lookup(powoID[i]) # Call pow_lookup for each POWO ID and store the result
}
# Extract data from each taxonomy sub-list
extracted_data <- lapply(taxonomy, function(x) {
  c(family = x$family, order = x$order, class = x$clazz, subclass = x$subclass, phylum = x$phylum, taxonomicStatus = x$taxonomicStatus)
})
# Check that all sub-lists have the same length (optional)
if (length(unique(lengths(extracted_data))) > 1) {
  warning("Sub-lists in 'extracted_data' have different lengths. Extraction might be incomplete.")
}
# Append extracted data to the original data frame (assuming row order matches)
data_output <- cbind(plant_data, do.call(rbind, extracted_data))
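One thing to watch in the extraction step: `c()` silently drops `NULL` elements, so a record that is missing any field yields a shorter vector, and `rbind()` then recycles values into the wrong columns. Below is a minimal sketch of a NULL-safe alternative, assuming each record is a flat named list. `pick_field` and `extract_record` are hypothetical helper names, and the example records are illustrative, not real POWO output.

```r
# Fields to pull from each record, in output column order.
fields <- c("family", "order", "clazz", "subclass", "phylum", "taxonomicStatus")

# Return one field as a length-one character value, or NA if it is absent.
pick_field <- function(record, name) {
  value <- record[[name]]
  if (is.null(value) || length(value) == 0) {
    NA_character_
  } else {
    as.character(value[[1]])
  }
}

# Build a fixed-length named character vector for one record.
extract_record <- function(record) {
  vapply(fields, function(name) pick_field(record, name), character(1))
}

# Hypothetical records; the second one has no "subclass" entry.
example_taxonomy <- list(
  list(family = "Poaceae", order = "Poales", clazz = "Equisetopsida",
       subclass = "Magnoliidae", phylum = "PhylumA", taxonomicStatus = "Accepted"),
  list(family = "Pinaceae", order = "Pinales", clazz = "Equisetopsida",
       phylum = "PhylumA", taxonomicStatus = "Accepted")
)

# Every row now has the same six columns, with NA where a field was missing.
extracted <- do.call(rbind, lapply(example_taxonomy, extract_record))
print(extracted)
```

Because every vector has the same length and names, the length check above becomes redundant and the columns stay aligned when bound to the original data frame.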