incorrect taxonomy from "POW" functions #932

msedaghatpour · 2024-05-23T20:01:49Z

Hello -- while using pow_lookup() and get_pow() I noticed that the taxonomy in my outputs is incorrect.
For example:

My full list of species (3500) all returned "clazz" as "Equisetopsida"

I believe Magnoliidae is also incorrectly designated as subclass for a number of species.

Here is my input data:
finalfinal_checklist_2024March03.xlsx

Here is my script:

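# Read in the data frame

library(openxlsx) # assumed source of read.xlsx()
library(taxize)   # provides get_pow() and pow_lookup()
plant_data <- read.xlsx("~/Desktop/update_flora_final/output/2024March03/finalfinal_checklist_2024March03.xlsx")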

# Function to save intermediate results

save_checkpoint <- function(obj, filename) {
  saveRDS(obj, file = filename)
}

# Function to load intermediate results

load_checkpoint <- function(filename) {
  if (file.exists(filename)) {
    return(readRDS(filename))
  } else {
    return(NULL)
  }
}
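A minimal sketch of how these two helpers behave, using a temporary file instead of the real checkpoint path. The temporary file and the toy list of IDs are assumptions for illustration only.

```r
# Same helpers as in the script, repeated so the sketch is self-contained
save_checkpoint <- function(obj, filename) {
  saveRDS(obj, file = filename)
}

load_checkpoint <- function(filename) {
  if (file.exists(filename)) readRDS(filename) else NULL
}

# Hypothetical checkpoint path; tempfile() only generates a name, so no file exists yet
ckpt <- tempfile(fileext = ".rds")
first_try <- load_checkpoint(ckpt)    # NULL, because no checkpoint was found

# Placeholder IDs, not real POWO identifiers
save_checkpoint(list("id-1", "id-2"), ckpt)
second_try <- load_checkpoint(ckpt)   # the saved two-element list
```

Returning NULL rather than raising an error is what lets the script fall back to an empty list on the first run.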
##########

# Load the previous checkpoint if it exists

powoID <- load_checkpoint("powoID_checkpoint.rds")

# Initialize powoID if it is NULL (no checkpoint found)

if (is.null(powoID)) {
  powoID <- list()
}

# Determine the starting point

start_idx <- length(powoID) + 1

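# Iterate through the plant_data$family vector

for (i in seq_along(plant_data$family)) {
  # Skip rows already covered by the checkpoint; unlike start_idx:length(...),
  # this does not count backwards once every row is done
  if (i < start_idx) next

  family_name <- plant_data$family[i]

  # Retrieve POW ID for the current family name

  powoID[[i]] <- get_pow(sci_com = family_name)

  # Save the intermediate results after each iteration

  save_checkpoint(powoID, "powoID_checkpoint.rds")
}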

# With this I can hit stop at any time and the results are saved where I stopped;
# going back to row 43 (load_checkpoint) picks up where it left off
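The stop-and-resume behaviour described here can be sketched offline. This is a sketch under stated assumptions, not the script itself: `fake_lookup()` is a hypothetical stand-in for get_pow(), `run_lookup()` and its `stop_after` argument are made up to simulate pressing stop, and the checkpoint lives in a temporary file.

```r
# Hypothetical stand-in for get_pow(); it only tags the name so the sketch runs offline
fake_lookup <- function(name) paste0("id-for-", name)

# Runs the loop over `names`, resuming from whatever is already in the checkpoint.
# `stop_after` simulates the user pressing stop partway through.
run_lookup <- function(names, ckpt, stop_after = length(names)) {
  ids <- if (file.exists(ckpt)) readRDS(ckpt) else list()
  start_idx <- length(ids) + 1
  for (i in seq_along(names)) {
    if (i < start_idx) next      # already done in an earlier run
    if (i > stop_after) break    # simulated interruption
    ids[[i]] <- fake_lookup(names[i])
    saveRDS(ids, ckpt)           # checkpoint after every iteration
  }
  ids
}

families <- c("Poaceae", "Fabaceae", "Rosaceae", "Asteraceae")
ckpt <- tempfile(fileext = ".rds")

partial <- run_lookup(families, ckpt, stop_after = 2)  # interrupted after two rows
resumed <- run_lookup(families, ckpt)                  # picks up at row 3
```

Because the starting index is derived from the length of the saved list, the approach relies on every completed row having written exactly one element.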

# Optionally, combine the results into a single data frame if needed

powoID_df <- do.call(rbind, powoID)

taxonomy <- vector("list", length(powoID)) # Create empty list to store lookup results
for (i in seq_along(powoID)) { # start for loop
  taxonomy[i] <- pow_lookup(powoID[i]) # Call pow_lookup for each PoWO ID and store the result
}

# Extract data from each taxonomy sub-list

extracted_data <- lapply(taxonomy, function(x) {
  c(family = x$family, order = x$order, class = x$clazz, subclass = x$subclass, phylum = x$phylum, taxonomicStatus = x$taxonomicStatus)
})
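One thing to watch in this extraction step: `c()` silently drops any field that is `NULL`, so a record without a `subclass` yields a shorter vector, and `rbind()` then recycles values into the wrong columns. Below is a sketch of a NULL-safe version. The records are made up for illustration and are not real POWO output, and `extract_fields()` is a hypothetical helper; only the field names come from the script above.

```r
fields <- c("family", "order", "clazz", "subclass", "phylum", "taxonomicStatus")

# Made-up records for illustration only; the second one has no subclass
records <- list(
  list(family = "A", order = "B", clazz = "C", subclass = "D",
       phylum = "E", taxonomicStatus = "Accepted"),
  list(family = "F", order = "G", clazz = "H",
       phylum = "I", taxonomicStatus = "Accepted")
)

# Return one value per requested field, with NA where the field is absent
extract_fields <- function(x, fields) {
  vapply(fields, function(f) {
    value <- x[[f]]
    if (is.null(value) || length(value) != 1) NA_character_ else as.character(value)
  }, character(1))
}

# Every row now has the same six named columns
extracted <- do.call(rbind, lapply(records, extract_fields, fields = fields))
```

With fixed-length rows, a missing rank shows up as `NA` in its own column instead of shifting the remaining values.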

# Check if all sub-lists have the same structure (optional)
# (the original compared lengths(extracted_data) with itself, so it could never fail)

if (length(unique(lengths(extracted_data))) > 1) {
  warning("Sub-lists in 'output' might have different structures. Extraction might be incomplete.")
}

# Append extracted data to the original data frame (assuming rownames match)

data_output <- cbind(plant_data, do.call(rbind, extracted_data))
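Since `cbind()` pairs rows purely by position, the "rownames match" assumption can be checked before binding. A small sketch on toy data follows; `bind_by_position()` is a hypothetical helper and the values are placeholders, not real taxonomy.

```r
# Toy stand-ins for plant_data and the extracted taxonomy (hypothetical values)
plant_toy <- data.frame(species = c("sp1", "sp2", "sp3"), stringsAsFactors = FALSE)
taxo_toy <- rbind(
  c(family = "A", order = "B"),
  c(family = "C", order = "D"),
  c(family = "E", order = "F")
)

# Refuse to bind when the two tables do not have the same number of rows
bind_by_position <- function(df, extra) {
  if (nrow(df) != nrow(extra)) {
    stop("Row counts differ: ", nrow(df), " vs ", nrow(extra))
  }
  cbind(df, extra)
}

combined <- bind_by_position(plant_toy, taxo_toy)
```

A mismatch is most likely when the lookup loop was stopped early, because the checkpointed list is then shorter than the checklist.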
