incorrect taxonomy from "POW" functions #932

Open
msedaghatpour opened this issue May 23, 2024 · 0 comments
Hello, while using pow_lookup() and get_pow() I noticed that the taxonomy returned in my output is incorrect.
For example:

Every species in my full list (about 3,500) returned "clazz" as "Equisetopsida".
(Screenshot attached: Screen Shot 2024-05-23 at 12 48 03 PM)

I believe Magnoliidae is also incorrectly assigned as the subclass for a number of species.
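
A minimal sketch of how the spread of returned ranks could be tabulated to confirm this. The records below are made-up placeholders, not my real results, and `get_field()` is a hypothetical helper; only the field names `clazz` and `subclass` come from the lookup output used in my script.

```r
# Made-up records standing in for a list of lookup results (placeholder values).
mock_taxonomy <- list(
  list(family = "FamilyA", clazz = "Equisetopsida", subclass = "Magnoliidae"),
  list(family = "FamilyB", clazz = "Equisetopsida", subclass = "SubclassX"),
  list(family = "FamilyC", clazz = "Equisetopsida", subclass = "Magnoliidae")
)

# Hypothetical helper: pull one field from every record, NA where it is absent.
get_field <- function(records, field) {
  vapply(records, function(x) {
    value <- x[[field]]
    if (is.null(value)) NA_character_ else as.character(value)[1]
  }, character(1))
}

# Count how often each value occurs; a single level means every row got the same rank.
class_counts <- table(get_field(mock_taxonomy, "clazz"), useNA = "ifany")
subclass_counts <- table(get_field(mock_taxonomy, "subclass"), useNA = "ifany")

print(class_counts)
print(subclass_counts)
```

If `class_counts` has only one level for a checklist spanning many plant groups, that is the behaviour I am reporting.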

Here is my input data:
finalfinal_checklist_2024March03.xlsx

Here is my script:

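# Read in the data frame
library(openxlsx) # assumed source of read.xlsx()
library(taxize)   # provides get_pow() and pow_lookup()

plant_data <- read.xlsx("~/Desktop/update_flora_final/output/2024March03/finalfinal_checklist_2024March03.xlsx")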

# Function to save intermediate results
save_checkpoint <- function(obj, filename) {
  saveRDS(obj, file = filename)
}

# Function to load intermediate results (returns NULL if no checkpoint exists)
load_checkpoint <- function(filename) {
  if (file.exists(filename)) {
    return(readRDS(filename))
  } else {
    return(NULL)
  }
}
##########

# Load the previous checkpoint if it exists
powoID <- load_checkpoint("powoID_checkpoint.rds")

# Initialize powoID if it is NULL (no checkpoint found)
if (is.null(powoID)) {
  powoID <- list()
}

# Determine the starting point
start_idx <- length(powoID) + 1
n_rows <- length(plant_data$family)

# Iterate through the plant_data$family vector.
# The guard keeps start_idx:n_rows from counting backwards once every row is done.
if (start_idx <= n_rows) {
  for (i in start_idx:n_rows) {
    family_name <- plant_data$family[i]

    # Retrieve POW ID for the current family name
    powoID[[i]] <- get_pow(sci_com = family_name)

    # Save the intermediate results after each iteration
    save_checkpoint(powoID, "powoID_checkpoint.rds")
  }
}

# With this I can hit stop at any time and the results are saved up to where I stopped.
# When I go back to row 43 (load_checkpoint) and rerun, it picks up where it left off.

# Optionally, combine the results into a single object if needed
powoID_df <- do.call(rbind, powoID)

# Create an empty list to store the lookup result for each PoWO ID
taxonomy <- vector("list", length(powoID))
for (i in seq_along(powoID)) {
  # Call pow_lookup for each PoWO ID.
  # Note: `[i] <-` keeps only the first element of whatever list pow_lookup() returns.
  taxonomy[i] <- pow_lookup(powoID[i])
}

# Extract data from each taxonomy sub-list
extracted_data <- lapply(taxonomy, function(x) {
  c(
    family = x$family,
    order = x$order,
    class = x$clazz,
    subclass = x$subclass,
    phylum = x$phylum,
    taxonomicStatus = x$taxonomicStatus
  )
})

# Check if all sub-lists have the same structure (optional).
# c() drops NULL fields, so records with missing ranks come out shorter.
if (length(unique(lengths(extracted_data))) > 1) {
  warning("Sub-lists in 'extracted_data' have different lengths. Extraction might be incomplete.")
}

# Append extracted data to the original data frame (assuming row order matches)
data_output <- cbind(plant_data, do.call(rbind, extracted_data))
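
One caveat about the extraction step above: `c()` silently drops `NULL` fields, so a record that lacks a rank produces a shorter vector, and `rbind()` then recycles values into the wrong columns. Below is a hedged sketch of an extraction that pads missing fields with `NA` instead. The records are made-up placeholders and `extract_fields()` is a hypothetical helper, not part of taxize.

```r
# Made-up records standing in for elements of `taxonomy`;
# the second one deliberately lacks `subclass`.
mock_taxonomy <- list(
  list(family = "FamilyA", order = "OrderA", clazz = "ClassA", subclass = "SubclassA"),
  list(family = "FamilyB", order = "OrderB", clazz = "ClassA")
)

# Fields taken from the script above
fields <- c("family", "order", "clazz", "subclass", "phylum", "taxonomicStatus")

# Hypothetical helper: return a one-row data frame with every requested field,
# using NA for fields that are absent from the record.
extract_fields <- function(x, fields) {
  values <- lapply(fields, function(f) {
    v <- x[[f]]
    if (is.null(v)) NA_character_ else as.character(v)[1]
  })
  names(values) <- fields
  as.data.frame(values, stringsAsFactors = FALSE)
}

# Every row now has the same columns, so rbind() cannot misalign values.
extracted_df <- do.call(rbind, lapply(mock_taxonomy, extract_fields, fields = fields))
print(extracted_df)
```

This does not change which taxonomy POWO returns; it only rules out column misalignment as a cause of the wrong ranks in `data_output`.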
