addDCElement is too slow #18

mpadge · 2022-05-24T16:07:25Z

I've got code that loops over a list of metadata terms which can't be known in advance and does something like this, acting on a DCEntry object called dcmi:

    for (i in seq_len (nrow (terms))) {
        dc_fn <- # code to match `addDC...` fn from `atom4R`
        value <- data [[terms$value [i]]]
        do.call (dcmi [[dc_fn]], list (value))
    }

The problem is this is the slowest bit of my code by far, because all of the addDCElement() calls take so long. This in turn seems to be because of these lines:

atom4R/R/AtomAbstractObject.R

Lines 902 to 924 in 6a76b07

    
    list_of_classes <- list_of_classes[sapply(list_of_classes, function(x){
      clazz <- try(eval(parse(text=x)),silent=TRUE)
      if(is(clazz, "try-error")) clazz <- try(eval(parse(text=paste0("atomR::",x))),silent=TRUE)
      r6Predicate <- class(clazz)[1]=="R6ClassGenerator"
      if(!r6Predicate) return(FALSE)
      atomObjPredicate <- FALSE
      superclazz <- clazz
      while(!atomObjPredicate && !is.null(superclazz)){
        clazz_fields <- names(superclazz)
        if(!is.null(clazz_fields)) if(length(clazz_fields)>0){
          if("get_inherit" %in% clazz_fields){
            superclazz <- superclazz$get_inherit()
            atom4RPredicate <- FALSE
            if("parent_env" %in% clazz_fields) atom4RPredicate <- environmentName(superclazz$parent_env)=="atom4R"
            atomObjPredicate <- superclazz$classname == classname && atom4RPredicate
          }else{
            break
          }
        }
      }
      return(atomObjPredicate)
    })]

For a single term, that can end up looping around 300 times, each time doing a full eval(parse()) call. Would it not be possible to restructure that entire call as its own reference function that did not accept classname as a dynamic parameter, but instead did a single trawl over everything and dumped c(classname, atom4RPredicate) to create a single reference table? That call could easily be memoised for instant recall, and then the list_of_classes for a single classname could also be instantly extracted. Given my estimate of 300 or so loops of that function per term, times an unknown number of terms going into code like the above, it should be possible to achieve a speed-up here on the order of 1000x.
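A minimal sketch of the idea, in base R only: build the full class table once, cache it, and answer each classname query by lookup. The `parents` vector is a made-up stand-in for the hierarchy that atom4R would derive from its R6 generators, and `ancestors_of`, `class_table` and `subclasses_of` are hypothetical names, not atom4R functions. The memoise package could replace the hand-rolled cache.

```r
# Toy stand-in for the class hierarchy: each class name maps to its parent
# (NA for a root). These names are illustrative assumptions only.
parents <- c (
    AtomAbstractObject = NA,
    DCElement = "AtomAbstractObject",
    DCTitle = "DCElement",
    DCCreator = "DCElement",
    Unrelated = NA
)

# Walk up the parent chain and return all ancestors of one class.
ancestors_of <- function (cl, parents) {
    out <- character (0)
    p <- parents [[cl]]
    while (!is.na (p)) {
        out <- c (out, p)
        p <- parents [[p]]
    }
    out
}

# Cache environment, so the full trawl runs at most once per session.
cache <- new.env (parent = emptyenv ())

# Build (or recall) the table of ancestors for every known class.
class_table <- function () {
    if (is.null (cache$tab)) {
        n <- if (is.null (cache$n_builds)) 0L else cache$n_builds
        cache$n_builds <- n + 1L # counts how often the trawl actually ran
        cls <- stats::setNames (names (parents), names (parents))
        cache$tab <- lapply (cls, ancestors_of, parents = parents)
    }
    cache$tab
}

# Lookup only: which classes inherit from `classname`?
subclasses_of <- function (classname) {
    tab <- class_table ()
    is_sub <- vapply (tab, function (a) classname %in% a, logical (1))
    names (tab) [is_sub]
}

subclasses_of ("DCElement")
subclasses_of ("AtomAbstractObject")
```

Only the first call pays for the trawl; every later call, for any classname, is a filter over the cached table.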

eblondel · 2022-05-25T11:44:36Z

Yes, this is a well-known issue that I have to look into very soon. I have already improved this in similar package methods (e.g. geometa) and I have to see if this can be applied here. I didn't know about 'memoise' and I will look into it, thanks for that. Something else you should know is that the 'add' methods check whether a value is already present, and return FALSE if so; since some elements are complex objects (and not simple string values), the comparison is done on the output XML.
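To illustrate why that duplicate check adds cost, here is a toy sketch in base R. It is not the atom4R implementation: `serialise` and `add_element` are hypothetical names, and elements are plain lists rather than R6 objects. It only mirrors the behaviour described above, where an adder serialises the existing elements and refuses to add an equal one.

```r
# Hypothetical stand-in for XML serialisation of one element.
serialise <- function (el) {
    paste0 ("<", el$name, ">", el$value, "</", el$name, ">")
}

# Toy adder: compares serialised forms, so each call re-serialises
# every element already stored in `entry`.
add_element <- function (entry, el) {
    existing <- vapply (entry, serialise, character (1))
    if (serialise (el) %in% existing) {
        return (list (entry = entry, added = FALSE))
    }
    list (entry = c (entry, list (el)), added = TRUE)
}

entry <- list ()
res <- add_element (entry, list (name = "contributor", value = "Bob"))
res <- add_element (res$entry, list (name = "contributor", value = "Bob"))
res$added # FALSE: the second "Bob" is rejected as a duplicate
```

In this sketch the work per add grows with the number of elements already present, which is one reason setting a field in a single step can be cheaper.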

An immediate, faster approach, which is actually what the decode function uses, is to skip the util adders (which makes it much less convenient) and directly set the fields with the appropriate elements. Example:

    dc <- DCEntry$new()
    dc[["contributor"]] <- lapply(c("Bob", "Peter", "Bunny"), function(x){DCContributor$new(value = x)})

On this basis, I will support new setters as discussed in #14

mpadge · 2022-05-25T12:17:40Z

Great stuff - thanks!

eblondel · 2022-05-30T07:39:25Z

@mpadge I've slightly improved the code, first by fetching the DC RDF vocabulary by default (previously done at each DC element creation), and second by simplifying each DC element class to align with the R6 class-writing convention used in the package, as far as native DC elements are concerned (for which we don't really need to look through the whole class inheritance). Together with that, I've provided setters for lists of elements. See #14

eblondel mentioned this issue May 26, 2022

Fetch vocabulary data by default #19

Closed

eblondel self-assigned this May 30, 2022

eblondel added the enhancement and user support labels May 30, 2022

eblondel added this to the 0.3 milestone May 30, 2022

eblondel added a commit that referenced this issue May 30, 2022

fix #14 #18

2eddbd1

eblondel closed this as completed Jun 5, 2022
