I've got code that loops over a list of metadata terms which can't be known in advance and does something like this, acting on a DCEntry object called dcmi:
for (i in seq_len(nrow(terms))) {
    dc_fn <- ...  # placeholder: code to match the `addDC...` fn from `atom4R`
    value <- data[[terms$value[i]]]
    do.call(dcmi[[dc_fn]], list(value))
}
The problem is that this is by far the slowest part of my code, because the addDCElement() calls take so long. This in turn seems to be caused by lines 902 to 924 of atom4R/R/AtomAbstractObject.R (at commit 6a76b07).
For a single term, that code can end up looping around 300 times, each time doing a full eval(parse()) call. Would it not be possible to restructure that entire call as its own reference function that does not accept classname as a dynamic parameter, but instead does a single trawl over everything and dumps c(classname, atom4RPredicate) into a single reference table? That call could easily be memoised for instant recall, and the list_of_classes for a single classname could then also be extracted instantly. Given my estimate of around 300 loops of that function per term, multiplied by an unknown number of terms going into code like the above, a speed-up on the order of 1000x should be achievable here.
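A minimal base-R sketch of the proposed reference table, under stated assumptions: `scan_all_classes()` is a hypothetical stand-in for the expensive trawl in `AtomAbstractObject.R` (whose code is not reproduced here), and the class names and predicates in its table are illustrative only. The table is built on first use and cached in an environment, and every later lookup by classname is a plain subset. The `memoise` package's `memoise()` could replace the hand-rolled cache; an environment is used here only to keep the example free of dependencies.

```r
# Hypothetical stand-in for the expensive trawl over every class.
# In atom4R this is where the repeated eval(parse()) work would happen;
# here it returns a small illustrative table and counts its own calls.
scan_calls <- 0L
scan_all_classes <- function() {
  scan_calls <<- scan_calls + 1L
  data.frame(
    classname = c("DCTitle", "DCCreator", "DCSubject"),
    atom4RPredicate = c("title", "creator", "subject"),
    stringsAsFactors = FALSE
  )
}

# Environment used as a cache, so the table is built at most once.
.class_cache <- new.env(parent = emptyenv())

get_class_table <- function() {
  if (is.null(.class_cache$table)) {
    .class_cache$table <- scan_all_classes()
  }
  .class_cache$table
}

# Per-classname lookup: a cheap subset of the cached table.
classes_for <- function(classname) {
  tbl <- get_class_table()
  tbl[tbl$classname == classname, , drop = FALSE]
}

# Usage: 300 lookups trigger a single scan.
for (cls in rep(c("DCTitle", "DCSubject", "DCCreator"), 100)) {
  invisible(classes_for(cls))
}
print(scan_calls)  # [1] 1
```

Because the scan runs once per session rather than once per lookup, its cost is paid a single time no matter how many terms are processed.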
Yes, this is a well-known issue that I have to look into very soon. I had already improved this in similar methods in other packages (e.g. geometa), and I have to see whether the same approach can be applied here. I didn't know about 'memoise'; I will look into it, thanks. Something else you should know is that the 'add' methods check whether a value is already present and return FALSE if it is. Since some elements consist of complex objects (not simple string values), this comparison is done on their XML output.
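To see why that duplicate check adds up, here is a toy sketch (not atom4R's code) of an adder that behaves as described: it serializes the new value, compares it against the serialized form of every value already stored, and returns FALSE on a match. `to_xml()`, `new_entry()` and `add_element()` are hypothetical helpers; real atom4R elements are R6 objects with their own XML encoding.

```r
# Hypothetical serializer standing in for the package's XML encoding.
to_xml <- function(field, value) {
  sprintf("<dc:%s>%s</dc:%s>", field, value, field)
}

# Toy entry: an environment holding a named list of field values.
new_entry <- function() {
  e <- new.env(parent = emptyenv())
  e$fields <- list()
  e
}

# Adder in the style described: it refuses a value whose XML output
# matches a value already stored, returning FALSE in that case.
serializations <- 0L
add_element <- function(entry, field, value) {
  new_xml <- to_xml(field, value)
  for (v in entry$fields[[field]]) {
    serializations <<- serializations + 1L
    if (identical(to_xml(field, v), new_xml)) return(FALSE)
  }
  entry$fields[[field]] <- c(entry$fields[[field]], list(value))
  TRUE
}

entry <- new_entry()
add_element(entry, "subject", "ecology")  # TRUE
add_element(entry, "subject", "ecology")  # FALSE: identical XML output
for (k in 1:100) add_element(entry, "subject", paste0("term", k))
serializations  # 5051
```

In this sketch, storing n distinct values in one field costs n(n - 1)/2 serializations, so the check grows quadratically with the number of stored values.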
An immediate and faster approach, which is actually the one implemented in the decode function, is to skip the utility adders (at the cost of much less convenience) and set the fields directly with the appropriate elements.
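A toy contrast, again using a hypothetical `fields` list rather than atom4R's real DCEntry fields: setting a field in one assignment skips the per-value comparison entirely, so duplicate handling becomes the caller's job and can be done once up front.

```r
# Toy entry: an environment holding a named list of fields.
# (Hypothetical structure; atom4R's DCEntry has its own R6 fields.)
new_entry <- function() {
  e <- new.env(parent = emptyenv())
  e$fields <- list()
  e
}

entry <- new_entry()
subjects <- c("ecology", "hydrology", "ecology", "soils")

# Direct assignment bypasses the per-value duplicate check, so
# de-duplicate once up front. unique() is enough for plain strings;
# complex elements would need comparing on their serialized form.
entry$fields$subject <- as.list(unique(subjects))

length(entry$fields$subject)  # 3
```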
@mpadge I've slightly improved the code: first, by fetching the DC RDF vocabulary by default (previously this was done at each DC element creation); second, by simplifying each DC element class to align with the R6 class-writing convention used in the package, as far as native DC elements are concerned (for these we don't really need to look through the whole class inheritance). Along with that, I've provided setters for lists of elements. See #14.