-
Notifications
You must be signed in to change notification settings - Fork 81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace S3 dispatch by C implementation #400
Conversation
@hadley It would be great to get your opinion of whether this approach is fine for you. It kind of works against the dispatch system. But I'm also not quite sure whether the dispatch is really needed as there are only three classes (<xml_missing>, <xml_node> and <xml_nodeset>) and I don't think the user really needs to extend them. The performance gains are quite big (around a factor of 20), which would be nice for e.g. the {paws} package (a package to work with AWS). Overall, the code is pretty redundant so we could also use a macro or some other CPP techniques but I'm not so familiar with C/CPP to know about the potential downsides. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think switching from S3 dispatch to something custom at the C-level is fine, although as I noted I think providing a helper that makes it possible to use switch
will make the implementation cleaner (and a little easier to extend in the unlikely event that is necessary).
I'm not too worried about the duplication, although the idiosyncratic mix of C and C++ is going to make maintenance generally harder. Possibly might be better to use cpp11 and some of the vector helpers that it provides.
tests/testthat/test-xml_name.R
Outdated
@@ -1,3 +1,13 @@ | |||
test_that("xml_name() returns the name", { | |||
x <- sample_nodeset() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO it would be better to copy and paste the contents of sample_nodeset()
here; otherwise it's hard to verify that the test is correct.
xml_name.xml_nodeset()
in Cxml_name()
, xml_text()
and xml_type()
in C
xml_name()
, xml_text()
and xml_type()
in CXPtrNode node(node_sxp); | ||
|
||
std::string name = nodeName(node.checked_get(), nsMap); | ||
out = Rf_mkCharLenCE(name.c_str(), name.size(), CE_UTF8); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose this has always been the case, but if we have an R error anywhere in here, then it will skip over any C++ destructors (like for XPtrNode, if it has one, and std::string
) on the way out
I don't think there is a reason to try and improve this in this PR, but I will keep an eye out for potentially problematic cases as I continue my review
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we have an odd mix of C and C++ in this package which we should try and fix at some point.
src/xml2_node.cpp
Outdated
if (n == 0) { | ||
return Rf_ScalarInteger(0); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think everything is fine if you remove this. Generally i try and reserve early exits for cases where they are absolutely necessary (like if your algorithm requires >=2 elements or something, like most sorting algos)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wait, I see what you are doing here. This seems odd. If the nodeset is length 0, shouldn't the output be length 0? i.e. integer()
? In which case you could definitely just remove this then it would do the right thing.
Maybe this is some old behavior I don't know about
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also noticed this and found it quite weird which is why I opened an issue #404. I wanted to tackle that in a separate PR, forgot to mention this here.
This breaks fhircracker setClass(
Class = "fhir_bundle_xml",
contains = c("fhir_resource_xml", "fhir_bundle"),
slots = c(next_link = "fhir_url", self_link = "fhir_url"),
prototype = prototype(xml2::read_xml(x = "<Bundle></Bundle>"))
) |
I'm guessing this is because while Yeah — looks like it https://github.com/wch/r-source/blob/d6dad605b05810cb43f991d292169a2cd436a818/src/include/Rinlinedfuns.h#L774. |
Closes #386.
Created on 2023-08-23 with reprex v2.0.2
More benchmarks
Created on 2023-08-30 with reprex v2.0.2