-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Another take at multiweight tabbook #513
Conversation
Codecov Report
@@ Coverage Diff @@
## master #513 +/- ##
==========================================
+ Coverage 90.60% 90.64% +0.04%
==========================================
Files 126 126
Lines 7598 7665 +67
==========================================
+ Hits 6884 6948 +64
- Misses 714 717 +3
Continue to review full report at Codecov.
|
Downstream I have a functional branch based off of ed2e1ee. Had to update my fixtures, but otherwise looking good. |
Some 500's coming off of body$where in tabBookSingle with the example dataset. It looks like there is a problem with the manner in which jsonlite is preparing the body element. For example, I had to comment out the body$where part and also add the parameter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to pull this into crunchtabs with some modifications so that I could deliver to my users. I'm not sure how or if we should bother trying to bring this back into rcrunch
given the limited users (Innovations team).
Would be happy to revisit but if this is something you want to integrate please follow the functionality embedded in crunchtabs.
Another take at multiweight tabbooks.
I think the arguments and return value of
tabBook()
is compatible with your PR, though I have substantially changed (and renamed) the weight helper function.There is a continuum of differences from just stylistic preference to actual improvements. I find it hard to concretely define them, but I'll try to focus on some changes that I made deliberately that are closer to the "improvement" side.
ind
andindex
columns was a start, but there's still a keep column that doesn't mean that some rows are dropped, anorder
column that only matters if you know what the individual tabbooks look like, and aposition
column that seems like it might also mean the same thing asorder
, but doesn't. In the new implementation, the data.frame is two columnsweight
andalias
(and the order of the data.frame determines the order).This is important for two reasons, because it's clearer to whoever has to work on this part of the codebase next, and also because, as I was reminded today "there are downstream dependencies here" (even for features that haven't yet passed code review). By including such implementation details in the public interface, it locks us into them, making it harder to maintain the code and preserve backwards compatibility when things change. As an example, what if the multitable export server side API changes to allow you to specify multiple weights in a single export. Then the
order
column is totally meaningless, but to help my downstream dependencies, I would need to keep it.I also decided to be strict that there can only be these two columns, because this allows us to expand in the future. I kind of think there needs to be a "name" column that would allow you to set the name of the cubes (eg "q1 - likely voter", "q1 - registered voters"), but I'm not sure where that name can be stored in the tabbook result and it sounds like you plan to do that for yourself in crunchtabs, so there's no rush. If I allowed arbitrary columns, then using the name column like this would potentially be a breaking change.
All of the work about picking the weights is done in the
tabBookWeightSpec()
function now because it's much easier and quicker to test that function than thetabBook
one. This is why I'm able to achieve (almost) complete test coverage of the new code with just one new fixture. Not only is saving fixtures a pain, but each test oftabBookWeightSpec()
is like 10 times faster than the tabbook export, so I'm able to test more edge cases without slowing down testing (and thus development time).At your suggestion, I made the mock generator script totally reproducible so that it does all of that file moving and renaming that we were having to do manually before. I'm trying to document more things.