Another take at multiweight tabbook #513

gergness · 2020-10-08T20:03:41Z

Another take at multiweight tabbooks.

I think the arguments and return value of tabBook() are compatible with your PR, though I have substantially changed (and renamed) the weight helper function.

There is a continuum of differences from just stylistic preference to actual improvements. I find it hard to concretely define them, but I'll try to focus on some changes that I made deliberately that are closer to the "improvement" side.

If we are going to export the weight helper function (which I think is a good idea), then it needs to be understandable on its own. Renaming the ind and index columns was a start, but there's still a keep column that doesn't actually mean some rows are dropped, an order column that only matters if you know what the individual tabbooks look like, and a position column that looks like it might mean the same thing as order, but doesn't. In the new implementation, the data.frame has just two columns, weight and alias, and the row order of the data.frame determines the order.

This is important for two reasons: it's clearer to whoever has to work on this part of the codebase next, and, as I was reminded today, "there are downstream dependencies here" (even for features that haven't yet passed code review). Including such implementation details in the public interface locks us into them, making it harder to maintain the code and preserve backwards compatibility when things change. As an example, what if the server-side multitable export API changes to allow you to specify multiple weights in a single export? Then the order column would be totally meaningless, but to help my downstream dependencies, I would need to keep it.

I also decided to be strict that there can only be these two columns, because this allows us to expand in the future. I kind of think there needs to be a "name" column that would allow you to set the name of the cubes (e.g. "q1 - likely voter", "q1 - registered voters"), but I'm not sure where that name can be stored in the tabbook result, and it sounds like you plan to do that for yourself in crunchtabs, so there's no rush. If I allowed arbitrary columns, then later using a name column like this would potentially be a breaking change.
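To make the two-column shape and the strictness concrete, here is a minimal base R sketch. It is not the code in this PR: `validate_weight_spec` is a hypothetical name, the weight and alias values are made up, and the use of "" for unweighted is an assumption taken from the commit messages in this thread.

```r
# Hypothetical helper, not part of crunch: accepts a weight spec only if it
# has exactly the columns `weight` and `alias`, and nothing else.
validate_weight_spec <- function(spec) {
  stopifnot(is.data.frame(spec))
  expected <- c("weight", "alias")
  extra <- setdiff(names(spec), expected)
  absent <- setdiff(expected, names(spec))
  if (length(extra) > 0 || length(absent) > 0) {
    stop(
      "weight spec must have exactly the columns 'weight' and 'alias'",
      call. = FALSE
    )
  }
  # Normalise the column order; the row order is left untouched because it
  # is the only ordering information the spec carries.
  spec[, expected, drop = FALSE]
}

# Made-up example: q1 under two weights, q2 unweighted ("" is assumed to
# mean unweighted here).
spec <- data.frame(
  weight = c("weight_lv", "weight_rv", ""),
  alias = c("q1", "q1", "q2"),
  stringsAsFactors = FALSE
)
validated <- validate_weight_spec(spec)
```

Because any extra column is rejected today, adding an optional name column later would only make previously invalid input valid, which is the kind of change that should not break existing callers.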

All of the work of picking the weights is now done in the tabBookWeightSpec() function, because it's much easier and quicker to test that function than tabBook() itself. This is why I'm able to achieve (almost) complete test coverage of the new code with just one new fixture. Not only is saving fixtures a pain, but each test of tabBookWeightSpec() is about 10 times faster than a tabbook export, so I'm able to test more edge cases without slowing down testing (and thus development).

At your suggestion, I made the mock generator script totally reproducible, so it now does all of the file moving and renaming that we were having to do manually before. I'm trying to document more things.


codecov · 2020-10-09T01:02:25Z

Codecov Report

Merging #513 into master will increase coverage by 0.04%.
The diff coverage is 95.83%.

@@            Coverage Diff             @@
##           master     #513      +/-   ##
==========================================
+ Coverage   90.60%   90.64%   +0.04%     
==========================================
  Files         126      126              
  Lines        7598     7665      +67     
==========================================
+ Hits         6884     6948      +64     
- Misses        714      717       +3

| Impacted Files | Coverage Δ | |
|---|---|---|
| R/tab-book.R | `95.26% <95.83%> (+0.16%)` | ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Last update 6cfa146...e1235aa.

R/tab-book.R

dev-misc/capture-multi-weight-tabbook-mock.R

tests/testthat/test-multitables.R


1beb · 2020-10-19T17:31:17Z

Downstream I have a functional branch based off of ed2e1ee. Had to update my fixtures, but otherwise looking good.

1beb · 2020-11-27T04:00:59Z

I'm seeing some 500s coming from body$where in tabBookSingle with the example dataset. It looks like there is a problem with how jsonlite prepares the body element. To get it working, I had to comment out the body$where part and also pass null = "null", as in jsonlite::toJSON(body, null = "null"); otherwise the value was serialized as a list instead of a literal null.
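For anyone trying to reproduce the serialization difference, here is a minimal sketch using jsonlite directly. The body below is a made-up stand-in, not the actual request built in tabBookSingle; it only shows how the `null` argument of `toJSON()` changes the output for a NULL list element.

```r
library(jsonlite)

# Made-up stand-in for a request body: a list with one NULL element.
body <- list(where = NULL)

# With the default null = "list", the NULL element is written as an empty
# object.
as_list <- toJSON(body)

# With null = "null", the same element is written as a literal JSON null.
as_null <- toJSON(body, null = "null")

print(as_list)
print(as_null)
```

On the jsonlite versions I know, the first call prints `{"where":{}}` and the second prints `{"where":null}`, which matches the "list instead of a literal null" behaviour described above.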

1beb

I had to pull this into crunchtabs with some modifications so that I could deliver to my users. I'm not sure how, or whether, we should bother trying to bring this back into rcrunch given the limited number of users (Innovations team).

I would be happy to revisit, but if this is something you want to integrate, please follow the functionality embedded in crunchtabs.

gergness added 24 commits October 8, 2020 09:16

oops reoxygen master

6d00cb0

Start work on multi-weight tabbooks

0281105

separate tabbook tests from multitable tests

19eba4e

outdated info in comments

7fc93fa

oops

0a07fa8

test tabBookWeightSpec and some modifications

ff63e47

actually, it's a lot easier if we use "" to mean unweighted rather than NA

d34858c

split out single and multi weight tabbook and start on multiweight

a4a92be

add script to capture the mocks and move them to the right place

d667949

capture mocks

bf5ac63

some code cleanup found while writing tests

a6c7b26

add tests for the multiweight tabbook

2d04301

work on documentation for complex weights

47d8d50

can now save a complex weighted tabbook to .json file

66e13e0

oops errant character snuck in

b336506

can't use complex weights with excel

d3133ab

oops didn't mean to drop this line

c0cb15a

drop duplicates with a warning in weightSpec helper and fail on duplicates in tabbook

b66840a

whoa, wasn't thinking there

8624447

small typos

4661ded

Add tests for edge cases

2ec488f

oops

4dc5928

R CMD check fixes

a26469e

oops last typo and lintr

8227b2c

gergness requested review from 1beb and malecki and removed request for 1beb October 9, 2020 01:56

1beb reviewed Oct 9, 2020


gergness added 2 commits October 13, 2020 15:25

don't get names when recombining weights

bffb4a1

check weights from full variable catalog when working with complex weights

ed2e1ee

gergness added 6 commits October 19, 2020 12:42

oops special case for unweighted

13511a3

clean up code

21a59b4

be explicit about how tabbooks are ordered

34ef4ba

allow append_default_wt to be set in tabbook

8ea719f

oops not quite right for passing append_default_wt

8d8e181

explicitly allow data.frames

12da4af

1beb mentioned this pull request Oct 20, 2020

Updates to support multiweight in crunchtabs Crunch-io/crunchtabs#210

Merged

gergness added 2 commits October 20, 2020 11:06

make sure tabbook requests variables in ds order

cf999cc

make sure weight exists

e1235aa

1beb reviewed Feb 11, 2021


gergness closed this Feb 16, 2021
