Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Another take at multiweight tabbook #513

Closed
wants to merge 34 commits into from
Closed

Conversation

gergness
Copy link
Contributor

@gergness gergness commented Oct 8, 2020

Another take at multiweight tabbooks.

I think the arguments and return value of tabBook() is compatible with your PR, though I have substantially changed (and renamed) the weight helper function.

There is a continuum of differences from just stylistic preference to actual improvements. I find it hard to concretely define them, but I'll try to focus on some changes that I made deliberately that are closer to the "improvement" side.

  1. If we are going to export the weight helper function (which I think is a good idea), then it needs to be understandable on its own. Renaming the ind and index columns was a start, but there's still a keep column that doesn't mean that some rows are dropped, an order column that only matters if you know what the individual tabbooks look like, and a position column that seems like it might also mean the same thing as order, but doesn't. In the new implementation, the data.frame is two columns weight and alias (and the order of the data.frame determines the order).

This is important for two reasons, because it's clearer to whoever has to work on this part of the codebase next, and also because, as I was reminded today "there are downstream dependencies here" (even for features that haven't yet passed code review). By including such implementation details in the public interface, it locks us into them, making it harder to maintain the code and preserve backwards compatibility when things change. As an example, what if the multitable export server side API changes to allow you to specify multiple weights in a single export. Then the order column is totally meaningless, but to help my downstream dependencies, I would need to keep it.

I also decided to be strict that there can only be these two columns, because this allows us to expand in the future. I kind of think there needs to be a "name" column that would allow you to set the name of the cubes (eg "q1 - likely voter", "q1 - registered voters"), but I'm not sure where that name can be stored in the tabbook result and it sounds like you plan to do that for yourself in crunchtabs, so there's no rush. If I allowed arbitrary columns, then using the name column like this would potentially be a breaking change.

  1. All of the work about picking the weights is done in the tabBookWeightSpec() function now because it's much easier and quicker to test that function than the tabBook one. This is why I'm able to achieve (almost) complete test coverage of the new code with just one new fixture. Not only is saving fixtures a pain, but each test of tabBookWeightSpec() is like 10 times faster than the tabbook export, so I'm able to test more edge cases without slowing down testing (and thus development time).

  2. At your suggestion, I made the mock generator script totally reproducible so that it does all of that file moving and renaming that we were having to do manually before. I'm trying to document more things.

@codecov
Copy link

codecov bot commented Oct 9, 2020

Codecov Report

Merging #513 into master will increase coverage by 0.04%.
The diff coverage is 95.83%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #513      +/-   ##
==========================================
+ Coverage   90.60%   90.64%   +0.04%     
==========================================
  Files         126      126              
  Lines        7598     7665      +67     
==========================================
+ Hits         6884     6948      +64     
- Misses        714      717       +3     
Impacted Files Coverage Δ
R/tab-book.R 95.26% <95.83%> (+0.16%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6cfa146...e1235aa. Read the comment docs.

@gergness gergness requested review from 1beb and malecki and removed request for 1beb October 9, 2020 01:56
R/tab-book.R Show resolved Hide resolved
R/tab-book.R Outdated Show resolved Hide resolved
R/tab-book.R Outdated Show resolved Hide resolved
dev-misc/capture-multi-weight-tabbook-mock.R Show resolved Hide resolved
tests/testthat/test-multitables.R Show resolved Hide resolved
R/tab-book.R Show resolved Hide resolved
@1beb
Copy link
Contributor

1beb commented Oct 19, 2020

Downstream I have a functional branch based off of ed2e1ee. Had to update my fixtures, but otherwise looking good.

@1beb
Copy link
Contributor

1beb commented Nov 27, 2020

Some 500's coming off of body$where in tabBookSingle with the example dataset. It looks like there is a problem with the manner in which jsonlite is preparing the body element. For example, I had to comment out the body$where part and also add the parameter jsonlite::toJSON(body, null = "null") otherwise it was presented as a list instead of a literal null.

Copy link
Contributor

@1beb 1beb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to pull this into crunchtabs with some modifications so that I could deliver to my users. I'm not sure how or if we should bother trying to bring this back into rcrunch given the limited users (Innovations team).

Would be happy to revisit but if this is something you want to integrate please follow the functionality embedded in crunchtabs.

@gergness gergness closed this Feb 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants