From 97d10e8be213fe5b4fc2ac5e372b8a08c3a763fa Mon Sep 17 00:00:00 2001
From: braffolk
Date: Mon, 5 Jun 2023 17:01:34 +0300
Subject: [PATCH] 0.3.0:

- Added verbose to Rim class to allow controlled report generation
- Added col_filter param to scheme_from_df to allow complex schemas,
  for example creating groups per region.
---
 README.md | 74 ++++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 48 insertions(+), 26 deletions(-)

diff --git a/README.md b/README.md
index 1b1b348e..8af162c3 100644
--- a/README.md
+++ b/README.md
@@ -2,17 +2,10 @@
 
 Weightipy is a cut down version of [Quantipy3](https://github.com/Quantipy/quantipy3) for weighting people data using the RIM (iterative raking) algorithm.
 
-### Planned features
-- Support for multithreaded weighting
-- Support for more weighting algorithms
-- Rewrite of the API to be less oriented towards how Quantipy worked and more in line with simple weighting needs
-
-#### Origins
-- Quantipy was concieved of and instigated by Gary Nelson: http://www.datasmoothie.com
-
-#### Contributors
-- Alexander Buchhammer, Alasdair Eaglestone, James Griffiths, Kerstin Müller : https://yougov.co.uk
-- Datasmoothie’s Birgir Hrafn Sigurðsson and [Geir Freysson](http://www.twitter.com/@geirfreysson): http://www.datasmoothie.com
+### Changes from Quantipy
+- Removed all Quantipy overhead. Weightipy supports the latest versions of Pandas and NumPy and is tested on Python 3.7, 3.8, 3.9, 3.10 and 3.11.
+- Weightipy runs up to 6 times faster than Quantipy, depending on the dataset.
+- The Rim class does not generate reports as Quantipy did unless the `verbose` parameter of the Rim constructor is set to True.
 
 ## Installation
 
@@ -22,8 +15,6 @@ or
 
 `python3 -m pip install weightipy`
 
-Note that the package is called __weightipy__ on pip.
-
 #### Create a virtual envirionment
 
 If you want to create a virtual environment when using Weightipy:
@@ -42,10 +33,7 @@ python -m venv [your_env_name]
 
 **Get started**
 
-#### Weighting
-If your data hasn't been weighted yet, you can use Weightipy's RIM weighting algorithm.
-
-Assuming we have the variables `gender` and `agecat` we can weight the dataset with these two variables:
+Assuming we have the variables `gender` and `agecat`, we can weight the dataset like this:
 
 ```Python
 import weightipy as wp
@@ -64,27 +52,40 @@ df_weighted = wp.weight_dataframe(
 efficiency = wp.weighting_efficiency(df_weighted["weights"])
 ```
 
-Or if we want more control of the raking process, we can use the Rim class directly:
+If we are working with census data that also includes a region variable, and we would
+like to weight the data by age and gender within each region, we can use the `scheme_from_df` function:
 
 ```Python
 import weightipy as wp
+import pandas as pd
 
-age_targets = {'agecat':{1:5.0, 2:30.0, 3:26.0, 4:19.0, 5:20.0}}
-gender_targets = {'gender':{0:49, 1:51}}
-scheme = wp.Rim('gender_and_age')
-scheme.set_targets(targets=[age_targets, gender_targets])
+df_data = pd.read_csv("data_to_weight.csv")
+df_census = pd.read_csv("census_data.csv")
+scheme = wp.scheme_from_df(
+    df=df_census,
+    cols_weighting=["agecat", "gender"],
+    col_filter="region",
+    col_freq="freq"
+)
 
 df_weighted = wp.weight_dataframe(
-    df=my_df,
+    df=df_data,
     scheme=scheme,
     weight_column="weights"
 )
 efficiency = wp.weighting_efficiency(df_weighted["weights"])
 ```
-
-Or by using the underlying functions that will give more access to reports etc:
+For more access to the weighting process, we can use the underlying
+Rim and WeightEngine classes directly:
 
 ```Python
-...
+import weightipy as wp
+
+# in this example, agecat and gender are int dtype
+
+age_targets = {'agecat':{1:5.0, 2:30.0, 3:26.0, 4:19.0, 5:20.0}}
+gender_targets = {'gender':{0:49, 1:51}}
+scheme = wp.Rim('gender_and_age')
+scheme.set_targets(targets=[age_targets, gender_targets])
 my_df["identity"] = range(len(my_df))
 engine = wp.WeightEngine(data=df)
@@ -110,6 +111,9 @@ Maximum weight factor 6.187700
 Weight factor ratio 13.283522
 ```
 
+For more information on the underlying classes, refer to the Quantipy
+[documentation](https://quantipy.readthedocs.io/en/staging-develop/sites/lib_doc/weights/02_rim.html#using-the-rim-class).
+
 Overview of functions to get started:
 
 | Function | Description |
@@ -121,6 +125,14 @@ Overview of functions to get started:
 | Rim class | Useful for creation of more complex weighting schemas. For example when weighting subregions or groups, which require filters. See: https://quantipy.readthedocs.io/en/staging-develop/sites/lib_doc/weights/02_rim.html#using-the-rim-class |
 | WeightEngine class | Useful for more specialised manipulation of the weighting process |
 
+## Planned features
+- More utility functions to simplify the weighting process
+- More performance improvements, in order to better support batch weighting of many datasets
+- Support for multithreaded weighting (possibly using Polars)
+- Rewrite of the API to be less oriented towards how Quantipy worked and more in line with simple weighting needs
+- Far future: support for more weighting algorithms
+
+
 # Contributing
 
 The test suite for Weightipy can be run with the command
@@ -132,3 +144,13 @@ But when developing a specific aspect of Weightipy, it might be quicker to run (
 `python3 -m unittest tests.test_rim`
 
 We welcome volunteers and supporters. Please include a test case with any pull request, especially those that run calculations.
+
+# Quantipy
+
+#### Origins
+- Quantipy was conceived of and instigated by Gary Nelson: http://www.datasmoothie.com
+
+
+### Contributors on Quantipy
+- Alexander Buchhammer, Alasdair Eaglestone, James Griffiths, Kerstin Müller : https://yougov.co.uk
+- Datasmoothie’s Birgir Hrafn Sigurðsson and [Geir Freysson](http://www.twitter.com/@geirfreysson): http://www.datasmoothie.com
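
The RIM (iterative raking) algorithm that the README refers to can be illustrated with a minimal plain-Python sketch. It is not weightipy's implementation: `rake` and `kish_efficiency` are hypothetical helpers, targets are assumed to use the same `{variable: {category: percent}}` shape as `age_targets` and `gender_targets` in the README, every category in the data is assumed to have a target (and every target category to occur in the data), and the efficiency formula is the common Kish-style definition, which may differ in detail from `wp.weighting_efficiency`.

```python
from collections import defaultdict


def rake(rows, targets, max_iter=1000, tol=1e-10):
    """Minimal iterative raking (RIM) sketch, not weightipy's implementation.

    rows:    list of dicts, one per respondent, e.g. {"gender": 0, "agecat": 2}
    targets: {variable: {category: target share}}, shares in any unit (e.g. %)
    Returns one weight per row; the weights sum to len(rows).
    Assumption: every data category has a target and every target category
    occurs in the data.
    """
    n = len(rows)
    weights = [1.0] * n
    for _ in range(max_iter):
        max_change = 0.0
        for var, shares in targets.items():
            total_share = sum(shares.values())
            # Current weighted count of each category of this variable.
            current = defaultdict(float)
            for row, w in zip(rows, weights):
                current[row[var]] += w
            # Rescale every row so this variable's margins match the targets.
            for i, row in enumerate(rows):
                cat = row[var]
                factor = (shares[cat] / total_share * n) / current[cat]
                max_change = max(max_change, abs(factor - 1.0))
                weights[i] *= factor
        # Stop once a full pass over all variables barely changes the weights.
        if max_change < tol:
            break
    return weights


def kish_efficiency(weights):
    """Kish-style weighting efficiency in percent: (sum w)^2 / (n * sum w^2) * 100."""
    n = len(weights)
    return sum(weights) ** 2 / (n * sum(w * w for w in weights)) * 100


# Hypothetical toy data: two gender categories and two age categories.
rows = [
    {"gender": 0, "agecat": 1},
    {"gender": 0, "agecat": 2},
    {"gender": 1, "agecat": 1},
    {"gender": 1, "agecat": 2},
    {"gender": 1, "agecat": 2},
]
targets = {"gender": {0: 50, 1: 50}, "agecat": {1: 40, 2: 60}}

weights = rake(rows, targets)
print([round(w, 4) for w in weights])
print(round(kish_efficiency(weights), 2))
```

Conceptually, weighting by age and gender within each region (what `col_filter` enables in `scheme_from_df`) amounts to running the same procedure separately on each region's rows against that region's targets; how weightipy combines the per-region results is not shown here.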