Releases: sfu-db/dataprep
v0.4.4-alpha.1
Bugfixes 🐛
- eda.create-db-report: add missing style files previously ignored by gitignore (7536191)
- eda: jinja2.markup import broken with 3.1 (b9b60a0)
- eda: fixed create_report browser sort rendering issue, returned context values directly instead of selecting by css class (331a964)
- eda: report for empty df (485e58d)
- eda: plot_diff when columns are not aligned (7e53dbf)
- eda: scipy version issue (8798a14)
- eda: na column name when upgrading dask (43fdd1a)
- eda: pd grouper issue when upgrading dask (761c445)
- clean: delete redundant print (0e072a8)
- eda.plot: fix display issue in notebook (6ed13b0)
- eda.plot: fix pagination styling issues (8396f2d)
- eda: restyled plots into same row, set height + width of plots to be same (c6ffcd4)
- eda: interaction error in report for cat-only df (e60239a)
- eda: fix cat-cat error (94f70ef)
- eda: fix stat layout issue (5bb535d)
- eda.create_report: fix display issue in notebook (487659f)
- clean: remove usaddress library (c192ab4)
- clean: fix the bug of am, pm (4c3b231)
- clean: fix the bug of am, pm (caf2b37)
- eda: fixed issue where plots weren't rendering twice (fd3fd57)
- eda: wordcloud setting in terminal (0090169)
Features ✨
- eda: added sorting feature for create_diff_report (8b187a6)
- eda: add running total for time series test (d094072)
- eda: add create_db_report submodule (9784cce)
- eda.plot: add pagination threshold and add auto jump in pagination navigation (cfdd0de)
- eda.create_report: add sort by approximate unique (5738db2)
- eda: add sort variables by alphabetical and missing (fb93493)
- clean: New version of GUI (6828807)
- eda: enriched show details tab by adding plots and overview statistics (eeb210d)
Code Quality + Testing 💯
- eda: add tests for intermediate compute functions (700add7)
Documentation 📃
- clean: revise _init.py (02ede81)
- clean: add doc of clean GUI (5e2f38a)
- eda.plot: add pagination for plot (c4cd4b9)
- eda.create_report: remove old doc file (e1153cb)
- eda.create_report: convert rst docs file to ipynb and add additional docs for variables sort (bf39a56)
- eda: add doc for getting imdt result (6fbcfe4)
- eda: add doc for running dataprep.eda on Hadoop YARN (628686d)
Contributors this release 🏆
The following users contributed code to DataPrep since the last release.
- Andrey Pham <andrey.pham@move.com> (First time contributor) ⭐️
- Bowen0729 <bowen0729@qq.com> (First time contributor) ⭐️
- Danrui Qi <qidanrui@gmail.com> (First time contributor) ⭐️
- Danrui QI <danruiqi@Danruis-MBP.hitronhub.home> (First time contributor) ⭐️
- dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> (First time contributor) ⭐️
- Devin <devinllu@hotmail.com> (First time contributor) ⭐️
- Devin Lu <ludevinl@sfu.ca>
- Grey Murav <65895033+gremur@users.noreply.github.com> (First time contributor) ⭐️
- henryye <yixuy@sfu.ca> (First time contributor) ⭐️
- Jinglin Peng <jlpengcs@gmail.com>
- jwa345 <jwa345@sfu.ca> (First time contributor) ⭐️
- qidanrui <qidanrui@gmail.com>
- Weiyuan Wu <youngw@sfu.ca>
🎉🎉 Thank you! 🎉🎉
v0.4.4
Bugfixes 🐛
- eda: type error for npartitions (57db1ed)
- eda.create-db-report: remove pystache dependency and replace it with jinja2 (676fff1)
- eda.create-db-report: add missing style files previously ignored by gitignore (7536191)
- eda: jinja2.markup import broken with 3.1 (b9b60a0)
- eda: fixed create_report browser sort rendering issue, returned context values directly instead of selecting by css class (331a964)
- eda: report for empty df (485e58d)
- eda: plot_diff when columns are not aligned (7e53dbf)
- eda: scipy version issue (8798a14)
- eda: na column name when upgrading dask (43fdd1a)
- eda: pd grouper issue when upgrading dask (761c445)
- clean: delete redundant print (0e072a8)
- eda.plot: fix display issue in notebook (6ed13b0)
- eda.plot: fix pagination styling issues (8396f2d)
- eda: restyled plots into same row, set height + width of plots to be same (c6ffcd4)
- eda: interaction error in report for cat-only df (e60239a)
- eda: fix cat-cat error (94f70ef)
- eda: fix stat layout issue (5bb535d)
- eda.create_report: fix display issue in notebook (487659f)
- clean: remove usaddress library (c192ab4)
- clean: fix the bug of am, pm (4c3b231)
- clean: fix the bug of am, pm (caf2b37)
- eda: fixed issue where plots weren't rendering twice (fd3fd57)
- eda: wordcloud setting in terminal (0090169)
Features ✨
- clean: add updated version of rapidfuzz and python-crfsuite (59f3506)
- eda.create-db-report: add save report functionality (2fb16ad)
- eda: add get_db_names (a7bf820)
- eda: added sorting feature for create_diff_report (8b187a6)
- eda: add running total for time series test (d094072)
- eda: add create_db_report submodule (9784cce)
- eda.plot: add pagination threshold and add auto jump in pagination navigation (cfdd0de)
- eda.create_report: add sort by approximate unique (5738db2)
- eda: add sort variables by alphabetical and missing (fb93493)
- clean: New version of GUI (6828807)
- eda: enriched show details tab by adding plots and overview statistics (eeb210d)
Code Quality + Testing 💯
- eda: add test for npartition type error (5affd75)
- eda: add tests for intermediate compute functions (700add7)
Documentation 📃
- eda: add the use-case of dataprep.eda for spark dataframe with ray (4bf14e7)
- clean: revise _init.py (02ede81)
- clean: add doc of clean GUI (5e2f38a)
- eda.plot: add pagination for plot (c4cd4b9)
- eda.create_report: remove old doc file (e1153cb)
- eda.create_report: convert rst docs file to ipynb and add additional docs for variables sort (bf39a56)
- eda: add doc for getting imdt result (6fbcfe4)
- eda: add doc for running dataprep.eda on Hadoop YARN (628686d)
Contributors this release 🏆
The following users contributed code to DataPrep since the last release.
- Andrey Pham <andrey.pham@move.com> (First time contributor) ⭐️
- astellarius <zak.lake0@gmail.com> (First time contributor) ⭐️
- Bowen0729 <bowen0729@qq.com> (First time contributor) ⭐️
- Danrui Qi <qidanrui@gmail.com> (First time contributor) ⭐️
- Danrui QI <danruiqi@Danruis-MBP.hitronhub.home> (First time contributor) ⭐️
- dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> (First time contributor) ⭐️
- Devin <devinllu@hotmail.com> (First time contributor) ⭐️
- Devin Lu <ludevinl@sfu.ca>
- Grey Murav <65895033+gremur@users.noreply.github.com> (First time contributor) ⭐️
- henryye <yixuy@sfu.ca> (First time contributor) ⭐️
- Jinglin Peng <jlpengcs@gmail.com>
- jwa345 <jwa345@sfu.ca> (First time contributor) ⭐️
- qidanrui <qidanrui@gmail.com>
- Sultan Orazbayev <contact@econpoint.com> (First time contributor) ⭐️
- Weiyuan Wu <youngw@sfu.ca>
🎉🎉 Thank you! 🎉🎉
v0.4.3
Bugfixes 🐛
- eda: fixed create_report browser sort rendering issue, returned context values directly instead of selecting by css class (331a964)
- eda: report for empty df (485e58d)
- eda: plot_diff when columns are not aligned (7e53dbf)
- eda: scipy version issue (8798a14)
- eda: na column name when upgrading dask (43fdd1a)
- eda: pd grouper issue when upgrading dask (761c445)
- clean: delete redundant print (0e072a8)
- eda.plot: fix display issue in notebook (6ed13b0)
- eda.plot: fix pagination styling issues (8396f2d)
- eda: restyled plots into same row, set height + width of plots to be same (c6ffcd4)
- eda: interaction error in report for cat-only df (e60239a)
- eda: fix cat-cat error (94f70ef)
- eda: fix stat layout issue (5bb535d)
- eda.create_report: fix display issue in notebook (487659f)
- clean: remove usaddress library (c192ab4)
- clean: fix the bug of am, pm (4c3b231)
- clean: fix the bug of am, pm (caf2b37)
- eda: fixed issue where plots weren't rendering twice (fd3fd57)
- eda: wordcloud setting in terminal (0090169)
Features ✨
- eda: added sorting feature for create_diff_report (8b187a6)
- eda: add running total for time series test (d094072)
- eda: add create_db_report submodule (9784cce)
- eda.plot: add pagination threshold and add auto jump in pagination navigation (cfdd0de)
- eda.create_report: add sort by approximate unique (5738db2)
- eda: add sort variables by alphabetical and missing (fb93493)
- clean: New version of GUI (6828807)
- eda: enriched show details tab by adding plots and overview statistics (eeb210d)
Code Quality + Testing 💯
- eda: add tests for intermediate compute functions (700add7)
Documentation 📃
- clean: revise _init.py (02ede81)
- clean: add doc of clean GUI (5e2f38a)
- eda.plot: add pagination for plot (c4cd4b9)
- eda.create_report: remove old doc file (e1153cb)
- eda.create_report: convert rst docs file to ipynb and add additional docs for variables sort (bf39a56)
- eda: add doc for getting imdt result (6fbcfe4)
- eda: add doc for running dataprep.eda on Hadoop YARN (628686d)
Contributors this release 🏆
The following users contributed code to DataPrep since the last release.
- Andrey Pham <andrey.pham@move.com> (First time contributor) ⭐️
- Bowen0729 <bowen0729@qq.com> (First time contributor) ⭐️
- Danrui Qi <qidanrui@gmail.com> (First time contributor) ⭐️
- Danrui QI <danruiqi@Danruis-MBP.hitronhub.home> (First time contributor) ⭐️
- dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> (First time contributor) ⭐️
- Devin <devinllu@hotmail.com> (First time contributor) ⭐️
- Devin Lu <ludevinl@sfu.ca>
- Grey Murav <65895033+gremur@users.noreply.github.com> (First time contributor) ⭐️
- henryye <yixuy@sfu.ca> (First time contributor) ⭐️
- Jinglin Peng <jlpengcs@gmail.com>
- jwa345 <jwa345@sfu.ca> (First time contributor) ⭐️
- qidanrui <qidanrui@gmail.com>
- Weiyuan Wu <youngw@sfu.ca>
🎉🎉 Thank you! 🎉🎉
v0.4.2
Bugfixes 🐛
- eda: na column name when upgrading dask (43fdd1a)
- eda: pd grouper issue when upgrading dask (761c445)
- clean: delete redundant print (0e072a8)
- eda.plot: fix display issue in notebook (6ed13b0)
- eda.plot: fix pagination styling issues (8396f2d)
- eda: restyled plots into same row, set height + width of plots to be same (c6ffcd4)
- eda: interaction error in report for cat-only df (e60239a)
- eda: fix cat-cat error (94f70ef)
- eda: fix stat layout issue (5bb535d)
- eda.create_report: fix display issue in notebook (487659f)
- clean: remove usaddress library (c192ab4)
- clean: fix the bug of am, pm (4c3b231)
- clean: fix the bug of am, pm (caf2b37)
- eda: fixed issue where plots weren't rendering twice (fd3fd57)
- eda: wordcloud setting in terminal (0090169)
Features ✨
- eda.plot: add pagination threshold and add auto jump in pagination navigation (cfdd0de)
- eda.create_report: add sort by approximate unique (5738db2)
- eda: add sort variables by alphabetical and missing (fb93493)
- clean: New version of GUI (6828807)
- eda: enriched show details tab by adding plots and overview statistics (eeb210d)
Code Quality + Testing 💯
- eda: add tests for intermediate compute functions (700add7)
Documentation 📃
- clean: add doc of clean GUI (5e2f38a)
- eda.plot: add pagination for plot (c4cd4b9)
- eda.create_report: remove old doc file (e1153cb)
- eda.create_report: convert rst docs file to ipynb and add additional docs for variables sort (bf39a56)
- eda: add doc for getting imdt result (6fbcfe4)
- eda: add doc for running dataprep.eda on Hadoop YARN (628686d)
Contributors this release 🏆
The following users contributed code to DataPrep since the last release.
- Andrey Pham <andrey.pham@move.com> (First time contributor) ⭐️
- Bowen0729 <bowen0729@qq.com> (First time contributor) ⭐️
- dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> (First time contributor) ⭐️
- Devin <devinllu@hotmail.com> (First time contributor) ⭐️
- Devin Lu <ludevinl@sfu.ca>
- Grey Murav <65895033+gremur@users.noreply.github.com> (First time contributor) ⭐️
- henryye <yixuy@sfu.ca> (First time contributor) ⭐️
- Jinglin Peng <jlpengcs@gmail.com>
- jwa345 <jwa345@sfu.ca> (First time contributor) ⭐️
- qidanrui <qidanrui@gmail.com>
- Weiyuan Wu <youngw@sfu.ca>
🎉🎉 Thank you! 🎉🎉
v0.4.1
Bugfixes 🐛
- eda: stat layout in plot (946319f)
- eda: fix display in plot(df) (c11bb94)
- eda: report for pandas extension type (2cbb387)
- eda: fix saving imdt as json file (5ee6529)
Features ✨
- clean: Add wiki and simple GUI (7f4ab12)
- eda: added overview and variables section for create_diff_report (dc4cf7d)
- eda: add categorical interaction in create_report (7f13cd5)
Code Quality + Testing 💯
- eda: added basic automated tests (3a0653e)
Documentation 📃
- eda: link create_diff_report to intro (05d9850)
- eda: added docs for create_diff_report (d8fc9d4)
- eda: enrich parameters in report (3d0a148)
Contributors this release 🏆
The following users contributed code to DataPrep since the last release.
- Devin Lu <ludevinl@sfu.ca>
- Jinglin Peng <jlpengcs@gmail.com>
- qidanrui <qidanrui@gmail.com>
- waterpine <songbian@zju.edu.cn>
- Weiyuan Wu <youngw@sfu.ca>
- Xiaoying Wang <xiaoying_wang@sfu.ca> (First time contributor) ⭐️
- Xiaoying Wang <wangxiaoying0369@gmail.com>
🎉🎉 Thank you! 🎉🎉
v0.4.0
Bugfixes 🐛
- eda: fix string type (b7e3321)
- eda: fix value table display (57281bc)
- eda: remove imdt output from plot (5c227e1)
- eda: adjusted save report method to accept one parameter (4ceefcc)
- eda: clean config code and fix scatter sample param (8ab27f9)
- plot_diff: fix ci issue (44ce81c)
- clean: clean_duplication issue 646 (ca9f708)
- eda: fix category type error (9750694)
Features ✨
- eda: refactored code and added density parameter to plot_diff(df) (323ae6b)
- eda: save imdt as json file (7867386)
- connector: integrate connectorx into connector (106457e, a64e356, 9f89d3b)
- clean: add clean_ml function (909cd19)
- clean: add multiple clean functions for number types (3c05be5)
- eda.diff: add plot_diff([df1..dfn], continuous) (3bfb4f5)
- clean: support conversion into packed binary format in clean_ip (7e30f93, 37a83b0)
Performance 🚀
- clean: update documentation of clean_duplication (50f90fa)
Documentation 📃
- clean: change the introduction (862b447)
- eda: change eda colab position (ce25b17, d00b0bd)
- clean: add documentation for multiple clean functions for number types (732480f)
- clean: add documentation for clean_ml function (0c139db)
- eda: scatter.sample_rate added to documentation (549b319)
- eda: fix plot show (0b40a40)
- readme: add benchmark link (e807f79)
- readme: small text change on clean and connector (e193a6a)
- readme: fix titanic link (29cc06c)
Contributors this release 🏆
The following users contributed code to DataPrep since the last release.
- Devin Lu <devinllu@hotmail.com> (First time contributor) ⭐️
- dylanzxc <zca92@sfu.ca>
- Jinglin Peng <jlpengcs@gmail.com>
- Noir Tree <2515744793@qq.com> (First time contributor) ⭐️
- pwwang <1188067+pwwang@users.noreply.github.com> (First time contributor) ⭐️
- qidanrui <qidanrui@gmail.com>
- sahmad11 <53022377+sahmad11@users.noreply.github.com> (First time contributor) ⭐️
- waterpine <songbian@zju.edu.cn>
- Weiyuan Wu <youngw@sfu.ca>
- Xiaoying Wang <wangxiaoying0369@gmail.com> (First time contributor) ⭐️
🎉🎉 Thank you! 🎉🎉
v0.3.0
Bugfixes 🐛
- eda: fix long name in missing heatmap (f6cc399)
- connector: fix bug in url_path_params (c95a7ff)
- eda: fix NA and int viz issue in plot_diff (ef36d5a)
- eda: fix missing for SmallCard and DateTime type (201e487)
- eda: fix create_report for dask csv (93e8567)
- clean: fix mixed-up date formats in one column (e295695)
- eda: fixed uncaught dtype and long var names (24f0295)
- eda: fix correlation of num columns with small distinct values (9959b78)
- eda: fix issue with dataframe of one column (910bb71)
- eda: add geopoint in type count (94cbca2)
- eda: fixed uncaught dtype exceptions (d301eb7)
- eda: fix str transform with small distinct as categorical (65e7f90)
- eda: fix na values display issue (1ce5775)
- eda: keep na when preprocess df (17d8219)
- clean: fix returned df_clean in clean_dupl (180e6ad)
- clean: escape apostrophes in code exported by clean_dupl (e6ea7e9)
- eda: fixed endless loop and UI issues (69779cd)
- eda: fix insight error (9ad4e26)
- eda: suppress warnings for missing and report (df2a1e7)
- eda: fix insights of plot_correlation (f0ca5f4)
- eda: suppress warnings of progress bar and dask (ca8da4e)
- eda.create_report: fix constant column error (160844a)
- docs: fix docs of clean_df (38dd4b2)
- clean: remove unneeded replace in clean_dupl (51c02cd)
- eda: fixed bugs that come with randomly generated datasets (53ecf76)
- eda: fix bugs in log transformation (209d7d0)
- eda: fixed and optimized css layouts (58e1b18)
- clean: fix bug in validate_country (28068d4)
- eda: fix column name and index related issues (40a89b9)
- eda: variables can be none (325b090)
- connector: path to new config repo (59603e5)
- clean: lat_long regex not match a date format (49d3d22)
- eda.distribution: highlight variable names (998b176)
- eda: fix the error of numerical cell in object column (91c4f9d)
- eda.distribution: box plot with object dtype (a37e9f2)
- clean: add comma after street suffix or name (e7655db)
- clean: cast values as str in validate funcs (8e1b459)
Features ✨
- clean: tuple of input formats for clean_country() (6bc6551)
- clean: add clean_text function (55d3ae9)
- eda: change color of geo map (1dbcddb)
- clean: add clean_currency function (deb5593)
- clean: add clean_df() function (b750284)
- type: detect column as categorical for small unique values (4696e59)
- eda: add geo_plot function (bbe64ec)
- eda: create_report UI improvement (c849b01)
- eda: added new function plot_diff (79523c3)
- connector: allow parameters appear in url path (5adaf30)
- eda: value frequency table (bc37b79)
- eda: create_report UI improvement (72a0ca9)
- clean: add clean_duplication() function (98ff38d)
- clean: support letters in clean_phone (25d163b)
- eda: specify colors in plot(df), plot(df, x) (33fa36e)
- connector: add functionality that lists supported websites (88187e1)
- clean: add clean_address function (e839ecd)
- clean: add clean_headers function (40742a1)
- eda: parameter management and how-to guide (d2e8b10)
- clean: add clean_date function (6aa6410)
- create_report: add tabs for correlation and missing (6dc568b)
Code Quality + Testing 💯
- eda: add test for geo point (943033a)
- eda: add dataset test for report (0de5208)
- eda: add test of random df (68239f0)
- clean: add tests for clean_duplication() (a4b9d32)
- eda: add random data generator (e83f95b)
- clean: add tests for clean_headers (0aca076)
- eda: add test case of object column with numerical cell (5783984)
- clean: add tests for clean_date and validate_date (812dbb8)
Performance 🚀
- eda: optimize df preprocess and performance of create_report (e7eb182)
- clean: update documentation of clean_date (c540fcc)
- clean: improve performance of clean_duplication (8fda37e)
- eda: use approximate nunique (6030064)
- clean: improve the performance of clean_email() (176382b)
- clean: improve performance of clean_date (854329b)
Documentation 📃
- readme: update video, paper and titanic report for eda (1126dea)
- eda: replace x, y, z with col1, col2, col3 (57f65b3)
- clean: add documentation for clean_text (65436b0)
- eda: add documentation for insights (1e4659b)
- clean: add documentation for clean_df() (4ecf0d7)
- eda: update user guide's datasets (2428f98)
- eda: add documentation for geo plot (3558257)
- clean: add user guide for clean_duplication (d834e85)
- clean: fix clean documentation (e3bed2b)
- connector: revision (23085dd)
- clean: add documentation for clean_date function (d445f36)
- connector: add info docs (cb8cb5c)
- connector: add config file section (f55226e)
- connector: adding a process overview via DBLP section (5794d6c)
- connector: remove stale rst files (433fdfe)
- connector: convert pagination section from rst to ipynb (e4b9ba0)
- connector: convert authorization section from rst to ipynb (d25af47)
- connector: change the pointer in index file from connector.rst to introduction.ipynb (218e41c)
- connector: rewrite introduction and form doc structure (6a87693)
- connector: update API reference doc (9bed169)
- clean: improve DataPrep.Clean ReadMe (a0bc96b)
- eda: update legacy documentations for eda (8f948e0)
- clean: add documentation for clean_address (4061fca)
- clean: add documentation for clean_headers (7a9d519)
- clean: add links from user guide to api ref (182b525)
- clean: Docstrings for phone and email (47f1e33)
- datasets: add introduction for datasets (83d42ce)
- clean: add API reference (68182f6)
- clean: add documentation for clean_ip function (9da3ed1)
- connector: add query() section (c904d1f)
- connector: add connect() section (bff842e)
Contributors this release 🏆
The following users contributed code to DataPrep since the last release.
- andy <insunshine@love.com> (First time contributor) ⭐️
- AndyWangSFU <zwa117@sfu.ca> (First time contributor) ⭐️
- atol <alicelinlc@gmail.com>
- Brandon Lockhart <brandon_lockhart@sfu.ca>
- dylanzxc <zca92@sfu.ca>
- eutialia <dev@ebon.network>
- Jinglin Peng <jlpengcs@gmail.com>
- jinglinpeng <jlpengcs@gmail.com>
- Lakshay-sethi <58126894+Lakshay-sethi@users.noreply.github.com> (First time contributor) ⭐️
- nzrymiak <nzrymiak@sfu.ca>
- peiwangdb <pennyiscomputing@gmail.com>
- peterirani <peshotan_irani@sfu.ca> (First time contributor) ⭐️
- qidanrui <qidanrui@gmail.com> (First time contributor) ⭐️
- ryanwdale <ryanwdale@gmail.com>
- waterpine <songbian@zju.edu.cn>
- Weiyuan Wu <youngw@sfu.ca>
- Yi Xie <zjuxyee@gmail.com>
- yuzhenmao <harrymao666@gmail.com>
- yuzhenmao <57878927+yuzhenmao@users.noreply.github.com>
- yxie66 <zjuxyee@gmail.com>
- zhixuan_chi <zca92@sfu.ca>
🎉🎉 Thank you! 🎉🎉
v0.2.15
Bugfixes 🐛
- eda: add test to plot_missing (303a13e)
- eda: when data size is small using plot_missing (9e59aa0)
- eda: set encoding to udf when file is opened (f43c1aa)
- clean: split parameter for clean_phone (f9bb100)
- connector: config manager checks _meta.json (5c2278d)
- eda.create_report: univar datetime analysis (4632852)
- eda.report: encoding and show issue (721ae7b)
Features ✨
- datasets: add load_dataset and get_dataset_names (2b9e1f9)
- connector: allow using config from other branches (276afff)
- connector: from_key parameter validation (bd89ef2)
- clean: add clean_ip function (3b23270)
- connector: improve info (2a175a8)
- eda: enrich plot_correlation (29c444e)
- clean: implement clean_phone for Canadian/US formats (45d4368)
- eda: modify doc of plot_missing (489c922)
- clean: add errors parameter, enhance report for clean_url (aa7ec9c)
- clean: add clean_url function (2894d0a)
- eda: add stat. in plot_missing (0f44f15)
- connector: adding validation for auth params (0a7c712)
- eda: convert all plot functions to new UI (36f8fa3)
- connector: update info function documentation (7b6ae53)
- connector: create display dataframe function (9767cf4)
Code Quality + Testing 💯
- clean: add tests for clean_ip and validate_ip (fc15682)
- clean: add tests for clean_url (452dbe8)
- clean: add tests for clean_phone (fcf7310)
- clean: add tests for clean_email() (fdd02c6)
- clean: add tests for clean_country() (8a593fa)
- clean: add tests for clean_lat_long (aea2602)
Performance 🚀
- clean: improve the performance of the clean subpackage (c7c787b)
Documentation 📃
- README: add link to each section (b687076)
- README: polish EDA section (fd5ef8c)
- clean: add documentation for clean_url (bf937f9)
- clean: add documentation for clean_phone (8165a42)
- readme: fix the broken image (12e1fa1)
- readme: add introduction for dataprep.clean (3710037)
- clean: add docs for clean_country (2163981)
- eda: modify doc for plot_correlation (b6b377c)
Contributors this release 🏆
The following users contributed code to DataPrep since the last release.
- atol <alicelinlc@gmail.com>
- Brandon Lockhart <brandon_lockhart@sfu.ca>
- eutialia <dev@ebon.network>
- Jinglin Peng <jlpengcs@gmail.com>
- jinglinpeng <jlpengcs@gmail.com> (First time contributor) ⭐️
- Juan Ospina <jospina@sfu.ca> (First time contributor) ⭐️
- nzrymiak <nzrymiak@sfu.ca> (First time contributor) ⭐️
- pallavib <pallavib@sfu.ca>
- peiwangdb <pennyiscomputing@gmail.com>
- Peshotan Irani <peshotan_irani@sfu.ca>
- peterirani <peshotanirani@gmail.com>
- ryanwdale <ryanwdale@gmail.com>
- waterpine <songbian@zju.edu.cn>
- Weiyuan Wu <youngw@sfu.ca>
- Yi Xie <zjuxyee@gmail.com>
- yuzhenmao <harrymao666@gmail.com> (First time contributor) ⭐️
- yxie66 <zjuxyee@gmail.com>
🎉🎉 Thank you! 🎉🎉
v0.2.14
Bugfixes 🐛
- eda.plot_missing: new label texts and color mapping (71a95f9)
- connector: add missing authdef (8b274b9)
- eda.create_report: handle unhashable dtypes (7743749)
Features ✨
- connector: remove jsonschema dependency (6f07faf)
- connector: don't support xml website anymore (fa173a0)
- connector: simplify generator, add connect (a96d9b3)
- clean: implement clean_country function (5dea1bd)
- connector: do not update local config if it already exists (cd675f3)
- eda: Redesigned layout for plot_missing (c85eaa5)
- connector: add generator UI (4d1e900)
Performance 🚀
- eda: optimize plot_missing and plot_corr (b46036d)
Contributors this release 🏆
The following users contributed code to DataPrep since the last release.
- Brandon Lockhart <brandon_lockhart@sfu.ca>
- eutialia <dev@ebon.network>
- Jinglin Peng <jlpengcs@gmail.com>
- Pallavi Bharadwaj <pallavib@sfu.ca>
- pallavib <pallavib@sfu.ca>
- ryanwdale <ryanwdale@gmail.com> (First time contributor) ⭐️
- Weiyuan Wu <youngw@sfu.ca>
🎉🎉 Thank you! 🎉🎉
v0.2.13
Bugfixes 🐛
- eda: change dtype 'string' to 'object' (8ddddbc)
- eda: remove unnecessary compute (98c4ab0)
- connector: wrong calculation for pagination (516038b)
- eda.data_array: handle empty df correctly (97db86d)
- eda.distribution: fix pie chart insight (d3564a6)
- eda.distribution: delay scipy computations (89fafae)
- eda.correlation: wrong mask calculation (8ebe9cc)
- eda.plot: fixed wordcloud, all nan column (ce762d5)
Features ✨
- connector: implement authorization code (e6838ca)
- connector: full text search _q to be a universal parameter (947584a)
- cleaning: add clean_email() function (4658a20)
- connector: implement generator (7a93ea0)
- connector: add token based pagination (5ec6e00)
- connector: implement page pagination (02c93b4)
- connector: implement header authentication (d879c20)
- connector: use pydantic for schema (dff0844)
- connector: rename pagination types (500ce13)
- cleaning: add report parameter for clean_lat_long (f0af621)
- connector: Parameter check when calling query() (0db7a16)
- eda: support series as the input (bad6a87)
- eda.plot: Redesigned layout for plot(df, x) (04c7fd5)
- cleaning: clean latitude, longitude coordinates (93927a9)
- eda.report: allow disabling the progress bar (2a90f7f)
- eda.correlation: move nan corr values to the bottom (4bba52e)
- eda: add progress bar for dask local scheduler (e13257c)
- eda.plot: increase # of bins and ngroups (f78cfae)
Performance 🚀
- eda.plot: changed drop_null to dropna (0a7fe56)
- eda.missing: use DataArray (fb69ea1)
- eda.plot: optimize bivariate computations (031748e)
- eda: improve progress bar performance (64be889)
- eda.correlation: increase the performance (3575aac)
- eda.correlation: performance tuning (68471e5)
Documentation 📃
- cleaning: add documentation for clean_email() (5bc3770)
- cleaning: update clean_lat_long docs (d698a10)
- cleaning: add documentation for clean_lat_long (eaba8c7)
Contributors this release 🏆
The following users contributed code to DataPrep since the last release.
- atol <alicelinlc@gmail.com> (First time contributor) ⭐️
- Brandon Lockhart <brandon_lockhart@sfu.ca>
- eutialia <dev@ebon.network>
- Jinglin Peng <jlpengcs@gmail.com>
- jospina <jospina@sfu.ca> (First time contributor) ⭐️
- Pallavi Bharadwaj <pallavib@sfu.ca>
- pallavib <pallavib@sfu.ca> (First time contributor) ⭐️
- peiwangdb <pennyiscomputing@gmail.com>
- rwdale <ryan_dale@sfu.ca> (First time contributor) ⭐️
- Weiyuan Wu <youngw@sfu.ca>
- Yi Xie <zjuxyee@gmail.com> (First time contributor) ⭐️
- yuzhenmao <57878927+yuzhenmao@users.noreply.github.com> (First time contributor) ⭐️
- yxie66 <zjuxyee@gmail.com>
🎉🎉 Thank you! 🎉🎉