Submission: PyDataPeek Draft #49

MrThomasPin · 2020-03-16T18:47:38Z

Submitting Author: Thomas Pin @MrThomasPin
Package Name: PyDataPeek
One-Line Description of Package: Simple EDA for .csv or .xlsx documents
Repository Link: Repo Link
Version submitted:
Editor: @kvarada
Reviewer 1: Elliott Ribner @elliott-ribner
Reviewer 2: Aman Kumar Garg @amank90
Archive: TBD
Version accepted: TBD

Description

PyDataPeek is a package that enables data scientists to efficiently generate a visual summary of a dataset. It includes functions that show the size of the dataset, a visual summary of missing data, and a sample of the dataset with its data types, as well as exploratory visualizations for quantitative and qualitative data.

Scope

Please indicate which category or categories this package falls under:
- Data retrieval
- Data extraction
- Data munging
- Data deposition
- Reproducibility
- Geospatial
- Education
- Data visualization*

* Please fill out a pre-submission inquiry before submitting a data visualization package. For more info, see this section of our guidebook.

Explain how and why the package falls under these categories (briefly, 1-2 sentences):

PyDataPeek produces quick visualizations that help individuals understand their data better and gives them report-ready figures for their documents.

Who is the target audience and what are scientific applications of this package?

Business individuals who want quick and effective visualizations of their data.

Are there other Python packages that accomplish the same thing? If so, how does yours differ?

Several Python packages support exploratory data analysis, but none is specific to the use case targeted here: a simple and technologically friendly way of summarizing data.

Pandas Profiling: This package generates a report of a dataframe that has some of the features in the proposal. Our package will differ from this by offering the user simpler summaries that are friendlier to a non-technical audience.
Python Pandas: Our package will leverage pandas functionality to manipulate dataframes. Its functionality overlaps with some pandas methods, such as DataFrame.describe, which computes summary statistics for dataframes. It differs in that it aims to offer summary statistics that depend on data type, including long-form text data.
Python Altair, Python Seaborn and Python WordCloud: These visualization packages will be used to create visualizations that summarize the dataset as well as user-defined features in the dataset.
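
To make the comparison with DataFrame.describe concrete, here is a minimal sketch of what "summary statistics that depend on data type" could look like. It is not PyDataPeek's implementation: the function name summarize_by_dtype, the chosen statistics, and the sample data are all hypothetical and only illustrate the idea.

```python
import pandas as pd


def summarize_by_dtype(df: pd.DataFrame) -> dict:
    """Return a per-column summary whose contents depend on the column's dtype.

    Hypothetical helper for illustration; not part of PyDataPeek or pandas.
    """
    summary = {}
    for name in df.columns:
        col = df[name].dropna()
        if pd.api.types.is_numeric_dtype(col):
            # Numeric columns get the usual location and range statistics.
            summary[name] = {
                "kind": "numeric",
                "mean": float(col.mean()),
                "min": float(col.min()),
                "max": float(col.max()),
            }
        else:
            # Text columns get statistics that make sense for long-form text:
            # number of distinct values and average length in words.
            words = col.astype(str).str.split().str.len()
            summary[name] = {
                "kind": "text",
                "unique": int(col.nunique()),
                "mean_words": float(words.mean()),
            }
    return summary


# Made-up sample data, used only to exercise the sketch.
df = pd.DataFrame({
    "price": [10.0, 12.5, 9.5],
    "review": ["great value", "too expensive for what it is", "fine"],
})
result = summarize_by_dtype(df)
print(result)
```

By contrast, calling df.describe() with its default arguments on a frame that mixes numeric and text columns reports on the numeric columns only, which is the gap the authors describe.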

If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted:


MrThomasPin · 2020-03-16T18:58:19Z

Technical checks

For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:

does not violate the Terms of Service of any service it interacts with.
has an OSI approved license
contains a README with instructions for installing the development version.
includes documentation with examples for all functions.
contains a vignette with examples of its essential functions and uses.
has a test suite.
has continuous integration, such as Travis CI, AppVeyor, CircleCI, and/or others.

MrThomasPin · 2020-03-16T18:58:31Z

Publication options

Do you wish to automatically submit to the Journal of Open Source Software? If so:

JOSS Checks

The package has an obvious research application according to JOSS's definition in their submission requirements. Be aware that completing the pyOpenSci review process does not guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS.
The package is not a "minor utility" as defined by JOSS's submission requirements: "Minor 'utility' packages, including 'thin' API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria.
The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
The package is deposited in a long-term repository with the DOI:

Note: Do not submit your package separately to JOSS

MrThomasPin · 2020-03-16T18:59:57Z

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?

This option will allow reviewers to open smaller issues that can then be linked to PRs rather than submitting a denser, text-based review. It will also allow you to demonstrate addressing the issue via PR links.

Yes, I am OK with reviewers submitting requested changes as issues to my repo. Reviewers will then link to the issues in their submitted review.

MrThomasPin · 2020-03-16T19:00:20Z

Code of conduct

I agree to abide by pyOpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.

P.S. Have feedback/comments about our review process? Leave a comment here

Editor and Review Templates

Editor and review templates can be found here

ribner · 2020-03-18T08:36:32Z

Reviewer: Elliott

Package Review

As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

A statement of need clearly stating problems the software is designed to solve and its target audience in README
Installation instructions: for the development version of package and any non-standard dependencies in README
Vignette(s) demonstrating major functionality that runs successfully locally
Function Documentation: for all user-facing functions
Examples for all user-facing functions
Community guidelines including contribution guidelines in the README or CONTRIBUTING.
Metadata including author(s), author e-mail(s), a url, and any other relevant metadata e.g., in a setup.py file or elsewhere.

Readme requirements
The package meets the readme requirements below:

Package has a README.md file in the root directory.

The README should include, from top to bottom:

The package name
Badges for continuous integration and test coverage, the badge for pyOpenSci peer-review once it has started (see below), a repostatus.org badge, and any other badges. If the README has many more badges, you might want to consider using a table for badges, see this example, that one and that one. Such a table should be wider than it is tall.
Short description of goals of package, with descriptive links to all vignettes (rendered, i.e. readable, cf the documentation website section) unless the package is small and there’s only one vignette repeating the README.
Installation instructions
Any additional setup required (authentication tokens, etc)
Brief demonstration usage
Direction to more detailed documentation (e.g. your documentation files or website).
If applicable, how the package compares to other similar packages and/or how it relates to other packages
Citation information

Functionality

Installation: Installation succeeds as documented.

I received an error when trying to install. I think this is an issue with Poetry: it does not install dependencies automatically.

Collecting pydatapeek
  Downloading https://test-files.pythonhosted.org/packages/1e/27/5a49ffb2261be9541e88d0ae9e076862e2a8029d779a78812a5f210f850f/pydatapeek-0.1.9-py3-none-any.whl
ERROR: Could not find a version that satisfies the requirement altair_saver<0.2.0,>=0.1.0 (from pydatapeek) (from versions: none)
ERROR: No matching distribution found for altair_saver<0.2.0,>=0.1.0 (from pydatapeek)
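
The download URL in the log points at TestPyPI (test-files.pythonhosted.org), and pip reports that it found no versions of altair_saver. One plausible reading, which this thread does not confirm, is that pip was restricted to the TestPyPI index, so a dependency that had to come from the main index could not be resolved. The sketch below builds an install command that names both indexes. The helper build_install_command is hypothetical, and whether this resolves the reviewer's error is an assumption.

```python
import subprocess
import sys

# Index URLs: TestPyPI for the package under review, PyPI for its dependencies.
TEST_PYPI = "https://test.pypi.org/simple/"
PYPI = "https://pypi.org/simple/"


def build_install_command(package: str) -> list:
    """Build a pip command that searches TestPyPI first and also PyPI.

    Hypothetical helper for illustration only.
    """
    return [
        sys.executable, "-m", "pip", "install",
        "--index-url", TEST_PYPI,        # primary index
        "--extra-index-url", PYPI,       # additional index for dependencies
        package,
    ]


command = build_install_command("pydatapeek")
print(" ".join(command))

# To actually run the install, uncomment the next line:
# subprocess.run(command, check=True)
```

Mixing indexes lets pip pick a same-named package from either one, so this is suited to trying out a test release rather than to production installs.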

Functionality: Any functional claims of the software have been confirmed.
Performance: Any performance claims of the software have been confirmed.
Automated tests: Tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
Continuous Integration: Has continuous integration, such as Travis CI, AppVeyor, CircleCI, and/or others.
Packaging guidelines: The package conforms to the pyOpenSci packaging guidelines.

Final approval (post-review)

The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 4

Review Comments

Altogether, great job on the project. I think there are many useful features in the package, and it is well implemented. I found the code and structure well written and well documented. I found very few points to improve, but if time allows, there are three things worth noting:

Unused file: I think there is an unused file titled pbc.py in the test directory.
I could not understand the heatmap documentation. More specifically, I could not tell from the docs how the function would be used or how its output should be interpreted. Looking closer at the visualization in the README, there are some labels on the edge of the image, but they are very hard to read. I would suggest a more involved example of how the function could be used, with a written description of how to interpret the output.
Lastly, this seems to affect all of the projects, including my own, but having to manually install all the dependencies before running pip install seems like a problem. I think pip packages should generally install their dependencies automatically.

Thank you for your time.

Sincerely,

Elliott Ribner
