[ENH] Add tidyxl's xlsx_table to the io module. #990

Merged
merged 35 commits into from
Jan 18, 2022
Changes from 28 commits
Commits
86cd0a9
Merge pull request #1 from pyjanitor-devs/dev
samukweku Apr 23, 2021
7ee2e19
Merge branch 'pyjanitor-devs:dev' into dev
samukweku May 24, 2021
32be96c
Merge branch 'pyjanitor-devs:dev' into dev
samukweku Jun 5, 2021
513ef04
updates
samukweku Jul 26, 2021
f3d9b11
Merge branch 'pyjanitor-devs:dev' into dev
samukweku Aug 1, 2021
4f98e9d
Merge branch 'pyjanitor-devs:dev' into dev
samukweku Aug 15, 2021
37af6a6
Merge branch 'pyjanitor-devs:dev' into dev
samukweku Aug 19, 2021
facb52c
Merge branch 'pyjanitor-devs:dev' into dev
samukweku Aug 19, 2021
5f3c8e3
Merge branch 'pyjanitor-devs:dev' into dev
samukweku Aug 20, 2021
057c39b
Merge branch 'pyjanitor-devs:dev' into dev
samukweku Aug 20, 2021
c25235a
Merge branch 'pyjanitor-devs:dev' into dev
samukweku Aug 22, 2021
5a90734
Merge branch 'pyjanitor-devs:dev' into dev
samukweku Sep 2, 2021
d0fb585
Merge branch 'pyjanitor-devs:dev' into dev
samukweku Sep 5, 2021
cdef368
Merge branch 'pyjanitor-devs:dev' into dev
samukweku Sep 5, 2021
ccbab57
Merge branch 'pyjanitor-devs:dev' into dev
samukweku Sep 12, 2021
c4a47ba
Merge branch 'pyjanitor-devs:dev' into dev
samukweku Oct 2, 2021
92e99aa
Merge branch 'pyjanitor-devs:dev' into dev
samukweku Oct 3, 2021
5563104
Merge branch 'pyjanitor-devs:dev' into dev
samukweku Oct 7, 2021
4987fc2
Merge branch 'pyjanitor-devs:dev' into dev
samukweku Oct 11, 2021
1a6ef85
Merge branch 'pyjanitor-devs:dev' into dev
samukweku Oct 17, 2021
ee2a51a
Merge branch 'pyjanitor-devs:dev' into dev
samukweku Oct 25, 2021
fe0fac6
Merge branch 'pyjanitor-devs:dev' into dev
samukweku Nov 1, 2021
b3c3bff
Merge branch 'pyjanitor-devs:dev' into dev
samukweku Nov 2, 2021
d5a4169
Merge branch 'pyjanitor-devs:dev' into dev
samukweku Nov 5, 2021
9a6cadf
Merge branch 'pyjanitor-devs:dev' into dev
samukweku Dec 10, 2021
375bb47
Merge branch 'pyjanitor-devs:dev' into dev
samukweku Jan 13, 2022
361a694
xlsx_table
Jan 16, 2022
c44cab3
test files
samukweku Jan 16, 2022
e2665df
tests
Jan 17, 2022
d091e2e
tests
Jan 17, 2022
90f14d1
tests
Jan 17, 2022
c1732b9
changelog
Jan 17, 2022
e82849f
updates
Jan 17, 2022
3950fbf
updates
Jan 18, 2022
a6672db
Merge branch 'dev' into excel_related
samukweku Jan 18, 2022
64 changes: 64 additions & 0 deletions janitor/io.py
@@ -2,6 +2,7 @@
import subprocess
from glob import glob
from io import StringIO
from openpyxl import load_workbook
from typing import Iterable, Union

import pandas as pd
@@ -110,3 +111,66 @@ def read_commandline(cmd: str, **kwargs) -> pd.DataFrame:
    else:
        outcome = outcome.stdout
    return pd.read_csv(StringIO(outcome), **kwargs)


def xlsx_table(
    path: str,
    sheetname: str,
    table: Union[str, list, tuple] = None,
    header: bool = True,
) -> Union[pd.DataFrame, dict]:
    """
    Returns a DataFrame of values in a table in the Excel file.
    If exactly one table is selected (for example, `table` is a single name,
    or the sheet holds only one table), a pandas DataFrame is returned;
    if more than one table is selected (`table` is None,
    or a list/tuple of names), a dictionary of DataFrames is returned,
    where the keys of the dictionary are the table names.

    :param path: Path to the Excel file.
    :param sheetname: Name of the sheet
        from which the tables are to be extracted.
    :param table: Name of a table, or list of tables in the sheet.
    :param header: If the first row should be used as column names.
    :raises ValueError: If there are no tables in the sheet,
        or if a requested name is not a table in the sheet.
    :returns: A pandas DataFrame, or a dictionary of DataFrames.
    """

    wb = load_workbook(filename=path, read_only=False, keep_links=False)
    ws = wb[sheetname]

    tables = ws.tables
    if not tables:
        raise ValueError(f"There is no table in `{sheetname}` sheet.")
    # (table name, cell range) pairs
    contents = tables.items()

    if isinstance(table, str):
        table = [table]
    if table is not None:
        # `check` is janitor's type-validation helper,
        # assumed to be imported outside this hunk.
        check("table", table, [list, tuple])
        for entry in table:
            # Test membership against the table names;
            # `contents` holds (name, range) pairs, so a bare name never matches it.
            if entry not in tables:
                raise ValueError(
                    f"{entry} is not a table in the {sheetname} sheet."
                )
        contents = ((key, value) for key, value in contents if key in table)

    frame = {}
    for key, value in contents:
        # Lazily read the cell values of the table's range, row by row.
        content = ((cell.value for cell in row) for row in ws[value])
        if header:
            column_names = next(content)
            content = zip(*content)
            frame[key] = dict(zip(column_names, content))
        else:
            content = zip(*content)
            frame[key] = {f"C{num}": val for num, val in enumerate(content)}

    if len(frame) == 1:
        _, frame = frame.popitem()
        return pd.DataFrame(frame)
    return {key: pd.DataFrame(value) for key, value in frame.items()}