SAS7BDAT parser: Speed up RLE/RDC decompression #47405

Merged on Oct 3, 2022 (22 commits)
37 changes: 15 additions & 22 deletions asv_bench/benchmarks/io/sas.py
@@ -1,30 +1,23 @@
-import os
+from pathlib import Path

 from pandas import read_sas

+ROOT = Path(__file__).parents[3] / "pandas" / "tests" / "io" / "sas" / "data"
Member

IIUC this is a style choice orthogonal to the rest of the PR? No real problem with it, but in general it's best to minimize these to make it easier to focus on the important bits.

Contributor Author

@jonashaag jonashaag Jul 7, 2022


The ASV file would’ve been very confusing if I had left the old code in place: my additions can’t use it, so we’d end up with two almost identical but different versions.
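For readers comparing the two styles discussed here, the following is a minimal sketch (not part of the PR) suggesting that the `pathlib` expression and the removed `os.path.join` list should resolve to the same data directory. The file location `/repo/asv_bench/benchmarks/io/sas.py` and both helper names are hypothetical and exist only for illustration.

```python
import os
from pathlib import Path

# Path components shared by both styles, taken from the benchmark file.
DATA_PARTS = ("pandas", "tests", "io", "sas", "data")


def root_via_os_path(bench_file):
    """Old style: go up three directories from the benchmark's directory."""
    joined = os.path.join(os.path.dirname(bench_file), "..", "..", "..", *DATA_PARTS)
    # normpath collapses the ".." segments so the result can be compared.
    return Path(os.path.normpath(joined))


def root_via_pathlib(bench_file):
    """New style: parents[3] is the fourth ancestor of the file itself."""
    return Path(bench_file).parents[3].joinpath(*DATA_PARTS)


# Hypothetical checkout location, assumed only for this illustration.
bench_file = os.path.join(os.sep, "repo", "asv_bench", "benchmarks", "io", "sas.py")

old_root = root_via_os_path(bench_file)
new_root = root_via_pathlib(bench_file)
print(old_root == new_root)
```

Under these assumptions both helpers point at the same directory; the `pathlib` form simply expresses the "three levels up" step as an index instead of repeated `".."` entries.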



 class SAS:
+    def time_read_sas7bdat(self):
+        read_sas(ROOT / "test1.sas7bdat")

-    params = ["sas7bdat", "xport"]
-    param_names = ["format"]
+    def time_read_xpt(self):
+        read_sas(ROOT / "paxraw_d_short.xpt")

-    def setup(self, format):
-        # Read files that are located in 'pandas/tests/io/sas/data'
-        files = {"sas7bdat": "test1.sas7bdat", "xport": "paxraw_d_short.xpt"}
-        file = files[format]
-        paths = [
-            os.path.dirname(__file__),
-            "..",
-            "..",
-            "..",
-            "pandas",
-            "tests",
-            "io",
-            "sas",
-            "data",
-            file,
-        ]
-        self.f = os.path.join(*paths)
+    def time_read_sas7bdat_2(self):
+        next(read_sas(ROOT / "0x00controlbyte.sas7bdat.bz2", chunksize=11000))

-    def time_read_sas(self, format):
-        read_sas(self.f, format=format)
+    def time_read_sas7bdat_2_chunked(self):
+        for i, _ in enumerate(
+            read_sas(ROOT / "0x00controlbyte.sas7bdat.bz2", chunksize=1000)
+        ):
+            if i == 10:
+                break
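A note on the chunked benchmark above: breaking when `i == 10` means the loop consumes eleven chunks (indices 0 through 10), not ten. The sketch below illustrates this with a stand-in generator instead of a real SAS file. `fake_chunks` and both helper functions are hypothetical names introduced here, and `itertools.islice` is shown only as an alternative way to express the same cap, not as what the PR uses.

```python
from itertools import islice


def fake_chunks(total_rows, chunksize):
    """Stand-in for a chunked reader: yields ranges of row indices."""
    for start in range(0, total_rows, chunksize):
        yield range(start, min(start + chunksize, total_rows))


def count_with_enumerate(reader, last_index):
    """Mirror the benchmark loop: stop after the chunk with index last_index."""
    consumed = 0
    for i, _ in enumerate(reader):
        consumed += 1
        if i == last_index:
            break
    return consumed


def count_with_islice(reader, n_chunks):
    """Alternative spelling: take at most n_chunks chunks from the reader."""
    return sum(1 for _ in islice(reader, n_chunks))


# 20 chunks are available; both approaches stop after 11 of them.
enumerate_count = count_with_enumerate(fake_chunks(20_000, 1_000), last_index=10)
islice_count = count_with_islice(fake_chunks(20_000, 1_000), n_chunks=11)
print(enumerate_count, islice_count)
```

If the reader yields fewer chunks than the cap, both variants simply stop when it is exhausted.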