SAS7BDAT parser: Speed up RLE/RDC decompression #47405
Merged
Changes from 16 commits
Commits (22)
0e02b8d Speed up RLE/RDC decompression (jonashaag)
eca0db4 Update tests (jonashaag)
041a04b ssize_t -> size_t (jonashaag)
0451c31 Merge branch 'main' into sas/decompress3 (jonashaag)
f2c8b0e Update sas.pyx (jonashaag)
17c72f8 Merge branch 'main' into sas/decompress3 (jonashaag)
91f8436 Merge branch 'main' into sas/decompress3 (jonashaag)
221f20c Merge branch 'main' into sas/decompress3 (jonashaag)
213b08f Don't use null byte as except value (jonashaag)
4b24773 Nit (jonashaag)
263aea6 Simplify condition (jonashaag)
785f752 Review feedback (jonashaag)
1f36f99 Docstring -> comment (jonashaag)
26aea28 Revert "Simplify condition" (jonashaag)
afdfc1c Merge branch 'main' into sas/decompress3 (jonashaag)
6a3fd55 Merge branch 'main' into sas/decompress3 (jonashaag)
fc5621b Merge branch 'main' into sas/decompress3 (jonashaag)
0d3daa8 Merge branch 'main' into sas/decompress3 (jonashaag)
0588d18 Merge branch 'main' into sas/decompress3 (jonashaag)
21ba0b2 Lint (jonashaag)
55cceb7 Speed up some Cython `except` (jonashaag)
ba9b019 Typo (jonashaag)
@@ -1,30 +1,23 @@
-import os
+from pathlib import Path

 from pandas import read_sas

+ROOT = Path(__file__).parents[3] / "pandas" / "tests" / "io" / "sas" / "data"
+

 class SAS:
+    def time_read_sas7bdat(self):
+        read_sas(ROOT / "test1.sas7bdat")

-    params = ["sas7bdat", "xport"]
-    param_names = ["format"]
+    def time_read_xpt(self):
+        read_sas(ROOT / "paxraw_d_short.xpt")

-    def setup(self, format):
-        # Read files that are located in 'pandas/tests/io/sas/data'
-        files = {"sas7bdat": "test1.sas7bdat", "xport": "paxraw_d_short.xpt"}
-        file = files[format]
-        paths = [
-            os.path.dirname(__file__),
-            "..",
-            "..",
-            "..",
-            "pandas",
-            "tests",
-            "io",
-            "sas",
-            "data",
-            file,
-        ]
-        self.f = os.path.join(*paths)
+    def time_read_sas7bdat_2(self):
+        next(read_sas(ROOT / "0x00controlbyte.sas7bdat.bz2", chunksize=11000))

-    def time_read_sas(self, format):
-        read_sas(self.f, format=format)
+    def time_read_sas7bdat_2_chunked(self):
+        for i, _ in enumerate(
+            read_sas(ROOT / "0x00controlbyte.sas7bdat.bz2", chunksize=1000)
+        ):
+            if i == 10:
+                break
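
The two `time_read_sas7bdat_2*` benchmarks appear to be sized to cover the same number of rows: one chunk of 11000 rows versus a loop that stops once `i == 10`, that is, after 11 chunks of 1000 rows. The sketch below checks that arithmetic with a hypothetical stand-in generator (`fake_reader` is not a pandas API, and the 50000-row file size is an assumption; the real row count of the test file is not stated here).

```python
def fake_reader(total_rows, chunksize):
    """Hypothetical stand-in for a chunked reader: yields the size of each chunk."""
    remaining = total_rows
    while remaining > 0:
        n = min(chunksize, remaining)
        remaining -= n
        yield n


def rows_single_chunk(total_rows):
    # Mirrors next(read_sas(..., chunksize=11000)): consume only the first chunk.
    return next(fake_reader(total_rows, 11000))


def rows_chunked(total_rows):
    # Mirrors the enumerate loop: break once i == 10, i.e. after 11 chunks.
    rows = 0
    for i, n in enumerate(fake_reader(total_rows, 1000)):
        rows += n
        if i == 10:
            break
    return rows


# Assumed file size of 50000 rows, chosen only so that neither reader runs dry.
single = rows_single_chunk(50000)
chunked = rows_chunked(50000)
```

Under that assumption both variants consume the same number of rows, so a timing difference between them would mostly reflect per-chunk overhead rather than the amount of data decoded.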
IIUC this is a style choice orthogonal to the rest of the PR? No real problem with it, but in general it's best to minimize these to make it easier to focus on the important bits.
The ASV file would’ve been very confusing if I had left the old code: my additions can’t use it, so we’d have ended up with two almost identical but different versions.
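
For readers comparing the two versions of the benchmark file, here is a minimal sketch suggesting that the `Path(__file__).parents[3]` form resolves to the same data directory as the old `os.path.join` chain with three `".."` components. The benchmark file location `bench_file` is an assumption made for illustration, and POSIX-only path classes are used so the result does not depend on the platform.

```python
import posixpath
from pathlib import PurePosixPath

# Hypothetical location of the benchmark module; an assumption for illustration only.
bench_file = "/repo/asv_bench/benchmarks/io/sas.py"


def old_style(file_path, name):
    # Old approach: directory of the file, three levels up, then down into the test data.
    parts = [
        posixpath.dirname(file_path),
        "..",
        "..",
        "..",
        "pandas",
        "tests",
        "io",
        "sas",
        "data",
        name,
    ]
    return posixpath.join(*parts)


def new_style(file_path, name):
    # New approach: parents[3] is the directory four levels above the file itself.
    root = PurePosixPath(file_path).parents[3] / "pandas" / "tests" / "io" / "sas" / "data"
    return root / name


# normpath collapses the ".." components so the two results can be compared directly.
old = PurePosixPath(posixpath.normpath(old_style(bench_file, "test1.sas7bdat")))
new = new_style(bench_file, "test1.sas7bdat")
```

Because `parents[0]` is the file's own directory, `parents[3]` matches going up three further levels from `dirname(__file__)`, which is what the three `".."` entries did.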