Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add fn to make path from prot_id #18

Merged
merged 4 commits into from
Nov 19, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
30 changes: 30 additions & 0 deletions pyriksdagen/utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -516,3 +516,33 @@ def get_context_sequences_for_protocol(protocol, context_type, target_length = 1
output_dict = {'id' : id_list,
'text' : texts_with_contexts}
return output_dict


def pathize_protocol_id(protocol_id):
"""
Turn the protocol id into a path string from riksdagen-records root. Handles 'unpadded' protocol numbers.
"""

spl = protocol_id.split('-')
parliament_year = spl[1]
suffix = ""
if len(spl) == 4:
nr = spl[3]
pren = '-'.join(spl[:3])
else:
nr = spl[5]
pren = '-'.join(spl[:5])
if len(spl) == 7:
suffix = f"-{spl[-1]}"
path_ = f"data/{parliament_year}/{pren}-{nr:0>3}{suffix}.xml"
if os.path.exists(path_):
# if path_ exists, return path_
return path_
else:
# try to remove strings like "extra" and "höst" from path_ ...
# (these were removed from some of the protocol filenames after goldstandard annotation)
# check again if new path_ exists and return
path_ = re.sub(f'((extra)?h[^-]+st|")', '', path_)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's going on here? Add a comment maybe?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if os.path.exists(path_):
return path_
raise FileNotFoundError(f"Can't find {path_}")
Loading