s_attributes() for speech bundle does not work for attributes on document level in nested corpora #283

Closed
ChristophLeonhardt opened this issue Nov 21, 2023 · 1 comment


@ChristophLeonhardt
Contributor

Using GermaParl2 and polmineR v0.8.9.9001, I noticed that s_attributes() works in some but not all circumstances for bundles created by as.speeches().

It works for attributes at the "speaker level", such as "speaker_name" or "speaker_role". It does not work for attributes at the "document level", such as "protocol_date" or "protocol_lp". For these attributes, the call fails with the following error message:

Error in check_strucs(corpus = corpus, s_attribute = s_attribute, strucs = struc, :
highest value of strucs may not be larger than size of structural attribute
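
One possible reading of this message, offered as an assumption and not as a confirmed diagnosis: region ids (strucs) taken from the speaker-level attribute are checked against the smaller number of regions of the document-level attribute. The base R sketch below only illustrates that kind of bounds check. It does not call polmineR, the helper `strucs_within_bounds()` is hypothetical, and the sizes are made up.

```r
# Toy illustration in base R; this is NOT polmineR's check_strucs().
# Hypothetical sizes: the document-level attribute has fewer regions
# than the speaker-level attribute nested inside it.
n_regions_speaker <- 10L        # hypothetical number of speaker regions
n_regions_protocol_date <- 3L   # hypothetical number of protocol_date regions

# Hypothetical zero-based region ids at the speaker level.
speaker_strucs <- 0:9

# Hypothetical helper: TRUE if the largest id still fits the attribute size.
strucs_within_bounds <- function(strucs, size) {
  max(strucs) <= size - 1L
}

# Ids checked against the attribute they belong to: within bounds.
ok_same_level <- strucs_within_bounds(speaker_strucs, size = n_regions_speaker)

# The same ids checked against the smaller document-level attribute: out of bounds.
ok_other_level <- strucs_within_bounds(speaker_strucs, size = n_regions_protocol_date)

print(c(same_level = ok_same_level, other_level = ok_other_level))
```

If this reading is right, the error would only show up when the requested attribute has fewer regions than the attribute the bundle was split on, which matches the speaker-level versus document-level pattern described above.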

A reproducible example:

library(polmineR)

speeches_germaparl <- corpus("GERMAPARL2") |>
  as.speeches(s_attribute_date = "protocol_date",
              s_attribute_name = "speaker_name",
              gap = 0)

s_attributes(speeches_germaparl, s_attribute = "protocol_date")

For other bundles, this works fine:

date_bundle <- corpus("GERMAPARL2") |>
  split(s_attribute = "protocol_date")

s_attributes(date_bundle, s_attribute = "protocol_lp")
s_attributes(date_bundle, s_attribute = "speaker_name")
@ablaette
Collaborator

Use the following (lightweight) example to check the solution I implemented.

library(polmineR)
use("polmineR")

speeches <- corpus("GERMAPARLMINI") |>
  as.speeches(
    s_attribute_date = "protocol_date",
    s_attribute_name = "speaker",
    gap = 300,
    progress = FALSE
  )

# s-attribute is sibling
s_attributes(speeches, s_attribute = "party")

# s-attribute is ancestor
s_attributes(speeches, s_attribute = "protocol_date")

# s-attribute is descendant
speeches_germaparl <- corpus("GERMAPARLMINI") |>
  split(s_attribute = "protocol_date") |>
  s_attributes("speaker")
