Collaborate with Lund on 19th century speaker identification #42

Open
MansMeg opened this issue Sep 6, 2024 · 3 comments

Comments


MansMeg commented Sep 6, 2024

At Lund they have fixed the speaker mapping during the 19th century. We should add this to our corpus.

@fredrik1984

After meeting with Agustin, it seems that they did not get far with this (or did nothing at all). However, they would like to help improve the mapping quality for the 19th century; see, for example, #323 in the backlog (Westac issue).

ninpnin changed the title from "Add data from Lund on speaker identification" to "Collaborate with Lund on 19th century speaker identificatio" on Sep 17, 2024
ninpnin changed the title from "Collaborate with Lund on 19th century speaker identificatio" to "Collaborate with Lund on 19th century speaker identification" on Sep 17, 2024

fredrik1984 commented Oct 23, 2024

Let's use this as the overall issue for the current work on improving the mapping algorithm (connecting each speech to the correct MP).

  • @ninpnin and @BobBorges will create an aggregated CSV file of the most common unknown speaker introductions in the Swerik corpus. It should include, among other things, links to the speeches, the time period in which each introduction occurs, and a column for entering the correct MP-ID.
  • Agustin Goanaga and his team at Lund University will use the CSV file to add MP-IDs to common unknown speaker introductions from the 19th century. We should also schedule a meeting with them to agree on how to do this work.
  • Lotta and Mattias can help add MP-IDs for common unknown speaker introductions from the 20th century, plus difficult cases that Agustin et al. could not resolve. See also "url in <pb> elem points to wrong page" (#24).
  • Update the records corpus.
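
As a rough sketch of the first step above, the aggregation could look something like the following. This is only an illustration, not the actual Swerik tooling: the input format (one dict per unknown speaker introduction with `intro`, `year`, and `speech_url` keys), the output column names, the helper names, and the sample rows are all assumptions made for the example.

```python
import csv
import io
from collections import defaultdict

# Hypothetical output columns; the real CSV layout is up to the maintainers.
FIELDS = ["intro", "count", "first_year", "last_year", "example_urls", "mp_id"]


def normalize(intro):
    """Collapse runs of whitespace so trivially different intros group together."""
    return " ".join(intro.split())


def aggregate_unknown_intros(rows, max_links=3):
    """Group unknown speaker introductions and summarize each group."""
    groups = defaultdict(lambda: {"count": 0, "years": [], "urls": []})
    for row in rows:
        group = groups[normalize(row["intro"])]
        group["count"] += 1
        group["years"].append(row["year"])
        if len(group["urls"]) < max_links:
            group["urls"].append(row["speech_url"])

    aggregated = [
        {
            "intro": intro,
            "count": g["count"],
            "first_year": min(g["years"]),
            "last_year": max(g["years"]),
            "example_urls": " ".join(g["urls"]),
            "mp_id": "",  # left empty for annotators to fill in
        }
        for intro, g in groups.items()
    ]
    # Most common introductions first; ties broken alphabetically.
    aggregated.sort(key=lambda r: (-r["count"], r["intro"]))
    return aggregated


def write_csv(aggregated, handle):
    """Write the aggregated rows to an open text handle as CSV."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(aggregated)


# Made-up sample rows, purely for illustration.
sample = [
    {"intro": "Herr  Andersson:", "year": 1871, "speech_url": "https://example.org/speech/1"},
    {"intro": "Herr Andersson:", "year": 1868, "speech_url": "https://example.org/speech/2"},
    {"intro": "Grefve Posse:", "year": 1870, "speech_url": "https://example.org/speech/3"},
]

aggregated = aggregate_unknown_intros(sample)
buffer = io.StringIO()
write_csv(aggregated, buffer)
```

Sorting by frequency means annotators see the introductions that would resolve the most speeches first, and the empty `mp_id` column is the one they would fill in.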

@BobBorges
Contributor

I'll come up with the CSV file ASAP (today-ish).
