
Ambiguous TV shows such as "Wilfred.US" wreck title parsing #75

Open
dezza opened this issue Mar 24, 2024 · 3 comments


dezza commented Mar 24, 2024

Hello.

Nice lib, but I found one issue that I think needs fixing. I'll gladly help as long as we can agree on what the issue is.

For example, Wilfred exists as both an AU and a US show.

AU (first released, 2007)

https://www.themoviedb.org/tv/3297

US (2011)

https://www.themoviedb.org/tv/39525-wilfred

This means that the US release's title is currently parsed as "Wilfred US".

It seems a safe assumption that a capitalized country-code tag (US|UK|AU|NZ|CA) after the show name marks an ambiguous title and narrows it down to the show from that country.

Of course, on rare occasions a title could be something like Toys.R.Us, but it is unlikely to be capitalized as "US". If it is, that's a real corner case not worth optimizing for!
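A minimal sketch of the check described above, assuming the title part of the release name has already been separated from the episode tag and is still dot-separated. The helper `endsWithCountryCode` is hypothetical (not part of the library), and only the five codes mentioned in this issue are recognized. Because the lookup is case-sensitive, `Wilfred.US` qualifies while `Toys.R.Us` does not.

```javascript
// Hypothetical helper: does a dot-separated title end in a capitalized country code?
// Only the five codes mentioned in this issue are recognized.
const COUNTRY_CODES = new Set(['US', 'UK', 'AU', 'NZ', 'CA'])

function endsWithCountryCode(dottedTitle) {
  const tokens = dottedTitle.split('.')
  // A lone token ("US") is treated as the whole title, not a country suffix
  if (tokens.length < 2) return false
  // Set lookup is case-sensitive: "US" matches, "Us" does not
  return COUNTRY_CODES.has(tokens[tokens.length - 1])
}

console.log(endsWithCountryCode('Wilfred.US'))  // true
console.log(endsWithCountryCode('Wilfred'))     // false
console.log(endsWithCountryCode('Toys.R.Us'))   // false
```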

https://scenerules.org/html/2020_WDX_unformatted.html

    19.8) Different shows with the same title produced in different countries must have the ISO 3166-1 alpha 2 country code in the show name.
        19.8.1) Except for UK shows, which must use UK, not GB.
        19.8.2) This rule does not apply to an original show, only shows that succeed the original.
                e.g. The.Office.S01E01 and The.Office.US.S01E01.
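Rule 19.8.1 means the scene tag and the actual ISO 3166-1 alpha-2 code differ for British shows. A minimal sketch, assuming the parser would want to expose a real ISO code (for example, to compare against metadata that stores `GB`); the helper name `sceneCodeToIso` is hypothetical and not part of the library.

```javascript
// Hypothetical helper: convert a scene country tag to an ISO 3166-1 alpha-2 code.
// Rule 19.8.1: scene releases use "UK", whereas ISO 3166-1 assigns "GB".
function sceneCodeToIso(tag) {
  // Only the shape is checked (exactly two uppercase letters); a fuller version
  // would also validate the result against the ISO 3166-1 list.
  if (!/^[A-Z]{2}$/.test(tag)) return null
  return tag === 'UK' ? 'GB' : tag
}

console.log(sceneCodeToIso('UK')) // GB
console.log(sceneCodeToIso('US')) // US
console.log(sceneCodeToIso('Us')) // null
```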

scttcper (Owner) commented

I've mostly ignored the TV show parsing; if you want to improve it, feel free. I think I fixed a similar issue in movies by looking for the movie year and assuming everything before it was the title. I'm sure something similar can be done for TV.
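Following the suggestion above, a minimal sketch that uses the `SxxEyy` episode tag as the anchor, the way the movie year is used: everything before it is the title, and a trailing capitalized country code is split off into its own field. The function `parseShowName` and the `country` field are hypothetical, not the library's API, and only the five codes from this issue are recognized.

```javascript
// Hypothetical sketch: anchor TV parsing on the SxxEyy tag, like the movie year.
const COUNTRY_CODES = new Set(['US', 'UK', 'AU', 'NZ', 'CA'])

function parseShowName(releaseName) {
  const tokens = releaseName.split('.')
  // Find the first token that looks like an episode tag, e.g. "S01E01"
  const epIndex = tokens.findIndex(t => /^S\d{2,}E\d{2,}$/i.test(t))
  if (epIndex < 1) return null // no episode tag, or nothing before it

  const titleTokens = tokens.slice(0, epIndex)
  let country = null
  // Case-sensitive lookup, and never strip the only remaining title word
  if (titleTokens.length > 1 && COUNTRY_CODES.has(titleTokens.at(-1))) {
    country = titleTokens.pop()
  }
  const [, season, episode] = tokens[epIndex].match(/^S(\d+)E(\d+)$/i)
  return {
    title: titleTokens.join(' '),
    country,
    season: Number(season),
    episode: Number(episode),
  }
}

console.log(parseShowName('Wilfred.US.S01E01.720p.HDTV.x264-GROUP'))
console.log(parseShowName('The.Office.S01E01.720p.HDTV.x264-GROUP'))
```

Under rule 19.8.2 the original show carries no code, so in this sketch `country: null` means "no tag present", not "country unknown".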

dezza (Author) commented Mar 25, 2024

I guess there is a small chance of a corner case like:

Food.in.the.US # the country

But then, why would a show title end with "the"? (That is what you would get if a trailing US were stripped by default.) That is something you could check for if it ever became a problem, which is unlikely, though the chance is never zero. Assuming anything about titles is flaky at best; there is always the possibility of another weird title.

It's unfortunate that the scene does it this way, because there is no way to tell whether the code is actually part of the title except for small clues, such as a preceding "the" as mentioned above.

dezza (Author) commented Apr 3, 2024

I wrote some logic for this that I think makes sense; it should show how I think this is most reasonably handled.

If the next-to-last word is not "the", the trailing code is almost certainly not referring to an actual country, so it gets stripped.

```js
/**
 * Strip a trailing country code from a TV show title, e.g. "Wilfred US" -> "Wilfred".
 * @param {SceneTags} scenetags
 */
function stripTVShowCountry(scenetags) {
  const words = scenetags.title.split(' ')
  if (scenetags.type === 'tvshow' &&
      // Keep at least one word, so a title of just "US" is left alone
      words.length > 1 &&
      // Whole-word, case-sensitive match: "BONUS" or "Us" are not country codes
      /^(?:US|UK|NZ|AU|CA)$/u.test(words.at(-1)) &&
      // A preceding "the" ("Food in the US") suggests the country is part of the title
      words.at(-2).toLowerCase() !== 'the'
  ) {
    scenetags.title = words.slice(0, -1).join(' ')
  }
  return scenetags
}

// Ends with country
console.log("Ends with country")
console.log(stripTVShowCountry({title: 'Wilfred US', type: 'tvshow'}))
console.log(stripTVShowCountry({title: 'Oy mate Crocodile Hunter AU', type: 'tvshow'}))

console.log()

// Ends with an actual country and the next-to-last word is "the": concludes it's a real title
console.log("Ends with country, next last is 'the'. Concludes it's a real title")
console.log(stripTVShowCountry({title: 'Soldiers in the US', type: 'movie'}))
console.log(stripTVShowCountry({title: 'Food in the US', type: 'tvshow'}))
console.log(stripTVShowCountry({title: 'Queen of the UK', type: 'tvshow'}))
```

Output:

```
Ends with country
{ title: 'Wilfred', type: 'tvshow' }
{ title: 'Oy mate Crocodile Hunter', type: 'tvshow' }

Ends with country, next last is 'the'. Concludes it's a real title
{ title: 'Soldiers in the US', type: 'movie' }
{ title: 'Food in the US', type: 'tvshow' }
{ title: 'Queen of the UK', type: 'tvshow' }
```
