Expanding the number of players/clubs/competitions included in the hosted instance #51

Open
tvqt opened this issue Dec 17, 2022 · 5 comments
Labels
question Further information is requested

Comments

@tvqt
Contributor

tvqt commented Dec 17, 2022

I've been joining the data with the FIFA 23 player database, and only about half of the players in the FIFA database are present in the scraped Transfermarkt data. The rest are on Transfermarkt, but have not been scraped. They fall into a few different categories:

  1. Non-European leagues - teams from Brazil, the United States, Argentina, Australia, Chile etc.
  2. Lower-level European leagues - there are no teams from League One or lower in England, Ligue 2 or lower in France, LaLiga 2 or lower in Spain etc.

This works out to be about 10,000 players, so it would be great to find a way of incorporating them (if it doesn't make it too unwieldy!)
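
For anyone wanting to reproduce this kind of coverage check, here is a minimal sketch of one way to do it. It assumes both datasets have been loaded as lists of dicts with a `name` field; that field name and the helper names `normalize_name` and `split_by_coverage` are hypothetical, and matching on names alone is only a rough approximation (a real join would probably also need club or date of birth).

```python
import unicodedata


def normalize_name(name: str) -> str:
    """Lowercase, strip accents and collapse whitespace so names compare loosely."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_accents.lower().split())


def split_by_coverage(fifa_players, scraped_players):
    """Split FIFA players into those found in the scraped data and those missing.

    Both arguments are lists of dicts with a "name" key (an assumed layout).
    """
    scraped_keys = {normalize_name(p["name"]) for p in scraped_players}
    matched, missing = [], []
    for player in fifa_players:
        if normalize_name(player["name"]) in scraped_keys:
            matched.append(player)
        else:
            missing.append(player)
    return matched, missing
```

Grouping the `missing` list by league or country would then show which competitions account for most of the gap.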

@tvqt changed the title from "Expanding the number of players/clubs/competitions" to "Expanding the number of players/clubs/competitions included in the hosted instance" on Dec 17, 2022
@dcaribou
Owner

@tvqt - Thanks for the suggestion.

I haven't really tried to scrape non-EU and lower-level leagues, but I'd assume that's possible without changing the scraper, by providing appropriate parameters / parent files.

Is your question whether all these leagues could be added to the datasets? If so, perhaps we can discuss it in a new issue in https://github.com/dcaribou/transfermarkt-datasets

@dcaribou added the "question" label (Further information is requested) on Dec 24, 2022
@visheugene

@dcaribou, hi! And thanks for the nice product.
Why do you scrape only up to 25 competitions per type (first_tier, domestic_cup...) from confederations?

@dcaribou
Owner

dcaribou commented Jan 9, 2023

Hey @visheugene.

There's no explicit limit on the number of competitions scraped by the competitions crawler. However, this crawler only scrapes the first page of the competitions list on the confederation page, which contains exactly 25 competitions.

The reason why it scrapes the first page only is that it was simple enough and it already covered most relevant competitions (top 25 countries by market cap), so I stopped there.

It should not be too hard to modify the competitions scraper so that it recurses through the rest of the pages in the competitions list, though, if that's needed.
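
As a rough illustration of the idea, and not the actual crawler code, paging through a list could look something like the sketch below. Everything in it is an assumption: `fetch_page` stands in for whatever requests and parses one page of the competitions list, the `href` key is a hypothetical unique identifier for a competition, and `max_pages` is just a safety cap.

```python
from typing import Callable, Dict, Iterator, List


def iter_all_competitions(
    fetch_page: Callable[[int], List[Dict[str, str]]],
    max_pages: int = 50,
) -> Iterator[Dict[str, str]]:
    """Yield competitions from page 1 onwards until a page adds nothing new.

    fetch_page(n) is a hypothetical callable returning the rows on page n,
    each row being a dict with at least an "href" key.
    """
    seen = set()
    for page_number in range(1, max_pages + 1):
        rows = fetch_page(page_number)
        new_rows = [row for row in rows if row["href"] not in seen]
        # Stop on an empty page, or on a page that only repeats earlier rows
        # (some sites serve the last page again when asked for one past the end).
        if not new_rows:
            break
        for row in new_rows:
            seen.add(row["href"])
            yield row
```

In a crawler the same stopping rule would apply, with `fetch_page` replaced by a request for the next page of the list.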

@ScottishWolverine

Hey @dcaribou,

Would you be able to help me with modifying the scraper so it recurses through the rest of the pages in the competitions list? I'm having difficulties setting this up.

@dcaribou
Owner

dcaribou commented Nov 9, 2023

Hey @ScottishWolverine. Sure. If you are having problems setting things up, feel free to raise a new issue describing your problem.
