GitHub Trending widget #1578
dzmitry-kankalovich
started this conversation in
Show and tell
Replies: 2 comments 1 reply
-
Tagging @senorprogrammer for visibility :)
-
Great work on the GitHub Trending widget! It looks impressive, and your approach to scraping the GitHub Trending page seems well thought out given the limitations of their API. Opening a PR sounds like a good idea. Also, you might want to consider checking out Crawlbase for reliable scraping. Keep it up!
-
Hello everyone!
I've been using `wtf` for a while now, mostly as a dashboard to read IT-related news daily. So naturally I use the `hackernews`, `feedreader` and `subreddit` widgets a lot.
However, one thing I was missing is a GitHub Trending repositories feed. It's a great source of new info on what's going on. GitHub itself made it publicly available at https://github.com/trending, but it's not part of their public API, and there is no RSS feed.
There are some third-party services that source this info by scraping the page, but they're unreliable.
The widget
Anyway, I did a bit of binge coding and implemented the GitHub Trending widget. It uses colly to scrape the GH Trending page and displays it as a widget in WTF. It works and is configured in much the same way as the `hackernews`, `feedreader` and `subreddit` widgets.
I will talk about the API vs. scraping situation and why I went there, but first, here are some examples.
Examples
Configuration example
There are reasonable defaults for each custom configuration field, but with all the bells and whistles it'll look like this:
Why scraping
The situation is that GH does not have a public API to pull this data. The closest you can get is their Search Repository API, which lets you pull repositories created after a given time and rank them by stars. This has many problems: for example, it misses repositories that were created earlier but are only getting recognition now, which is often the case. The API allows filtering by programming language, but sadly not by spoken language. As it stands right now, your results will likely contain a lot of repositories that use Chinese (zh) as the only documentation language.
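To make the Search Repository API approach concrete, here is a minimal sketch that only builds the request URL: repositories created after a given date, optionally restricted to one programming language, ranked by stars. The function name `searchURL` and its parameters are hypothetical and are not taken from the widget's code; the `created:` and `language:` qualifiers and the `sort`, `order` and `per_page` parameters are those of GitHub's public repository search endpoint.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
	"time"
)

// searchURL (hypothetical helper) builds a repository-search request that
// approximates "trending": repositories created after createdAfter
// (formatted as YYYY-MM-DD), ranked by star count.
// An empty progLang means "any programming language".
func searchURL(createdAfter, progLang string, perPage int) string {
	q := "created:>" + createdAfter
	if progLang != "" {
		q += " language:" + progLang
	}
	params := url.Values{}
	params.Set("q", q)
	params.Set("sort", "stars")
	params.Set("order", "desc")
	params.Set("per_page", strconv.Itoa(perPage))
	// Encode escapes the qualifiers and sorts the parameters by key.
	return "https://api.github.com/search/repositories?" + params.Encode()
}

func main() {
	// Repositories created during the last seven days.
	weekAgo := time.Now().AddDate(0, 0, -7).Format("2006-01-02")
	fmt.Println(searchURL(weekAgo, "go", 10))
}
```

Note that the query has nowhere to put a spoken-language filter, and the `created:` cutoff is exactly what excludes older repositories that are only now gaining stars.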
The https://github.com/trending page, however, does support filtering by spoken language as well as programming language. The trouble is that there is no underlying API to reverse engineer: the page is served already rendered with data, so scraping seems to be the only option here.
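As an illustration of how those filters appear in the page address, the sketch below builds a Trending URL. It assumes the URL shape the page itself uses when you pick filters in the browser (programming language as a path segment, plus `since` and `spoken_language_code` query parameters). This is observed behaviour rather than a documented API, so it may change; `trendingURL` is a hypothetical helper, not the widget's actual code.

```go
package main

import (
	"fmt"
	"net/url"
)

// trendingURL (hypothetical helper) builds the address of the GitHub Trending
// page for the given filters. Any argument may be empty to skip that filter.
// Assumption: the parameter names mirror what the page uses in the browser.
func trendingURL(progLang, spokenLang, since string) string {
	u := url.URL{Scheme: "https", Host: "github.com", Path: "/trending"}
	if progLang != "" {
		u.Path += "/" + progLang // e.g. /trending/go
	}
	q := url.Values{}
	if since != "" {
		q.Set("since", since) // e.g. daily, weekly, monthly
	}
	if spokenLang != "" {
		q.Set("spoken_language_code", spokenLang) // e.g. en
	}
	u.RawQuery = q.Encode()
	return u.String()
}

func main() {
	fmt.Println(trendingURL("go", "en", "daily"))
}
```

A scraper would then fetch this URL and extract the repository entries from the returned HTML.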
There are some third-party tools and services that do exactly that, but as of this moment they are not reliable (if you've found a seemingly good one, please share it with me).
So I implemented a widget that scrapes the page (by default) on startup/refresh, and that can be configured to use the Search Repository API as a fallback in case the scraper stops working or GH bot detection (if there is any) causes trouble.
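The scrape-first, API-as-fallback idea can be sketched roughly as follows. Everything here is hypothetical (the `Repo` and `Source` types, `fetchWithFallback`, and the stub sources) and is not the widget's actual code. The sketch also assumes that an empty scrape result counts as a failure, since a scraper whose selectors no longer match the page tends to return nothing rather than an error.

```go
package main

import (
	"errors"
	"fmt"
)

// Repo is a hypothetical, minimal record for one repository.
type Repo struct {
	Name  string
	Stars int
}

// Source is a hypothetical abstraction over anything that yields repositories,
// for example a page scraper or a Search API client.
type Source func() ([]Repo, error)

// fetchWithFallback tries primary first. If primary fails or returns nothing,
// it uses fallback when one is configured (non-nil).
func fetchWithFallback(primary, fallback Source) ([]Repo, error) {
	repos, err := primary()
	if err == nil && len(repos) > 0 {
		return repos, nil
	}
	if fallback == nil {
		if err == nil {
			err = errors.New("primary source returned no repositories")
		}
		return nil, err
	}
	return fallback()
}

// brokenScraper simulates a scraper that has stopped working.
func brokenScraper() ([]Repo, error) {
	return nil, errors.New("scrape failed: page layout changed")
}

// stubSearch simulates a working Search API client.
func stubSearch() ([]Repo, error) {
	return []Repo{{Name: "example/repo", Stars: 42}}, nil
}

func main() {
	repos, err := fetchWithFallback(brokenScraper, stubSearch)
	fmt.Println(repos, err)
}
```

Passing `nil` as the fallback models a configuration where the fallback is switched off, in which case the scraper's error is returned as-is.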
Anyway, I have the changes waiting in my fork, so if there is any interest, I can open a PR.