(Feature Request): Request for ability to download the RYD database. #473

Open
1 of 2 tasks
eclipsek20 opened this issue Feb 4, 2022 · 25 comments
Labels
enhancement New feature or request

Comments

@eclipsek20

Extension or Userscript?

Extension

Request or suggest a new feature!

As the title suggests, this would allow other people to work on a similar project by themselves; by not doing this RYD is basically going to become a monopoly. If they don't give this ability, I guess the saying "You either die a hero or live long enough to see yourself become the villain" makes a lot of sense.

Ways to implement this!

No response

Can you work on this?

  • Yes
  • No
@eclipsek20 eclipsek20 added the enhancement New feature or request label Feb 4, 2022
@SirLich
Contributor

SirLich commented Feb 6, 2022

by not doing this RYD is basically going to become a monopoly

Isn't that the point of a collective replacement for the dislike counter? Fracturing the data is the worst thing you could do.

@Mohamed3on

I'd love this to happen if only for data analysis reasons, to be able to sort videos by like/dislike counts, etc.

@eclipsek20
Author

by not doing this RYD is basically going to become a monopoly

Isn't that the point of a collective replacement for the dislike counter? Fracturing the data is the worst thing you could do.

What if the database gets deleted or, worse, the author decides to put up a paywall?

@dvingerh

dvingerh commented Feb 8, 2022

I believe it's relevant to mention that the scraped data of videos prior to the removal of the dislike counter can be found publicly here:
https://wiki.archiveteam.org/index.php/YouTube#Removal_of_public_video_dislikes_.28December_2021.29
https://archive.org/details/archiveteam_youtubedislikes

@severtheskyline

Because this extension has similar features to SponsorBlock (i.e. a public API), it would make sense to be able to download the database.

@eclipsek20
Author

I guess we will not receive any answers for this...

@sy-b
Contributor

sy-b commented Feb 20, 2022

@eclipsek20
Author

It seems that in spite of "positivity" towards this suggestion, no action to date has been taken apart from lip service.

@sy-b
Contributor

sy-b commented Apr 16, 2022

Userbase fragmentation might be a concern.

Originally posted by @Anarios in /issues/45#issuecomment-997230662, December 18, 2021 UTC

By the way - a bunch of copycat extensions died today once I enabled IP rate limiting.

They were just calling my API in their backend - no DB of their own, no caching - nothing. Just pretending to provide a service while in reality they didn't. Now imagine they had a DB dump and server code - what good would it do - more userbase fragmentation, less reliable votes? And all while using my work for free.

Posted here: #45 (comment)
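
For readers unfamiliar with the mechanism, here is a minimal Python sketch of per-IP rate limiting with a sliding window. It is an illustration only: the class name `IpRateLimiter`, its parameters, and the limits are assumptions, not RYD's actual backend code, which is not shown in this thread.

```python
import time
from collections import defaultdict, deque


class IpRateLimiter:
    """Hypothetical sliding-window limiter: at most max_requests per window per IP."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.hits = defaultdict(deque)  # ip -> timestamps of recently accepted requests

    def allow(self, ip):
        now = self.clock()
        recent = self.hits[ip]
        # Forget requests that have fallen out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_requests:
            return False  # over the limit: a server would typically answer HTTP 429
        recent.append(now)
        return True
```

Under a scheme like this, a third-party backend that forwards every user request from a single server address shares one quota, which would be consistent with proxy-only services failing once limiting was switched on.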

@eclipsek20
Author

That last statement is that of great concern, "And all while using my work for free.", whilst I do believe that he has some right to be compensated, it also greatly increases the probability of the scenario where the author goes rogue, e.g. a paywall. The statement "more userbase fragmentation, less reliable votes?", is just being used as a way to justify not giving the db to the people, also if fragmentation is such a scary thing then why does SponsorBlock not worry about it? (This is a rhetorical question). I do not want to discredit the author, but there are signs that the following may be happening: he might think he is entitled to the data and therefore everyone should bow to him for allowing access to the API. If I am wrong then please forgive me, otherwise, well, I would be speechless.

@sy-b
Contributor

sy-b commented Apr 16, 2022

That last statement is that of great concern, "And all while using my work for free.",

I don't think so. That statement (acc. to my view) was probably towards those freeloaders pretending to provide dislike data. That is just my guess cause I am not the owner of this repo.


whilst I do believe that he has some right to be compensated, it also greatly increases the probability of the scenario where the author goes rogue, e.g. a paywall.

A paywall will technically destroy the project. This project is dependent on its users. More users == better data/service. Of course this also has server costs which are hopefully covered by Patreon supporters.


The statement "more userbase fragmentation, less reliable votes?", is just being used as a way to justify not giving the db to the people,

idk but uploading TBs of data when only a few actually have the capacity to download and store it doesn't seem very useful to me. I do want the data but how many others can afford it (storage)? My personal opinion is that the data dumps must be provided but, unlike SponsorBlock's live dumps, these should be biweekly or monthly or whatever suits best.
Also, hosting & serving such large dumps isn't going to be cheap.


I do not want to discredit the author, but there are signs that the following may be happening: he might think he is entitled to the data and therefore everyone should bow to him for allowing access to the API

😆 . Well I feel it's going in some other direction. From what I notice around me I can say that people hardly care about dislikes (Just tested this out). Most of them have acclimatized to not having dislikes. But that's just about people around me. So, going rogue will most probably lead to the downfall of this extension. (Backend can be fiddled with without any publicly visible effects).


Note

I am not @Anarios & these are my own perspectives.


@Anarios can you update us on your plans about database dumps?

@sy-b
Contributor

sy-b commented Apr 16, 2022

I think the database should be segmented into small downloadable chunks (10MB) & should be hosted on torrent. Let the community handle its hosting 😄.
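
A rough Python sketch of the chunking idea, assuming the dump is a single file on disk. The function names, chunk file names, and manifest layout are made up for illustration; nothing here reflects an actual RYD export format.

```python
import hashlib
from pathlib import Path

CHUNK_SIZE = 10 * 1024 * 1024  # 10 MB, the chunk size suggested above


def split_dump(dump_path, out_dir, chunk_size=CHUNK_SIZE):
    """Split a dump into numbered chunks; return a manifest of (name, sha256, size)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    manifest = []
    with open(dump_path, "rb") as src:
        index = 0
        while True:
            data = src.read(chunk_size)
            if not data:
                break
            name = f"chunk-{index:06d}.bin"  # hypothetical naming scheme
            (out / name).write_bytes(data)
            manifest.append((name, hashlib.sha256(data).hexdigest(), len(data)))
            index += 1
    return manifest


def join_chunks(chunk_dir, manifest, target_path):
    """Reassemble chunks in manifest order, verifying each checksum first."""
    with open(target_path, "wb") as dst:
        for name, digest, _size in manifest:
            data = (Path(chunk_dir) / name).read_bytes()
            if hashlib.sha256(data).hexdigest() != digest:
                raise ValueError(f"checksum mismatch in {name}")
            dst.write(data)
```

The per-chunk checksums let a downloader verify and re-fetch individual pieces, which matters when the pieces come from community mirrors rather than one trusted host.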

@sy-b
Contributor

sy-b commented Apr 16, 2022

Another idea -
This database can be live hosted if the RYD server uploads the changes every few minutes/hours. This might be problematic when used with torrents, but I think gun.eco might help here.

Some people were requesting decentralized database, this can be it.

@RuboGubo

To be honest (and I know this will be unpopular) this sounds like what a block-chain would fix. After all, the block chain does not have to have money, and can simply be there to allow people to have a copy of the database that is automatically synced with the rest of the databases.

It solves fragmentation and centralization without necessarily involving money.

@eclipsek20
Author

To be honest (and I know this will be unpopular) this sounds like what a block-chain would fix. After all, the block chain does not have to have money, and can simply be there to allow people to have a copy of the database that is automatically synced with the rest of the databases.

It solves fragmentation and centralization without necessarily involving money.

Aka Torrents

@sy-b
Contributor

sy-b commented Apr 18, 2022

what a block-chain would fix

The problem is "immutability without bloating".

Vote count changes constantly & is perfect for bloating the block-chain.
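
To illustrate the bloat concern: an append-only (immutable) record grows with every vote change, whereas a mutable store grows only with the number of videos. A toy Python sketch, with made-up video IDs and counts:

```python
def replay(events):
    """Apply (video_id, likes, dislikes) updates to both storage styles."""
    append_only_log = []  # immutable history: every update is kept
    latest_counts = {}    # mutable store: one entry per video, overwritten in place
    for video_id, likes, dislikes in events:
        append_only_log.append((video_id, likes, dislikes))
        latest_counts[video_id] = (likes, dislikes)
    return append_only_log, latest_counts


# Hypothetical updates: two videos, five count changes in total.
events = [
    ("video_a", 10, 1),
    ("video_b", 5, 0),
    ("video_a", 11, 1),
    ("video_a", 11, 2),
    ("video_b", 6, 0),
]
log, latest = replay(events)
print(len(log), len(latest))
```

Here the log ends up with five entries while the mutable store holds two, and the gap widens with every further vote on an existing video.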

@Anarios
Owner

Anarios commented Apr 27, 2022

also if fragmentation is such a scary thing then why does SponsorBlock not worry about it?

Because SponsorBlock only needs one submission per video, and we need the exact dislike count.

DB size is becoming an issue as well: the SB database is 200-300MB if I'm not mistaken, while the RYD database is several terabytes.

@eclipsek20
Author

eclipsek20 commented Oct 10, 2022

BUMP: This issue is currently very relevant, seeing as the API is down without any alternative. At this rate someone should create an alternative project, seeing how the author is unwilling to give out the raw db.

@Anarios
Owner

Anarios commented Oct 10, 2022

@eclipsek20 wdym API is down?

@eclipsek20
Author

@eclipsek20 wdym API is down?

It was down for a couple of minutes; Cloudflare returned a host error.

@Roman2K

Roman2K commented Feb 5, 2023

Is there still interest around providing database exports?

I for one am very much interested. I think @Anarios did a phenomenal job on the browser extensions, evidently the backend -- at least from an end-user perspective witnessing its results' relevance and its reliability across desktop, mobile and TV clients -- including hosting such a massive database, and even communication through social media and the website.

The delay/opacity/doubts around releasing the source code are certainly understandable. That said, we're back to this fragile relationship of dependency toward a centralized host of valuable data with no easy way to mirror all of it, except scraping through the API.

My point here is that, to me, the data itself is more important of an issue than releasing the source code. Not to diminish the work behind it, I mean that with effort we could replicate closed-source work. Sadly, the same can't be said of the data...

So my question to @Anarios is two-fold: are you open to easing the export of your database? If so, how could the community help, whether financially or technically?

Given your willingness to compensate for YouTube's unfortunate decision, I'm guessing and hoping that you would be inclined to help prevent, or at least alleviate, further SPOF-dependency situations. I would gladly help any way I can, and judging by a good number of open GitHub issues, more people would join in 🙂

@MavisCelus19201

Was there ever any follow-up on this? This feels like something that could be extremely beneficial going forwards.

@eclipsek20
Author

Was there ever any follow-up on this? This feels like something that could be extremely beneficial going forwards.

The author of this plugin is very hesitant towards doing this, officially he will probably say that he will create such a function, only to not do anything until the day he dies. I am very saddened by the fact that this is the man that ended up managing this project, but I guess some mistakes can not be undone.

@eclipsek20
Author

eclipsek20 commented Jan 29, 2024

Bump. Once again we see that this would be useful, as the API is currently down.
Screenshot_20240129-161229_Firefox Nightly.png

@NoPlagiarism

Seems like a duplicate of #258.

Surprised I forgot to write about it earlier.
