Evaluate less exploitative solutions to bypassing captcha #1256

Closed
lambdadog opened this issue Jun 17, 2020 · 41 comments
Labels
question Further information is requested

Comments

@lambdadog

lambdadog commented Jun 17, 2020

Invidious is currently using https://anti-captcha.com to solve and bypass captchas.

I and many others feel negatively about supporting an exploitative mechanical turk-like company to enable invidious to continue to function. I'd very much like to start a discussion around more ethical alternatives to handling Google captchas.

There will certainly have to be tradeoffs to make this happen, but I'd like to see invidious be as ethical a piece of software as possible and I think a big first step is finding a way to stop supporting harmful and exploitative services like anti-captcha.com.

Ideally I'd like to see this issue left open as an open discussion about alternatives, even if they're just configurable for personal instances.

@lambdadog
Author

An immediately actionable step that could be taken is to note the use of anti-captcha.com in Invidious somewhere. Currently the only way to find out about this is to look through the source code itself, and I think many users who want to avoid google poison also care about issues such as this and would appreciate being notified that Invidious is using them.

@lambdadog changed the title from "Evaluate other solutions to bypassing captcha" to "Evaluate less exploitative solutions to bypassing captcha" Jun 17, 2020
@TheFrenchGhosty
Member

Hello, I am the person that suggested the use of anti-captcha.

I really care about these issues, and this is something we checked carefully beforehand.

We spent some time analyzing the company and came to the conclusion that it is a real company (it's not run by one guy from his garage) that provides "acceptable" work with an acceptable-to-okay salary for people who live in countries with really low salaries.

From what we gathered, the salary they offer their workers is on par with the average salary in those countries, while the work is physically "easier" than what those countries could otherwise offer them.

Most of those countries are third-world countries where the only work available is in mining or food production for first-world countries (literally us). Anti-captcha allows some of those people to at least avoid this by getting an okay salary for doing "easy" work.

In my opinion this is okay. Yes, it could be better, but I think it is better for people to solve captchas than to produce stuff for a company that doesn't care about their lives.

This is a far deeper issue than just Invidious and Anti-captcha, and I don't think that stopping giving them this work will make things better.

@unixfox
Member

unixfox commented Jun 18, 2020

I've successfully used a program like uncaptcha2 with the help of puppeteer in order to solve the audio version of Google reCAPTCHA.

This is certainly another possible way to avoid using anti-captcha, and because Invidious only solves a reCAPTCHA every 3 hours, Google may not detect that it's not a human solving the audio reCAPTCHA.

The main issue is that it's not 100% reliable and has some major drawbacks:

  • Google can block the use of the audio version of reCAPTCHA if it detects abuse.
  • Google keeps improving its detection of bots that use the audio version of reCAPTCHA.
  • It requires bundling a full-fat browser with Invidious just to solve the reCAPTCHA.

I plan on building a distributed audio reCAPTCHA solving solution and offering it as a clone of the anti-captcha API, so that every Invidious instance owner can just change the API URL in the source code and enjoy a CAPTCHA-free experience.
It just takes time to do, but I'm certain it will benefit the community.

@lambdadog
Author

I would love to see that, @unixfox ! While it might not be viable for a large instance like the flagship, for people hosting their own instances I imagine it could work well

@lambdadog
Author

@TheFrenchGhosty I appreciate the time that you and those involved took to look into the company.

That said, I personally still believe this is an issue. Judging by their site, working for anti-captcha.com looks like an incredibly stressful job, with the metaphorical guillotine of losing your livelihood hanging over your head at every moment if you fill out too many captchas wrong (which, as we all know, is very easy to do). They seem to take particular pride in quickly banning workers they consider to be "cheating".

I agree that there's no good answer here. I wish there was something that I could open up with and say, "let's replace anti-captcha.com with this solution I have that isn't as exploitative!", but at the very least I want Invidious to be open about using such services for now instead of it just staying tucked away in the source code where your average end-user will never hear about it.

@m4teh

m4teh commented Jun 18, 2020

May I ask what exactly makes paying individuals to fulfill a market demand and make your life easier exploitative? Are you suggesting free humans shouldn't have the freedom of voluntary employment, thereby pushing people further into economic distress?
The only person who gets to decide whether their wage is enough is the person voluntarily agreeing to do the job. Nobody would agree to do a job for what they felt was an unfair market wage for their skill set or competencies.
The only logical free-market solution I see is to create your own 'ethical' captcha-solving business if you're that concerned.

@elypter

elypter commented Jun 22, 2020

people are getting exploited because they are being tricked or pressured, not because they enjoy it. the only kind of people who seem to love it are those who see themselves as temporarily embarrassed millionaires and advocate exploitative behavior just because of "choice" or "freedom". in reality this is just instinctive submission to what they perceive as the alpha male, even if it's just a company or "the market".

@m4teh

m4teh commented Jun 23, 2020 via email

@Perflyst
Contributor

How are people being tricked?

I don't know. People who work there are not forced to work there.

Who are these people being tricked?

They are not.

Have you surveyed people working in this field?

No, but I registered as worker on their website and "worked" around an hour to test this out.

How do you know all of this?

Online reviews, own experience.

Where can I find out more about this crusade against people filling out captchas?

crusade?!

Why is the captcha completion industry the focus of your attention?

Google blocks access to youtube.com with a captcha if it sees too many requests from the same IP. Therefore we need to solve a captcha to access the youtube.com data again. This only hits big instances.

Do captcha crusaders understand Austrian economics or freedom of individual choice?

I can't answer that; I don't know if they do.

What makes this job of completing captchas exploitative, over any other job that someone might do, like say a toilet or hazmat cleaner?

I think it is way better than a cleaner in this country.

Is the concern about how much money people are being paid? If so, how do you claim to know what someone is worth, or should be paid, more than they themselves know what they’re worth?

Yes, it is all about money. We don't claim anything; we just compare our first-world standard with theirs and understand that they get less than us. We did not decide that, and we can't change it with a snap of the fingers.

They would be worse off without it, or they wouldn’t agree to trade their time for money, right?

Most likely, but just because it is worse somewhere else doesn't mean it is good here.

@m4teh

m4teh commented Jun 23, 2020

Thanks for the response @Perflyst. I feel somewhat more confident I'm not losing my mind now. I say crusader because this is the second anti-captcha sentiment I've seen lurking in open source projects. The other was in the searx issues, where the social justice warriors there deleted my comment from the thread for merely questioning the same exploitation logic. Who doesn't love a bit of censorship of opposing opinions?
Additionally, I say crusader because I suspect the anti-captcha people belong to a collective group of some sort, given the number of upvotes this issue received within hours of posting.

So is anyone able to shed some light on this whole captcha-completion-is-exploitation virtue signalling? My guess is these well-intentioned people don't realize that the very people they think they're helping are actually getting screwed if their jobs/income go away.

@elypter

This comment has been minimized.

@Perflyst added the enhancement and question labels Jun 23, 2020
@lambdadog

This comment has been minimized.

@lambdadog

This comment has been minimized.

@m4teh

This comment has been minimized.

@gripped

This comment has been minimized.

@Perflyst
Contributor

Thank you for the discussion but please stay on topic and actually evaluate an alternative to anti captcha services.

@m4teh

m4teh commented Jun 24, 2020

Thank you for the discussion but please stay on topic and actually evaluate an alternative to anti captcha services.

The above discussion is relevant. The fundamentals of the initial premise are being challenged, due to being based on incorrect or unsound logic; that there is in fact no need to seek alternatives. That's what the discussion intends to (and appears to have) showcase(d).

@gripped
Contributor

gripped commented Jun 24, 2020

How you think my comment is off topic is bewildering!
Surely if we are to evaluate less exploitative solutions to bypassing captcha, some facts about the current solution are necessary?
If I don't know the level of exploitativeness of anti-captcha, how can I judge whether an alternative is less or more so?
Surreal :)
Anyway I don't use it. I'll leave you all to it.

@Perflyst
Contributor

The fact is that no worker in a third-world country should have to work for us so that we can consume more or less trivial things. In my opinion the aim is to find a technical solution, as @unixfox proposed in #1256 (comment)

@unixfox
Member

unixfox commented Jun 24, 2020

I asked Perflyst to hide some comments because this issue is drifting away from its initial subject, which is alternatives to anti-captcha, not whether anti-captcha is an ideal solution for Invidious.

If you would like to discuss whether the main solution for solving the Google reCAPTCHA is good or bad, feel free to open another issue.

My main concern is that this discussion is deterring people who might otherwise talk about alternatives, because the issue is too focused on the ethics of using anti-captcha. I'm really interested to see whether someone else comes up with another good solution, and I don't want to unsubscribe from this issue because it's no longer about alternatives to anti-captcha.

@lambdadog
Author

lambdadog commented Jun 24, 2020

I'm not certain I understand why Perflyst's "people are not being tricked or forced to work there" comment and m4teh's comments where they quiz me on how this could possibly be exploitative remain while mine that responds to them and attempts to explain the opposite side of the discussion are hidden.

I can understand the wish to hide the discussion on the nature of the ethics of this situation so we can focus on the technical side, but completely hiding one side of the argument while only hiding the late rebuttals on the other side seems a little disingenuous to me.

@cjslep

cjslep commented Jun 24, 2020

All I see is a lot of thumbs down and no responses of substance

As one of the thumbs-down-ers: @m4teh, I don't think you're going to get a substantive response. As soon as you opened with your very first reply with

May I ask

The answer should have been "no". It's been a massive derail into a pseudo-ethical discussion on what should be a technical issue with a very reasonable, limited request by @lambdadog:

even if they're just configurable for personal instances.

and it seems everyone else is on-board with the idea of giving the user the freedom (and we know how you feel about freedom) to personally choose a technical alternative, if one can be found. Everyone else seems to recognize that, independent of each of our internal motivations, perhaps offering this choice would be a net positive for everyone, even if an individual person is not motivated to opt-in.

Perhaps let the good hardworking folks create this opt-in alternative. Then, after it launches in invidious, evangelize your personal ethical system of "opt-out is morally right" wherever it pleases you and compete against the "opt-in" sinners. Right now your derails are denying the other side the freedom to even compete with you.

Edit: With my sincerest apologies to the invidious folks. Please feel free to mark this as off-topic or delete it as you see fit.

@elypter

elypter commented Jun 24, 2020

going back to the technical side of the issue i think there are the following ways and probably many more to tackle it

  • do nothing
    - moot point. more options are always better than fewer
  • trigger fewer captchas
    - do more caching and possibly p2p hosting of meta information
    - connect to the api of other instances
    - redirect users to other instances
    - give the user more options to voluntarily reduce the number of requests
    - outsource requests to the devices of volunteer invidious users
    - integrate peertube and other video mirrors
    - investigate how to look less "botlike" to youtube
    - reduce possible invidious api abuse by bots
    - connect through vpns, tor, botnets or proxies (could also trigger more captchas)
  • solve captchas more effectively
    - set up an access-limited proxy (just enough to make recaptcha work) and let volunteer invidious users connect to it. whenever invidious is not allowed to proceed, any user can fire up a secondary browser with that proxy configured and solve a captcha.
    - use a browser automation library like selenium to emulate a browser and then forward its screen content and controls to a volunteering invidious user
    - copy the anti-captcha service but with volunteer users
    - use tor exit nodes or vpns and thus let others solve the captchas for you indirectly (not ideal either)
    - investigate computationally expensive yet feasible ways to solve captchas with neural networks, possibly on the client side (like uncaptcha2 but also for images)

many of these points reflect ideas and suggestions spread across many different issues. unfortunately i don't have those at hand but i'm confident they will resurface if there is interest. possibilities are plentiful. the difficulty is deciding on the most effective path.
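
One of the ideas listed above, more caching of meta information, can be sketched as a small time-based cache. This is only an illustration in Python, not Invidious code (Invidious is written in Crystal); `TTLCache`, the `fetch` callback and the 300-second lifetime used in the note below are hypothetical names and values chosen for the example.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Tiny in-memory cache that reuses a value until it is older than ttl seconds."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.clock = clock  # injectable so the cache can be exercised without sleeping
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get_or_fetch(self, key: str, fetch: Callable[[str], Any]) -> Any:
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # fresh enough: no upstream request is made
        value = fetch(key)  # stale or missing: exactly one upstream request
        self._store[key] = (now, value)
        return value
```

With something like `TTLCache(ttl=300)`, repeated lookups of the same video within five minutes would cost a single upstream request, which is the kind of reduction that could keep an instance under the threshold that triggers a captcha.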

@artshevtsov

artshevtsov commented Jul 12, 2020

Another solution is to use dynamic mobile proxies with multiport, which make every request from a new 3G/4G IP address. A multiport proxy costs ~$41 per month. (I have tested this service: https://airsocks.in/en#tariffs; you can test any proxy type for free for a few hours.) We talked about that with Omarroth, and I tested everything locally; it worked perfectly. I was surprised when I saw that Invidious uses anti-captcha to solve the issue of YouTube's per-IP request limits.

When I ran my tests, I found the line in the source code where the HTTP client is initialized and hardcoded the proxy address and port there. Some time ago I opened an issue asking for config options for the proxy host and port, but it is still not implemented 😅

I am not familiar with Crystal. It would be great if somebody could make a pull request implementing these simple config options for the HTTP client.
I have an old example for api-only branch here:
api-only...artshevtsov:api-only
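
To make the request concrete, here is a rough sketch of what an optional proxy host and port could look like. It is written in Python with the standard library rather than Crystal, and `proxy_map`, `make_opener` and their parameters are hypothetical names, not existing Invidious options.

```python
import urllib.request
from typing import Dict, Optional


def proxy_map(host: Optional[str], port: Optional[int]) -> Dict[str, str]:
    """Return a proxy mapping for urllib, or an empty dict when no proxy is configured."""
    if not host or not port:
        return {}
    url = f"http://{host}:{port}"
    return {"http": url, "https": url}


def make_opener(host: Optional[str] = None,
                port: Optional[int] = None) -> urllib.request.OpenerDirector:
    """Build an opener that sends requests through the configured proxy, if any."""
    # Passing an empty mapping disables proxying, including auto-detected proxies.
    handler = urllib.request.ProxyHandler(proxy_map(host, port))
    return urllib.request.build_opener(handler)
```

Leaving both values unset keeps the current direct behaviour, so an option like this would not affect instances that don't use a proxy.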

@elypter

elypter commented Jul 12, 2020

while expensive, it seems to be a useful piece of the puzzle. mobile ips are clearly labeled as consumer devices and usually many people share one ip, so they might go easy on the rate limiting. multiport could however be a problem. most captchas are prevented by users having a google cookie. multiport only makes sense if you don't use a cookie, because you have a new ip on each connection. that could mean that you have to solve an initial captcha every time. no idea if google is currently asking for this, but they could demand it without hurting regular users. it would be a lot better if you could change ip manually.

@unixfox
Member

unixfox commented Jul 13, 2020

mobile ips are clearly labeled as consumer devices and usually many people share one ip so they might go easy on the rate limiting.

From my experience, Google doesn't relax its rate limit for consumer IPs. It's the same rate limit for everyone.
The reason you would be rate limited sooner on a hosting provider IP is that Google takes a whole IP block into account, not just your IP, and someone making a lot of requests to Google is more likely on a hosting provider's VPS range than on a computer behind a home connection.
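
As a toy illustration of counting requests per IP block rather than per address: how Google actually groups addresses and what limits it applies are not known here, so the /24 prefix and the limit of 3 are arbitrary assumptions, and `BlockRateCounter` is a hypothetical name.

```python
import ipaddress
from collections import Counter


class BlockRateCounter:
    """Counts requests per IPv4 block, so one busy address affects its neighbours."""

    def __init__(self, prefix_len: int = 24, limit: int = 3) -> None:
        self.prefix_len = prefix_len  # assumed block size, purely illustrative
        self.limit = limit            # assumed threshold, purely illustrative
        self.counts: Counter = Counter()

    def block_of(self, ip: str) -> str:
        # strict=False accepts a host address and returns its enclosing network
        return str(ipaddress.ip_network(f"{ip}/{self.prefix_len}", strict=False))

    def allow(self, ip: str) -> bool:
        """Record one request and report whether the whole block is still under the limit."""
        block = self.block_of(ip)
        self.counts[block] += 1
        return self.counts[block] <= self.limit
```

Under this model a quiet VPS can still be refused because its neighbours in the same block have already used up the shared budget, which matches the behaviour described above.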

Another solution is to use dynamic mobile proxies with multiport which make every request from new 3G/4G ip address. Multiport proxy costs ~$41 per month. (I have tested this service airsocks.in/en#tariffs, you can test any proxy type for free for a few hours). We talked about that with Omarroth, and i have tested everything locally - it worked perfectly. I was surprised when I saw that invidious is using anticaptcha to solve that issue with youtube request limits from one ip.

You don't really have to find proxies specifically on home or cellular connections; you just need a bunch of clean IPs. Google supports IPv6 and there are far more IP blocks in IPv6, so if you can find a provider of IPv6 proxies with clean IPs, you have found a much less expensive solution.

The big issue with your solution is that the bandwidth is very low (max 15 Mbit/s), and bandwidth is crucial for a service like Invidious that proxies actual videos, not just text. Imagine using that service with a lot of users watching videos: the Invidious instance would have huge playback issues.

@artshevtsov

A lot of developers use the api-only branch for fetching YouTube data; bandwidth is not a problem for that use case.

The big issue with your solution is that the bandwidth is very low (max 15 Mbit/s) and this bandwidth is very crucial for a service like Invidious that proxy actual videos not just text. Imagine using that service with a lot of users watching videos, the Invidious instance would have huge playback issues.

There are a lot of dynamic proxy services with APIs that let you rotate the IPv4/IPv6 address whenever you need to, with or without a delay depending on the tariff options.
When Google shows a captcha, Invidious can rotate the proxy IP and repeat the request.
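
The rotate-and-retry idea could look roughly like the following Python sketch. Both callables are hypothetical stand-ins: `fetch` for whatever performs the YouTube request and reports whether a captcha page came back, and `rotate_ip` for a proxy provider's rotation API, whose details vary by service.

```python
from typing import Callable, Tuple


def fetch_with_rotation(
    fetch: Callable[[], Tuple[bool, str]],
    rotate_ip: Callable[[], None],
    max_rotations: int = 3,
) -> str:
    """Call fetch(); if it reports a captcha, rotate the proxy IP and try again.

    fetch returns (captcha_shown, body). The number of rotations is capped so
    that a pool of already-flagged addresses cannot cause an endless loop.
    """
    for attempt in range(max_rotations + 1):
        captcha_shown, body = fetch()
        if not captcha_shown:
            return body
        if attempt < max_rotations:
            rotate_ip()  # ask the proxy service for a fresh address
    raise RuntimeError("still getting a captcha after rotating the proxy IP")
```

The cap matters in practice: if freshly rotated addresses are frequently flagged as well, the caller needs a clear failure rather than a silent retry loop.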

@lambdadog
Author

lambdadog commented Jul 14, 2020

A lot of developers are using an api-only branch for youtube data fetching, bandwidth is not a problem for such use case.

Can you explain what exactly the api-only branch is doing, @artshevtsov? Is it using the youtube API for searching, etc, but still using non-API mechanisms to fetch the actual video and proxy it?

There are a lot of dynamic proxy services with an APIs which allow to rotate ipv4/ipv6 address when you need with or without any delay depending on tariff options.

My biggest concern with this solution is that the proxy services mentioned are likely used for a lot of spammy and malicious behavior, so a good percentage of the time you may get a captcha right off the bat after rotating. I'd love to see some actual metrics from using this method that might prove me wrong, though.

@lambdadog
Author

With regard to needing "clean IPs", I'm curious how Google actually handles "dirty IPs" in this case. Do we know how long it takes for them to be considered clean again?

@artshevtsov

You can read about the API functionality here:

https://github.com/omarroth/invidious/wiki/API

This project doesn’t use youtube API.

Can you explain what exactly the api-only branch is doing, @artshevtsov? Is it using the youtube API for searching, etc, but still using non-API mechanisms to fetch the actual video and proxy it?

@unixfox
Member

unixfox commented Dec 7, 2020

I just found a service that doesn't involve humans in solving Google reCAPTCHA: https://capmonster.cloud/en/. It's cheaper than anti-captcha and compatible with the anti-captcha API, so technically everyone can start using it thanks to #1473.

captcha_api_url: https://api.capmonster.cloud
captcha_key: captcha key from capmonster dashboard
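
For readers wondering what "compatible with anti-captcha API" means in practice, here is a hedged Python sketch of the create-then-poll flow such services expose. The `createTask` and `getTaskResult` endpoints and the JSON field names follow anti-captcha's documented API as I understand it and should be checked against the provider's documentation; `solve_recaptcha`, `http_post_json` and the injectable `post` and `sleep` parameters are hypothetical names for this example, and this is not the code Invidious itself uses.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict

PostFn = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def http_post_json(url: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    """POST a JSON body and decode the JSON reply (standard library only)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def solve_recaptcha(
    api_url: str,          # e.g. the captcha_api_url value from the config
    key: str,              # e.g. the captcha_key value from the config
    website_url: str,
    website_key: str,
    post: PostFn = http_post_json,
    poll_seconds: float = 10.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Create a task on an anti-captcha-style API and poll until a token is ready."""
    created = post(f"{api_url}/createTask", {
        "clientKey": key,
        "task": {
            "type": "NoCaptchaTaskProxyless",  # task type name assumed from anti-captcha docs
            "websiteURL": website_url,
            "websiteKey": website_key,
        },
    })
    if created.get("errorId"):
        raise RuntimeError(f"createTask failed: {created}")
    for _ in range(max_polls):
        result = post(f"{api_url}/getTaskResult",
                      {"clientKey": key, "taskId": created["taskId"]})
        if result.get("errorId"):
            raise RuntimeError(f"getTaskResult failed: {result}")
        if result.get("status") == "ready":
            return result["solution"]["gRecaptchaResponse"]
        sleep(poll_seconds)  # task still processing: wait before asking again
    raise TimeoutError("captcha task did not finish in time")
```

Because only the base URL and key differ between providers, switching services comes down to changing those two config values, which is what makes a drop-in replacement possible.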

@unixfox
Member

unixfox commented Dec 9, 2020

@FireMasterK Let's move here.

That's not normal. Are you running multiple instances of Invidious at the same time? Do you often restart Invidious?

@FireMasterK
Contributor

Nope, just one instance of Invidious.
I do restart Invidious every hour, maybe I should change it to 3 hours?

Also, is anyone aware if Invidious saves the cookies obtained so there's no need to get new ones next time on restart?

@unixfox
Member

unixfox commented Dec 9, 2020

Yes, Invidious should store the cookies inside the config.yml. If it doesn't, then there is a permission issue.
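
A quick way to rule out the permission issue is to check whether the config file is writable by the user that runs Invidious. This is a minimal Python sketch; `config_is_writable` is a hypothetical helper, and the check only reflects the user running the script, so it should be run as the same user as Invidious (inside the container when using Docker).

```python
import os


def config_is_writable(path: str) -> bool:
    """Return True if the file exists and the current process may write to it."""
    return os.path.isfile(path) and os.access(path, os.W_OK)
```

If this returns False for the mounted config file, cookies obtained after solving a captcha cannot be saved, and every restart would start from scratch.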

@Perflyst
Contributor

Perflyst commented Dec 9, 2020

does anyone know how this specific service works in more detail?

@FireMasterK
Contributor

Yes, Invidious should store the cookies inside the config.yml. If it doesn't, then there is a permission issue.

Is this true even with docker? Isn't the config overridden by the environment variable?

@unixfox
Member

unixfox commented Dec 9, 2020

Yes, Invidious should store the cookies inside the config.yml. If it doesn't, then there is a permission issue.

Is this true even with docker? Isn't the config overridden by the environment variable?

Yes. I'm using the Docker image and you just need to mount the config.yml into the container.

@unixfox
Member

unixfox commented Dec 10, 2020

@FireMasterK do you get fewer reCAPTCHA attempts now that you keep the config.yml between restarts?

@unixfox
Member

unixfox commented Dec 10, 2020

@Perflyst

does anyone know how this specific service works in more detail?

Well, the images that Google serves for the captcha can be recognized by, for example, an image recognition system. There are a lot of systems like this on the market; here is one from Microsoft: https://azure.microsoft.com/en-us/services/cognitive-services/computer-vision/

By combining a system like this with an automated browser, the company (capmonster) is able to automate reCAPTCHA solving without any human interaction.

You can find a demo of what I explained on my twitter: https://nitter.snopyta.org/unixf0x/status/1075068461720702979 (sorry, it's in French, but just play the video).

@FireMasterK
Contributor

@FireMasterK do you get fewer reCAPTCHA attempts now that you keep the config.yml between restarts?

I have not tinkered with my docker-compose file yet, I'll do so soon today.

@TheFrenchGhosty removed the enhancement label Feb 3, 2021
@github-actions

This issue has been automatically locked since there has not been any activity in it in the last 30 days. If this is still applicable to the current version of Invidious feel free to open a new issue.

@github-actions bot locked as resolved and limited conversation to collaborators Jun 11, 2021
10 participants