How to login to a page #581

Open
batata004 opened this issue Dec 21, 2023 · 6 comments

Comments

@batata004

batata004 commented Dec 21, 2023

Hi,

I can't believe how easy and well documented this library is! For years I needed to control Chrome and could never find an easy alternative, because most require Node or Python. I have good knowledge of PHP and this library is helping A LOT!

Thank you so much devs! This is a pearl!

I've been using this for the last 2 months on several projects and I still can't find a way to solve one particular problem: I need to scrape a website that requires me to log in first. I can use userDataDir to make the browser session/cookies persistent, but I don't know how to log in because the browser is, of course, headless. I tried using "customFlags" => ["--remote-debugging-port=2288"] and it works only on localhost: if the server is on another IP, I can't access remote debugging because Chrome blocks external IPs.

So I ask you: how can I log in to the website? After I log in, I can rely on userDataDir to keep the session persistent across reboots. But I have no idea how to log in...

THANKS A LOT! You saved my last 2 months with this library, it's awesome!

@enricodias
Member

You mean you need to log in manually? Is there a captcha or something else preventing the script from logging in on its own?

@batata004
Author

batata004 commented Dec 23, 2023

I will try to be clearer. English is not my main language, sorry; I speak Portuguese.

My question is this: on one very specific website, I first need to log in. It has a captcha, and I need to type my password and confirm an SMS code. After this first login, every time I open the same website I am already logged in automatically. So, once I have done the first login, I can use your library to access this website in the future and extract the data I need.

So my problem is this: using --remote-debugging-port=2288 I can start the browser (using your library) on my server (it's thousands of miles away from me). Then I need to use my Chrome (running on my local computer) to connect to the browser I started on the server. If I could connect to it, I could see the interface window, type my password, solve the captcha and type the SMS code. However, I can't find a way to connect to this browser running on the server from my local computer. Using --remote-debugging-port=2288 works great if I am on the same network.

Sorry, really sorry if I am not making myself clear. If you have any suggestion on how I can explain this better, please tell me and I will do my best to be clearer next time.

@enricodias
Member

In those cases your script can relay the login information to you in real time, including the captcha. That's actually how captcha farms work. Basically, you can have this script running on a normal server listening for requests, then access that URL just like any other website. The script will load the login page using headless Chrome and relay that information, and you'll receive it as the content of that request.

You could use websockets to send and receive information to and from the same PHP process, or you could just use ignore_user_abort to keep that process alive even after the first HTTP connection closes. With this second approach you'll need to send the login data to it some other way: you could use another script to receive the login info and insert it in a DB or a local file, and have the original script that controls Chrome check this file every second.
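The file-polling variant could look roughly like the sketch below. It is written as shell only to show the loop; in the PHP script that controls Chrome the same idea would use file_exists and sleep. The function name, the file path and the timeout are assumptions made up for illustration and are not part of the library.

```shell
#!/bin/sh
# Hypothetical helper: wait until another script has written the login data
# (for example the SMS code) to a file, checking once per second.
# Usage: wait_for_login_file PATH TIMEOUT_SECONDS
wait_for_login_file() {
  file=$1
  timeout=$2
  waited=0
  # -s is true once the file exists and is not empty.
  while [ ! -s "$file" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1   # gave up: nothing arrived in time
    fi
    sleep 1
    waited=$((waited + 1))
  done
  # Print the received login data so the caller can capture it.
  cat "$file"
}
```

A caller would capture the value with something like `code=$(wait_for_login_file /tmp/login-code 120)` and then type it into the page through the script controlling Chrome. The timeout is there so a forgotten login does not keep the process alive forever.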

You could also log in locally and just upload the contents of your userDataDir to the server.
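A minimal sketch of that last option, assuming tar, scp and ssh are available on both machines. The function names, host name and paths are placeholders, not anything defined by the library. Note also that Chrome may encrypt cookies with an OS-specific key, so a profile copied between different operating systems might not keep the session; it is worth testing before relying on it.

```shell
#!/bin/sh
# Pack a Chrome profile directory (the userDataDir) into a tar.gz archive.
# Close Chrome first so the profile files are not being written to.
pack_profile() {
  src_dir=$1
  archive=$2
  tar -czf "$archive" -C "$src_dir" .
}

# Unpack the archive into the directory that the server-side script
# passes as userDataDir.
unpack_profile() {
  archive=$1
  dest_dir=$2
  mkdir -p "$dest_dir" && tar -xzf "$archive" -C "$dest_dir"
}

# Example flow (host name and paths are placeholders):
#   pack_profile "$HOME/chrome-profile" /tmp/profile.tar.gz
#   scp /tmp/profile.tar.gz user@example-server:/tmp/
#   ssh user@example-server      # then, on the server:
#   unpack_profile /tmp/profile.tar.gz /var/www/chrome-profile
```

Packing the directory into a single archive keeps the copy to one file transfer and preserves the profile's internal layout.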

@batata004
Author

@enricodias I appreciate your kindness in explaining all of that to me in such detail!

What you said in your last comment is exactly what I need: I need to run Chrome (with this library) on a server located in the US, and from my home computer (in Brazil) I need to access the screen and be able to log in to my bank (I need to be able to use the mouse/keyboard). After I am logged in, I can use your library to automate the stuff I need.

My problem so far is that I can't log in because I have no access to the screen. When I run headless Chrome directly on my computer (without PHP) I can use --remote-debugging-port=2288 to access Chrome's screen, and I can even control the website with the mouse/keyboard. However, with your library I can't do that, because I can't find a way to SEE the screen inside Chrome and control it with mouse and keyboard. For some reason, headless Chrome does not work with a remote IP (even using --remote-debugging-port=2288).

So, my dear friend, I ask you: I successfully managed to use your library to open Chrome, access websites, get DOM nodes... however, I still can't find a way to see the screen and click inside the website so I can type the login/password for my bank. After I am logged in, the rest will be very easy, because the cookies will be saved in the user data directory and every time I open the browser in the future I will already be logged in!

@enricodias
Member

As I explained before, you won't be able to control Chrome directly and remotely like that. The best you can do is relay the information needed for the login to your server in another way, so that your PHP script can complete the login. You don't need to see the rendered page for that; you just need to relay that info to your local browser.

If that page is rendered in a way that prevents you from extracting the login information (Java applets, WebAssembly or anything like that), you won't be able to log in by relaying the page. You can try logging in locally and transferring the whole userDataDir to the server, or set up a GUI on the server and access it remotely.

In case the login session gets locked to the IP used to log in, you can set up a normal proxy on your server so that you log in through the server's IP before copying the data dir.
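One possible way to do that, sketched under the assumption that you have SSH access to the server: `ssh -D` opens a local SOCKS proxy whose traffic exits from the server, and Chrome's `--proxy-server` flag sends the local login session through it. The helper name, host, port, paths and Chrome binary name below are placeholders, and any other proxy setup would work as well.

```shell
#!/bin/sh
# Print the Chrome flags for a local, visible login session that stores its
# profile in a known directory and sends its traffic through a SOCKS5 proxy
# listening on 127.0.0.1.
login_flags() {
  profile_dir=$1
  socks_port=$2
  printf '%s\n' \
    "--user-data-dir=$profile_dir" \
    "--proxy-server=socks5://127.0.0.1:$socks_port"
}

# Example flow (host, port, paths and binary name are placeholders):
#   ssh -N -D 1080 user@example-server &     # SOCKS proxy exiting at the server
#   google-chrome $(login_flags "$HOME/chrome-profile" 1080)
# The unquoted $(...) relies on word splitting, so the profile path must not
# contain spaces in this sketch.
```

After logging in this way, the profile directory is the one to copy to the server.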

@batata004
Author

Thank you, you were very clear!
