Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added firefox binary path to config #168

Merged
merged 14 commits into from
Dec 22, 2019
Merged

Conversation

GrayHat12
Copy link
Contributor

This should save users like me from WebDiver permission denied errors.

I've added Firefox Binary path to config.example.py file now. So when the users create the config file, they can specify the path and not have to encounter WebDriver errors the way I did.

This should save users like me from WebDiver permission denied errors
Copy link
Owner

@sundowndev sundowndev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few comments. Seems not valid for merging. This will break support for Linux/MacOS/Docker users.

@@ -23,6 +25,7 @@ def closeBrowser():
browser.quit()

def search(req, stop):
time.sleep(5)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding this sleep command here helped me avoid being blocked by google. I've previously done some bot scripts and a small wait usually prevents our scripts from being detected by the website.

google_cx_id=''
firefox_exe_path = r'C:\Program Files\Mozilla Firefox\firefox.exe'
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not working. You setup a executable path for your own installation of Firefox. This will not work for Linux users.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should put an UNIX path (/usr/bin/firefox) as default value, and add a statement in documentation for Windows users. I don't want to provide explicit support for Windows.

@sundowndev sundowndev added geckodriver/selenium invalid This doesn't seem right labels Dec 15, 2019
now the script should check if system is windows or not.
If windows then it uses firefox binary path else dosen't use it.
This should make the script compatible with docker/linux/mac and easy for windows users.
Also increased the time.sleep to 8 seconds since I experienced some blocks with 5 second timeout.
@GrayHat12
Copy link
Contributor Author

Check out the latest commit. This should resolve all your current issues. Also the time.sleep(8) should now prevent you from being blocked so you can use selenium webdriver in headless mode. Sure it makes the script a bit slower but should be good overall since the user won't have to keep solving captcha's

Copy link
Owner

@sundowndev sundowndev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting idea to add a field in config, but the default value shouldn't be related to Windows.

google_cx_id=''
firefox_exe_path = r'C:\Program Files\Mozilla Firefox\firefox.exe'
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should put an UNIX path (/usr/bin/firefox) as default value, and add a statement in documentation for Windows users. I don't want to provide explicit support for Windows.

@@ -32,7 +36,11 @@ def search(req, stop):
if os.environ.get('webdriverRemote'):
browser = webdriver.Remote(os.environ.get('webdriverRemote'), webdriver.DesiredCapabilities.FIREFOX.copy())
else:
browser = webdriver.Firefox()
if os.name == 'nt':
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the path directly, or check if it's defined/not empty.

Added default binary path for linux as requested

now instead of checking the operating system type for using binary path , the script simply checks if the binary path holds a value or is empty and uses the binary path accordingly

Hope this should finally resolve all issues
@@ -23,6 +26,7 @@ def closeBrowser():
browser.quit()

def search(req, stop):
time.sleep(10)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Waiting 10 sec at each request is ridiculous, it shouldn't be handled that way. A rotating proxy must be used to avoid getting blocked.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented the rotating proxy. Have a look

a rotating proxy as you asked for
This is my first ever implementation of a rotating proxy. Might have some bugs. I just tested and it seemed to work fine for me.
We can sure fix if any bugs arise in the future
@sundowndev sundowndev added kind/feature New feature or request and removed invalid This doesn't seem right labels Dec 18, 2019
@sundowndev
Copy link
Owner

Well.. I didn't ask you to add a rotating proxy feature but anyway I appreciate your concern about this issue 😄

  • Scraping free-proxy-list.net is a anti-pattern, HTML is subject to changes, just create an array of IPs instead
  • Google blocks free proxies
  • If we create such a feature, it has to be isolated in a component, then used in the search feature to avoid increasing code complexity; we also have to be able to disable it in the config file

What I'd suggest is to stay focus on the initial issue, which is the Firefox executable path, and simply use the variable firefox_exe_path if it's not empty; this means it is defined but empty by default. Also why not adding a mention about this in documentation.

Of course you can still open another pull request to suggest changes about the rotating proxy feature.

@GrayHat12
Copy link
Contributor Author

okay i mentioned it in the docs. Regarding the rotating proxy, actually I did that in excitement because if you merged this pull request then it'd be my first useful open source contribution. 😃
I'm not sure how to open another pull request to the same repo and branch but okay i'll search it on google and maybe go with that.
Although I can implement the disable proxy from configuration right now in this pull request if you want.

@sundowndev
Copy link
Owner

sundowndev commented Dec 19, 2019

Just so you know, open source maintainers prefer 10 small PRs rather than 1 big PR with tons of features. 1 pull request = 1 feature/fix. Small PR are useful anyway! You can create a new branch on your fork to create another PR if you want to.

@sundowndev
Copy link
Owner

Can you please move rotating proxy related changes to another branch so I can review and merge this one ? Thank you

@GrayHat12
Copy link
Contributor Author

done. i'll open another pr for rotating proxies later

Copy link
Owner

@sundowndev sundowndev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job! Thank you

@sundowndev sundowndev merged commit f332ff8 into sundowndev:develop Dec 22, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants