potential fix for google_search() #6

Open
wants to merge 1 commit into
base: master

Conversation

bowditch-c

A suggested fix for google_search()

@chrispetrou
Owner

I tried the pull request and it doesn't seem to work for me. Every email I tested gets reported as not found in the Google search results, which is not the case!

@bowditch-c
Author

The major change is that the function now performs a Google search using quotes, e.g. “username@email.com”, so it searches for that email exactly as typed. It works for me! Any public-facing email addresses return results, whilst private emails don’t. If that’s not exactly the intended function, my apologies.
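To make the quoted search concrete, here is a minimal sketch of how such an exact-match URL could be built. The helper name `build_exact_match_url` and the example address are hypothetical; the sketch also URL-encodes the address, which is an addition of mine, so that characters such as `+` or `@` reach the search engine unchanged.

```python
from urllib.parse import urlencode


def build_exact_match_url(email):
    """Return a search URL whose query is the email wrapped in double quotes."""
    # Wrapping the address in quotes asks for an exact-phrase match.
    quoted = '"{}"'.format(email)
    # urlencode percent-encodes the quotes, '@' and '+' in the query value.
    return "https://google.com/search?" + urlencode({"q": quoted})


print(build_exact_match_url("user+tag@example.com"))
# https://google.com/search?q=%22user%2Btag%40example.com%22
```

Without the encoding step, a `+` in the address would be read as a space on the receiving end.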

@chrispetrou
Owner

The function does pretty much what you described, but when I test the following script using your pull request:

import os, sys
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

os.environ['MOZ_HEADLESS'] = '1'
cap = DesiredCapabilities().FIREFOX
cap["marionette"] = True

def google_search(email):
    endpoint = 'https://google.com/search?q=%22{}%22'.format(email)
    try:
        with webdriver.Firefox(capabilities=cap) as d:
            d.get(endpoint)
            if "No results found" or "did not match any documents" in d.page_source:
                return False
            else:
                return True
    except Exception as error:
        raise(error)

try:
    email = sys.argv[1]
    breached = google_search(email)
    if breached:
        print("{} shows up on google search results".format(email))
    else:
        print("{} doesn't show up on google search results.".format(email))
except IndexError:
    sys.exit(0)

I get positive results (by positive I mean not showing up in Google search results) for every email I test. When I use your method manually it works, but through that script it doesn't for some reason. I've tried it even with very simple emails that have been in thousands of breaches, and it keeps reporting them as safe...
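One thing worth checking here, independent of the browser: Python parses `"No results found" or "did not match any documents" in d.page_source` as `"No results found" or (... in d.page_source)`. A non-empty string is always truthy, so that branch is taken whatever the page contains. A small sketch of the difference follows; the sample page text and the helper name `reports_no_results` are made up for illustration.

```python
# Hypothetical page text that clearly does contain search hits.
page_with_hits = "<html>About 1,230 results</html>"

# Same shape as the condition in the script: evaluates to the first string,
# which is truthy, without ever looking at the page.
original_check = "No results found" or "did not match any documents" in page_with_hits
print(bool(original_check))  # True

# Phrases taken from the script; each one is tested against the page separately.
NO_RESULT_MARKERS = ("No results found", "did not match any documents")


def reports_no_results(page_source):
    """Return True only if one of the marker phrases occurs in the page."""
    return any(marker in page_source for marker in NO_RESULT_MARKERS)


print(reports_no_results(page_with_hits))  # False
```

With the per-marker test, a page that contains neither phrase is no longer treated as a "no results" page.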

@jsfan

jsfan commented Aug 29, 2019

The patch ignores a race condition. Google's search results are rendered via JavaScript, and the script does not wait for the DOM to be assembled before trying to read from it.

cf. https://selenium-python.readthedocs.io/waits.html

@bowditch-c
Author

Aha! Excellent catch. Thank you! I was stumped. I couldn’t recreate the issue on my end with my set of test emails. An explicit wait should resolve this issue.
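For reference, a minimal sketch of what such an explicit wait could look like with Selenium's `WebDriverWait`, as described in the waits documentation linked above. The helper name `fetch_rendered_source`, the 10-second timeout, and the use of `document.readyState` as the readiness test are assumptions for illustration, not part of this pull request. Content injected by scripts after the load event may need a more specific condition.

```python
def fetch_rendered_source(driver, url, timeout=10, poll=0.5):
    """Load url and return page_source only once the document reports it has loaded."""
    # Imported here so the sketch can be read (and the function defined)
    # without Selenium installed; calling it does require Selenium.
    from selenium.webdriver.support.ui import WebDriverWait

    driver.get(url)
    # Poll until the browser reports the document as fully loaded.
    # Raises selenium.common.exceptions.TimeoutException after `timeout` seconds.
    WebDriverWait(driver, timeout, poll_frequency=poll).until(
        lambda d: d.execute_script("return document.readyState") == "complete"
    )
    return driver.page_source


# Possible usage inside google_search(), assuming a working Firefox/geckodriver setup:
#     with webdriver.Firefox() as d:
#         source = fetch_rendered_source(d, endpoint)
```

The page source is read only after the wait returns, so the substring checks no longer race against page loading.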
