refang_url converts unknown schemes (such as 'tcp') to 'http' #32

Closed
jekyc opened this issue Jun 19, 2019 · 2 comments

@jekyc
jekyc commented Jun 19, 2019

It seems that refanging a URL whose scheme is not in the list at https://github.com/InQuest/python-iocextract/blob/4da913206d8e94a6a3b137c011c89e9707cb3966/iocextract.py#L626
replaces that scheme with 'http' (see https://github.com/InQuest/python-iocextract/blob/4da913206d8e94a6a3b137c011c89e9707cb3966/iocextract.py#L631).

Maybe a hard-coded conversion mapping could be used, e.g.:

refang_schemes = {
    'http': ['hxxp'],
    'https': ['hxxps'],
    'ftp': ['ftx', 'fxp'],
    'ftps': ['ftxs', 'fxps']
}
for scheme, fanged in refang_schemes.items():
    if parsed.scheme in fanged:
        parsed = parsed._replace(scheme=scheme)
        url = parsed.geturl().replace(scheme + ':///', scheme + '://')

        try:
            _ = urlparse(url)
        except ValueError:
            # Last resort on ipv6 fail.
            url = url.replace('[', '').replace(']', '')

        parsed = urlparse(url)

        break

This is less of a catch-all than the current solution, but it leaves an indicator untouched when its scheme is not a recognized defanged spelling.
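
To make the trade-off concrete, here is a minimal standalone sketch of the mapping idea. It is not the library's code: `REFANG_SCHEMES` and `refang_scheme` are hypothetical names, the mapping is inverted relative to the snippet above for direct lookup, and only the scheme is handled (defanged dots such as `[.]` are left for a separate step). It uses plain string splitting instead of `urlparse` so that bracketed hosts cannot raise `ValueError`.

```python
# Hypothetical lookup table: defanged scheme spelling -> real scheme.
REFANG_SCHEMES = {
    "hxxp": "http",
    "hxxps": "https",
    "ftx": "ftp",
    "fxp": "ftp",
    "ftxs": "ftps",
    "fxps": "ftps",
}


def refang_scheme(url):
    """Rewrite the scheme only when it is a known defanged spelling."""
    scheme, sep, rest = url.partition("://")
    if not sep:
        # No "://" separator found, so there is no scheme to rewrite.
        return url
    real = REFANG_SCHEMES.get(scheme.lower())
    if real is None:
        # Unknown scheme (for example 'tcp'): leave the indicator as-is.
        return url
    return real + sep + rest


print(refang_scheme("hxxp://example.com/bad"))
print(refang_scheme("tcp://example[.]com:8989/bad"))
```

With this approach a `tcp://` URL keeps its scheme, at the cost of missing defanged spellings that are not in the table.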

Example:

In [1]: import iocextract

In [2]: content = """tcp://example[.]com:8989/bad"""

In [3]: list(iocextract.extract_urls(content))
Out[3]: ['tcp://example[.]com:8989/bad', 'tcp://example[.]com:8989/bad']

In [4]: list(iocextract.extract_urls(content, refang=True))
Out[4]: ['http://example.com:8989/bad', 'http://example.com:8989/bad']

Note: This behavior is shown in the output examples in the README.rst in the 'Usage' section related to refang.

@battleoverflow
Contributor

Hi, @jekyc!

I believe this was resolved in this commit: 9abe5f2

I've set it up to allow the user to decide if a scheme check will even occur during execution. If this does not fix your issue, feel free to let me know. If you have the time, feel free to submit a PR for any improvements you think could be useful.
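
For readers who want a picture of what an opt-in scheme check might look like, here is a rough sketch. It does not reproduce the linked commit; the function name `normalize_scheme`, the `force_http` flag, and the `KNOWN_SCHEMES` tuple are all assumptions made for illustration, and the real parameter name and default are not shown in this thread.

```python
# Hypothetical list of schemes that are passed through unchanged.
KNOWN_SCHEMES = ("http", "https", "ftp", "ftps")


def normalize_scheme(url, force_http=False):
    """Optionally replace an unrecognized scheme with 'http'.

    With force_http=False (the assumed default here) the URL is returned
    unchanged, so an indicator such as tcp://... keeps its scheme.
    """
    scheme, sep, rest = url.partition("://")
    if not sep or scheme.lower() in KNOWN_SCHEMES:
        return url
    if force_http:
        # Caller explicitly asked for the catch-all behavior.
        return "http" + sep + rest
    return url


print(normalize_scheme("tcp://example.com:8989/bad"))
print(normalize_scheme("tcp://example.com:8989/bad", force_http=True))
```

Making the rewrite opt-in keeps the catch-all available for callers who rely on it, without altering indicators by default.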

This new release is not available on PyPI yet, but I'll be sure to make another comment here once it's available.

You can see an example of the change in this issue: #34

@battleoverflow
Contributor

New version is now available on PyPI: https://pypi.org/project/iocextract/1.14.0/
