1.1 Information Gathering

📑 Prerequisites

  • Basic familiarity with Linux
  • Basic familiarity with web technologies

📕 Learning Objectives

  • Understand the differences between active and passive information gathering
  • Perform passive and active information gathering with various tools and resources

What is Information Gathering?

  • The initial phase of any penetration test involves information gathering. This step revolves around gathering data about an individual, company, website, or system that is the target of the assessment.
  • Success in the later stages of a penetration test is closely linked to the extent of information gathered about the target. In other words, the more comprehensive the data collected, the higher the chances of success.
  • Information gathering can be categorized into two main types: passive and active.

Passive Information Gathering

It entails collecting data without directly engaging with the target.

  • Identifying IP addresses & DNS information.
  • Identifying domain names and domain ownership information.
  • Identifying email addresses and social media profiles.
  • Identifying web technologies being used on target sites.
  • Identifying subdomains.

Active Information Gathering

It involves actively interacting with the target system, but it requires proper authorization before being performed.

  • Discovering open ports on target systems.
  • Learning about the internal infrastructure of a target network/organization.
  • Enumerating information from target systems.

Passive Information Gathering

Website Recon & Footprinting

Reconnaissance, aka "recon", (often associated with information gathering) involves collecting information about a target system or organization using passive methods. This means gathering data without directly interacting with the target system, network, or organization.

Footprinting is a specific phase of reconnaissance that focuses on gathering detailed information about a target system, network, or organization in order to create a "footprint" of its digital presence. This involves both passive and active methods to collect data about IP addresses, domain names, network infrastructure, organization structure, technical contacts, and more.

For a website, we typically seek information such as:

  • IP addresses,
  • Directories hidden from search engines,
  • Names,
  • Email addresses,
  • Phone numbers,
  • Physical addresses,
  • Web technologies being used.

Host

The first thing that we can do is perform a DNS lookup of the website with the host command:

host website_url

to find the IP address(es) associated with the website. For example, if the command returns two or more addresses, it can mean that the website sits behind a firewall/proxy.

Another good activity to perform early on is to check the website/robots.txt file.

The "robots.txt" file is a text file used by websites to communicate with web crawlers and search engine bots about which parts of the site should not be crawled or indexed. It is a standard used to provide instructions to web robots, also known as web crawlers or spiders, that automatically navigate and index websites for search engines like Google, Bing, and others.

Here are some key points about the "robots.txt" file:

  1. Purpose: The main purpose of the "robots.txt" file is to control the behavior of web crawlers on a website. It allows website administrators to specify which parts of their site should not be crawled by search engine bots.

  2. Location: The "robots.txt" file is typically placed in the root directory of a website. For example, if your website's domain is "example.com," the "robots.txt" file would be accessible at "https://www.example.com/robots.txt."

  3. Syntax: The "robots.txt" file uses a simple syntax to specify user-agent (crawler) directives and disallow rules. User-agents are identifiers for specific web crawlers. Common user-agents include "Googlebot" for Google's crawler and "Bingbot" for Bing's crawler.

  4. Directives:

    • User-agent: Specifies the web crawler to which the following rules apply.
    • Disallow: Instructs the specified user-agent not to crawl the specified URL paths.
    • Allow: Overrides a disallow rule for specific URL paths.
    • Sitemap: Specifies the location of the XML sitemap file that contains a list of URLs on the site.
  5. Example: Here's an example of a simple "robots.txt" file:

    User-agent: *
    Disallow: /private/
    Disallow: /temp/
    Allow: /public/
    Sitemap: https://www.example.com/sitemap.xml

    In this example, the asterisk (*) as the user-agent means the rules apply to all crawlers. It disallows crawling of the "/private/" and "/temp/" directories while allowing crawling of the "/public/" directory. It also specifies the location of the sitemap.

XML Sitemap

Another good resource is the XML sitemap, website/sitemap_index.xml, which provides an organized way of indexing the website.

To identify the web technologies or content management system used on a website, we can use:

Browser add-ons:

  • BuiltWith,
  • Wappalyzer

Command-line tools:

  • WhatWeb

If we want to download the entire website and analyze the source code locally, we can use a website copier such as HTTrack.
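
For example (a minimal sketch; the target URL and output directory are placeholders):

whatweb https://example.com                        # fingerprint the web technologies and CMS in use
httrack https://example.com -O /tmp/example_copy   # mirror the site locally for offline analysis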

Whois

Whois is a command and network protocol used to obtain detailed information about an internet domain name or an IP address. The term "whois" comes from the question "Who is?"

When you execute the "whois" command on a domain name or an IP address, you request information about the domain's owner, the registrar (the organization responsible for managing the domain registration), registration and expiration dates, technical and administrative contacts associated with the domain, and other related information.

For example, if you want to find out the registration details of a domain like "example.com," you can run the command "whois example.com" to retrieve relevant information about that domain.
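
A minimal example (the domain and IP address are placeholders):

whois example.com     # registrar, registration/expiration dates, and contacts for a domain
whois 203.0.113.10    # netblock and owner information for an IP address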

Netcraft

Netcraft is a web-based tool used for website recon; in detail, it can be used for:

Website Analysis: Netcraft offers tools and services to analyze websites. You can use their tools to gather information about a website's hosting provider, domain history, technology stack, and other related information.

Security Research: Netcraft monitors and reports on internet security threats, including phishing attacks, malware distribution, and other online scams. They maintain a database of phishing websites and offer browser extensions that help users detect and avoid potentially harmful websites.

Server Surveys: Netcraft conducts regular surveys to gather data about web servers and internet technologies. They track trends in web server software usage, operating systems, and other related statistics. This information can be valuable for businesses and researchers to understand the landscape of internet infrastructure.

Domain and Hosting Information: Netcraft provides tools to retrieve information about domain registrations, IP addresses, and hosting providers. This can be useful for investigative purposes or for businesses looking to gather competitive intelligence.

Cybersecurity Services: Netcraft's services include monitoring for DDoS attacks, phishing attempts, and other security threats. They offer services to help organizations identify and mitigate online threats to their websites and online services.

Data Visualization: Netcraft often presents its data and research through visualizations and reports. These can be helpful for understanding internet trends, security threats, and other relevant information.

Historical Data: Netcraft's historical data can be used to track changes in websites and online infrastructure over time. This can be useful for businesses looking to analyze their own or their competitors' online presence.

DNS Reconnaissance

DNS recon, aka DNS reconnaissance, refers to the process of gathering information about domain names, their associated IP addresses, and other related domain records.

DNS (Domain Name System) is the protocol used to translate human-readable domain names (like www.example.com) into IP addresses that computers use to communicate over the internet. DNS recon involves querying DNS servers to gather information about the target domain's DNS records. The information collected can provide valuable insights into the target's internet presence and infrastructure.

Here are some common aspects of DNS reconnaissance:

DNS Records: Different types of DNS records provide various types of information. For instance, an A record maps a domain to an IP address, an MX record specifies mail servers for email delivery, and a TXT record can hold arbitrary text data, often used for verification or DNS-based authentication.

Subdomain Enumeration: Subdomain enumeration is a common aspect of DNS recon. It involves discovering subdomains associated with a target domain. Attackers often use subdomain enumeration to identify potential entry points into a target's network.

Reverse DNS Lookup: Reverse DNS lookup involves querying the DNS system to find the domain name associated with a given IP address. This can help in identifying the domains hosted on a specific IP address.

DNS Zone Transfers: Zone transfers are a mechanism by which authoritative DNS servers share their DNS records with other authorized servers. Misconfigured DNS servers might allow unauthorized zone transfers, which can provide attackers with valuable information about the target's DNS infrastructure.

Cache Snooping: DNS caches store recently resolved DNS records to improve query response times. Cache snooping involves querying a DNS cache to retrieve information about recently resolved domain names. This can reveal information about internal network structure and previously visited websites.
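
Many of these lookups can also be performed manually with standard tools such as dig. A minimal sketch (example.com and the IP address are placeholders):

dig example.com A +short      # IPv4 address records
dig example.com MX +short     # mail servers
dig example.com TXT +short    # TXT records (SPF, verification strings, ...)
dig -x 8.8.8.8 +short         # reverse (PTR) lookup for an IP address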

Local DNS with /etc/hosts

The "hosts" file, often located at "/etc/hosts" on Unix-like systems (including Linux and macOS), is a local text file that maps hostnames to IP addresses. It's used by the operating system to resolve hostnames to IP addresses before making DNS queries. This file predates the Domain Name System (DNS) and provides a simple and local method for hostname resolution.

The format of the hosts file is straightforward: each line contains an IP address followed by one or more hostnames associated with that IP address. For example:

127.0.0.1   localhost
192.168.1.1 example.com

In this example, the "localhost" hostname is mapped to the loopback IP address 127.0.0.1, and the "example.com" hostname is mapped to the IP address 192.168.1.1.

The utility of the hosts file includes:

  1. Local DNS Resolution: The hosts file provides a way to manually define hostname-to-IP mappings without relying on external DNS servers. This can be useful for testing and development purposes, as well as for mapping local network resources.
  2. Ad Blocking and Content Redirection: Users can use the hosts file to block access to certain websites by redirecting their domain names to a non-existent or local IP address. This is often used to block ads and unwanted content at the local level.
  3. Avoiding DNS Lookup Delays: Since the hosts file is consulted before DNS queries are made, it can provide faster resolution for frequently accessed domains, improving the speed of network operations.
  4. Local Network Customization: In small local networks or closed environments, administrators can use the hosts file to ensure that specific hostnames resolve to specific IP addresses, even if those hostnames are not registered in the global DNS.

DNSRecon

DNSRecon is a Python script that provides the ability to perform:

  • Check all NS Records for Zone Transfers.
  • Enumerate General DNS Records for a given Domain (MX, SOA, NS, A, AAAA, SPF and TXT).
  • Perform common SRV Record Enumeration.
  • Top Level Domain (TLD) Expansion.
  • Check for Wildcard Resolution.
  • Brute Force subdomain and host A and AAAA records given a domain and a wordlist.
  • Perform a PTR Record lookup for a given IP Range or CIDR.
  • Check a DNS server's cached records for A, AAAA and CNAME records, provided a list of host records in a text file to check.
  • Enumerate Hosts and Subdomains using Google.

Command Examples

Scan a domain and save the results to a SQLite database:

# dnsrecon --domain example.com --db path/to/database.sqlite

Scan a domain, specifying the nameserver and performing a zone transfer:

# dnsrecon --domain example.com --name_server nameserver.example.com --type axfr

Scan a domain, using a brute-force attack and a dictionary of subdomains and hostnames:

# dnsrecon --domain example.com --dictionary path/to/dictionary.txt --type brt

Scan a domain, performing a reverse lookup of IP ranges from the SPF record and saving the results to a JSON file:

# dnsrecon --domain example.com -s --json

Scan a domain, performing a Google enumeration and saving the results to a CSV file:

# dnsrecon --domain example.com -g --csv

Scan a domain, performing DNS cache snooping:

# dnsrecon --domain example.com --type snoop --name_server nameserver.example.com --dictionary path/to/dictionary.txt

Scan a domain, performing zone walking:

# dnsrecon --domain example.com --type zonewalk

DNSdumpster

DNSdumpster is a domain research tool that can discover hosts related to a domain. Finding visible hosts from the attacker's perspective is an important part of the security assessment process.

Unlike many DNS recon tools, it does not rely on brute-force subdomain enumeration. Instead, it queries open source intelligence resources for related domain data and compiles the results into an actionable resource for both attackers and defenders of internet-facing systems.

More than a simple DNS lookup, this tool will discover those hard-to-find subdomains and web hosts. The search relies on data from crawls of the Alexa Top 1 Million sites, search engines, Common Crawl, Certificate Transparency logs, MaxMind, Team Cymru, Shodan and scans.io.

WAF Recon

What is WAF?

WAF stands for "Web Application Firewall." It is a security solution designed to protect web applications from a variety of online threats and attacks. Web applications are software applications that run on web servers and are accessible via web browsers. Examples of web applications include online stores, social media platforms, online banking sites, and more.

A Web Application Firewall works by analyzing incoming and outgoing HTTP/HTTPS requests to a web application. It aims to identify and mitigate various types of attacks that could exploit vulnerabilities in the application's code or configuration. Some of the common types of attacks that WAFs help prevent include:

  • SQL Injection: Attackers attempt to insert malicious SQL statements into input fields to manipulate or extract data from a database.
  • Cross-Site Scripting (XSS): Attackers inject malicious scripts into web pages viewed by other users, potentially leading to the theft of sensitive data or the hijacking of user sessions.
  • Cross-Site Request Forgery (CSRF): Attackers trick users into performing actions on a web application without their consent or knowledge.
  • Remote File Inclusion (RFI) and Local File Inclusion (LFI): Attackers attempt to include malicious files on the server to execute arbitrary code or access sensitive information.
  • Brute Force Attacks: Attackers repeatedly try different username and password combinations to gain unauthorized access to an application.
  • OWASP Top Ten Vulnerabilities: WAFs help address vulnerabilities listed in the OWASP Top Ten, a well-known list of the most critical web application security risks.

Wafw00f

WAFW00F stands for "Web Application Firewall Detection Tool." It is an open-source Python tool designed to identify and fingerprint Web Application Firewalls (WAFs) that are in use by a target website. A Web Application Firewall is a security tool designed to protect web applications from various online threats, such as SQL injection, cross-site scripting (XSS), and other types of attacks.

The scope of WAFW00F is to help security professionals, penetration testers, and web developers identify whether a target website is behind a WAF and potentially gather information about the specific WAF being used. This information can be valuable for assessing the security posture of a web application and understanding its defensive measures.

WAFW00F works by sending specially crafted HTTP requests to a target website and analyzing the responses. Based on patterns, headers, and behavior in the responses, it attempts to determine whether a WAF is in place and, if possible, provide information about the WAF vendor or technology. It can also help identify bypass techniques that attackers might use to evade detection by the WAF.

root@kali:~# wafw00f -h
Usage: wafw00f url1 [url2 [url3 ... ]]
example: wafw00f http://www.victim.org/

Options:
  -h, --help            show this help message and exit
  -v, --verbose         Enable verbosity, multiple -v options increase
                        verbosity
  -a, --findall         Find all WAFs which match the signatures, do not stop
                        testing on the first one
  -r, --noredirect      Do not follow redirections given by 3xx responses
  -t TEST, --test=TEST  Test for one specific WAF
  -o OUTPUT, --output=OUTPUT
                        Write output to csv, json or text file depending on
                        file extension. For stdout, specify - as filename.
  -f FORMAT, --format=FORMAT
                        Force output format to csv, json or text.
  -i INPUT, --input-file=INPUT
                        Read targets from a file. Input format can be csv,
                        json or text. For csv and json, a `url` column name or
                        element is required.
  -l, --list            List all WAFs that WAFW00F is able to detect
  -p PROXY, --proxy=PROXY
                        Use an HTTP proxy to perform requests, examples:
                        http://hostname:8080, socks5://hostname:1080,
                        http://user:pass@hostname:8080
  -V, --version         Print out the current version of WafW00f and exit.
  -H HEADERS, --headers=HEADERS
                        Pass custom headers via a text file to overwrite the
                        default header set.
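
Example usage (a minimal sketch; the target URL is a placeholder and the flags are taken from the help output above):

wafw00f https://example.com -a -v
# -a: report every WAF signature that matches instead of stopping at the first
# -v: increase verbosity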

Subdomain Enumeration

What is Subdomain?

A subdomain is a part of a larger domain in the Domain Name System (DNS) hierarchy. In a domain name like "example.com," the "example" is the main domain, and a subdomain would be a prefix added to it, such as "sub.example.com." Subdomains are used to organize and navigate different sections or services within a domain, and they allow for better management and segregation of content or functionality.

The scope of subdomains is to create a structured hierarchy within a domain and provide a way to categorize and manage various aspects of a website or online presence. Here are some common use cases and benefits of using subdomains:

  • Content Organization: Subdomains can be used to organize different types of content or services. For example, a website might have "blog.example.com" for a blog section and "shop.example.com" for an online store.
  • Geographical Segmentation: Companies with a global presence might use subdomains to serve localized content, like "us.example.com" for the U.S. and "uk.example.com" for the United Kingdom.
  • Departmental Separation: Large organizations can use subdomains to separate different departments or teams. For instance, "hr.example.com" could be used for the Human Resources department.
  • Testing and Development: Subdomains are often used for testing new features or development versions of a website. For example, "dev.example.com" might host a development version.
  • Mobile and App Access: Subdomains can be used to provide access to mobile or app-specific content. "m.example.com" might be used for the mobile version of a site.
  • API Endpoints: Subdomains can be dedicated to hosting APIs, such as "api.example.com," providing a clear separation of API resources from the main website.
  • Tracking and Analytics: Subdomains can be used to set up tracking and analytics services separately from the main website, helping to manage data more effectively.
  • Security and Isolation: Subdomains can offer a level of isolation between different parts of a website. If one subdomain is compromised, it might not automatically jeopardize other subdomains.

Sublist3r

Sublist3r is an open-source Python tool designed for subdomain discovery. Its primary purpose is to enumerate and gather information about subdomains associated with a particular domain. Subdomains are essentially sub-sections of a larger domain, and discovering them can be useful for various purposes, including security assessments, penetration testing, and gathering information about an organization's online presence.

The scope of Sublist3r includes:

  • Subdomain Enumeration: Sublist3r aims to identify as many subdomains as possible associated with a given domain. This process involves querying different sources such as search engines, DNS records, and public databases to uncover subdomains.
  • Security Assessments: Security professionals, ethical hackers, and penetration testers often use tools like Sublist3r to gather information about the subdomains of a target domain. This information can be valuable for identifying potential entry points for attacks, security vulnerabilities, or misconfigurations.
  • Red Team Operations: In red teaming scenarios, where security teams simulate real-world attacks to test an organization's defenses, Sublist3r can be used to identify potential attack surfaces that adversaries might exploit.
  • Domain Footprinting: Subdomain enumeration can help security researchers and analysts build a comprehensive profile of an organization's online presence, which can be useful for understanding an organization's attack surface and digital footprint.
  • Asset Discovery: Organizations might use Sublist3r to discover subdomains associated with their domain to ensure they have visibility into all the resources and services available online.
  • Monitoring and Threat Intelligence: Regularly monitoring subdomains can help organizations identify any unauthorized or potentially malicious subdomains that could be set up by attackers for phishing or other malicious purposes.
root@kali:~# sublist3r -h
usage: sublist3r [-h] -d DOMAIN [-b [BRUTEFORCE]] [-p PORTS] [-v [VERBOSE]]
                 [-t THREADS] [-e ENGINES] [-o OUTPUT] [-n]

OPTIONS:
  -h, --help            show this help message and exit
  -d DOMAIN, --domain DOMAIN
                        Domain name to enumerate it's subdomains
  -b [BRUTEFORCE], --bruteforce [BRUTEFORCE]
                        Enable the subbrute bruteforce module
  -p PORTS, --ports PORTS
                        Scan the found subdomains against specified tcp ports
  -v [VERBOSE], --verbose [VERBOSE]
                        Enable Verbosity and display results in realtime
  -t THREADS, --threads THREADS
                        Number of threads to use for subbrute bruteforce
  -e ENGINES, --engines ENGINES
                        Specify a comma-separated list of search engines
  -o OUTPUT, --output OUTPUT
                        Save the results to text file
  -n, --no-color        Output without color

Example: python3 /usr/bin/sublist3r -d google.com

Google Dorks

Google Dorks, also known as Google Hacking or Google Search Operators, refer to specific search queries that leverage Google's advanced search operators to discover hidden or sensitive information on the internet. These operators allow users to refine their search queries and retrieve specific types of information that might not be readily available through standard searches. Google Dorks are often used by security professionals, hackers, and researchers to find vulnerabilities, exposed data, and potential security issues.

The scope of Google Dorks includes:

  • Information Gathering: Google Dorks can be used to gather information about a target organization, including exposed files, directories, and subdomains. This information can be used by security professionals to assess potential risks and vulnerabilities.
  • Vulnerability Discovery: Hackers and security researchers use Google Dorks to discover websites or web applications that might be misconfigured, vulnerable to attacks, or have exposed sensitive information.
  • Sensitive Data Disclosure: Google Dorks can help uncover sensitive information like passwords, usernames, database credentials, and confidential documents that have been inadvertently exposed online.
  • Website Enumeration: Attackers might use Google Dorks to enumerate directories and files on a website, identifying potential entry points for attacks or unauthorized access.
  • Exploitation: If an attacker identifies a vulnerable website or application using Google Dorks, they might attempt to exploit the vulnerabilities they find.
  • Digital Footprint Analysis: Organizations can use Google Dorks to assess their digital footprint and identify potential security gaps that could be exploited by malicious actors.
  • Phishing and Social Engineering: Attackers might use Google Dorks to find email addresses, employee names, and other information to craft targeted phishing emails or conduct social engineering attacks.
  • Research and Education: Security professionals and researchers can use Google Dorks to study online security issues, gather data for research, and understand potential risks and trends.

Here are a few examples of how Google Dorks can be used to search for specific types of information:

  • Finding Open Directories:

    intitle:"index of" "parent directory"

    This dork searches for directories that might not have an index file and could potentially expose the contents of the directory.

  • Finding Exposed Documents:

    filetype:pdf site:example.com

    This dork searches for PDF files on a specific website domain. It can be used to find potentially sensitive documents that are publicly accessible.

  • Identifying Vulnerable Servers:

    intitle:"Apache HTTP Server Test Page" intext:"It works!"

    This dork looks for servers running the Apache HTTP Server with its default test page. It indicates that the server might be misconfigured or not fully secured.

  • Discovering Login Pages:

    inurl:login site:example.com

    This dork searches for login pages within a specific website domain. It might help identify potential entry points for unauthorized access.

  • Finding Publicly Exposed Files:

    site:example.com ext:sql

    This dork searches for SQL files on a specific website. It could potentially expose database files or SQL scripts.

  • Identifying Security Cameras:

    inurl:view/index.shtml

    This dork searches for publicly exposed network camera viewer pages.

We can find many more ready-made Google Dorks in the Google Hacking Database: https://www.exploit-db.com/google-hacking-database.

Wayback Machine

The Wayback Machine, operated by the Internet Archive, is a digital archive that captures and stores snapshots of websites and web pages over time. Its primary purpose is to preserve the history of the internet by archiving web content, allowing users to access past versions of websites and track changes that have occurred over the years.

The scope of the Wayback Machine includes:

  • Historical Website Access: The Wayback Machine provides a valuable resource for researchers, historians, and the general public to access and explore the historical content of websites. It allows users to see how websites looked and what content they contained at different points in the past.
  • Content Verification: The archived snapshots on the Wayback Machine can serve as a means to verify information and claims made on websites. If a website's content changes or is taken down, the archived version can still be accessed for reference.
  • Legal and Regulatory Compliance: Archived versions of web pages can be used as evidence in legal cases or to demonstrate compliance with certain regulations at specific points in time.
  • Website Evolution: Website owners and developers can use the Wayback Machine to track the evolution of their websites, review design changes, and identify past content.
  • Recovery of Lost Content: If a website experiences data loss or accidental deletion, the Wayback Machine might have archived versions that can help recover lost content.
  • Research and Academic Study: Researchers and scholars can use the Wayback Machine to study the evolution of online trends, technologies, and information dissemination.
  • Cultural and Social Analysis: The archived web content can offer insights into the cultural and social changes reflected in online communication over time.
  • Digital Preservation: The Wayback Machine plays a crucial role in digital preservation by preventing the loss of valuable online content due to website closures, changes, or technological obsolescence.

Example of usage: searching for google.com on https://web.archive.org shows how the site has changed over the years.
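
The Wayback Machine also exposes a CDX API that can be queried from the command line to list archived captures. A minimal sketch (the domain and limit are placeholders):

curl "http://web.archive.org/cdx/search/cdx?url=google.com&output=json&limit=10"
# returns the first 10 captures of google.com as JSON rows (timestamp, original URL, status code, ...)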

Email Harvesting

Email harvesting refers to the process of collecting or extracting email addresses from various sources, typically with the intention of building a list for marketing, communication, or other purposes. This practice involves using automated tools, scripts, or manual methods to gather email addresses from websites, online platforms, or databases. Email addresses obtained through email harvesting are often used for email marketing campaigns, spamming, phishing attacks, or other unsolicited communications.

Email harvesting can be conducted in various ways:

  • Web Scraping: Automated bots or scripts crawl websites and extract email addresses from publicly accessible pages, such as contact pages, forums, comment sections, or user profiles.
  • Search Engine Queries: Attackers might use specific search queries, also known as "Google Dorks," to find web pages containing email addresses, and then extract them.
  • Online Directories: Some email harvesters target online directories, business listings, or social media platforms to gather email addresses associated with individuals or organizations.
  • Spamming and Phishing: Cybercriminals may harvest email addresses for the purpose of sending spam emails or conducting phishing attacks to trick recipients into revealing personal information.
  • Data Breaches: In some cases, attackers obtain email addresses through data breaches of websites, databases, or organizations, where sensitive information is leaked.
  • Credential Stuffing: Attackers use harvested email addresses to perform credential stuffing attacks, where they use lists of email and password combinations to gain unauthorized access to online accounts.

The Harvester

The Harvester is an open-source information-gathering tool used in cybersecurity and penetration testing. It's designed to gather valuable data and information about a target domain from various public sources on the internet. This tool is particularly useful during the reconnaissance phase of security assessments to collect information that could help identify potential vulnerabilities, weaknesses, and attack vectors.

The scope of The Harvester includes:

  • Email Harvesting: The Harvester can search for email addresses associated with the target domain by querying search engines, social media platforms, online profiles, and other sources. This can be helpful for building contact lists, understanding an organization's communication structure, and assessing email address exposure.
  • Subdomain Enumeration: The tool can discover subdomains associated with the target domain by querying DNS servers, search engines, and other resources. Subdomain enumeration aids in identifying additional entry points and potential targets.
  • Host Discovery: The Harvester can identify hosts and IP addresses linked to the target domain. This information provides insights into the organization's infrastructure and can help assess potential vulnerabilities.
  • Network Enumeration: The tool can determine open ports on hosts, revealing the services running on those systems. This can assist in identifying potential attack vectors and points of entry.
  • Employee Name Gathering: The Harvester can search for employee names associated with the target organization on platforms like LinkedIn and other public profiles. This information might be used for social engineering or targeted attacks.
  • Metadata Extraction: The tool can extract metadata from documents and files linked to the target domain, potentially revealing sensitive information embedded in files.
  • Use in Penetration Testing: Penetration testers and cybersecurity professionals utilize The Harvester during reconnaissance to amass data that aids in identifying potential weak spots, avenues of attack, and areas of concern.
  • Information Verification: Results from The Harvester can be used to verify publicly available information about the target organization.
root@kali:~# theHarvester -h
*******************************************************************
*  _   _                                            _             *
* | |_| |__   ___    /\  /\__ _ _ ____   _____  ___| |_ ___ _ __  *
* | __|  _ \ / _ \  / /_/ / _` | '__\ \ / / _ \/ __| __/ _ \ '__| *
* | |_| | | |  __/ / __  / (_| | |   \ V /  __/\__ \ ||  __/ |    *
*  \__|_| |_|\___| \/ /_/ \__,_|_|    \_/ \___||___/\__\___|_|    *
*                                                                 *
* theHarvester 4.3.0                                              *
* Coded by Christian Martorella                                   *
* Edge-Security Research                                          *
* cmartorella@edge-security.com                                   *
*                                                                 *
*******************************************************************
usage: theHarvester [-h] -d DOMAIN [-l LIMIT] [-S START] [-p] [-s]
                    [--screenshot SCREENSHOT] [-v] [-e DNS_SERVER] [-t]
                    [-r [DNS_RESOLVE]] [-n] [-c] [-f FILENAME] [-b SOURCE]

theHarvester is used to gather open source intelligence (OSINT) on a company
or domain.

options:
  -h, --help            show this help message and exit
  -d DOMAIN, --domain DOMAIN
                        Company name or domain to search.
  -l LIMIT, --limit LIMIT
                        Limit the number of search results, default=500.
  -S START, --start START
                        Start with result number X, default=0.
  -p, --proxies         Use proxies for requests, enter proxies in
                        proxies.yaml.
  -s, --shodan          Use Shodan to query discovered hosts.
  --screenshot SCREENSHOT
                        Take screenshots of resolved domains specify output
                        directory: --screenshot output_directory
  -v, --virtual-host    Verify host name via DNS resolution and search for
                        virtual hosts.
  -e DNS_SERVER, --dns-server DNS_SERVER
                        DNS server to use for lookup.
  -t, --take-over       Check for takeovers.
  -r [DNS_RESOLVE], --dns-resolve [DNS_RESOLVE]
                        Perform DNS resolution on subdomains with a resolver
                        list or passed in resolvers, default False.
  -n, --dns-lookup      Enable DNS server lookup, default False.
  -c, --dns-brute       Perform a DNS brute force on the domain.
  -f FILENAME, --filename FILENAME
                        Save the results to an XML and JSON file.
  -b SOURCE, --source SOURCE
                        anubis, baidu, bevigil, binaryedge, bing, bingapi,
                        bufferoverun, brave, censys, certspotter, criminalip,
                        crtsh, dnsdumpster, duckduckgo, fullhunt, github-code,
                        hackertarget, hunter, hunterhow, intelx, otx,
                        pentesttools, projectdiscovery, rapiddns, rocketreach,
                        securityTrails, sitedossier, subdomainfinderc99,
                        threatminer, urlscan, virustotal, yahoo, zoomeye
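
Example usage (a minimal sketch; the domain is a placeholder and the sources are chosen from the list above):

theHarvester -d example.com -l 500 -b bing,crtsh,dnsdumpster -f example_results
# -l limits the number of search results, -b selects the data sources,
# -f saves the results to XML and JSON files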

Leaked Password Database

A Leaked Password Database refers to a collection of compromised or stolen passwords that have been exposed through data breaches or security incidents. These databases contain passwords that were once considered confidential but have been exposed to unauthorized parties due to various factors such as vulnerabilities, poor security practices, or cyberattacks on websites, services, or organizations.

The leaked password databases typically include:

  • Username-Password Pairs: These databases often pair usernames or email addresses with the corresponding passwords. This information is valuable to cybercriminals as they can use it for unauthorized access to various online accounts.
  • Cryptography and Hashes: Passwords in these databases might be stored in hashed or encrypted forms. Cybersecurity researchers and attackers alike can attempt to crack these hashes to reveal the original passwords.
  • Data Breach Sources: Leaked password databases might specify the source of the data breach, allowing individuals to understand which websites or services have been compromised.

Haveibeenpwned

Have I Been Pwned (HIBP) is a free online service and website created by security researcher Troy Hunt. The purpose of Have I Been Pwned is to help individuals determine if their email addresses, usernames, and other personal information have been exposed in data breaches and compromised in various security incidents.

The main features and scope of Have I Been Pwned include:

  • Data Breach Monitoring: HIBP continuously monitors and collects data from known data breaches, security incidents, and leaks. It compiles information about compromised email addresses, usernames, passwords, and other personal data.
  • Searchable Database: Users can visit the Have I Been Pwned website and enter their email addresses or usernames to check if their information has been involved in any data breaches. If a match is found, the service provides details about the breach and the type of information exposed.
  • Password Exposure Check: Have I Been Pwned also offers a feature that allows users to check if their passwords have been compromised in breaches. Users can input a password to see if it has appeared in any known breaches. This is done without sending the actual password to the service, ensuring security (a sketch of how this works follows this list).
  • Notification Service: Users can subscribe to the Have I Been Pwned notification service, which sends an alert if their email address appears in future data breaches.
  • API Access: Developers and organizations can access the Have I Been Pwned API to integrate breach data and notification features into their own applications or services.
  • Education and Security Awareness: The website educates users about the importance of strong, unique passwords, proper password management, and the risks associated with data breaches.
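
As an illustration of the password exposure check, the public Pwned Passwords range API only ever receives the first five characters of the password's SHA-1 hash (k-anonymity), so the password itself never leaves your machine. A minimal sketch:

echo -n 'Password123' | sha1sum
# take the first 5 hex characters of the hash (the prefix)...
curl https://api.pwnedpasswords.com/range/<prefix>
# ...then search the returned hash suffixes for the remainder of your hash;
# a match means the password has appeared in known breaches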

Active Information Gathering

{% embed url="https://www.lmgsecurity.com/the-pentesters-code-of-conduct-rules-that-keep-everyone-safe/" %}

DNS Zone Transfers

DNS records, or Domain Name System records, are essentially pieces of information stored in DNS servers that help translate human-readable domain names (like www.example.com) into IP addresses (like 192.168.1.1) that computers use to identify each other on a network. DNS records play a crucial role in facilitating the browsing experience and various other network-related activities. Here are some common types of DNS records:

  • A Record (Address Record): This record maps a domain name to an IPv4 address. It is used to translate domain names into the numerical IP addresses that computers use to identify each other on the internet.
  • AAAA Record (IPv6 Address Record): Similar to the A record, the AAAA record maps a domain name to an IPv6 address. IPv6 addresses are used to handle the growing need for unique IP addresses due to the increasing number of devices connected to the internet.
  • CNAME Record (Canonical Name Record): The CNAME record is used to create an alias for a domain name. It points one domain name to another, allowing multiple domain names to be associated with the same IP address. This is often used for subdomains and load balancing.
  • MX Record (Mail Exchange Record): MX records specify the mail servers responsible for receiving email messages on behalf of a domain. These records help route email messages to the correct email servers.
  • TXT Record (Text Record): TXT records store various text-based information about a domain. They are often used for verification and authentication purposes, such as SPF (Sender Policy Framework) records for email authentication.
  • SRV Record (Service Record): SRV records provide information about available services on a domain. They are commonly used for services like VOIP, instant messaging, and other communication protocols.
  • PTR Record (Pointer Record): PTR records perform the reverse of what A and AAAA records do. They map IP addresses to domain names, aiding in reverse DNS lookup, which can help verify the authenticity of an IP address.
  • NS Record (Name Server Record): NS records indicate which DNS servers are authoritative for a particular domain. They point to the DNS servers that hold the definitive DNS information for a domain.
  • SOA Record (Start of Authority Record): The SOA record contains administrative information about the domain, including the primary authoritative DNS server, contact details for the domain administrator, and various timing information related to updates and caching.

DNS zone transfer is a process used to replicate or synchronize DNS data (zone data) from one DNS server to another. It is commonly employed in environments where multiple DNS servers need to maintain consistent and up-to-date records for a particular domain or set of domains. Zone transfers are critical for redundancy, fault tolerance, and load distribution within the DNS infrastructure.

Here's how the DNS zone transfer process works:

  1. Primary (Master) Server: The primary DNS server (also known as the master server) holds the authoritative copy of the DNS zone data. This server is considered the source of truth for the zone's records.
  2. Secondary (Slave) Servers: Secondary DNS servers (also called slave servers) are responsible for obtaining a copy of the zone data from the primary server. These secondary servers are set up to request and receive updates from the primary server periodically or when changes occur.
  3. Zone Transfer: When a secondary server is initially configured or when there are changes to the zone data on the primary server, it initiates a zone transfer. This transfer involves the secondary server requesting the updated zone data from the primary server.
  4. Transfer Methods: There are two main methods of DNS zone transfer:
    • Full Zone Transfer (AXFR): This method involves the secondary server requesting the entire zone data from the primary server. It's used when the secondary server doesn't have any data or needs a complete refresh.
    • Incremental Zone Transfer (IXFR): In this method, the secondary server requests only the changes or updates that have occurred since its last synchronization. This reduces the amount of data transferred and is more efficient.
  5. Maintaining Consistency: After the zone transfer, the secondary server updates its records to match those of the primary server. This ensures that both primary and secondary servers have consistent DNS information.

We can find DNS records using the passive recon tools covered earlier, such as DNSDumpster and DNSRecon, or active recon tools such as:

DNSenum

DNSenum is a network reconnaissance tool used for gathering information from DNS (Domain Name System) servers. Its primary purpose is to perform comprehensive DNS enumeration, which involves querying DNS servers to gather various types of information about a target domain. DNSenum is particularly useful for penetration testing, network security assessments, and information gathering during the reconnaissance phase of an attack.

The scope of DNSenum includes:

  • DNS Information Gathering: DNSenum queries the DNS servers of a target domain to gather a wide range of information. This includes discovering subdomains, IP addresses associated with domain names, mail server information (MX records), name server information (NS records), and more.
  • Subdomain Enumeration: DNSenum is often used to discover subdomains associated with a target domain. This can help identify potential entry points or misconfigurations that could be exploited by attackers.
  • Zone Transfer Testing: DNSenum can check if a DNS server allows zone transfers. Zone transfers, when improperly configured, can leak sensitive DNS data, potentially aiding attackers in mapping out an organization's network structure.
  • Brute-Force Enumeration: DNSenum can perform brute-force enumeration of subdomains and hostnames, attempting to find valid DNS entries by trying different combinations of names.
  • Reverse DNS Lookup: The tool can perform reverse DNS lookups to find domain names associated with given IP addresses.
  • Query Multiple DNS Servers: DNSenum can query multiple DNS servers to gather a broader perspective on the DNS records and improve the accuracy of the information gathered.
  • Network Mapping: By identifying various DNS records associated with a domain, DNSenum can assist in mapping out an organization's network infrastructure, including mail servers, subdomains, and other services.
  • Penetration Testing and Security Auditing: Security professionals use DNSenum to identify potential weaknesses in DNS configurations that could be exploited by attackers. It's an essential tool for penetration testers and ethical hackers during security assessments.
root@kali:~# dnsenum -h
dnsenum VERSION:1.2.6
Usage: dnsenum [Options] <domain>
[Options]:
Note: If no -f tag supplied will default to /usr/share/dnsenum/dns.txt or
the dns.txt file in the same directory as dnsenum
GENERAL OPTIONS:
  --dnsserver 	<server>
			Use this DNS server for A, NS and MX queries.
  --enum		Shortcut option equivalent to --threads 5 -s 15 -w.
  -h, --help		Print this help message.
  --noreverse		Skip the reverse lookup operations.
  --nocolor		Disable ANSIColor output.
  --private		Show and save private ips at the end of the file domain_ips.txt.
  --subfile <file>	Write all valid subdomains to this file.
  -t, --timeout <value>	The tcp and udp timeout values in seconds (default: 10s).
  --threads <value>	The number of threads that will perform different queries.
  -v, --verbose		Be verbose: show all the progress and all the error messages.
GOOGLE SCRAPING OPTIONS:
  -p, --pages <value>	The number of google search pages to process when scraping names,
			the default is 5 pages, the -s switch must be specified.
  -s, --scrap <value>	The maximum number of subdomains that will be scraped from Google (default 15).
BRUTE FORCE OPTIONS:
  -f, --file <file>	Read subdomains from this file to perform brute force. (Takes priority over default dns.txt)
  -u, --update	<a|g|r|z>
			Update the file specified with the -f switch with valid subdomains.
	a (all)		Update using all results.
	g		Update using only google scraping results.
	r		Update using only reverse lookup results.
	z		Update using only zonetransfer results.
  -r, --recursion	Recursion on subdomains, brute force all discovered subdomains that have an NS record.
WHOIS NETRANGE OPTIONS:
  -d, --delay <value>	The maximum value of seconds to wait between whois queries, the value is defined randomly, default: 3s.
  -w, --whois		Perform the whois queries on c class network ranges.
			 **Warning**: this can generate very large netranges and it will take lot of time to perform reverse lookups.
REVERSE LOOKUP OPTIONS:
  -e, --exclude	<regexp>
			Exclude PTR records that match the regexp expression from reverse lookup results, useful on invalid hostnames.
OUTPUT OPTIONS:
  -o --output <file>	Output in XML format. Can be imported in MagicTree (www.gremwell.com)

Example of usage
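
A minimal sketch against the intentionally misconfigured zonetransfer.me domain, using options from the help output above:

dnsenum --noreverse zonetransfer.me
# --noreverse skips the reverse lookup phase to keep the scan quick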

Here we have two NS records: nsztm1.digi.ninja and nsztm2.digi.ninja.

If the scan results contain internal IPs/records, we can get a clearer picture of the internal network.

Another good tool for DNS zone transfers is dig; it doesn't perform brute force, only the zone transfer:

Dig

dig axfr @nsztm1.digi.ninja zonetransfer.me

Another well-known tool is fierce, which is less invasive than the previous two.
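
A hedged example against the same domain (assumption: the Python rewrite of fierce is installed, which uses the --domain flag):

fierce --domain zonetransfer.me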

Host Discovery With Nmap

Here's below my Nmap cheatsheet:

{% content-ref url="http://127.0.0.1:5000/s/iS3hadq7jVFgSa8k5wRA/pratical-ethical-hacker-notes/nmap" %} Nmap {% endcontent-ref %}

Nmap, short for "Network Mapper," is an open-source and widely used network scanning tool. It is designed to discover and map devices and services on a computer network, thereby providing information about the network's topology, hosts, open ports, and various services running on those hosts.

Nmap operates by sending packets to the target network and analyzing the responses it receives. It can perform a variety of network reconnaissance tasks, including:

  1. Host Discovery: Nmap can determine which hosts are alive on a network by sending ICMP (ping) probes or other types of network requests.
  2. Port Scanning: Nmap can scan target hosts for open ports and services. This is crucial for identifying potential security vulnerabilities and understanding the services running on the network.
  3. Service Detection: Once open ports are identified, Nmap can attempt to determine the specific services that are running on those ports. This helps in understanding the software and protocols in use.
  4. Operating System Detection: Nmap can also attempt to identify the operating system of the target hosts based on various network responses and characteristics.
  5. Version Detection: Nmap can gather information about the versions of services running on open ports, which can aid in assessing vulnerabilities.
  6. Scripting and Automation: Nmap allows users to write and execute custom scripts (using the Nmap Scripting Engine) to perform specific tasks, like vulnerability scanning, banner grabbing, and more.

We can do host discovery using:

nmap -sn <target/subnet>

The -Pn flag in Nmap instructs Nmap to skip the "ping" (host discovery) probes used to determine whether hosts are reachable, and to treat every target as up. This can be useful when you want to scan hosts where ping might be blocked or disabled (Windows hosts, for example, often block ICMP echo requests by default).

However, DNS resolution is a separate aspect, and the -Pn option does not directly impact DNS resolution. If you want to avoid DNS resolution during scanning and only want to detect hosts without performing ping, you can use the -n option. Here's how you can combine the two options:

nmap -n -Pn <target>

Where <target> represents the IP address or hostname you want to scan. With this syntax, you're telling Nmap not to perform DNS resolution (-n) and not to perform ping (-Pn), focusing solely on scanning ports and services on the specified hosts.

Net Discovery

Other useful discovery tools are netdiscover, a simple ARP-based scanner for finding live hosts on a local network (a basic example follows the list below), and Netdisco, a web-based network management application that permits:

  1. Network Discovery: Netdisco uses various methods, such as SNMP (Simple Network Management Protocol) and ARP (Address Resolution Protocol) scanning, to discover devices on the network and collect information about them.
  2. Device Tracking: It tracks the physical location of devices based on the switch and port they are connected to. This can be useful for troubleshooting and inventory management.
  3. Port and VLAN Management: Netdisco helps in managing port configurations, tracking the status of ports, and managing VLAN (Virtual LAN) assignments.
  4. IP Address Management: It can manage IP address allocations and assist in tracking IP address usage and assignments.
  5. Visualization: Netdisco provides visual representations of network topologies and connections, making it easier to understand the layout of the network.
  6. Historical Data: The tool can keep historical data about device and network changes, aiding in troubleshooting and understanding network behavior over time.
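
A basic netdiscover sketch (the interface name and network range are placeholders for your own environment):

netdiscover -i eth0 -r 192.168.1.0/24
# -i selects the network interface, -r limits the ARP scan to the given range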

Port Scanning With Nmap

nmap --open -p0- -n -Pn -sCV -vvv --min-rate 5000 <Target> -oG nmap/port_scan
| command | result |
| --- | --- |
| --open | show only open ports |
| -sC | run default scripts |
| -sV | enumerate service versions |
| -p0- | scan all ports [0 - 65535] |
| --min-rate | minimum packets sent per second |
| -vvv | more verbosity |