Add a Integration Test to scan the top 1k domains and validate that ZDNS's result is correct #370

phillip-stephens · 2024-05-28T21:37:18Z

We have an extensive suite of integration tests that validate that ZDNS can correctly pull most DNS record types for zdns-testing.com. This validates ZDNS's ability to parse DNS records, but it doesn't validate that ZDNS can scan many domains successfully. ZDNS uses multiple worker threads, each requesting a domain, and that logic should be tested with a larger-scale scan.

The challenge with doing any sort of validation that a given IP A actually hosts a domain X is that many domains are hosted with hosting providers that provide anti-DDoS protection. This makes any sort of automated requesting difficult.

However, a solution is to validate ZDNS only against domains which do not have such anti-DDoS measures.

Therefore, this test has 2 phases:

1. Determine which of the top-1k domains can be successfully reached with a GET request. These are the domains that are request-able in an automated fashion.
2. For each of these request-able domains, run ZDNS on it and then attempt to send a GET request to the returned IP. These should all be successful.
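The two phases can be sketched roughly as below. This is a minimal illustration, not the script added in this PR: `fetch` and `resolve` are hypothetical callables standing in for "send a GET request" and "run ZDNS", and requiring every returned IP to serve the domain is an assumption about how strict the check should be.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical signatures, not ZDNS or test-suite APIs:
#   fetch(domain, ip): True if a GET for `domain` succeeds. When `ip` is
#       given, the request goes to that IP instead of normal resolution.
#   resolve(domain): list of A-record IPs reported by ZDNS for `domain`.
Fetch = Callable[[str, Optional[str]], bool]
Resolve = Callable[[str], List[str]]


def requestable_domains(domains: Iterable[str], fetch: Fetch) -> List[str]:
    """Phase 1: keep only domains that answer an ordinary automated GET."""
    return [d for d in domains if fetch(d, None)]


def validate_resolved_ips(
    domains: Iterable[str], resolve: Resolve, fetch: Fetch
) -> List[str]:
    """Phase 2: return the domains whose resolved IPs did not serve them.

    Assumption: a domain passes only if at least one IP was returned and
    every returned IP answers a GET for that domain.
    """
    failures = []
    for domain in domains:
        ips = resolve(domain)
        if not ips or not all(fetch(domain, ip) for ip in ips):
            failures.append(domain)
    return failures


def run_two_phase_check(
    domains: Iterable[str], resolve: Resolve, fetch: Fetch
) -> Tuple[List[str], List[str]]:
    """Run both phases; return (request-able domains, failing domains)."""
    candidates = requestable_domains(domains, fetch)
    return candidates, validate_resolved_ips(candidates, resolve, fetch)
```

Passing the two callables in keeps the phase logic testable without network access. A real run would supply a `fetch` built on an HTTP client and a `resolve` that invokes ZDNS and parses its output.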

Changes

Changed GitHub CI to run only on pull requests and on branches named main. Previously an action was started for every branch change and every PR change, so every push to a PR kicked off 2 CI actions for each test, 1 of which was redundant.
Added an integration test to scan the top domains.


zakird · 2024-05-29T06:32:52Z

I'm not opposed to this PR, but it does feel like a very roundabout way of testing this, since it relies on similarity in HTTP(S) requests, which I suspect could be fragile. Is there any reason that this is necessary, versus, for example, looking up domains and checking the success rate that we see from the scan?
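For reference, the success-rate alternative could be sketched as below. It assumes the scan emits one JSON object per line with a top-level `status` field that equals `NOERROR` on success; that output shape is an assumption here and should be checked against the ZDNS version in use. The function name `success_rate` is hypothetical.

```python
import json
from typing import Iterable


def success_rate(output_lines: Iterable[str], ok_status: str = "NOERROR") -> float:
    """Fraction of non-empty output lines whose `status` equals ok_status.

    Assumption: each line is a JSON object with a top-level "status" key.
    Lines that are not valid JSON objects count as failures.
    """
    total = 0
    ok = 0
    for line in output_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines entirely
        total += 1
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # unparseable line: counted in total, not in ok
        if isinstance(record, dict) and record.get("status") == ok_status:
            ok += 1
    return ok / total if total else 0.0
```

A check of this kind only measures how often ZDNS reports success. It says nothing about whether the returned records are accurate.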

phillip-stephens · 2024-05-29T16:55:33Z

In my mind, that rests on the assumption that the A record returned by ZDNS is accurate and ZDNS is able to correctly ascertain whether it was able to get the correct record for a given domain.

I could imagine all sorts of bugs that could interfere with that and cause ZDNS to not error but still return inaccurate records for a given domain. (Of course, if the name server returns an invalid IP, that's not a ZDNS bug. I'm imagining a ZDNS bug where, through bugs in either the cache or the recursive logic, it returns a record that is inaccurate relative to what the name server for said domain holds.)

This test validates that a given A record for a domain has an IP that truly hosts said domain, without relying on ZDNS for any proxy of success, and IMO that's what we really want from an integration test. Presumably, the top 100 domains are going to have stricter quality control on their DNS infrastructure, so I'd imagine we'd rarely run into bugs where the issue is on the DNS name server itself.

As for the fragility of the test, yeah, I agree it's a bit fragile since it doesn't differentiate between a bad result returned by a name server and a bug in ZDNS. Also, the requests library can simply error. I think there's immense value in having a sort of real-scan sanity check for accuracy like this, but I'm very open to suggestions on how to make it less fragile.

zakird · 2024-05-29T16:59:29Z

Can we separate this test out in GitHub so that we can monitor success/failure separately?

phillip-stephens · 2024-05-29T22:59:21Z

@zakird Looking at the Checks on this PR, the test is already run separately (it's called build-and-test-large-scale).

phillip-stephens added 24 commits May 24, 2024 10:01

testing if github runner can scan 1k domains without crazy timeouts

92412fd

seeing results

bb83377

able to request X domain to Y IP

75e5fdc

working a and alookup tests

ab3c87c

working validators

2a789ce

cleaning up unneeded files and added some logs

dec7f03

cleanup and add as github action

f48992e

updated zdns exe path

202d766

updated test name in ci.yml

f706612

made python test a script

48a16c6

updated script permissions

2eeeae9

removed dead import

1855325

updated file name

f5cc9e3

made error msg less scary

e0ca48b

only run CI on commits to main branch and all PR's, otherwise we're d…

fb07c87

…oubling the effort on all PRs

remove limit to top 100, run against all 1k domains

b7005b1

moved to only scanning 100 domains

e9e7ee2

cleanup

581fa70

fixed name of CI action

4b04b9f

added rest of domains

fc889fd

moved to 500 domains

1b87244

moved back to 100 domains

ac820d1

Merge branch 'main' into phillip/large-scale-integration

6399896

add comments explaining integration tests

bed56ec

phillip-stephens marked this pull request as ready for review May 28, 2024 22:55

phillip-stephens requested a review from a team as a code owner May 28, 2024 22:55

phillip-stephens assigned zakird May 28, 2024

phillip-stephens added the enhancement label May 28, 2024

Merge branch 'phillip/large-scale-integration' of github.com:zmap/zdn…

cdb6659

…s into phillip/large-scale-integration

phillip-stephens mentioned this pull request May 28, 2024

Split ZDNS into a library and CLI tool #360

Merged

zakird approved these changes May 29, 2024


zakird merged commit 736b1ca into main May 29, 2024
3 checks passed

zakird deleted the phillip/large-scale-integration branch May 29, 2024 23:23

phillip-stephens mentioned this pull request Jun 3, 2024

Add reliable large-scale scanning integration test #374

Merged
