please add an additional check for the chromium DGA test #161
Comments
@RoyArends There is a fundamental problem there. If the recursive resolver does QName minimization, the root will see "BOUYGUESTELECOM" even in cases when the client looked for "BT1SVWQM.NOE.BOUYGUESTELECOM". With your suggestion, the client's requests for BOUYGUESTELECOM will be split into two different buckets, depending on whether the resolver minimizes the query name.

My first suggestion would be to investigate how big the problem really is. We can do that by dumping all the NXDOMAIN query names whose TLD meets the DGA classification, and then counting how many of them are multipart names and what share of the DGA total that represents. If it is just a tiny fraction, then there is not much to worry about. If the fraction is significant, then we need a secondary analysis: I would count the number of occurrences of each TLD in the "multipart DGA" category. If some TLD is used with significant frequency, we can add it to a list of "TLDs that should not be mistaken for DGA", and use that list as part of the DGA classification. If I remember correctly, BOUYGUESTELECOM used to be a registered TLD. Maybe, as a precaution, we could add all these "formerly registered TLDs" to the special-case list.
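The measurement described above could be sketched roughly as follows. This is only an illustration, not ithitools code: `looks_like_dga` is a hypothetical stand-in for the real DGA classifier (here it merely checks that the TLD is 7 to 15 ASCII letters), and the input is assumed to be a plain list of NXDOMAIN query names.

```python
from collections import Counter


def looks_like_dga(tld: str) -> bool:
    # Hypothetical placeholder for the real classifier (assumption):
    # accept any TLD made of 7 to 15 ASCII letters.
    return tld.isascii() and tld.isalpha() and 7 <= len(tld) <= 15


def multipart_dga_stats(names):
    """Return (dga_total, multipart_total, multipart_fraction, tld_counts).

    tld_counts only covers names that have labels in front of the TLD,
    i.e. the "multipart DGA" category.
    """
    dga_total = 0
    multipart_total = 0
    tld_counts = Counter()
    for name in names:
        labels = [label for label in name.rstrip(".").split(".") if label]
        if not labels:
            continue
        tld = labels[-1].upper()
        if not looks_like_dga(tld):
            continue
        dga_total += 1
        if len(labels) > 1:
            multipart_total += 1
            tld_counts[tld] += 1
    fraction = multipart_total / dga_total if dga_total else 0.0
    return dga_total, multipart_total, fraction, tld_counts
```

TLDs that come out near the top of `tld_counts.most_common()` would be the candidates for the "should not be mistaken for DGA" list.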
@RoyArends I am looking into this issue, and there is a tension between more precise accounting and compatibility with historic series. The current algorithm can be summarized as:
To avoid breaking compatibility with the existing statistics, I propose leaving the existing accounting unchanged, but adding two new listings in the summary files:
Alpha_7 to Alpha_15 would map the generation algorithm used by Google. This would become the new definition of "dga". We could then compute:
I think that would match expectations, but I would like confirmation.
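To make the Alpha_7 to Alpha_15 idea concrete, here is a minimal sketch of how such buckets might be assigned and summed. The exact definitions in the proposal are not reproduced here, so this rests on assumptions: a bucket `Alpha_N` is taken to mean a TLD made of exactly N ASCII letters, and the function names are hypothetical.

```python
from typing import Dict, Optional


def alpha_bucket(tld: str) -> Optional[str]:
    """Return "Alpha_N" for a TLD of N ASCII letters with 7 <= N <= 15.

    Assumption: this is what the Alpha_7..Alpha_15 buckets would capture.
    Any other TLD gets no bucket (None).
    """
    if tld.isascii() and tld.isalpha() and 7 <= len(tld) <= 15:
        return f"Alpha_{len(tld)}"
    return None


def dga_total(bucket_counts: Dict[str, int]) -> int:
    # Under the proposed definition, "dga" would be the sum of the
    # Alpha_7 through Alpha_15 buckets.
    return sum(bucket_counts.get(f"Alpha_{n}", 0) for n in range(7, 16))
```

Keeping one bucket per length, rather than a single flag, would let the summary files show the length distribution as well as the total.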
Actually, the way the ithitools program is structured, the "leak type" has to be a function of just the TLD. We could easily see some requests for "subdomain.no-such-domain" and others for "no-such-domain". What we can easily do is, for each such domain, count both the total number of references and, in parallel, the number of references with subdomains, and export all that in the "address and names" report.
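The parallel counting described above might look like this in outline. It is a sketch only: the function name and the dictionary layout are assumptions made for illustration, not the ithitools data structures, and the export to the report is left out.

```python
from collections import defaultdict


def count_references(names):
    """Per TLD, count all references and those that carry subdomains."""
    counts = defaultdict(lambda: {"total": 0, "with_subdomains": 0})
    for name in names:
        labels = [label for label in name.rstrip(".").split(".") if label]
        if not labels:
            continue
        # The key depends on the TLD alone, so both query forms
        # land in the same entry.
        tld = labels[-1].upper()
        counts[tld]["total"] += 1
        if len(labels) > 1:
            counts[tld]["with_subdomains"] += 1
    return dict(counts)
```

With both counters available, the ratio `with_subdomains / total` for a given TLD indicates how often it was queried with labels in front of it.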
Christian, the Chromium DGA will only issue a single label, which is then seen as the top-level domain. Your code tests for this top-level domain, but also classifies names with subdomains under such a top-level domain as DGA, such as:
BT1SVWQM.NOE.BOUYGUESTELECOM
GT7TRSFP0.APPLIS.SI.INTERNE
etc.
Maybe add an additional classification: one category for DGA in general, of which Chromium DGA is a subset.
Roy