Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Is it inconsistent with the C version implemented by the author? #33

Closed
BingqingXue1 opened this issue Apr 7, 2024 · 9 comments
Closed

Comments

@BingqingXue1
Copy link
Contributor

When we compared your code with the author's C code, we found that the calculation results were inconsistent. We found that a function was not implemented. Is this a bug?

static int has_common_substring(const char *s1, size_t s1len, const char *s2, size_t s2len)
{
  size_t i, j;
  uint32_t hashes[SPAMSUM_LENGTH - (ROLLING_WINDOW - 1)];

  if (s1len < ROLLING_WINDOW)
    return 0;
  if (s2len < ROLLING_WINDOW)
    return 0;

  // there are many possible algorithms for common substring
  // detection. In this case I am re-using the rolling hash code
  // to act as a filter for possible substring matches

  memset(hashes, 0, sizeof(hashes));

  // first compute the windowed rolling hash at each offset in
  // the first string
  struct roll_state state;
  roll_init (&state);

  for (i = 0; i < ROLLING_WINDOW - 1; i++)
    roll_hash(&state, (unsigned char)s1[i]);
  for (i = ROLLING_WINDOW - 1; i < s1len; i++)
  {
    roll_hash(&state, (unsigned char)s1[i]);
    hashes[i - (ROLLING_WINDOW - 1)] = roll_sum(&state);
  }
  s1len -= (ROLLING_WINDOW - 1);

  roll_init(&state);

  // now for each offset in the second string compute the
  // rolling hash and compare it to all of the rolling hashes
  // for the first string. If one matches then we have a
  // candidate substring match. We then confirm that match with
  // a direct string comparison */
  for (j = 0; j < ROLLING_WINDOW - 1; j++)
    roll_hash(&state, (unsigned char)s2[j]);
  for (j = 0; j < s2len - (ROLLING_WINDOW - 1); j++)
  {
    roll_hash(&state, (unsigned char)s2[j + (ROLLING_WINDOW - 1)]);
    uint32_t h = roll_sum(&state);
    for (i = 0; i < s1len; i++)
    {
      // confirm the match after checking potential match
      if (hashes[i] == h && !memcmp(s1 + i, s2 + j, ROLLING_WINDOW))
	  return 1;
    }
  }

  return 0;
}
@BingqingXue1
Copy link
Contributor Author

If you think this function needs to be implemented, I can implement it and submit a PR

@glaslos
Copy link
Owner

glaslos commented Apr 7, 2024

Do you have an example of a hash mismatch?

@glaslos
Copy link
Owner

glaslos commented Apr 7, 2024

The function you are mentioning is only used when comparing two hashes, so it will not have an impact on the calculated hash.

@BingqingXue1
Copy link
Contributor Author

Yes, It only affects the distance of ssdeep hash.
For example , the one hash is 24:YDVLfvBD0D1b8Hlapg2MbMdI6hPZ3QCw4qat:YDbM8HlOUbMu6HACwVat. And
the other is 24:YDVLfzQM/QEzpg7RMdlgx6h4Z3QCw4qat:YDmM/QO6MI6uACwVat.
We calculate the score of ssdeep to be 65. But the value we calculated using the author's C program is 60. The reason is that go does not implement this function.

@glaslos
Copy link
Owner

glaslos commented Apr 8, 2024

Then this is definitely missing. Thank you for raising it! You can either just add a test that fails for now or implement the extra condition. Otherwise I'll add it later this week.

@BingqingXue1
Copy link
Contributor Author

No problem. I'll raise PR tomorrow or the day after tomorrow.

@BingqingXue1
Copy link
Contributor Author

Here is PR #34

@glaslos
Copy link
Owner

glaslos commented Apr 9, 2024

Awesome, thanks a lot! I will prepare a new release later today!

@glaslos
Copy link
Owner

glaslos commented Apr 15, 2024

Released 0.4.0 with the fix by @BingqingXue1 Thank you for your contribution!

@glaslos glaslos closed this as completed Apr 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants