Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong VATIN validation on duplicate country code #420

Open
simonphilips opened this issue Oct 13, 2023 · 2 comments
Open

Wrong VATIN validation on duplicate country code #420

simonphilips opened this issue Oct 13, 2023 · 2 comments

Comments

@simonphilips
Copy link

There seems to be a bug in the VATIN validation when you duplicate the country code:

In [3]: stdnum.vatin.is_valid("BE 0308.357.159")
Out[3]: True

In [4]: stdnum.vatin.is_valid("BE BE 0308.357.159")
Out[4]: True

In [5]: stdnum.eu.vat.is_valid("BE BE 0308.357.159")
Out[5]: False

[4] should have the same result as [5].

This is also happening for Dutch (NL) numbers but not for Mexican (MX) numbers for example.

@arthurdejong
Copy link
Owner

Hi @simonphilips,

Thanks for reporting this. I've been looking into this and I can't seem to find an elegant solution for this. The problem is that for some countries the country code is considered a valid part of the number. For most of these it is optional while for some others it is required (e.g. the Swiss VAT number). In a lot of other countries the country code is not considered part of the number prefix.

The vatin code currently first tries to validate the number without the country code prefix and if that fails falls back to validating the number with the prefix. This means that the country code may end up being stripped twice.

The only thing I can think of to fix this is to classify each VAT number into categories that define whether they strip a country code prefix. Since we have 70+ numbers keeping this classification up-to-date will be tricky.

Also explicitly checking if the prefix is there twice will not work because some numbers (e.g. the Mexican tax number) can start with letters that match the country code.

I'm willing to look at other suggestions for a solution to this.

@simonphilips
Copy link
Author

simonphilips commented Nov 20, 2023

Hello @arthurdejong,

I understand. I had also looked into the code and it was indeed pretty difficult to come up with a good solution.

My original thought was to do something like this:

def validate(number):
    number = clean(number, '').strip()
    module = _get_cc_module(number[:2])

    # Check if the full number is valid, if so then it's valid.
    try:
        return module.validate(number)
    except ValidationError:
        pass

    # In order for the number to still be considered valid, it must be valid
    # without the country code.
    country = number[:2].upper()
    validated = country + module.validate(number[2:])

    # The module considers number[2:] valid but that could be because of two
    # mutually exclusive reasons:
    #
    #   1. number[2:] is a fully valid number without country code: GOOD
    #   2. number[4:] is a fully valid number without country code and
    #      number[2:4] just duplicates the country code: BAD
    #
    # Therefore, let's check if number[4:] is valid. If so, then number[2:] is
    # only valid because it contains the country code and therefore number[0:]
    # contains the country code twice and is not valid.

    sub = clean(number[2:], '').strip()
    if sub[:2].upper() == country and module.is_valid(sub[2:]):
        raise InvalidFormat()

    return validated

Which works in these cases:

# BE 0308.357.159
validate("BE 0308.357.159")  # valid -> OK

# BE BE 0308.357.159 is not a valid VAT number
validate("BE BE 0308.357.159")  # invalid
validate("BE 0308.357.159")  # valid
validate("0308.357.159")  # valid -> original is invalid -> ERROR

# MXA001101007 is a valid VAT number
validate("MXA001101007")  # valid -> OK

# MX MXA001101007 is a valid VAT number
validate("MX MXA001101007")  # invalid
validate("MXA001101007")  # valid
validate("A001101007")  # invalid -> original is valid -> OK

But unfortunately, while it correctly identifies "BE BE 0308.357.159" as invalid, it still thinks BE.BE 0308.357.159 is valid because periods are stripped in the country's validator, not in vatin itself so it thinks the second country could would be .B instead. I'm not sure how to proceed with this unfortunately.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants