Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add IPv6 string validation #357

Merged
merged 1 commit into from
Jan 22, 2018
Merged

Add IPv6 string validation #357

merged 1 commit into from
Jan 22, 2018

Conversation

NeQuissimus
Copy link
Contributor

Looks like IPv6 validation using a regex is what makes the most sense.

  • I looked at sun.net.util.IPAddressUtil but that's an internal JDK API, not supposed to be used, not available for Scala.JS (it is the underlying implementation for java.net.InetAddress)
  • java.net.InetAddress itself always tries to make DNS calls unless you convert the String into the address bytes. That would, however, require a lot of logic to undo compacting etc.
  • Guava and Apache Commons Validator are JVM-only and fairly large

@codecov
Copy link

codecov bot commented Nov 22, 2017

Codecov Report

Merging #357 into master will increase coverage by 0.19%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #357      +/-   ##
==========================================
+ Coverage   95.54%   95.73%   +0.19%     
==========================================
  Files          46       46              
  Lines         561      586      +25     
  Branches       12       12              
==========================================
+ Hits          536      561      +25     
  Misses         25       25
Impacted Files Coverage Δ
...red/src/main/scala/eu/timepit/refined/string.scala 100% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 09b3f6c...4a6c50b. Read the comment docs.

Copy link
Owner

@fthomas fthomas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks very good but due to #356 there are now conflicts that need to be resolved. I also think there is an opportunity to make the regex more readable. Let me know what you think.

final case class IPv6()
object IPv6 {
val regex =
"^(^(([0-9A-Fa-f]{1,4}(((:[0-9A-Fa-f]{1,4}){5}::[0-9A-Fa-f]{1,4})|((:[0-9A-Fa-f]{1,4}){4}::[0-9A-Fa-f]{1,4}(:[0-9A-Fa-f]{1,4}){0,1})|((:[0-9A-Fa-f]{1,4}){3}::[0-9A-Fa-f]{1,4}(:[0-9A-Fa-f]{1,4}){0,2})|((:[0-9A-Fa-f]{1,4}){2}::[0-9A-Fa-f]{1,4}(:[0-9A-Fa-f]{1,4}){0,3})|(:[0-9A-Fa-f]{1,4}::[0-9A-Fa-f]{1,4}(:[0-9A-Fa-f]{1,4}){0,4})|(::[0-9A-Fa-f]{1,4}(:[0-9A-Fa-f]{1,4}){0,5})|(:[0-9A-Fa-f]{1,4}){7}))$|^(::[0-9A-Fa-f]{1,4}(:[0-9A-Fa-f]{1,4}){0,6})$)|^::$)|^((([0-9A-Fa-f]{1,4}(((:[0-9A-Fa-f]{1,4}){3}::([0-9A-Fa-f]{1,4}){1})|((:[0-9A-Fa-f]{1,4}){2}::[0-9A-Fa-f]{1,4}(:[0-9A-Fa-f]{1,4}){0,1})|((:[0-9A-Fa-f]{1,4}){1}::[0-9A-Fa-f]{1,4}(:[0-9A-Fa-f]{1,4}){0,2})|(::[0-9A-Fa-f]{1,4}(:[0-9A-Fa-f]{1,4}){0,3})|((:[0-9A-Fa-f]{1,4}){0,5})))|([:]{2}[0-9A-Fa-f]{1,4}(:[0-9A-Fa-f]{1,4}){0,4})):|::)((25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{0,2})\\.){3}(25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{0,2})$$".r.pattern
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we factor out common subexpressions to reduce the length of that regex? [0-9A-Fa-f]{1,4} seems to be a good candidate on a first glance. I'd imagine doing it like this:

val regex = {
  val group = "[0-9A-Fa-f]{1,4}"
  s"^(^(($group(((:$group){5}::$group) ..."
}

@derekmorr
Copy link
Contributor

java.net.InetAddress only makes DNS calls if the input isn't an IP address literal. IPv6 address parsing is pretty complicated. It doesn't look like this parser handles embedded IPv4-literals:

scala>     val regex =
     |       "^(^(([0-9A-Fa-f]{1,4}(((:[0-9A-Fa-f]{1,4}){5}::[0-9A-Fa-f]{1,4})|((:[0-9A-Fa-f]{1,4}){4}::[0-9A-Fa-f]{1,4}(:[0-9A-Fa-f]{1,4}){0,1})|((:[0-9A-Fa-f]{1,4}){3}::[0-9A-Fa-f]{1,4}(:[0-9A-Fa-f]{1,4}){0,2})|((:[0-9A-Fa-f]{1,4}){2}::[0-9A-Fa-f]{1,4}(:[0-9A-Fa-f]{1,4}){0,3})|(:[0-9A-Fa-f]{1,4}::[0-9A-Fa-f]{1,4}(:[0-9A-Fa-f]{1,4}){0,4})|(::[0-9A-Fa-f]{1,4}(:[0-9A-Fa-f]{1,4}){0,5})|(:[0-9A-Fa-f]{1,4}){7}))$|^(::[0-9A-Fa-f]{1,4}(:[0-9A-Fa-f]{1,4}){0,6})$)|^::$)|^((([0-9A-Fa-f]{1,4}(((:[0-9A-Fa-f]{1,4}){3}::([0-9A-Fa-f]{1,4}){1})|((:[0-9A-Fa-f]{1,4}){2}::[0-9A-Fa-f]{1,4}(:[0-9A-Fa-f]{1,4}){0,1})|((:[0-9A-Fa-f]{1,4}){1}::[0-9A-Fa-f]{1,4}(:[0-9A-Fa-f]{1,4}){0,2})|(::[0-9A-Fa-f]{1,4}(:[0-9A-Fa-f]{1,4}){0,3})|((:[0-9A-Fa-f]{1,4}){0,5})))|([:]{2}[0-9A-Fa-f]{1,4}(:[0-9A-Fa-f]{1,4}){0,4})):|::)((25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{0,2})\\.){3}(25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{0,2})$$".r.pattern
regex: java.util.regex.Pattern = ^(^(([0-9A-Fa-f]{1,4}(((:[0-9A-Fa-f]{1,4}){5}::[0-9A-Fa-f]{1,4})|((:[0-9A-Fa-f]{1,4}){4}::[0-9A-Fa-f]{1,4}(:[0-9A-Fa-f]{1,4}){0,1})|((:[0-9A-Fa-f]{1,4}){3}::[0-9A-Fa-f]{1,4}(:[0-9A-Fa-f]{1,4}){0,2})|((:[0-9A-Fa-f]{1,4}){2}::[0-9A-Fa-f]{1,4}(:[0-9A-Fa-f]{1,4}){0,3})|(:[0-9A-Fa-f]{1,4}::[0-9A-Fa-f]{1,4}(:[0-9A-Fa-f]{1,4}){0,4})|(::[0-9A-Fa-f]{1,4}(:[0-9A-Fa-f]{1,4}){0,5})|(:[0-9A-Fa-f]{1,4}){7}))$|^(::[0-9A-Fa-f]{1,4}(:[0-9A-Fa-f]{1,4}){0,6})$)|^::$)|^((([0-9A-Fa-f]{1,4}(((:[0-9A-Fa-f]{1,4}){3}::([0-9A-Fa-f]{1,4}){1})|((:[0-9A-Fa-f]{1,4}){2}::[0-9A-Fa-f]{1,4}(:[0-9A-Fa-f]{1,4}){0,1})|((:[0-9A-Fa-f]{1,4}){1}::[0-9A-Fa-f]{1,4}(:[0-9A-Fa-f]{1,4}){0,2})|(::[0-9A-Fa-f]{1,4}(:[0-9A-Fa-f]{1,4}){0,3})|((:[0-9A-Fa-f]{1,4}){0,5})))|([:]{2}[0-9A-Fa-f]{1,4}(:[0-9A-Fa-...

scala> regex.matcher("2001:db8:122:344::192.0.2.33").matches // example from https://tools.ietf.org/html/rfc6052#section-2.4
res1: Boolean = false

scala> import java.net.InetAddress // But the JDK's parser works fine
import java.net.InetAddress

scala> InetAddress.getByName("2001:db8:122:344::192.0.2.33")
res2: java.net.InetAddress = /2001:db8:122:344:0:0:c000:221

@fthomas
Copy link
Owner

fthomas commented Dec 5, 2017

Thanks @derekmorr for this test case. https://stackoverflow.com/a/17871737/460387 shows that a regex can handle embedded IPv4 literals. @NeQuissimus what do you think about updating the regex to also support embedded IPv4 literals?

@NeQuissimus
Copy link
Contributor Author

Sorry, I completely missed the feedback here. I will take a look.
I was aware of this, my regex only validates against RFC-8200.
I'll see if I can't make some improvements or maybe find a better regex online.

@kusamakura
Copy link
Contributor

It might also be a nice touch to have an Inference for IPv4 ==> IPv6, if that's possible. (Or also the other way around for valid values.)

@NeQuissimus
Copy link
Contributor Author

I fixed up the regex to account for embedded IPv4 addresses and I also extracted common elements.
I added tests for the new address formats.

@fthomas Can you kick off the Travis build again? No matter how often I run the tests locally, they pass...

@fthomas
Copy link
Owner

fthomas commented Jan 18, 2018

The Scala.js tests failed with

[info] ! StringValidate.IPv6.isValid.compact: Falsified after 0 passed tests.

This should be reproducible with coreJS/test.

@NeQuissimus
Copy link
Contributor Author

wow, this took me a while! JavaScript's Regex engine has different eagerness rules than the one in the JVM...
I fixed it now and hopefully Travis will agree...

Copy link
Owner

@fthomas fthomas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, many thanks @NeQuissimus!

@fthomas fthomas merged commit b18606a into fthomas:master Jan 22, 2018
@NeQuissimus NeQuissimus deleted the ipv6 branch January 23, 2018 14:04
@fthomas fthomas mentioned this pull request Apr 16, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants