fix(outputs.iotdb): Handle paths that contain illegal characters #14519
Thanks so much for the pull request!
Thanks for the PR!
Looking at IoTDB, this does appear to be the suggested way of handling special characters in the path.
Can you break out the regex checking into a new function and please add tests for that function with both cases including special characters? My only concern with these types of changes is future changes down the road or corner cases :)
Thanks!
@giovanni-bellini-argo please also sign the CLA!
!signed-cla
Can you confirm my understanding after playing with this a bit more: A path is made up of three elements:
- root - reserved word
- storage group
- child - one or more items separated by periods
The storage group name can only contain letters, numbers, and underscores.
I think this is what you are currently testing for. However, the child elements seem to allow a greater set of elements. This means the following are valid without escaping them.
root.foo.3
root.foo.()
Is that your understanding as well? I've been trying this with version 0.13.0, so that may play into this as well.
We are testing on v1.2.2 and could not create child nodes with any characters other than underscores and alphanumeric characters, even though the documentation states otherwise.
Thanks!
I am glad we are trying on different versions. Unfortunately, having different behavior on different versions does mean we would need to make this opt-in as well. We cannot be breaking existing users on versions that do not run into this. How we typically handle this is to add a config option, something like:
# Mode to sanitize tags
# By default, tags are not sanitized. Possible options:
#   1.2 - only allows alpha/numeric and underscores, and non-numeric values. Otherwise,
#         backticks are applied
sanitize_tag = ""
Then if that value is set to "1.2", you can apply your sanitization function. I would still like to see a specific test as well that only tests the sanitize function with various test cases. Does that make sense?
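To make the opt-in idea concrete, here is a minimal sketch of what such a sanitize function and its table of test cases could look like. The names `sanitizeTag`, `unsupportedChar`, and `numericOnly` are hypothetical and are not taken from the PR; the sketch also assumes tags never contain backticks themselves, which a real implementation would have to handle.

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical patterns for the "1.2" mode: anything outside
// alphanumerics/underscore, or a purely numeric name, needs quoting.
var (
	unsupportedChar = regexp.MustCompile(`[^0-9a-zA-Z_]`)
	numericOnly     = regexp.MustCompile(`^\d+$`)
)

// sanitizeTag is an illustrative helper, not the plugin's actual function.
// It returns the tag unchanged unless mode is "1.2" and the tag needs quoting.
// Assumption: the tag does not already contain backticks.
func sanitizeTag(mode, tag string) string {
	if mode != "1.2" {
		return tag // opt-in only: the default leaves existing users untouched
	}
	if unsupportedChar.MatchString(tag) || numericOnly.MatchString(tag) {
		return "`" + tag + "`"
	}
	return tag
}

func main() {
	// A few cases with and without special characters.
	for _, tag := range []string{"sensor_1", "my tag", "123", "a.b"} {
		fmt.Printf("%q -> %q\n", tag, sanitizeTag("1.2", tag))
	}
}
```

Keeping the check in a small pure function like this is what makes it easy to test in isolation with both plain and special-character inputs.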
I've also asked over in the iotdb repo for guidance on valid characters:
Co-authored-by: SeanGaluzzi <SeanGaluzzi@users.noreply.github.com>
a list of updates:
Thanks for the updates and doing that research. Almost there!
plugins/outputs/iotdb/iotdb.go:277:28 gosimple S1007: should use raw string (`...`) with regexp.Compile to avoid having to escape twice
plugins/outputs/iotdb/iotdb.go:310:5 revive var-naming: don't use underscores in Go names; var tag_value should be tagValue
Happy to help with any of these updates, let me know if you do want assistance.
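For reference, a small sketch of what the two lint findings ask for. The variable names here are illustrative only. The raw-string form avoids double escaping, and because a raw string cannot contain a backtick, the regexp-level escape `\x60` can stand in for it inside the character class.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Interpreted string: the backslash has to be escaped twice (flagged by S1007).
	escaped = regexp.MustCompile("^\\d+$")
	// Raw string: the pattern reads exactly as the regexp engine sees it.
	raw = regexp.MustCompile(`^\d+$`)
	// \x60 is resolved by the regexp engine to a backtick, so it works in a raw string.
	withBacktick = regexp.MustCompile(`[^0-9a-zA-Z_\x60]`)
)

func main() {
	// camelCase rather than tag_value, per the revive var-naming rule.
	tagValue := "123"
	fmt.Println(escaped.MatchString(tagValue), raw.MatchString(tagValue))
	fmt.Println(withBacktick.MatchString("a`b"), withBacktick.MatchString("a b"))
}
```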
- fixed lint issues
- fixed failing tests
- added documentation
That seems to be your browser saying it is blocked by your content security policy. Can you please try a different browser? Thanks!
@powersj you seem to be right XD.
Sorry for the delay, thanks again for the re-work, I think we are nearly there!
Thanks for working through this!
Thanks for this nice PR @giovanni-bellini-argo! I have some small comments, with moving the regexp compilation out of the Write() path being the most important one...
plugins/outputs/iotdb/iotdb.go (outdated)
	matchUnsopportedCharacter, _ := regexp.Compile("[^0-9a-zA-Z_:@#${}\x60]")

	regex := []*regexp.Regexp{matchUnsopportedCharacter}
	regexArray = append(regexArray, regex...)

// from version 1.x.x IoTDB changed the allowed keys in nodes
case "1.0", "1.1", "1.2", "1.3":
	matchUnsopportedCharacter, _ := regexp.Compile("[^0-9a-zA-Z_\x60]")
	matchNumericString, _ := regexp.Compile(`^\d+$`)
Same here, use MustCompile and initialize regexArray in Init()!
Why initialize regexArray in Init()?
Do you think it would be better to include it in the struct for future changes?
I would also delete those useless variables and append the regexes directly to the array.
regexArray := []*regexp.Regexp{} // array of compiled regex patterns
switch s.SanitizeTags {
case "0.13":
	regex := []*regexp.Regexp{
		regexp.MustCompile("[^0-9a-zA-Z_:@#${}\x60]"),
	}
	regexArray = append(regexArray, regex...)
// from version 1.x.x IoTDB changed the allowed keys in nodes
case "1.0", "1.1", "1.2", "1.3":
	regex := []*regexp.Regexp{
		regexp.MustCompile("[^0-9a-zA-Z_\x60]"),
		regexp.MustCompile(`^\d+$`),
	}
	regexArray = append(regexArray, regex...)
default:
	return tag, nil
}
The idea is to make it easy to understand and modify.
Thanks for your update @giovanni-bellini-argo! Please try to compile the regular-expressions only once as suggested in the code! If you need help, I can provide a patch...
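A rough sketch of the compile-once approach, for illustration only. The `output` struct, the unexported `sanitizeRegex` field, and the `needsQuoting` helper are hypothetical names, and returning an error for an unknown mode is a choice made for this sketch rather than something settled in the review.

```go
package main

import (
	"fmt"
	"regexp"
)

// output is a stand-in for the plugin struct (hypothetical).
type output struct {
	SanitizeTags  string           // value of the config option
	sanitizeRegex []*regexp.Regexp // compiled once in Init, reused on every write
}

// Init compiles the patterns for the configured mode a single time.
func (o *output) Init() error {
	switch o.SanitizeTags {
	case "":
		// sanitization disabled, nothing to compile
	case "0.13":
		o.sanitizeRegex = []*regexp.Regexp{
			regexp.MustCompile(`[^0-9a-zA-Z_:@#${}\x60]`),
		}
	// from version 1.x.x IoTDB changed the allowed keys in nodes
	case "1.0", "1.1", "1.2", "1.3":
		o.sanitizeRegex = []*regexp.Regexp{
			regexp.MustCompile(`[^0-9a-zA-Z_\x60]`),
			regexp.MustCompile(`^\d+$`),
		}
	default:
		// Assumption of this sketch: unknown modes are rejected at startup.
		return fmt.Errorf("unknown sanitize mode %q", o.SanitizeTags)
	}
	return nil
}

// needsQuoting only matches against the precompiled patterns; it never compiles.
func (o *output) needsQuoting(tag string) bool {
	for _, re := range o.sanitizeRegex {
		if re.MatchString(tag) {
			return true
		}
	}
	return false
}

func main() {
	o := &output{SanitizeTags: "1.2"}
	if err := o.Init(); err != nil {
		panic(err)
	}
	fmt.Println(o.needsQuoting("sensor_1"), o.needsQuoting("my tag"), o.needsQuoting("123"))
}
```

With this shape, the per-metric write path does no regexp compilation at all, and an invalid pattern would surface as a panic at startup rather than as a silently ignored error.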
Looks good @giovanni-bellini-argo! I would just un-export the member variable and fix the typos and we are good to go. I tried to do it in my suggestions to ease your life, but I'm not sure I got all of them... :-)
Done everything, thanks for the patience and help 😎
Thank you for your work and patience with my requests @giovanni-bellini-argo! :-)
) Co-authored-by: SeanGaluzzi <SeanGaluzzi@users.noreply.github.com> Co-authored-by: SeanGaluzzi <sean.galuzzi@argo.consulting> (cherry picked from commit 4c1d8e3)
…luxdata#14519) Co-authored-by: SeanGaluzzi <SeanGaluzzi@users.noreply.github.com> Co-authored-by: SeanGaluzzi <sean.galuzzi@argo.consulting>
Summary
Handle the IoTDB ILLEGAL_PATH error raised when tags contain an illegal character, explained in more detail in the issue.
Checklist
Related issues
resolves #14518