Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolves #931: Fixing problems with encoding in UseDepVersion and PomHelper #932

Merged

Conversation

jarmoniuk
Copy link
Contributor

@jarmoniuk jarmoniuk commented Mar 14, 2023

As shown in #931, recent changes introduced in 2.15.0 introduced a regression when the user used irregular characters.

The problem is highly platform dependent and occurs when the file.encoding system property does not equal to the one returned by Charset.defaultCharset().

In this particular case, we were using String(byte[]) to construct a String from a byte array which assumed incorrect encoding.

The file was an utf-8 file, but file.encoding was equal to latin1 or windows1252.

@slawekjaranowski please review

@jarmoniuk
Copy link
Contributor Author

@slachiewicz please review

@slawekjaranowski slawekjaranowski linked an issue May 9, 2023 that may be closed by this pull request
@slawekjaranowski slawekjaranowski added this to the 2.16.0 milestone May 9, 2023
*/
public static StringBuilder readFile(Path path) throws IOException {
try (BufferedReader reader = Files.newBufferedReader(path)) {
return reader.lines().collect(StringBuilder::new, StringBuilder::append, StringBuilder::append);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does line end character can be important here?
In old implementation we read bytes from file but here line end character can be drop.

Copy link
Contributor Author

@jarmoniuk jarmoniuk May 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Easier to debug if tests go wrong, if I remember correctly.

I can drop the new lines, it will matching regexes easier.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Converting to a draft -- doesn't quite work if I change the encoding of the file to ISO-8859-1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still in a draft, but for the record: the issue is caused by a discrepancy between -Dfile.encoding and the actual encoding of the file. The system property determines the value of the Charset.defaultCharset() used to convert the bytes[] read in Files.read() into a string using new String(byte[]).

Copy link
Contributor Author

@jarmoniuk jarmoniuk May 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I don't really know whether this is a bug or not. In 2.14.2 it did not need to use the raw file so it relied completely on Maven to process the file. In version 2.15.0, the plugin processes raw POM and thus is susceptible to problems with encoding.

In this case, the problem is caused by the user having the wrong value of their -Dfile.encoding system property, which causes String(byte[]) to use the wrong encoding (given by the user) to convert the byte array to string. This is why I also wasn't able to reproduce the issue initially.

This same problem will occur everywhere where we process raw pom and a user will have a system with the wrong config.

So, as an attempt to solve this problem -- I'm using Plexus's XmlReader with its detection capability to detect the encoding. Please tell me what you think of this solution.

@jarmoniuk jarmoniuk force-pushed the issue-931-use-dep-version-umlauts branch from ff62d6f to b877d17 Compare May 10, 2023 14:11
@jarmoniuk jarmoniuk marked this pull request as draft May 10, 2023 14:41
@jarmoniuk jarmoniuk marked this pull request as ready for review May 10, 2023 16:15
@jarmoniuk jarmoniuk force-pushed the issue-931-use-dep-version-umlauts branch from b877d17 to 6f8ab5c Compare May 10, 2023 16:20
…tring to circumvent problems with encoding

Uses Plexus XmlStreamReader to guess file encoding.
@jarmoniuk jarmoniuk force-pushed the issue-931-use-dep-version-umlauts branch from 6f8ab5c to b881d37 Compare May 11, 2023 20:48
@slawekjaranowski
Copy link
Member

There is a method - PomHelper.readXmlFile looks like we can use it instead of creating next method for xml reading.
I pushed commit with it.

@ajarmoniuk What do you think?

@jarmoniuk
Copy link
Contributor Author

There is a method - PomHelper.readXmlFile looks like we can use it instead of creating next method for xml reading. I pushed commit with it.

@ajarmoniuk What do you think?

Even better! 👍

@slawekjaranowski slawekjaranowski merged commit bdf6b1c into mojohaus:master May 14, 2023
@jarmoniuk jarmoniuk deleted the issue-931-use-dep-version-umlauts branch May 14, 2023 19:00
dongjoon-hyun pushed a commit to apache/spark that referenced this pull request Jun 18, 2023
### What changes were proposed in this pull request?
The pr aims to update some maven plugins to newest version. include:
- versions-maven-plugin from 2.15.0 to 2.16.0
- maven-source-plugin from 3.2.1 to 3.3.0
- maven-surefire-plugin from 3.1.0 to 3.1.2
- maven-dependency-plugin from 3.5.0 to 3.6.0

### Why are the changes needed?
- versions-maven-plugin
1.Release Notes: https://github.com/mojohaus/versions/releases/tag/2.16.0
2.Bug Fix:
Resolves: display-dependency-updates only shows updates from the most major allowed segment (mojohaus/versions#966) ajarmoniuk
Resolves mojohaus/versions#931: Fixing problems with encoding in UseDepVersion and PomHelper (mojohaus/versions#932) ajarmoniuk
Resolves mojohaus/versions#916: Partially reverted mojohaus/versions#799. (mojohaus/versions#924) ajarmoniuk
Resolves mojohaus/versions#954: Excluded plexus-container-default (mojohaus/versions#955) ajarmoniuk
Resolves mojohaus/versions#951: DefaultArtifactVersion::getVersion can be null (mojohaus/versions#952) ajarmoniuk
BoundArtifactVersion.toString() to work with NumericVersionComparator (mojohaus/versions#930) ajarmoniuk
Issue mojohaus/versions#925: Protect against an NPE if a dependency version is defined in dependencyManagement (mojohaus/versions#926) ajarmoniuk

- maven-source-plugin
v3.2.1 VS v3.3.0: apache/maven-source-plugin@maven-source-plugin-3.2.1...maven-source-plugin-3.3.0

- maven-surefire-plugin
Release Notes: https://github.com/apache/maven-surefire/releases/tag/surefire-3.1.2

- maven-dependency-plugin
v3.5.0 VS v3.6.0: apache/maven-dependency-plugin@maven-dependency-plugin-3.5.0...maven-dependency-plugin-3.6.0

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

Closes #41641 from panbingkun/SPARK-44085.

Authored-by: panbingkun <pbk1982@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
czxm pushed a commit to czxm/spark that referenced this pull request Jun 19, 2023
### What changes were proposed in this pull request?
The pr aims to update some maven plugins to newest version. include:
- versions-maven-plugin from 2.15.0 to 2.16.0
- maven-source-plugin from 3.2.1 to 3.3.0
- maven-surefire-plugin from 3.1.0 to 3.1.2
- maven-dependency-plugin from 3.5.0 to 3.6.0

### Why are the changes needed?
- versions-maven-plugin
1.Release Notes: https://github.com/mojohaus/versions/releases/tag/2.16.0
2.Bug Fix:
Resolves: display-dependency-updates only shows updates from the most major allowed segment (mojohaus/versions#966) ajarmoniuk
Resolves mojohaus/versions#931: Fixing problems with encoding in UseDepVersion and PomHelper (mojohaus/versions#932) ajarmoniuk
Resolves mojohaus/versions#916: Partially reverted mojohaus/versions#799. (mojohaus/versions#924) ajarmoniuk
Resolves mojohaus/versions#954: Excluded plexus-container-default (mojohaus/versions#955) ajarmoniuk
Resolves mojohaus/versions#951: DefaultArtifactVersion::getVersion can be null (mojohaus/versions#952) ajarmoniuk
BoundArtifactVersion.toString() to work with NumericVersionComparator (mojohaus/versions#930) ajarmoniuk
Issue mojohaus/versions#925: Protect against an NPE if a dependency version is defined in dependencyManagement (mojohaus/versions#926) ajarmoniuk

- maven-source-plugin
v3.2.1 VS v3.3.0: apache/maven-source-plugin@maven-source-plugin-3.2.1...maven-source-plugin-3.3.0

- maven-surefire-plugin
Release Notes: https://github.com/apache/maven-surefire/releases/tag/surefire-3.1.2

- maven-dependency-plugin
v3.5.0 VS v3.6.0: apache/maven-dependency-plugin@maven-dependency-plugin-3.5.0...maven-dependency-plugin-3.6.0

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

Closes apache#41641 from panbingkun/SPARK-44085.

Authored-by: panbingkun <pbk1982@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

NPE and wrong encoding for umlauts
2 participants