Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove SnakeYAML for manual YAML Parser #3191

Merged
merged 8 commits into from
Jul 29, 2023

Conversation

sv99
Copy link
Contributor

@sv99 sv99 commented Jul 21, 2023

Serialize to Yaml without using any library.

Its first version without cleaning refs to snakeyaml.

@iBotPeaches
Copy link
Owner

Some initial thoughts:

  1. Should we introduce some sort of enum/static variable all the field names? It might cut down on the human error/rename errors we had in past.
  2. Do we effectively patch away any chance of untrusted deserialization since we don't take untrusted input into classes?

What is the main driving factor for this outside of what I know?

  • dropping a dependency
  • potentially plugging a constant security concern

@sv99
Copy link
Contributor Author

sv99 commented Jul 21, 2023

I dont find clear method for reading versioned data in the yaml serialization library.
For example - renamed fields as in our case.

I realized clear straightforward method for loading data from yaml.
Any unknown fields silently skipped. Any problems may add to test suite.

I think simple realization we can easily refactor in the future.

@sv99
Copy link
Contributor Author

sv99 commented Jul 21, 2023

I think enum/static variable add redundant complexity.

@iBotPeaches
Copy link
Owner

I think enum/static variable add redundant complexity.

In my head I am thinking just a file that exports static values. It seems like it would help have a consistent location for property names. Is that the reason that it adds complexity? Or that loops/etc that were previously just simple "strings" become like Foo.String?

@sv99
Copy link
Contributor Author

sv99 commented Jul 21, 2023

Internally this will be string anyway - additional lines.

@sv99
Copy link
Contributor Author

sv99 commented Jul 21, 2023

This solution can solve #3145?

@iBotPeaches
Copy link
Owner

This solution can solve #3145?

Yes I believe so. I'm just nervous it adds managing YAML logic into Apktool's codebase, which was previously something I didn't have to worry myself with. Some of these files that managing the spec of parsing/writing YAML looks like something that can easily expand for edge cases.

Unless if we stay strict to scalars - then the spec and associated logic is small.

@sv99
Copy link
Contributor Author

sv99 commented Jul 21, 2023

YamlWriter very simple.
YamlReader has some difficulties but in general it is quite straightforward.

Need add some test for writer - reader and clean refs to snakeyaml.

Snakeyaml is common purposes library based on reflection and the need to support the entire yaml standard hence the complexity.

@iBotPeaches
Copy link
Owner

Okay, let me get out a hotfix this weekend and then focus this. If we go forward with this, I want to make sure its slated for the next "feature" release of like v2.9.0 so it has a few months to stabilize and be cleaned up w/ removal of SnakeYAML

@sv99
Copy link
Contributor Author

sv99 commented Jul 22, 2023

  1. ApkInfo tests consolidated in the package brut.androlib.apk.
  2. Added ApkInfoSerializationTest read -> save -> read yml file with nonprintable items in the unknownFiles.
  3. Removed dependencies from snakyaml.

Copy link
Owner

@iBotPeaches iBotPeaches left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I can't figure out is I pulled this locally - ran build and it failed w/

brut.androlib.apk.ApkInfoReaderTest > testFirstIncorrectIndent FAILED
    java.lang.AssertionError: Values should be different. Actual: 2.8.1
        at org.junit.Assert.fail(Assert.java:89)
        at org.junit.Assert.failEquals(Assert.java:187)
        at org.junit.Assert.assertNotEquals(Assert.java:163)
        at org.junit.Assert.assertNotEquals(Assert.java:177)
        at brut.androlib.apk.ApkInfoReaderTest.testFirstIncorrectIndent(ApkInfoReaderTest.java:64)

So I rebased this and waiting to see output.

@sv99 sv99 force-pushed the manual_yaml_serialization branch 2 times, most recently from c183d03 to 2e08282 Compare July 23, 2023 08:24
@sv99
Copy link
Contributor Author

sv99 commented Jul 23, 2023

Unused methods removed and rebased

@iBotPeaches
Copy link
Owner

Odd. I still cannot build this locally without the indent test failures. Yet things pass on CI.

@sv99
Copy link
Contributor Author

sv99 commented Jul 23, 2023

Check last version. I think you tested version have version 2.8.1.

My copy have 2.8.1-SNAPSHOT-...

@sv99 sv99 force-pushed the manual_yaml_serialization branch from 9fb5017 to 6912065 Compare July 23, 2023 19:08
@sv99
Copy link
Contributor Author

sv99 commented Jul 23, 2023

rebased

@iBotPeaches
Copy link
Owner

My local tests pass now w/ latest change. Still trying to debug/follow the actual parser a bit more so I can make sure I understand it.

One personal thing that I wish we did is less shorthand if statements. I'd much rather have:

    public boolean nextLine() {
        if (isEnd()) {
            return false;
        }
        
        while (true) {
            mCurrent++;
            if (isCommentOrEmpty())  {
                continue;
            }
            return !isEnd();
        }
    }

vs

    public boolean nextLine() {
        if (isEnd())
            return false;
        while (true) {
            mCurrent++;
            // skip comments
            if (isCommentOrEmpty())
                continue;
            return !isEnd();
        }
    }

I don't think there is any need to make it as compact as possible. Its how the original AXML Parser was written and overtime as we had to add patches, etc. It just made such large diffs as we introduced curly braces.

@sv99 sv99 force-pushed the manual_yaml_serialization branch from 6912065 to aac062c Compare July 24, 2023 06:11
@sv99
Copy link
Contributor Author

sv99 commented Jul 24, 2023

Comment removed and rebased.

@sv99
Copy link
Contributor Author

sv99 commented Jul 24, 2023

curly braces corrected in all places

@sv99 sv99 force-pushed the manual_yaml_serialization branch from 1091cb3 to 7b09b5e Compare July 24, 2023 10:29
@sv99
Copy link
Contributor Author

sv99 commented Jul 24, 2023

Add test for serialization hieroglyphs in the doNotCompress field

@iBotPeaches
Copy link
Owner

Okay thanks. Let me do one more detailed pass through on this since its quite major.

@sv99 sv99 force-pushed the manual_yaml_serialization branch from a28a525 to 81825cc Compare July 24, 2023 11:10
@sv99 sv99 force-pushed the manual_yaml_serialization branch from 81825cc to 859f5e3 Compare July 26, 2023 06:43
@iBotPeaches iBotPeaches changed the title Simple straitforward yaml serialization Remove SnakeYAML for manual YAML Parser Jul 27, 2023
@iBotPeaches
Copy link
Owner

Okay I hammered this and can't find anything else wrong. I'll merge tomorrow morning and follow up with a commit of some minor cs changes.

then I am digging into some code style automation so I never have to worry about formatting again.

@iBotPeaches iBotPeaches merged commit 62b9eed into iBotPeaches:master Jul 29, 2023
31 checks passed
@kefir500
Copy link

kefir500 commented May 17, 2024

Heads up.

I'm not completely sure (haven't check the code from this PR), but it seems that the new YAML writer properly outputs numeric values as numbers (in contrary to the previous implementation where numbers are casted to strings for some reason).

It's not a big deal, but some dirty regex-related hacks (as in my project 😄) can break.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants