Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for exporting to proper deobfuscation mapping file formats #1505

Merged
merged 24 commits into from
Jun 11, 2022

Conversation

NebelNidas
Copy link
Contributor

@NebelNidas NebelNidas commented May 30, 2022

Adds support for exporting manually renamed classes/methods/fields etc. in the Tiny v2 format, which is the most versatile out there and can handle a lot more features than JADX's default .jobf format. For more details, see its spec.

Depends on FabricMC's mapping-io library, which theoretically allows JADX to support many more formats in the future.
This PR doesn't add Tiny v2 import support yet, as I don't need it personally atm, but it can be easily added later when needed.

Fixes #1491.

Currently working:

  • Exporting classes
  • Exporting fields
  • Exporting methods
  • Exporting method args
  • Exporting method vars
  • Exporting comments

@lgtm-com
Copy link

lgtm-com bot commented May 30, 2022

This pull request introduces 1 alert when merging d0fc80d into 9d88592 - view on LGTM.com

new alerts:

  • 1 for Container contents are never accessed

@lgtm-com
Copy link

lgtm-com bot commented May 30, 2022

This pull request introduces 1 alert when merging b5c7ad9 into 9d88592 - view on LGTM.com

new alerts:

  • 1 for Container contents are never accessed

@lgtm-com
Copy link

lgtm-com bot commented May 30, 2022

This pull request introduces 1 alert when merging d14c59a into fcd58ae - view on LGTM.com

new alerts:

  • 1 for Container contents are never accessed

@NebelNidas
Copy link
Contributor Author

NebelNidas commented May 30, 2022

@skylot It's almost done now, but I need some help with the method args. How can I access them? I looked around the code, but couldn't find a solution. I mean, there's methodInfo.getArgsCount() and such, but I don't just need the arg count/types, but their renamed names, too!
I've tried adding something like this in between here, but it obviously doesn't work (arg.getName() always returns null. And what even are RegisterArgs?):

boolean wasLoaded = mth.isLoaded();
try {
    mth.load();
    for (RegisterArg arg : mth.getArgRegs()) {
        if (mappedMethodArgsAndVars.contains(methodInfo.getDeclClass() + methodInfo.getShortId())) {
            mappingTree.visitClass(classPath);
            mappingTree.visitMethod(methodName, methodDesc);
            mappingTree.visitMethodArg(arg.getRegNum(), 0, arg.getName());
            mappingTree.visitDstName(MappedElementKind.METHOD_ARG, 0, arg.getName());
        }
    }
} catch (DecodeException e) {
    LOG.error("Error while decompiling method " + methodInfo.getShortId());
}
if (!wasLoaded) {
    mth.unload();
}

@skylot
Copy link
Owner

skylot commented May 31, 2022

@NebelNidas information you are trying to get available at decompilation stage, and accessible inside decompile passes only.
After decompilation result data saved in code metadata. Code to get all method arg names can be found here:

private List<String> collectMethodArgNames(JavaMethod javaMethod) {
ICodeInfo codeInfo = javaMethod.getTopParentClass().getCodeInfo();
int mthDefPos = javaMethod.getDefPos();
int lineEndPos = CodeUtils.getLineEndForPos(codeInfo.getCodeStr(), mthDefPos);
List<String> argNames = new ArrayList<>();
codeInfo.getCodeMetadata().searchDown(mthDefPos, (pos, ann) -> {
if (pos > lineEndPos) {
return Boolean.TRUE; // stop at line end
}
if (ann instanceof NodeDeclareRef) {
ICodeNodeRef declRef = ((NodeDeclareRef) ann).getNode();
if (declRef instanceof VarNode) {
VarNode varNode = (VarNode) declRef;
if (varNode.getMth().equals(javaMethod.getMethodNode())) {
argNames.add(varNode.getName());
}
}
}
return null;
});
return argNames;
}

(replace JavaMethod with MethodNode and method names to similar ones)

If you need to get only renamed args you need to filter that using ICodeRename from code data.

@NebelNidas
Copy link
Contributor Author

NebelNidas commented May 31, 2022

Thank you! I've already gone down another route, which also works well so far, only issue is that

for (RegisterArg arg : mth.getArgRegs()) {
    arg.getSVar()
}

always returns null. Is there another way to access the arg's SSAVar? I need it to calculate the index for a new JadxCodeRef so I can compare it with the existing saved one (see my code here: NebelNidas@fe104a1)

@skylot
Copy link
Owner

skylot commented May 31, 2022

always returns null. Is there another way to access the arg's SSAVar?

As I said in previous comment: that info not available at that point. And, yes, you can access reg and ssa numbers from VarNode using code sample from above.

@NebelNidas
Copy link
Contributor Author

Thank you for the help, method args work now :) One question though: What's the reason for using getTopParentClass() instead of getParentClass() here?

ICodeInfo codeInfo = javaMethod.getTopParentClass().getCodeInfo();

@skylot
Copy link
Owner

skylot commented Jun 2, 2022

What's the reason for using getTopParentClass() instead of getParentClass()

Code generated only for top level classes, and inner classes can have any nesting level, i.e. parent class of inner can be not a top class.

@NebelNidas
Copy link
Contributor Author

How would I do the same for method vars? I guess I would have to replace the > lineEndPos check here with > methodEndPos, but I don't know how I can get that information (where the method ends / how big the method is):

if (pos > lineEndPos) {
return Boolean.TRUE; // stop at line end

@skylot
Copy link
Owner

skylot commented Jun 3, 2022

I think you can stop if mth in VarNode became different, i.e. !varNode.getMth().equals(methodNode).
In collectMethodArgs method I use that check just to be sure, but it not really needed.

@NebelNidas
Copy link
Contributor Author

NebelNidas commented Jun 3, 2022

Good idea! I still need to get a few properties from the resulting VarNodes though, and I'm not sure how do do it. For one, I need the var's offset to be able to use JadxCodeRef.forInsn(varOffset) on it. And mapping-io requires that I provide the method var's lv-index and lv-start-offset aka start-op-code:

<lv-index> refers to the local variable array index of the frames having the variable, see "index" in JVMS SE 8 §4.7.13
<lv-start-offset> is at most the start of the range in which the variable has a value, but doesn't overlap with another variable with the same <lv-index>, see "start_pc" in JVMS SE 8 §4.7.13. The start offset/range for tiny is measured in instructions with a valid opcode, not bytes.

For the local variable index I already have code (which I think works correctly), see here:

int lvIndex = mth.getAccessFlags().isStatic() ? 0 : 1;
for (VarNode arg : collectMethodArgs(mth)) {
int ssaVersion = arg.getSsa();
String key = methodInfo.getDeclClass() + methodInfo.getShortId()
+ JadxCodeRef.forVar(arg.getReg(), ssaVersion);
if (mappedMethodArgsAndVars.containsKey(key)) {
mappingTree.visitClass(classPath);
mappingTree.visitMethod(methodName, methodDesc);
mappingTree.visitMethodArg(arg.getReg(), lvIndex, null);
mappingTree.visitDstName(MappedElementKind.METHOD_ARG, 0, mappedMethodArgsAndVars.get(key));
}
lvIndex += Math.max(arg.getType().getRegCount(), 1);
}

Is there an easier way to get this result? And I have no idea yet how I can get the lv-start-offset.

@skylot
Copy link
Owner

skylot commented Jun 3, 2022

I need the var's offset to be able to use JadxCodeRef.forInsn(varOffset) on it

Offset stored in metadata annotation with type OFFSET and attached to line start, so offset on line before variable declaration is what you need.

method var's lv-index and lv-start-offset

Variables array can be acquired using MethodNode.getDebugInfo().getLocalVars() next use getStartOffset for lv-start-offset and getRegNum() - argsStartReg for lv-index (you can add getter for argsStartReg field in MethodNode). But be aware that getDebugInfo() can return null if debug info is not available.
Also, if you want to check the code, you can start at reader of vars attributes here.

@NebelNidas
Copy link
Contributor Author

Can I use the localVariable.getStartOffset() directly instead of looping through the class's code metadata looking for the OFFSET annotation?

@NebelNidas NebelNidas changed the title Proper deobfuscation mapping file format Support for exporting to Tiny v2 deobfuscation mapping file format Jun 3, 2022
@skylot
Copy link
Owner

skylot commented Jun 3, 2022

Can I use the localVariable.getStartOffset() directly

You can, but I am not sure how you will match it with VarNode, because there are can be several variables with same reg num. Anyway, looks like your format can't handle such cases, so we can't do much here.

@NebelNidas
Copy link
Contributor Author

use [...] getRegNum() - argsStartReg for lv-index

Are you sure? I have this static method:
image

And since it's static, the local variable index should start at zero, meaning that the lv-index for key would be 0 and for value 1. Using your suggested code:

int lvIndex = varNode.getReg() - methodNode.getArgsStartReg();

I get this:
image

Which I don't think is correct, is it? 🤔


there are can be several variables with same reg num

In which case?

@skylot
Copy link
Owner

skylot commented Jun 3, 2022

Well, I can miss something, that is why I give you a link to a code jadx using to read these values 🙂
And you can check java bytecode and docs for api you are using.
Also, do you know any other tools to work with this format, can you compare or apply/check your output?

In which case?

I don't know, but this can happen.

@NebelNidas
Copy link
Contributor Author

Enigma (download) can do everything except method vars. Using it on the same app as JADX (after running it through dex2jar) and doing the same renames, I get the expected results:
image

@NebelNidas
Copy link
Contributor Author

In which case?

I don't know, but this can happen.

Tiny v2 allows specifying an optional lvt-index:

<lvt-index> is the index into the LocalVariableTable attribute's local_variable_table array, see "local_variable_table" in JVMS SE 8 §4.7.13, not to be confused with "index" referred by <lv-index>

Would this help in such edge cases?

@NebelNidas NebelNidas changed the title Support for exporting to Tiny v2 deobfuscation mapping file format Support for exporting to proper deobfuscation mapping file formats Jun 8, 2022
@NebelNidas NebelNidas marked this pull request as ready for review June 9, 2022 03:26
@NebelNidas NebelNidas marked this pull request as draft June 9, 2022 03:27
@skylot
Copy link
Owner

skylot commented Jun 9, 2022

Shouldn't makeAliasRawFullName also take getAliasPkg() and getAliasShortName() as the first two parameters? If not, then I need to create such a method nevertheless, how should I call it?

Yeah, here should be alias pkg and name. Intention was to make aliased name while not using inner classes, i.e. replace back '.' with '$', but looks like this was not used anywhere. And to be honest, I have no idea why you may need this method.

And my issue: Somehow, the offset I get from here:
is different from the offset that's saved in JADX's project data (JadxCodeRef), computed here:

It is hard to tell. Maybe variable declaration without code don't have offset (like int a;) so you get offset from previous instruction. Jadx don't allow attaching comments for such variables for now, you can open a new issue if this is your case.

@NebelNidas
Copy link
Contributor Author

I'm an absolute idiot... The returned offset was correct the entire time; it was just my test comment that was placed on the wrong line! 🤦
Anyways, after the last two days being completely wasted, this PR is finally done! Definetely not the cleanest code I've ever written, but it does its job. 😅 I'll clean it up in the following PRs when I add mapping-io import support and completely replace the JOBF format with Tiny v2.

@NebelNidas NebelNidas marked this pull request as ready for review June 10, 2022 11:16
@skylot
Copy link
Owner

skylot commented Jun 10, 2022

@NebelNidas please move all your changes to a new class into a jadx-gui module and jadx.gui.plugins.mapping package (or any other name instead mapping). There are several reasons to do this:

  1. Looks like you don't use any methods of deobfuscator class, so don't mix these features
  2. Unfortunately mapping-io don't use maven central repository (and on top of that you just add jar file), so I don't want users of "jadx-lib" also add addition repo.
  3. I am trying to strip all additional features from jadx-core and keep only decompilation in it.
  4. Later I will try to move your changes to a new plugin and use it also in jadx-cli.

I understand that this will be make an import a little harder to implement (may need to add a new API to insert custom jadx passes), but result will be more modular (and less coupled).

@NebelNidas
Copy link
Contributor Author

NebelNidas commented Jun 11, 2022

Looks like you don't use any methods of deobfuscator class, so don't mix these features

I purposefully left it in there, because I want to replace the JOBF format with Tiny v2 in the next PR (and JOBF is also partly handled in the Deobfuscator class). But since that will be a separate PR, I can move it into your suggested package for now, no problem :)

@NebelNidas
Copy link
Contributor Author

Can you wait a few minutes before merging please? I'm making the exporting async

@skylot
Copy link
Owner

skylot commented Jun 11, 2022

Can you wait a few minutes before merging please?

Sure. You can mark PR as draft if it is not ready for merge :)

because I want to replace the JOBF format with Tiny v2 in the next PR

You don't need to replace jobf you're just adding support for other formats.
I plan to extract support for mapping format import/export into a separate optional plugin. Deobfuscator will also be moved to a separate plugin eventually.

I purposefully left it in there

Please, don't mix mapping, deobfuscation and renames these are different things, yes they are related, but yet these are different features. Also, you don't need deobfuscator because node aliases can be accessed (and also updated) from RootNode.

@NebelNidas
Copy link
Contributor Author

Took a bit longer than expected, but it's ready to be merged now :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Proper deobfuscation mapping file format
2 participants