Codealign is a tool for evaluating neural decopmilers that computes equivalence between two input functions at the instruction level. The intended use case is comparing the predictions of a neural decompiler with a reference correct answer like the original source code.
This work was introduced in Fast, Fine-Grained Equivalence Checking for Neural Decompilers
Codealign is a python package.
git clone https://github.com/squaresLab/codealign.git
cd codealign
pip install .
then optionally
python -m unittest
from codealign import align, Alignment
prediction = """
int write_response(int fd, char *buf, int len) {
int i;
for (i = 0; i < len; i += len) {
if ((i = write(fd, buf + i, len - i)) <= 0)
return 0;
}
return 1;
}
"""
reference = """
int write_response(int fd, char *response, int len) {
int retval;
int byteswritten = 0;
while (byteswritten < len) {
retval = write(fd, response + byteswritten, len - byteswritten);
if (retval <= 0) {
return 0;
}
byteswritten += retval;
}
return 1;
}
"""
alignment: Alignment = align(prediction, reference, 'c', partial_loops=True)
print(alignment)
Will yield
Alignment(candidate=write_response, reference=write_response)
%1 = phi 0 %7
%1 = phi 0 %8
%2 = < %1 len
%3 = < %1 len
loop %2
loop %3
%7 = + %5 len
%3 = + buf %1
%4 = + response %1
%4 = - len %1
%5 = - len %1
%5 = write(fd, %3, %4)
%6 = write(fd, %4, %5)
%6 = <= %5 0
%7 = <= %6 0
if %6
if %7
return 0
return 0
return 1
return 1
%8 = + %1 %6
Equivalent instructions are grouped together.
Alignment objects and be interacted with programmatically via several methods.
alignment.candidate_ir
alignment.reference_ir
These allow for access to individual functions in terms of codealign's internal representation.
alignment.alignment_list
Represents the alignment as pairs of instructions in the order (candidate_instruction, reference_instruction)
.
If an instruction does not align with anything the corresponding value will be None
.
Except in injective mode, an instruction can occur in more than one pair if it aligns with multiple other instructions.
alignment[instruction] # read-only
Returns the instruction(s) with which a given instruction is aligned.
Instructions in codealign IR can be found in the .candidate_ir
and .reference_ir
attributes.
Codealign describes code in term of an internal object model.
These can be imported from codealign.ir
.
The codealign object model includes, but is not limited to
Function
BasicBlock
SSAOperator
VarOperator
Variable
Where possible, codealign provides references to the tree-sitter
AST nodes from which a given instruction was derived.
To access this, use
from codealign.ir import SSAOperator
instruction: SSAOperator
instruction.ast_node