-
-
Notifications
You must be signed in to change notification settings - Fork 30.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-100288: Specialize LOAD_ATTR for simple class attributes. #105990
GH-100288: Specialize LOAD_ATTR for simple class attributes. #105990
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple notes (and one bug, I think).
Also, your stats seem to indicate that both the success and hit rates for LOAD_ATTR
are lower with this change. Are we sure it's worth doing?
If so, this seems like quite a bit of extra code and work just to save a branch on the oparg's low bit at the end of the instruction, in my opinion. I would be surprised if the separate instructions were really more performant.
if ((instr->op.arg & 1) == 0) { | ||
if (specialize_attr_loadclassattr(owner, instr, name, descr, kind, false)) { | ||
goto success; | ||
} | ||
} | ||
else { | ||
SPECIALIZATION_FAIL(LOAD_ATTR, SPEC_FAIL_ATTR_CLASS_ATTR_SIMPLE); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I find this flow a bit easier to follow:
if ((instr->op.arg & 1) == 0) { | |
if (specialize_attr_loadclassattr(owner, instr, name, descr, kind, false)) { | |
goto success; | |
} | |
} | |
else { | |
SPECIALIZATION_FAIL(LOAD_ATTR, SPEC_FAIL_ATTR_CLASS_ATTR_SIMPLE); | |
} | |
if (instr->op.arg & 1) { | |
SPECIALIZATION_FAIL(LOAD_ATTR, SPEC_FAIL_ATTR_CLASS_ATTR_SIMPLE); | |
} | |
else if (specialize_attr_loadclassattr(owner, instr, name, descr, kind, false)) { | |
goto success; | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me, it seems more natural to have the success first, then the failure. A matter of taste, I suppose.
The stats for
to
which is a modest improvement, but the failure stats are now
Which means that fixing the materialization of |
The |
I'm confused. Wouldn't just adding a branch on |
I see what you mean now. I thought you were talking about the specialization, not the resulting instruction. I'll try that. |
The generated code is noticeably worse for the conditional case. Py_DECREF(self);
res = Py_NewRef(descr);
stack_pointer[-1] = res;
next_instr += 9; but with the res2 = Py_NewRef(descr);
if (oparg & 1) {
res = self;
}
else {
Py_DECREF(self);
}
STACK_GROW(((oparg & 1) ? 1 : 0));
if (oparg & 1) { stack_pointer[-(((oparg & 1) ? 1 : 0))] = res; }
stack_pointer[-(1 + ((oparg & 1) ? 1 : 0))] = res2;
next_instr += 9; So, I think the additional instruction is worth it. |
Note, the generated code actually has this sequence in it: STACK_GROW((0 ? 1 : 0));
if (0) { stack_pointer[-(1 + (0 ? 1 : 0))] = res2; } but even the worst C compiler will eliminate that code. |
Is the branchy form actually measurably slower? I would imagine that the C compiler would turn the five It's obviously not a huge deal (they're both okay approaches, which is why I approved this PR), but a branch at the end of the existing instruction feels easier to maintain than two new specialized forms. |
Maybe not now, but it mostly likely will be when JIT compiled. Is the branchy form otherwise better? |
This PR specializes for things like
obj.x
where:Stats
The miss rate for
LOAD_ATTR_NONDESCRIPTOR_WITH_VALUES
is quite poor at 33%(Ignore the stats for
LOAD_ATTR_NONDESCRIPTOR_LAZY_DICT
, I've removed it)