GH-100288: Specialize LOAD_ATTR for simple class attributes. #105990

Merged (8 commits) on Jul 10, 2023

Member

@markshannon markshannon commented Jun 22, 2023

This PR specializes for things like obj.x where:

class C:
    x = 1
obj = C()
obj.x
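
A minimal sketch of what makes `x` a "simple" class attribute in the example above. The helper `is_simple_class_attr` is hypothetical and only approximates the idea; it is not the check the specializer actually performs. The value is found on the class, is not shadowed by the instance, and its type defines no `__get__`:

```python
class C:
    x = 1

    def method(self):
        return self.x


def is_simple_class_attr(obj, name):
    """Hypothetical helper: True if `name` resolves to a plain (non-descriptor)
    value stored on the class and not shadowed in the instance __dict__."""
    if name in getattr(obj, "__dict__", {}):
        return False  # shadowed by an instance attribute
    for klass in type(obj).__mro__:
        if name in klass.__dict__:
            value = klass.__dict__[name]
            # A descriptor is any object whose type defines __get__.
            return not hasattr(type(value), "__get__")
    return False  # not found on the class at all


obj = C()
print(is_simple_class_attr(obj, "x"))       # plain int stored on the class
print(is_simple_class_attr(obj, "method"))  # functions are descriptors
obj.x = 2
print(is_simple_class_attr(obj, "x"))       # now shadowed by the instance
```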

Stats
The miss rate for LOAD_ATTR_NONDESCRIPTOR_WITH_VALUES is quite poor, at 33%.
(Ignore the stats for LOAD_ATTR_NONDESCRIPTOR_LAZY_DICT; I've removed it.)

Member

@brandtbucher brandtbucher left a comment

A couple notes (and one bug, I think).

Also, your stats seem to indicate that both the success and hit rates for LOAD_ATTR are lower with this change. Are we sure it's worth doing?

If so, this seems like quite a bit of extra code and work just to save a branch on the oparg's low bit at the end of the instruction, in my opinion. I would be surprised if the separate instructions were really more performant.

Comment on lines +858 to +865
if ((instr->op.arg & 1) == 0) {
    if (specialize_attr_loadclassattr(owner, instr, name, descr, kind, false)) {
        goto success;
    }
}
else {
    SPECIALIZATION_FAIL(LOAD_ATTR, SPEC_FAIL_ATTR_CLASS_ATTR_SIMPLE);
}
Member

Nit: I find this flow a bit easier to follow:

Suggested change
if (instr->op.arg & 1) {
    SPECIALIZATION_FAIL(LOAD_ATTR, SPEC_FAIL_ATTR_CLASS_ATTR_SIMPLE);
}
else if (specialize_attr_loadclassattr(owner, instr, name, descr, kind, false)) {
    goto success;
}

Member Author

To me, it seems more natural to have the success first, then the failure. A matter of taste, I suppose.

@markshannon
Member Author

The stats for LOAD_ATTR change from

Kind                      Count        Ratio
specialization.deferred   1149226050   14.2%
specialization.deopt      7438533       0.1%
hit                       6549029560   80.9%
miss                      394372352     4.9%

to

Kind                      Count        Ratio
specialization.deferred   997276207    11.7%
specialization.deopt      8561434       0.1%
hit                       7089151162   83.0%
miss                      453922508     5.3%

which is a modest improvement, but the failure stats are now

Failure kind                Count       Ratio
has managed dict            1,088,824   78.9%
metaclass attribute         113,508      8.2%
method                      54,145       3.9%
shadowed                    36,806       2.7%
not managed dict            35,316       2.6%
mutable class               19,767       1.4%
overridden                  7,724        0.6%
class method obj            6,782        0.5%
class attr descriptor       6,640        0.5%
non object slot             6,120        0.4%
not in keys                 1,680        0.1%
non overriding descriptor   1,072        0.1%
module attr not found       800          0.1%
class attr simple           710          0.1%
builtin class method        660          0.0%

This means that fixing the materialization of __dict__ could raise the specialization success rate (ignoring misses) to about 97%, which is quite good.
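
The comment does not spell out how the roughly 97% figure is reached. The sketch below is one plausible reading, built on two labeled assumptions: that "success rate ignoring misses" means hit / (hit + deferred), and that removing a failure kind converts a proportional share of deferred executions into hits. The counts come from the "after" LOAD_ATTR table and the failure table above.

```python
# Counts from the "after" LOAD_ATTR stats table.
hit = 7_089_151_162
deferred = 997_276_207

# Share of specialization failures attributed to "has managed dict".
managed_dict_share = 0.789

# Assumption: success rate "ignoring misses" is hit / (hit + deferred).
current = hit / (hit + deferred)

# Assumption: fixing one failure kind removes a proportional share of the
# deferred executions, and those executions become hits instead.
remaining_deferred = deferred * (1 - managed_dict_share)
projected = 1 - remaining_deferred / (hit + deferred)

print(f"current success rate:   {current:.1%}")
print(f"projected success rate: {projected:.1%}")
```

Under these assumptions the projected rate lands close to the quoted figure; a different definition of "success" would give a different number.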

@markshannon
Member Author

> If so, this seems like quite a bit of extra code and work just to save a branch on the oparg's low bit at the end of the instruction.

The oparg & 1 case is inserted for calls. It seems unlikely that a non-descriptor is being called much, as both Python functions and method-descriptors are descriptors. The stats support this. "class attr simple" is a tiny 0.1% of specialization failures with this change.
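
To illustrate the claim that called attributes are almost always descriptors, here is a small sketch. The `Callback` class and the `is_descriptor` helper are hypothetical names for illustration. It shows that Python functions and built-in method descriptors define `__get__`, while a callable instance stored on a class does not, which is the uncommon "called non-descriptor" case discussed above:

```python
class Callback:
    """Hypothetical callable object whose type defines no __get__."""

    def __call__(self):
        return "called"


class C:
    x = 1              # plain value
    hook = Callback()  # callable, but not a descriptor

    def method(self):  # Python function: a descriptor
        return "method"


def is_descriptor(value):
    # Descriptor protocol: the *type* of the value defines __get__.
    return hasattr(type(value), "__get__")


obj = C()
for name in ("method", "hook", "x"):
    print(name, is_descriptor(C.__dict__[name]))
print("str.upper", is_descriptor(str.__dict__["upper"]))

# Calling a non-descriptor class attribute works, but no bound method is made.
print(obj.hook())
```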

@markshannon markshannon marked this pull request as ready for review July 4, 2023 16:26
@brandtbucher
Member

> The oparg & 1 case is inserted for calls. It seems unlikely that a non-descriptor is being called much, as both Python functions and method-descriptors are descriptors. The stats support this. "class attr simple" is a tiny 0.1% of specialization failures with this change.

I'm confused. Wouldn't just adding a branch on (oparg & 1) at the end of the existing LOAD_ATTR_METHOD_WITH_VALUES and LOAD_ATTR_NONDESCRIPTOR_NO_DICT instructions have the exact same effect as this PR, but without adding two whole new instructions that are almost identical to the existing ones?

@markshannon
Member Author

I see what you mean now. I thought you were talking about the specialization, not the resulting instruction.
The specialization paths are distinct, but the resulting instruction is very similar.

I'll try that.

@markshannon
Member Author

The generated code is noticeably worse for the conditional case.
Comparing the code for the "action" part, the branchless form looks like this:

    Py_DECREF(self);
    res = Py_NewRef(descr);
    stack_pointer[-1] = res;
    next_instr += 9;

but with the oparg test it looks like this:

    res2 = Py_NewRef(descr);
    if (oparg & 1) {
        res = self;
    }
    else {
        Py_DECREF(self);
    }
    STACK_GROW(((oparg & 1) ? 1 : 0));
    if (oparg & 1) { stack_pointer[-(((oparg & 1) ? 1 : 0))] = res; }
    stack_pointer[-(1 + ((oparg & 1) ? 1 : 0))] = res2;
    next_instr += 9;

So, I think the additional instruction is worth it.
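
For readers less familiar with the generated C, the stack effect of the two forms can be modeled in Python. This is a rough sketch only: the function names are hypothetical, a list stands in for the value stack, and reference counting is ignored.

```python
def load_attr_unconditional(stack, descr):
    """Model of the dedicated instruction: the owner is replaced by descr."""
    stack[-1] = descr  # Py_DECREF(self); stack_pointer[-1] = res


def load_attr_conditional(stack, descr, oparg):
    """Model of a shared instruction that tests the low bit of oparg."""
    owner = stack.pop()
    stack.append(descr)      # res2 always sits below the optional slot
    if oparg & 1:
        stack.append(owner)  # call form: self stays on top for the call


s = ["below", "owner"]
load_attr_unconditional(s, "descr")
print(s)

s = ["below", "owner"]
load_attr_conditional(s, "descr", oparg=0)
print(s)

s = ["below", "owner"]
load_attr_conditional(s, "descr", oparg=1)
print(s)
```

With the low bit clear, both models leave the same stack; the conditional form only differs when the bit is set, which is the extra bookkeeping visible in the generated code.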

@markshannon
Member Author

Note, the generated code actually has this sequence in it:

    STACK_GROW((0 ? 1 : 0));
    if (0) { stack_pointer[-(1 + (0 ? 1 : 0))] = res2; }

but even the worst C compiler will eliminate that code.

@brandtbucher
Member

> So, I think the additional instruction is worth it.

Is the branchy form actually measurably slower? I would imagine that the C compiler would turn the five (oparg & 1) branches into just one or two, and that the resulting code would still be about the same (or better than) the cost of bulking out the interpreter.

It's obviously not a huge deal (they're both okay approaches, which is why I approved this PR), but a branch at the end of the existing instruction feels easier to maintain than two new specialized forms.

@markshannon
Member Author

> Is the branchy form actually measurably slower?

Maybe not now, but it most likely will be when JIT compiled.

Is the branchy form otherwise better?
Not really. It may reduce the number of instructions, but it makes the resulting instruction harder to read and optimize.

@markshannon markshannon merged commit 0c90e75 into python:main Jul 10, 2023
@markshannon markshannon deleted the specialize-more-load-attr branch August 6, 2024 10:18