GH-100288: Specialize LOAD_ATTR for simple class attributes. #105990

markshannon · 2023-06-22T15:46:47Z

This PR specializes for things like obj.x where:

class C:
    x = 1
obj = C()
obj.x
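
The "simple class attribute" case above can be approximated from pure Python. The sketch below is only an illustration: `is_simple_class_attr` is a hypothetical helper, not part of CPython, and the real specializer in Python/specialize.c checks further conditions (type versions, dict layout) that are not modelled here.

```python
def is_simple_class_attr(obj, name):
    """Return True if obj.<name> resolves to a plain, non-descriptor class attribute.

    Hypothetical helper for illustration; it approximates the attribute
    lookup rules and is not how the specializer is implemented.
    """
    # An instance attribute of the same name shadows a non-descriptor class attribute.
    if name in getattr(obj, "__dict__", {}):
        return False
    # Walk the MRO the way normal attribute lookup on the type does.
    for klass in type(obj).__mro__:
        if name in vars(klass):
            descr = vars(klass)[name]
            # A descriptor is any object whose type defines __get__.
            return not hasattr(type(descr), "__get__")
    return False


class C:
    x = 1

    def method(self):
        return self.x


obj = C()
```

Under these assumptions, `is_simple_class_attr(obj, "x")` is true, while `"method"` is rejected because functions are descriptors.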

Stats
The miss rate for LOAD_ATTR_NONDESCRIPTOR_WITH_VALUES is quite poor at 33%.
(Ignore the stats for LOAD_ATTR_NONDESCRIPTOR_LAZY_DICT; I've removed it.)

Issue: Finish up LOAD_ATTR specialisation #100288

brandtbucher

A couple notes (and one bug, I think).

Also, your stats seem to indicate that both the success and hit rates for LOAD_ATTR are lower with this change. Are we sure it's worth doing?

If so, this seems like quite a bit of extra code and work just to save a branch on the oparg's low bit at the end of the instruction, in my opinion. I would be surprised if the separate instructions were really more performant.

brandtbucher · 2023-06-22T21:45:28Z

Python/specialize.c

+            if ((instr->op.arg & 1) == 0) {
+                if (specialize_attr_loadclassattr(owner, instr, name, descr, kind, false)) {
+                    goto success;
+                }
+            }
+            else {
+                SPECIALIZATION_FAIL(LOAD_ATTR, SPEC_FAIL_ATTR_CLASS_ATTR_SIMPLE);
+            }


Nit: I find this flow a bit easier to follow:

Suggested change

-            if ((instr->op.arg & 1) == 0) {
-                if (specialize_attr_loadclassattr(owner, instr, name, descr, kind, false)) {
-                    goto success;
-                }
-            }
-            else {
-                SPECIALIZATION_FAIL(LOAD_ATTR, SPEC_FAIL_ATTR_CLASS_ATTR_SIMPLE);
-            }
+            if (instr->op.arg & 1) {
+                SPECIALIZATION_FAIL(LOAD_ATTR, SPEC_FAIL_ATTR_CLASS_ATTR_SIMPLE);
+            }
+            else if (specialize_attr_loadclassattr(owner, instr, name, descr, kind, false)) {
+                goto success;
+            }

To me, it seems more natural to have the success first, then the failure. A matter of taste, I suppose.
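
The two control-flow shapes in this thread differ only in ordering, which can be checked with a small model. This is a sketch in Python rather than the C from Python/specialize.c: `original_flow`, `suggested_flow` and the returned tuples are hypothetical stand-ins, and `SPECIALIZATION_FAIL` is modelled as merely recording a failure kind before falling through.

```python
from itertools import product


def original_flow(low_bit_set, try_specialize):
    """Success branch first, failure recorded in the else branch."""
    recorded = []
    if not low_bit_set:
        if try_specialize():
            return "success", recorded
    else:
        recorded.append("class attr simple")
    return "fall through", recorded


def suggested_flow(low_bit_set, try_specialize):
    """Failure branch first, specialization attempted in the else-if."""
    recorded = []
    if low_bit_set:
        recorded.append("class attr simple")
    elif try_specialize():
        return "success", recorded
    return "fall through", recorded


# Evaluate both shapes for every combination of (oparg low bit, specializer result).
results = {
    (bit, ok): (original_flow(bit, lambda: ok), suggested_flow(bit, lambda: ok))
    for bit, ok in product([False, True], repeat=2)
}
```

In this model both shapes produce the same outcome for all four input combinations, which is consistent with the choice being a matter of taste.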


markshannon · 2023-06-26T13:20:36Z

The stats for LOAD_ATTR change from

Kind	Count	Ratio
specialization.deferred	1149226050	14.2%
specialization.deopt	7438533	0.1%
hit	6549029560	80.9%
miss	394372352	4.9%

to

Kind	Count	Ratio
specialization.deferred	997276207	11.7%
specialization.deopt	8561434	0.1%
hit	7089151162	83.0%
miss	453922508	5.3%

which is a modest improvement, but the failure stats are now

Failure kind	Count	Ratio
has managed dict	1,088,824	78.9%
metaclass attribute	113,508	8.2%
method	54,145	3.9%
shadowed	36,806	2.7%
not managed dict	35,316	2.6%
mutable class	19,767	1.4%
overridden	7,724	0.6%
class method obj	6,782	0.5%
class attr descriptor	6,640	0.5%
non object slot	6,120	0.4%
not in keys	1,680	0.1%
non overriding descriptor	1,072	0.1%
module attr not found	800	0.1%
class attr simple	710	0.1%
builtin class method	660	0.0%

This means that fixing the materialization of __dict__ could raise the specialization success rate (ignoring misses) to about 97%, which is quite good.
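
The ratios in the two LOAD_ATTR tables above can be re-derived from the raw counts. The sketch below assumes that each ratio is taken relative to deferred + hit + miss, with deopt left out of the denominator. That is an inference from the numbers, not something stated in this thread, and the `ratios` helper is hypothetical.

```python
def ratios(counts):
    """Return each count as a percentage, rounded to one decimal place."""
    # Assumption: the denominator is deferred + hit + miss (deopt excluded).
    total = counts["specialization.deferred"] + counts["hit"] + counts["miss"]
    return {kind: round(100 * n / total, 1) for kind, n in counts.items()}


# Counts copied from the "before" table.
before = {
    "specialization.deferred": 1149226050,
    "specialization.deopt": 7438533,
    "hit": 6549029560,
    "miss": 394372352,
}

# Counts copied from the "after" table.
after = {
    "specialization.deferred": 997276207,
    "specialization.deopt": 8561434,
    "hit": 7089151162,
    "miss": 453922508,
}

before_pct = ratios(before)
after_pct = ratios(after)
```

With that denominator the computed values match the tables: the hit rate goes from 80.9% to 83.0% and the deferred rate from 14.2% to 11.7%.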

markshannon · 2023-07-04T16:15:23Z

> If so, this seems like quite a bit of extra code and work just to save a branch on the oparg's low bit at the end of the instruction.

The oparg & 1 case is inserted for calls. It seems unlikely that a non-descriptor is being called much, as both Python functions and method-descriptors are descriptors. The stats support this. "class attr simple" is a tiny 0.1% of specialization failures with this change.
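
The claim that commonly called class attributes are descriptors can be checked from Python. This is a small sketch: `is_descriptor`, `py_function` and `CallableNonDescriptor` are names made up for the example, and the check only looks for `__get__` on the attribute's type.

```python
import types


def is_descriptor(value):
    """A descriptor is an object whose type defines __get__."""
    return hasattr(type(value), "__get__")


def py_function(self):
    return self


class CallableNonDescriptor:
    """Callable, but its type has no __get__, so it is not a descriptor."""

    def __call__(self):
        return "called"


class C:
    method = py_function              # Python function
    helper = CallableNonDescriptor()  # callable non-descriptor (the rare case)
    x = 1                             # simple class attribute


checks = {
    "python function": is_descriptor(C.__dict__["method"]),
    "method-descriptor": is_descriptor(str.upper),
    "callable instance": is_descriptor(C.__dict__["helper"]),
    "int": is_descriptor(C.__dict__["x"]),
}
```

Only an attribute like `helper` is both callable and a non-descriptor, which is the combination that falls into the "class attr simple" failure bucket when it is called.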

brandtbucher · 2023-07-06T21:57:55Z

> The oparg & 1 case is inserted for calls. It seems unlikely that a non-descriptor is being called much, as both Python functions and method-descriptors are descriptors. The stats support this. "class attr simple" is a tiny 0.1% of specialization failures with this change.

I'm confused. Wouldn't just adding a branch on (oparg & 1) at the end of the existing LOAD_ATTR_METHOD_WITH_VALUES and LOAD_ATTR_NONDESCRIPTOR_NO_DICT instructions have the exact same effect as this PR, but without adding two whole new instructions that are almost identical to the existing ones?

markshannon · 2023-07-07T09:07:09Z

I see what you mean now. I thought you were talking about the specialization, not the resulting instruction.
The specialization paths are distinct, but the resulting instruction is very similar.

I'll try that.

markshannon · 2023-07-07T09:44:05Z

The generated code is noticeably worse for the conditional case.
Comparing the code for the "action" part, the branchless form looks like this:

    Py_DECREF(self);
    res = Py_NewRef(descr);
    stack_pointer[-1] = res;
    next_instr += 9;

but with the oparg test it looks like this:

    res2 = Py_NewRef(descr);
    if (oparg & 1) {
        res = self;
    }
    else {
        Py_DECREF(self);
    }
    STACK_GROW(((oparg & 1) ? 1 : 0));
    if (oparg & 1) { stack_pointer[-(((oparg & 1) ? 1 : 0))] = res; }
    stack_pointer[-(1 + ((oparg & 1) ? 1 : 0))] = res2;
    next_instr += 9;

So, I think the additional instruction is worth it.
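
The stack effect of the two forms can be modelled with a plain list. This is a simplified sketch, not the generated interpreter code: reference counting and the `next_instr` update are omitted, and the function names `branchless` and `conditional` are hypothetical.

```python
def branchless(stack, descr):
    """Model of the dedicated instruction: the attribute replaces the owner."""
    # The owner in the top slot is dropped (Py_DECREF in the C code)
    # and descr is stored in its place.
    stack[-1] = descr
    return stack


def conditional(stack, descr, oparg):
    """Model of the single instruction that branches on the low bit of oparg."""
    owner = stack.pop()
    stack.append(descr)      # res2 in the generated code
    if oparg & 1:
        stack.append(owner)  # res: the owner stays on top for the following call
    return stack
```

With the low bit clear, `conditional` leaves the same stack as `branchless`. With the low bit set it leaves the attribute followed by the owner, which is the extra case the conditional form has to handle on every execution.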

markshannon · 2023-07-07T09:49:36Z

Note, the generated code actually has this sequence in it:

    STACK_GROW((0 ? 1 : 0));
    if (0) { stack_pointer[-(1 + (0 ? 1 : 0))] = res2; }

but even the worst C compiler will eliminate that code.

brandtbucher · 2023-07-07T17:41:01Z

> So, I think the additional instruction is worth it.

Is the branchy form actually measurably slower? I would imagine that the C compiler would turn the five (oparg & 1) branches into just one or two, and that the resulting code would still be about the same (or better than) the cost of bulking out the interpreter.

It's obviously not a huge deal (they're both okay approaches, which is why I approved this PR), but a branch at the end of the existing instruction feels easier to maintain than two new specialized forms.

markshannon · 2023-07-10T10:40:08Z

> Is the branchy form actually measurably slower?

Maybe not now, but it most likely will be when JIT compiled.

Is the branchy form otherwise better?
Not really. It may reduce the number of instructions, but it makes the resulting instruction harder to read and optimize.

markshannon added 3 commits June 19, 2023 02:22

Add three more specializations of LOAD_ATTR.

ed977a1

Remove LOAD_ATTR_NONDESCRIPTOR_LAZY_DICT

192d074

Address review comments

3d23b98

bedevere-bot mentioned this pull request Jun 22, 2023

Finish up LOAD_ATTR specialisation #100288


brandtbucher reviewed Jun 22, 2023


markshannon added 2 commits July 4, 2023 04:03

Merge branch 'main' into specialize-more-load-attr

d258c0f

Add news

1eec620

markshannon marked this pull request as ready for review July 4, 2023 16:26

bedevere-bot added the awaiting core review label Jul 4, 2023

markshannon added 3 commits July 5, 2023 17:04

Give the interpreter generator a helping hand.

5861e52

Merge branch 'main' into specialize-more-load-attr

16a2718

Merge branch 'main' into specialize-more-load-attr

8853284

brandtbucher approved these changes Jul 6, 2023


bedevere-bot added awaiting merge and removed awaiting core review labels Jul 6, 2023

markshannon merged commit 0c90e75 into python:main Jul 10, 2023

bedevere-bot removed the awaiting merge label Jul 10, 2023

kumaraditya303 added a commit that referenced this pull request Jul 10, 2023

GH-100288: regen cases after #105990 (#106589)

3f9bc86

markshannon deleted the specialize-more-load-attr branch August 6, 2024 10:18
