Centered rib #257

nix-apollo · 2023-12-11T14:13:39Z (edited by danbraunai-apollo)

Centred rib

Description

Incorporates a centring matrix Y (aka Gamma) into the calculation of the rib rotation.
Isolates the bias direction when center = true, returning it in the 0th position.
Reorganizes the order of the residual stream so the last position is always the bias position.
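
As a rough illustration of the centring step (a sketch, not this PR's implementation): assuming the bias position holds a constant 1, a matrix equal to the identity except for its bias row can subtract the dataset mean from every other position. The names `centring_matrix` and `apply_row` are hypothetical, and the code is plain Python to stay dependency-free.

```python
def centring_matrix(means, bias_pos=-1, inverse=False):
    """Identity matrix whose bias row holds -means (or +means for the inverse).

    Hypothetical helper for illustration only; assumes activations are row
    vectors with a constant 1 at bias_pos.
    """
    n = len(means)
    bias = bias_pos % n  # normalise negative indices such as -1
    sign = 1.0 if inverse else -1.0
    mat = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for j in range(n):
        if j != bias:
            mat[bias][j] = sign * means[j]
    return mat


def apply_row(x, mat):
    """Compute the row-vector product x @ mat."""
    return [sum(x[i] * mat[i][j] for i in range(len(x))) for j in range(len(mat[0]))]


acts = [[1.0, 4.0, 1.0], [3.0, 8.0, 1.0]]  # last position is the constant bias
means = [sum(col) / len(acts) for col in zip(*acts)]
centred = [apply_row(x, centring_matrix(means)) for x in acts]
restored = [apply_row(x, centring_matrix(means, inverse=True)) for x in centred]
```

Here the non-bias columns of `centred` have zero mean, the bias column is untouched, and applying the inverse matrix recovers `acts`.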

Related Issue

Closes #248

Motivation and Context

This is a first version of centered rib. It seems likely that we might want to handle lambda and/or edge calculation differently in the future to make the calculation more principled. For instance, the baseline for IG is still the 0 point, instead of the mean activation.
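
To make the baseline remark concrete, here is a toy sketch of integrated gradients on f(x) = sum(x_i ** 2), comparing a zero baseline with a mean-activation baseline. It is not the edge calculation used in rib; `integrated_gradients`, `grad_sum_of_squares`, and the numbers are made up for illustration.

```python
def integrated_gradients(grad_fn, x, baseline, n_steps=50):
    """Midpoint-rule approximation of integrated gradients.

    Integrates the gradient along the straight path from baseline to x.
    """
    attributions = [0.0] * len(x)
    for step in range(n_steps):
        alpha = (step + 0.5) / n_steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        grads = grad_fn(point)
        for i, g in enumerate(grads):
            attributions[i] += g * (x[i] - baseline[i]) / n_steps
    return attributions


def grad_sum_of_squares(point):
    """Gradient of the toy function f(x) = sum(x_i ** 2)."""
    return [2.0 * p for p in point]


x = [3.0, 1.0]
mean_activation = [2.0, 1.0]  # made-up dataset mean
ig_zero = integrated_gradients(grad_sum_of_squares, x, [0.0, 0.0])
ig_mean = integrated_gradients(grad_sum_of_squares, x, mean_activation)
```

With the zero baseline the attributions sum to f(x) - f(0); with the mean baseline they sum to f(x) - f(mean), so a position sitting exactly at its mean receives no attribution.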

How Has This Been Tested?

I have a test that checks invariants for the output of the centered rib build, including that there is a single constant direction pointing in the direction we expect, and that activations in all other rib directions are centered.
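
A minimal sketch of the activation-level part of such an invariant check, assuming the activations have already been projected into the rib basis with the constant direction at index 0. `check_centred_invariants` is a hypothetical name, not the test in this PR, and it does not cover the check that the constant direction points the expected way.

```python
def check_centred_invariants(rib_acts, atol=1e-6):
    """rib_acts: list of samples, each a list of activations per rib direction.

    Hypothetical check: direction 0 is constant across samples and every
    other direction has zero mean over the dataset.
    """
    n_samples = len(rib_acts)
    columns = list(zip(*rib_acts))
    const_dir = columns[0]
    is_constant = all(abs(v - const_dir[0]) <= atol for v in const_dir)
    others_centred = all(abs(sum(col) / n_samples) <= atol for col in columns[1:])
    return is_constant and others_centred


good = [[2.0, -1.0, 0.5], [2.0, 1.0, -0.5]]  # constant dir 0, others mean zero
bad = [[2.0, 0.0, 0.5], [2.0, 1.0, -0.5]]  # direction 1 has mean 0.5
```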

This code was also used for various analyses in the OP report, where it seemed to do reasonable things.

Does this PR introduce a breaking change?

Residual stream reorder may break some analysis code. No interface changes.

danbraunai-apollo

Haven't looked deeply yet, but flagging that test_centered_rib_modadd() is failing badly on this line:

        # Check 2: no other rib dir has non-zero component at bias positions
        assert_is_zeros(C_inv[1:][:, bias_positions], atol=atol, m_name=m_name)

where the tensor has many values that are on the order of 1e-2.

danbraunai-apollo

I accidentally pressed submit review too early, so combine this review with the subsequent one. Sorry.

Misc:

EDIT: Oh, I see you have a todo for this as a comment, all good. Maybe put todos in the PR description so it's easier to find. Still an issue with the centered_rib_test causing tests to fail.
I don't like, or understand, why all of the test__build_graph tests went from atol=0 to a larger atol (and mnist went from 1e-5 to 1e-4) when none of them use centering. Hopefully a solution to the above issue will fix this one too. Any ideas what this could be, or where the computation differs?
I think shift_matrix could do with unit tests. It might also be worth elaborating on the example in the docstring with an x for which it is used to shift the activations by the mean. I find that function very confusing, even after previously spending time understanding it.
I wouldn't mind a find + replace on centered -> centred. My bad for not being consistent here.

danbraunai-apollo

Looks good. Some requested changes/confusions.

nix-apollo · 2024-01-15T11:15:00Z

We only ever need a bias position for two things:

making a shift matrix
identifying the constant rib dir (to move it to the 0th position in the ordering)
Both of these only need a single constant position. A good simplification would be to have get_dataset_means return a single int for the bias pos instead of a tensor. Even better, this could be a separate helper function that only needs the means as input.

Possibly I'll postpone this and not improve the current situation in this PR.

nix-apollo · 2024-01-17T11:13:51Z

Re: test tolerance. This was because I had accidentally made a test stricter by going from rtol=1e-5 (pytorch's default) to rtol=0 (pytest's default). Not because computation got less precise. I've reverted the change for consistency.
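
A small sketch of why dropping the relative tolerance makes a comparison stricter, using the elementwise rule documented for numpy.isclose and torch.isclose. The helper `is_close` and the numbers are illustrative only.

```python
def is_close(actual, expected, rtol, atol):
    """Rule used by numpy.isclose / torch.isclose: |a - e| <= atol + rtol * |e|."""
    return abs(actual - expected) <= atol + rtol * abs(expected)


actual, expected = 1000.001, 1000.0  # differ by roughly 1e-3
loose = is_close(actual, expected, rtol=1e-5, atol=1e-5)  # budget is about 1e-2
strict = is_close(actual, expected, rtol=0.0, atol=1e-5)  # budget is exactly 1e-5
```

For large values the `rtol * |e|` term dominates the budget, so the same pair passes with rtol=1e-5 and fails with rtol=0 even though atol is unchanged.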

danbraunai-apollo

Approved. Looks great, and much cleaner after swapping residual stream to be at the bottom. I really like your updates to calculate_interaction_rotations. That function is actually (on the way to becoming) understandable now.

I made a few comments. Have a quick look, and update yourself or shout out if you see an issue with them. Then you can merge.

nix-apollo added 30 commits December 4, 2023 11:59

add rib acts hook

e56c81f

add analysis utils

10ceea0

fix mypy errors

b807413

add pca in interaction_rotation

53b4d40

cherrypick of 1d12719

9eb5b3f

mean to sum

0f9ae5b

allow pca in mlp rib build

a29faf1

fix pca

adb9866

fix output pca

be34b4e

fix gram centring

681e1bc

Merge remote-tracking branch 'origin/main' into feature/get-rib-acts

2d3fc6b

rib acts fixes

58ef6a7

add test for get_rib_acts

80141f2

add helper for append to hooked list

bf1002a

clean test

69ad919

device error

e517af9

fix other device error

bdc15d8

move model and dataset loading to utility

935aaf6

mend

d78f0dd

adjust atols

a32f833

add runslow when debugging tests

81d5d67

only cache the activations we need

bdd918f

docsting

fc11243

fix test device

1ca1ef9

Merge branch 'feature/get-rib-acts' into feature/pca

2206021

cleanup hooks

29faab2

pca test

01f11c7

track all bias positions

af52024

fixes for pca

65e914e

add mlp test

06541e7

nix-apollo commented Jan 11, 2024

rib/interaction_algos.py (outdated, resolved)

nix-apollo commented Jan 11, 2024

rib/linalg.py (outdated, resolved)

nix-apollo commented Jan 11, 2024

rib/interaction_algos.py (resolved)

nix-apollo changed the title ~~[wip] centred rib~~ Centered rib Jan 11, 2024

danbraunai-apollo reviewed Jan 12, 2024

danbraunai-apollo requested changes Jan 12, 2024

tests/test_build_graph.py (outdated, resolved)

tests/test_build_graph.py (outdated, resolved)

nix-apollo added 2 commits January 15, 2024 10:10

Merge remote-tracking branch 'origin/main' into feature/centred-rib

2a2ee0f

test cleanup

d5fbc4f

nix-apollo and others added 11 commits January 15, 2024 17:15

no longer compute bias pos in get_means

ab0a130

atol consistency

5d35558

move residual position

b3d1cfd

always use -1 as bias pos

f6df0b1

Fix dtype mismatch in test_collect_dataset_means_pythia

ede6eb7

Add types and simple asserts

e988f81

cleanup interaction algo code

b4fb69e

Merge branch 'feature/centred-rib' of https://github.com/ApolloResear…

08e6a70

…ch/rib into feature/centred-rib

add assert about centered rib with orthogonal ablations

8ae7131

spelling fix

9c4f6b4

add assert

e5a0311

adjust tests

4cc7601

nix-apollo requested a review from danbraunai-apollo January 17, 2024 11:25

danbraunai-apollo added 4 commits January 17, 2024 11:43

Fix SequentialTransformer images

385ef38

Remove outdated docstring

57721a0

Add unittest for centering matrix

1828a34

Minor fixes and doc improvements

f1ae656

danbraunai-apollo approved these changes Jan 17, 2024

rib/data_accumulator.py (outdated, resolved)

tests/test_build_graph.py (resolved)

nix-apollo merged commit aaa53ea into main Jan 17, 2024
1 check passed
