Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: implement vm message extraction #1027

Merged
merged 9 commits into from
Aug 10, 2022
Merged

feat: implement vm message extraction #1027

merged 9 commits into from
Aug 10, 2022

Conversation

frrist
Copy link
Member

@frrist frrist commented Aug 2, 2022

What

This PR addresses #978 by:

  1. Implementing a vm_messages model and schema for tracking internal messages (off-chain) sent by the VM during on-chain message execution
  2. Modifies lily to extract messages sent by the VM.
An example of the queries this allows:

Select a message sent to a multisig actor that proposes sending a method with the below params

select *
from parsed_messages
where cid = 'bafy2bzacec4mncfpdvu4rl4szr74qajty3wa6jxwallgfl4w2quf6hzfd5m42'
cid height from to value method params
bafy2bzacec4mncfpdvu4rl4szr74qajty3wa6jxwallgfl4w2quf6hzfd5m42 2020710 f1s5fmqz3aa4mvmoi235jarkj2f2xxuymrhjxochi f2ltn3tgp2a7uldxn2fgftzmodfcjtcu655p3vjmi 0 Propose {"To": "f01848682", "Value": {"Int": 0}, "Method": 16, "Params": "gUoAAXuoIZfGfo54"}

Select all VM messages (via source) resulting from the execution of this proposed method.

select *
from vm_messages
where source = 'bafy2bzacec4mncfpdvu4rl4szr74qajty3wa6jxwallgfl4w2quf6hzfd5m42'
height state_root cid source from to value method actor_code exit_code gas_used params returns
2020710 bafy2bzacedb66jfje3d6pcy64hnpdxuurkvl5p36n27axjg7dpk6ohoj3xoyu bafy2bzaceaixcm7hv6qtgqb7ccw7tayxcu64c4qcdcqtf332uk3vrwol6ozrw bafy2bzacec4mncfpdvu4rl4szr74qajty3wa6jxwallgfl4w2quf6hzfd5m42 f01833148 f01848682 0 16 bafk2bzacecgnynvd3tene3bvqoknuspit56canij5bpra6wl4mrq2mxxwriyu 0 0 {"AmountRequested": {"Int": 27357152872216039032}} "27357152872216039032"
2020710 bafy2bzacedb66jfje3d6pcy64hnpdxuurkvl5p36n27axjg7dpk6ohoj3xoyu bafy2bzacebhlcwggejv4mgb6hgvoqmxvaouzpysdtdwlgg4gnbdgqnzkwak72 bafy2bzacec4mncfpdvu4rl4szr74qajty3wa6jxwallgfl4w2quf6hzfd5m42 f01848682 f01833148 27357152872216039032 0 bafk2bzacebhldfjuy4o5v7amrhp5p2gzv2qo5275jut4adnbyp56fxkwy5fag 0 0 NULL NULL

We see that the proposed message was used to withdraw a balance of from miner actor by calling the miners WithdrawBalance method (via bafy2bzaceaixcm7hv6qtgqb7ccw7tayxcu64c4qcdcqtf332uk3vrwol6ozrw), which then resulted in the miner actor performing a send of the requested amount (via bafy2bzacebhlcwggejv4mgb6hgvoqmxvaouzpysdtdwlgg4gnbdgqnzkwak72)

@frrist frrist linked an issue Aug 3, 2022 that may be closed by this pull request
2 tasks
@codecov-commenter
Copy link

codecov-commenter commented Aug 3, 2022

Codecov Report

Merging #1027 (b6b937f) into master (ead40c9) will increase coverage by 0.0%.
The diff coverage is 100.0%.

@@          Coverage Diff           @@
##           master   #1027   +/-   ##
======================================
  Coverage    35.6%   35.6%           
======================================
  Files          44      44           
  Lines        2879    2881    +2     
======================================
+ Hits         1025    1027    +2     
  Misses       1750    1750           
  Partials      104     104           

@frrist frrist marked this pull request as ready for review August 3, 2022 02:10
@frrist frrist self-assigned this Aug 3, 2022
Comment on lines 22 to 28
ALTER TABLE ONLY {{ .SchemaName | default "public"}}.vm_messages ADD CONSTRAINT vm_messages_pkey PRIMARY KEY (height, state_root, cid, source);
CREATE INDEX vm_messages_height_idx ON {{ .SchemaName | default "public"}}.vm_messages USING BRIN (height);
CREATE INDEX vm_messages_cid_idx ON {{ .SchemaName | default "public"}}.vm_messages USING HASH (cid);
CREATE INDEX vm_messages_source_idx ON {{ .SchemaName | default "public"}}.vm_messages USING HASH (source);
CREATE INDEX vm_messages_from_idx ON {{ .SchemaName | default "public"}}.vm_messages USING HASH ("from");
CREATE INDEX vm_messages_to_idx ON {{ .SchemaName | default "public"}}.vm_messages USING HASH ("to");
CREATE INDEX vm_messages_actor_code_method_idx ON {{ .SchemaName | default "public"}}.vm_messages USING BTREE (actor_code, method);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kasteph @placer14 here are the index definitions we discussed in #978, please review.

@frrist frrist mentioned this pull request Aug 3, 2022
2 tasks
Copy link
Contributor

@placer14 placer14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sticky approve. Nice and clean! 🤝

@@ -178,8 +182,9 @@ var TableFieldComments = map[string]map[string]string{
"DealID": "Identifier for the deal.",
"EndEpoch": "The epoch at which this deal with end.",
"Height": "Epoch at which this deal proposal was added or changed.",
"IsString": "Related to FIP: https://github.com/filecoin-project/FIPs/blob/master/FIPS/fip-0027.md",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to read the FIP to get the gist.

Suggested change
"IsString": "Related to FIP: https://github.com/filecoin-project/FIPs/blob/master/FIPS/fip-0027.md",
"IsString": "Indicates if the value of Label is a string. Assume binary blob otherwise. Related to FIP: https://github.com/filecoin-project/FIPs/blob/master/FIPS/fip-0027.md",

@@ -197,6 +202,21 @@ var TableFieldComments = map[string]map[string]string{
ParsedMessage: {},
InternalMessage: {},
InternalParsedMessage: {},
VmMessage: {
"ActorCode": "ActorCode of To (receiver)",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"ActorCode": "ActorCode of To (receiver)",
"ActorCode": "ActorCode of To (receiver).",

// Cid of the message.
Cid string `pg:",pk,notnull"`
// On-chain message triggering the message.
Source string `pg:",pk,notnull"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need source to be included in the primary key. cid should have enough uniqueness within a height/stateroot.

Suggested change
Source string `pg:",pk,notnull"`
Source string `pg:",notnull"`

Copy link
Member Author

@frrist frrist Aug 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Source must remain a primary key as VM message CIDs can conflict at the same height/stateroot.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't cid the CID of the intermediate (internal) message? In other words, one source has many cids? If so, then the index should go on the column with the higher cardinality (between cid and source) which I expect to be the intermediate message CIDs and I understand that to be the cid column.

Copy link
Member Author

@frrist frrist Aug 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't cid the CID of the intermediate (internal) message, In other words, one source has many cids?

Yes correct.

Are you proposing we drop source as a primary key?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Just remove from the PK to reduce the size of that index. We can use the standalone hash index for source if we ever need to condition on that column. It should reduce the size of the PK index without any performance cost.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. After some discussion w @frrist, it seems that the cid is not unique and so source is required in the PK afterall. 😢

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would probably recommend the removal of the hash-based source index. I think both indices are redundant and if the PK index isn't getting a trim, let's drop the other.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on dropping the source hash index esp if it's part of the PK.

// Value attoFIL contained in message.
Value string `pg:"type:numeric,notnull"`
// Method called on To (receiver)
Method uint64 `pg:",use_zero"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not null? (I know this is inconsequential given we write our own migrations, but it's good context that doesn't hurt to include and keeps consistency w other fields usage of the same tag.)

Suggested change
Method uint64 `pg:",use_zero"`
Method uint64 `pg:",notnull,use_zero"`

// ActorCode of To (receiver)
ActorCode string `pg:",notnull"`
// ExitCode of message execution.
ExitCode int64 `pg:",use_zero"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ExitCode int64 `pg:",use_zero"`
ExitCode int64 `pg:",notnull,use_zero"`

// ExitCode of message execution.
ExitCode int64 `pg:",use_zero"`
// GasUsed by message.
GasUsed int64 `pg:",use_zero"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
GasUsed int64 `pg:",use_zero"`
GasUsed int64 `pg:",notnull,use_zero"`

schemas/v1/8_vm_messages.go Show resolved Hide resolved
returns jsonb
);
ALTER TABLE ONLY {{ .SchemaName | default "public"}}.vm_messages ADD CONSTRAINT vm_messages_pkey PRIMARY KEY (height, state_root, cid, source);
CREATE INDEX vm_messages_height_idx ON {{ .SchemaName | default "public"}}.vm_messages USING BRIN (height);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think BRIN indexing is dependent on write order, which is use-case specific. BTREE is the better generalized index here.

Suggested change
CREATE INDEX vm_messages_height_idx ON {{ .SchemaName | default "public"}}.vm_messages USING BRIN (height);
CREATE INDEX vm_messages_height_idx ON {{ .SchemaName | default "public"}}.vm_messages USING BTREE (height);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, will use btree instead.

COMMENT ON COLUMN {{ .SchemaName | default "public"}}.vm_messages.method IS 'The method number invoked on the recipient actor. Only unique to the actor the method is being invoked on. A method number of 0 is a plain token transfer - no method execution';
COMMENT ON COLUMN {{ .SchemaName | default "public"}}.vm_messages.actor_code IS 'The CID of the actor that received the message.';
COMMENT ON COLUMN {{ .SchemaName | default "public"}}.vm_messages.exit_code IS 'The exit code that was returned as a result of executing the message.';
COMMENT ON COLUMN {{ .SchemaName | default "public"}}.vm_messages.gas_used IS 'A measure of the amount of resources (or units of gas) consumed, in order to execute a message.';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious: Do you know what resources this would be other than gas?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reusing the description of gas here that we use elsewhere, e.g.: https://github.com/filecoin-project/lily/blob/v0.10.0/schemas/v1/base.go#L391

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspected it was copypasta. Just wondering if there is anything other than gas that it would have been referring to and couldn't think of.

Comment on lines 23 to 28
CREATE INDEX vm_messages_height_idx ON {{ .SchemaName | default "public"}}.vm_messages USING BTREE (height);
CREATE INDEX vm_messages_cid_idx ON {{ .SchemaName | default "public"}}.vm_messages USING HASH (cid);
CREATE INDEX vm_messages_source_idx ON {{ .SchemaName | default "public"}}.vm_messages USING HASH (source);
CREATE INDEX vm_messages_from_idx ON {{ .SchemaName | default "public"}}.vm_messages USING HASH ("from");
CREATE INDEX vm_messages_to_idx ON {{ .SchemaName | default "public"}}.vm_messages USING HASH ("to");
CREATE INDEX vm_messages_actor_code_method_idx ON {{ .SchemaName | default "public"}}.vm_messages USING BTREE (actor_code, method);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to do a write-optimized schema, I'd recommend to get rid of several of these indices since all these indices need to be updated whenever an insert or update happens.

I propose the following:

Suggested change
CREATE INDEX vm_messages_height_idx ON {{ .SchemaName | default "public"}}.vm_messages USING BTREE (height);
CREATE INDEX vm_messages_cid_idx ON {{ .SchemaName | default "public"}}.vm_messages USING HASH (cid);
CREATE INDEX vm_messages_source_idx ON {{ .SchemaName | default "public"}}.vm_messages USING HASH (source);
CREATE INDEX vm_messages_from_idx ON {{ .SchemaName | default "public"}}.vm_messages USING HASH ("from");
CREATE INDEX vm_messages_to_idx ON {{ .SchemaName | default "public"}}.vm_messages USING HASH ("to");
CREATE INDEX vm_messages_actor_code_method_idx ON {{ .SchemaName | default "public"}}.vm_messages USING BTREE (actor_code, method);
CREATE INDEX vm_messages_height_idx ON {{ .SchemaName | default "public"}}.vm_messages USING BTREE (height);
CREATE INDEX vm_messages_from_idx ON {{ .SchemaName | default "public"}}.vm_messages USING HASH ("from");
CREATE INDEX vm_messages_to_idx ON {{ .SchemaName | default "public"}}.vm_messages USING HASH ("to");

Regarding:

CREATE INDEX vm_messages_actor_code_method_idx ON {{ .SchemaName | default "public"}}.vm_messages USING BTREE (actor_code, method);

This only makes sense if we have queries that have a filter that's where actor_code = x and method = b or where actor_code = x . This wouldn't work well with filters: where method = b and actor_code = x or where method = b.

Therefore, I'd recommend based on Mike's comment here:

CREATE INDEX vm_messages_actor_code_idx ON {{ .SchemaName | default "public"}}.vm_messages USING BTREE (actor_code);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only makes sense if we have queries that have a filter that's where actor_code = x and method = b

This will be a common query since the method number depends on the actor code. i.e. method numbers only have meaning when compared with the actor they were called on.

@frrist
Copy link
Member Author

frrist commented Aug 9, 2022

@kasteph, I addressed your feedback, can you give this another look and a ✅ if it looks okay?

@kasteph
Copy link
Contributor

kasteph commented Aug 10, 2022

@frrist looks good, approved and thank you!

@frrist frrist merged commit fb9bc26 into master Aug 10, 2022
@frrist frrist deleted the frrist/vm-messages branch August 10, 2022 19:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Full internal message extraction
4 participants