
feat: add parent cache verification and a setting to enable it #1265

Merged · 23 commits · Sep 18, 2020

Conversation

cryptonemo (Collaborator):

When the cache files are generated, they are hashed and that hash is
persisted. When the cache file is opened, it is optionally verified
for integrity via the 'verify_cache' setting (disabled by default).

When this code runs, if the checksum file does not exist, we
re-generate the parent cache file in order to write out that checksum
file, since what's already on disk may be corrupted.

Resolves #1264
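
A rough sketch of that flow, assuming the sha2, memmap, anyhow, and hex crates; the function name, the expected-checksum argument, and the messages are illustrative, not this PR's actual code:

use std::fs::File;
use std::path::Path;

use anyhow::{ensure, Context, Result};
use sha2::{Digest, Sha256};

// Hash the cache file through a read-only mmap (as the diff below does) and
// compare against the digest persisted when the cache was generated.
fn verify_cache_file(path: &Path, expected_hex: &str) -> Result<()> {
    let file = File::open(path)
        .with_context(|| format!("could not open path={}", path.display()))?;
    let data = unsafe {
        memmap::MmapOptions::new()
            .map(&file)
            .with_context(|| format!("could not mmap path={}", path.display()))?
    };
    let mut hasher = Sha256::new();
    hasher.update(&data);
    drop(data);
    let digest_hex = hex::encode(hasher.finalize());
    ensure!(digest_hex == expected_hex, "parent cache checksum mismatch");
    Ok(())
}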

@@ -33,7 +35,7 @@ pub struct ParentCache {
#[derive(Debug)]
struct CacheData {
/// This is a large list of fixed (parent) sized arrays.
data: memmap::Mmap,
pub data: memmap::Mmap,
Contributor:

no need to make this pub, I believe

.with_context(|| format!("could not mmap path={}", path.display()))?
};
hasher.update(&data);
drop(data);
Contributor:

it is likely more efficient to use std::io::ReadBuf with a regular file open, than mmap for just hashing the data
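
For reference, a buffered-read variant along those lines (a sketch only: the suggestion presumably means buffered reads via std::io::BufReader, and the 1 MiB buffer size is a guess):

use std::fs::File;
use std::io::{BufReader, Read};
use std::path::Path;

use anyhow::Result;
use sha2::{Digest, Sha256};

// Hash a file with plain buffered reads instead of mmap.
fn hash_file_buffered(path: &Path) -> Result<[u8; 32]> {
    let mut reader = BufReader::with_capacity(1 << 20, File::open(path)?);
    let mut hasher = Sha256::new();
    let mut buf = vec![0u8; 1 << 20];
    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            break; // EOF
        }
        hasher.update(&buf[..n]);
    }
    Ok(hasher.finalize().into())
}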

cryptonemo (Collaborator Author):

Out of curiosity, do you know of any fast rust hashers/crates made specifically for large data/files that might fit this use? I was wondering if something like that may work out all around better, but figured I'd stick with what we already have support for in the first cut.

Contributor:

Given the systems we run on, sha2 is likely good enough; the other fast alternative I could suggest is https://github.com/BLAKE3-team/BLAKE3

cryptonemo (Collaborator Author):

Ok thanks, I tried blake2s since it was already supported, but switched back to sha2 ;-) I'll give the ReadBuf approach a go.

cryptonemo (Collaborator Author):

Was hoping for more of a speed-up, but if we loop over full buffers, it's slightly faster. I'll update shortly.

cryptonemo (Collaborator Author):

Very odd, on second test it appears much slower. Hrm ...

cryptonemo (Collaborator Author):

Dig, could you take a look at the commented-out attempt at doing this? I tested several times and it appears much slower, so I'm probably doing something wrong.

Collaborator:

Am I right to conclude dig's suggestion didn't turn out to be faster? (It seems not to have been taken in the current version.)

cryptonemo (Collaborator Author):

As added, it was far slower in my testing, so I reverted. I suspect dig may know a faster method or way of applying it, so it can be improved over time as needed.

porcuquine (Collaborator) left a comment:

Although this validates the cache file against a checksum created locally, it doesn't check for absolute correctness. It therefore does not in any way address the situation in which bad hardware (especially RAM) leads to a wrongly-generated local cache. This is the primary use case for this integrity check. We need to calculate and distribute canonical checksums and use those for the validation.

One possible solution is to extend parameters.json to also handle this more general case. Another would be to create a new manifest file for the caches.
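
For illustration, one hypothetical shape such a manifest could take, mirroring how parameters.json entries are deserialized; the type and field names are assumptions, not an agreed format:

use std::collections::BTreeMap;

use serde::Deserialize;

// Hypothetical manifest mapping cache file names to canonical digests,
// analogous to parameters.json.
#[derive(Debug, Deserialize)]
struct CacheManifestEntry {
    // Hex digest of the cache file, as distributed with a release.
    digest: String,
    // Sector size this parent cache belongs to, in bytes.
    sector_size: u64,
}

type CacheManifest = BTreeMap<String, CacheManifestEntry>;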

@cryptonemo (Collaborator Author):

> Although this validates the cache file against a checksum created locally, it doesn't check for absolute correctness. It therefore does not in any way address the situation in which bad hardware (especially RAM) leads to a wrongly-generated local cache. This is the primary use case for this integrity check. We need to calculate and distribute canonical checksums and use those for the validation.
>
> One possible solution is to extend parameters.json to also handle this more general case. Another would be to create a new manifest file for the caches.

Sure, I'm open to all of that; however, since re-generating the parent cache 'fixes' the problem in all reported cases, this is meant to address that aspect.

@porcuquine (Collaborator):

I might be misunderstanding. Consider the case in which the user has generated a bad cache locally and created a checksum memorializing that bad cache as 'correct'. When the cache is checked for correctness later, it will be consistent with its original value and therefore won't be regenerated, right? If so, I don't see how this will fix the problem. The bad cache will still be used, and the seal results will be wrong.

@cryptonemo (Collaborator Author):

> I might be misunderstanding. Consider the case in which the user has generated a bad cache locally and created a checksum memorializing that bad cache as 'correct'. When the cache is checked for correctness later, it will be consistent with its original value and therefore won't be regenerated, right? If so, I don't see how this will fix the problem. The bad cache will still be used, and the seal results will be wrong.

There are 2 separate cases that I'm aware of, one of which can be solved with a general software solution.

  1. Parameter files and/or parent cache files become corrupt on disk. This solution addresses that issue because we can fix this. Lotus checks parameters and can report when they do not match. This PR does a similar thing for the parent cache and further regenerates them in that case*.

  2. The user has bad RAM and cannot generate a proper cache. In this case, there is no general software solution. We could add the additional consistency check to check the consistency check being added though, and simply report a mis-match. But even if we downloaded known good cache files and checked those are consistent, bad RAM will continue to generate bad proofs as they are loaded. Same for the parameter files, which may exist perfectly on disk. The solution here is to test the RAM and/or replace it. In this case, re-generating parameters may also cause errors, but the problem (and solution) is the same, regardless.

I believe that #1 is primarily the one we're interested in because all reports I'm aware of are solved by either 1) Re-generating the parent cache (by deleting it), or 2) testing RAM and physically replacing the hardware when needed (or adjust speed settings, etc).

@porcuquine (Collaborator):

> > I might be misunderstanding. Consider the case in which the user has generated a bad cache locally and created a checksum memorializing that bad cache as 'correct'. When the cache is checked for correctness later, it will be consistent with its original value and therefore won't be regenerated, right? If so, I don't see how this will fix the problem. The bad cache will still be used, and the seal results will be wrong.
>
> There are 2 separate cases that I'm aware of, one of which can be solved with a general software solution.
>
> 1. Parameter files and/or parent cache files become corrupt on disk. This solution addresses that issue because we can fix this. Lotus checks parameters and can report when they do not match. This PR does a similar thing for the parent cache and further regenerates them in that case*.

I am not aware of disk corruption being a significant risk — although it's good if we protect against it.

> 2. The user has bad RAM and cannot generate a proper cache. In this case, there is no general software solution. We could add the additional consistency check to check the consistency check being added though, and simply report a mis-match. But even if we downloaded known good cache files and checked those are consistent, bad RAM will continue to generate bad proofs as they are loaded. Same for the parameter files, which may exist perfectly on disk. The solution here is to test the RAM and/or replace it. In this case, re-generating parameters may also cause errors, but the problem (and solution) is the same, regardless.

The problem is that if a user has previously had bad hardware and generated a bad cache, then even if they replace the hardware and are now capable of generating a correct cache (or would be capable of generating good proofs with downloaded cache), they won't detect that they are using a bad cache. That was what happened in the report here: #1264 (comment) — which was the primary motivation for #1264. That is the motivation for the acceptance criteria: 'A bad graph cache doesn't lead to failure if current hardware produces a correct graph. Instead, the bad cache is replaced by a newly-generated good one.'

> I believe that #1 is primarily the one we're interested in because all reports I'm aware of are solved by either 1) Re-generating the parent cache (by deleting it), or 2) testing RAM and physically replacing the hardware when needed (or adjust speed settings, etc).

The problem is that even if bad hardware is detected and replaced, if an old bad cache is still used, bad proofs will still be generated.

At the very least, this condition needs to be detected. If bad hardware is the root cause and has not been replaced, the best we can do is fail — but we certainly should. If the underlying problem has been fixed, we should repair the cache and continue.

@cryptonemo (Collaborator Author):

> I am not aware of disk corruption being a significant risk — although it's good if we protect against it.

We can agree to disagree: the reports that have been solved by re-downloading parameters or re-generating the parent cache are likely cases of disk corruption. With bad RAM, those errors would certainly have persisted beyond that.

> The problem is that if a user has previously had bad hardware and generated a bad cache, then even if they replace the hardware and are now capable of generating a correct cache (or would be capable of generating good proofs with downloaded cache), they won't detect that they are using a bad cache.

If a user replaced known bad hardware, the current recommendation here is still to regenerate the cache by deleting it, or to enable the verification afterward, which would catch this error case.

As I said, I'm open to some "known good hash" list somewhere, but it doesn't fix anything by itself: it reports a mismatch and then triggers the actual solution. Current best practice for a known bad disk is to replace the disk and re-download or re-generate parameters. I don't see how replacing RAM is any different, as probably a lot more than the parent cache is already corrupt, including sealed sectors, etc.

In any case, an extra hash comparison is easy and can help trigger that (i.e. the check for the consistency check), but maintaining that list is potentially cumbersome.

@porcuquine (Collaborator):

I'm pausing the ongoing discussion, since we have largely resolved these issues offline.

Separately, cache keys need to include the porep_id, since if and when it changes, graph seeds will too. This is by design, and caches of previous graphs must not be used after such a change.

@cryptonemo (Collaborator Author):

> I'm pausing the ongoing discussion, since we have largely resolved these issues offline.
>
> Separately, cache keys need to include the porep_id, since if and when it changes, graph seeds will too. This is by design, and caches of previous graphs must not be used after such a change.

Fortunately, the porep_id appears to already be included in the current cache path/id, which will be automatically invalidated.

@porcuquine (Collaborator):

> Fortunately, the porep_id appears to already be included in the current cache path/id, which will be automatically invalidated.

Ah right, good. Thanks for checking.

porcuquine (Collaborator) left a comment:

Looks good, assuming fully tested.


fn gen_graph_cache<Tree: 'static + MerkleTreeTrait>(
sector_size: usize,
challenge_count: usize,
Collaborator:

This should not affect the graph.

fn gen_graph_cache<Tree: 'static + MerkleTreeTrait>(
sector_size: usize,
challenge_count: usize,
layers: usize,
Collaborator:

This should not affect the graph.

let nodes = (sector_size / 32) as usize;
let drg_degree = filecoin_proofs::constants::DRG_DEGREE;
let expansion_degree = filecoin_proofs::constants::EXP_DEGREE;
let layer_challenges = LayerChallenges::new(layers, challenge_count);
Collaborator:

We should consider using a default value for layer_challenges if we need it only to instantiate SetupParams. If the value affects the generated graph, something unexpected is happening. If it doesn't, then it's misleading and unnecessary to pass it to gen_graph_cache. (However, if it's not awkward to do so, maybe it's not a big deal. In that case, I would at least add a comment to the effect that these values don't affect the graph.)

cryptonemo (Collaborator Author):

Added comment and replaced with fixed values.
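
Roughly along these lines (a sketch: the constant values and the import path are placeholders, not necessarily what this PR landed):

// Import path assumed from this PR's tree.
use storage_proofs::porep::stacked::LayerChallenges;

// NOTE: layers and challenge_count do not affect the generated parent graph;
// they are only needed to instantiate SetupParams, so fixed values suffice.
const FIXED_LAYERS: usize = 11;
const FIXED_CHALLENGE_COUNT: usize = 18;

fn default_layer_challenges() -> LayerChallenges {
    LayerChallenges::new(FIXED_LAYERS, FIXED_CHALLENGE_COUNT)
}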

// filecoin-proofs-api:src/registry [porep_id()] that matches the specified
// sector size and must be updated when that value is updated for the proper
// graph cache generation/validation.
//
Collaborator:

This duplication is annoying, though maybe the right tradeoff here/now.

.read()
.expect("LAYERS read failure")
.get(&sector_size)
.expect("unknown sector size") as usize;
Collaborator:

As noted above, we could dispense with challenge_count and layers since they don't affect the graph.

@@ -34,6 +35,7 @@ impl Default for Settings {
fn default() -> Self {
Settings {
maximize_caching: true,
verify_cache: false,
Collaborator:

How expensive is cache verification? Defaulting to true is likely a MUCH better UX in many cases where this matters.

cryptonemo (Collaborator Author):

It may cost miners hours of time per sector, so instead of defaulting to true, it should be advised to enable the check when the error cases arise. My fastest machine can verify all sizes in 2.5 hours, which is more than is needed (if enabled), but almost all of that time is spent on the 32 and 64 GiB verifications, so it'll add up quickly.

porcuquine (Collaborator) · Sep 1, 2020:

Got it. Let's make sure to document clearly, and with sufficient visibility, that those who will benefit from this know when they need to enable it.

Also, related to my suggestion elsewhere, I think we can hide all or most of this cost by generating the digest incrementally (in parallel with graph computation) while generating the cache?

If we can do that, we could afford to always check integrity on verification (best!). That would make the supplementary check here only truly relevant in the case of disk corruption, and a better candidate for being enabled only when debugging.
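
A minimal sketch of the incremental-digest idea, setting aside (as the reply below notes) that the real cache generation is parallelized; generate_chunk is a hypothetical stand-in for the parent-graph computation:

use std::fs::File;
use std::io::{BufWriter, Write};
use std::path::Path;

use anyhow::Result;
use sha2::{Digest, Sha256};

// Hash each chunk while it is still in memory, so the digest is ready when
// the file is and no separate read-back pass is needed afterwards.
fn generate_cache_with_digest(
    path: &Path,
    num_chunks: usize,
    mut generate_chunk: impl FnMut(usize) -> Vec<u8>,
) -> Result<[u8; 32]> {
    let mut writer = BufWriter::new(File::create(path)?);
    let mut hasher = Sha256::new();
    for i in 0..num_chunks {
        let chunk = generate_chunk(i);
        hasher.update(&chunk);
        writer.write_all(&chunk)?;
    }
    writer.flush()?;
    Ok(hasher.finalize().into())
}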

Collaborator:

The more I think about it, the more I think this optimization is important for maximizing usability of this integrity check. If it's intrusively expensive, people won't enable it by default, and we'll miss the assurance it provides. Unless it proves impossible to keep this fast (as above), I think the integrity check at generation time should be mandatory.

cryptonemo (Collaborator Author) · Sep 1, 2020:

I like the idea, but I'm not sure how to do it given how the cache is created in parallel at the moment. Generation on my fastest machine takes closer to 24+ hours IIRC (and closer to 40 or so on my normal dev machine); I was referring to the verification only in the timing above (i.e. assuming you have files on disk, how long it takes to verify they are good).

EDIT: The timings were not using release mode, which makes them an order of magnitude less reliable.

Collaborator:

If cache-generation is already parallelized, then this won't help, but maybe with release mode it's not such an issue.

➜  filecoin-parents /usr/bin/time sha256sum v28-sdr-parent-d5500bc0dddadb609f867d94da1471ecbaac3fe6f8ac68a4705cebde04a765b8.cache
f6deb8b7804ad7327ab9eb86b6b66f7dbb7dfc370cac54e1b5267984105d3611  v28-sdr-parent-d5500bc0dddadb609f867d94da1471ecbaac3fe6f8ac68a4705cebde04a765b8.cache
27.56user 7.10system 0:34.73elapsed 99%CPU (0avgtext+0avgdata 3140maxresident)k
5272inputs+0outputs (1major+157minor)pagefaults 0swaps
➜  filecoin-parents ll v28-sdr-parent-d5500bc0dddadb609f867d94da1471ecbaac3fe6f8ac68a4705cebde04a765b8.cache
-rw-r--r-- 1 xxx xxx 56G Aug 28 00:36 v28-sdr-parent-d5500bc0dddadb609f867d94da1471ecbaac3fe6f8ac68a4705cebde04a765b8.cache

The above assumes a machine with fast SHA256, but I think that holds for miners who will be generating the parents cache anyway. ~30 seconds is not so bad.

cryptonemo (Collaborator Author):

I just ran the verify on my mining rig for 32GiB (in release mode) and it took just under 32 seconds, so we seem to agree there.


porcuquine (Collaborator) left a comment:

I approved before noticing that there is no final check after re-generating a bad cache, so I'm requesting changes to make sure my latest comment doesn't get missed.

@cryptonemo (Collaborator Author):

> I approved before noticing that there is no final check after re-generating a bad cache, so I'm requesting changes to make sure my latest comment doesn't get missed.

Immediately after generating the graph, it's checked.

porcuquine previously approved these changes Sep 1, 2020
@porcuquine (Collaborator):

> Immediately after generating the graph, it's checked.

I see, yes, thank you. We could reduce the time overhead by building up the digest in parallel, while the cache is being generated rather than waiting until the end. That would remove the cost of sequential hashing from the total generation time.

@cryptonemo (Collaborator Author):

@porcuquine I just pushed a change that only calculates the digest after generation if it's a production one that we recognize. Previously, it was calculated and wasn't compared to a known value in that case, so this fixes that (carried over from when it was always calculated and then persisted to disk).

@cryptonemo force-pushed the parent-cache-verify branch 2 times, most recently from d90521c to 63f41c5 on September 4, 2020 15:11
porcuquine previously approved these changes Sep 4, 2020
@porcuquine (Collaborator):

@cryptonemo As we discussed recently, we should also have an option to check integrity of proofs. When enabled, it should be impossible to generate or verify proofs except with the parameters published in parameters.json. Since there's so much overlap with this work, I think it may be easiest to track by adding it to this PR. Please confirm if you agree.

This should probably eventually be 'opt out', but the expedient path may be for it to initially be opt in. That might simplify any extra work required to ensure testing and dev workflows remain ergonomic — without blocking the current work on those considerations.

@cryptonemo (Collaborator Author):

> @cryptonemo As we discussed recently, we should also have an option to check integrity of proofs. When enabled, it should be impossible to generate or verify proofs except with the parameters published in parameters.json. Since there's so much overlap with this work, I think it may be easiest to track by adding it to this PR. Please confirm if you agree.
>
> This should probably eventually be 'opt out', but the expedient path may be for it to initially be opt in. That might simplify any extra work required to ensure testing and dev workflows remain ergonomic — without blocking the current work on those considerations.

Updated for early review, but not complete. This update moves parameters.json over to storage-proofs, to avoid duplication now that it's used in both storage-proofs and filecoin-proofs. If this looks to be the right direction, I can add the (ideally one-time-per-parameter) hashing verification, but that is not ready at the moment.

@porcuquine (Collaborator):

@cryptonemo If we do move parameters.json, this may need to be carefully coordinated, since nodes currently use it to download and check parameters. To avoid some risks, maybe we can add a symlink so both locations work. (It probably still makes sense to move in that case, but I could see the argument for leaving and linking only.)

Hash the parameters file in order to check if it matches the digest
given in the `parameters.json` file.
Hashing the parameter files can be expensive, hence hash them only once
for verification that they are good.
@vmx (Contributor) commented Sep 15, 2020:

And once again a new range: https://github.com/filecoin-project/rust-fil-proofs/pull/1265/files/04dfadb60a142a62bf7788c163cf33dafcfb4d04..a8584308c06a97ad50bc1b2d58dfb54c5de7b102 (I shouldn't have started that "link to commit ranges" thing ;)

@vmx (Contributor) commented Sep 16, 2020:

It's ready for review.

@cryptonemo (Collaborator Author):

> It's ready for review.

Looks good, as discussed elsewhere. I can't approve my own PRs, so it's up to @porcuquine or @dignifiedquire now.

porcuquine (Collaborator) left a comment:

Looks good, almost there. A few things:

  • Has this been manually tested? (It should be, to ensure the parameter checking works with published params and fails without them.)
  • README needs update for both (and we should make sure Lotus is aware of the changes when released).
  • I think parameters.json currently lives in filecoin_proofs, with a symlink to it in storage_proofs. I think we want the opposite, right?

// The hash in the parameters file is truncated to 256 bits.
let digest_hex = &hash.to_hex()[..32];

ensure!(digest_hex == data.digest, "Parameters are invalid");
Collaborator:

We should probably log the file name, too.

cryptonemo (Collaborator Author):

Yes, it's been manually tested as described by @vmx, but I've been using the check_parameters tool in fil_proofs_tooling, which is a direct way to check a specific parameter file.

@@ -0,0 +1 @@
../filecoin-proofs/parameters.json
Collaborator:

I think we want the file to live in storage_proofs, with the symlink in filecoin_proofs. The idea is that there should only be dependencies from filecoin_proofs on storage_proofs and not vice versa. Does that match your understanding, @cryptonemo ?

cryptonemo (Collaborator Author):

That's how this should be, or at least was. I had git mv'd it over, then added a symlink there.

cryptonemo (Collaborator Author):

It does seem backwards. That was the intent anyway, will try again to fix that.

@vmx (Contributor) commented Sep 16, 2020:

I've tested it locally manually. If anyone wants to test it, just enable the verification via env var and run the tests.
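
For example, with the env var that appears in the benchy run later in this thread (the exact test invocation is an assumption):

FIL_PROOFS_VERIFY_PRODUCTION_PARAMS=true cargo test --release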

@cryptonemo It would be cool if you could take this over the finish line. If you haven't by my tomorrow, I will.

porcuquine (Collaborator) left a comment:

I tested the groth parameter verification enough to see that it's not behaving as expected, and a quick look at the code showed some of the reasons.

My expectation: if verify_production_params is true but a file of the correct name with wrong contents exists, then attempts to use the parameters will fail with an error describing the cause of the problem (failed integrity check).

What actually happens (for me):

  • with a very short junk file: an error from trying to read/parse the file.
  • with a longer junk file: an (inaccurate) error about 'failure finding <FILENAME>'.

I think there are two broad issues causing this:

  1. the integrity check should come first (if requested), before trying to build the mapped parameters.
  2. when integrity check fails, this is not being distinguished from a 'cache miss', so we're falling back to generating parameters. Then an error is being reported because (in the case of the lifecycle tests I checked with) we choose not to generate parameters and instead return an error (here misidentified as a missing file). Instead, we really need to isolate and report the actual cause of the error. Even though the specific combination of constraints in our current code base doesn't make this catastrophic, it's definitely 'wrong', 'bad', and 'scary' to fall back to generating parameters in the case of a failed production-parameter integrity check.
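
A sketch of the second point: keep a failed integrity check as its own error so it can never be mistaken for a cache miss and fall through to parameter generation. The type and function names here are illustrative stand-ins (assuming thiserror), not the crate's real API:

use std::path::{Path, PathBuf};

use thiserror::Error;

#[derive(Debug, Error)]
enum ParamError {
    #[error("no cached parameters found at {0:?}")]
    CacheMiss(PathBuf),
    #[error("parameter file {0:?} failed its integrity check")]
    InvalidParameters(PathBuf),
}

fn read_verified_params(path: &Path, verify: bool) -> Result<Vec<u8>, ParamError> {
    // Absence is a cache miss: falling back to generation is acceptable.
    let data = std::fs::read(path).map_err(|_| ParamError::CacheMiss(path.to_path_buf()))?;
    // The integrity check runs first, before any mmapping/parsing, and a
    // mismatch is surfaced as its own error, never as a cache miss.
    if verify && !digest_matches(&data) {
        return Err(ParamError::InvalidParameters(path.to_path_buf()));
    }
    Ok(data)
}

// Stand-in for the blake2b comparison against parameters.json.
fn digest_matches(_data: &[u8]) -> bool {
    true
}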

@vmx (Contributor) commented Sep 17, 2020:

I'll look into those.

Verify the parameters before mmapping them.
@cryptonemo (Collaborator Author):

> 1. the integrity check should come first (if requested), before trying to build the mapped parameters.

Strongly agree here, as that's what I had personally tested.

If parameter verification is enabled, panic if the parameters are invalid.
@vmx (Contributor) commented Sep 17, 2020:

@porcuquine ready for review again, all your comments should be addressed.

porcuquine previously approved these changes Sep 18, 2020
porcuquine (Collaborator) left a comment:

This seems to work now. The only thing left is that it doesn't distinguish between an absent parameter file and one that has the wrong content. So if the user simply doesn't have the required parameters at all, they will get a somewhat confusing InvalidParameters ('Parameter verification failed') message. This is probably fine, and the ideal solution might just be a change of name/message rather than an even more fine-grained error. If we did go that route, maybe ValidParametersMissing would be true in both the absent and present-but-wrong cases.


Ok(mapped_params)
})?;

Collaborator:

The binding to params is not really necessary here (though I don't mind if you leave it).

pub sector_size: u64,
}

pub const PARAMETERS_DATA: &str = include_str!("../..//parameters.json");
Collaborator:

Is the double slash really necessary here?

Ok(x) => Ok(x),
Err(_) => {
read_cached_params(&cache_path).or_else(|err| match err.downcast::<Error>() {
Ok(Error::InvalidParameters(msg)) => panic!("Parameter verification failed: {}", msg),
Collaborator:

We probably should return an error rather than panic here, same as with the parents cache.
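
For example, something like this (a sketch: Error stands in for the crate's real error enum; only the panic-to-error mapping is the point):

use anyhow::{anyhow, Result};
use thiserror::Error as ThisError;

// Stand-in for the crate's Error type referenced in the diff above.
#[derive(Debug, ThisError)]
enum Error {
    #[error("Invalid parameters file: {0}")]
    InvalidParameters(String),
}

// Map a failed integrity check to an ordinary error result instead of
// panicking, mirroring how the parents cache reports failures.
fn on_cache_read_failure<T>(err: anyhow::Error) -> Result<T> {
    match err.downcast::<Error>() {
        Ok(Error::InvalidParameters(msg)) => {
            Err(anyhow!("Parameter verification failed: {}", msg))
        }
        Err(other) => Err(other),
    }
}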

@vmx (Contributor) commented Sep 18, 2020:

@porcuquine With the most recent commit, when I run RUST_BACKTRACE=1 FIL_PROOFS_VERIFY_PRODUCTION_PARAMS=true RUST_LOG=info cargo run --bin benchy -- winning-post --size 2 I get:

If it is an invalid file (checksum doesn't match):

thread 'main' panicked at 'failed to generate winning post: Invalid parameters file: /var/tmp/filecoin-proof-parameters/v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-0-0-3ea05428c9d11689f23529cde32fd30aabd50f7d2c93657c1d3650bca3e8ea9e.params', fil-proofs-tooling/src/bin/benchy/winning_post.rs:81:40

If the file is missing:

thread 'main' panicked at 'failed to generate winning post: No cached parameters found for proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-0-0-3ea05428c9d11689f23529cde32fd30aabd50f7d2c93657c1d3650bca3e8ea9e [failure finding /var/tmp/filecoin-proof-parameters/v28-proof-of-spacetime-fallback-merkletree-poseidon_hasher-8-0-0-3ea05428c9d11689f23529cde32fd30aabd50f7d2c93657c1d3650bca3e8ea9e.params]', fil-proofs-tooling/src/bin/benchy/winning_post.rs:81:40
