[process] Create process to produce random values for RTL #2229
I like the idea of a global package. It would definitely make it easier for tapeout partners to just go in and swap everything out in their database without necessarily merging anything back into the repo. I think as part of topgen we could easily mark which modules need random seeds (and perhaps how many) and generate such a package. It would still be up to each module to use the seed it is assigned (we could uniquely name them to make audits easier). @cdgori
This is a good idea. You can resolve a ton of the issues Scott mentioned by returning a truncated HMAC-SHAx over the pool, where the HMAC key is the "usage" - some concatenation of the instance path (to solve the multiply-instantiated issue) and a name for the application of the randomness ("MUX_SELECT" or "SBOX_PERMUTATION_0" or whatever). We'll probably have to code up the relevant HMAC-SHAx in SV (ugh) or bind a known-good C version into SV space (better, I think). You'd probably still want to audit that the application strings were unique across the codebase for each call to this package, but the hash helps a lot, and the instance path probably diversifies it enough to fix any accidental collisions if your audit wasn't quite thorough enough. Lastly, you'll ultimately need a way to generate LARGE_POOL from another file/source of entropy as well (basically that becomes part of the chip build process).
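An editorial sketch of the derivation described above, written as a preprocessor-style Python script rather than SV (the pool value, instance paths, and usage names are all hypothetical): each constant is a truncated HMAC-SHA-256 over the shared pool, keyed by the instance path concatenated with a usage string.

```python
import hmac
import hashlib

# Hypothetical shared entropy pool; in a real flow this would be
# regenerated from a fresh entropy source per chip build.
LARGE_POOL = bytes.fromhex("db3e8ab8236025c58cd9ffea85c8f878" * 4)

def derive_constant(instance_path: str, usage: str, width_bits: int) -> int:
    """Return a width_bits-wide constant unique to (instance, usage).

    The HMAC key concatenates the instance path and the usage name, so
    multiply-instantiated modules and distinct applications get
    diversified values even if an audit misses a duplicate usage string.
    """
    key = f"{instance_path}|{usage}".encode("utf-8")
    digest = hmac.new(key, LARGE_POOL, hashlib.sha256).digest()
    # Truncate the 256-bit digest to the requested width.
    return int.from_bytes(digest, "big") >> (256 - width_bits)

sbox_perm = derive_constant("top.u_aes", "SBOX_PERMUTATION_0", 128)
mux_sel   = derive_constant("top.u_aes", "MUX_SELECT", 128)
```

A post-processing audit would still check that all (instance, usage) pairs in the codebase are unique, as discussed above.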
Doing a hash of the calling instance name (something like
Yeah, I think a more typical approach is a preprocessor step that generates the necessary values, so it doesn't have to be synthesizable.
I also like the idea of a top-level package, and I think this could be nicely achieved with This also gets us around the synthesizability issue, but it requires that you rerun
Something like that sounds good. Note that you also need to be able to extract some of the values (e.g. a global secret) to pass to the HSMs/other personalization infrastructure. So having things in separate preprocessor files (rather than created at synthesis time) is actually probably useful - I cannot immediately think of how you would extract a synthesis-time constant for these other uses.
Adding @sriyerg's comments from #3445. [Discussed offline with @tjaychen] We could do something like this:

```systemverilog
`ifdef SIMULATION
  int unsigned FOO_LFSR_SEED = $urandom;
  int unsigned BAR_LFSR_SEED = $urandom;
  int unsigned BAZ_LFSR_SEED = $urandom;
`else
  parameter FOO_LFSR_SEED = 32'hdeadbeef;
  parameter BAR_LFSR_SEED = 32'h1;
  parameter BAZ_LFSR_SEED = 32'h2;
`endif
```

Note that under SIMULATION these are variables as opposed to parameters, so that they can be set to a different value at runtime. This does raise the "are we really simulating what we are synthesizing?" question, so it would be nice to hear opinions on whether we can actually get away with this. If we cannot, we can make them parameters for simulation as well; we still get the benefit of changing LFSR seed values on each nightly regression run.
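One way to reconcile the two branches above, offered editorially rather than from the original thread: keep only the parameter form in the RTL and have a preprocessor step (as suggested earlier) regenerate the parameter file with fresh seeds before each nightly build, so simulation and synthesis compile identical structures. A minimal Python sketch, with all file and parameter names hypothetical:

```python
import secrets

# Hypothetical list of seed parameters a nightly flow would refresh.
SEED_NAMES = ["FOO_LFSR_SEED", "BAR_LFSR_SEED", "BAZ_LFSR_SEED"]

def render_seed_pkg(names, rng=secrets.randbits):
    """Emit an SV snippet defining one fresh 32-bit parameter per name.

    `rng` is injectable so a regression flow can pin a seed for
    reproducing a failure from a given nightly run.
    """
    lines = [f"parameter {n} = 32'h{rng(32):08x};" for n in names]
    return "\n".join(lines)

# A nightly job would write this into the params file before building.
pkg_text = render_seed_pkg(SEED_NAMES)
```

Rerunning the generator per regression preserves the seed-diversity benefit without a variable/parameter mismatch between simulation and synthesis.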
@tomroberts-lowrisc
Hi everyone, |
Or is it perhaps purely a verification package?
Another EDN-based proposal: the CSRNG has an option for a fully-deterministic mode (no physical entropy in the seed). At least at top level this could be used to distribute deterministic seeds to all the IPs. There are a few drawbacks however to this scheme:
This issue is more about "build time" entropy rather than "run time" entropy (choice of random constants, choice of LFSR taps, etc.). You often need some randomization of parameters, but they must be constrained (and sometimes secret, depending on the specifics). It does touch on the EDN in that the block-local LFSRs might be seeded from the EDN, but that's a bit orthogonal to the main issue.
This sounds like a plan @tjaychen ;-)
Yes, this sounds good to me as well!
Two more thoughts that just occurred to me:
The instance name extension sounds bad/problematic to me. (If the appended hex data is related to the keys, it's even worse, because it means the netlist, and potentially all the logs of all the tools that processed the netlist, now contain sensitive information.) There's something to be worked out with how we handle these parameters in terms of certification secrecy as well. I have seen two approaches: 1) have a way to conditionally include one of two sets of params ("dev" and "prod" - though these terms are already overloaded in OT context so we'd have to choose something else), such that the "dev" params.vh file is in a publicly accessible tree and the "prod" params.vh is in a private repo or something like that. Or 2) use only one params.vh file but expect to "roll it forward" once in a private/offline context before generating the final netlist build. Both have issues with workflow management, but I've not seen any better solutions (yet).
I think the idea of exposing them in a concentrated location is indeed to accomplish your point 1 above. Meaning we in the open will expose it in some hierarchy / package file, and when a partner takes it to tapeout, they should be sure to replace EVERY SINGLE value in that file. And hopefully also keep their netlist secret :)

The name extension thing is a good point, and maybe that would push us towards using ports that we automatically hook up at the top level. It's probably something we can add to topgen and have it reference a specific package (this would also solve the multi-instance issue, I think).
Obviously the netlist contains the secrets in some form and must be kept secret; it's just somewhat worse (to me) if the netlist trivially contains the keys in the instance names, where you could grep for them, vs. someone having to extract the logical structure from a particular submodule to determine a key from reset values or something like that. And then you have the log contamination issue, but to be safe we probably have to assume that anyway.
Synthesis tools can modify a design name after elaboration. This can solve the issue of key values (parameters) appended to the cell name. See the change_names command in DC.
Arnon, this random value propagation might end up applying to a ton of modules, though (for example every LFSR, and possibly the key manager, flash, OTP, etc.). I feel like it would be easier / cleaner to go via Michael's suggestion and use ports, and just update our automation scripts to reference the right places. This would require some slight changes for some modules too, I guess.
Ah, so if there are many such designs, then Michael's way is indeed better. Of course, as you said, the modules' interfaces would have to be changed to support this. It is probably also the more robust way: the change_names solution may lead to issues in the logical equivalence checks (mapping stage).
How about the following solution (which is similar to what @sjgitty initially proposed): comportable IPs that require "global netlist entropy" can specify that in their hjson, e.g. as follows:

```
{
  name: "MY_MODULE",
  clock_primary: "clk_i",
  bus_device: "tlul",
  regwidth: "32",
  netlist_entropy_bits: "128",
  # ...
}
```

topgen could then aggregate these requirements into a generated package:

```systemverilog
package netlist_entropy_pkg;
  parameter int AvailableNetlistEntropyBits = 128;
  parameter logic [AvailableNetlistEntropyBits-1:0] NETLIST_ENTROPY_BITS =
      128'hdb3e8ab8236025c58cd9ffea85c8f878;
endpackage : netlist_entropy_pkg
```

In the generated top level, topgen would then just cut out the correct unique slice for each IP instance that requires netlist entropy:

```systemverilog
my_module #(
  // params
) u_my_module (
  .clk_i,
  .rst_ni,
  .netlist_entropy_i (NETLIST_ENTROPY_BITS[<unique slice extracting 128 bit>]),
  // other ports
);
```

Also, we can add further conditions under which this package shall be generated, e.g.:

That way, a vendor can also choose to just exchange the generated package file. WDYT?
I think I would go beyond just entropy. I kind of feel like it should be any "constant" value we feel is worth replacing on a per-design basis. This could be hidden constants, entropy, IDs, keys or anything else.
Is there any reason to slice it up like this and not just have each be an individual parameter in some top level package?
The rest of this all sounds good to me.
That's cool and sounds very good. One comment about this, though: the indices into a global/shared NETLIST_ENTROPY_BITS can make this a little brittle. I remember wanting the different entropy streams (maybe per-consumer) to be somewhat "diversified" in their generation, such that (for example) shuffling around consumer #3's entropy needs didn't impact the way the entropy was generated for consumer #11; so we typically used the consumer name (or some other ID) as a key for the entropy. We might have had to do some other tricky stuff to get both the separation and the reproducibility we wanted. You also want the property (which this would have) that changing the one "master seed" ripples through everything and causes a full refresh/update. So if we could make it an individual (per-consumer) parameter as @tjaychen says, it might be safer?
The idea behind using one single constant was just in case we (for some reason) wanted to manually override the file generated by topgen. In that case you could just dump in a large pool of entropy without having to care about the individual blocks (and whether more blocks are going to be added later on). But as you say, that can be a bit brittle. So if we mostly rely on topgen to generate this file anyway, defining a per-consumer entropy constant sounds good.
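Following the per-consumer direction agreed above, here is an editorial sketch (in Python, as a hypothetical topgen-style generator step) of deriving each consumer's constant by keying a hash with the consumer name, so that consumers do not disturb one another while a new master seed refreshes everything:

```python
import hashlib

def consumer_constant(master_seed: bytes, consumer: str, width_bits: int) -> int:
    """Derive one consumer's netlist constant from the master seed.

    Keying by consumer name rather than by position in a shared pool
    means resizing or removing one consumer's allocation does not shift
    the bits handed to any other consumer, while changing master_seed
    refreshes every derived constant at once.
    """
    digest = hashlib.sha256(master_seed + consumer.encode("utf-8")).digest()
    # Truncate the 256-bit digest to the requested width.
    return int.from_bytes(digest, "big") >> (256 - width_bits)

# Hypothetical master seed; a real flow would regenerate this per
# incarnation (FPGA build, tapeout, ...).
MASTER_SEED = bytes.fromhex("00112233445566778899aabbccddeeff")

foo_entropy = consumer_constant(MASTER_SEED, "u_foo", 128)
bar_entropy = consumer_constant(MASTER_SEED, "u_bar", 128)
```

topgen could then emit one named parameter per consumer into the generated package, instead of slicing one wide constant.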
Regarding other important constants that are not in the category of "netlist entropy": do you think we should use the same mechanism for those (passing them in via ports), or do you think we could use @vogelpi's parameter exposure mechanism?
I personally feel like they should follow your scheme also, because we can probably argue that we don't want important-looking constants (arguably more important than initial entropy) to land in the name and somehow be exposed. That's just a thought; what do you think? Btw, good point about slicing it that way and simplifying what someone has to generate; that's worth considering. If you imagine we end up with 100 entries, someone would have to replace all 100 of them (assuming they don't use topgen and want to do it manually).
We had some internal discussion, and I believe the conclusion is that there are multiple ways to solve this, none of which is perfect. My 2c is that we just need to pick one and move on. My recommendation is to do the following:

Since this puts a burden on topgen, I would like to hear from @eunchan. Since this dashes one idea of DV for increased coverage, I would like to hear from @sriyerg. And of course I would like to hear from the usual suspects @tjaychen @msfschaffner @imphil @cdgori @vogelpi @martin-lueker et al.

Re LFSRs and other modules that might use said "compile-time random constants": if there were concern about synthesis recognizing these as compile-time constants in order to infer async-reset or async-set flops, you could code the LFSR to start in one value asynchronously and then move to the init seed in a synchronous manner, like below. But I believe with parameters this shouldn't be an issue.

```systemverilog
// ...
parameter InitRandSeed = 32'hdeadbeef,
// ...

always_ff @(posedge clk or negedge rst_n) begin
  if (!rst_n) begin
    lfsr_q <= 32'h0;
  end else if (lfsr_q == 32'h0) begin
    lfsr_q <= InitRandSeed;
  end else begin
    lfsr_q <= lfsr_d;
  end
end
```

Last comment: if there is concern about synthesis leaking the parameter value through uniquified module names, there are some tricks in synthesis to avoid this. At a minimum we could use the following to limit the length, though I'll profess I don't know what it does to maintain uniquification:

```tcl
set hdlin_shorten_long_module_name true
set hdlin_module_name_limit 30  ;# default value is 256
```

Last last comment: re other constants besides these auto-generated topgen constants, they could reside in the same top package if we wanted them to, but I think that is a detail we can decide after moving forward with whatever plan we choose for this.
The parameter approach is fine with me. At least the builds will be different across subsequent nightly regressions, so we are not losing out on a lot (our regression tool is actually capable of merging coverage with previous runs). If it does end up creating a coverage closure problem (I doubt it will), we can do multiple parallel builds and split the tests into different sets, each using one of those builds.
Thanks @sjgitty, that all sounds sensible to me. Agreed there should be no special treatment required for parameters as async reset values.
Thanks for the comprehensive writeup @sjgitty - I think this looks great. Calling this something like a "compile-time random constant" seems about right (we definitely want to avoid "entropy" in the name; it would be very confusing). I would vote against any terminology here with "seed" in the name, since that has certain connotations that don't seem to apply here (i.e. using the value to generate more downstream values).
Thanks @sjgitty, using parameters sounds good to me. The feature in topgen to pass parameters defined in the hjson into the IP top (point 3 in your list) is already implemented and used for aes and otbn. What's missing is a way to drive the value from the top package (point 6 in your list). Your proposal for shortening the module names should do the trick: if a module name is longer than the specified number of chars, the tool will print a warning, shorten the name and append a unique hash. What remains is to disable the warning so it does not pollute the logs.
Thanks @sjgitty, this looks reasonable to me.
Thanks all for providing feedback on this issue. I am closing this now since PR #3946 has been merged.
Random constants are sent through the hierarchy as parameters in-line with other OpenTitan modules. Further detail on this mechanism can be found in lowRISC/opentitan#2229 Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
RTL designs often use statically chosen random values for items like initial seeds of LFSRs, private global keys, etc. The randomness is important so that, for example, different LFSRs come up in different states from each other. It is important that the values are static so that one can ensure the value tested is legal and does not change from one simulation to the next. But due to the open-source nature of the project, the drawback is that the random values are published.
The goal of this issue is to minimize the effort needed to swap out random values from one incarnation (FPGA, tapeout, etc) to the next so that each incarnation can be unique from the others, while still satisfying the above goals.
One example would be a global package that contained a large static pool of constant random values, as well as a method to return different width values starting from a static offset. Given something like this, each incarnation could have a different global package with the same API and different random content. An incomplete but representative example:
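The representative example referenced above did not survive the page capture. As an illustration of the intended API (a static pool plus a getter that returns a value of a requested width starting from a static offset), here is a behavioral sketch in Python, with the pool contents and call sites purely hypothetical:

```python
# Hypothetical global pool: a large block of fixed random bits that each
# incarnation (FPGA build, tapeout, ...) would regenerate with the same
# API but different content.
RAND_POOL = int("db3e8ab8236025c58cd9ffea85c8f878" * 8, 16)
RAND_POOL_BITS = 32 * 8 * 4  # 1024 bits

def get_rand(offset: int, width: int) -> int:
    """Return `width` bits of the pool starting at bit `offset`.

    Each call site uses a distinct, statically chosen offset; a
    post-processing audit checks that the (offset, width) ranges of all
    call sites in the design do not overlap.
    """
    assert offset + width <= RAND_POOL_BITS, "pool exhausted"
    return (RAND_POOL >> offset) & ((1 << width) - 1)

# Two non-overlapping call sites draw different static slices.
lfsr_seed = get_rand(0, 32)
mux_key   = get_rand(32, 64)
```

A per-incarnation build would swap only the pool constant while keeping every call site unchanged.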
Then, as a post-process, one could compare all uses of the calls to ensure uniqueness. There are challenges here, such as multiply-instantiated modules, ensuring proper use of the calls without overly burdening the design, making sure there are no performance issues, etc.
CC: @msfschaffner @tjaychen @imphil