Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(quil-py): support extern call instructions #394

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

erichulburd
Copy link
Collaborator

@erichulburd erichulburd commented Aug 29, 2024

Review Guidance

High-level overview

This introduces support for CALL and PRAGMA EXTERN instructions. The former is supported directly as an Instruction and I've introduced a ReservedPragma instruction to accommodate the latter (see source code comment below for discussion on this particular choice).

There are three main aspects to this support:

  • Parsing of call in crate::parser::command::parse_call and parsing of ExternSignature in crate::parser::pragma_extern.
  • Rust representation of these instructions in crate::instruction::extern_call.
  • Resolution of memory accesses in crate::instruction::extern_call and crate::program::Program::get_memory_accesses.

This functionality was also ported to Python in quil-py/src/instruction/extern_call.rs.

Public API Changes

There are two breaking public API changes from adding EXTERN / CALL support:

  1. Two One new enum variant on Instruction: Call and ReservedPragma (see source code comment below for the choice of ReservedPragma).
  2. Instruction::get_memory_accesses is fallible now. This reflects the fact that a CALL instruction cannot know its memory accesses until it has been resolved to an ExternSignature (ie it has to know the mutability of its different arguments.

quil-rs/src/instruction/pragma.rs Outdated Show resolved Hide resolved
/// The name of the call instruction. This must be a valid user identifier.
pub name: String,
/// The arguments of the call instruction.
pub arguments: CallArguments,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered three alternatives here:

  1. Just have a Vec<CallArgument> where each CallArgument has a resolution: Option<Resolution> attribute.
  2. Do not mutate the CallArguments and instead return a struct that represents the resolution, roughly of the structure ("name", instruction_index) => Vec (we need the index to refer to index of the CALL instruction within the program).
  3. Add a type parameter to the CALL instructions and then to the program itself. Resolution would then return a program of a different type (eg into_call_resolved_program(self) -> Result<Program<ResolveCallArgument>, ProgramError>).

I landed on the de facto implementation because:

  1. Within a given instruction, call arguments are all resolved at the same time. Either all arguments are resolved or none are.
  2. There are existing patterns within Quil that mutate the program, such as resolve_placeholders.
  3. Tracking instruction indices seems fairly unergonomic and brittle.
  4. I did not want to add type complexity to the Program struct which is pretty easy to use.
  5. I wanted to avoid the user resolving the program more than once for efficiency's sake.

I definitely feel like this resolution functionality belongs in quil-rs and not separately in downstream compilers. After going back and forth, I think this is the right implementation, but am open to input here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is indeed a thorny question! I agree that alternatives #2 and (to a lesser extent) #1 are inferior, but I think there's a lot to be said for #3. I've talked to you about this offline, but to expand here:

I think this raises the question of if we've hit the point where we want to separate the parsed AST from the "typechecked"/"resolved" AST. I think there's a lot of merit to doing so, as it allows for capturing invariants much more cleanly. But I'm not sure if this PR is the place to do it.

I think I might favor a design where we parameterize all our types by the stage of compilation, passing that down to where we need to make a decision, and default that type parameter to the stage that corresponds to the existing situation. Something like the following:

pub trait Stage {
    type CallArgument;
}

pub enum Parsed;

impl Stage for Parsed {
    type CallArgument = UnresolvedCallArgument;
}

pub enum Resolved;

impl Stage for Resolved {
    type CallArgument = ResolvedCallArgument;
}

pub struct Program<S: Stage = Resolved> {
    // …
    instructions: Instruction<S>,
    // …
}

pub enum Instruction<S: Stage = Resolved> {
    // …
    Call(Call<S>),
    // …
}

pub struct Call<S: Stage = Resolved> {
    pub name: String,
    pub arguments: Vec<S::CallArgument>,
}

The upside to this is that we can bundle all the types we need to parameterize by together; the downside is the extra trait. I think the upside is likely to be worth it, but it's not obvious.

Another limitation of this approach is that it forces the ASTs to be almost identical. This can be good or bad. Another approach would be to have

pub trait Stage {
    type AST;
}

even if Parsed::AST = Program<α> and Resolved::AST = Program<β> for now.

We can also hide some of the complexity by having the parser return a Program<Parsed>, a function fn resolve(parsed: Program<Parsed>) -> Result<Program<Resolved>, ResolutionError>, and then exposing at the top level a function that simply combines the two, returning a Result<Program, …>, so the user is not confronted with this new API.

That said: this adds extra complexity! I think that may be worth it, but it's not obvious. This mutate-and-resolve approach is not a bad one, and it might even be the implementation behind the more complex version I outline above.

Copy link
Collaborator Author

@erichulburd erichulburd Sep 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just spoke with Kalan about this. We landed somewhere around here:

  • Pragma doesn't need to be an enum. We can add something like extern_definitions: HashMap<String, ExternDefinition> to Program.
  • struct Call will only have arguments: Vec<UnresolvedCallArgument>. Call.resolve(...) returns Result<Vec<ResolvedCallArgument>, ...>. This will be a public function for the purposes of translating.

There's a bit of inefficiency here WRT resolving in get_memory_accesses (still fallible) and then resolving for translation. I may think a bit more about that, but otherwise, this seems tenable to me.

Copy link
Contributor

@antalsz antalsz Sep 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So is the idea here that resolution doesn't need to produce a new Program, just store the extern_definitions? And then any processing that needs to consume a resolved Call will just call .resolve in situ when necessary? I think this seems fine, if a bit of kicking the can down the road wrt a new AST – but as I said above, this PR is likely not the right place for that anyway, so I don't think that's an issue.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, you got it. Here I'm just going for consistency with the existing implementation but I'm onboard with your general vision for generating a separate "validated and resolved" AST. We'll keep the conversation going.

quil-py/quil/instructions/__init__.pyi Outdated Show resolved Hide resolved
quil-py/quil/program/__init__.pyi Outdated Show resolved Hide resolved
Comment on lines +104 to +128
fn has_return_or_parameters(&self) -> bool {
self.return_type.is_some() || !self.parameters.is_empty()
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe better to make this required in the (fallible) constructor and make the fields private?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes on the fallible constructor. Private fields present a couple issues:

  • The Python package uses this py_wrap_data_struct! macro that requires the fields to be public.
  • Most other instructions in quil-rs have fully public fields.

There's not really precedent to create an intermediate in quil-py with public fields that then converts to a quil-rs Call, so I hesitate to introduce one here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue with having public fields is that you can no longer count on the constructor guaranteeing validity, because a user can simply do

let mut sig = ExternSignature::try_new(Some(cool_type), vec![cool_parameter]).unwrap();
sig.return_type = None;
sig.parameters = vec![];

and now your sig is invalid.

quil-rs/src/instruction/extern_call.rs Outdated Show resolved Hide resolved
Comment on lines 219 to 220
/// The extern definition has a signature but it lacks a return or parameters.
#[error("extern definition {0} has a signature but it lacks a return or parameters")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What signature can it have, then?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an odd part of the spec, which basically requires that the CALL has at least one argument. I get that the intent is to require that statefulness is maintained in the user memory space, however, this seems a futile requirement to try to enforce because a user could easily workaround it:

DECLARE useless_bit BIT[1]
PRAGMA EXTERN foo "(ignored : BIT[1])"
CALL foo useless_bit[0] # the bit is ignored and the function mutates state

I'm partial to dropping this requirement in the implementation as well as the spec.

quil-rs/src/instruction/pragma.rs Outdated Show resolved Hide resolved
quil-rs/src/parser/command.rs Outdated Show resolved Hide resolved
quil-rs/src/parser/command.rs Outdated Show resolved Hide resolved
quil-rs/src/program/mod.rs Outdated Show resolved Hide resolved
/// The name of the call instruction. This must be a valid user identifier.
pub name: String,
/// The arguments of the call instruction.
pub arguments: CallArguments,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is indeed a thorny question! I agree that alternatives #2 and (to a lesser extent) #1 are inferior, but I think there's a lot to be said for #3. I've talked to you about this offline, but to expand here:

I think this raises the question of if we've hit the point where we want to separate the parsed AST from the "typechecked"/"resolved" AST. I think there's a lot of merit to doing so, as it allows for capturing invariants much more cleanly. But I'm not sure if this PR is the place to do it.

I think I might favor a design where we parameterize all our types by the stage of compilation, passing that down to where we need to make a decision, and default that type parameter to the stage that corresponds to the existing situation. Something like the following:

pub trait Stage {
    type CallArgument;
}

pub enum Parsed;

impl Stage for Parsed {
    type CallArgument = UnresolvedCallArgument;
}

pub enum Resolved;

impl Stage for Resolved {
    type CallArgument = ResolvedCallArgument;
}

pub struct Program<S: Stage = Resolved> {
    // …
    instructions: Instruction<S>,
    // …
}

pub enum Instruction<S: Stage = Resolved> {
    // …
    Call(Call<S>),
    // …
}

pub struct Call<S: Stage = Resolved> {
    pub name: String,
    pub arguments: Vec<S::CallArgument>,
}

The upside to this is that we can bundle all the types we need to parameterize by together; the downside is the extra trait. I think the upside is likely to be worth it, but it's not obvious.

Another limitation of this approach is that it forces the ASTs to be almost identical. This can be good or bad. Another approach would be to have

pub trait Stage {
    type AST;
}

even if Parsed::AST = Program<α> and Resolved::AST = Program<β> for now.

We can also hide some of the complexity by having the parser return a Program<Parsed>, a function fn resolve(parsed: Program<Parsed>) -> Result<Program<Resolved>, ResolutionError>, and then exposing at the top level a function that simply combines the two, returning a Result<Program, …>, so the user is not confronted with this new API.

That said: this adds extra complexity! I think that may be worth it, but it's not obvious. This mutate-and-resolve approach is not a bad one, and it might even be the implementation behind the more complex version I outline above.

@erichulburd
Copy link
Collaborator Author

then exposing at the top level a function that simply combines the two, returning a Result<Program, …>, so the user is not confronted with this new API.

The one caveat here is program mutation (either mutating a parsed program or just some Program::new()). If the user deletes an EXTERN pragma, or adds another call, then our resolution is invalid. So we'll need to either expose the resolution method to the user, or re-resolve after every relevant mutation (less efficient and more complex, but also could help us guarantee correctness opaquely). My knee-jerk preference is the former option.

Copy link
Contributor

@antalsz antalsz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I overall really like this representation, and I think it's a big improvement. I have some questions around exactly where we perform resolution, and I think you may be able to punt some (or most?) of them away with "we'll figure this out when we figure out what broader program checking looks like". I also have various specific comments.

One general note: I wonder if we should provide a way to trim out unused PRAGMA EXTERNs? I believe quil-rs provides that for unused calibrations etc. from the CLI; we should add removing unused PRAGMA EXTERNs to that, I think.

"""An argument to a ``Call`` instruction.

This may be expressed as an identifier, a memory reference, or an immediate value. Memory references and identifiers require a corresponding memory region declaration by the time of
compilation (at the time of call argument resolution and memory graph construction to be more precise).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
compilation (at the time of call argument resolution and memory graph construction to be more precise).
compilation (at the time of call argument resolution and memory graph construction, to be more precise).

...

class Call:
"""An instruction to an external function declared within a `PRAGMA EXTERN` instruction.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"""An instruction to an external function declared within a `PRAGMA EXTERN` instruction.
"""An instruction that calls an external function declared with a `PRAGMA EXTERN` instruction.

Comment on lines +28 to +39
/// A parameter type within an extern signature.
#[derive(Clone, Debug, PartialEq, Hash, Eq)]
pub enum ExternParameterType {
/// A scalar parameter, which may accept a memory reference or immediate value.
Scalar(ScalarType),
/// A fixed-length vector, which must accept a memory region name of the appropriate
/// length and data type.
FixedLengthVector(Vector),
/// A variable-length vector, which must accept a memory region name of the appropriate
/// data type.
VariableLengthVector(ScalarType),
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These would be clarified if you gave examples of the syntax that corresponded to them

ExternParameterType::FixedLengthVector(value) => value.write(f, fall_back_to_debug),
ExternParameterType::VariableLengthVector(value) => {
value.write(f, fall_back_to_debug)?;
write!(f, "[]").map_err(Into::into)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

? calls into (well, from) for you, so this can just be

Suggested change
write!(f, "[]").map_err(Into::into)
write!(f, "[]")?

Comment on lines +70 to +71
/// Create a new extern parameter. This validates the parameter name as a user
/// identifier.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Create a new extern parameter. This validates the parameter name as a user
/// identifier.
/// Create a new extern parameter. This will fail if the parameter name is not a valid user
/// identifier.

/// Call instructions are of the form:
/// `CALL @ms{Identifier} @rep[:min 1]{@group{@ms{Identifier} @alt @ms{Memory Reference} @alt @ms{Complex}}}`
///
/// For additional detail, see the ["Call" in the Quil specification](https://github.com/quil-lang/quil/blob/7f532c7cdde9f51eae6abe7408cc868fba9f91f6/specgen/spec/sec-other.s).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// For additional detail, see the ["Call" in the Quil specification](https://github.com/quil-lang/quil/blob/7f532c7cdde9f51eae6abe7408cc868fba9f91f6/specgen/spec/sec-other.s).
/// For additional detail, see ["Call" in the Quil specification](https://github.com/quil-lang/quil/blob/7f532c7cdde9f51eae6abe7408cc868fba9f91f6/specgen/spec/sec-other.s).

))(input)
}

fn parse_immediate_value(input: ParserInput) -> InternalParserResult<Complex64> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that this is the first time we've had to parse complex numbers makes me realize – what Quil type will accept a complex literal when used as an argument to a CALL instruction?

@@ -92,10 +93,23 @@ macro_rules! set_from_memory_references {
};
}

#[derive(thiserror::Error, Debug, PartialEq)]
pub enum MemoryAccessesError {
#[error("must be able to resolve call to an extern signature: {0}")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this description of the error confusing. Would #[error(transparent)] work here?

Comment on lines +106 to +107
/// Note, this may fail if the program contains [`Instruction::Call`] instructions that cannot
/// be resolved to the appropriate [`crate::instruction::ExternSignature`].
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Note, this may fail if the program contains [`Instruction::Call`] instructions that cannot
/// be resolved to the appropriate [`crate::instruction::ExternSignature`].
/// This will fail if the program contains [`Instruction::Call`] instructions that cannot
/// be resolved against a signature in the provided [`ExternSignatureMap`] (either because
/// they call functions that don't appear in the map or because the types of the parameters
/// are wrong).

Comment on lines +198 to +206
Instruction::Pragma(pragma) if pragma.name == RESERVED_PRAGMA_EXTERN => {
self.extern_pragma_map.0.insert(
match pragma.arguments.first() {
Some(PragmaArgument::Identifier(name)) => Some(name.clone()),
_ => None,
},
pragma,
);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see – the use of ExternPragmaMap is so that adding an instruction to the program is infallible. I'm still not convinced by the type itself, but I see the logic behind doing it this way. If we want to have ExternPragmaMap instead of the raw IndexMap, I'd make it a little more opaque and put all this logic into it, and I'd clarify what happens on duplicate EXTERN names in the documentation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants