Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cryptol RPC server sometimes hangs #1250

Closed
atomb opened this issue Jul 22, 2021 · 4 comments
Closed

Cryptol RPC server sometimes hangs #1250

atomb opened this issue Jul 22, 2021 · 4 comments
Labels
remote-api Related to Cryptol's remote API

Comments

@atomb
Copy link
Contributor

atomb commented Jul 22, 2021

We've seen mention of cases where the Cryptol RPC server can hang when loading Cryptol modules, but haven't been able to reproduce it locally yet.

@pnwamk, do you have some more detail you can fill in?

@pnwamk
Copy link
Contributor

pnwamk commented Jul 22, 2021

As it was described to me, the client hangs on c.load_module() after having successfully loading many previous modules in a script such as the following while the server is running in a docker container:

import cryptol
import os

# Steps to clone example repo and set up server:

# git clone https://github.com/GaloisInc/cryptol-specs.git
# git checkout rpc-load-module-test

# docker run --rm -ti \
# -v $PWD/Common:/cryptol/Common \
# -v $PWD/Primitive:/cryptol/Primitive \
# -v $PWD/McEliece_KEM:/cryptol/McEliece_KEM \
# -p 8080:8080 \
# -e CRYPTOLPATH='/cryptol' \
# ghcr.io/galoisinc/cryptol-remote-api:nightly


# After server is running, simply execute this script, e.g.:
# python3 load.py

c = cryptol.connect(url="http://localhost:8080", reset_server=True)

for root, dir, files in os.walk("."):
    for name in files:
        if name.endswith(".cry"):
            new_file = os.path.join(root, name)[2:-4].replace("/", "::")
            print("Loading ", new_file, flush=True)
            c.load_module(new_file).result()
            c.reset_server()

N.B., the above script and it's accompanying instructions don't quite work as is, as the cryptol-specs repository has some modules that fail to load at present.

@pnwamk
Copy link
Contributor

pnwamk commented Jul 23, 2021

This may be reproducing the error we've had reported. At least observationally it appears similar:

# Steps to clone example repo and set up server:

# git clone https://github.com/GaloisInc/BLST-Verification.git
# cd BLST-Verification
# git submodule update --init

# docker run --rm -ti \
# -v $PWD/spec:/cryptol/spec \
# -v $PWD/cryptol-specs:/cryptol/cryptol-specs \
# -p 8080:8080 \
# -e CRYPTOLPATH='/cryptol/spec:/cryptol/cryptol-specs' \
# ghcr.io/galoisinc/cryptol-remote-api:nightly


# After server is running, simply execute the following script from the `spec` directory, e.g.:
# cd spec
# curl -o load.py https://gist.githubusercontent.com/pnwamk/e2173de4e1d8c69e389cdb9680a80915/raw/dc0a3f4006bd0eab84f9dde4246baf44f106a4ce/load.py
# python3 load.py
import cryptol
import os

c = cryptol.connect(url="http://localhost:8080", reset_server=True)

for root, dir, files in os.walk("."):
    for name in files:
        if name.endswith(".cry"):
            new_file = os.path.join(root, name)[2:-4].replace("/", "::")
            print(f'Loading {new_file}... ', flush=True, end='')
            c.load_module(new_file).result()
            print("done!", flush=True)
            c.reset_server()

It happily loads module after module until it gets to keygen_implementation, at which point it hangs. If you change the script so that, instead of loading modules in a loop, it only loads that file, i.e.:

c.load_module('keygen_implementation').result()

then you immediately get an error reported from the server regarding an error that occurred while loading:

$ python3 load.py
Traceback (most recent call last):
  File "/Users/andrew/Repos/GRINDSTONE/client-debug/BLST-Verification/spec/load.py", line 24, in <module>
    c.load_module('keygen_implementation').result()
  File "/usr/local/lib/python3.9/site-packages/argo_client/interaction.py", line 168, in result
    return self.process_result(self._result_and_state_and_out_err()[0])
  File "/usr/local/lib/python3.9/site-packages/argo_client/interaction.py", line 153, in _result_and_state_and_out_err
    raise self.process_error(exception)
argo_client.interaction.ArgoException: [error] :1:1--1:1:
    File name does not match module name:
    Saw: Main
    Expected: keygen_implementation

So, my currently hypothesis is that something exceptional is happening in the server but for some reason the error sometimes isn't properly getting reported/received/etc.

Will have to dig a little deeper.

Semi-related: some updates to both the cryptol server and client are about to land which should hopefully make it easier to narrow these kinds of errors without necessarily rebuilding the server to add log statements. (Well we'll see if it helps in this particular case I guess...)

pnwamk pushed a commit that referenced this issue Jul 28, 2021
Some difficult to reproduce bugs regarding the RPC server hanging
(#1250) seemed to be
related to the type checker solver (Z3) hanging after periods of
extended use. I.e., a module would load/error quickly normally,
but if preceded by enough prior solver/server activity, Z3
would consistently hang. To avoid this, we can simply reset the
solver and reload the type checker prelude when the server is asked
to load a file or module. This solves the Z3 hang issues in observed
cases and requires no additional forking (which we're trying to
minimize when possible).
pnwamk pushed a commit that referenced this issue Jul 28, 2021
* fix(rpc): reset TC solver on file/module load

Some difficult to reproduce bugs regarding the RPC server hanging
(#1250) seemed to be
related to the type checker solver (Z3) hanging after periods of
extended use. I.e., a module would load/error quickly normally,
but if preceded by enough prior solver/server activity, Z3
would consistently hang. To avoid this, we can simply reset the
solver and reload the type checker prelude when the server is asked
to load a file or module. This solves the Z3 hang issues in observed
cases and requires no additional forking (which we're trying to
minimize when possible).
@robdockins robdockins added the remote-api Related to Cryptol's remote API label Jul 28, 2021
@pnwamk
Copy link
Contributor

pnwamk commented Jul 29, 2021

I believe we confirmed in all the observed cases where this was occurring, it was related to SMT solvers hanging on the server side. #1258 appears to effectively help mitigate this.

@pnwamk pnwamk closed this as completed Jul 29, 2021
@weaversa
Copy link
Collaborator

Thank you @pnwamk!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
remote-api Related to Cryptol's remote API
Projects
None yet
Development

No branches or pull requests

4 participants