It proxies requests from HTTP to an executable, which is expected to write its response to a socket.
Splitting the above sentence:
- an HTTP service that: 1. catches a REST payload, 2. assigns a UUID `req_id`
- (not provided!) an executable that: 1. accepts a JSON payload, 2. writes something to a Unix socket (a file)
- an HTTP service listening on the Unix socket that matches the `req_id` and sends the result back to the caller

Why not just nginx? Because this example should become a crate: a library that can be used together with other crates, maybe consuming messages from a message broker, in a number of ways.
The plan is to provide a number of ways to re-use existing code, but on top of a stable async green-thread model: so more urocket-xyz-stage(s).
```mermaid
sequenceDiagram
    participant C as Client
    participant U as URocket
    participant P as Internal Process
    C->U: write REST payload on http
    activate U
    U->>P: spawn(payload, req_id) process
    activate P
    P->>U: write(payback, req_id) on socket /tmp/urocket.sock
    deactivate P
    Note right of U: match req_id
    U->>C: send payback
    deactivate U
    Note left of C: Our client is happy now
```
Quick example:
- Open three shells: `sh-serv`, `sh-front`, `sh-back`.
- In `sh-serv` run `cargo run -- -c examples/urocket-service.yaml`
- In `sh-front` run `curl -X POST localhost:8080/get/pets -H'Content-type: application/json' -d'{"payload":"low"}'` (this shell is now blocked)
- Read the message in the `sh-serv` shell, something like `I stored the reqid :: 64ed1763-9ca9-4b95-b47f-75c1318b3462`
- In the `sh-back` shell run `curl -X POST --unix-socket /tmp/urocket.sock http://internal/urhttp/something -H'Content-type: application/json' -d'{"hello":true}'` (the shell in `sh-front` is still blocked)
- Again in the `sh-back` shell run `curl -X POST --unix-socket /tmp/urocket.sock http://internal/urhttp/64ed1763-9ca9-4b95-b47f-75c1318b3462 -H'Content-type: application/json' -d'{"hello":true}'`
- The shell in `sh-front` is unblocked, and it has received the message `{"hello":true}`
This code aims to handle req_id generation and matching, process spawning, timeouts, exceptional cases, logging, ... whatever is needed to make it stable enough to be used in production.
The name is urocket-stage-http. During the launch of a rocket into space, at some point the rocket splits into two parts and the first stage is discarded. This happens while the rocket is in the cloud. I would call this a "useful rocket stage".
Nothing to do with Rust's Rocket framework; in fact the code uses hyper, since it needs minimal stuff.
OpenAPI reference:
Scripting languages are supported by specifying the executable.
The OpenAPI definition is used as it is, without any change. Example:
```yaml
paths:
  /pets:
    post:
      tags:
        - pet
      summary: Add a new pet to the store
      description: Add a new pet to the store
      operationId: addPet
      requestBody:
        description: Create a new pet in the store
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/Pet'
          application/xml:
            schema:
              $ref: '#/components/schemas/Pet'
          application/x-www-form-urlencoded:
            schema:
              $ref: '#/components/schemas/Pet'
        required: true
```
The service attaches the backend callback using its own configuration file, `urocket-service.yaml`, which replicates the `paths.uri.method` schema to define the callback info for each `uri.method`:
```yaml
paths:
  /get/pets:
    get:
      validatein: false
      inject:
        wd: /src/scripts/php
        env:
          - MYENV=CI
        cmd: /usr/bin/echo {{jsonpayload}}
        timeout: 1000
        channel: "cmdline"
        encoding: json
      logstdout: true
      validate-out: false
```
This will execute `/usr/bin/echo` in the work dir defined in `wd`, with the given `env`, ... see below for the details.

PHP uses the socket for the reply, here via libcurl:
```php
$reqid = $_ENV["REQUEST_ID"];
$ch = curl_init();
// $postdata = json_encode($data); // typically
$postdata = '{"handler": "pricechange", "pricelist": [{"articlenr":"12312", ...}]}';
// socket filepath as defined in the configuration file
curl_setopt($ch, CURLOPT_UNIX_SOCKET_PATH, "/tmp/urocket.sock");
// the hostname does not matter, the path must be /urhttp/{req_id} (see below)
curl_setopt($ch, CURLOPT_URL, "http://doesnotmatt.er/urhttp/$reqid");
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postdata);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: application/json'));
$result = curl_exec($ch);
curl_close($ch);
print_r($result);
```
The intent is to quickly port existing code with minimal changes. Typical use cases are scripts launched by a cronjob, without any control, which may be ported quickly by just appending a few lines at the end of the script. The plan is also to define an automatic answer for "success", and an output report.
Why not Apache + mod_php?
- because the output is returned to the caller
- the security model of mod_php is generally stricter than that of a PHP CLI script
Is a dummy hostname required?

```sh
curl --unix-socket /var/run/docker.sock http://localhost/images/json
```

or:

```sh
curl --unix-socket /var/run/docker.sock http:/images/json
```

See https://superuser.com/a/925610:

> cURL 7.50 and up requires a valid URL to be provided, including a hostname, so to run the above examples with cURL 7.50, a "dummy" hostname has to be added.
A message type is identified by (path, verb), as defined in the OpenAPI definition `paths.[path].[verb]`.

There are 4 distinct stages for a message type:
1. incoming: defined as HTTP verb + payload
2. transported-in: defined as the process execution env
3. transported-out: defined by the IPC channel
4. outgoing: defined as HTTP verb and payload

Stages 1. and 4. can add a layer of payload validation, incoming and/or outgoing; the validation is based on the OpenAPI definition.
Stages 2. and 3. are both defined as a map keyed on "http-path + verb":
```yaml
paths:
  "get/pets":
    get:
      validate-in: false
      inject: {{ process-env }}
      logstdout: true | false
      validate-out: false
    post: ...
```

The two middle stages map to:
- in: `{{ process-env }}`
- out: `{{ ipc-channel }}`
Note on logstdout: the service should be able to log the stdout of the script. This could be enabled by a special header in the incoming HTTP request, or by other means, TBD.
The process is started with these environment variables set:

```
URIPATH=/path/in/request/uri
REQUEST_ID={unique request id used to match the result}
```

The `inject` fields are:

```yaml
wd: /path/to/wd
env: [string]
cmd: command_line {{jsonpayload}} otherparam
channel: cmdline | stdin | ...
encoding: json
```
If "channel: cmdline" then payload is passed as escaped commandline argument, i.e.:
[cmd] '{"my": "json", "payload": "et cetera", "et": true, "cetera": false}'
If "channel: stdin" then payload pass through the stdin
The socket filepath is defined in the configuration file.
Say we call the (optional) domain `internal`, then the options for curl are:

```
--unix-socket /tmp/urocket.sock http://internal/urhttp/{req_id}
```

`{req_id}` must be replaced with the matching request id, i.e.:
- the process reads `ENV["REQUEST_ID"]` from the environment
- the process writes the reply payload to `http://internal/urhttp/$ENV["REQUEST_ID"]`

The URI prefix is fixed; only the request id part is matched.
Note: backserv just removes `/urhttp/` and takes the rest as the req_id (see todo).
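A rough sketch of that matching, not the real implementation (which goes through RequestVisor): assuming the pending front requests are kept in a map from req_id to a tokio oneshot sender, backserv only has to strip the prefix and forward the payload:

```rust
use std::collections::HashMap;
use tokio::sync::oneshot;

// Hypothetical sketch: strip the fixed /urhttp/ prefix, look the request id
// up among the pending front requests and hand the reply payload to the
// handler that is still blocked on the HTTP side.
fn req_id_from_path(path: &str) -> Option<&str> {
    path.strip_prefix("/urhttp/")
}

fn deliver_reply(
    pending: &mut HashMap<String, oneshot::Sender<Vec<u8>>>,
    path: &str,
    payload: Vec<u8>,
) -> bool {
    match req_id_from_path(path).and_then(|id| pending.remove(id)) {
        Some(tx) => tx.send(payload).is_ok(), // front side is still waiting
        None => false, // unknown or already timed-out req_id
    }
}
```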
The service works but it is too rigid. Code cannot be split up and reused in a productive way. For example ProcessController by itself is something that could be reused, and the same goes for other parts.
frontserv and backserv are the most service-specific parts.
The toktor crate should be split out and moved to an independent crate, once it is more idiomatic.
TODOs:
- RequestVisor: use Arbiter and replace it in frontserv/backserv
- ProcessController: match config and RestMessage to create a process
- ProcessController: manage timeout (to kill and give back to the arbiter)
- ProcessController: timeouts maybe pinned to remove on process exit (??)
- ProcessController should log something.
- Log tracker: select something buffered.
- RequestVisor: accept ProcessController
- Add an ErrorXXX struct. Fix all `.unwrap()`s
Instead of tokio, `std::process::Command` is used; this makes it possible to handle specific process information and to avoid zombies (it does not need tini, see https://github.com/krallin/tini).
Also `wait4()` is called on each process (using https://crates.io/crates/wait4).
ResUse is stored in ProcessInfos, for at least 2 seconds after the end of the process, together with StdOut and StdErr. If requested it is returned over an mpsc channel, as an Option (if requested too late it is just None).
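A minimal sketch of that spawn-and-reap path, assuming the `Wait4` trait and `ResUse` type exported by the wait4 crate; the function and the hard-coded env values are only illustrative:

```rust
use std::io::Read;
use std::process::{Command, Stdio};
use wait4::Wait4; // https://crates.io/crates/wait4

// Illustrative only: spawn the configured command with the payload on the
// command line, capture stdout for logging, then reap it with wait4() so the
// resource usage comes back together with the exit status.
fn run_backend(cmd: &str, req_id: &str, payload: &str) -> std::io::Result<()> {
    let mut child = Command::new(cmd)
        .arg(payload)
        .env("REQUEST_ID", req_id)
        .env("URIPATH", "/get/pets")
        .stdout(Stdio::piped())
        .spawn()?;

    // read the whole stdout first so the child cannot block on a full pipe
    let mut stdout = String::new();
    if let Some(mut out) = child.stdout.take() {
        out.read_to_string(&mut stdout)?;
    }

    // wait4() reaps the child (no zombie) and returns a ResUse with the rusage
    let resuse = child.wait4()?;
    println!("stdout: {stdout}");
    println!("wait4: {resuse:?}");
    Ok(())
}
```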
Also it can be desirable to have some monitoring.
https://docs.rs/tokio-metrics/latest/tokio_metrics/ monitors the internal tasks, but the spawned children are under control of the operating system.
The only way to access this info is via procfs: https://docs.rs/procfs/latest/procfs/process/struct.Stat.html
For stats: inspect procfs before calling wait() on https://doc.rust-lang.org/std/process/struct.Child.html
For monitoring, use procfs periodically.
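A tiny sketch of that periodic polling with the procfs crate linked above; the loop shape and the one-second interval are arbitrary choices for the example:

```rust
use std::time::Duration;

// Illustrative only: poll /proc/<pid>/stat through the procfs crate while the
// child is alive; utime/stime are clock ticks, rss is in pages, vsize in bytes.
fn poll_child_stats(pid: i32) -> procfs::ProcResult<()> {
    while let Ok(p) = procfs::process::Process::new(pid) {
        let stat = p.stat()?;
        println!(
            "pid={} utime={} stime={} rss={} vsize={}",
            pid, stat.utime, stat.stime, stat.rss, stat.vsize
        );
        std::thread::sleep(Duration::from_secs(1));
    }
    Ok(())
}
```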
Stats are still not implemented. ResUse is collected, but RequestVisor is not asking for stats.
This should be controlled by the configuration file:

```yaml
paths:
  /get/pets:
    get:
      validatein: false
      inject:
        wd: /src/scripts/php
        env:
          - MYENV=CI
        cmd: /usr/bin/echo {{jsonpayload}}
        timeout: 1000
        channel: "cmdline"
        encoding: json
      ## this would add the http header urocket-stats: {time: 1.2, mem: 123123, ...}
      stats: true
      logstdout: true
      validate-out: false
```
When `stats: true`, the response headers should contain something like:

```
urocket-stats: ResourceUsage { utime: 1.886ms, stime: 0ns, maxrss: 5505024 }
```
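A possible sketch of attaching that header with hyper's re-exported http types; the helper name and the idea of passing the rusage Debug string are assumptions, not the actual implementation:

```rust
use hyper::header::HeaderValue;
use hyper::Response;

// Illustrative only: add the urocket-stats header to an outgoing response
// when stats: true; the value is simply the Debug formatting of the rusage.
fn with_stats_header<B>(mut resp: Response<B>, rusage_debug: &str) -> Response<B> {
    if let Ok(value) = HeaderValue::from_str(rusage_debug) {
        resp.headers_mut().insert("urocket-stats", value);
    }
    resp
}
```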
Some more info should be collected, maybe stdout length, stderr length, start time, and so on.
One could also think about writing the stats to a file, or printing them out.
A more advanced (and maybe simpler) way to limit/control the resources of the spawned process would be to use cgroups, see https://crates.io/crates/cgroups-rs
It would be simpler because the only limit left for urocket to control becomes the timeout for the process. The other resources would be limited by the OS.
The aim of this project is a kind of "it works on my machines", meaning it does not have to work anywhere else but on some target systems: typically those supported by Kubernetes and configured with cgroups v2 (hierarchical).
And simpler is better, but a kind of ProcessDriver would have to be written ad hoc, one that plugs well into tokio async.
https://docs.rs/cgroups-rs/0.3.4/cgroups_rs/cgroup_builder/index.html
Looking at this, the sequence with cgroups_rs is:
- create the control group
- create the `Child` process
- insert the created `Child` into the created control group
Anyway this sequence should be quick enough.
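For illustration only, the same sequence done by hand against a cgroups v2 unified hierarchy (plain std, without cgroups-rs), assuming it is mounted at /sys/fs/cgroup, that the needed controllers are enabled in the parent's cgroup.subtree_control, and that the service may create groups there:

```rust
use std::fs;
use std::process::{Child, Command};

// Illustrative only: create the control group, set a limit, spawn the child,
// then move it into the group by writing its pid to cgroup.procs.
// The short window between spawn() and the cgroup.procs write is the
// "quick enough" sequence mentioned above.
fn spawn_in_cgroup(group: &str, cmd: &str) -> std::io::Result<Child> {
    let cg = format!("/sys/fs/cgroup/{group}");
    fs::create_dir_all(&cg)?; // 1. create the control group
    fs::write(format!("{cg}/memory.max"), "134217728")?; // e.g. 128 MiB hard limit

    let child = Command::new(cmd).spawn()?; // 2. create the Child process
    fs::write(format!("{cg}/cgroup.procs"), child.id().to_string())?; // 3. insert it
    Ok(child)
}
```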
Another option is to use https://docs.rs/controlgroup/latest/controlgroup/ which looks less maintained, or to implement a trait that extends `std::process::Command`, but I do not think it changes things much.
Also, things get complicated with hierarchical cgroups and the option to insert the `process::id()` into a main cgroup, then create cgroups based on some kind of policy.