POSH

Posh, the "Process Offload Shell", is a shell and runtime that automatically reduces data movement when running shell pipelines on data stored in remote storage, such as NFS. Posh speeds up I/O-heavy shell applications that access remote filesystems by pushing computation to proxy servers closer to the data. Posh uses annotations, which are metadata about individual shell commands (awk, cat, grep, etc.), to determine which files an arbitrary shell pipeline will access. It then schedules and executes the commands across proxy servers, so that the result looks as if the pipeline had run locally. For more details, check out our research paper, POSH: A Data Aware Shell, which will be published at USENIX ATC 2020. This implementation is a research prototype -- use at your own risk!

Posh includes a server binary that runs at a proxy server, which must have access to the same remote filesystem data that the client is trying to access.
Posh includes two client binaries that intercept shell commands and schedule and execute them across the proxy servers:
- The first executes shell scripts (with one or more commands) over Posh.
- The second starts a shell prompt and lets the user type in individual commands.

General setup.

The client and server must communicate over a custom port, which can be configured in both the server and client binaries; the default is 1235. The server must keep this port open for TCP traffic.
The client and server binaries require a directory to store temporary output while processes are running.
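
As a minimal sketch of this setup, the snippet below creates the temporary directory and defines a reachability check for the Posh port. The names `POSH_TMP`, `POSH_PORT`, and `posh_port_open` are hypothetical and chosen only for illustration, and the check assumes a netcat (`nc`) build that supports `-z` and `-w`.

```shell
# POSH_TMP, POSH_PORT, and posh_port_open are hypothetical names, not part of Posh.
POSH_TMP=${POSH_TMP:-/tmp/posh}
POSH_PORT=${POSH_PORT:-1235}   # Posh's documented default port

# Create the temporary directory (needed on the client and on each proxy server).
mkdir -p "$POSH_TMP"

# Run from the client: succeeds if the given host accepts TCP connections
# on the Posh port. Assumes an nc build that supports -z (scan only) and -w (timeout).
posh_port_open() {
    nc -z -w 2 "$1" "${2:-$POSH_PORT}"
}
```

For example, `posh_port_open $SERVER_IP && echo reachable` (with `SERVER_IP` set to a proxy server's address) reports whether the server's port is open from the client.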

Posh proxy server program

Each proxy server provides access to one remote folder on behalf of a client. The proxy server can be the remote storage server itself (storing the data locally), or it can access the data over NFS or another remote filesystem protocol. Run the following at the proxy server:

$POSH_SRC/target/release/server 
    --folder <client_folder> # folder this Proxy provides access to, required
    --ip_address <ip_addr> # ip address of the client, required
    --runtime_port <runtime_port> # port server has open for all Posh communication, default = 1235
    --tmpfile <path/to/temporary/directory> # place for Posh to keep temporary output while running commands, required

Posh client program

The Posh client shell requires an annotations file and a configuration file to understand and schedule shell commands. The "Client configuration file" and "Annotations" sections below describe what these files must contain.

To run the shell script binary, run:

$POSH_SRC/target/release/shell-exec
    <binary> # shell script to run over Posh, required
    --annotations_file <path> # path to annotations, required
    --mount_file <path> # path to config file, required
    --pwd <directory> # directory to execute this script from, required
    --tmpfile <path/to/temporary/directory> # place for Posh to keep temporary output while running commands, required
    --runtime_port <runtime_port> # port to communicate with server with, default = 1235
    --splitting_factor <splitting factor> # parallelization factor, default = 1
    --tracing_level <tracing_level> # log debug output, default = none

To run the shell prompt binary, run:

$POSH_SRC/target/release/shell-client
    --annotations_file <path> # path to annotations, required
    --mount_file <path> # path to config file, required
    --tmpfile <path/to/temporary/directory> # place for Posh to keep temporary output while running commands, required
    --runtime_port <runtime_port> # port to communicate with server with, default = 1235
    --splitting_factor <splitting factor> # parallelization factor, default = 1
    --tracing_level <tracing_level> # log debug output, default = none

Syntax allowed:
- Posh can accelerate commands with standard shell syntax, including pipes (|), and stdin, stdout and stderr redirections (<, >, 2>)
- Posh allows export commands (e.g. export VAR=VALUE) to configure environment variables within scripts
- We are working on including more standard syntax.
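
To illustrate the syntax listed above, here is a small sketch that writes a script using only `export`, a pipe, and the `<`, `>`, and `2>` redirections. The file names (`posh_example.sh`, `matches.txt`, `errors.txt`, `sorted.txt`) and the `mount1`/`mount2` paths are assumptions for illustration. Whether a given command (for example `sort`) is actually offloaded depends on the annotations you supply.

```shell
# Write a hypothetical script that sticks to the syntax listed above.
cat > posh_example.sh <<'EOF'
export LC_ALL=C
cat mount1/log.csv mount2/log.csv | grep "127.0.0.1" > matches.txt 2> errors.txt
sort < matches.txt > sorted.txt
EOF
```

The resulting `posh_example.sh` is the kind of file you would pass as the script argument to the shell script binary.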

Client configuration file

A sample config file is provided in config/sample.config. To use Posh, edit the lines under mounts with your configuration information.
The config file has up to 3 parts. Part 1 is required, while parts 2 and 3 are only necessary for experimental features.
1. [Required] A list of mounts, i.e. a list of IPs for proxy servers, each mapped to the corresponding remote directory mounted at the client, which must be an absolute path. For example:
```
  mounts:
      "255.255.255.0": "/home/user/remote_mount1"
      "255.255.255.1": "/home/user/remote_mount2"
```
2. [Optional] A list of rough link speeds between different proxies, where the client is included as a local proxy. This is used for an experimental scheduling algorithm. For example:
```
  links:
      "(255.255.255.0,client)": 500 # in Mbps
      "(255.255.255.1,client)": 500 # in Mbps
```
3. [Optional] A list of temporary file locations on each proxy server that Posh can write to.
```
  tmp_directory:
      "255.255.255.1": "/tmp/posh"
```
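
The required mounts section can also be generated rather than written by hand. The sketch below is a hypothetical helper (`posh_mounts_config` is not part of Posh) that prints a mounts section from `IP=/path` pairs and rejects relative paths, since mount directories must be absolute. The exact indentation is an assumption based on the samples above.

```shell
# posh_mounts_config is a hypothetical helper, not part of Posh.
# Usage: posh_mounts_config IP=/absolute/mount [IP=/absolute/mount ...]
posh_mounts_config() {
    # Reject relative mount paths before printing anything.
    for pair in "$@"; do
        case ${pair#*=} in
            /*) ;;
            *) printf 'mount path must be absolute: %s\n' "${pair#*=}" >&2
               return 1 ;;
        esac
    done
    # Emit one quoted "IP": "path" entry per argument.
    printf 'mounts:\n'
    for pair in "$@"; do
        printf '    "%s": "%s"\n' "${pair%%=*}" "${pair#*=}"
    done
}
```

For example, `posh_mounts_config 10.0.0.1=/home/user/mount1 10.0.0.2=/home/user/mount2 > my.config` writes a two-entry mounts section; the IP addresses and the `my.config` file name here are placeholders.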

Annotations

Sample annotations are provided in config/eval_annotations.txt
See ANNOTATIONS.md for more information on the annotation format and adding your own annotations.

Example usage

Here, we describe the end-to-end steps for running a program over Posh, using a simple pipeline that runs cat on files from two different NFS mounts and pipes the output to grep. A proxy server will run directly on each NFS server.

Configure NFS at the client and servers so that the two servers expose NFS mounts to the client. In this example, each NFS server hosts the shared directory at /mnt/logs, and the client mounts them at /home/user/mount1 and /home/user/mount2, respectively.

The data we'll be using is network access logs from the SEC's EDGAR log dataset. Download two sample logs, one to each mount, from the EDGAR log website:

# (at the first NFS server)
wget http://www.sec.gov/dera/data/Public-EDGAR-log-file-data/2017/Qtr1/log20170314.zip 
unzip log20170314.zip
mv log20170314.csv /mnt/logs/log.csv

# (at the second NFS server)
wget http://www.sec.gov/dera/data/Public-EDGAR-log-file-data/2017/Qtr1/log20170315.zip 
unzip log20170315.zip
mv log20170315.csv /mnt/logs/log.csv

At each of the proxy servers, run the following command (substituting the client IP address):

$POSH_SRC/target/release/server  --folder /mnt/logs --ip_address $CLIENT_IP --tmpfile /tmp/posh

Edit the client configuration file, as described in the section above, so that it looks like the following:

    mounts:
        "FIRST_SERVER_IP": "/home/user/mount1"
        "SECOND_SERVER_IP": "/home/user/mount2"

Run the following at the client:

cd /home/user
$POSH_SRC/target/release/shell-client --annotations_file $POSH_SRC/config/eval_annotations.txt --mount_file $POSH_SRC/config/sample.config --tmpfile /tmp/posh

At the resulting prompt, type in:

cat mount1/log.csv mount2/log.csv | grep "127.0.0.1"

The result should show up faster than with bash, because Posh offloads a cat | grep command to each proxy server and only aggregates the output, in the correct order, at the client.
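
The same pipeline can also be run non-interactively with the shell script binary. The following is a sketch under stated assumptions: `posh_exec` and the `POSH_EXEC` override are hypothetical names introduced here, `pipeline.sh` is an arbitrary file name, and `POSH_SRC` is assumed to point at a built checkout. Only flags documented above are used.

```shell
# posh_exec is a hypothetical wrapper around the shell-exec binary.
# POSH_EXEC is a hypothetical override, e.g. POSH_EXEC=echo for a dry run.
posh_exec() {
    "${POSH_EXEC:-$POSH_SRC/target/release/shell-exec}" "$1" \
        --annotations_file "$POSH_SRC/config/eval_annotations.txt" \
        --mount_file "$POSH_SRC/config/sample.config" \
        --pwd "$PWD" \
        --tmpfile /tmp/posh
}

# Save the pipeline from the prompt example as a one-line script.
printf '%s\n' 'cat mount1/log.csv mount2/log.csv | grep "127.0.0.1"' > pipeline.sh

# With the servers running, from /home/user:
# posh_exec pipeline.sh
```

Setting `POSH_EXEC=echo` prints the command line instead of running it, which is a quick way to check the arguments before contacting the proxy servers.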
