MISTLab/tls-python-object

tls-python-object (tlspyo)

💻 🌐 💻

A library for easy and secure transfer of python objects over the network.


🚀 Quickstart guide 📜 API documentation

tlspyo provides a simple API to transfer python objects in a robust and safe way via TLS, between several machines (and/or processes) called Endpoints.

  • Each Endpoint belongs to one or more groups,
  • Arbitrarily many Endpoints connect to each other through a central Relay,
  • Each Endpoint can broadcast or produce python objects to the desired groups.

ℹ️ Please carefully read the Security section before using tlspyo anywhere other than your own secure private network.

Principle

tlspyo provides two classes: Relay and Endpoint.

  • The Relay is the center point of all communication between Endpoints,
  • An Endpoint is a node in your network. It connects to the Relay and belongs to one or more groups.

An Endpoint can, among other things:

  • broadcast python objects to whole groups of Endpoints,
  • retrieve the objects broadcast to the group(s) it is part of,
  • produce a single object that will be consumed by a single Endpoint of a target group,
  • notify the Relay that it is ready to consume a produced object, and wait until it receives one.

By default, tlspyo relies on Transport Layer Security (TLS) to secure object transfers over the network.
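To make the difference between broadcast and produce/notify concrete, here is a toy in-memory model of the routing rules listed above. It is only a sketch: ToyRelay and its methods are hypothetical names, there is no network or TLS involved, and it does not reflect how tlspyo is actually implemented. In particular, letting a notification wait for a later produce is our reading of "wait until it receives one".

```python
from collections import defaultdict, deque


class ToyRelay:
    """Hypothetical stand-in for the routing rules only (no network, no TLS)."""

    def __init__(self):
        self.members = defaultdict(list)    # group name -> inboxes of its endpoints
        self.produced = defaultdict(deque)  # group name -> shared FIFO of objects
        self.waiting = defaultdict(deque)   # group name -> inboxes awaiting an object

    def join(self, group):
        """Register a new endpoint in `group` and return its local inbox."""
        inbox = []
        self.members[group].append(inbox)
        return inbox

    def broadcast(self, obj, group):
        """Deliver `obj` to every endpoint of `group`."""
        for inbox in self.members[group]:
            inbox.append(obj)

    def produce(self, obj, group):
        """Give `obj` to one waiting endpoint, or queue it until someone notifies."""
        if self.waiting[group]:
            self.waiting[group].popleft().append(obj)
        else:
            self.produced[group].append(obj)

    def notify(self, inbox, group):
        """Request one produced object; wait in line if none is queued yet."""
        if self.produced[group]:
            inbox.append(self.produced[group].popleft())
        else:
            self.waiting[group].append(inbox)


relay = ToyRelay()
inbox_1 = relay.join("consumers")
inbox_2 = relay.join("consumers")

relay.broadcast("I HAVE BEEN BROADCAST", "consumers")
relay.produce("I HAVE BEEN PRODUCED", "consumers")
relay.notify(inbox_1, "consumers")

print(inbox_1)  # ['I HAVE BEEN BROADCAST', 'I HAVE BEEN PRODUCED']
print(inbox_2)  # ['I HAVE BEEN BROADCAST']
```

Both inboxes get the broadcast object, while the produced object goes only to the endpoint that asked for it.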

Example usage

from tlspyo import Relay, Endpoint

if __name__ == "__main__":

    # Create a relay to allow connectivity between endpoints

    re = Relay(
        port=3000,  # this must be the same on your Relay and Endpoints
        password="VerySecurePassword",  # must be the same on Relay and Endpoints, AND be strong
        local_com_port=3001  # needs to be non-overlapping if Relays/Endpoints are on the same machine
    )

    # Create an Endpoint in group "producers" (arbitrary name)

    prod = Endpoint(
        ip_server='127.0.0.1',  # IP of the Relay (here: localhost)
        port=3000,  # must be same port as the Relay
        password="VerySecurePassword",  # must be same (strong) password as the Relay
        groups="producers",  # this endpoint is part of the group "producers"
        local_com_port=3002
    )

    # Create a bunch of other Endpoints in group "consumers" (arbitrary name)

    cons_1 = Endpoint(
        ip_server='127.0.0.1',
        port=3000,
        password="VerySecurePassword",
        groups="consumers",  # this endpoint is part of group "consumers"
        local_com_port=3003
    )

    cons_2 = Endpoint(
        ip_server='127.0.0.1',
        port=3000,
        password="VerySecurePassword",
        groups="consumers",  # this endpoint is part of group "consumers"
        local_com_port=3004,
    )

    # Producer broadcasts an object to all Endpoints in the destination group "consumers"
    prod.broadcast("I HAVE BEEN BROADCAST", "consumers")

    # Producer sends an object to the shared queue of destination group "consumers"
    prod.produce("I HAVE BEEN PRODUCED", "consumers")

    # Consumer 1 notifies the Relay that it wants one produced object destined for "consumers"
    cons_1.notify("consumers")

    # Consumer 1 is able to retrieve the broadcast AND the produced object:
    res = []
    while len(res) < 2:
        res += cons_1.receive_all(blocking=True)
    print(f"Consumer 1 has received: {res}")

    # Consumer 2 is able to retrieve only the broadcast object:
    res = cons_2.receive_all(blocking=True)
    print(f"Consumer 2 has received: {res}")

    # Let us close everyone gracefully:
    prod.stop()
    cons_1.stop()
    cons_2.stop()
    re.stop()

Getting started

ℹ️ The machine hosting your Relay must be reachable from the machines hosting your Endpoints on the chosen port, at the IP address you pass as ip_server. When using tlspyo over the Internet, this typically requires configuring your router to forward that port to the local IP of the machine hosting your Relay.
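If you are unsure whether the port is actually reachable, a quick check from an Endpoint machine can save some debugging. The helper below is a sketch using only the standard library; can_reach is a hypothetical name, not part of tlspyo. It only tests that a TCP connection can be opened and says nothing about TLS or the password. Note that a running Relay will see this as a connection that is dropped immediately.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Replace with the public IP of your Relay and the port you chose.
    print(can_reach("127.0.0.1", 3000))
```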

Installation

From PyPI:

pip install tlspyo

TLS setup:

ℹ️ You can skip this section if you do not want to use TLS, for instance if you use tlspyo on your own private secure network. When using tlspyo over the Internet, you should of course use TLS (read the Security section if you do not understand why).

  • Generate TLS credentials:

tlspyo makes the process of generating your TLS credentials straightforward.

▶️ On the machine that will host your Relay, execute the following command line:

python -m tlspyo --generate

This will generate two files in the tlspyo/credentials data directory: key.pem and certificate.pem.

ℹ️ In case you wish to customize your TLS certificate, add the --custom option in the previous command line.

Now, you need to copy your certificate.pem to the machines that will host your Endpoints (note: you can skip the following steps if your Endpoints are on the same machine as your Relay).

This can be achieved via either of the following methods:

  • METHOD 1: manually copy the public certificate (more secure):

▶️ On the machines that will host your Endpoints, execute:

python -m tlspyo --credentials

This creates and displays the target folder where you need to copy the certificate.pem that you generated on the machine that will host the Relay (the source folder was displayed when you executed --generate).

  • METHOD 2: transfer the public certificate via TCP (not secure):

⚠️ This method is not secure. In particular, a man-in-the-middle can impersonate the certificate-broadcasting server and send you a fraudulent TLS certificate. Use with caution.

▶️ On the machine that will host your Relay, start a certificate-broadcasting server:

python -m tlspyo --broadcast --port=<port>

where <port> is a port through which other machines will attempt to retrieve your certificate via TCP.

▶️ On the machines that will host your Endpoints, execute:

python -m tlspyo --retrieve --ip=<ip> --port=<port>

where <ip> is the public IP of the certificate-broadcasting machine, and <port> is the same as previously.

And you are all set! 😎

You can now stop the certificate-broadcasting server by closing the terminal where it runs.
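Since METHOD 2 is not authenticated, one way to gain some confidence in a retrieved certificate is to compare its fingerprint on both machines over a channel you trust (in person, by phone, etc.). The sketch below uses only the standard library; pem_fingerprint and file_fingerprint are hypothetical helper names, and it assumes certificate.pem contains a single PEM-encoded certificate.

```python
import hashlib
import ssl
from pathlib import Path


def pem_fingerprint(pem_text):
    """Return the SHA-256 fingerprint (hex) of a PEM-encoded certificate."""
    # Convert the PEM text to its binary DER form, then hash it.
    der = ssl.PEM_cert_to_DER_cert(pem_text.strip())
    return hashlib.sha256(der).hexdigest()


def file_fingerprint(path):
    """Same as pem_fingerprint, reading the certificate from `path`."""
    return pem_fingerprint(Path(path).read_text())
```

Call file_fingerprint on the certificate.pem of the Relay machine and on the copy of each Endpoint machine (the folders are the ones displayed by --generate and --credentials). The hexadecimal strings must match exactly.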

A Simple Producer-Consumer Example

Let us now look at basic usage of tlspyo. In this example, we will create a Relay and three Endpoints (one producer and two consumers) on the same machine, and have them transfer objects via localhost. The full script for this example can be found here.

Import the Relay and Endpoint classes:

from tlspyo import Relay, Endpoint

Relay

Every tlspyo application requires a central Relay.

The Relay lives on a machine that can be reached by all Endpoints. Typically, you will want this machine to be accessible to your Endpoints via your private local network, or via the Internet through port forwarding. Note however that, before you make your Relay visible to the Internet via, e.g., port forwarding, it is important that you read the Security section.

Creating a Relay is straightforward:

# Initialize a relay to allow connectivity between endpoints

re = Relay(
    port=3000,  # this must be the same on your Relay and Endpoints
    password="VerySecurePassword",  # this must be the same on Relay and Endpoints, AND be strong
    local_com_port=3001,  # this needs to be non-overlapping if Relays/Endpoints live on the same machine
    security="TLS"  # this is the default; replace by None if you do not want to use TLS
)

As soon as your Relay is created, it is up and running. Behind the scenes, it is now waiting for TLS connections from Endpoints. This is done in a background process that listens to port 3000 in this example. This process also communicates with your Relay via local_com_port 3001 in this example.

Usually, you can ignore local_com_port and leave it to the default, unless you run several Relays/Endpoints on the same machine, as we do in this example.
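When many Relays/Endpoints share a machine, choosing non-overlapping values by hand gets tedious. One possible approach is to ask the operating system for unused ports, as sketched below. find_free_ports is a hypothetical helper, not part of tlspyo, and the sketch assumes local_com_port designates a TCP port on localhost. A port found this way can in principle be taken by another program before you use it.

```python
import socket


def find_free_ports(count):
    """Ask the OS for `count` distinct TCP ports that are currently unused."""
    sockets = []
    try:
        for _ in range(count):
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            s.bind(("127.0.0.1", 0))  # port 0 lets the OS choose a free port
            sockets.append(s)
        return [s.getsockname()[1] for s in sockets]
    finally:
        # Close only once all ports are picked, so that they are all distinct.
        for s in sockets:
            s.close()


relay_com_port, prod_com_port = find_free_ports(2)
print(relay_com_port != prod_com_port)  # True
```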

Endpoints

Now that our Relay is ready, let us create a bunch of Endpoints. This is also pretty straightforward:

# Initialize a producer endpoint

prod = Endpoint(
    ip_server='127.0.0.1', # IP of the Relay (here: localhost)
    port=3000, # must be same port as the Relay
    password="VerySecurePassword", # must be same (strong) password as the Relay
    groups="producers",  # this endpoint is part of the group "producers"
    local_com_port=3002,  # must be unique
    security="TLS"  # this is the default; replace by None if you do not want to use TLS
)

# Initialize consumer endpoints

cons_1 = Endpoint(
    ip_server='127.0.0.1',
    port=3000,
    password="VerySecurePassword",
    groups="consumers",  # this endpoint is part of group "consumers"
    local_com_port=3003,  # must be unique
    security="TLS"
) 

cons_2 = Endpoint(
    ip_server='127.0.0.1',
    port=3000,
    password="VerySecurePassword",
    groups="consumers",  # this endpoint is part of group "consumers"
    local_com_port=3004,  # must be unique
    security="TLS"
) 

A nice thing about tlspyo is that all communication is handled behind the scenes. The above calls have all launched processes in the background which handle connection and communication between Endpoints through the Relay.

Let us now send some objects from the producer to the consumers. As you may have noticed, we created two different groups here. We put the producer in a group that we have named "producers", and the consumers in another group that we have called "consumers". Note that an Endpoint can be part of any number of groups (the groups argument also accepts a list of strings). When communicating between Endpoints, you can use those groups to make sure the right Endpoints receive the right objects.

There are two ways for Endpoints to send objects in tlspyo:

  • Broadcasting is used to send an object to all Endpoints in a given group. Furthermore, when an Endpoint connects to the Relay, it receives the last object that was broadcast to each of its groups.

    # Producer broadcasts an object to all Endpoints in the destination group "consumers"
    prod.broadcast("I HAVE BEEN BROADCAST", "consumers")

  • Producing is used to send an object to a FIFO queue that is shared between all Endpoints of a given group. An Endpoint of the receiving group must notify the Relay to get an object from that shared queue.

    # Producer sends an object to the shared queue of destination group "consumers"
    prod.produce("I HAVE BEEN PRODUCED", "consumers")
    
    # Consumer notifies the Relay that it wants one produced object destined for "consumers"
    cons_1.notify("consumers")

Once objects reach the consumer endpoint, they are stored in a local queue from which you can retrieve objects whenever you want. To do this, there are multiple options:

  • To retrieve from the local queue in a FIFO fashion, use pop(blocking=blocking, max_items=max_items).
  • To retrieve the most recent item(s) in the local queue and discard the rest, use get_last(blocking=blocking, max_items=max_items).
  • To get all items that are currently in the local queue, use receive_all(blocking=blocking).

ℹ️ Notes:

  • All calls above return a list of objects. If no objects are returned, the result will be an empty list.
  • If blocking is True, all methods above will block until at least one item is received (defaults to False).
  • In pop and get_last, use max_items to specify a maximum number of items to be returned (defaults to 1).
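To see how the three retrieval methods differ, here is a toy model of the local queue that follows the descriptions above. ToyLocalQueue is a hypothetical class written for illustration, not tlspyo code. It ignores blocking, and returning the items of get_last oldest first is an assumption on our part.

```python
from collections import deque


class ToyLocalQueue:
    """Hypothetical model of an Endpoint's local queue (no blocking)."""

    def __init__(self, items=()):
        self._items = deque(items)

    def pop(self, max_items=1):
        """Remove and return up to `max_items` of the oldest items."""
        n = min(max_items, len(self._items))
        return [self._items.popleft() for _ in range(n)]

    def get_last(self, max_items=1):
        """Return up to `max_items` of the newest items and discard the rest."""
        kept = list(self._items)[-max_items:] if max_items > 0 else []
        self._items.clear()
        return kept

    def receive_all(self):
        """Remove and return everything currently in the queue."""
        items = list(self._items)
        self._items.clear()
        return items


q = ToyLocalQueue(["a", "b", "c", "d"])
print(q.pop())          # ['a']
print(q.get_last())     # ['d']  ("b" and "c" are discarded)
print(q.receive_all())  # []
```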

Now, let our consumers retrieve their loot:

# Consumer 1 is able to retrieve the broadcast AND the produced object:
res = []
while len(res) < 2:
    res += cons_1.receive_all(blocking=True)
print(f"Consumer 1 has received: {res}")

# Consumer 2 is able to retrieve only the broadcast object:
res = cons_2.receive_all(blocking=True)
print(f"Consumer 2 has received: {res}")

which prints:

Consumer 1 has received: ['I HAVE BEEN BROADCAST', 'I HAVE BEEN PRODUCED']
Consumer 2 has received: ['I HAVE BEEN BROADCAST']

Once we are done, we can stop all Endpoints, and then the Relay for the sake of a graceful exit:

# Let us close everyone gracefully:
prod.stop()
cons_1.stop()
cons_2.stop()
re.stop()

There you go! You have now sent your first object over the network using tlspyo.

Please check out the API documentation for more advanced usage.

Security

DISCLAIMER

We are doing our best to make tlspyo reasonably secure when used correctly, but we provide ABSOLUTELY NO GUARANTEE that it is. We are a small open-source community, and we greatly appreciate your contributions to tackle any security concern or important missing information. Please submit a detailed issue if you are aware of any important exploit not covered in this section.

Implementation

tlspyo relies on the Twisted framework regarding TLS implementation and network management.

Important to know

⚠️ Objects transferred by tlspyo are serialized with pickle by default, so that you can transfer most python objects easily.

NEVER TRANSFER PICKLED OBJECTS OVER A PUBLIC NETWORK WITHOUT tlspyo, as this would make you vulnerable to dangerous exploits. This is because unpickling untrusted pickled objects (i.e., pickled objects created by a malicious user) can lead to arbitrary code execution on your machine.
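If you are wondering how unpickling can run code, the harmless demonstration below may help. It relies on the standard `__reduce__` hook of pickle, which lets the sender name a callable that the receiver will call while loading. Payload is a hypothetical class written for illustration, and `len` stands in for whatever an attacker would really call.

```python
import pickle


class Payload:
    """Stands in for an object crafted by a malicious sender."""

    def __reduce__(self):
        # Tells pickle: "to rebuild me, call len('surprise')".
        # An attacker would name a far more harmful callable here.
        return (len, ("surprise",))


untrusted_bytes = pickle.dumps(Payload())

# The receiver only calls pickle.loads, yet the sender's chosen call is executed:
result = pickle.loads(untrusted_bytes)
print(result)  # 8
```

The receiver gets back the result of the call (8), not a Payload instance, which shows that the sender controls what runs during loading.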

To prevent this from happening, tlspyo provides two interdependent layers of security:

  • Endpoints authenticate your Relay via TLS, which must use your own secret key and public certificate. This ensures your Endpoints are indeed talking to your Relay and not to some man-in-the-middle, provided you keep your secret key secure. This also prevents anyone else from eavesdropping thanks to TLS encryption.
  • Every object transfer is protected by a password known to both the Relay and the Endpoints (the password argument). No object is deserialized without verification of the password. This ensures that anyone posing as an endpoint will never be able to send undesired objects through your relay unless they know your password.

If a malicious user successfully posed as your Relay, your Endpoint would send them messages that they could decrypt, including your password (this is prevented by TLS when using your own secret key and public certificate). If they successfully posed as your Endpoint they could send malicious pickled objects to your Relay (this is prevented by them not knowing your password).

In a nutshell, when using tlspyo you want your password to be as strong as possible, and your TLS secret key to be kept... well, secret 🔒
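A simple way to get a strong password is to let the standard `secrets` module generate it, as in the sketch below. make_password is a hypothetical helper name, and 32 random bytes is an arbitrary but generous choice.

```python
import secrets


def make_password(num_bytes=32):
    """Return a random URL-safe string built from `num_bytes` of OS randomness."""
    return secrets.token_urlsafe(num_bytes)


password = make_password()
print(len(password))  # 43 characters for 32 random bytes
```

Generate the password once, hand it to the machines hosting your Relay and Endpoints over a channel you trust, and pass it as the password argument. Avoid committing it to version control with your scripts.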

For safety-critical applications, we recommend you ditch pickle altogether and instead code a secure custom serialization protocol, on top of the TLS layer provided by tlspyo.

Custom serialization

By default, tlspyo uses pickle for serialization and relies on TLS to prevent attacks.

In advanced applications, you may want to use another serialization protocol instead. For instance, you may want to transfer non-picklable objects, further harden the security of your application, or simply use a pickle protocol of your choice instead of your Python version's default.

In particular, in security=None mode (i.e., with TLS disabled) over a public network, using your own secure serialization protocol is critical.

tlspyo makes this easy. All you need to do is code your own serialization protocol following the pickle.dumps/pickle.loads signature, and pass it to the serializer/deserializer arguments of both your Relay and Endpoints.

For instance:

import pickle as pkl
from tlspyo import Relay, Endpoint

# We define a custom serialization protocol based on pickle for simplicity.
# Of course, this is only for illustration.
# In practice, you may not want to use pickle here.


def my_custom_serializer(obj):
    """
    Takes a python object as input and outputs a bytestring
    """
    return b"header" + pkl.dumps(["TEST", pkl.dumps(obj)])


def my_custom_deserializer(bytestring):
    """
    Takes a bytestring as input and outputs a python object
    """
    assert len(bytestring) > len(b"header")
    assert bytestring[:len(b"header")] == b"header"
    bytestring = bytestring[len(b"header"):]
    tmp = pkl.loads(bytestring)
    assert isinstance(tmp, list)
    assert len(tmp) == 2
    assert tmp[0] == "TEST"
    obj = pkl.loads(tmp[1])
    return obj


if __name__ == '__main__':

    re = Relay(
        port=3000,
        password="VerySecurePassword",
        local_com_port=3001,
        security="TLS",
        serializer=my_custom_serializer,
        deserializer=my_custom_deserializer
    )

    ep = Endpoint(
        ip_server='127.0.0.1',
        port=3000,
        password="VerySecurePassword",
        groups="group1",
        local_com_port=3002,
        security="TLS",
        serializer=my_custom_serializer,
        deserializer=my_custom_deserializer
    )
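The example above still relies on pickle. As a pickle-free variant, a serializer/deserializer pair based on the standard `json` module could look like the sketch below. json_serializer and json_deserializer are hypothetical names, and the sketch assumes that tlspyo hands your serializer exactly the object you send, which you should verify with your version of the library. It only supports JSON-compatible objects (dicts with string keys, lists, strings, numbers, booleans and None), and tuples come back as lists.

```python
import json


def json_serializer(obj):
    """Takes a JSON-compatible python object as input and outputs a bytestring."""
    return json.dumps(obj).encode("utf-8")


def json_deserializer(bytestring):
    """Takes a bytestring as input and outputs a python object.

    Unlike pickle.loads, json.loads only builds plain data and never
    calls code chosen by the sender.
    """
    return json.loads(bytestring.decode("utf-8"))


message = {"step": 3, "weights": [0.1, 0.2], "done": False}
assert json_deserializer(json_serializer(message)) == message
```

These two functions would then be passed to the serializer and deserializer arguments of the Relay and of every Endpoint, exactly like my_custom_serializer and my_custom_deserializer above.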

External links

tlspyo is an open-source project hosted at Polytechnique Montreal - MISTlab. We use it in various projects, ranging from parallel meta-learning to data transfer between multiple learning robots.

tlspyo relies on Twisted to manage network robustness and security.

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

Distributed under the MIT License. See LICENSE.txt for more information.