Please note that Memphis.dev is no longer officially supported by the Superstream team (formerly Memphis.dev) and has been released to the public.
Memphis.dev is a highly scalable, painless, and effortless data streaming platform.
Made to enable developers and data teams to collaborate and build
real-time and streaming apps fast.
$ pip3 install memphis-py
Note: you may receive an error about the "mmh3" package. To resolve it, install python3-devel:
$ sudo yum install python3-devel
from memphis import Memphis, Headers
from memphis.types import Retention, Storage
import asyncio
First, create a Memphis object, then connect to Memphis using memphis.connect.
async def main():
    try:
        memphis = Memphis()
        await memphis.connect(
            host="<memphis-host>",
            username="<application-type username>",
            account_id=<account_id>, # You can find it on the profile page in the Memphis UI. Send this field only on the cloud version of Memphis; otherwise it is ignored
            connection_token="<broker-token>", # you will get it on application-type user creation
            password="<string>", # depends on how Memphis is deployed - default is connection token-based authentication
            port=<port>, # defaults to 6666
            reconnect=True, # defaults to True
            max_reconnect=10, # defaults to 10
            reconnect_interval_ms=1500, # defaults to 1500
            timeout_ms=2000, # defaults to 2000
            # for TLS connection:
            key_file='<key-client.pem>',
            cert_file='<cert-client.pem>',
            ca_file='<rootCA.pem>'
        )
        ...
    except Exception as e:
        print(e)
    finally:
        await memphis.close()

if __name__ == '__main__':
    asyncio.run(main())
async def connect(
self,
host: str,
username: str,
account_id: int = 1, # Cloud use only, ignored otherwise
connection_token: str = "", # JWT token given when creating client accounts
password: str = "", # For password-based connections
port: int = 6666,
reconnect: bool = True,
max_reconnect: int = 10,
reconnect_interval_ms: int = 1500,
timeout_ms: int = 2000,
# For TLS connections:
cert_file: str = "",
key_file: str = "",
ca_file: str = "",
)
The connect function of the Memphis class opens a connection to Memphis. A connection to Memphis (cloud or open-source) is required before any other functionality of the Memphis class can be used.
The arguments passed to Memphis.connect depend on the type of connection being made.
For details on deploying Memphis open-source with different types of connections, see the docs.
A password-based connection would look like this (using the default root Memphis login with Memphis open-source):
# Imports hidden. See other examples
async def main():
    try:
        memphis = Memphis()
        await memphis.connect(
            host = "localhost",
            username = "root",
            password = "memphis",
            # port = 6666, default port
            # reconnect = True, default reconnect setting
            # max_reconnect = 10, default number of reconnect attempts
            # reconnect_interval_ms = 1500, default reconnect interval
            # timeout_ms = 2000, default duration of time for the connection to timeout
        )
    except Exception as e:
        print(e)
    finally:
        await memphis.close()

if __name__ == '__main__':
    asyncio.run(main())
To connect to Memphis cloud instead, add your account ID and change the host. The host and account_id can be found on the Overview page in the Memphis cloud UI under your name at the top. Here is an example of connecting to a cloud broker located in US East:
# Imports hidden. See other examples
async def main():
    try:
        memphis = Memphis()
        await memphis.connect(
            host = "aws-us-east-1.cloud.memphis.dev",
            username = "my_client_username",
            password = "my_client_password",
            account_id = 123456789,  # an int, matching the connect signature
            # port = 6666, default port
            # reconnect = True, default reconnect setting
            # max_reconnect = 10, default number of reconnect attempts
            # reconnect_interval_ms = 1500, default reconnect interval
            # timeout_ms = 2000, default duration of time for the connection to timeout
        )
    except Exception as e:
        print(e)
    finally:
        await memphis.close()

if __name__ == '__main__':
    asyncio.run(main())
It is possible to use a token-based connection to memphis as well, where multiple users can share the same token to connect to memphis. Here is an example of using memphis.connect with a token:
# Imports hidden. See other examples
async def main():
    try:
        memphis = Memphis()
        await memphis.connect(
            host = "localhost",
            username = "user",
            connection_token = "token",
            # port = 6666, default port
            # reconnect = True, default reconnect setting
            # max_reconnect = 10, default number of reconnect attempts
            # reconnect_interval_ms = 1500, default reconnect interval
            # timeout_ms = 2000, default duration of time for the connection to timeout
        )
    except Exception as e:
        print(e)
    finally:
        await memphis.close()

if __name__ == '__main__':
    asyncio.run(main())
The token will be presented when creating new users.
Memphis needs to be configured to use a token-based connection. See the docs for help doing this.
For the rest of the examples, the try-except statement and the asyncio runtime call are omitted to keep the examples short.
A TLS-based connection would look like this:
# Imports hidden. See other examples
try:
    memphis = Memphis()
    await memphis.connect(
        host = "localhost",
        username = "user",
        key_file = "~/tls_file_path.key",
        cert_file = "~/tls_cert_file_path.crt",
        ca_file = "~/tls_ca_file_path.crt",
        # port = 6666, default port
        # reconnect = True, default reconnect setting
        # max_reconnect = 10, default number of reconnect attempts
        # reconnect_interval_ms = 1500, default reconnect interval
        # timeout_ms = 2000, default duration of time for the connection to timeout
    )
except Exception as e:
    print(e)
finally:
    await memphis.close()
Memphis needs to be configured for these use cases. To configure Memphis to use TLS, see the docs.
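The connection variants above differ only in which keyword arguments are passed to connect. As a rough sketch, the choice can be gathered in one place. The helper name build_connect_kwargs and its tls_files argument are hypothetical and not part of the SDK; only the keyword names it emits come from the connect signature shown earlier.

```python
from typing import Any, Dict, Optional


def build_connect_kwargs(
    host: str,
    username: str,
    password: Optional[str] = None,
    connection_token: Optional[str] = None,
    account_id: Optional[int] = None,
    tls_files: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments for Memphis.connect (hypothetical helper)."""
    kwargs: Dict[str, Any] = {"host": host, "username": username}
    if password:
        kwargs["password"] = password  # password-based connection
    if connection_token:
        kwargs["connection_token"] = connection_token  # token-based connection
    if account_id is not None:
        kwargs["account_id"] = account_id  # cloud only
    if tls_files:
        # Expects the keys key_file, cert_file and ca_file.
        for name in ("key_file", "cert_file", "ca_file"):
            kwargs[name] = tls_files[name]
    return kwargs
```

With a helper like this, a password-based connection could be opened with await memphis.connect(**build_connect_kwargs("localhost", "root", password="memphis")).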
To disconnect from Memphis, call close() on the memphis object.
await memphis.close()
Stations are distributed units that store messages. Producers add messages to stations and Consumers take messages from them. Each station stores messages until its retention policy either deletes them or moves them to remote storage.
A station will be automatically created for the user when a consumer or producer is used if no station with the given station name exists.
If the station already exists when this function is called, nothing will change with the existing station.
async def station(
self,
name: str,
retention_type: Retention = Retention.MAX_MESSAGE_AGE_SECONDS, # MAX_MESSAGE_AGE_SECONDS/MESSAGES/BYTES/ACK_BASED(cloud only). Defaults to MAX_MESSAGE_AGE_SECONDS
retention_value: int = 3600, # defaults to 3600
storage_type: Storage = Storage.DISK, # Storage.DISK/Storage.MEMORY. Defaults to DISK
replicas: int = 1,
idempotency_window_ms: int = 120000, # defaults to 2 minutes
schema_name: str = "", # defaults to "" (no schema)
send_poison_msg_to_dls: bool = True, # defaults to true
send_schema_failed_msg_to_dls: bool = True, # defaults to true
tiered_storage_enabled: bool = False, # defaults to false
partitions_number: int = 1, # defaults to 1
dls_station: str = "", # defaults to "" (no DLS station). If given, both poison and schema failed events will be sent to the DLS
)
The station function is used to create a station. Using the different arguments, one can programmatically create many different types of stations. The Memphis UI can also be used to create stations to the same effect.
Creating a station with just a name would create a station with that name and the default options provided above:
memphis = Memphis()
await memphis.connect(...)
await memphis.station(
name = "my_station"
)
To change what criteria the station uses to decide if a message should be retained in the station, change the retention type. The different types of retention are documented here in the python README.
The unit of the retention value will vary depending on the retention_type. The previous link also describes what units will be used.
Here is an example of a station which will only hold up to 10 messages:
memphis = Memphis()
await memphis.connect(...)
await memphis.station(
name = "my_station",
retention_type = Retention.MESSAGES,
retention_value = 10
)
Memphis stations can either store Messages on disk or in memory. A comparison of those types of storage can be found here.
Here is an example of how to create a station that uses Memory as its storage type:
memphis = Memphis()
await memphis.connect(...)
await memphis.station(
name = "my_station",
storage_type = Storage.MEMORY
)
In order to make a station more redundant, replicas can be used. Read more about replicas here. Note that replicas are only available in cluster mode. Cluster mode can be enabled in the Helm settings when deploying Memphis with Kubernetes.
Here is an example of creating a station with 3 replicas:
memphis = Memphis()
await memphis.connect(...)
await memphis.station(
name = "my_station",
replicas = 3
)
Idempotency defines how Memphis will prevent duplicate messages from being stored or consumed. The duration of time the message IDs will be stored in the station can be set with idempotency_window_ms. If the environment Memphis is deployed in has an unreliable connection and/or a lot of latency, increasing this value might be desirable. The default duration is two minutes. Read more about idempotency here.
Here is an example of changing the idempotency window to 3 minutes:
memphis = Memphis()
await memphis.connect(...)
await memphis.station(
name = "my_station",
idempotency_window_ms = 180000
)
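To make the idea of an idempotency window concrete, here is a minimal, self-contained sketch of a duplicate filter that remembers message IDs for a fixed number of milliseconds. It is only an illustration of the concept. The class IdempotencyWindow is hypothetical and is not how the Memphis broker implements this feature.

```python
import time
from typing import Dict, Optional


class IdempotencyWindow:
    """Toy duplicate filter: remembers message IDs for window_ms milliseconds."""

    def __init__(self, window_ms: int = 120000) -> None:
        self.window_ms = window_ms
        self._seen: Dict[str, float] = {}  # msg_id -> time first seen, in ms

    def accept(self, msg_id: str, now_ms: Optional[float] = None) -> bool:
        """Return True if the message should be stored, False if it is a duplicate."""
        now = time.time() * 1000 if now_ms is None else now_ms
        # Forget IDs whose window has expired.
        self._seen = {k: t for k, t in self._seen.items() if now - t < self.window_ms}
        if msg_id in self._seen:
            return False
        self._seen[msg_id] = now
        return True
```

A longer window catches retries that arrive late, at the cost of remembering more IDs, which is the trade-off behind raising idempotency_window_ms on slow or unreliable networks.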
The schema name is used to set a schema to be enforced by the station. The default value of "" ensures that no schema is enforced. Here is an example of changing the schema to a defined schema in schemaverse called "sensor_logs":
memphis = Memphis()
await memphis.connect(...)
await memphis.station(
name = "my_station",
schema_name = "sensor_logs"
)
There are two parameters for sending messages to the dead-letter station (DLS): send_poison_msg_to_dls and send_schema_failed_msg_to_dls.
Here is an example of sending poison messages to the DLS but not messages which fail to conform to the given schema.
memphis = Memphis()
await memphis.connect(...)
await memphis.station(
name = "my_station",
schema_name = "sensor_logs",
send_poison_msg_to_dls = True,
send_schema_failed_msg_to_dls = False
)
When either of the DLS flags is set to True, a station can also be set to handle these events. To set the station to which schema failed or poison messages will be sent, use the dls_station parameter:
memphis = Memphis()
await memphis.connect(...)
await memphis.station(
name = "my_station",
schema_name = "sensor_logs",
send_poison_msg_to_dls = True,
send_schema_failed_msg_to_dls = False,
dls_station = "bad_sensor_messages_station"
)
When the retention value is met, Memphis by default will delete old messages. If tiered storage is set up, Memphis can instead move messages to tier 2 storage. Read more about tiered storage here. Enable this setting with the respective flag:
memphis = Memphis()
await memphis.connect(...)
await memphis.station(
name = "my_station",
tiered_storage_enabled = True
)
Partitioning might be useful for a station. To have a station partitioned, simply change the partitions number:
memphis = Memphis()
await memphis.connect(...)
await memphis.station(
name = "my_station",
partitions_number = 3
)
Retention types define the methodology behind how a station behaves with its messages. Memphis currently supports the following retention types:
memphis.types.Retention.MAX_MESSAGE_AGE_SECONDS
When the retention type is set to MAX_MESSAGE_AGE_SECONDS, messages will persist in the station for the number of seconds specified in the retention_value.
memphis.types.Retention.MESSAGES
When the retention type is set to MESSAGES, the station will only hold up to retention_value messages. The station will delete the oldest messages to maintain a retention_value number of messages.
memphis.types.Retention.BYTES
When the retention type is set to BYTES, the station will only hold up to retention_value bytes. The oldest messages will be deleted in order to keep at most retention_value bytes in the station.
memphis.types.Retention.ACK_BASED # for cloud users only
When the retention type is set to ACK_BASED, messages in the station will be deleted after they are acked by all subscribed consumer groups.
The unit of the retention_value changes depending on the retention_type specified.
All retention values are of type int. The following units are used based on the respective retention type:
memphis.types.Retention.MAX_MESSAGE_AGE_SECONDS is in seconds,
memphis.types.Retention.MESSAGES is a number of messages,
memphis.types.Retention.BYTES is a number of bytes.
With memphis.types.Retention.ACK_BASED, the retention_value is ignored.
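The following sketch simulates how the three value-based retention types described above decide which messages stay in a station. It is an illustration under stated assumptions, not broker code: the Retention enum here is a local stand-in that only mirrors the names in memphis.types.Retention, and apply_retention is a hypothetical function.

```python
from enum import Enum
from typing import List, Tuple


class Retention(Enum):
    # Local stand-in mirroring the names in memphis.types.Retention.
    MAX_MESSAGE_AGE_SECONDS = "age"
    MESSAGES = "messages"
    BYTES = "bytes"


Message = Tuple[float, bytes]  # (timestamp in seconds, payload)


def apply_retention(msgs: List[Message], retention_type: Retention,
                    retention_value: int, now: float) -> List[Message]:
    """Return the messages (oldest first) that survive the retention policy."""
    if retention_type is Retention.MAX_MESSAGE_AGE_SECONDS:
        # retention_value is a number of seconds.
        return [m for m in msgs if now - m[0] <= retention_value]
    if retention_type is Retention.MESSAGES:
        # retention_value is a number of messages; keep the newest ones.
        return msgs[-retention_value:] if retention_value > 0 else []
    # BYTES: drop the oldest messages until the total payload size fits.
    kept = list(msgs)
    while kept and sum(len(p) for _, p in kept) > retention_value:
        kept.pop(0)
    return kept
```

The same integer retention_value is interpreted as seconds, a message count, or a byte count depending on the retention type passed alongside it.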
Memphis currently supports the following types of messages storage:
memphis.types.Storage.DISK
When storage is set to DISK, messages are stored on disk.
memphis.types.Storage.MEMORY
When storage is set to MEMORY, messages are stored in the system memory.
Destroying a station will remove all its resources (including producers/consumers)
await station.destroy()
If the schema already exists, a new version will be created.
await memphis.create_schema("<schema-name>", "<schema-type>", "<schema-file-path>")
Current available schema types - Protobuf / JSON schema / GraphQL schema / Avro
async def enforce_schema(self, name, station_name)
To add a schema to an already created station, enforce_schema can be used. Here is an example using enforce_schema to add a schema to a station:
memphis = Memphis()
await memphis.connect(...)
await memphis.enforce_schema(
    name = "my_schema",
    station_name = "my_station"
)
await memphis.attach_schema("<schema-name>", "<station-name>")
async def detach_schema(self, station_name)
To remove a schema from an already created station, detach_schema can be used. Here is an example of removing a schema from a station:
memphis = Memphis()
await memphis.connect(...)
await memphis.detach_schema(
    station_name = "my_station"
)
The most common client operations are using produce to send messages and consume to receive messages.
Messages are published to a station with a Producer and consumed from it by a Consumer.
Consumers are poll based and consume all the messages in a station. Consumers can also be grouped into consumer groups. When consuming with a consumer group, all consumers in the group will receive each message.
Memphis messages are payload agnostic. Payloads are always bytearrays.
To stop receiving messages, call consumer.destroy(). Destroy will terminate the consumer even if messages are currently being sent to the consumer.
If a station is created with more than one partition, producing to and consuming from the station will happen in a round robin fashion.
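As a small illustration of round-robin distribution, the sketch below assigns a list of messages to partitions in rotation. It only demonstrates the pattern. The function distribute_round_robin is hypothetical, and numbering partitions from 1 is an assumption made for this example.

```python
from itertools import cycle
from typing import Dict, List


def distribute_round_robin(messages: List[str], partitions_number: int) -> Dict[int, List[str]]:
    """Assign messages to partitions 1..partitions_number in rotation."""
    assignment: Dict[int, List[str]] = {p: [] for p in range(1, partitions_number + 1)}
    rotation = cycle(range(1, partitions_number + 1))
    # zip stops when the messages run out, so the endless rotation is safe here.
    for message, partition in zip(messages, rotation):
        assignment[partition].append(message)
    return assignment
```

With five messages and three partitions, the first two partitions receive two messages each and the third receives one.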
async def producer(
self,
station_name: str,
producer_name: str,
generate_random_suffix: bool = False, # Deprecated
)
Use the Memphis producer function to create a producer. Here is an example of creating a producer for a given station:
memphis = Memphis()
await memphis.connect(...)
producer = await memphis.producer(
station_name = "my_station",
producer_name = "new_producer"
)
async def produce(
self,
message,
ack_wait_sec: int = 15,
headers: Union[Headers, None] = None,
async_produce: Union[bool, None] = None,
nonblocking: bool = False,
msg_id: Union[str, None] = None,
concurrent_task_limit: Union[int, None] = None,
producer_partition_key: Union[str, None] = None,
producer_partition_number: int = -1
):
Both producers and connections can use the produce function. To produce a message from a connection, simply call memphis.produce. This function will create a producer if none with the given name exists, otherwise it will pull the producer from a cache and use it to produce the message.
await memphis.produce(station_name='test_station_py', producer_name='prod_py',
message='bytearray/protobuf class/dict/string/graphql.language.ast.DocumentNode', # bytearray / protobuf class (schema validated station - protobuf) or bytearray/dict (schema validated station - json schema) or string/bytearray/graphql.language.ast.DocumentNode (schema validated station - graphql schema) or bytearray/dict (schema validated station - avro schema)
ack_wait_sec=15, # defaults to 15
headers=headers, # default to {}
nonblocking=False, #defaults to false
msg_id="123",
producer_partition_key="key" #default to None
)
Creating a producer and calling produce on it will increase the performance of producing messages as it removes the overhead of pulling created producers from the cache.
await producer.produce(
message='bytearray/protobuf class/dict/string/graphql.language.ast.DocumentNode', # bytearray / protobuf class (schema validated station - protobuf) or bytearray/dict (schema validated station - json schema) or string/bytearray/graphql.language.ast.DocumentNode (schema validated station - graphql schema) or bytearray/dict (schema validated station - avro schema)
ack_wait_sec=15) # defaults to 15
Here is an example of a produce function call that waits up to 30 seconds for an acknowledgement from Memphis and does so in a nonblocking manner:
memphis = Memphis()
await memphis.connect(...)
await memphis.produce(
station_name = "some_station",
producer_name = "temp_producer",
message = {'some':'message'},
ack_wait_sec = 30,
nonblocking = True
)
As discussed before in the station section, idempotency is an important feature of memphis. To achieve idempotency, an id must be assigned to messages that are being produced. Use the msg_id parameter for this purpose.
memphis = Memphis()
await memphis.connect(...)
await memphis.produce(
station_name = "some_station",
producer_name = "temp_producer",
message = {'some':'message'},
msg_id = '42'
)
To add message headers to the message, use the headers parameter. Headers can help with observability when using certain third-party tools to monitor the behavior of Memphis. See here for more details.
memphis = Memphis()
await memphis.connect(...)
await memphis.produce(
station_name = "some_station",
producer_name = "temp_producer",
message = {'some':'message'},
headers = {
'trace_header': 'track_me_123'
}
)
Lastly, memphis can produce to a specific partition in a station. To do so, use the producer_partition_key parameter:
memphis = Memphis()
await memphis.connect(...)
await memphis.produce(
station_name = "some_station",
producer_name = "temp_producer",
message = {'some':'message'},
producer_partition_key = "2nd_partition"
)
Or, alternatively, use the producer_partition_number parameter:
memphis = Memphis()
await memphis.connect(...)
await memphis.produce(
station_name = "some_station",
producer_name = "temp_producer",
message = {'some':'message'},
producer_partition_number = 2
)
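A partition key is useful because the same key always lands on the same partition. The sketch below shows that general idea with a stable hash. It is an assumption-laden illustration: partition_for_key is a hypothetical function, crc32 is a stand-in hash chosen for this example, and the mapping Memphis itself uses may differ.

```python
import zlib


def partition_for_key(key: str, partitions_number: int) -> int:
    """Map a key to a partition in 1..partitions_number, deterministically."""
    if partitions_number < 1:
        raise ValueError("partitions_number must be at least 1")
    # crc32 is a stand-in hash, used here because it is stable across runs.
    return zlib.crc32(key.encode("utf-8")) % partitions_number + 1
```

Python's built-in hash() is deliberately avoided in this sketch because string hashes are randomized per process, which would send the same key to different partitions after a restart.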
For better performance, the client won't block requests while waiting for an acknowledgment. If you are producing a large number of messages very quickly, you may see timeout errors; in that case, limit the number of concurrent tasks:
await producer.produce(
message='bytearray/protobuf class/dict/string/graphql.language.ast.DocumentNode', # bytearray / protobuf class (schema validated station - protobuf) or bytearray/dict (schema validated station - json schema) or string/bytearray/graphql.language.ast.DocumentNode (schema validated station - graphql schema)
headers={}, nonblocking=True, concurrent_task_limit=500)
You may read more about this here on the memphis.dev blog.
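The general technique behind capping concurrent tasks can be sketched with an asyncio.Semaphore. This is an illustration of the pattern, not the SDK's internal implementation. The function produce_all and its send callback are hypothetical names introduced for this example.

```python
import asyncio
from typing import Awaitable, Callable, List


async def produce_all(messages: List[str],
                      send: Callable[[str], Awaitable[None]],
                      limit: int = 500) -> None:
    """Send every message concurrently, with at most `limit` sends in flight."""
    gate = asyncio.Semaphore(limit)

    async def send_one(message: str) -> None:
        async with gate:  # wait here while `limit` sends are already running
            await send(message)

    await asyncio.gather(*(send_one(m) for m in messages))
```

Here send could be any coroutine function that produces one message, for example a small wrapper around producer.produce.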
Producing to multiple stations can be done by creating a producer with multiple stations and then calling produce on that producer.
memphis = Memphis()
await memphis.connect(...)
producer = await memphis.producer(
station_name = ["station_1", "station_2"],
producer_name = "new_producer"
)
await producer.produce(
message = "some message"
)
Alternatively, it is also possible to produce to multiple stations using the connection:
memphis = Memphis()
await memphis.connect(...)
await memphis.produce(
station_name = ["station_1", "station_2"],
producer_name = "new_producer",
message = "some message"
)
await producer.destroy()
consumer = await memphis.consumer(
station_name="<station-name>",
consumer_name="<consumer-name>",
consumer_group="<group-name>", # defaults to the consumer name
pull_interval_ms=1000, # defaults to 1000
batch_size=10, # defaults to 10
batch_max_time_to_wait_ms=100, # defaults to 100
max_ack_time_ms=30000, # defaults to 30000
max_msg_deliveries=2, # defaults to 2
start_consume_from_sequence=1, # start consuming from a specific sequence. defaults to 1
last_messages=-1 # consume the last N messages, defaults to -1 (all messages in the station)
)
Consumers are used to pull messages from a station. Here is how to create a consumer with all of the default parameters:
memphis = Memphis()
await memphis.connect(...)
consumer = await memphis.consumer(
    station_name = "my_station",
    consumer_name = "new_consumer",
)
To create a consumer in a consumer group, add the consumer_group parameter:
memphis = Memphis()
await memphis.connect(...)
consumer = await memphis.consumer(
    station_name = "my_station",
    consumer_name = "new_consumer",
    consumer_group = "consumer_group_1"
)
When using Consumer.consume, the consumer will continue to consume in an infinite loop. To change the rate at which the consumer polls the station for new messages, change the pull_interval_ms parameter:
memphis = Memphis()
await memphis.connect(...)
consumer = await memphis.consumer(
station_name = "my_station",
consumer_name = "new_consumer",
pull_interval_ms = 2000
)
Every time the consumer pulls from the station, the consumer will try to take batch_size number of elements from the station. However, sometimes there are not enough messages in the station for the consumer to consume a full batch. In this case, the consumer will continue to wait until either batch_size messages are gathered or the time in milliseconds specified by batch_max_time_to_wait_ms is reached.
Here is an example of a consumer that will try to poll 100 messages every 10 seconds while waiting up to 15 seconds for all messages to reach the consumer.
memphis = Memphis()
await memphis.connect(...)
consumer = await memphis.consumer(
    station_name = "my_station",
    consumer_name = "new_consumer",
    pull_interval_ms = 10000,
    batch_size = 100,
    batch_max_time_to_wait_ms = 15000
)
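The interplay between batch_size and batch_max_time_to_wait_ms can be illustrated with a small asyncio sketch that gathers items from a queue until either limit is hit. This is a model of the behavior described above, not the SDK's code. The function collect_batch and the use of asyncio.Queue as the message source are assumptions made for the example.

```python
import asyncio
from typing import List


async def collect_batch(queue: "asyncio.Queue[bytes]", batch_size: int = 10,
                        batch_max_time_to_wait_ms: int = 100) -> List[bytes]:
    """Gather up to batch_size items, giving up once the wait time has elapsed."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + batch_max_time_to_wait_ms / 1000
    batch: List[bytes] = []
    while len(batch) < batch_size:
        remaining = deadline - loop.time()
        if remaining <= 0:
            break  # time budget used up; return what we have
        try:
            batch.append(await asyncio.wait_for(queue.get(), timeout=remaining))
        except asyncio.TimeoutError:
            break  # nothing more arrived before the deadline
    return batch
```

A full batch returns as soon as batch_size items are available, while a sparse station yields a partial batch after the wait time.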
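The max_msg_deliveries parameter sets the maximum number of times a message will be delivered to the consumer group; a message is redelivered only while this limit has not been reached. The max_ack_time_ms parameter sets how long the broker waits for an acknowledgement before resending a message. Here is an example of a consumer whose messages will be delivered at most twice: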
memphis = Memphis()
await memphis.connect(...)
consumer = await memphis.consumer(
station_name = "my_station",
consumer_name = "new_consumer",
pull_interval_ms = 10000,
batch_size = 100,
batch_max_time_to_wait_ms = 100,
max_msg_deliveries = 2
)
The key will be used to consume from a specific partition
consumer.consume(msg_handler,
consumer_partition_key = "key" #consume from a specific partition
)
The number will be used to consume from a specific partition
consumer.consume(msg_handler,
consumer_partition_number = -1 #consume from a specific partition
)
context = {"key": "value"}
consumer.set_context(context)
To use a consumer to process messages, use the consume function. The consume function will have a consumer poll a station for new messages as discussed in previous sections. The consumer will stop polling the station once all the messages in the station have been consumed, and the msg_handler will receive a Memphis: TimeoutError.
async def msg_handler(msgs, error, context):
    for msg in msgs:
        print("message: ", msg.get_data())
        await msg.ack()
    if error:
        print(error)

consumer.consume(msg_handler)
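A handler with this signature can be exercised without a broker by passing it stand-in message objects. The sketch below is one way to do that. FakeMessage is a hypothetical test double that exposes only get_data() and ack(), and the received list exists purely for this example.

```python
import asyncio
from typing import List, Optional


class FakeMessage:
    """Hypothetical test double exposing only get_data() and ack()."""

    def __init__(self, data: bytes) -> None:
        self._data = bytearray(data)
        self.acked = False

    def get_data(self) -> bytearray:
        return self._data

    async def ack(self) -> None:
        self.acked = True


received: List[bytearray] = []


async def msg_handler(msgs, error: Optional[Exception], context) -> None:
    # Report the error first, then process whatever messages arrived.
    if error:
        print(error)
    # `msgs or []` is defensive, in case no message list is supplied.
    for msg in msgs or []:
        received.append(msg.get_data())
        await msg.ack()
```

Running asyncio.run(msg_handler([FakeMessage(b"a")], None, {})) drives the handler once, which is handy for checking ack logic in unit tests.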
To get messages deserialized, use msg.get_data_deserialized().
async def msg_handler(msgs, error, context):
    for msg in msgs:
        print("message: ", await msg.get_data_deserialized())
        await msg.ack()
    if error:
        print(error)

consumer.consume(msg_handler)
There may be some instances where you apply a schema after a station has received some messages. In order to consume those messages get_data_deserialized may be used to consume the messages without trying to apply the schema to them. As an example, if you produced a string to a station and then attached a protobuf schema, using get_data_deserialized will not try to deserialize the string as a protobuf-formatted message.
Using fetch_messages or fetch allows the user to retrieve a specific number of messages from a given station. This behavior can be useful if the user does not want a consumer to actively poll a station indefinitely.
msgs = await memphis.fetch_messages(
station_name="<station-name>",
consumer_name="<consumer-name>",
consumer_group="<group-name>", # defaults to the consumer name
batch_size=10, # defaults to 10
batch_max_time_to_wait_ms=100, # defaults to 100
max_ack_time_ms=30000, # defaults to 30000
max_msg_deliveries=2, # defaults to 2
start_consume_from_sequence=1, # start consuming from a specific sequence. defaults to 1
last_messages=-1, # consume the last N messages, defaults to -1 (all messages in the station)
consumer_partition_key="key", # used to consume from a specific partition, default to None
consumer_partition_number=-1 # used to consume from a specific partition, default to -1
)
msgs = await consumer.fetch(batch_size=10) # defaults to 10
Setting prefetch = True will prefetch the next batch of messages and keep it in memory for a future fetch() request.
msgs = await consumer.fetch(batch_size=10, prefetch=True) # defaults to False
Acknowledging a message tells the Memphis server not to re-send the same message to the same consumer or consumer group.
await message.ack()
Mark the message as not acknowledged. The broker will resend the message immediately to the same consumer group instead of waiting for the configured max ack time.
await message.nack()
Send the message to the dead-letter station (DLS). The broker won't resend the message to the same consumer group and will place the message inside the dead-letter station (DLS) with the given reason. The message will still be available to other consumer groups.
await message.dead_letter("reason")
Delay the message and tell the Memphis server to re-send the same message to the same consumer group. The message will be redelivered only if consumer.max_msg_deliveries has not been reached yet.
await message.delay(delay_in_seconds)
Get headers per message
headers = message.get_headers()
Get message sequence number
sequence_number = msg.get_sequence_number()
Get message time sent
time_sent = msg.get_timesent()
await consumer.destroy()
memphis.is_connected()