A basic payment processor, written in Rust
This payment processor is a simple Rust crate generated using `cargo init`. It can be built using `cargo build` and run using `cargo run`. The application takes a single command-line argument: the path of the input data. Data should be in CSV format, but the file extension is not checked.
If the path cannot be found, an error is logged to stderr and the application exits.
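As a rough illustration of the argument handling described above, here is a minimal std-only sketch. The helper name `input_path` and the error messages are hypothetical and are not taken from the crate, which may structure this differently.

```rust
use std::path::PathBuf;

/// Extracts the input path from the program arguments.
/// Hypothetical helper: the crate's real argument handling may differ.
fn input_path<I: Iterator<Item = String>>(mut args: I) -> Result<PathBuf, String> {
    // The first argument is the program name, so skip it.
    let _program = args.next();
    let path = args
        .next()
        .map(PathBuf::from)
        .ok_or_else(|| "expected the path to a CSV file as the only argument".to_string())?;
    // The extension is deliberately not inspected; only existence matters.
    if !path.is_file() {
        return Err(format!("input file not found: {}", path.display()));
    }
    Ok(path)
}
```

In `main`, this would be called as `input_path(std::env::args())`; on `Err`, the message is reported on stderr and the process exits with a non-zero status.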
This repo has been analyzed for security vulnerabilities using `cargo-audit`, linted with Clippy, and formatted with `cargo fmt`. It has also been run with `cargo-valgrind` against various data sets, and no issues were reported.
- Disputes, resolves, and chargebacks are only possible on `TransactionType::Deposit`.
- Transactions happen chronologically in a file.
- Once an account has been locked due to a chargeback, all subsequent transactions to this account will return an error.
- We will not terminate the application in the event of a bad transaction; we will simply discard it and move on.
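The assumptions above can be sketched as a small state machine. This is not the crate's implementation: the `Account`, `Op`, and `TxError` types are hypothetical, amounts are stored as integer ten-thousandths instead of `Decimal`, and the way funds move between `available` and `held` during a dispute is my assumption about the intended semantics.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum TxError {
    AccountLocked,
    NotADeposit,
    BadDisputeState,
    InsufficientFunds,
}

// Amounts are integer ten-thousandths here (an assumption; the crate uses Decimal).
#[derive(Default)]
struct Account {
    available: i64,
    held: i64,
    locked: bool,
    deposits: HashMap<u32, (i64, bool)>, // deposit tx -> (amount, under_dispute)
}

enum Op {
    Deposit(u32, i64),
    Withdraw(u32, i64),
    Dispute(u32),
    Resolve(u32),
    Chargeback(u32),
}

impl Account {
    fn apply(&mut self, op: &Op) -> Result<(), TxError> {
        // A chargeback locks the account; everything afterwards is an error.
        if self.locked {
            return Err(TxError::AccountLocked);
        }
        match *op {
            Op::Deposit(tx, amount) => {
                self.available += amount;
                self.deposits.insert(tx, (amount, false));
            }
            Op::Withdraw(_tx, amount) => {
                if amount > self.available {
                    return Err(TxError::InsufficientFunds);
                }
                self.available -= amount;
            }
            Op::Dispute(tx) => {
                let amount = self.set_dispute(tx, true)?;
                self.available -= amount;
                self.held += amount;
            }
            Op::Resolve(tx) => {
                let amount = self.set_dispute(tx, false)?;
                self.held -= amount;
                self.available += amount;
            }
            Op::Chargeback(tx) => {
                let amount = self.set_dispute(tx, false)?;
                self.held -= amount;
                self.locked = true;
            }
        }
        Ok(())
    }

    /// Flips the dispute flag of a deposit and returns its amount.
    /// Only deposits are stored, so any other tx id fails with `NotADeposit`.
    fn set_dispute(&mut self, tx: u32, disputed: bool) -> Result<i64, TxError> {
        let entry = self.deposits.get_mut(&tx).ok_or(TxError::NotADeposit)?;
        if entry.1 == disputed {
            return Err(TxError::BadDisputeState);
        }
        entry.1 = disputed;
        Ok(entry.0)
    }
}

/// Applies every operation; failures are reported and skipped, never fatal.
fn apply_all(account: &mut Account, ops: &[Op]) -> usize {
    let mut discarded = 0;
    for op in ops {
        if let Err(e) = account.apply(op) {
            eprintln!("discarding transaction: {:?}", e);
            discarded += 1;
        }
    }
    discarded
}
```

Because withdrawals are never recorded in `deposits`, a dispute that references a withdrawal is rejected and discarded, which mirrors the first assumption in the list.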
To process a CSV formatted list of transactions, simply run the application as follows:
cargo run -- sample-input/transactions.csv
If you wish to pipe your output to a file, you may do so by:
cargo run -- sample-input/transactions.csv > accounts.csv
To run the tests, run:
cargo test
This simple payment processor takes in CSV formatted data with the following columns:
column | description |
---|---|
type | A string: one of "deposit", "withdrawal", "dispute", "resolve", or "chargeback" |
client | A valid u16 client ID |
tx | A valid u32 transaction ID |
amount | A decimal value with up to four places past the decimal point |
An example data set containing only deposits and withdrawals is shown below. More data sets can be found in the repo under 'sample-input'.
type, client, tx, amount
deposit, 1, 1, 1.0
deposit, 2, 2, 2.0
deposit, 1, 3, 2.0
withdrawal, 1, 4, 1.5
withdrawal, 2, 5, 3.0
A few points on CSV formatting:
- Values are case-sensitive.
- The only supported delimiter is `,`.
- Whitespace doesn't matter.
- Column ordering doesn't matter.
- Amounts will be rounded to four decimal places.
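To make the whitespace and rounding points concrete, here is a std-only sketch that parses an amount field into integer ten-thousandths. The crate itself relies on the csv, Serde, and rust_decimal crates for this, so `parse_amount` is a hypothetical stand-in. The rounding scheme (round half up on the fifth digit) is an assumption, as is rejecting negative amounts.

```rust
/// Parses a CSV amount field into integer ten-thousandths.
/// Hypothetical helper; returns `None` for anything that is not a plain
/// non-negative decimal number.
fn parse_amount(field: &str) -> Option<i64> {
    let field = field.trim(); // surrounding whitespace is ignored
    let (whole, frac) = field.split_once('.').unwrap_or((field, ""));
    let all_digits = |s: &str| s.bytes().all(|b| b.is_ascii_digit());
    if (whole.is_empty() && frac.is_empty()) || !all_digits(whole) || !all_digits(frac) {
        return None;
    }
    let whole: i64 = if whole.is_empty() { 0 } else { whole.parse().ok()? };

    // Scale the first four fractional digits, padding missing ones with zero.
    let digits = frac.as_bytes();
    let mut frac_scaled = 0i64;
    for i in 0..4 {
        let digit = digits.get(i).map_or(0, |d| i64::from(d - b'0'));
        frac_scaled = frac_scaled * 10 + digit;
    }
    // Assumed rounding scheme: round half up on the fifth digit.
    if digits.get(4).map_or(false, |&d| d >= b'5') {
        frac_scaled += 1;
    }
    whole.checked_mul(10_000)?.checked_add(frac_scaled)
}
```

With this sketch, `" 1.5 "` parses to `15000` and `"0.12345"` rounds up to `1235`. A different rounding scheme, such as banker's rounding, would give `1234` for the latter.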
This crate uses env_logger; by default, log messages of level `error` are written to stderr. You can control the log level through the `RUST_LOG` environment variable, as shown below:
RUST_LOG=off cargo run sample-input/transactions.csv
Valid levels for `RUST_LOG` are listed below; `off`, as used above, disables logging entirely:
- error
- warn
- info
- debug
- trace
Serde is a framework for serializing and deserializing Rust data structures efficiently and generically. To learn more, check it out on crates.io here.
Using Serde allows me to define a `Transaction` struct as shown below and simply derive the functionality needed to serialize and deserialize it. Since the csv crate supports Serde, using the two together allows very readable (maintainable) code and reduces boilerplate.
Serde also provides support for enumerations (internally tagged, externally tagged, and untagged). This allows me to be confident that a Transaction that was deserialized properly definitely has valid data for all of the types within the struct.
Note: Below, I derived the functionality for both Serialize and Deserialize even though I do not serialize transactions. That's ok; in Rust, we only pay for what we use and since I do not use this functionality, I don't incur any additional overhead for including it here.
#[derive(Serialize, Deserialize, Debug, PartialEq)]
struct Transaction {
    #[serde(rename = "type")]
    kind: TransactionType,
    client: u16,
    tx: u32,
    amount: Option<Decimal>,
    #[serde(default)]
    under_dispute: bool,
}
A Decimal implementation written in pure Rust suitable for financial calculations that require significant integral and fractional digits with no round-off errors. To learn more, check it out on crates.io here.
The rust_decimal crate is really great. Not only does it provide data types suitable for financial calculations, but it also provides functions to normalize and round (according to various rounding schemes) our data. Additionally, rust_decimal_macros provides some super useful macros to make creating Decimals very easy.
The Decimal data type also supports all common arithmetic operations out of the box.
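To show why an exact decimal type matters for money, here is a small std-only comparison. The `Amount` newtype is a hypothetical fixed-point stand-in (integer ten-thousandths) and is not rust_decimal's API; it only illustrates the round-off problem that `Decimal` avoids.

```rust
use std::fmt;
use std::ops::Add;

/// Hypothetical fixed-point amount: an integer count of ten-thousandths.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Amount(i64);

impl Add for Amount {
    type Output = Amount;
    fn add(self, rhs: Amount) -> Amount {
        Amount(self.0 + rhs.0)
    }
}

impl fmt::Display for Amount {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let sign = if self.0 < 0 { "-" } else { "" };
        let abs = self.0.unsigned_abs();
        // Always print exactly four fractional digits.
        write!(f, "{}{}.{:04}", sign, abs / 10_000, abs % 10_000)
    }
}

/// Binary floating point cannot represent 0.1, 0.2, or 0.3 exactly,
/// so this comparison is false.
fn float_sum_is_exact() -> bool {
    0.1_f64 + 0.2_f64 == 0.3_f64
}
```

Here `Amount(1_000) + Amount(2_000)` (that is, 0.1 + 0.2) equals `Amount(3_000)` exactly, while the same sum in `f64` does not compare equal to 0.3.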
A fast and flexible CSV reader and writer for Rust, with support for Serde. To learn more, check it out on crates.io here.
The csv crate makes dealing with CSV data a snap, especially given how nicely it plays with Serde.
Intuition tells me that the current bottleneck of the application is likely to be reading records from CSV and possibly parsing these records into the `Transaction` struct. Once this is addressed, we can look at executing tasks in parallel and possibly even prioritizing certain types of transactions.
Having said that, profiling the application against large data sets prior to optimization and having a clear requirement in mind is key to avoiding the trap of premature optimization. This approach also allows us to focus our efforts and measure our improvement.
In order to speed up the CSV parsing, I would first follow the advice here.
An effort should be made to amortize allocations and avoid UTF-8 checks by reading and writing `ByteRecord`s instead of `StringRecord`s. Any `str`s will now be `&[u8]`s, so we lose the API around strings, but in the interest of performance that could be a worthwhile tradeoff. At this stage we should also profile the performance of deserializing a CSV byte record into a `Transaction` struct and determine whether it is worth implementing a custom deserializer.
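The idea of amortizing allocations and skipping UTF-8 validation can be illustrated without the csv crate. The sketch below is a std-only analogue, not the csv API: it reuses a single `Vec<u8>` for every line and compares raw bytes. With the csv crate, the corresponding approach is to reuse one `ByteRecord` across calls to `Reader::read_byte_record`. The function names are hypothetical.

```rust
use std::io::{self, BufRead};

/// Trims ASCII whitespace from both ends of a byte slice.
fn trim(mut bytes: &[u8]) -> &[u8] {
    while let [first, rest @ ..] = bytes {
        if first.is_ascii_whitespace() { bytes = rest } else { break }
    }
    while let [rest @ .., last] = bytes {
        if last.is_ascii_whitespace() { bytes = rest } else { break }
    }
    bytes
}

/// Counts records whose first field is `deposit`, reusing one byte buffer.
fn count_deposits<R: BufRead>(mut reader: R) -> io::Result<usize> {
    let mut line: Vec<u8> = Vec::new(); // allocated once, reused for every record
    let mut count = 0;
    loop {
        line.clear(); // drops the contents but keeps the capacity
        if reader.read_until(b'\n', &mut line)? == 0 {
            break; // end of input
        }
        // Work on raw bytes: no UTF-8 validation, no per-line String.
        let first = line.split(|&b| b == b',').next().unwrap_or_default();
        if trim(first) == b"deposit" {
            count += 1;
        }
    }
    Ok(count)
}
```

Whether this kind of change pays off in practice is exactly what the profiling step should confirm first.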
With CSV parsing sped up, the next thing we can look at is threading. It may make sense to have a reader thread and one (or more) processing threads. The reader thread will simply read `ByteRecord`s from CSV and place each deserialized `Transaction` at the tail of a queue. The processing thread will take items from the head of the queue and process them.
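A minimal sketch of this reader/processor split, using a standard-library channel as the queue. The `Deposit` struct and `run_pipeline` function are simplified, hypothetical stand-ins: the "reader" iterates over an in-memory `Vec` instead of a CSV file, and the "processing" is just a per-client sum.

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

// Simplified stand-in for the crate's Transaction struct.
struct Deposit {
    client: u16,
    amount: i64, // integer ten-thousandths, an assumption for this sketch
}

fn run_pipeline(input: Vec<Deposit>) -> HashMap<u16, i64> {
    let (sender, receiver) = mpsc::channel::<Deposit>();

    // Reader thread: pushes each record onto the tail of the queue.
    let reader = thread::spawn(move || {
        for record in input {
            if sender.send(record).is_err() {
                break; // the processing thread has gone away
            }
        }
        // `sender` is dropped here, which closes the queue.
    });

    // Processing thread: pops records from the head until the queue closes.
    let processor = thread::spawn(move || {
        let mut balances: HashMap<u16, i64> = HashMap::new();
        for record in receiver {
            *balances.entry(record.client).or_insert(0) += record.amount;
        }
        balances
    });

    reader.join().expect("reader thread panicked");
    processor.join().expect("processing thread panicked")
}
```

A single channel preserves the chronological order of the file. Swapping `mpsc::channel` for `mpsc::sync_channel` with a fixed capacity would bound the queue so that a fast reader cannot run arbitrarily far ahead of processing.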
As an extension of this, we could increase the number of processing threads and separate incoming transactions into two or more queues (by type) allowing us to prioritize certain transaction types. Initially, my thought is to have one queue for deposits/withdrawals and another for disputes, resolves, and chargebacks where processing threads prioritize handling disputes / chargebacks. It should be noted that we would need to determine up front how we would handle the case where pending transactions are backed up and a dispute comes in before its associated deposit has been processed.
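The two-queue idea could look roughly like the following. This is only a sketch: `next_prioritized` is a hypothetical helper that polls without blocking, so a real worker would still need to wait or back off when both queues are empty.

```rust
use std::sync::mpsc::Receiver;

/// Returns the next queued item, always preferring the dispute queue.
/// Returns `None` when neither queue has anything ready right now.
fn next_prioritized<T>(disputes: &Receiver<T>, payments: &Receiver<T>) -> Option<T> {
    disputes
        .try_recv()
        .or_else(|_| payments.try_recv())
        .ok()
}
```

This sketch also makes the ordering hazard easy to reproduce: if a deposit is still waiting in the payments queue when its dispute arrives, the dispute is returned first. The processor therefore needs a defined policy for disputes that reference a transaction it has not yet seen, such as deferring them.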
Lastly, we can leverage tools such as Callgrind and KCachegrind to identify possible bottlenecks caused by branch mispredictions and cache misses. Optimization here should focus on the 'hot path', as optimizing the handling of edge cases will not generally yield any significant improvement.
In addition to the above-mentioned performance improvements, I will consider making the following improvements:
- Clean up the tests.
- Support various input data formats.
- Add non-volatile storage, likely in a relational database.
- Create a CI pipeline that runs an automated suite of tests on every PR and merge to devel/main branches.