An implementation of an AVL tree to store data. It also includes basic transaction handling, with commit and rollback methods that use in-memory snapshots, and the database can be saved to a .pkl (binary) file. The implementation lives in `databasestructure.py`.

Note: see the `main.py` file for a full example.
```python
import pickle
from databasestructure import *

db = CustomDatabase()
```
Example of how to create a table named `users`, with columns `id`, `name`, and `age`:

```python
db.create_table('users', ['id', 'name', 'age'], key_index=0)
```

Note: `key_index` indicates which column index holds the key (`id`) column; in this case `id` is the first column, which is why `key_index=0`.
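For instance, if the key column were the third one instead, you would pass `key_index=2`. A minimal sketch (the table name and column order here are hypothetical):

```python
# Hypothetical table whose key column ('id') is listed last
db.create_table('users_alt', ['name', 'age', 'id'], key_index=2)
```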
```python
db.insert('users', [None, 'Alice', 30])
db.commit('users')
```

Note: the `id` column is the index; pass `None` to use the automatic autoincrement feature.

The following lookup searches the `id` column:
```python
# Select the user with id=0
headers, user = db.select('users', 0)
print(headers)
print(user)

# Select the users with age > 25
headers, user = db.select_by_column_value("users", "age", '>', 25)
print(headers)
print(user)
```
Valid operators:

- `>`
- `>=`
- `<`
- `<=`
- `==`
- `!=`
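As a hedged illustration of a couple of these operators (the filter values below are assumptions, not from the original examples):

```python
# Users exactly 30 years old
headers, rows = db.select_by_column_value('users', 'age', '==', 30)

# Users at most 28 years old
headers, rows = db.select_by_column_value('users', 'age', '<=', 28)
```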
```python
# Select the whole table
headers, table = db.select_all('users')
print(headers)
print(table)
```
```python
# You need the index of the row to be updated
user = db.update('users', 0, [0, 'Alice', 31])
print(user)
```
```python
# Add the column email to the users table
db.add_column('users', 'email')
print(db.select_all('users'))
```
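A hedged sketch of filling the new column for an existing row, assuming `update` accepts the widened row:

```python
# Hypothetical: set the email for the user with id=0
db.update('users', 0, [0, 'Alice', 31, 'alice@example.com'])
```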
```python
# You need the index of the row to be deleted
db.delete('users', 1)
```
Let's say that we have these tables:

- Table `users`:

id | name | age |
---|---|---|
0 | Charlie | 28 |
1 | Alice | 30 |
- Table `orders`:

order_id | user_id | product |
---|---|---|
101 | 0 | Laptop |
102 | 1 | Smartphone |
103 | 0 | Phone |
If we want to join tables, we must do it based on the `key_index` value. So, to join table `users` and table `orders` by the `user_id` column:
```python
# Perform the join
headers, joined_data = db.join('users', 'orders', key_index1=0, key_index2=1)

# Print the joined data
print("Joined Data:\n")
print(headers)
for data in joined_data:
    print(data)
```
- `key_index1=0`: the join key for the first table (`users`) is in the first column (`id`).
- `key_index2=1`: the join key for the second table (`orders`) is in the second column (`user_id`).
Note: check the `test_joining.py` file.
- The result will be:
id | name | age | order_id | user_id | product |
---|---|---|---|---|---|
0 | Charlie | 28 | 101 | 0 | Laptop |
1 | Alice | 30 | 102 | 1 | Smartphone |
0 | Charlie | 28 | 103 | 0 | Phone |
Example of rollback and commit:

```python
db.insert('users', [None, 'Charlie', 28])
print("Before rollback user with latest id:", db.select_all('users'))

db.rollback('users')
print("After rollback user with latest id:", db.select_all('users'))

db.insert('users', [None, 'Charlie', 28])
db.commit('users')
print("After commit user with latest id:", db.select_all('users'))
```
```python
# Save the database in binary format
db.save('./avl_database.pkl')

# Load the database from the binary file
db = CustomDatabase.load('./avl_database.pkl')
```
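A quick sanity check after loading (a minimal sketch; the actual output depends on what was saved):

```python
headers, table = db.select_all('users')
print(headers)
print(table)
```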
You must have a .csv file with the same structure as the table you want to load. For table `users`, for example, you will need a .csv file with a structure like this:
id | name | age |
---|---|---|
1 | John | 25 |
2 | Jane | 30 |
3 | Bob | 22 |
```python
db.load_from_csv("users", "./data.csv")
```
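For completeness, a hedged sketch of producing a matching `./data.csv` with Python's standard `csv` module (whether `load_from_csv` expects a header row is an assumption here):

```python
import csv

# Write a CSV matching the users table structure
with open('./data.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['id', 'name', 'age'])  # assumed header row
    writer.writerow([1, 'John', 25])
    writer.writerow([2, 'Jane', 30])
    writer.writerow([3, 'Bob', 22])
```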
Please check the `main_multi_threading.py` file.
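As an illustration only, a hedged sketch of concurrent reads with Python's `threading` module (whether `select` is safe under concurrency depends on the implementation; `main_multi_threading.py` shows the intended usage):

```python
import threading

def read_user(key):
    # Hypothetical concurrent read of a single row by key
    headers, user = db.select('users', key)
    print(headers, user)

threads = [threading.Thread(target=read_user, args=(k,)) for k in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```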
Inside the `dummy_data` folder there is a dummy-data generator and `main_test.py`, which executes basic operations to test the database's performance.
Results may vary depending on CPU capabilities and whether you use multiple threads. For these tests I used a single thread and an 11th Gen Intel(R) Core(TM) i3-1115G4 @ 3.00GHz CPU (a very ordinary CPU), with 10,000 dummy users and 20,000 orders.
Test | AVL DB (seconds) | MySQL (seconds) |
---|---|---|
Creation of both tables | 0.0 | - |
Load 10,000 users from .csv into RAM | 0.12057 | - |
Load 20,000 orders from .csv into RAM | 0.29318 | - |
Write 30,000 rows from RAM to a .pkl file | 0.09558 | - |
Load 30,000 rows from .pkl into RAM | 0.10949 | - |
From 10,000 users, select those older than 25 years | 0.00709 | - |
Join the users and orders tables (17,358 rows total) | 0.16644 | - |