Long-Term Considerations Regarding Toml File Storage #97

Open
simonsan opened this issue Mar 19, 2024 · 1 comment
Labels
A-architecture (Area: Related to our architecture)
A-storage (Area: Related to our storage system)
C-question (Category: Further information is requested)

Comments

@simonsan
Contributor

Current situation

I initially implemented the storage model for the activity log on top of the TOML file format.
When we begin a new activity, we parse the log with all its entries, append the new activity to the in-memory vector, and write the whole activity log back to the file.

When we end or update an activity, we do the same thing.

I think that has several disadvantages. For example, if an error occurs while the file is being written back, the file could be damaged and the activity log destroyed. Parsing will also take longer and longer as the number of activities grows. I still need to benchmark that: it could be negligible even with a few thousand activities, a size that is itself unlikely to be reached, as users might archive their activity log monthly once the archival feature is implemented.
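
One common way to reduce the corruption risk while keeping the read-all-write-all model is to write the new contents to a temporary file and then rename it over the log. This is not something proposed in this issue, just a minimal sketch that assumes a POSIX-like filesystem, where a rename within the same directory is atomic. `write_atomically` and the `.tmp` suffix are hypothetical names.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Writes `contents` to a sibling temp file first, then renames it over
/// `path`, so a failed write leaves the previous log file untouched.
pub fn write_atomically(path: &Path, contents: &str) -> io::Result<()> {
    // Hypothetical naming scheme: "<file name>.tmp" next to the target.
    let mut tmp_name = path
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "path has no file name"))?
        .to_os_string();
    tmp_name.push(".tmp");
    let tmp_path = path.with_file_name(tmp_name);

    let mut tmp = File::create(&tmp_path)?;
    tmp.write_all(contents.as_bytes())?;
    // Ask the OS to flush the data to disk before the rename happens.
    tmp.sync_all()?;
    drop(tmp);

    // Replace the old log with the fully written new one.
    fs::rename(&tmp_path, path)
}
```

This only addresses the half-written-file problem. The whole log is still parsed and serialized on every change, so the performance concern is unchanged.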

I could refactor the entire model to an event-based one, so that the log file is truly append-only and we only ever write to the end of the file. But I'm not sure this makes sense at this point, because I want to implement storage in a SQLite database soonish, which would make it obsolete: I don't think we want an event-based model in the database, as it's much easier to query for a record and update it, or even batch-update records.
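
To make the append-only idea concrete, here is a rough sketch of what writing a single event to the end of the file could look like. The `Event` type, its variants, and the tab-separated line format are all assumptions made for illustration (the format is deliberately not TOML, and it does not escape tabs or newlines in descriptions).

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical event kinds; the real model may look different.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Began { id: u64, description: String, at: u64 },
    Ended { id: u64, at: u64 },
}

impl Event {
    /// Encodes the event as one tab-separated line (illustrative format).
    pub fn to_line(&self) -> String {
        match self {
            Event::Began { id, description, at } => {
                format!("began\t{}\t{}\t{}", id, at, description)
            }
            Event::Ended { id, at } => format!("ended\t{}\t{}", id, at),
        }
    }
}

/// Appends one event to the end of the log without reading or rewriting
/// anything that is already in the file.
pub fn append_event(path: &Path, event: &Event) -> io::Result<()> {
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    writeln!(file, "{}", event.to_line())
}
```

With this shape, beginning and ending an activity each cost one small write, independent of how large the log already is.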

The reason I initially used TOML was so that users can edit the log in their favourite text editor. I found that quite useful, as I did it a lot to edit activities in bartib. I think this would become less useful if I reimplemented the storage so that only events are stored, because it would no longer be easy to determine the actual status, duration, etc. of an activity. To determine that, we would need to parse all activities in a certain time frame and then merge their events, which is much more complicated.
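
The merge step described above could look roughly like the following. It is a sketch under assumptions: the `Event` and `Activity` types, their fields, and the use of plain integers for timestamps are all hypothetical, and events are assumed to arrive in log order.

```rust
use std::collections::BTreeMap;

/// Hypothetical event kinds stored in the log.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Began { id: u64, description: String, at: u64 },
    Updated { id: u64, description: String },
    Ended { id: u64, at: u64 },
}

/// The merged, current view of one activity.
#[derive(Debug, Clone, PartialEq)]
pub struct Activity {
    pub description: String,
    pub begin: u64,
    pub end: Option<u64>,
}

impl Activity {
    /// Duration is only known once the activity has ended.
    pub fn duration(&self) -> Option<u64> {
        self.end.map(|end| end.saturating_sub(self.begin))
    }
}

/// Replays events in order and merges them into one state per activity id.
/// Events that refer to an unknown id are ignored in this sketch.
pub fn replay(events: &[Event]) -> BTreeMap<u64, Activity> {
    let mut activities = BTreeMap::new();
    for event in events {
        match event {
            Event::Began { id, description, at } => {
                let activity = Activity { description: description.clone(), begin: *at, end: None };
                activities.insert(*id, activity);
            }
            Event::Updated { id, description } => {
                if let Some(activity) = activities.get_mut(id) {
                    activity.description = description.clone();
                }
            }
            Event::Ended { id, at } => {
                if let Some(activity) = activities.get_mut(id) {
                    activity.end = Some(*at);
                }
            }
        }
    }
    activities
}
```

This is the extra work a user would have to do in their head when reading a raw event log in a text editor, which is the readability cost mentioned above.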

Pros And Cons

Current TOML-Based Storage Model:

Pros:

  • Human-Readable: TOML files can be easily read and edited with a text editor, providing transparency and direct access to the data for users.
  • Ease of Implementation: Implementing storage using TOML is relatively straightforward and doesn't require additional dependencies or infrastructure.
  • Low Complexity: The current model is simple and easy to understand, making it suitable for small to medium-sized datasets.

Cons:

  • Risk of Data Corruption: Writing the entire activity log file each time an update occurs increases the risk of data corruption if there's an error during the write operation.
  • Performance Degradation: Parsing and writing the entire file can become slow and inefficient as the log grows larger, impacting overall application performance.
  • Limited Scalability: The current model may struggle to handle large datasets efficiently, especially as the number of activities increases over time.

Event-Based Append-Only Model:

Pros:

  • Improved Data Integrity: Moving to an append-only model reduces the risk of data corruption, since updates are only appended to the end of the file.
  • Better Performance: With no need to reparse the entire file, performance is improved, especially for large activity logs.
  • Scalability: The event-based model scales more effectively with growing datasets, as it doesn't suffer as much from the performance degradation associated with reparsing the entire file.

Cons:

  • Complexity: Implementing an event-based model introduces additional complexity compared to the current read-all-write-all TOML-based approach, requiring careful design and implementation.
  • Loss of Human-Readability: While the append-only model is more efficient, it sacrifices the human-readable nature of TOML files, making direct editing by users more challenging.
  • Data Retrieval Complexity: Retrieving and interpreting data from an append-only log may require more sophisticated parsing and processing logic, potentially complicating certain operations.
  • Difficulty in Database Migration: If the event log is implemented using a file-based format like TOML, it may not be directly compatible with a database-backed storage solution like SQLite. This can result in the event-based model becoming obsolete when transitioning to a database-driven architecture, requiring a rewrite or significant refactoring of the storage layer.

Direct Transition to SQLite:

Pros:

  • Data Integrity and Reliability: SQLite provides robust data storage capabilities, ensuring data integrity and reliability, even in the face of unexpected errors or interruptions.
  • Efficient Queries: SQLite's query capabilities enable efficient retrieval and manipulation of data, supporting complex queries and analysis.
  • Scalability: SQLite can handle large datasets efficiently, making it suitable for applications with growing storage requirements.

Cons:

  • Dependency and Infrastructure: SQLite introduces a dependency on an external library and requires managing database connections and transactions, adding complexity to the application.
  • Deployment Considerations: Deploying and managing SQLite databases may require additional configuration and maintenance compared to simple file-based storage solutions.
@simonsan
Contributor Author

simonsan commented Mar 19, 2024

That being said, I don't think it makes a lot of sense to invest time and energy in refactoring the TOML storage to an append-only, event-based model at this point. I think the way to go is to implement the Storage traits for SQLiteStorage and use it as the default storage. It might make sense to refactor the TOML storage into an append-only, event-based local event log, so we can sync from it to a future DBMS implementation, e.g. PostgresStorage, for multi-user/team environments. But even that sync could be easier from SQLite.
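
For illustration, the backend-behind-a-trait idea could be sketched like this. The trait below is a hypothetical, heavily simplified stand-in and not the project's actual Storage traits. `InMemoryStorage` is only there to keep the example free of dependencies; a SQLiteStorage would implement the same trait on top of a database connection.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq)]
pub struct Activity {
    pub description: String,
    pub begin: u64,
    pub end: Option<u64>,
}

/// Hypothetical, simplified storage interface.
pub trait Storage {
    fn begin_activity(&mut self, description: &str, at: u64) -> u64;
    fn end_activity(&mut self, id: u64, at: u64) -> Result<(), String>;
    fn get(&self, id: u64) -> Option<Activity>;
}

/// Stand-in backend used only for this sketch.
#[derive(Default)]
pub struct InMemoryStorage {
    next_id: u64,
    activities: BTreeMap<u64, Activity>,
}

impl Storage for InMemoryStorage {
    fn begin_activity(&mut self, description: &str, at: u64) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        let activity = Activity { description: description.to_string(), begin: at, end: None };
        self.activities.insert(id, activity);
        id
    }

    fn end_activity(&mut self, id: u64, at: u64) -> Result<(), String> {
        // Updating a single record in place, with no event merging needed.
        match self.activities.get_mut(&id) {
            Some(activity) => {
                activity.end = Some(at);
                Ok(())
            }
            None => Err(format!("no activity with id {}", id)),
        }
    }

    fn get(&self, id: u64) -> Option<Activity> {
        self.activities.get(&id).cloned()
    }
}

/// Application code depends only on the trait, so the default backend can
/// be swapped without touching callers.
pub fn track<S: Storage>(storage: &mut S, description: &str, begin: u64, end: u64) -> Result<u64, String> {
    let id = storage.begin_activity(description, begin);
    storage.end_activity(id, end)?;
    Ok(id)
}
```

Because callers such as `track` only see the trait, switching the default from the TOML backend to a database-backed one would be a change at construction time rather than across the codebase.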
