Skip to content
This repository has been archived by the owner on May 21, 2024. It is now read-only.

Store reports in the database #1559

Merged
merged 1 commit into from
Mar 11, 2020
Merged

Store reports in the database #1559

merged 1 commit into from
Mar 11, 2020

Conversation

xcheng-here
Copy link
Contributor

It's for OTA-3637. Store reports in database when events generated
and delete them after sent them to server.

Signed-off-by: cheng.xiang ext-cheng.xiang@here.com

Copy link
Collaborator

@pattivacek pattivacek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor issue, but according to our contribution guidelines, the name of the branch should have a very short description of what is being fixed (since external contributors can't see our Jira tickets). In this case, fix/OTA-3736/store-reports or something would be good.

@@ -32,17 +33,15 @@ void ReportQueue::run() {

void ReportQueue::enqueue(std::unique_ptr<ReportEvent> event) {
{
std::lock_guard<std::mutex> lock(m_);
report_queue_.push(std::move(event));
//std::lock_guard<std::mutex> lock(m_);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the lock really no longer needed here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added it back.

SQLite3Guard db = dbConnection();
int64_t id = 0;
{
auto statement = db.prepareStatement("SELECT MAX(id) FROM report_events;");
Copy link
Contributor

@lbonn lbonn Feb 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two statements should be in a transaction.

Or better, it can be done in one step:

db.prepareStatement<std::string>("INSERT INTO report_events(id, json_string) SELECT MAX(id) + 1, ? FROM report_events", json_string);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, modified.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the changes yet?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a mistake in the last push. It should be updated now.

bool SQLStorage::loadReportEvents(Json::Value* report_array, int64_t* id_max) {
SQLite3Guard db = dbConnection();
std::vector<std::pair<Uptane::EcuSerial, int64_t>> ecu_cnt;
// keep the same order as in ecu_serials (start with primary)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this comment was copy-pasted by mistake?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, removed.

report_queue_.pop();
}
int64_t max_id = 0;
storage->loadReportEvents(&report_array, &max_id);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you use load everything from storage here, you do not need to keep report_array as a class member because the state is in the database.

It looks like it can be a local variable in this function and calls to report_array.clear() would change to storage->deleteReportEvents(...).

IMO, one of the important thing here is to try to minimize double-sent events. It looks like the backend should be able to discard duplicates by looking at the random uuid we send, though we should probably check with them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I modified the report_array. But I'm still thinking about how to prevent double-sent. Do we have a mechanism to check it with backend ? ( And does a check better than just send it ? )

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking before sending sounds complicated and not worth the overhead anyway. I'm trying to find out with the backend team if they can live with occasional resends.

@codecov-io
Copy link

codecov-io commented Feb 26, 2020

Codecov Report

Merging #1559 into master will increase coverage by 0.13%.
The diff coverage is 84.61%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1559      +/-   ##
==========================================
+ Coverage   82.36%   82.49%   +0.13%     
==========================================
  Files         189      189              
  Lines       11872    11929      +57     
==========================================
+ Hits         9778     9841      +63     
+ Misses       2094     2088       -6
Impacted Files Coverage Δ
src/libaktualizr/storage/invstorage.h 97.95% <ø> (ø) ⬆️
src/libaktualizr/storage/sqlstorage.h 50% <ø> (ø) ⬆️
src/libaktualizr/storage/sqlstorage_base.h 100% <ø> (+40%) ⬆️
src/libaktualizr/primary/reportqueue.h 23.07% <ø> (ø) ⬆️
src/libaktualizr/primary/sotauptaneclient.h 100% <100%> (ø) ⬆️
src/libaktualizr/primary/reportqueue.cc 100% <100%> (ø) ⬆️
src/libaktualizr/storage/sql_utils.h 85.89% <100%> (+1.17%) ⬆️
src/libaktualizr/storage/sqlstorage_base.cc 79.86% <100%> (+5.54%) ⬆️
src/libaktualizr/storage/sqlstorage.cc 77.67% <77.27%> (+0.56%) ⬆️
src/libaktualizr/package_manager/ostreemanager.cc 80.51% <0%> (-1.11%) ⬇️
... and 11 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e502843...6c93364. Read the comment docs.

@xcheng-here
Copy link
Contributor Author

@patrickvacek I didn't modify the branch name yet because I'm not sure whether it will make the comments lost.

@pattivacek
Copy link
Collaborator

I didn't modify the branch name yet because I'm not sure whether it will make the comments lost.

No worries, looks like it's not possible to change it once the PR has already been made without making a new PR, which isn't worth it. Anyway, you put a description in the first comment of the PR, so that should help.

@@ -12,6 +12,8 @@
#include "sql_utils.h"
#include "utilities/utils.h"

std::mutex SQLite3Guard::m_;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused here. This treats this mutex like a global instead of as a member variable. Is that really what we have to do? If so, then can you add a comment that explains why it has to be that way?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the mutex is a member of SQLite3Guard, it will be destroyed with the SQLite3Guard variable. And currently the way of our code using SQLite3Guard is that define it in the stack of functions which will access database, so it will be destroyed when the function returned. IMO, it means every function using a new mutex, so it will not really lock. Please correct me if I made something wrong.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currently the way of our code using SQLite3Guard is that define it in the stack of functions which will access database, so it will be destroyed when the function returned. IMO, it means every function using a new mutex, so it will not really lock.

Isn't that what we want? We should only lock the database when we are writing to it, and then immediately release. Or do we really want to hold on to the mutex for the duration of the Storage object? I don't really see what the point of that is if the Storage is basically a singleton.

Also, I just noticed that the extern std::mutex sql_mutex declared in sql_utils.h appears to be unused and an obsolete relic of previous rounds of locking logic.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, just realized what you meant now. Yes, the mutex does have to live a higher level to be meaningful. But wouldn't a member variable of SqlStorage still be preferable to this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's a member of SqlStorage, I think I will have two options to implement the lock. The first one is that add a lock_guard to every function of SqlStorage, which will lock/unlock this mutex, I don't think it's more graceful than using SQLite3Guard, which is already used in every function. The second option is let the mutex be member of SqlStorage, but still use SQLite3Guard to lock it. I think it's also not graceful. So I choose to use a static member of SQLite3Guard.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, it's also possible to keep a pointer of storage in that WHandle class, I think it will be better than keep a pointer of mutex, I will try it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another problem is that now we could init SQLite3Guard with the path of database, that means we didn't associate a Storage object to it. If we want the mutex be a member of Storage, maybe we'd better avoid init it in this way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keeping a pointer to storage in WHandle seems fine.

And you want to pass a mutex in SQLite3Guard's constructor, is that right?
As long as the other calls to the constructors are in existing tests which don't try to access the db in parallel with SQLStorage, that's maybe fine if the lock is made optional.

We can discuss again on Friday to clarify the question.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I want to pass a mutex into SQLite3Guard's constructor, and use a shared_ptr of mutex to keep it. But when I start coding, I found I still feel not very good for this design, it still make a pointer to a member variable of another class.
And yes, I also think make the lock be optional will make the existing test code work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks it's not a problem to use a pointer of mutex SQLite3Guard, because we usually only use SQLite3Guard as local variable. And I haven't found another way which is graceful enough and will not need to restruct all of the storage code. So I think I will still use this solution.

@xcheng-here xcheng-here force-pushed the fix/OTA-3736 branch 3 times, most recently from 7e479ef to 93824d7 Compare March 3, 2020 10:30
@@ -1700,7 +1760,8 @@ boost::optional<std::pair<uintmax_t, std::string>> SQLStorage::checkTargetFile(c
class SQLTargetWHandle : public StorageTargetWHandle {
public:
SQLTargetWHandle(const SQLStorage& storage, Uptane::Target target)
: db_(storage.dbPath()), target_(std::move(target)) {
: db_path_(storage.dbPath()), target_(std::move(target)) {
storage_ = &storage;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

db_path looks unused now, can it be removed? And I think storage_ can be set in the initializer list.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved storage_ to initializer list. The db_path_ is still using, it's mostly for existing test code.

@xcheng-here xcheng-here force-pushed the fix/OTA-3736 branch 2 times, most recently from 78866a1 to 2406696 Compare March 4, 2020 05:06
if (storage_ != nullptr) {
SQLite3Guard db = storage_->dbConnection();
;
if (sqlite3_changes(db.get()) > 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like sqlite3_changes will always return 0 in this case, as the database connection has just been created: https://www.sqlite.org/c3ref/changes.html.

We have to find another way to track that here, maybe another attribute would be enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes, it's a good point. And I found the "insert/replace" is in one of the constructors of WHandle class, so in this case we could run "delete" anyway in abort() function. And in another constructor which is basically for test code, there is nothing to do with the table in database, so we don't need to run "delete" in this case. Please correct me if I made something wrong.


if (storage_ != nullptr) {
SQLite3Guard db = storage_->dbConnection();
if (sqlite3_changes(db.get()) > 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is another instance here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha, I'm so careless. Will modify it now.

@lbonn
Copy link
Contributor

lbonn commented Mar 5, 2020

Looks good to me, even better if we add a simple ReportQueue test that demonstrates that events are non volatile.

@xcheng-here
Copy link
Contributor Author

I think it means we need to restart aktualizr in the test, just after we store several events into database, but I haven't find a way which could do it and will not break the whole test process yet. And if we don't restart aktualizr, I think this change is transparent to caller of ReportQueue interfaces, because the details are hided in "enqueue" and "flush_queue".

@pattivacek
Copy link
Collaborator

I think it means we need to restart aktualizr in the test, just after we store several events into database, but I haven't find a way which could do it and will not break the whole test process yet.

We have various ways of simulating reboots, but I don't think we even need to do that. In this case, can't we test this just be creating a ReportQueue directly, preventing it from sending, destroying it, then recreating it with the same storage database, and then verifying that it sends as expected then?

lbonn
lbonn previously approved these changes Mar 10, 2020
@xcheng-here
Copy link
Contributor Author

Yes, good idea. I added a test case.

ReportQueue report_queue(config, http, sql_storage);
// Wait at most 30 seconds for the messages to get processed.
http->expected_events_received.get_future().wait_for(std::chrono::seconds(20));
EXPECT_EQ(http->events_seen, num_events);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really good! My only concern is that we should check that there is actually something in the database before restarting the ReportQueue. We aren't testing anything interesting if it turns out that the first ReportQueue was actually able to send out all of the events before it got shut down. Basically I just want to make sure the test is testing what we think it is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, because the server url which is used for the first ReportQueue is empty, it could not send any events out. But it's a good idea to check the events number in sql, I added it to the case, and I also checked whether the number was set to 0 after I send them out.

Copy link
Collaborator

@pattivacek pattivacek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normally Travis failures are pretty bogus these days, but this error looks real:

[ RUN      ] ReportQueue.StoreEvents
created: /tmp/aktualizr-7d0e-f261-4d59-f14e/48e5-48a8-dir
[2020-03-11 02:03:21.211473] [0x0000000004050580] [info]    Bootstraping DB to version 24
created: /tmp/aktualizr-7d0e-f261-4d59-f14e/141f-0db2-dir
created: /tmp/aktualizr-7d0e-f261-4d59-f14e/8df6-7a6d-dir
../src/libaktualizr/primary/reportqueue_test.cc:152: Failure
Expected equality of these values:
  max_id
    Which is: 10
  count
    Which is: 0
[  FAILED  ] ReportQueue.StoreEvents (4696 ms)

EXPECT_EQ(http->events_seen, num_events);
}

/* Test event store. */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test looks really good now! My last suggestion is to make the comment here more descriptive. Maybe something like "Test persistent storage of unsent events in the database across ReportQueue instantiations."

It's for OTA-3637. Store reports in database when events generated
and delete them after sent them to server.

Signed-off-by: cheng.xiang <ext-cheng.xiang@here.com>
@pattivacek pattivacek merged commit 962343d into master Mar 11, 2020
@pattivacek pattivacek deleted the fix/OTA-3736 branch March 11, 2020 13:32
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants