-
Notifications
You must be signed in to change notification settings - Fork 60
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor issue, but according to our contribution guidelines, the name of the branch should have a very short description of what is being fixed (since external contributors can't see our Jira tickets). In this case, fix/OTA-3736/store-reports
or something would be good.
@@ -32,17 +33,15 @@ void ReportQueue::run() { | |||
|
|||
void ReportQueue::enqueue(std::unique_ptr<ReportEvent> event) { | |||
{ | |||
std::lock_guard<std::mutex> lock(m_); | |||
report_queue_.push(std::move(event)); | |||
//std::lock_guard<std::mutex> lock(m_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the lock really no longer needed here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added it back.
SQLite3Guard db = dbConnection(); | ||
int64_t id = 0; | ||
{ | ||
auto statement = db.prepareStatement("SELECT MAX(id) FROM report_events;"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The two statements should be in a transaction.
Or better, it can be done in one step:
db.prepareStatement<std::string>("INSERT INTO report_events(id, json_string) SELECT MAX(id) + 1, ? FROM report_events", json_string);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, modified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see the changes yet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made a mistake in the last push. It should be updated now.
bool SQLStorage::loadReportEvents(Json::Value* report_array, int64_t* id_max) { | ||
SQLite3Guard db = dbConnection(); | ||
std::vector<std::pair<Uptane::EcuSerial, int64_t>> ecu_cnt; | ||
// keep the same order as in ecu_serials (start with primary) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this comment was copy-pasted by mistake?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, removed.
report_queue_.pop(); | ||
} | ||
int64_t max_id = 0; | ||
storage->loadReportEvents(&report_array, &max_id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you use load everything from storage here, you do not need to keep report_array
as a class member because the state is in the database.
It looks like it can be a local variable in this function and calls to report_array.clear()
would change to storage->deleteReportEvents(...)
.
IMO, one of the important thing here is to try to minimize double-sent events. It looks like the backend should be able to discard duplicates by looking at the random uuid we send, though we should probably check with them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I modified the report_array. But I'm still thinking about how to prevent double-sent. Do we have a mechanism to check it with backend ? ( And does a check better than just send it ? )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checking before sending sounds complicated and not worth the overhead anyway. I'm trying to find out with the backend team if they can live with occasional resends.
84aa27e
to
1328a6f
Compare
Codecov Report
@@ Coverage Diff @@
## master #1559 +/- ##
==========================================
+ Coverage 82.36% 82.49% +0.13%
==========================================
Files 189 189
Lines 11872 11929 +57
==========================================
+ Hits 9778 9841 +63
+ Misses 2094 2088 -6
Continue to review full report at Codecov.
|
@patrickvacek I didn't modify the branch name yet because I'm not sure whether it will make the comments lost. |
No worries, looks like it's not possible to change it once the PR has already been made without making a new PR, which isn't worth it. Anyway, you put a description in the first comment of the PR, so that should help. |
1328a6f
to
6479c14
Compare
@@ -12,6 +12,8 @@ | |||
#include "sql_utils.h" | |||
#include "utilities/utils.h" | |||
|
|||
std::mutex SQLite3Guard::m_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused here. This treats this mutex like a global instead of as a member variable. Is that really what we have to do? If so, then can you add a comment that explains why it has to be that way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the mutex is a member of SQLite3Guard, it will be destroyed with the SQLite3Guard variable. And currently the way of our code using SQLite3Guard is that define it in the stack of functions which will access database, so it will be destroyed when the function returned. IMO, it means every function using a new mutex, so it will not really lock. Please correct me if I made something wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
currently the way of our code using SQLite3Guard is that define it in the stack of functions which will access database, so it will be destroyed when the function returned. IMO, it means every function using a new mutex, so it will not really lock.
Isn't that what we want? We should only lock the database when we are writing to it, and then immediately release. Or do we really want to hold on to the mutex for the duration of the Storage object? I don't really see what the point of that is if the Storage is basically a singleton.
Also, I just noticed that the extern std::mutex sql_mutex
declared in sql_utils.h appears to be unused and an obsolete relic of previous rounds of locking logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, just realized what you meant now. Yes, the mutex does have to live a higher level to be meaningful. But wouldn't a member variable of SqlStorage still be preferable to this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's a member of SqlStorage, I think I will have two options to implement the lock. The first one is that add a lock_guard to every function of SqlStorage, which will lock/unlock this mutex, I don't think it's more graceful than using SQLite3Guard, which is already used in every function. The second option is let the mutex be member of SqlStorage, but still use SQLite3Guard to lock it. I think it's also not graceful. So I choose to use a static member of SQLite3Guard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, it's also possible to keep a pointer of storage in that WHandle class, I think it will be better than keep a pointer of mutex, I will try it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another problem is that now we could init SQLite3Guard with the path of database, that means we didn't associate a Storage object to it. If we want the mutex be a member of Storage, maybe we'd better avoid init it in this way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keeping a pointer to storage in WHandle seems fine.
And you want to pass a mutex in SQLite3Guard
's constructor, is that right?
As long as the other calls to the constructors are in existing tests which don't try to access the db in parallel with SQLStorage, that's maybe fine if the lock is made optional.
We can discuss again on Friday to clarify the question.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I want to pass a mutex into SQLite3Guard's constructor, and use a shared_ptr of mutex to keep it. But when I start coding, I found I still feel not very good for this design, it still make a pointer to a member variable of another class.
And yes, I also think make the lock be optional will make the existing test code work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks it's not a problem to use a pointer of mutex SQLite3Guard, because we usually only use SQLite3Guard as local variable. And I haven't found another way which is graceful enough and will not need to restruct all of the storage code. So I think I will still use this solution.
7e479ef
to
93824d7
Compare
@@ -1700,7 +1760,8 @@ boost::optional<std::pair<uintmax_t, std::string>> SQLStorage::checkTargetFile(c | |||
class SQLTargetWHandle : public StorageTargetWHandle { | |||
public: | |||
SQLTargetWHandle(const SQLStorage& storage, Uptane::Target target) | |||
: db_(storage.dbPath()), target_(std::move(target)) { | |||
: db_path_(storage.dbPath()), target_(std::move(target)) { | |||
storage_ = &storage; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
db_path
looks unused now, can it be removed? And I think storage_
can be set in the initializer list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved storage_ to initializer list. The db_path_ is still using, it's mostly for existing test code.
78866a1
to
2406696
Compare
if (storage_ != nullptr) { | ||
SQLite3Guard db = storage_->dbConnection(); | ||
; | ||
if (sqlite3_changes(db.get()) > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like sqlite3_changes
will always return 0 in this case, as the database connection has just been created: https://www.sqlite.org/c3ref/changes.html.
We have to find another way to track that here, maybe another attribute would be enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes, it's a good point. And I found the "insert/replace" is in one of the constructors of WHandle class, so in this case we could run "delete" anyway in abort() function. And in another constructor which is basically for test code, there is nothing to do with the table in database, so we don't need to run "delete" in this case. Please correct me if I made something wrong.
2406696
to
2ebf315
Compare
|
||
if (storage_ != nullptr) { | ||
SQLite3Guard db = storage_->dbConnection(); | ||
if (sqlite3_changes(db.get()) > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is another instance here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha, I'm so careless. Will modify it now.
2ebf315
to
ddb1691
Compare
Looks good to me, even better if we add a simple ReportQueue test that demonstrates that events are non volatile. |
I think it means we need to restart aktualizr in the test, just after we store several events into database, but I haven't find a way which could do it and will not break the whole test process yet. And if we don't restart aktualizr, I think this change is transparent to caller of ReportQueue interfaces, because the details are hided in "enqueue" and "flush_queue". |
We have various ways of simulating reboots, but I don't think we even need to do that. In this case, can't we test this just be creating a ReportQueue directly, preventing it from sending, destroying it, then recreating it with the same storage database, and then verifying that it sends as expected then? |
ddb1691
to
37f2da7
Compare
Yes, good idea. I added a test case. |
37f2da7
to
8e1e779
Compare
ReportQueue report_queue(config, http, sql_storage); | ||
// Wait at most 30 seconds for the messages to get processed. | ||
http->expected_events_received.get_future().wait_for(std::chrono::seconds(20)); | ||
EXPECT_EQ(http->events_seen, num_events); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks really good! My only concern is that we should check that there is actually something in the database before restarting the ReportQueue
. We aren't testing anything interesting if it turns out that the first ReportQueue
was actually able to send out all of the events before it got shut down. Basically I just want to make sure the test is testing what we think it is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, because the server url which is used for the first ReportQueue is empty, it could not send any events out. But it's a good idea to check the events number in sql, I added it to the case, and I also checked whether the number was set to 0 after I send them out.
8e1e779
to
6c93364
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Normally Travis failures are pretty bogus these days, but this error looks real:
[ RUN ] ReportQueue.StoreEvents
created: /tmp/aktualizr-7d0e-f261-4d59-f14e/48e5-48a8-dir
[2020-03-11 02:03:21.211473] [0x0000000004050580] [info] Bootstraping DB to version 24
created: /tmp/aktualizr-7d0e-f261-4d59-f14e/141f-0db2-dir
created: /tmp/aktualizr-7d0e-f261-4d59-f14e/8df6-7a6d-dir
../src/libaktualizr/primary/reportqueue_test.cc:152: Failure
Expected equality of these values:
max_id
Which is: 10
count
Which is: 0
[ FAILED ] ReportQueue.StoreEvents (4696 ms)
EXPECT_EQ(http->events_seen, num_events); | ||
} | ||
|
||
/* Test event store. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test looks really good now! My last suggestion is to make the comment here more descriptive. Maybe something like "Test persistent storage of unsent events in the database across ReportQueue instantiations."
It's for OTA-3637. Store reports in database when events generated and delete them after sent them to server. Signed-off-by: cheng.xiang <ext-cheng.xiang@here.com>
6c93364
to
a20b46a
Compare
It's for OTA-3637. Store reports in database when events generated
and delete them after sent them to server.
Signed-off-by: cheng.xiang ext-cheng.xiang@here.com