
Fix #806: Reduce count of IPI during work_queue processing. #1072

Merged (5 commits, Oct 15, 2018)

Conversation

aleksostapenko
Contributor

No description provided.

Contributor

@krizhanovsky krizhanovsky left a comment


The PR design seems robust and good; only a couple of verifications/additions are required.


set_bit(TFW_QUEUE_B_IPI, &wq->flags);

smp_mb__after_atomic();
Contributor


OK, here we set the IPI flag before checking the queue size, and in tfw_wq_push() we check the queue size before the flag, so the synchronization seems safe. We need the barrier to make the queue size and the flag dependent. However, it seems the barrier pairing rule is broken here: strictly speaking, there is no guarantee that the queue operation won't be reordered with the flag testing in tfw_wq_push(). I see that __tfw_wq_push() uses at least an atomic set, but it seems we still need an _after_atomic() barrier in tfw_wq_push() now.

Contributor Author


Corrected.

if (!ss_close_q_sz())
return;

clear_bit(TFW_QUEUE_B_IPI, &wq->flags);
Contributor


This is a copy of part of tfw_wq_tasklet() - maybe make a macro with the queue size function as a parameter, or a function accepting a queue-size callback?

Contributor Author


Done.

@@ -208,13 +211,14 @@ ss_turnstile_push(long ticket, SsWork *sw, int cpu)
cn->ticket = ticket;
memcpy(&cn->sw, sw, sizeof(*sw));
spin_lock_bh(&cb->lock);
-	list_add(&cn->list, &cb->head);
+	list_add_tail(&cn->list, &cb->head);
cb->size++;
if (cb->turn > ticket)
cb->turn = ticket;
spin_unlock_bh(&cb->lock);

Contributor


OK, we don't need a barrier after the spinlock operation, but having a comment here would be good in case we ever remove the spinlock.

Contributor Author


Done.

Contributor

@krizhanovsky krizhanovsky left a comment


There is still some fog with the barrier...

@@ -77,6 +86,11 @@ tfw_wq_push(TfwRBQueue *q, void *ptr, int cpu, struct irq_work *work,
long ticket = __tfw_wq_push(q, ptr);
if (unlikely(ticket))
return ticket;
/*
* The atomic operation is 'atomic64_cmpxchg()' in
* '__tfw_wq_push()' above.
Contributor


Why? atomic64_cmpxchg() is called very rarely, only on the slow path. On the fast path there are just atomic reads and writes: in the good case there is a write barrier, but in the bad case there are no barriers at all.

Contributor Author


The path without atomic64_cmpxchg() is taken only when the queue is full and tfw_wq_push() exits with a ticket, i.e. before any IPI generation. Whenever a new element is actually inserted into the queue, atomic64_cmpxchg() must be called.

} TfwRBQueue;

enum {
/* Enable IPI generation. */
TFW_QUEUE_B_IPI = 0
Contributor


Probably we don't need the _B_ suffix (meaning "bit") here. For other flags we historically used _B_ and _F_ suffixes to distinguish bit numbers from flag masks, but now we tend to use the common bitmap API instead of "manual" bit operations. Removing the _B_ suffix will synchronize the code with upcoming patches.

Contributor Author


Corrected.
