Implement caching. #27
Closed
This way, the error message can actually be seen.
These functions will just return notfound for now, not doing anything at all with the values they receive. They will allow us to plug them into socket_handle.c; then we can implement the cache for each struct individually.
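As a rough sketch, stubs of that shape might look like the following; the status enum and function names here are illustrative guesses, not the project's actual identifiers:

```c
#include <sys/types.h>
#include <pwd.h>

/* Hypothetical status values; the project's real enum may differ. */
enum cache_status { CACHE_FOUND, CACHE_NOTFOUND, CACHE_UNAVAILABLE };

/* Stubs: ignore their arguments and always report a miss, so the
 * callers in socket_handle.c can already be wired up against this
 * interface before any real caching exists. */
enum cache_status cache_getpwnam(const char *name, struct passwd *out)
{
    (void)name;
    (void)out;
    return CACHE_NOTFOUND;
}

enum cache_status cache_getpwuid(uid_t uid, struct passwd *out)
{
    (void)uid;
    (void)out;
    return CACHE_NOTFOUND;
}
```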
In order to implement caching cleanly, it's better to have return_result aware of it than to simply add it as an extra module. This way, we can avoid allocations and deal with additions to the cache in a simple way.

To implement cache querying, we have to change the module loop structure, allowing for one initial run that uses the cache before the subsequent runs that use the modules listed in passwd_mods or group_mods. Since we are changing the loop structure already, we take the opportunity to move as many variables as possible into loop scope. We also add some booleans to handle logic that was relying on shortcuts.

In this commit, we also change the memory allocation strategy: sysconf is now called once, in init_socket_handling, and we avoid allocating buf for the initgroups case, which doesn't need buf and allocates its own array in nss_getkey. Since we now support adding entries to caches, we need to delay some free() calls, so the FREE_ALLOC macro is added so we don't have to duplicate the logic of choosing which buffer to free.

We also change the error handling when writing values back to clients: instead of returning immediately if writing fails, we still try to record the value in the cache, since a client failure doesn't mean the entry is now invalid. This also fixes a memory leak when the value was invalid (a write_* function returned -2) and the action for unavailable status wasn't to return.

This commit also fixes a bug where a RETURN action would still run the fallback block at the end of return_result, which would write to a file descriptor already closed on the client side. This could be seen in strace output, where a write syscall came back with EPIPE. The program masks SIGPIPE, so this didn't lead to program death, but it was an unnecessary syscall anyway.
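The macro body isn't shown in this discussion; as a hedged guess at its shape, it might centralize the buffer choice like this (the parameter names are invented for illustration):

```c
#include <stdlib.h>

/* Hypothetical sketch: the initgroups path allocates its own array in
 * nss_getkey instead of using buf, so which pointer must be released
 * depends on the request type.  Centralizing the choice here avoids
 * duplicating it at every delayed-free site. */
#define FREE_ALLOC(is_initgroups, buf, grouplist)   \
    do {                                            \
        if (is_initgroups)                          \
            free(grouplist);                        \
        else                                        \
            free(buf);                              \
    } while (0)
```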
Now that the loop logic has changed and we have added more eager returns, we don't need a conditional gating this anymore, and can simply call write_* for the appropriate request.
```c
res = &CACHE.res[i];

/* we will simply overwrite the cache entry's ARGUMENT member */
memcpy(&res->ARGUMENT, ARGUMENT, sizeof(*ARGUMENT));
```
This looks like it can go out of bounds: how do you know the cache has enough room for this copy to succeed? The same comment applies to all similar uses of memcpy.
We use the pw_name and pw_uid fields as keys to query the cache. Since accessing the cache for getpwnam and getpwuid is very similar, and will also be very similar for getgrnam and getgrgid, we create a cache_query.h header with all the logic. Since addition to the cache will be similar to adding to the struct group cache, we also create a header for that.

Using an inline struct passwd in cache entries instead of dynamically allocating one can be more wasteful when the array isn't full, but it avoids memory fragmentation from too many allocations and simplifies memory handling in general.

This commit also adds policy for the cache: how long an entry is valid for, the maximum number of entries allowed and the initial number of entries. The invalidation logic works as follows: the add function goes through the cache to see whether some other thread already added the entry, or whether an outdated version of it exists. If there is an entry with our key, we check if it's still valid; if yes, we simply return, and if not, we overwrite that entry. If there is no pre-existing entry, we either overwrite the most outdated invalid entry (this avoids growing the cache unnecessarily, and works almost like a least-recently-used cache) or, if no such entry exists, we increase the cache size.

We also add a static variable to control whether caching is enabled, and macros to use it automatically. If caching is disabled, query functions can immediately return UNAVAILABLE status, and add functions can immediately free the buffer passed to them (the contract of the add functions is that they take ownership of that buffer).
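A compact sketch of the add policy described above, with invented names, the payload copy omitted, and a fixed-size array standing in for the growable one:

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

#define MAX_ENTRIES 1024

/* Hypothetical entry layout; the real cache stores an inline
 * struct passwd, keyed by pw_name/pw_uid, with a validity timestamp. */
struct entry {
    char key[64];
    time_t stamp;
    /* ... inline struct passwd would live here ... */
};

static struct entry entries[MAX_ENTRIES];
static size_t n_entries;
static time_t ttl = 600; /* example TTL; set by the admin in reality */

/* Reuse a still-valid duplicate, overwrite a stale one, otherwise
 * recycle the most outdated invalid slot, and only grow the cache as
 * a last resort. */
static void cache_add(const char *key, time_t now)
{
    struct entry *slot = NULL, *oldest_invalid = NULL;

    for (size_t i = 0; i < n_entries; i++) {
        struct entry *e = &entries[i];
        if (strcmp(e->key, key) == 0) {
            if (now - e->stamp < ttl)
                return;        /* another thread already added it */
            slot = e;          /* stale duplicate: overwrite in place */
            break;
        }
        if (now - e->stamp >= ttl &&
            (!oldest_invalid || e->stamp < oldest_invalid->stamp))
            oldest_invalid = e;
    }

    if (!slot)
        slot = oldest_invalid;          /* recycle the most outdated slot */
    if (!slot && n_entries < MAX_ENTRIES)
        slot = &entries[n_entries++];   /* grow as a last resort */
    if (!slot)
        return;                         /* cache full of valid entries */

    snprintf(slot->key, sizeof slot->key, "%s", key);
    slot->stamp = now;
    /* ... payload copy omitted ... */
}
```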
Since the code is basically identical to struct passwd handling, this commit just makes use of the cache_add.h and cache_query.h headers, and ends up rather short.
The logic is similar to struct passwd caching, but re-using the headers isn't possible, so this commit has to reimplement most of it. The query function could have avoided dynamic allocations by passing its own buffer, but that would have required a lot of additional logic in return_result. We re-use struct initgroups_res from socket_handle.c, so it's moved to modules.h as well.
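The actual field layout of that struct isn't shown in this discussion; purely as a guess, a result struct for initgroups would need something like a gid array plus its occupancy and capacity:

```c
#include <sys/types.h>

/* Hypothetical shape of the shared result struct; field names are
 * invented for illustration. */
struct initgroups_res {
    gid_t *groups;  /* dynamically allocated group list */
    long count;     /* entries filled */
    long size;      /* allocated capacity */
};
```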
The initial design was implemented based on the idea that giving access to the cache's underlying storage would avoid allocations and generally lead to a cleaner design. Unfortunately, this also opened the program up to use-after-free issues. The following sequence of events shows a possible scenario (Tn are threads):

1. T1 enters the cache, finds a valid entry, returns it (returning the buffer from the cache).
2. T2 enters the cache, finds that the entry is now invalid and doesn't return it; it then gets a new entry from an nss module.
3. T2 tries to add the new entry to the cache, which releases the underlying memory (free(buffer)).
4. T1 finally gets to writing out, delayed for *whatever reason*, and buffer is now a use-after-free.

There are two possible ways to deal with this:

- Add a mutex to the struct and lock it while the entry is being used. This means the wrlock would be held for a potentially very *long* time, holding up the rdlock for consumers. Therefore, not an option.
- Actually allocate memory when reading from the cache. Not ideal, but it avoids all of the lifetime issues.

We went with the latter, which required simplifying return_result by splitting out the cache query. Especially now that we had to change how cache results are dealt with, it's better to duplicate some code (specifically, a tiny bit of logic from nss_getkey into nss_getkey_cache and the write_* callers) and keep the logic clearer. It also avoids ugly type and pointer punning. The resulting nss_getkey_cache is much more direct; the only logic that's really duplicated is how arguments are interpreted before being passed to the cache query functions.
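A minimal sketch of the copy-on-read approach chosen here, assuming a cache-wide rwlock; the slot layout and names are invented:

```c
#include <stdlib.h>
#include <string.h>
#include <pthread.h>

/* Hypothetical cache slot: a buffer plus its length, guarded by the
 * cache-wide rwlock. */
struct slot { char *buf; size_t len; };

static pthread_rwlock_t cache_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Copy-on-read: duplicate the entry under the rdlock so the caller
 * owns its own allocation, and a concurrent overwrite (which frees
 * the old buffer under the wrlock) can no longer invalidate it. */
static char *cache_read_copy(const struct slot *s, size_t *len_out)
{
    char *copy = NULL;
    pthread_rwlock_rdlock(&cache_lock);
    if (s->buf) {
        copy = malloc(s->len);
        if (copy) {
            memcpy(copy, s->buf, s->len);
            *len_out = s->len;
        }
    }
    pthread_rwlock_unlock(&cache_lock);
    return copy; /* caller frees; the cache may free s->buf safely */
}
```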
Obtain `now` once before entering the loop, and save on comparisons by storing the return value of compare_timestamps when appropriate.
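Illustrative only; compare_timestamps' real signature isn't shown in this discussion, so a plain timestamp subtraction stands in for it:

```c
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

struct entry { time_t stamp; /* ... payload ... */ };

void scan(struct entry *entries, size_t n, time_t ttl)
{
    time_t now = time(NULL);   /* fetched once, before the loop */
    for (size_t i = 0; i < n; i++) {
        /* store the comparison result instead of recomputing it */
        bool valid = (now - entries[i].stamp) < ttl;
        if (!valid)
            continue;
        /* ... use the entry ... */
    }
}
```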
This can speed up cache access considerably when accessing elements near the end of it. From local benchmarks, scanning a cache with 10k entries doing only integer comparisons is almost free compared to the communication overhead, but repeated strcmp calls can slow things down somewhat. Using a hash allows us to greatly decrease the number of strcmp calls. Hash function recommendation taken from [1].

[1] http://www.orthogonal.com.au/computers/hashstrings/
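The linked page's exact recommendation isn't reproduced here; djb2 is shown purely as a stand-in to illustrate the hash-before-strcmp idea:

```c
#include <stdint.h>
#include <string.h>

/* Cheap string hash (djb2 shown as an example; the project may use a
 * different function per [1]).  Computed once when an entry is added
 * and once per query. */
static uint32_t hash_str(const char *s)
{
    uint32_t h = 5381;
    for (; *s; s++)
        h = h * 33 + (unsigned char)*s;
    return h;
}

/* Compare the integer hashes first; fall back to strcmp only on a
 * hash match, so almost all non-matching entries are skipped with a
 * single integer comparison. */
static int key_matches(uint32_t h, const char *key,
                       uint32_t entry_h, const char *entry_key)
{
    return h == entry_h && strcmp(key, entry_key) == 0;
}
```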
The -C option now requires an argument specifying the time until a cache entry becomes invalid. This way, admnistrators are forced to determine and specify their own. The new -n option specifies the maximum amount of cache entries that can be used. Since 0 is an invalid value for either option, we can use atol() to parse optarg and always error out on a 0 return. We also lower the initial entries, to minimize the memory consumption for systems with few users, and also lower the maximum amount of entries to 1k.
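A self-contained sketch of that parsing strategy; the option letters match the commit, while the variable names and error text are made up:

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    long ttl = 0, max_entries = 0;
    int c;

    while ((c = getopt(argc, argv, "C:n:")) != -1) {
        switch (c) {
        case 'C': ttl = atol(optarg); break;
        case 'n': max_entries = atol(optarg); break;
        }
    }

    /* 0 is invalid for both options, so atol()'s 0-on-garbage
     * behaviour doubles as the parse-error check. */
    if (ttl == 0 || max_entries == 0) {
        fprintf(stderr, "usage: %s -C seconds -n max-entries\n", argv[0]);
        return 1;
    }
    return 0;
}
```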
#28 has been merged and was a superset of this, closing.
Context is given in each individual commit.