Implement caching. #27

Closed
wants to merge 12 commits into from

Conversation

Contributor

@ericonr ericonr commented Jul 16, 2021

Context is given in each individual commit.

This way, the error message can actually be seen.
These functions will just return notfound for now, not doing anything at
all with the values they receive. They will allow us to plug them into
socket_handle.c, so we can then implement the cache for each struct
individually.

In order to implement the cache cleanly, it's better to have return_result
aware of it than to simply add it as an extra module. This way, we can
avoid allocations and deal with additions to the cache in a simple way.
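
Roughly, such a stub could look like this (the function name and the
status enum here are assumptions for the sketch, not taken from the
commit):

#include <pwd.h>

enum cache_status { CACHE_SUCCESS, CACHE_NOTFOUND, CACHE_UNAVAILABLE };

/* stub: ignores its inputs and always reports a miss, so it can be
 * plugged into socket_handle.c before any real cache exists */
static enum cache_status cache_getpwnam(const char *name, struct passwd *pw)
{
	(void)name;
	(void)pw;
	return CACHE_NOTFOUND;
}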

To implement cache querying, we have to change the module loop
structure, allowing for one initial run that uses the cache before the
subsequent runs that use modules listed in passwd_mods or group_mods.
Since we are changing the loop structure already, we take the
opportunity to move as many variables as possible into loop scope. We
also add some booleans to make explicit logic that previously relied on
shortcuts.
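
A rough sketch of the reworked loop shape (all identifiers here are
illustrative, not the actual code):

#include <stdbool.h>
#include <stddef.h>

struct request;
enum nss_status { NSS_STATUS_SUCCESS, NSS_STATUS_NOTFOUND };
typedef enum nss_status (*query_fn)(const struct request *req);

static enum nss_status run_passes(const struct request *req,
                                  query_fn cache_query,
                                  query_fn *mods, size_t nmods)
{
	/* pass 0 consults the cache; passes 1..nmods consult the modules
	 * configured in passwd_mods or group_mods */
	for (size_t pass = 0; pass <= nmods; pass++) {
		bool use_cache = (pass == 0);
		enum nss_status status = use_cache ? cache_query(req)
		                                   : mods[pass - 1](req);
		if (status == NSS_STATUS_SUCCESS)
			return status;
	}
	return NSS_STATUS_NOTFOUND;
}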

In this commit, we also change the memory allocation strategy: we avoid
repeated sysconf calls (it is now called once, in init_socket_handling)
and avoid allocating buf for the initgroups case, which doesn't need buf
and allocates its own array in nss_getkey.

Since we now support adding entries to caches, we need to delay some
free() calls, so the FREE_ALLOC macro is added to avoid duplicating the
logic of choosing which buffer to free.
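
The macro might look roughly like this; only the name FREE_ALLOC comes
from the commit, the body and the field/type names are assumptions:

/* free whichever buffer this request type actually allocated: the
 * initgroups case fills its own array, everything else uses buf */
#define FREE_ALLOC() do { \
	if (type == GETINITGR) \
		free(initgroups_res.grps); \
	else \
		free(buf); \
} while (0)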

We also change the error handling when writing values back to clients:
instead of immediately returning if writing fails, we still try to
record that value in the cache, since a client failure doesn't mean the
entry is now invalid. This also fixes a memory leak that occurred when
the value was invalid (a write_* function returned -2) and the action
for unavailable status wasn't to return.
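
In sketch form, with hypothetical helper names:

/* a failed write to the client doesn't make the looked-up entry
 * invalid, so record it (or free it) before acting on the error */
int w = write_result(client_fd, result);
if (w == -2)
	FREE_ALLOC();      /* the value itself was invalid: just free */
else
	cache_add(result); /* the cache takes ownership of the buffer */
if (w < 0)
	return;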

This commit also fixes a bug where a RETURN action would still run the
fallback block at the end of the return_result function, which would
write to a file descriptor already closed on the client side; this could
be seen in strace output, where a write syscall got back EPIPE. The
program masks SIGPIPE, so this didn't lead to program death, but it was
an unnecessary syscall anyway.

Now that the loop logic has changed and we have added more eager
returns, we don't need a conditional gating this anymore, and can simply
call write_* for the appropriate request.
res = &CACHE.res[i];

/* we will simply overwrite the cache entry's ARGUMENT member */
memcpy(&res->ARGUMENT, ARGUMENT, sizeof(*ARGUMENT));
Collaborator
This looks like it can go out of bounds; how do you know that the cache has appropriate room for this copy to succeed? The comment applies to all similar uses of memcpy.

We use the pw_name and pw_uid fields as keys to query the cache. Since
accessing the cache for getpwnam and getpwuid is very similar, and will
also be very similar to getgrnam and getgrgid, we create a cache_query.h
header with all the logic. Since adding to the cache will be similar to
adding to the struct group cache, we also create a header for that.
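
Conceptually, the header is included once per key, with macros defined
beforehand; CACHE and ARGUMENT appear in the diff above, the rest of
this sketch is guesswork:

/* before including cache_query.h for the getpwnam case: */
#define CACHE pw_name_cache
#define ARGUMENT name
#define COMPARE(entry) (strcmp((entry)->passwd.pw_name, name) == 0)
#include "cache_query.h"

/* cache_query.h itself then expands to a scan along these lines: */
for (size_t i = 0; i < CACHE.len; i++) {
	if (!COMPARE(&CACHE.res[i]))
		continue;
	/* key matched: validity check and copy-out happen here */
	break;
}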

Using an inline struct passwd in cache entries instead of dynamically
allocating one can be more wasteful when the array isn't full, but it
also avoids memory fragmentation due to too many allocations and
simplifies memory handling in general.
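
A plausible shape for such an entry (field names are illustrative):

#include <pwd.h>
#include <stddef.h>
#include <time.h>

struct pw_cache_entry {
	struct passwd passwd;   /* stored inline: no per-entry allocation */
	char *buf;              /* heap storage the string fields point into */
	size_t buflen;
	struct timespec stamp;  /* creation time, checked against the TTL */
};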

This commit also adds policy for the cache: how long an entry is valid
for, the maximum number of entries allowed, and the initial number of
entries.
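
For illustration, the policy boils down to three tunables (the concrete
values below are placeholders, not the ones in the commit):

#define CACHE_VALIDITY_SECS    600	/* how long an entry stays valid */
#define CACHE_MAX_ENTRIES     4096	/* cap on cache growth */
#define CACHE_INITIAL_ENTRIES  256	/* size of the initial allocation */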

The invalidation logic works as follows: the add function will go
through the cache to check whether some other thread has already added
the entry, or whether an outdated version of it exists. If there is an
entry with our key, we check if it's valid; if it is, we simply return,
and if not, we overwrite that entry. In case there isn't a pre-existing
entry, we will either overwrite the most outdated invalid entry (this
avoids growing the cache unnecessarily, and works almost like a
least-recently-used cache), or, if such an entry doesn't exist, we will
increase the cache size.
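
The add path described above, in sketch form (the helper names are
invented for the sketch):

static void cache_add(struct pw_cache_entry *incoming)
{
	struct pw_cache_entry *stalest = NULL;

	for (size_t i = 0; i < CACHE.len; i++) {
		struct pw_cache_entry *e = &CACHE.res[i];

		if (same_key(e, incoming)) {
			if (entry_valid(e))
				return;         /* another thread already added it */
			overwrite(e, incoming); /* refresh the outdated entry */
			return;
		}
		/* remember the most outdated invalid entry for reuse */
		if (!entry_valid(e) && (!stalest || older_than(e, stalest)))
			stalest = e;
	}

	if (stalest)
		overwrite(stalest, incoming); /* avoid growing the cache */
	else
		grow_and_append(incoming);    /* no dead slot: grow */
}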

We also add a static variable to control whether caching is enabled, and
macros to use it automatically. If it's disabled, query functions can
immediately return UNAVAILABLE status, and add functions can immediately
free the buffer passed to them (the contract of the add functions is
that they take ownership of the buffer passed to them).
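
A sketch of that gating (the macro names are assumptions):

#include <stdbool.h>
#include <stdlib.h>

static bool cache_enabled;

/* query functions can bail out immediately when caching is off */
#define CACHE_GATE_QUERY() do { \
	if (!cache_enabled) return CACHE_UNAVAILABLE; \
} while (0)

/* add functions own the buffer passed to them, so with caching off
 * they must still free it before returning */
#define CACHE_GATE_ADD(buffer) do { \
	if (!cache_enabled) { free(buffer); return; } \
} while (0)
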
Since the code is basically identical to struct passwd handling, this
commit just makes use of the cache_add.h and cache_query.h headers, and
ends up rather short.

The logic is similar to struct passwd caching, but re-using the headers
isn't possible, so this commit has to reimplement most of it. The query
function could have avoided dynamic allocations by passing its own
buffer, but that would have required a lot of additional logic in
return_result.

We re-use struct initgroups_res from socket_handle.c, so it's moved to
modules.h as well.
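
The struct itself isn't shown in this thread; as a rough idea of its
role (field names assumed):

#include <stddef.h>
#include <sys/types.h>

/* moved from socket_handle.c to modules.h: the result of an initgroups
 * lookup is a dynamically grown gid array, which is why the buf-based
 * caching headers don't fit this case */
struct initgroups_res {
	gid_t *grps;   /* array allocated by the lookup itself */
	size_t len;    /* number of gids in grps */
};
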
The initial design was based on the idea that direct access to the
cache's underlying storage would avoid allocations and generally lead
to a cleaner design. Unfortunately, this also opened up the program to
use-after-free issues.

The following sequence of events shows a possible scenario (Tx are
threads):

1. T1 enters the cache, finds a valid entry, and returns it (handing
   back a buffer that points into cache storage)
2. T2 enters the cache, finds the entry has become invalid, and doesn't
   return it; it then obtains a fresh entry from the nss module
3. T2 tries to add the new entry to the cache, which releases the
   underlying memory (free(buffer))
4. T1 finally gets around to writing out, delayed for whatever reason,
   and buffer is now a use-after-free

There are two possible ways to deal with this:

- add a mutex to the struct and lock it while the entry is in use; this
  means the wrlock would be held for a potentially very *long* time,
  which means holding up the rdlock for consumers. Therefore, not an
  option
- actually allocate memory when reading from the cache; not ideal, but
  it avoids all of the lifetime issues

We went with the latter, which required simplifying return_result by
splitting out the cache query.
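
A sketch of the chosen approach, assuming the cache is guarded by a
pthread rwlock and reusing the entry/helper names from the earlier
sketches:

#include <pthread.h>
#include <stdlib.h>
#include <string.h>

static char *cache_read_copy(const char *name, size_t *out_len)
{
	char *copy = NULL;

	pthread_rwlock_rdlock(&cache_lock);
	struct pw_cache_entry *e = find_entry(name);
	if (e && entry_valid(e)) {
		/* copy the entry out while the lock is held; the caller
		 * owns the copy, so a later overwrite in the cache can't
		 * turn it into a use-after-free */
		copy = malloc(e->buflen);
		if (copy) {
			memcpy(copy, e->buf, e->buflen);
			*out_len = e->buflen;
		}
	}
	pthread_rwlock_unlock(&cache_lock);
	return copy;
}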

Especially now that we had to change how cache results are dealt with,
it's better to duplicate some code (specifically, a tiny bit of logic
from nss_getkey into nss_getkey_cache and the write_* callers) and keep
the logic clearer. It also avoids ugly type and pointer punning.

The resulting nss_getkey_cache is much more direct, however; the only
logic that's really duplicated is how arguments are interpreted before
being passed to the cache query functions.

Obtain the current time once, before entering the loop, and save on
comparisons by storing the return value of compare_timestamps when
appropriate.
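
Something like (illustrative; compare_timestamps is named in the commit,
its exact semantics are assumed):

#include <time.h>

/* fetch the clock once instead of once per entry */
struct timespec now;
clock_gettime(CLOCK_MONOTONIC, &now);

for (size_t i = 0; i < CACHE.len; i++) {
	/* keep the result around so validity checks later in the same
	 * iteration don't call compare_timestamps again */
	int outdated = compare_timestamps(&CACHE.res[i].stamp, &now);
	if (outdated)
		continue;
	/* ...use the still-valid entry... */
}
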
This can speed up cache access considerably when accessing elements near
the end of the cache. From local benchmarks, scanning a cache with 10k
entries doing only integer comparisons is almost free compared to the
communication overhead, but repeated strcmp calls can slow things down
some. Using a hash allows us to greatly decrease the number of strcmp
calls.

Hash function recommendation taken from [1].

[1] http://www.orthogonal.com.au/computers/hashstrings/
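
The exact function recommended by [1] isn't reproduced here; a
djb2-style string hash illustrates the mechanism (name_hash is an
assumed field on the cache entry):

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* a simple multiplicative string hash, computed once per key */
static uint32_t strhash(const char *s)
{
	uint32_t h = 5381;
	for (; *s; s++)
		h = h * 33 + (unsigned char)*s;
	return h;
}

/* the scan compares stored hashes first, and only falls back to
 * strcmp when the cheap integer comparison matches */
static bool entry_matches(const struct pw_cache_entry *e,
                          uint32_t key_hash, const char *name)
{
	if (e->name_hash != key_hash)
		return false;
	return strcmp(e->passwd.pw_name, name) == 0;
}
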
The -C option now requires an argument specifying the time until a cache
entry becomes invalid. This way, administrators are forced to determine
and specify their own.

The new -n option specifies the maximum amount of cache entries that can
be used.

Since 0 is an invalid value for either option, we can use atol() to
parse optarg and always error out on a 0 return.

We also lower the initial number of entries, to minimize memory
consumption on systems with few users, and lower the maximum number of
entries to 1k.
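
A sketch of the parsing (the die() error helper and variable names are
hypothetical); note that atol() also returns 0 on unparseable input, so
the single 0 check covers both cases:

#include <stdlib.h>
#include <unistd.h>

long cache_validity = 0, max_entries = 0;

int opt;
while ((opt = getopt(argc, argv, "C:n:")) != -1) {
	switch (opt) {
	case 'C':	/* time until a cache entry becomes invalid */
		cache_validity = atol(optarg);
		if (cache_validity == 0)
			die("invalid argument for -C: %s", optarg);
		break;
	case 'n':	/* maximum number of cache entries */
		max_entries = atol(optarg);
		if (max_entries == 0)
			die("invalid argument for -n: %s", optarg);
		break;
	}
}
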
Contributor Author

ericonr commented Aug 18, 2021

#28 has been merged and was a superset of this, closing.

@ericonr ericonr closed this Aug 18, 2021