Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide APIs for enabling cgroup subsystems #1424

Merged
merged 2 commits into from
Nov 20, 2017

Conversation

ashu-mehra
Copy link
Contributor

Instead of providing APIs that enable cgroup limits for all
subsystems, a new set of APIs is added which allows to query
which subsystems are available and enable subsystems individually.
This would allow runtimes to enable cgroup subsystems based on their
requirements.
Note that there is another subtle change in the design of the
port library APIs related to cgroup system.
Enabling a cgroup subsystem in portlibrary (by calling
omrsysinfo_cgroup_enable_subsystems) would make the port library APIs
to use cgroup limits internally.
Any port library API that directly returns cgroup limits,
eg omrsysinfo_cgroup_get_memlimit(), does not check whether the
corresponding subsystem (eg memory) is enabled or not.
Therefore, when using omrsysinfo_cgroup_* APIs,
the caller doesn't need to explicitly enable the corresponding
subsystem in the port library.

Consider the case of omrsysinfo_get_physical_memory() which returns
physical memory of the system.
If the system is running in a cgroup, and the cgroup's "memory"
subsystem is enabled, then omrsysinfo_get_physical_memory()
would internally call omrsysinfo_cgroup_get_memlimit(), and
return memory limit imposed by the cgroup.
However, if the user wants to just get the memory limit imposed by the
cgroup, it can directly call omrsysinfo_cgroup_get_memlimit(),
without enabling the "memory" subsystem.

Issue: #1281
Signed-off-by: Ashutosh Mehra asmehra1@in.ibm.com

@ashu-mehra
Copy link
Contributor Author

@charliegracie @DanHeidinga can you please review.

@ashu-mehra
Copy link
Contributor Author

@charliegracie @DanHeidinga any estimate when this could be reviewed?

@charliegracie
Copy link
Contributor

I am not sure why the mutex is requried? Should the mutex be a pthread mutex or a omrthread mutex? Should the mutex be a global variable or a field in the port lirbary struct?

Waiting for a thumbs up from @DanHeidinga as well as the answers to my questions above. Thanks

@ashu-mehra
Copy link
Contributor Author

I am not sure why the mutex is requried?

cgroupEntryListMutex is protecting PPG_cgroupEntryList which is initialized in omrsysinfo_cgroup_system_is_available(). That API can be called by multiple threads, so the need to protect PPG_cgroupEntryList.

Should the mutex be a pthread mutex or a omrthread mutex? Should the mutex be a global variable or a field in the port lirbary struct?

Since the mutex is only required for protecting PPG_cgroupEntryList which is internal to the port library and the mutex is required only in omrsysinfo.c, I used pthread mutex and didn't put it in a port library struct.

@charliegracie
Copy link
Contributor

Re: the mutex. Is that what other places in the port library do?

@ashu-mehra
Copy link
Contributor Author

I see omrsignal.c using a static omrthread_monitor_t. I think I should also change cgroupEntryListMutex to be of type omrthread_monitor_t.

@ashu-mehra ashu-mehra force-pushed the cgroup_resource_system branch 2 times, most recently from e946c21 to 52052b1 Compare August 13, 2017 15:48
@ashu-mehra
Copy link
Contributor Author

I have updated the change set to use omrthread_monitor_t instead of pthread_mutex_t.
@charliegracie @DanHeidinga Please review.

/** see @ref omrsysinfo.c::omrsysinfo_cgroup_enable_limits "omrsysinfo_cgroup_enable_limits"*/
int32_t ( *sysinfo_cgroup_enable_limits)(struct OMRPortLibrary *portLibrary);
/** see @ref omrsysinfo.c::omrsysinfo_cgroup_system_is_available "omrsysinfo_cgroup_system_is_available"*/
int32_t ( *sysinfo_cgroup_system_is_available)(struct OMRPortLibrary *portLibrary) ;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be sysinfo_cgroup_is_system_available()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, this sounds better.

{
return OMRPORT_ERROR_SYSINFO_CGROUP_UNSUPPORTED_PLATFORM;
return OMRPORT_ERROR_SYSINFO_CGROUP_UNSUPPORTED_PLATFORM;;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: double ; at end of line

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the name and the comment, I'm not confident I know what this method would do. Is it testing if the runtime is running in a cgroup?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

*/
uint64_t
omrsysinfo_cgroup_enable_subsystems(struct OMRPortLibrary *portLibrary, uint64_t requestedSubsystems)
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are errors handled / communicated to the caller by this function?

What if the user tries to enable a subsystem that isn't enabled in the system?

Copy link
Contributor Author

@ashu-mehra ashu-mehra Aug 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error can occur during omrsysinfo_cgroup_is_system_available in which case no subsystem is enabled, but the error is not communicated back to the user.
I am expecting the APIs would be used in following manner:

if (0 == omrsysinfo_cgroup_is_system_available()) {
    availableSubsystems = omrsysinfo_cgroup_get_available_subsystems();
    if (requiredSubsystem & availableSubsystems) {
        omrsysinfo_cgroup_enable_subsystems(requiredSubsystems);
    }
}

What if the user tries to enable a subsystem that isn't enabled in the system?

No error is returned. If the APIs are used as above, this situation shouldn't happen.
And the API actually returns the subsystems that are enabled. So the user can check which subsystems got enabled on this call.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So a user that wanted to validate that the subsystems were actually enabled would need to do the following:

if (0 == omrsysinfo_cgroup_is_system_available()) {
    availableSubsystems = omrsysinfo_cgroup_get_available_subsystems();
    if (requiredSubsystem & availableSubsystems) {
        uint64_t enabledSubsystems = omrsysinfo_cgroup_enable_subsystems(requiredSubsystems);
        if ((enabledSubsystems & requiredSybsystems) != enabledSubsystems) {
            /* do error handling */
        }
    }
}

Isn't exactly user friendly but is something I can live with.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually there is no need to call omrsysinfo_cgroup_get_available_subsystems() here:

if (0 == omrsysinfo_cgroup_is_system_available()) {
    uint64_t enabledSubsystems = omrsysinfo_cgroup_enable_subsystems(requiredSubsystems);
    if ((enabledSubsystems & requiredSybsystems) != enabledSubsystems) {
            /* do error handling */
    }
}

freeCgroupEntries(portLibrary, PPG_cgroupEntryList);
omrthread_monitor_exit(cgroupEntryListMonitor);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The monitor needs to protect the nulling of the PPG_cgroupEntryList = NULL; as well as the call to freeCgroupEntries()

Trc_PRT_Assert_ShouldNeverHappen();
}
/* unreachable code */
Trc_PRT_Assert_ShouldNeverHappen();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should issue a compiler warning as technically trace asserts can be disabled, resulting in falling off the end of a non-void function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need a return here and then just the assert in the default case should be sufficient,

* subsystems that are available
*/
uint64_t
getAvailableSubsystems(struct OMRPortLibrary *portLibrary, OMRCgroupEntry *cgEntryList)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be static so it's not called elsewhere?

@@ -120,7 +121,8 @@ typedef struct OMRPortPlatformGlobals {
#endif

#if defined(LINUX)
#define PPG_cgroupLimitsEnabled (portLibrary->portGlobals->platformGlobals.cgroupLimitsEnabled)
#define PPG_cgroupSubsystemsAvailable (portLibrary->portGlobals->platformGlobals.cgroupSubsystemsAvailable)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are three port library globals in play here and it looks like the two new ones are only valid when PPG_cgroupEntryList is non-null?

Can you add a comment here (and in the commit) that describes the relationship between the three port globals?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The condition PPG_cgroupEntryList being non-null is used only to check if PPG_cgroupSubsystemsAvailable and PPG_cgroupSubsystemsAvailable have already been set. As such there is no relation between the three.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't that state a relationship? Neither PPG_cgroupSubsystemsAvailable or PPG_cgroupSubsystemsAvailable are valid if PPG_cgroupEntryList is null.

The code should state that invariant.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add the above invariant as a comment in the code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@DanHeidinga
Copy link
Contributor

I haven't reviewed the test changes yet. I'll look at those once we're in agreement on the rest of the code changes.

@ashu-mehra ashu-mehra force-pushed the cgroup_resource_system branch from 52052b1 to 36753bb Compare August 14, 2017 20:21
@ashu-mehra
Copy link
Contributor Author

Updated the change set as per the comments.

@ashu-mehra
Copy link
Contributor Author

@DanHeidinga any more review comments?

outputErrorMessage(PORTTEST_ERROR_ARGS, "omrsysinfo_cgroup_get_memlimit returned %d, expected %d\n", rc, OMRPORT_ERROR_SYSINFO_CGROUP_LIMITS_DISABLED);
}
#else
#if !defined(LINUX)
if (OMRPORT_ERROR_SYSINFO_CGROUP_UNSUPPORTED_PLATFORM != rc) {
outputErrorMessage(PORTTEST_ERROR_ARGS, "omrsysinfo_cgroup_get_memlimit returned %d, exptected %d on platform that does not support cgroups\n", rc, OMRPORT_ERROR_SYSINFO_CGROUP_UNSUPPORTED_PLATFORM);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exptected -> expected

@@ -2144,16 +2136,8 @@ TEST(PortSysinfoTest, sysinfo_cgroup_get_memlimit)
outputErrorMessage(PORTTEST_ERROR_ARGS, "Expected omrsysinfo_cgroup_get_memlimit and omrsysinfo_get_physical_memory to return same value, but omrsysinfo_cgroup_get_memlimit returned %ld and omrsysinfo_get_physical_memory returned %ld\n");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two %ld format specifiers in the outputErrorMessage call but no data is passed to the call.

/** see @ref omrsysinfo.c::omrsysinfo_cgroup_get_available_subsystems "omrsysinfo_cgroup_get_available_subsystems"*/
uint64_t ( *sysinfo_cgroup_get_available_subsystems)(struct OMRPortLibrary *portLibrary);
/** see @ref omrsysinfo.c::omrsysinfo_cgroup_is_subsystem_available "omrsysinfo_cgroup_is_subsystem_available"*/
BOOLEAN ( *sysinfo_cgroup_is_subsystem_available)(struct OMRPortLibrary *portLibrary, uint64_t subsystem);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

int32_t ( *sysinfo_cgroup_is_system_available)(struct OMRPortLibrary *portLibrary) ;
BOOLEAN ( *sysinfo_cgroup_is_subsystem_available)(struct OMRPortLibrary *portLibrary, uint64_t subsystem);

Why does one return an int32_t and the other a BOOLEAN?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sysinfo_cgroup_is_system_available can return negative error code to indicate the reason for unavailability of cgroup system. sysinfo_cgroup_is_subsystem_available just checks the subsystem flags which are set.

/** see @ref omrsysinfo.c::omrsysinfo_cgroup_enable_limits "omrsysinfo_cgroup_enable_limits"*/
int32_t ( *sysinfo_cgroup_enable_limits)(struct OMRPortLibrary *portLibrary);
/** see @ref omrsysinfo.c::omrsysinfo_cgroup_is_system_available "omrsysinfo_cgroup_is_system_available"*/
int32_t ( *sysinfo_cgroup_is_system_available)(struct OMRPortLibrary *portLibrary) ;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: extra space between ) & ;.

@@ -808,42 +808,74 @@ omrsysinfo_os_kernel_info(struct OMRPortLibrary *portLibrary, struct OMROSKernel
}

/**
* Checks if the port library can use cgroup limits
* Returns TRUE if the platform supports cgroup system and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This says Returns TRUE and then the @return 0 if the the cgroup system is available which is the opposite. Which is correct?

@@ -3459,38 +3542,68 @@ omrsysinfo_cgroup_is_limits_supported(struct OMRPortLibrary *portLibrary)
goto _end;
}

omrthread_monitor_enter(cgroupEntryListMonitor);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this monitor actually protecting? Two threads that got past the if (NULL == PPG_cgroupEntryList) { check would BOTH execute readCgroupFile(), just one after the other.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They would end up creating two lists of OMRCgroupEntry nodes and only one would be assigned to PPG_cgroupEntryList. Other one would be leaked.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should check PPG_cgroupEntryList under the mutex to determine whether it needs to call readCgroupFile() so that it doesn't leak.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

if (0 != rc) {
goto _end;
}
PPG_cgroupSubsystemsAvailable = getAvailableSubsystems(portLibrary, PPG_cgroupEntryList);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once readCgroupFile() has set PPG_cgroupEntryList to non-null, the PPG_cgroupSubsystemsAvailable global isn't read and this function will always return 0.

It seems strange that this function calls getAvailableSubsystems. I would have (maybe naively?) expected that to be called by omrsysinfo_cgroup_get_available_subsystems if the PPG_cgroupSubsystemsAvailable global was null...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PPG_cgroupSubsystemsAvailable cannot be checked for null; its not a pointer.
omrsysinfo_cgroup_is_system_available calls getAvailableSubsystems to ensure PPG_cgroupSubsystemsAvailable is set properly at the time when PPG_cgroupEntryList is created.
If getAvailableSubsystems is to be called by omrsysinfo_cgroup_get_available_subsystems, then it cannot rely on PPG_cgroupEntryList being null to call it, as PPG_cgroupEntryList may have been set previously by a call to omrsysinfo_cgroup_is_system_available.

if (NULL == PPG_cgroupEntryList) {
int32_t rc = portLibrary->sysinfo_cgroup_is_system_available(portLibrary);
if (0 != rc) {
PPG_cgroupSubsystemsAvailable = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like the way this global is sometimes set here, sometimes set in sysinfo_cgroup_is_system_available. It needs to be set in a single place (single point of initialization)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved it to omrsysinfo_cgroup_is_system_available.

@@ -120,7 +121,8 @@ typedef struct OMRPortPlatformGlobals {
#endif

#if defined(LINUX)
#define PPG_cgroupLimitsEnabled (portLibrary->portGlobals->platformGlobals.cgroupLimitsEnabled)
#define PPG_cgroupSubsystemsAvailable (portLibrary->portGlobals->platformGlobals.cgroupSubsystemsAvailable)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't that state a relationship? Neither PPG_cgroupSubsystemsAvailable or PPG_cgroupSubsystemsAvailable are valid if PPG_cgroupEntryList is null.

The code should state that invariant.

{
return OMRPORT_ERROR_SYSINFO_CGROUP_UNSUPPORTED_PLATFORM;
return OMRPORT_ERROR_SYSINFO_CGROUP_UNSUPPORTED_PLATFORM;;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: double ; on this line

@ashu-mehra
Copy link
Contributor Author

@DanHeidinga, there are few things pointed out by @dinogun on need for this PR and design of cgroup APIs.

Main point is -if we update existing portlib APIs to use cgroup limits transparently, then it would change the meaning of the API.
For example, omrsysinfo_get_physical_memory() is expected to return total physical memory in the system. So if we update it to return cgroup memory limit then it would break the API. Users of this API may really want the 'total physical memory in the system' and not the 'total physical memory available to the process'. Moreover, there wouldn't be any way left for the user to get 'total physical memory in the system' if the API is changed to return cgroup memory transparently.

@dinogun, can you please add your thoughts as well.

@DanHeidinga
Copy link
Contributor

The rationale behind having existing apis return the cgroup limits is that it makes it way way easier for users of the port library to be docker aware. If the existing apis aren't updated to use the cgroup limits, then every caller will need to be updated to check if cgroups are enabled, get the cgroup limit otherwise get the original limit.

That's a lot of code for every user to write at every callsite... and its easy to get wrong.

The design here is to have the runtime opt-into the cgroup limits deliberately which mitigates the concerns about "change the meaning of the API".

Can you point to runtimes using this api that would be broken by this change?

@ashu-mehra
Copy link
Contributor Author

ashu-mehra commented Aug 28, 2017

@DanHeidinga I understand your point.

One example where modifying existing API may have undesired effect is J9's OperatingSystemMXBean.getTotalPhysicalMemory() which uses sysinfo_get_physical_memory(). Javadoc for this API states:
Returns the total available physical memory on the system in bytes.
Since this is externally exposed API, we cannot be sure for what purpose this API would be used by the users. If the user is actually relying on this API to get physical memory in the system (eg some load balancer may be using it to get total physical memory in the system based on which it may balance the load), then updating sysinfo_get_physical_memory() to return container limit would cause the user to malfunction.

@DanHeidinga
Copy link
Contributor

The "saving grace" for APIs like the OSMXBean is to have the user opt into enabling cgroup limits system wide through some mechanism - in code, command line options, etc.

@ashu-mehra
Copy link
Contributor Author

have the user opt into enabling cgroup limits system wide through some mechanism - in code, command line options, etc.

@DanHeidinga I don't think I got this, can you please explain.

@DanHeidinga
Copy link
Contributor

I don't have a problem with sysinfo_get_physical_memory returning the cgroup limit if:

  • The user has specified a commandline option (like -XX:+EnableCGroupLimits), or
  • The runtime has asked the port library to enable cgroup limits through sysinfo_cgroup_enable_subsystems, or
  • some other explicit opt-in has occurred.

@DanHeidinga
Copy link
Contributor

Just to be clear - if something is OSMXBean and wants to return the physical memory even though the runtime has enabled cgroup limits, then I'm ok with addressing that particular use case through a new bug / issue when it occurs.

@ashu-mehra ashu-mehra force-pushed the cgroup_resource_system branch from 36753bb to 0f0adcc Compare August 29, 2017 17:39
@ashu-mehra
Copy link
Contributor Author

@DanHeidinga I have updated the change set to address the review comments.

@ashu-mehra
Copy link
Contributor Author

@DanHeidinga can you please review updated change set.

@DanHeidinga
Copy link
Contributor

I'm hoping to get through this review this week. Its been somewhat crazy due to internal deadlines recently and I haven't had a lot of cycles for this. Sorry.

Copy link
Contributor

@DanHeidinga DanHeidinga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still have reservations about the api, but don't have concrete comments about it now. Will comment further this week.

outputErrorMessage(PORTTEST_ERROR_ARGS, "omrsysinfo_cgroup_get_memlimit returned %d when cgroup limits was enabled successfully\n", rc);
/* Compare sysinfo_get_physical_memory and sysinfo_cgroup_get_memlimit after enabling memory subsystem */
enabledSubsystems = omrsysinfo_cgroup_enable_subsystems(OMR_CGROUP_SUBSYSTEM_MEMORY);
if (OMR_CGROUP_SUBSYSTEM_MEMORY == enabledSubsystems) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be

if (OMR_CGROUP_SUBSYSTEM_MEMORY == (enabledSubsystems & OMR_CGROUP_SUBSYSTEM_MEMORY))

to make the test resilient to other subsystems being enabled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thats correct.

*
* @param[in] portLibrary pointer to OMRPortLibrary
*
* @return 0 if the port library can use cgroup limits, otherwise negative error code
* @return 0 if the the cgroup system is available, otherwise negative error code
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this really a boolean then? Indicating if the system is available or not?

I'm struggling to see what the user code would do with the failure case that that a boolean wouldn't give us.

freeCgroupEntries(portLibrary, PPG_cgroupEntryList);
PPG_cgroupEntryList = NULL;
omrthread_monitor_exit(cgroupEntryListMonitor);
attachedPortLibraries -= 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great info to put in the code - a comment that describes the problem due to multiple port libraries and indicating the tests that run into this problem.

That way no one has to rediscover the rationale for this code when they investigating it in the future.

availableSubsystems |= OMR_CGROUP_SUBSYSTEM_MEMORY;
}
cgEntry = cgEntry->next;
} while (cgEntry!= cgEntryList);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How long is this list on a typical system? Vanilla hw vs inside a container? If the list is typically capped at 2-3 entries, this code is fine. If we expect longer lists, we'll want to write this differently, especially if it gets called at startup

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently cgEntryList has all cgroup subsystems configured in the system. In cgroup v1, that comes to 12 subsystems.
We can probably restrict it to only the ones we need - memory and cpu for now.
Let me add another change set on top of existing one which would highlight these changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DanHeidinga please take a look at commit 175daad that avoids adding every subsystem to PPG_cgroupEntryList and also sets PPG_cgroupSubsystemsAvailable more efficiently than before.

@ashu-mehra ashu-mehra force-pushed the cgroup_resource_system branch 2 times, most recently from 6203283 to 129fed2 Compare September 11, 2017 19:09
@ashu-mehra
Copy link
Contributor Author

@DanHeidinga can you please review commit 175daad.

Copy link
Contributor

@DanHeidinga DanHeidinga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The APIs seem reasonable as a starting point. I'm still concerned about the use of globals and locks in this code. Once the globals and lock issues are addressed (and documented), this should be ready to merge.

@@ -3459,38 +3542,68 @@ omrsysinfo_cgroup_is_limits_supported(struct OMRPortLibrary *portLibrary)
goto _end;
}

omrthread_monitor_enter(cgroupEntryListMonitor);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should check PPG_cgroupEntryList under the mutex to determine whether it needs to call readCgroupFile() so that it doesn't leak.

cgEntry->cgroup = cgEntry->subsystem + strlen(subsystem) + 1;
strcpy(cgEntry->cgroup, cgroupName);
for (i = 0; i < sizeof(supportedSubsystems) / sizeof(supportedSubsystems[0]); i++) {
if (J9_ARE_NO_BITS_SET(PPG_cgroupSubsystemsAvailable, supportedSubsystems[i].flag)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible for two threads to be executing this method at once? If it's not, then the function should document what lock needs to be held when calling it. Otherwise, it shouldn't be reading / modifying global state (PPG_cgroupSubsystemsAvailable) without a lock.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated addCgroupEntry() to avoid using PPG_cgroupSubsystemsAvailable.
Available subsystems are now determined by the readCgroupFile() and returned back via one of its parameters.

@@ -120,7 +121,8 @@ typedef struct OMRPortPlatformGlobals {
#endif

#if defined(LINUX)
#define PPG_cgroupLimitsEnabled (portLibrary->portGlobals->platformGlobals.cgroupLimitsEnabled)
#define PPG_cgroupSubsystemsAvailable (portLibrary->portGlobals->platformGlobals.cgroupSubsystemsAvailable)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add the above invariant as a comment in the code?

@ashu-mehra
Copy link
Contributor Author

@DanHeidinga added new commit 5a9e49f to address the comments.
Please review.

if (0 != rc) {
goto _end;
for (i = 0; i < sizeof(supportedSubsystems) / sizeof(supportedSubsystems[0]); i++) {
if (J9_ARE_NO_BITS_SET(available, supportedSubsystems[i].flag)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should use OMR_ARE_NO_BITS_SET instead of the J9 version. This will solve a compile break in this change.

@DanHeidinga
Copy link
Contributor

@ashu-mehra Thanks for addressing the locks, globals and comments questions.

The Travis and AppVeyor builds are failing now. Can you fix up the issues and then I'll give this another look over?

@ashu-mehra ashu-mehra force-pushed the cgroup_resource_system branch from 5717b12 to b8f34de Compare November 13, 2017 18:12
@ashu-mehra
Copy link
Contributor Author

@DanHeidinga Added a commit to replace J9_ARE_NO_BITS_SET with OMR_ARE_NO_BITS_SET.
AppVeyor build has also passed.

@ashu-mehra
Copy link
Contributor Author

Checking travis-ci build failure...

@ashu-mehra
Copy link
Contributor Author

Pushed a commit to fix compile failure reported by travis-ci build.

Copy link
Contributor

@DanHeidinga DanHeidinga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - thanks @ashu-mehra for working with us to get this ready to merge.

@DanHeidinga
Copy link
Contributor

@charliegracie Can you or one of the other committers give this a second look and merge? The commits should probably be squashed prior to the merge.

@charliegracie
Copy link
Contributor

Hi @ashu-mehra would it be possible to refactor this commits to resolve the compilation issues into the appropriate original commits. It would clean up the commit history and make it easier to test the codebase at different points in history. If this was merged there would be at least 3 known points in history that the codebase would not compile. Thanks

Instead of providing APIs that enable cgroup limits for all
subsystems, a new set of APIs is added which allows to query
which subsystems are available and enable subsystems individually.
This would allow runtimes to enable cgroup subsystems based on their
requirements.
Note that there is another subtle change in the design of the
port library APIs related to cgroup system.
Enabling a cgroup subsystem in portlibrary (by calling
omrsysinfo_cgroup_enable_subsystems) would make the port library APIs
to use cgroup limits internally.
Any port library API that directly returns cgroup limits,
eg omrsysinfo_cgroup_get_memlimit(), does not check whether the
corresponding subsystem (eg memory) is enabled or not.
Therefore, when using omrsysinfo_cgroup_* APIs,
the caller doesn't need to explicitly enable the corresponding
subsystem in the port library.

Consider the case of omrsysinfo_get_physical_memory() which returns
physical memory of the system.
If the system is running in a cgroup, and the cgroup's "memory"
subsystem is enabled, then omrsysinfo_get_physical_memory()
would internally call omrsysinfo_cgroup_get_memlimit(), and
return memory limit imposed by the cgroup.
However, if the user wants to just get the memory limit imposed by the
cgroup, it can directly call omrsysinfo_cgroup_get_memlimit(),
without enabling the "memory" subsystem.

Issue: eclipse-omr#1281
Signed-off-by: Ashutosh Mehra <asmehra1@in.ibm.com>
@ashu-mehra ashu-mehra force-pushed the cgroup_resource_system branch from d3f0de3 to beb1470 Compare November 20, 2017 07:15
@ashu-mehra
Copy link
Contributor Author

ashu-mehra commented Nov 20, 2017

@charliegracie I squashed the commits and brought it down from 5 to 2 commits.

@charliegracie
Copy link
Contributor

genie-omr build all

@charliegracie charliegracie self-assigned this Nov 20, 2017
@ashu-mehra
Copy link
Contributor Author

It looks like I inserted a bug in omrsysinfo_cgroup_is_system_available while addressing review comments.
I will be updating the last commit with following change to fix it. Since its trivial, I don't think we need to run tests again.

@@ -3603,7 +3603,7 @@ omrsysinfo_cgroup_is_system_available(struct OMRPortLibrary *portLibrary)
                }
 
                omrthread_monitor_enter(cgroupEntryListMonitor);
-               if (NULL != PPG_cgroupEntryList) {
+               if (NULL == PPG_cgroupEntryList) {
                        rc = readCgroupFile(portLibrary, getpid(), &PPG_cgroupEntryList, &PPG_cgroupSubsystemsAvailable);
                }
                omrthread_monitor_exit(cgroupEntryListMonitor);

List of changes done:
1. Updated addCgroupEntry() to avoid using PPG_cgroupSubsystemsAvailable
- Updated readCgroupFile() to compute available subsystems
- Add check for PPG_cgroupEntryList before calling readCgroupFile()
- Added comment to explain reason for using global variable
attachedPortLibraries
- Macros of type J9_ARE_*_BITS_SET have been replaced by
OMR_ARE_*_BITS_SET
- Moved declaration of variable 'rc' such that it is declared only
for LINUX platform.

Signed-off-by: Ashutosh Mehra <asmehra1@in.ibm.com>
@ashu-mehra ashu-mehra force-pushed the cgroup_resource_system branch from beb1470 to d94ef6f Compare November 20, 2017 19:02
@charliegracie
Copy link
Contributor

@genie-omr build all

if (0 != rc) {
goto _end;
for (i = 0; i < sizeof(supportedSubsystems) / sizeof(supportedSubsystems[0]); i++) {
if (OMR_ARE_NO_BITS_SET(available, supportedSubsystems[i].flag)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have preferred that this if statement be broken into 2 so it is easier to debug but that is a stylistic issue that is not worth changing the code for. I will accept this change as is.

@charliegracie
Copy link
Contributor

This is a great start to the Cgroup support. Thank you

@charliegracie charliegracie merged commit a36d65e into eclipse-omr:master Nov 20, 2017
@DanHeidinga
Copy link
Contributor

Echoing Charlie's comments - thanks for working through the reviews on this with us @ashu-mehra! Great to see it merged!

@ashu-mehra
Copy link
Contributor Author

😌

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants