Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] Store expanding window bucket values in a compressed format #100

Merged

Conversation

tveasey
Copy link
Contributor

@tveasey tveasey commented May 23, 2018

For some time I've thought it would be good to investigate storing some of the model state in a compressed format where the access pattern lends itself to doing this without paying prohibitive CPU cost. This PR is the outcome of applying this strategy to the window of historical data used for periodicity testing. It turns out that one gets very significant memory gains by doing this, particularly for sparse data.

This PR breaks into two pieces:

  1. Extending core::CCompressUtils to support inflation and adding support for vectors of trivially copyable types. This covers the requirements for this work.
  2. Updating maths::CExpandingWindow to optionally deflate the bucket value vector when not in use.

Profiling before and after this change showed up the need to make a refinement to the simplest possible implementation. In particular, maths::CExpandingWindow::addValue would have needed one call to inflate/deflate per data bucket. This proved prohibitively expensive: around a 25% increase in runtime. I therefore switched to maintaining a small buffer of incremental changes to the window bucket values and only flushing this when it gets full, which is, therefore, the only time addValue triggers an inflate/deflate. I can make this buffer very small, since the window bucket length is typically significantly longer than the data bucketing length, and still get significant speedups. A length of 5 constitutes a pretty good tradeoff: costing us almost nothing in terms of memory overhead and reducing the runtime cost of inflation/deflation to around 2%.

The memory gains from this change are fairly data dependent, but I got good improvements across the board (and up to just under 44% for very sparse data where the cost of this window is a big chunk of the model size and we get really good compression, up to 95%, of the mostly empty bucket values). The following is a summary of the gain relative to master (including and excluding #97).

"Partition foo"

# partitions Old model memory New model memory % improvement with #97 % improvement
10 1.4MB 1.2MB 14.3 0.98MB 30
21 1MB 0.61MB 39 0.54MB 46
78 4.1MB 2.3MB 43.9 2.2MB 46.3
16925 559MB 353MB 36.8 317MB 43.3

"By foo"

# by Old model memory New model memory % improvement
10 0.97MB 0.87MB 10.0
21 0.49MB 0.33MB 32.7
78 2MB 1.4MB 30.0
16925 244.1MB 161.2MB 34.0

This should have no impact on results except where the hard memory limit is in play.

ret = ::deflateInit(&m_ZlibStrm, level);
break;
case E_Inflate:
ret = ::inflateInit(&m_ZlibStrm);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's neater to have a break on the last case. I doubt it makes any difference to efficiency.

return false;
}
}

result = m_FullResult;
if (finish) {
result = std::move(m_FullResult);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means it's now only possible to get the result once, which might be surprising to someone in the future.

It might be better to have another method similar to data() like finishAndTakeData() or similar, and only move the result in the latter method.

@@ -433,7 +433,7 @@ void CAnomalyJobLimitTest::testModelledEntityCountForFixedMemoryLimit() {
LOG_DEBUG(<< "# partition = " << used.s_PartitionFields);
LOG_DEBUG(<< "Memory status = " << used.s_MemoryStatus);
LOG_DEBUG(<< "Memory usage = " << used.s_Usage);
CPPUNIT_ASSERT(used.s_PartitionFields > 280 && used.s_PartitionFields < 360);
CPPUNIT_ASSERT(used.s_PartitionFields > 390 && used.s_PartitionFields < 470);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes me wonder whether the change will break one of the Java integration tests. It would be good to run the x-pack/qa/ml-native-tests suite with locally built C++ to confirm before merging, and mute any tests that depend on both the Java and C++ changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dimitris removed the asserts which tended to cause the failures in the past in elastic/elasticsearch#30716, which this test supersedes, but I'll double check nothing else does.

using TByteVec = std::vector<Bytef>;
enum EOperation { E_Deflate, E_Inflate };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

devils advocate: What is the reason to have operations/modes over 2 classes/methods ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seemed easier to share code this way than have two classes; although I could probably extract common functionality out into a private base class. It also felt at the outset like it was going to be the smaller change to CCompressUtils; although in practice it ended up changing quite a lot. I'll have a think about this alternative.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this turns out nicer: I added separate implementations with shared base class for inflation and deflation. These can really just wrap up only the small number of library calls which are task specific, so nearly all the implementation is shared.

//! of the vector data.
template<typename T>
static Bytef* bytes(const std::vector<T>& input) {
return const_cast<Bytef*>(reinterpret_cast<const Bytef*>(input.data()));
Copy link
Contributor

@hendrikmuhs hendrikmuhs May 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider: zlib becomes much nicer w.r.t. const if you:

#define ZLIB_CONST

But this requires zlib >= 1.2.8, I am not sure what versions we use on the different platforms.

Copy link
Contributor

@droberts195 droberts195 May 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To use this we'd have to build zlib from source on Linux and ship it with our distribution. (Mac would be OK with the OS default, and we already build zlib from source on Windows.)

CentOS 6 is using version 1.2.3 (patched for security vulnerabilities thankfully) and even CentOS 7 is only on version 1.2.7. So it will be many years before we can use this flag with the standard OS version of zlib.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 ok, probably not worth the hassle. 1.2.8 is already >5 years old, but it seems not old enough ;-)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CentOS 6 is basically a hardened version of Fedora 12, and Fedora 12 was released in 2009!

I think ZLIB_CONST went into version 1.2.7.1 though, not 1.2.8, so it is available on CentOS 7. So maybe we can use it in 8.x if CentOS 7 is the oldest supported platform by then.

@tveasey tveasey force-pushed the enhancement/deflate-expanding-window-state branch from 82b4239 to 0e126a0 Compare May 23, 2018 20:43
@@ -35,57 +35,183 @@ namespace core {
//! a multi-threaded application it would be best to create
//! one object for each thread.
//!
class CORE_EXPORT CCompressUtils : private CNonCopyable {
class CORE_EXPORT CCompressUtil : private CNonCopyable {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you rename the class I think you should also rename the source file to match (plus the corresponding implementation and unit test files).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about this a bit.

My rationale for renaming was "utils" suggests to me a collection of utility functionality, i.e. multiple disparate functions and objects, rather than a single (utility) class.

The reason I didn't rename the file was that we actually have two CCompressUtil classes, i.e. the inflator and deflator, and these are the types one would actually use from this header. So this is one of the cases where our naming convention for files breaks down, i.e. there isn't a single natural class after which to name the file. Conversely, breaking these classes out into their own header files seems wrong to me too.

What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other files that contain collections of classes don't have the initial C - FrequencyPredicates.h and ProbabilityAggregators.h for example. So maybe this file could be just CompressUtils.h? Having the initial C suggests it is named after a class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I think that sounds sensible. I'll go with that.

Copy link
Contributor

@droberts195 droberts195 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, subject to what I said in #100 (comment)

Now the ES team are requiring green PR builds before merging their PRs we cause a lot of disruption if a C++ change causes a Java test to fail.

Do you have the necessary setup to run the tests in elasticsearch/x-pack/qa/ml-native-tests with locally built C++? If not then I should be able to pull your branch to an environment on one of the London office servers and test it tomorrow.

But please don't merge before this is done.

@tveasey
Copy link
Contributor Author

tveasey commented May 24, 2018

Thanks @droberts195. Just running them now.

@tveasey tveasey force-pushed the enhancement/deflate-expanding-window-state branch from 0bb2eda to ee1ef8e Compare May 25, 2018 09:16
@tveasey tveasey deleted the enhancement/deflate-expanding-window-state branch December 13, 2018 08:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants