-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ML] Store expanding window bucket values in a compressed format #100
[ML] Store expanding window bucket values in a compressed format #100
Conversation
lib/core/CCompressUtils.cc
Outdated
ret = ::deflateInit(&m_ZlibStrm, level); | ||
break; | ||
case E_Inflate: | ||
ret = ::inflateInit(&m_ZlibStrm); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's neater to have a break
on the last case
. I doubt it makes any difference to efficiency.
lib/core/CCompressUtils.cc
Outdated
return false; | ||
} | ||
} | ||
|
||
result = m_FullResult; | ||
if (finish) { | ||
result = std::move(m_FullResult); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This means it's now only possible to get the result once, which might be surprising to someone in the future.
It might be better to have another method similar to data()
like finishAndTakeData()
or similar, and only move the result in the latter method.
@@ -433,7 +433,7 @@ void CAnomalyJobLimitTest::testModelledEntityCountForFixedMemoryLimit() { | |||
LOG_DEBUG(<< "# partition = " << used.s_PartitionFields); | |||
LOG_DEBUG(<< "Memory status = " << used.s_MemoryStatus); | |||
LOG_DEBUG(<< "Memory usage = " << used.s_Usage); | |||
CPPUNIT_ASSERT(used.s_PartitionFields > 280 && used.s_PartitionFields < 360); | |||
CPPUNIT_ASSERT(used.s_PartitionFields > 390 && used.s_PartitionFields < 470); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes me wonder whether the change will break one of the Java integration tests. It would be good to run the x-pack/qa/ml-native-tests
suite with locally built C++ to confirm before merging, and mute any tests that depend on both the Java and C++ changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dimitris removed the asserts which tended to cause the failures in the past in elastic/elasticsearch#30716, which this test supersedes, but I'll double check nothing else does.
include/core/CCompressUtils.h
Outdated
using TByteVec = std::vector<Bytef>; | ||
enum EOperation { E_Deflate, E_Inflate }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
devils advocate: What is the reason to have operations/modes over 2 classes/methods ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seemed easier to share code this way than have two classes; although I could probably extract common functionality out into a private base class. It also felt at the outset like it was going to be the smaller change to CCompressUtils; although in practice it ended up changing quite a lot. I'll have a think about this alternative.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this turns out nicer: I added separate implementations with shared base class for inflation and deflation. These can really just wrap up only the small number of library calls which are task specific, so nearly all the implementation is shared.
include/core/CCompressUtils.h
Outdated
//! of the vector data. | ||
template<typename T> | ||
static Bytef* bytes(const std::vector<T>& input) { | ||
return const_cast<Bytef*>(reinterpret_cast<const Bytef*>(input.data())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider: zlib becomes much nicer w.r.t. const
if you:
#define ZLIB_CONST
But this requires zlib >= 1.2.8, I am not sure what versions we use on the different platforms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To use this we'd have to build zlib from source on Linux and ship it with our distribution. (Mac would be OK with the OS default, and we already build zlib from source on Windows.)
CentOS 6 is using version 1.2.3 (patched for security vulnerabilities thankfully) and even CentOS 7 is only on version 1.2.7. So it will be many years before we can use this flag with the standard OS version of zlib.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 ok, probably not worth the hassle. 1.2.8 is already >5 years old, but it seems not old enough ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CentOS 6 is basically a hardened version of Fedora 12, and Fedora 12 was released in 2009!
I think ZLIB_CONST
went into version 1.2.7.1 though, not 1.2.8, so it is available on CentOS 7. So maybe we can use it in 8.x if CentOS 7 is the oldest supported platform by then.
82b4239
to
0e126a0
Compare
include/core/CCompressUtils.h
Outdated
@@ -35,57 +35,183 @@ namespace core { | |||
//! a multi-threaded application it would be best to create | |||
//! one object for each thread. | |||
//! | |||
class CORE_EXPORT CCompressUtils : private CNonCopyable { | |||
class CORE_EXPORT CCompressUtil : private CNonCopyable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you rename the class I think you should also rename the source file to match (plus the corresponding implementation and unit test files).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about this a bit.
My rationale for renaming was "utils" suggests to me a collection of utility functionality, i.e. multiple disparate functions and objects, rather than a single (utility) class.
The reason I didn't rename the file was that we actually have two CCompressUtil
classes, i.e. the inflator and deflator, and these are the types one would actually use from this header. So this is one of the cases where our naming convention for files breaks down, i.e. there isn't a single natural class after which to name the file. Conversely, breaking these classes out into their own header files seems wrong to me too.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other files that contain collections of classes don't have the initial C
- FrequencyPredicates.h
and ProbabilityAggregators.h
for example. So maybe this file could be just CompressUtils.h
? Having the initial C
suggests it is named after a class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I think that sounds sensible. I'll go with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, subject to what I said in #100 (comment)
Now the ES team are requiring green PR builds before merging their PRs we cause a lot of disruption if a C++ change causes a Java test to fail.
Do you have the necessary setup to run the tests in elasticsearch/x-pack/qa/ml-native-tests
with locally built C++? If not then I should be able to pull your branch to an environment on one of the London office servers and test it tomorrow.
But please don't merge before this is done.
Thanks @droberts195. Just running them now. |
0bb2eda
to
ee1ef8e
Compare
For some time I've thought it would be good to investigate storing some of the model state in a compressed format where the access pattern lends itself to doing this without paying prohibitive CPU cost. This PR is the outcome of applying this strategy to the window of historical data used for periodicity testing. It turns out that one gets very significant memory gains by doing this, particularly for sparse data.
This PR breaks into two pieces:
core::CCompressUtils
to support inflation and adding support for vectors of trivially copyable types. This covers the requirements for this work.maths::CExpandingWindow
to optionally deflate the bucket value vector when not in use.Profiling before and after this change showed up the need to make a refinement to the simplest possible implementation. In particular,
maths::CExpandingWindow::addValue
would have needed one call to inflate/deflate per data bucket. This proved prohibitively expensive: around a 25% increase in runtime. I therefore switched to maintaining a small buffer of incremental changes to the window bucket values and only flushing this when it gets full, which is, therefore, the only time addValue triggers an inflate/deflate. I can make this buffer very small, since the window bucket length is typically significantly longer than the data bucketing length, and still get significant speedups. A length of 5 constitutes a pretty good tradeoff: costing us almost nothing in terms of memory overhead and reducing the runtime cost of inflation/deflation to around 2%.The memory gains from this change are fairly data dependent, but I got good improvements across the board (and up to just under 44% for very sparse data where the cost of this window is a big chunk of the model size and we get really good compression, up to 95%, of the mostly empty bucket values). The following is a summary of the gain relative to master (including and excluding #97).
"Partition foo"
"By foo"
This should have no impact on results except where the hard memory limit is in play.