-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[NPUW] Weightless serialization #28469
base: master
Are you sure you want to change the base?
[NPUW] Weightless serialization #28469
Conversation
…openvino into as/npuw_s11n_cfg
…into as/npuw_s11n_fixes
…into as/npuw_s11n_fixes
…y/openvino into as/npuw_s11n_fixes
…y/openvino into as/npuw_weightless_s11n
…into as/npuw_weightless_s11n
…into as/npuw_weightless_s11n
…into as/npuw_weightless_s11n
} | ||
|
||
// Weightless | ||
// FIXME: all serialization needs a good rewriting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, might be first we need to dissimilate different bools flags with creating of consts and writing them instead of false
and true
// Identify either full flow or weightless | ||
bool is_weightless = false; | ||
if (m_non_llm_props.count(ov::cache_mode.name()) && | ||
m_non_llm_props.at(ov::cache_mode.name()).as<std::string>() == "OPTIMIZE_SIZE") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
m_non_llm_props.at(ov::cache_mode.name()).as<std::string>() == "OPTIMIZE_SIZE") { | |
m_non_llm_props.at(ov::cache_mode.name()).as<CacheMode>() == CacheMode::OPTIMIZE_SIZE) { |
BTW, weightless cache is used for GPU in CacheMode::OPTIMIZE_SIZE, because in generic case it will drop performance of import_model.
Is it the same for NPU? or you can use weightless cache in all cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the same in our case. With weightless flow the performance is worse
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed the code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have some failing tests in NPU functional tests, please check them as well.
Also, not sure that we need to remove property from compiler properties to keep backward compatibility. @PatrikStepan, @csoka any idea? L.E.: Or maybe we don't need this since we don't register it in the plugin as well.
// FIXME: create proper op identificators instead of int | ||
std::visit(overloaded{[&stream](const op::Concat& op) { | ||
write(stream, int{0}); | ||
op.serialize(stream); | ||
}, | ||
[&stream](const op::Const& op) { | ||
write(stream, int{1}); | ||
op.serialize(stream); | ||
}, | ||
[&stream](const op::Convert& op) { | ||
write(stream, int{2}); | ||
op.serialize(stream); | ||
}, | ||
[&stream](const op::Permute& op) { | ||
write(stream, int{3}); | ||
op.serialize(stream); | ||
}, | ||
[&stream](const op::Unpack& op) { | ||
write(stream, int{4}); | ||
op.serialize(stream); | ||
}}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uhhh
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, missed that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
switch (op_type) { | ||
case 0: | ||
lt_impl->m_transform = op::Concat::deserialize(stream); | ||
break; | ||
case 1: | ||
lt_impl->m_transform = op::Const::deserialize(stream); | ||
break; | ||
case 2: | ||
lt_impl->m_transform = op::Convert::deserialize(stream); | ||
break; | ||
case 3: | ||
lt_impl->m_transform = op::Permute::deserialize(stream); | ||
break; | ||
case 4: | ||
lt_impl->m_transform = op::Unpack::deserialize(stream); | ||
break; | ||
default: | ||
NPUW_ASSERT(false && "Unsupported type"); | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ohh
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
Related to openvinotoolkit/openvino.genai#1635