
Reading TFLite model metadata #78

Open
phlash opened this issue May 8, 2021 · 10 comments
Labels
enhancement New feature or request question Further information is requested

Comments

@phlash
Collaborator

phlash commented May 8, 2021

Originally posted by @phlash in #77 (comment)

Sounds like a nice solution, but is the metadata normally included in models? Sounds like models with metadata should essentially be a zip file?

Not in the models we're currently using, but only because we didn't need it. Models with metadata are available from the Google model zoo (https://tfhub.dev/). [edit] I lied, that only contains Deeplabv3 with metadata. Looks like the MediaPipe team haven't added any yet, although they do have model cards. Oh well. [/edit]

I'm currently looking at sane ways to read the metadata without adding a new dependency and build pain (it needs Bazel) for the officially required tf_lite_support library. So far, getting a blob of metadata out works OK (see snippet below), since TfLite already supports arbitrary blobs in a file; however, that blob then needs parsing (it's in flatbuffers format, schema here: https://github.com/tensorflow/tflite-support/blob/master/tensorflow_lite_support/metadata/metadata_schema.fbs) to pull out the input normalization constants. Currently poking through this: https://google.github.io/flatbuffers/md__internals.html and a hex dump of the raw buffer.

int init_tensorflow(...) {
	//...
	auto model = flatmodel->GetModel();
	auto *md = model->metadata();
	if (md) {
		// walk the named metadata entries, looking for the TFLite metadata blob
		for (uint32_t mid = 0; mid < md->size(); ++mid) {
			const auto meta = md->Get(mid);
			printf("found: %s\n", meta->name()->c_str());
			if (meta->name()->str() != "TFLITE_METADATA")
				continue;
			// grab the raw buffer this entry points at and dump it..
			const flatbuffers::Vector<uint8_t> *pvec =
				model->buffers()->Get(meta->buffer())->data();
			printf("metadata dump (size=0x%X)\n", pvec->size());
			parse_metadata(pvec->data(), pvec->size());  // currently just a hex dump :)
		}
	}
	//...
}

This might turn out to be horribly fragile though!

Question: What do we think about using Bazel for builds (standard Tensorflow tooling)?

@phlash phlash added enhancement New feature or request question Further information is requested labels May 8, 2021
@BenBE
Collaborator

BenBE commented May 9, 2021

Question: What do we think about using Bazel for builds (standard Tensorflow tooling)?

On first glance Bazel looks like some hipster tooling, written because somebody didn't understand the existing solutions. If we change build systems then CMake is the furthest I'd like to go. The PR we had on that subject wasn't perfect, but still better than introducing some non-common build dependencies (never heard of Bazel before, TBH).

@floe
Owner

floe commented May 10, 2021

[offtopic] +1 for "hipster tooling" ;-) [/offtopic]

@phlash
Collaborator Author

phlash commented May 10, 2021

Got a prototype metadata reader working in my tree: https://github.com/phlash/backscrub/tree/tflite-metadata

Not pretty but avoids pulling in a whole new library and 'hipster' build system 😉

@vekkt0r
Contributor

vekkt0r commented May 13, 2021

Maybe also a bit off topic with regard to metadata reading: I tried to use that hipster build system for a standalone lib that uses TensorFlow. Turned out to be a huge time sink..

Goal: just make some quick experiments with post processing of model output in Python

  • Problem: Not possible to set custom ops in current (released) tflite Python [1]
  • Solution: Try to create a small wrapper lib to do inference and use it from Python with ctypes
  • Reality: Spent way too much time fighting with TensorFlow building and linking errors

Got it to build for my use case in the end. Tried to make a quick adaptation of Bazel for backscrub but never managed to find a good way to integrate the OpenCV dependency.

Some take home messages:

  • Building TensorFlow as Bazel external dependency does not seem officially supported, but is doable [2], [3]
  • C API seems to be recommended over C++ by tf devs [4]
  • TF Devs indicate CMake is no longer supported? (old comment) [5]
  • Bazel has nothing like pkg-config or .cmake files. External deps that are not built by Bazel can be added by listing paths and globs for the headers and libs to look for

1: tensorflow/tensorflow#44043
2: tensorflow/tensorflow#12761
3: https://stackoverflow.com/questions/48497006/how-to-add-tensorflow-to-existing-bazel-project-as-external-dependencies
4: tensorflow/tensorflow#35689 (comment)
5: tensorflow/tensorflow#30183 (comment)
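For the record, the "listing paths and globs" approach looks roughly like this hypothetical BUILD fragment wrapping a system-installed OpenCV; the paths and library names are illustrative and would need adjusting per system:

```python
# Hypothetical Bazel BUILD fragment for a pre-installed OpenCV.
# Paths/libs are illustrative, not a tested configuration.
cc_library(
    name = "opencv",
    hdrs = glob(["usr/include/opencv4/**/*.hpp"]),
    includes = ["usr/include/opencv4"],
    linkopts = ["-lopencv_core", "-lopencv_imgproc"],
    visibility = ["//visibility:public"],
)
```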

@phlash
Collaborator Author

phlash commented May 13, 2021

[still OT slightly] Thanks for trying the 'hipster way' 😄. I'm not sure CMake is going away now (it may have been then), as it's properly documented and marked as 'experimental since 2.4' here: https://www.tensorflow.org/lite/guide/build_cmake. I for one would rather use CMake for its good documentation, popularity and capability (even if I don't like the mess it spews out!). It took me just a few minutes to get a basic 'backscrub + tensorflow' combined build working, so I could enable XNNPACK and double the CPU-based performance: https://github.com/phlash/backscrub/tree/xnnpack-test
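For anyone reproducing the combined build, the shape of it is roughly this sketch, assuming a tensorflow checkout alongside the project and a backscrub target; TFLITE_ENABLE_XNNPACK is the documented switch for the XNNPACK delegate, but the exact paths here are assumptions:

```cmake
# Sketch only: combined backscrub + tensorflow-lite build with XNNPACK.
# Assumes a tensorflow source checkout next to this CMakeLists.txt.
set(TFLITE_ENABLE_XNNPACK ON)
add_subdirectory(tensorflow/tensorflow/lite
                 ${CMAKE_BINARY_DIR}/tensorflow-lite EXCLUDE_FROM_ALL)
target_link_libraries(backscrub PRIVATE tensorflow-lite)
```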

[back on topic] Thoughts on my rough metadata extraction hack? It could/should probably use [de]serialisation code generated by flatc rather than hacked up by hand (but then we have to build & run flatc - which has CMake and Bazel support). It should read the 'associated files' metadata rather than assume 'labelmap.txt' exists. Is there other metadata we would want? It might be better to rebase on experimental rather than my tflite-extract branch...

@BenBE
Collaborator

BenBE commented May 13, 2021

Does the flatc stuff always need the full spec to be built? Could we try to build a reduced version of it and freeze it for our purposes? Haven't looked at the generated flatc source code (and I'm too afraid to; I'd rather not add to my nightmares), but how sane is it? Would interfacing just that reduced part work? Or is the data structure sane enough to whip up our own reduced parser for it?

@phlash
Collaborator Author

phlash commented May 13, 2021

Google already do exactly that for parts of the build: pre-built headers from flatc (resulting in schema_generated.h), to avoid the pain of building and running it, but only for the parts that they support in Make and CMake builds, not the metadata library (nor, it turns out, the GPU delegate). I'll have a go at building/running flatc...

@BenBE
Collaborator

BenBE commented May 14, 2021

There seems to be a package flatbuffers-compiler (at least on Ubuntu 20.04 LTS and Debian Bullseye) containing flatc, thus this might come with an acceptable level of PITA … It could even be remarked upon as an optional build dependency then …

@phlash
Collaborator Author

phlash commented May 14, 2021

Ah-ha! Also in buster-backports, so this might be an easy way out. That said, flatc builds easily enough with CMake; unfortunately it's a struggle to do that via the main tensorflow-lite build (the dependency mechanism using FetchContent is fiddly and doesn't pass options through). I'll install the packaged version, compile up the metadata schema, and see how ugly it looks..

@phlash
Collaborator Author

phlash commented May 14, 2021

OK, it looks neater with the compiled metadata serializer: https://github.com/phlash/backscrub/tree/tflite-metadata

This branch is now:

  • based on main
  • uses flatbuffers-compiler package (stock Ubuntu or backport for Debian buster)
  • reads the output labels file name from metadata

Awaiting somewhere to use the metadata values... coming in #77, then I'll PR this work.
