[ImageClassification Transfer Learning] Higher level API supporting image predictions based on image file paths and in-memory images, alternatively #4084

Closed
CESARDELATORRE opened this issue Aug 7, 2019 · 1 comment
Labels
enhancement New feature or request P2 Priority of the issue for triage purpose: Needs to be fixed at some point.

Comments

@CESARDELATORRE
Contributor

CESARDELATORRE commented Aug 7, 2019

This is probably more relevant to the long-term API of our 'ImageClassification Transfer Learning' feature, once its foundational pieces are in place.

However, this is the point:

We aim to design and implement a high-level API that provides a very straightforward way of training (transfer learning) for image classification and other capabilities such as object detection. Equally important, scoring/predicting must be just as easy in the two normal scenarios for images:

  • A. Score based on a provided image file path.
  • B. Score based on an in-memory image.

The second option (B) is even more common for many apps, especially when moving to OBJECT DETECTION and LIVE VIDEO/IMAGES arriving as a stream.

CURRENT STATE IN ML.NET API

The issue is that, with the current ML.NET API design, the model scoring API depends entirely on how you built the original pipeline and which schema you used, meaning:

  • If you trained an ML.NET model with image file paths, your ML.NET model expects an image file path for scoring.

  • If you trained an ML.NET model with in-memory images (not a common/straightforward approach), your ML.NET model expects an in-memory image (Bitmap) when scoring.

However, that's not the typical scenario. The easiest way to train is with image file paths (that's how you usually have image sets, right?), but when scoring/predicting the user should be able to provide either a file path or an in-memory image (in-memory is more common for end-user apps, file paths for batch processes), and everything else should be transparent to them.

CURRENT WORKAROUND:

Sure, you can always dig into the TensorFlow .pb model that our "ImageClassification Transfer Learning" also generates, find the input and output tensor names, implement a different ML.NET pipeline that accepts in-memory images, create that ML.NET model by running the pipeline once (this is not really training; it just creates the transformers needed for scoring later on), and finally write more specific, not very straightforward code for scoring with in-memory images.

That process is what I described in the second part of this BLOG POST and the related sample app, which takes an in-memory image sent by the end user over HTTP and predicts its class:

https://devblogs.microsoft.com/cesardelatorre/run-with-ml-net-c-code-a-tensorflow-model-exported-from-azure-cognitive-services-custom-vision/

But that code for scoring a TF model with in-memory images is not straightforward to implement.

INITIAL SOLUTION: Load images from files but convert them into in-memory image objects before loading them into the IDataView, so the schema matches for training and scoring

A good initial solution is to load images from files but convert them into in-memory image objects before loading them into the IDataView.

Instead of a data class for the schema like the following:

    public class ImageData
    {
        [LoadColumn(0)]
        public string ImagePath;

        [LoadColumn(1)]
        public string Label;
    }

We want to have something like this being used by the initial IDataView:

    public class ImageInputData
    {
        public Bitmap Image { get; set; }
        
        public string Label;
    }

That way, the ML.NET model's schema used for training would match the schema data class needed for scoring, which only has an in-memory image. The user would not need to go deeper, use the TensorFlow .pb model, and create their own scoring pipeline. I explain that approach in this blog post, but it is too complex if we want users to use a high-level API:

https://devblogs.microsoft.com/cesardelatorre/run-with-ml-net-c-code-a-tensorflow-model-exported-from-azure-cognitive-services-custom-vision/
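
To make the idea concrete, here is a minimal sketch of converting path-based rows into in-memory rows before they reach the IDataView. It is only an illustration under assumptions: `InMemoryImageData` and `ImageLoading.LoadInMemory` are hypothetical names, and a raw `byte[]` stands in for `Bitmap` so the snippet stays self-contained and does not depend on System.Drawing.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Path-based row, as in the first data class above.
public class ImageData
{
    public string ImagePath;
    public string Label;
}

// Hypothetical in-memory row; byte[] is a stand-in for Bitmap (assumption).
public class InMemoryImageData
{
    public byte[] Image;
    public string Label;
}

public static class ImageLoading
{
    // Lazily reads each image file into memory, keeping its label.
    public static IEnumerable<InMemoryImageData> LoadInMemory(IEnumerable<ImageData> rows)
    {
        foreach (var row in rows)
        {
            yield return new InMemoryImageData
            {
                Image = File.ReadAllBytes(row.ImagePath),
                Label = row.Label
            };
        }
    }
}
```

The resulting sequence could then be handed to an enumerable-based IDataView loader, so training and scoring would share the same in-memory schema. Whether a given trainer accepts that column type is outside the scope of this sketch.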

POSSIBLE FUTURE HIGH LEVEL API for Image Classification, ObjectDetection and other high level SCENARIOS

The point is that we want to create high-level APIs targeting SCENARIOS (Image Classification, ObjectDetection, etc.). That means the API for the mentioned use cases (training and scoring) should also be very simple. It is not acceptable to have such complexity (see BLOG POST above) if you want to score with an in-memory image.

Somehow we should resolve the conflict between the current ML.NET pipeline API requirements (you score with the same schema you trained with) and the points I explained above.
We will probably need to create a higher-level API on top of the current pipelines API that is more oriented to SCENARIOS (ImageClassification, Object Detection). The current ML.NET pipelines API probably doesn't allow what I'm describing.

Also, the current workaround doesn't keep our solution independent of the underlying DNN architecture/framework (TensorFlow, Torch, etc.). The user has to take the "under the covers" TF .pb model and write scoring code against it, and that code might differ for TensorFlow vs. Torch in the future.

Moving forward, this discussion will be even more important for OBJECT DETECTION, where in-memory image scoring is critical (streaming live video/images) while users might still want to train based on image paths, which is simpler.
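
As a rough illustration of what such a scenario-oriented scoring surface could look like, the sketch below wraps a single in-memory scoring function and exposes overloads for a file path, a stream, or raw bytes. All names here (`ImageClassifierFacade`, `ImagePrediction`, the `Func<byte[], ImagePrediction>` delegate) are hypothetical and are not part of ML.NET; the delegate stands in for whatever model actually does the scoring.

```csharp
using System;
using System.IO;

// Hypothetical prediction result (not an ML.NET type).
public sealed class ImagePrediction
{
    public string PredictedLabel;
    public float Score;
}

// Hypothetical facade: the caller picks the input form, the facade
// normalizes everything to in-memory bytes before scoring.
public sealed class ImageClassifierFacade
{
    private readonly Func<byte[], ImagePrediction> _scoreInMemory;

    public ImageClassifierFacade(Func<byte[], ImagePrediction> scoreInMemory)
    {
        _scoreInMemory = scoreInMemory ?? throw new ArgumentNullException(nameof(scoreInMemory));
    }

    // Scenario B: score an in-memory image.
    public ImagePrediction Predict(byte[] imageBytes)
    {
        if (imageBytes == null) throw new ArgumentNullException(nameof(imageBytes));
        return _scoreInMemory(imageBytes);
    }

    // Scenario A: score from a file path by reading it into memory first.
    public ImagePrediction Predict(string imagePath)
    {
        return Predict(File.ReadAllBytes(imagePath));
    }

    // Streaming input (e.g. an HTTP upload) is buffered, then scored.
    public ImagePrediction Predict(Stream imageStream)
    {
        using (var buffer = new MemoryStream())
        {
            imageStream.CopyTo(buffer);
            return Predict(buffer.ToArray());
        }
    }
}
```

With a shape like this, the framework-specific details (TensorFlow, Torch, tensor names) would stay behind the delegate, and user code would not change with the way the model was trained.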

@CESARDELATORRE CESARDELATORRE added the enhancement New feature or request label Aug 7, 2019
@CESARDELATORRE CESARDELATORRE changed the title from "[ImageClassification Transfer Learning] Higher level API supporting image predictions based on image file paths or in-memory images" to "[ImageClassification Transfer Learning] Higher level API supporting image predictions based on image file paths and in-memory images, alternatively" Aug 7, 2019
@ganik ganik added the P2 Priority of the issue for triage purpose: Needs to be fixed at some point. label Aug 20, 2019
@antoniovs1029
Member

Seems this was fixed in #4151

@ghost ghost locked as resolved and limited conversation to collaborators Mar 20, 2022

4 participants