Documentation: mountain vs beach classifier

Overview

Methodology

  1. Survey the task: how many steps are needed, which approaches are possible, which frameworks and tools are available, and what data exists.
  2. Find and get the dataset: I extracted a subset (~369MB) from MIT's Places dataset (26GB), with ~15K images per category. Remark: the original dataset does have beach and mountain categories, but they contain only ~5K images each. That's why I also use other "related classes" to get more data:
    • "beach related" = "beach" + "beach_house" + "ocean".
    • "mountain related" = "mountain" + "mountain_path" + "mountain_snowy".
  3. Choose appropriate methods & tools:
    1. Framework: TensorFlow

    2. Final model: use transfer learning from a pretrained InceptionV3 model with "imagenet" weights + a few more layers.

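      # WARNING: check the notebook for full parameters

      from tensorflow.keras import Model, layers
      from tensorflow.keras.applications import InceptionV3
      from tensorflow.keras.optimizers import RMSprop

      # Pretrained InceptionV3 ("imagenet" weights, classification top removed)
      # up to layer "mixed7" ((None, 10, 10, 768)).
      # Assumption: a 200x200 input, which yields the 10x10 map above;
      # see the notebook for the actual input size.
      pre_trained_model = InceptionV3(
          input_shape=(200, 200, 3), include_top=False, weights="imagenet")

      # We don't train these layers, just use the last output.
      for layer in pre_trained_model.layers:
          layer.trainable = False
      last_output = pre_trained_model.get_layer('mixed7').output

      # Add more layers
      x = layers.Flatten()(last_output)
      x = layers.Dense(512, activation='relu')(x)
      x = layers.Dropout(0.35)(x)  # reduce overfitting
      x = layers.Dense(1, activation='sigmoid')(x)  # score: 0 = beach, 1 = mountain

      # Model
      model = Model(pre_trained_model.input, x)
      model.compile(
          optimizer=RMSprop(learning_rate=1e-5),
          loss='binary_crossentropy',
          metrics=['accuracy'])

      # Train with epochs=100, batch_size=50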
    3. Machine: I tried both Google Colab and my personal laptop (Dell XPS 15 7590 with 32GB RAM, a 4GB GPU and an Intel® Core™ i7-9750H CPU @ 2.60GHz × 12).

    4. Final score: 94.29% accuracy.

  4. Approaches already tried:
    1. Using only the "right" data, i.e. the beach and mountain images alone. There is not much of it (~5K images per category), so accuracy reached only 90%, with underfitting or overfitting in some runs. That's why I added the "related" images (~15K per category).
    2. A smaller input size (150x150, for quicker training): lower accuracy.
    3. VGG16 with Places365 pretrained weights (instead of InceptionV3 + "imagenet"): only 88% accuracy.
    4. A smaller dropout rate (0.2 instead of 0.35): lower accuracy.
    5. A bigger input size (224x224), a smaller learning rate or a bigger batch_size: not enough GPU memory or time to train the model.
  5. Deployment:
    1. Result: dinhanhthi.github.io/mountain-vs-beach.
    2. Chosen approach (serverless): use TensorFlow.js to load the trained model (a .json file plus .bin weight files) in a static site hosted for free on GitHub Pages.
      • Weakness: the trained model is big (193MB), so the site takes too long to load.
      • Solution: use the quantization option of tfjs-converter to shrink the model from 193MB to 48MB (still big, but better).
        • Quantization technique: compress the model parameters from Float32 (4 bytes each) to Uint8 (1 byte each) by mapping each tensor's values from its range [min value, max value], stored as 32-bit floats, onto the 256 integers [0, 255], stored in 8 bits. Since each parameter shrinks from 4 bytes to 1, the weights become about 4 times smaller (193MB / 4 ≈ 48MB).
        • The converter turns a saved model (exported by TensorFlow Core) into the TensorFlow.js format. The converted weights are split into 4MB .bin files so that browsers can cache them: users only have to wait the first time they load the page, and later visits are quick.
        • Weakness: we lose some accuracy (the loss may be small or large).
    3. Other options (server side):
      1. Trained model + Flask + Heroku.
      2. Trained model + Flask + Firebase.
      3. Trained model + Streamlit + Heroku.
      4. I didn't choose them because I've tested other projects built with these techniques: they are not much faster than the serverless approach, but they take more time to build.
    4. Frontend: Bootstrap + HTML + CSS + Vanilla JS.
      1. 3 ways to input an image:
        1. Upload a local file (recommended).
        2. Choose one of the example images on the page.
        3. Paste the URL of an image (it must be in a valid format, otherwise it will not work). Acceptable URLs point to .jpeg, .png or .jpg files, or are base64 data URLs.
      2. Technical problem with the URL option: TFJS's fromPixels() method raises a CORS error for images hosted on other sites, because the browser does not let our web app read the pixels of a cross-origin image unless that site allows it.
      3. The meaning of the results: since this is a binary classification problem, I use a sigmoid to turn the model output into a probability score between 0 (beach) and 1 (mountain).
        1. It's a mountain if score > 0.65 and we output score*100%.
        2. It's a beach if score < 0.35 and we output (1-score)*100%.
        3. Not sure: the score is in [0.35, 0.65]. This may come from the input image (wrong format, a URL that cannot be fetched, both a beach and a mountain in the image, neither a mountain nor a beach in the image, ...).
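The class merging from the dataset step ("beach related" and "mountain related") can be sketched as a simple lookup. This is a minimal illustration, not the notebook's code: the helper `merged_label` and the assumption that each image sits in a folder named after its Places category (e.g. `train/beach_house/00000001.jpg`) are hypothetical. The 0 = beach, 1 = mountain encoding follows the meaning of the sigmoid score in the deployment step.

```python
from pathlib import PurePosixPath
from typing import Optional

# Places categories merged into each class (lists taken from the README).
BEACH_RELATED = {"beach", "beach_house", "ocean"}
MOUNTAIN_RELATED = {"mountain", "mountain_path", "mountain_snowy"}


def merged_label(image_path: str) -> Optional[int]:
    """Return 0 for a beach-related image, 1 for a mountain-related one,
    or None if its category is not used.

    Hypothetical helper: assumes the image's parent folder is its Places
    category, e.g. "train/beach_house/00000001.jpg".
    """
    category = PurePosixPath(image_path).parent.name
    if category in BEACH_RELATED:
        return 0
    if category in MOUNTAIN_RELATED:
        return 1
    return None


if __name__ == "__main__":
    for path in ("train/ocean/1.jpg", "train/mountain_snowy/2.jpg", "train/forest/3.jpg"):
        print(path, "->", merged_label(path))
```

Keeping the groups as sets makes it easy to try other "related" categories without touching the labelling logic.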
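The `(None, 10, 10, 768)` shape noted in the model code, and the input-size trade-offs listed under the approaches already tried, can be checked with a little arithmetic. The sketch below is a hand-written reimplementation of the spatial-size arithmetic of the Keras `InceptionV3` layer layout up to `mixed7` (the stem and `mixed3` shrink the feature map; `mixed0` to `mixed2` and `mixed4` to `mixed7` keep its size). Treat it as a sanity check and confirm against `model.summary()`.

```python
def out_size(size: int, kernel: int, stride: int = 1, padding: str = "valid") -> int:
    """Output size along one spatial axis for a convolution or pooling layer."""
    if padding == "same":
        return -(-size // stride)  # ceil(size / stride)
    return (size - kernel) // stride + 1


def mixed7_size(input_size: int) -> int:
    """Height/width of InceptionV3's "mixed7" output for a square input.

    Mirrors the Keras InceptionV3 layout: the stem layers below, then
    mixed0..mixed2 (size kept), mixed3 (3x3 branches with stride 2 and
    "valid" padding) and mixed4..mixed7 (size kept).
    """
    s = out_size(input_size, 3, stride=2)  # conv 3x3, stride 2, valid
    s = out_size(s, 3)                     # conv 3x3, valid
    s = out_size(s, 3, padding="same")     # conv 3x3, same
    s = out_size(s, 3, stride=2)           # max-pool 3x3, stride 2
    s = out_size(s, 1)                     # conv 1x1, valid
    s = out_size(s, 3)                     # conv 3x3, valid
    s = out_size(s, 3, stride=2)           # max-pool 3x3, stride 2 (input of mixed0)
    s = out_size(s, 3, stride=2)           # mixed3 reduction
    return s


if __name__ == "__main__":
    for size in (150, 200, 224, 299):
        side = mixed7_size(size)
        print(f"{size}x{size} input -> mixed7 output {side}x{side}x768")
```

With these rules a 150x150 input gives a 7x7 map, 224x224 gives 12x12 and the default 299x299 gives 17x17, while any square input from 187 to 202 pixels gives the 10x10 map. Because Flatten passes every position to the Dense(512) layer, that layer grows with the square of this number, which helps explain why bigger inputs ran out of GPU memory.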
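The 193MB and 48MB figures in the deployment step can be roughly reproduced from the parameter count. This back-of-the-envelope sketch assumes the 10x10x768 `mixed7` output from the model code and about 9M parameters in the frozen InceptionV3 layers (an approximate figure; read the exact one from `model.summary()`). The 4MB shard size is tfjs-converter's default weight shard size.

```python
import math

FLOAT32_BYTES = 4
UINT8_BYTES = 1
SHARD_BYTES = 4 * 1024 * 1024  # tfjs-converter's default weight shard size (4 MiB)


def dense_params(inputs: int, units: int) -> int:
    """Number of weights plus biases in a fully connected layer."""
    return inputs * units + units


# Head from the model code: Flatten(10x10x768) -> Dense(512) -> Dense(1).
# Flatten and Dropout have no parameters.
flatten_size = 10 * 10 * 768  # 76,800 features
head_params = dense_params(flatten_size, 512) + dense_params(512, 1)

# Assumption: the frozen InceptionV3 layers up to "mixed7" hold roughly 9M
# parameters; take the exact figure from model.summary().
BASE_PARAMS_APPROX = 9_000_000
total_params = BASE_PARAMS_APPROX + head_params

float32_mb = total_params * FLOAT32_BYTES / 1e6  # size before quantization
uint8_mb = total_params * UINT8_BYTES / 1e6      # size after uint8 quantization
num_shards = math.ceil(total_params * UINT8_BYTES / SHARD_BYTES)

if __name__ == "__main__":
    print(f"head: {head_params:,} params, total: ~{total_params:,} params")
    print(f"float32: ~{float32_mb:.0f} MB, uint8: ~{uint8_mb:.0f} MB in {num_shards} shards")
```

Almost all of the weight (about 39.3M parameters) lives in the Dense(512) layer on top of the 76,800 flattened features, so that layer, not InceptionV3 itself, is what makes the model so large.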
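The uint8 quantization described in the deployment step maps each tensor's [min, max] range onto the 256 codes 0 to 255. The sketch below shows that idea on a plain list of floats with a hypothetical pair of helpers (`quantize_uint8`, `dequantize_uint8`) and a simple linear (min/scale) mapping. It is not tfjs-converter's implementation, which works on whole tensors, but it shows where the accuracy loss comes from: every value is rounded to the nearest of 256 levels, so the error is at most half a step.

```python
from typing import List, Tuple


def quantize_uint8(values: List[float]) -> Tuple[List[int], float, float]:
    """Linearly map floats in [min, max] onto the integer codes 0..255.

    Returns the codes plus the (minimum, scale) pair needed to undo the
    mapping; `scale` is the width of one of the 255 steps between min and max.
    """
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255 if hi > lo else 1.0  # constant tensor: any scale works
    codes = [round((v - lo) / scale) for v in values]
    return codes, lo, scale


def dequantize_uint8(codes: List[int], lo: float, scale: float) -> List[float]:
    """Recover approximate float values from the 8-bit codes."""
    return [lo + c * scale for c in codes]


if __name__ == "__main__":
    weights = [-0.5, -0.1, 0.0, 0.2, 0.75]
    codes, lo, scale = quantize_uint8(weights)
    restored = dequantize_uint8(codes, lo, scale)
    for w, c, r in zip(weights, codes, restored):
        print(f"{w:+.4f} -> {c:3d} -> {r:+.4f} (error {abs(w - r):.5f})")
```

Each code fits in one byte instead of four, which is where the roughly 4x reduction comes from; the price is one extra (min, scale) pair per tensor and a rounding error bounded by `scale / 2`.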
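The URL rule in the frontend step (only .jpeg, .png, .jpg or base64 URLs) can be expressed as a small check. The web app itself is written in Vanilla JS; the Python function below, `is_acceptable_image_url`, is only a hypothetical illustration of the same rule, not code from the project. Passing it does not avoid the CORS problem: the site hosting the image must still allow cross-origin access.

```python
from urllib.parse import urlparse

IMAGE_EXTENSIONS = (".jpeg", ".jpg", ".png")


def is_acceptable_image_url(url: str) -> bool:
    """Hypothetical check mirroring the README's rule for pasted URLs.

    Accepts base64 image data URLs, or http(s) URLs whose path ends with
    .jpeg, .jpg or .png (query strings are ignored).
    """
    url = url.strip()
    if url.startswith("data:image/"):
        return ";base64," in url
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and parsed.path.lower().endswith(IMAGE_EXTENSIONS)


if __name__ == "__main__":
    for candidate in (
        "https://example.com/photos/alps.JPG",
        "https://example.com/photos/alps.jpg?width=800",
        "data:image/png;base64,iVBORw0KGgo=",
        "https://example.com/gallery.html",
    ):
        print(candidate, "->", is_acceptable_image_url(candidate))
```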
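The decision rule for the sigmoid score can be written as a short function. The thresholds (0.35 and 0.65) and the percentages come from the README; the function name `interpret` and the choice to return no percentage in the "not sure" case are assumptions, since the README does not say what is displayed then.

```python
from typing import Optional, Tuple

MOUNTAIN_THRESHOLD = 0.65  # score > 0.65 -> mountain
BEACH_THRESHOLD = 0.35     # score < 0.35 -> beach


def interpret(score: float) -> Tuple[str, Optional[float]]:
    """Turn the sigmoid score (0 = beach, 1 = mountain) into a label and a confidence in %."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("a sigmoid score must lie in [0, 1]")
    if score > MOUNTAIN_THRESHOLD:
        return "mountain", score * 100
    if score < BEACH_THRESHOLD:
        return "beach", (1 - score) * 100
    return "not sure", None  # scores in [0.35, 0.65]


if __name__ == "__main__":
    for s in (0.92, 0.5, 0.1):
        print(s, interpret(s))
```

Both boundaries (0.35 and 0.65) fall into the "not sure" band, matching the closed interval [0.35, 0.65] in the README.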