This repository has been archived by the owner on Aug 15, 2019. It is now read-only.

Create preprocess.py #136

Merged
merged 6 commits into from
Oct 1, 2017


kaihuchen
Contributor

kaihuchen commented Sep 18, 2017

This program converts input images into a format suitable for use with deeplearnjs.

Before running this program:

  • Create a directory structure as follows:
    • <topDir>
      • preprocess.py
      • <yourProjectDir>
        • <yourImageDir> # defaults to 'images'
        • <yourImageDir2>
  • Put all images under <yourImageDir>.
    If you use a directory name other than the default 'images',
    specify it with the --path parameter on the command line.
  • You may add <yourImageDir2>, <yourImageDir3>, etc., to facilitate experimentation.
  • Each image filename must be prefixed with its class label, followed by '_'.
    For example, cat_image00005.jpg
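A minimal sketch of how a class label can be derived under this naming convention. The helper name `label_from_filename` is hypothetical (it is not taken from preprocess.py), and it assumes the label is everything before the first '_', matching the script's `delimiter = '_'` setting quoted in the review further down.

```python
import os

DELIMITER = '_'  # separator between the class label and the rest of the filename


def label_from_filename(path):
    """Return the class-label prefix of an image filename (hypothetical helper).

    Assumes the label is everything before the first DELIMITER,
    e.g. 'images/cat_image00005.jpg' -> 'cat'.
    """
    name = os.path.basename(path)
    label, sep, _rest = name.partition(DELIMITER)
    if not sep or not label:
        # No '<label>_' prefix: the file cannot be assigned to a class.
        raise ValueError('filename has no "<label>_" prefix: %r' % name)
    return label
```

Because only the first '_' is used, a file such as dog_park_01.jpg is labeled 'dog'; the class label itself therefore cannot contain '_'.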

To run:

  1. $ cd <yourProjectDir>
  2. $ python ../preprocess.py
     or, to use non-default parameters:
     $ python ../preprocess.py --outimgs newimgs
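A minimal argparse sketch of the command-line handling described above. Only --path and --outimgs are named in this description; --replicate appears in the review excerpts (as FLAGS.replicate), and its default of 1 is an assumption. The flag for renaming the labels file is omitted because its name is not given. The unrecognized-flag check mirrors the "Error, unrecognized flags" message quoted in the review, but this is not the script's actual parser.

```python
import argparse
import sys


def parse_flags(argv):
    """Parse command-line flags (sketch; defaults other than 'images' are assumptions)."""
    parser = argparse.ArgumentParser(
        description='Convert labeled images for the deeplearnjs model builder demo.')
    parser.add_argument('--path', default='images',
                        help='directory containing the input images')
    parser.add_argument('--outimgs', default='images',
                        help="output image name; '.png' is appended automatically")
    parser.add_argument('--replicate', type=int, default=1,
                        help='number of times to replicate the dataset (assumed default)')
    # parse_known_args returns (namespace, list of arguments it did not recognize)
    flags, unparsed = parser.parse_known_args(argv)
    if unparsed:
        print('Error, unrecognized flags: %s' % unparsed)
        sys.exit(-1)
    return flags
```

For example, parse_flags(['--outimgs', 'newimgs']).outimgs + '.png' yields 'newimgs.png', while parse_flags([]) keeps the 'images' defaults.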

Results: the current directory will contain an image file 'images.png' (the '.png' extension is added automatically) and a labels file 'labels', or files named as specified by the command-line options.
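A standard-library sketch of writing such a composite image. The layout (each image flattened into one row of an 8-bit grayscale PNG) is an assumption, not read from the script, and decoding/resizing the source images (e.g. with an imaging library) is out of scope; `compose_rows_png` and `png_chunk` are hypothetical names.

```python
import struct
import zlib


def png_chunk(tag, data):
    """Build one PNG chunk: 4-byte length, type, data, CRC-32 over type + data."""
    return (struct.pack('>I', len(data)) + tag + data +
            struct.pack('>I', zlib.crc32(tag + data) & 0xffffffff))


def compose_rows_png(images):
    """Encode equal-length grayscale images as one PNG, one image per row.

    `images` is a list of flat sequences of 0-255 ints (already decoded
    and resized). Returns the bytes of the PNG file.
    """
    width = len(images[0])
    if any(len(img) != width for img in images):
        raise ValueError('all images must have the same number of pixels')
    height = len(images)
    # IHDR: width, height, bit depth 8, color type 0 (grayscale),
    # compression 0, filter method 0, no interlace
    ihdr = struct.pack('>IIBBBBB', width, height, 8, 0, 0, 0, 0)
    # Every scanline is prefixed with filter-type byte 0 (no filtering)
    raw = b''.join(b'\x00' + bytes(bytearray(img)) for img in images)
    return (b'\x89PNG\r\n\x1a\n' +
            png_chunk(b'IHDR', ihdr) +
            png_chunk(b'IDAT', zlib.compress(raw)) +
            png_chunk(b'IEND', b''))
```

Writing the result is then a matter of opening outimgs + '.png' in binary mode and writing the returned bytes.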

Note:

  • Make sure that data.labels.shape in the model builder's dataset configuration matches the number of classes found in the data.
  • Make sure that the size of the NN model's output layer matches the number of classes.

To-do:

  • This code is not suitable for processing large numbers of images.
  • Tested with Python 2.7; some problems were observed with Python 3.5.
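A small sketch of the class-count check the notes ask for. It assumes the label is the filename text before the first '_' and that labels are one-hot, so both data.labels.shape and the output layer should equal the number of distinct labels; the function names are hypothetical.

```python
import os


def class_summary(filenames, delimiter='_'):
    """Return the sorted distinct class labels found in `filenames`."""
    return sorted({os.path.basename(f).split(delimiter, 1)[0] for f in filenames})


def check_num_classes(filenames, expected_num_classes):
    """Raise if the data's class count differs from the model's configured size.

    `expected_num_classes` stands for the value used in data.labels.shape
    and for the size of the model's output layer.
    """
    classes = class_summary(filenames)
    if len(classes) != expected_num_classes:
        raise ValueError('found %d classes %s, but the model expects %d'
                         % (len(classes), classes, expected_num_classes))
    return classes
```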


@googlebot

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed, please reply here (e.g. I signed it!) and we'll verify. Thanks.


  • If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
  • If your company signed a CLA, they designated a Point of Contact who decides which employees are authorized to participate. You may need to contact the Point of Contact for your company and ask to be added to the group of authorized contributors. If you don't know who your Point of Contact is, direct the project maintainer to go/cla#troubleshoot.
  • In order to pass this check, please resolve this problem and have the pull request author add another comment and the bot will run again.

@nsthorat
Contributor

Wow. This is fantastic. Thank you so much for spending the time on this.

A couple high level suggestions:

  • Can we make it so the preprocess.py doesn't have to be in the directory? It'd be nice to run it like ./scripts/preprocess /abs/path/to/image/dir
  • Can you rename it to something more descriptive, how about something like process_images_for_training.py or something like that? I like mile long descriptive names :)

Some minor nits inline! Thanks again!!
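A sketch of the first suggestion: taking the image directory as a positional argument and resolving it to an absolute path, so the script no longer has to sit next to the data. The argument name `image_dir` and the accepted extension list are assumptions (the commit history below shows the script was later renamed process_images_for_training.py, but its final interface is not shown here).

```python
import argparse
import os

# Assumed set of accepted image types
IMAGE_EXTENSIONS = ('.jpg', '.jpeg', '.png', '.gif', '.bmp')


def list_images(image_dir):
    """Return sorted absolute paths of image files in `image_dir`, independent of the cwd."""
    image_dir = os.path.abspath(os.path.expanduser(image_dir))
    if not os.path.isdir(image_dir):
        raise ValueError('not a directory: %s' % image_dir)
    return sorted(
        os.path.join(image_dir, name) for name in os.listdir(image_dir)
        if name.lower().endswith(IMAGE_EXTENSIONS))


def parse_args(argv=None):
    """Parse a single positional image-directory argument."""
    parser = argparse.ArgumentParser(description='Process images for training.')
    parser.add_argument('image_dir', help='path to the directory of labeled images')
    return parser.parse_args(argv)
```

With this, an invocation like ./scripts/process_images_for_training.py /abs/path/to/image/dir would work from any working directory.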


Review status: 0 of 1 files reviewed at latest revision, 10 unresolved discussions, some commit checks failed.


scripts/preprocess.py, line 2 at r1 (raw file):

# Copyright 2017 Smesh LLC. All Rights Reserved.
#   http://smesh.net/labs

Some lawyercat will find this and yell at us for this :/

Feel free to add yourself a little lower at the top of the file, though (not in the license).


scripts/preprocess.py, line 16 at r1 (raw file):

# limitations under the License.
# ======================================================================
# This program converts input images into a format suitable for use with deeplearnjs.

Can you actually change this to say the model builder demo?

We're going to be thinking hard about the data API at the library level at some point in the near future, and the model builder format might not be here to stay.


scripts/preprocess.py, line 69 at r1 (raw file):

delimiter = '_'

#---------------------------------------

remove this line


scripts/preprocess.py, line 71 at r1 (raw file):

#---------------------------------------
def preprocessImages(FLAGS):
    path = FLAGS.path

Use 2-space indentation throughout this file (we don't have a Python linter set up yet, but this is what it would say).


scripts/preprocess.py, line 95 at r1 (raw file):

        fileList = np.tile(fileList, FLAGS.replicate)
        print('...Dataset has been replicated', FLAGS.replicate, 'times')
    

remove trailing spaces on this line
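For reference, the replication in this excerpt uses np.tile, which for a 1-D list repeats the whole sequence end to end. A plain-Python equivalent (the name `replicate` is hypothetical) makes the ordering explicit:

```python
def replicate(file_list, times):
    """Repeat the whole file list `times` times, end to end.

    For a 1-D list this gives the same order as np.tile(file_list, times):
    a, b, c, a, b, c, ... (np.repeat would instead give a, a, b, b, ...).
    """
    if times < 1:
        raise ValueError('replicate must be >= 1')
    return list(file_list) * times
```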


scripts/preprocess.py, line 108 at r1 (raw file):

    a = imageList
    print('...Created', a.shape[0], 'images')
    #print('min/max pixel values: ', a.min(), '/', a.max())

remove comment


scripts/preprocess.py, line 116 at r1 (raw file):

    print('...Saved composed image to:', outImageFile+'.png')

    #-------------

remove this line


scripts/preprocess.py, line 127 at r1 (raw file):

    FLAGS.nClassesIn = len(classesClearText)
    labels5 = pack(classesClearText.tolist(), labels, FLAGS)
    

remove trailing whitespace


scripts/preprocess.py, line 141 at r1 (raw file):

    length = len(labels)
    result = [ np.NaN ] * length * nClasses
    

remove trailing whitespace
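The excerpt allocates a flat list of length * nClasses entries, i.e. one row of nClasses values per image. A sketch of filling such a buffer as row-major one-hot labels, assuming that is what pack() produces; `pack_one_hot` is a hypothetical name, and it initializes with 0.0 rather than NaN so every slot has a defined value.

```python
def pack_one_hot(class_names, labels):
    """Flatten labels into a row-major one-hot list of length len(labels) * nClasses.

    class_names: ordered distinct labels (list index = class id)
    labels: per-image label strings, in image order
    """
    n_classes = len(class_names)
    index = {name: i for i, name in enumerate(class_names)}
    result = [0.0] * (len(labels) * n_classes)
    for row, label in enumerate(labels):
        # Row `row` occupies result[row * n_classes : (row + 1) * n_classes]
        result[row * n_classes + index[label]] = 1.0
    return result
```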


scripts/preprocess.py, line 197 at r1 (raw file):

        print('Error, unrecognized flags:', unparsed)
        exit(-1)
        

remove trailing spaces
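Several nits above are about trailing whitespace, including whitespace-only lines (a linter such as pycodestyle reports these as W291 and W293). Without a project linter, a small standard-library check works; this is a sketch assuming LF line endings, with hypothetical function names.

```python
def trailing_whitespace_lines(text):
    """Return 1-based numbers of lines ending in spaces or tabs (incl. whitespace-only lines)."""
    return [n for n, line in enumerate(text.split('\n'), 1)
            if line != line.rstrip(' \t')]


def strip_trailing_whitespace(text):
    """Remove trailing spaces and tabs from every line, keeping the line breaks."""
    return '\n'.join(line.rstrip(' \t') for line in text.split('\n'))
```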


Comments from Reviewable

@kaihuchen
Contributor Author

kaihuchen commented Sep 25, 2017 via email

@nsthorat
Contributor

You just need to commit those changes to this branch and I'll see them. I'll then be able to merge!

@kaihuchen
Contributor Author

The requested changes have been committed.

@nsthorat
Contributor

Hi Kaihu,

It looks like the changes didn't get committed (I still see some lines with just whitespace). Were they committed to another branch?

Thanks! :)


Review status: 0 of 1 files reviewed at latest revision, 10 unresolved discussions, some commit checks failed.



@kaihuchen
Contributor Author

My bad! Committed again, please check.

@nsthorat
Contributor

nsthorat commented Oct 1, 2017

:lgtm_strong:


Reviewed 1 of 2 files at r2, 1 of 1 files at r3.
Review status: all files reviewed at latest revision, all discussions resolved, some commit checks failed.



nsthorat merged commit 109574d into tensorflow:master on Oct 1, 2017
mnottheone pushed a commit to mnottheone/deeplearnjs that referenced this pull request Dec 1, 2018
* Create preprocess.py

* Update and rename preprocess.py to process_images_for_training.py

* Update process_images_for_training.py

* Updated process_images_for_training.py

* Merge branch 'master' into patch-1

* Update process_images_for_training.py
3 participants