
ValueError: the input array must have size 3 along channel_axis, got (923, 600) #154

Open
selected-pixel-jameson opened this issue Mar 3, 2022 · 15 comments

Comments

@selected-pixel-jameson

selected-pixel-jameson commented Mar 3, 2022

When performing a search on the database I'm getting this error.
[Screenshot, 2022-03-03 7:55:31 AM]

I'm running Python 3.8 and had to update the packages to get them to compile. I had to use 3.8 because I'm on a MacBook Pro M1 and nothing older will compile. With this version of Python I also have to install the latest scikit-image, or the package won't compile when I install it via pip.

These are the package versions I have:

elasticsearch==8.0.0
scikit-image==0.19.2
six==1.15.0
flask==2.0.3

To fix this issue, I had to change the following lines of code:

[Screenshot, 2022-03-03 7:53:15 AM]

However, when I search with the exact image at that path, the results are not accurate.

Any guidance would be greatly appreciated! Thank you.

@selected-pixel-jameson
Author

Upon further investigation, it appears this error is thrown by the call

return rgb2gray(image_or_path) on line 257 of goldberg.py.

[Screenshot, 2022-03-03 8:42:36 AM]

Thanks again!
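For anyone else hitting this: starting with scikit-image 0.19, rgb2gray raises this ValueError when it receives an array that is already 2-D (grayscale), which matches the (923, 600) shape in the title, while this thread reports that 0.17.2 works. Instead of pinning scikit-image, one option is to convert only when the array actually has color channels. The sketch below is a hypothetical helper (ensure_gray is not part of image-match); it uses plain NumPy with the luminance weights documented for skimage.color.rgb2gray so it runs without scikit-image, and in real code the RGB branch could simply call rgb2gray.

```python
import numpy as np

# Luminance weights documented for skimage.color.rgb2gray (ITU-R BT.709).
RGB_WEIGHTS = np.array([0.2125, 0.7154, 0.0721])


def ensure_gray(image):
    """Return a 2-D grayscale array, converting only if color channels are present.

    Hypothetical helper, not part of image-match.
    """
    image = np.asarray(image)
    if image.ndim == 2:
        # Already grayscale: returned unchanged. rgb2gray raises on this in scikit-image >= 0.19.
        return image
    if image.ndim == 3 and image.shape[-1] in (3, 4):
        rgb = image[..., :3].astype(float)  # drop an alpha channel if present
        if np.issubdtype(image.dtype, np.integer):
            rgb /= np.iinfo(image.dtype).max  # scale e.g. uint8 to [0, 1], as rgb2gray does
        return rgb @ RGB_WEIGHTS
    raise ValueError(f"expected a 2-D or RGB(A) image, got shape {image.shape}")
```

The ndarray branch in goldberg.py that does return rgb2gray(image_or_path) could call a helper like this instead; that is a suggested change, not something the project ships.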

@selected-pixel-jameson
Author

I've now set up my environment with scikit-image==0.17.2, and I can run the code from the master branch as-is. However, the search results still aren't correct.

I'm using Elasticsearch 7.16 with the elasticsearch 8.0.0 Python library.

I'm simply trying to get an indexed image, and images similar to it, to come back as results when I search with it, but the returned images are neither exact matches nor even close to similar.

@selected-pixel-jameson
Author

FYI, if anyone out there would be interested in helping me with this, I'd be more than willing to tip them or offer some other form of payment.

@selected-pixel-jameson
Author

I changed the elasticsearch version to 7.0.0 and reindexed all the images after getting scikit-image to 0.17.2, and it's now working! 🎆
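One plausible reason the downgrade helped: the elasticsearch-py documentation recommends using the client major version that matches the server's major version, and the earlier setup paired the 8.0.0 client with a 7.16 server. A small startup check along these lines can catch that mismatch before indexing; same_major_version is a hypothetical helper, and the inputs are assumed to come from elasticsearch.__version__ and the server's info response.

```python
def same_major_version(client_version, server_version):
    """Return True if the Python client and the Elasticsearch server share a major version.

    client_version: a version tuple such as elasticsearch.__version__, e.g. (7, 0, 0)
    server_version: the server's version string, e.g. es.info()["version"]["number"]
    """
    server_major = int(server_version.split(".")[0])
    return int(client_version[0]) == server_major
```

For example, same_major_version((8, 0, 0), "7.16.3") is False, which is the combination described above.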

@selected-pixel-jameson
Author

selected-pixel-jameson commented Mar 4, 2022

Upon further testing, this seems good at finding the exact same image and images with extremely slight variations, but it seems to run into issues, possibly when color variations come into play. Here is an example.

This is the indexed image.

This is the image I'm using to search with.

Results do come back, but they don't seem accurate at all. This is the first result that comes back:

Dist: 0.5466048990253396

As I said before, if anyone has insight into how to adjust this, I'd be willing to compensate them for their time. I really need a solid solution for this.
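For context on the dist value above: as far as I can tell from image-match's source, dist is a normalized distance between signature vectors, ||s - q|| / (||s|| + ||q||). By the triangle inequality it lies between 0 (identical signatures) and 1 (for example when one signature is the negation of the other), so 0.5466 is past the midpoint. A NumPy sketch of that formula, written here for illustration rather than copied from the project:

```python
import numpy as np


def normalized_distance(sigs, query, nan_value=1.0):
    """||s - q|| / (||s|| + ||q||) for each row s of sigs against one query q."""
    sigs = np.atleast_2d(np.asarray(sigs, dtype=float))
    query = np.asarray(query, dtype=float)
    numerator = np.linalg.norm(sigs - query, axis=1)
    denominator = np.linalg.norm(sigs, axis=1) + np.linalg.norm(query)
    with np.errstate(divide="ignore", invalid="ignore"):
        dist = numerator / denominator
    dist[np.isnan(dist)] = nan_value  # 0/0 only happens when both vectors are all zeros
    return dist
```

With query [1, 0], the rows [1, 0], [-1, 0] and [3, 0] get distances 0.0, 1.0 and 0.5.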

@selected-pixel-jameson
Author

Interestingly, this seems to be the same problem reported in issue #64.

I can't tell whether it was actually resolved, though.

@selected-pixel-jameson
Author

It now seems the issue is the score assigned to the results.
If I increase the result size to 10,000, the correct images do come back: they have the best dist value of all the images, but many other images have a higher score.

Not sure how to resolve this one yet.

@selected-pixel-jameson
Author

I don't fully understand the vector/word storage in Elasticsearch yet, but I'm starting to think the logic behind this entire project is flawed, unless the score is somehow being miscalculated on my end.
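Here is roughly how the word storage works, based on my reading of image-match's make_record (simplified: the project's defaults for k and N are larger, and it does the integer encoding in NumPy). The signature is cut into k evenly spaced windows of N values, each value is squashed to -1, 0 or 1, and each window becomes one integer stored in a simple_word_i field. The search then sends one term clause per word in a bool should query, so Elasticsearch ranks documents by the summed relevance of the words that match exactly. A single changed value alters a whole word, which is one way the score can disagree with dist. The function names are illustrative.

```python
import numpy as np


def get_words(signature, k, N):
    """Cut k evenly spaced windows of length N out of a 1-D signature."""
    signature = np.asarray(signature, dtype=int)
    positions = np.linspace(0, len(signature), k, endpoint=False).astype(int)
    words = np.zeros((k, N), dtype=int)
    for i, pos in enumerate(positions):
        chunk = signature[pos:pos + N]
        words[i, :len(chunk)] = chunk  # windows that run off the end are zero-padded
    return words


def word_codes(words):
    """Squash each word to {-1, 0, 1} and read it as a base-3 integer."""
    # Python ints are used so that long words (3**N) cannot overflow.
    return [sum(int(d + 1) * 3 ** j for j, d in enumerate(word))
            for word in np.sign(words)]


def should_clauses(codes, doc_type="image"):
    """One exact-match term clause per word, as in the bool 'should' query."""
    return [{"term": {"{}.simple_word_{}".format(doc_type, i): code}}
            for i, code in enumerate(codes)]
```

With sig = [2, -1, 0, 1, -2, 0, 1, 1], word_codes(get_words(sig, 4, 3)) gives [11, 7, 21, 17]. Changing the fourth value from 1 to 0 turns the second code into 4, so that image loses a whole matching word even though its signature moved by a single unit.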

@selected-pixel-jameson
Author

I've now added a dense_vector field to the Elasticsearch index and changed the search query to:

    body = {
        "query": {
            "script_score": {
                "query": {
                    "bool": {
                        "filter": {
                            "term": {
                                "image.metadata.isKey": "true"
                            }
                        }
                    }
                },
                "script": {
                    "source": "1 / (1 + l2norm(params.queryVector, 'image_dense_vector'))",
                    "params": {
                        "queryVector": dense_vector
                    }
                }
            }
        }
    }

Unfortunately this is still not returning an accurate result for the example provided above.
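One possible reason the dense_vector query still disagrees with dist: Elasticsearch's l2norm returns the plain Euclidean distance, so this script ranks candidates by ||s - q||, while image-match's dist divides that by ||s|| + ||q||. The two orderings can differ, and any match the script ranks below the size cutoff never reaches the re-ranking step. A toy check with two 2-D vectors (illustrative values, not real signatures):

```python
import numpy as np


def script_score(vectors, query):
    """Local equivalent of 1 / (1 + l2norm(params.queryVector, 'image_dense_vector'))."""
    vectors = np.atleast_2d(np.asarray(vectors, dtype=float))
    l2 = np.linalg.norm(vectors - np.asarray(query, dtype=float), axis=1)
    return 1.0 / (1.0 + l2)


def normalized_distance(sigs, query):
    """image-match style distance ||s - q|| / (||s|| + ||q||); assumes no all-zero vectors."""
    sigs = np.atleast_2d(np.asarray(sigs, dtype=float))
    query = np.asarray(query, dtype=float)
    return np.linalg.norm(sigs - query, axis=1) / (
        np.linalg.norm(sigs, axis=1) + np.linalg.norm(query))


query = [1.0, 0.0]
candidates = [[3.0, 0.0], [-0.5, 0.0]]
print(script_score(candidates, query))         # higher is better: the script prefers candidate 1
print(normalized_distance(candidates, query))  # lower is better: dist prefers candidate 0
```

Raising size lets the re-ranking step see more candidates, at the cost of more data per query, which is consistent with the 10,000-result observation earlier in the thread.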

@selected-pixel-jameson
Author

At this point I'm stuck. Ultimately, the issue is that the score Elasticsearch generates doesn't agree with the distance, while the results are only ordered by distance after the query has run.

I think the only remaining option is to literally pull all 500,000 records that will eventually be in the database, since the Elasticsearch word query seems inaccurate and I can't get the dense vector approach to work.
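If it does come to scanning every record, the distance computation can be streamed so that only the best few hits are kept in memory. This is a sketch under assumptions: best_matches is a hypothetical helper, hits is any iterable of Elasticsearch hit dicts with the signature stored under _source["image"]["signature"] (the layout search_single_record reads in this thread), and dist uses the ||s - q|| / (||s|| + ||q||) formula.

```python
import heapq

import numpy as np


def normalized_distance(sig, query):
    """||s - q|| / (||s|| + ||q||); defined as 1.0 when both vectors are all zeros."""
    denominator = np.linalg.norm(sig) + np.linalg.norm(query)
    if denominator == 0:
        return 1.0
    return float(np.linalg.norm(sig - query) / denominator)


def best_matches(hits, query_signature, n=10, doc_type="image"):
    """Return the n (dist, id) pairs with the smallest distance, streaming over hits."""
    query = np.asarray(query_signature, dtype=float)
    scored = (
        (normalized_distance(
            np.asarray(hit["_source"][doc_type]["signature"], dtype=float), query),
         hit["_id"])
        for hit in hits
    )
    # nsmallest keeps only n candidates on its heap while consuming the generator.
    return heapq.nsmallest(n, scored)
```

With the official client, hits could come from elasticsearch.helpers.scan, e.g. best_matches(scan(es, index="images", query={"query": {"match_all": {}}}), signature), where "images" stands in for your index name. The cost is linear in the number of records, so expect each search to take noticeably longer at 500,000 documents.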

@A7-4real

[Screenshot: channel axis error in make_record]

@selected-pixel-jameson I got the same error when performing a search with image-search, with Elasticsearch running in Docker as the database.

@A7-4real

[Screenshot: corrected search_image function]

Changing the first argument from "transformed_img" to "path" in the make_record call on line 273 solves the issue:

from this --> l = self.search_single_record(transformed_record, pre_filter=pre_filter)
to --> l = self.search_single_record(path, pre_filter=pre_filter)

As you can see from the make_record definition,

    def make_record(path, gis, k, N, img=None, bytestream=False, metadata=None):

the first argument is expected to be "path".

@A7-4real

Test dataset:

[Screenshot: test_dataset_faces]

Results:
[Screenshot: Reverse image_search result for women2 with filter 3 after adding filter 1 cmd]
[Screenshot: Reverse image_search result for women2 with filter 3 after adding filter 2 cmd]

After fixing the issue, the results are pretty awesome!

@A7-4real

But it still can't match images in a different orientation. What are your thoughts on this?
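On orientation: a Goldberg-style signature is not rotation invariant, so rotated or mirrored queries are usually handled by signing every orientation of the query and keeping the best match per image. If I read image-match's search_image correctly, its all_orientations option does this by transforming the preprocessed image (transformed_img) before calling make_record; if make_record is given path instead, every iteration would sign the same unrotated image, which might explain why orientation search stops working after the change above (unverified). The sketch below only generates the eight 90-degree rotations and mirror images of a 2-D array; dihedral_orientations is a hypothetical name, and arbitrary angles, skew, or crops are not covered.

```python
import numpy as np


def dihedral_orientations(img):
    """Return the 8 rotations (0/90/180/270 degrees) of an image and of its mirror."""
    img = np.asarray(img)
    orientations = []
    for base in (img, np.fliplr(img)):
        for quarter_turns in range(4):
            orientations.append(np.rot90(base, quarter_turns))
    return orientations
```

Each array could then go through make_record and search_single_record, with the results merged by id and the smallest dist kept for each image.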

@selected-pixel-jameson
Author

selected-pixel-jameson commented May 6, 2022

@A7-4real I'll have to investigate your solution further, since it would require me to revert some changes I've made. Did you ever get the orientation piece working? That's something I really need.

However, I would point out that the issue I'm having is not something you are going to see with a small dataset.

I ended up reworking the search_single_record function to this.

    def search_single_record(self, rec, pre_filter=None):
        # Relies on the driver's module-level imports:
        # numpy as np and image-match's normalized_distance.
        rec.pop('path')
        signature = rec.pop('signature')
        rec.pop('metadata', None)
        dense_vector = rec.pop('image_dense_vector')
        # The simple_word_* entries left in rec are not used here: candidates
        # are ranked by the dense-vector script, not by a bool/should word query.

        # Only consider documents with metadata, plus any caller-supplied filter.
        filters = [{'exists': {'field': 'image.metadata'}}]
        if pre_filter is not None:
            # pre_filter may be a single clause or a list of clauses.
            filters.extend(pre_filter if isinstance(pre_filter, list) else [pre_filter])

        body = {
            'query': {
                'script_score': {
                    'query': {'bool': {'filter': filters}},
                    'script': {
                        # Higher score means a smaller L2 distance to the query vector.
                        'source': "1 / (1 + l2norm(params.queryVector, 'image_dense_vector'))",
                        'params': {'queryVector': dense_vector},
                    },
                }
            },
            '_source': {'excludes': ['{}.simple_word_*'.format(self.doc_type)]},
        }

        res = self.es.search(index=self.index,
                             body=body,
                             size=self.size,
                             timeout=self.timeout)['hits']['hits']

        sigs = np.array([x['_source'][self.doc_type]['signature'] for x in res])
        if sigs.size == 0:
            return []

        # Re-rank the returned candidates by image-match's normalized distance.
        dists = normalized_distance(sigs, np.array(signature))

        formatted_res = []
        for x, dist in zip(res, dists):
            source = x['_source'][self.doc_type]
            formatted_res.append({'id': x['_id'],
                                  'score': x['_score'],
                                  'dist': dist,
                                  'metadata': source.get('metadata'),
                                  'path': source.get('url', source.get('path'))})

        # Return a list rather than a lazy filter() iterator.
        return [row for row in formatted_res if row['dist'] < self.distance_cutoff]

I've been getting pretty accurate results for one of the purposes I need this for, but when the images are off, skewed, or distorted, it doesn't work very well.
