Universal Reader having issues with anything but image paths #77

mintary · 2024-08-16T19:29:22Z

Hi there, forewarning that I have very little experience with image processing and Torch. I have not touched the configuration files at the moment. Currently trying to pass URLs to the analyzer, but I keep running into the following error:

"File \"C:\\Users\\win1\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\torchvision\\transforms\\_functional_tensor.py\", line 926, in normalize 
if std.ndim == 1:        
std = std.view(-1, 1, 1)
return tensor.sub_(mean).div_(std)
~~~~~~~~~~~ <--- HERE
RuntimeError: The size of tensor a (4) must match the size of tensor b (3) at non-singleton dimension 1"

With the configuration variables for batch_size, fix_img_size, return_img_data, and include_tensors replaced by their literal values, this is my current endpoint:

@app.post("/predict/")
async def predict(url: str):
    try: 
        response = analyzer.run(
            image_source=url,
            batch_size=8,
            fix_img_size=True,
            return_img_data=False,
            include_tensors=True,
            path_output=None,
        )

Here's an example output:

{"asctime": "2024-08-16 14:57:30,765", "levelname": "INFO", "message": "Running FaceAnalyzer", "taskName": "Task-12"}
{"asctime": "2024-08-16 14:57:30,766", "levelname": "INFO", "message": "Reading image", "taskName": "Task-12", "input": "https://image-cdn.essentiallysports.com/wp-content/uploads/20200606234527/the-rock-dwayne-johnson-muscles-740x662.png"}
{"asctime": "2024-08-16 14:57:33,463", "levelname": "INFO", "message": "Detecting faces", "taskName": "Task-12"}
INFO:     127.0.0.1:63490 - "POST /predict/?url=https%3A%2F%2Fimage-cdn.essentiallysports.com%2Fwp-content%2Fuploads%2F20200606234527%2Fthe-rock-dwayne-johnson-muscles-740x662.png HTTP/1.1" 500 Internal Server Error

I also tried passing a PIL Image.Image object directly. I first have to convert it to RGB, which makes sense, since the same dimension-mismatch error appears if I do not. However, even though the type now matches, the detector does not find any faces:

@app.post("/predict/")
async def predict(file: UploadFile = File(...)):
    try: 
        image = Image.open(BytesIO(await file.read())).convert('RGB')

        response = analyzer.run(
            image_source=image,
            batch_size=8,
            fix_img_size=True,
            return_img_data=False,
            include_tensors=True,
            path_output=None,
        )

With output:

{"asctime": "2024-08-16 15:21:51,393", "levelname": "INFO", "message": "Running FaceAnalyzer", "taskName": "Task-7"}
{"asctime": "2024-08-16 15:21:51,393", "levelname": "INFO", "message": "Reading image", "taskName": "Task-7", "input": "<PIL.Image.Image image mode=RGB size=978x605 at 0x2446A6248C0>"}
{"asctime": "2024-08-16 15:21:51,678", "levelname": "INFO", "message": "Detecting faces", "taskName": "Task-7"}
{"asctime": "2024-08-16 15:22:00,407", "levelname": "INFO", "message": "Number of faces: 0", "taskName": "Task-7"}
Response(faces=[], version='0.5.0')

Would appreciate any help! The reader works perfectly fine if an image path is given.


tomas-gajarsky · 2024-11-23T10:27:23Z

Thank you for the detailed report! The runtime error seems to stem from mismatched image channels, likely due to an alpha channel (RGBA) in some inputs. While you correctly convert PIL images to RGB, this might not be happening for images loaded from URLs. I’ve updated the read_pil_image and read_numpy_array methods in version 0.5.1 to ensure all inputs are properly converted to RGB before processing, which should resolve this issue.
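For anyone who cannot upgrade yet, the same idea can be applied on the caller's side. The following is only a minimal sketch, assuming Pillow is installed; `ensure_rgb`, `load_rgb_from_bytes`, and `load_rgb_from_url` are hypothetical helper names for illustration and are not part of facetorch.

```python
from io import BytesIO
from urllib.request import urlopen

from PIL import Image


def ensure_rgb(image: Image.Image) -> Image.Image:
    """Return a 3-channel RGB image, dropping the alpha channel if present."""
    # RGBA PNGs have 4 channels; a 3-element mean/std normalization then
    # fails with "tensor a (4) must match the size of tensor b (3)".
    return image if image.mode == "RGB" else image.convert("RGB")


def load_rgb_from_bytes(data: bytes) -> Image.Image:
    """Decode raw image bytes (e.g. an upload body) and force RGB mode."""
    return ensure_rgb(Image.open(BytesIO(data)))


def load_rgb_from_url(url: str, timeout: float = 10.0) -> Image.Image:
    """Download an image over HTTP and force RGB mode (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as resp:
        return load_rgb_from_bytes(resp.read())
```

The returned object is a regular PIL image with exactly three channels, so it can be passed as `image_source` in the same way as the converted upload in the second snippet above.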

Regarding the detector not finding faces: This could be due to the current configuration of the RetinaFace detector postprocessor. I suggest trying different settings to better match your use case. Specifically, you might want to adjust the following parameters in your configuration file:

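confidence_threshold: Increasing it filters out low-confidence detections; lowering it keeps more candidate detections, which is the direction to try when no faces are found.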
score_threshold: Experiment with lowering it to allow detections with lower confidence scores.
expand_box_ratio: Increase it slightly if faces near the image edges are being missed.
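To make the effect of these parameters concrete, here is a small illustrative sketch in plain Python. It is not the facetorch postprocessor: `filter_by_score` and `expand_box` are hypothetical functions, and the exact semantics of the real thresholds and of `expand_box_ratio` are assumed here rather than taken from the library.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def filter_by_score(scores: List[float], threshold: float) -> List[int]:
    """Return indices of detections whose score is above the threshold."""
    # A higher threshold keeps fewer detections; a lower one keeps more.
    return [i for i, score in enumerate(scores) if score > threshold]


def expand_box(box: Box, ratio: float, width: int, height: int) -> Box:
    """Grow a box around its centre by `ratio`, clipped to the image bounds.

    Assumption: ratio=0.5 makes the box 50% wider and 50% taller.
    """
    x1, y1, x2, y2 = box
    dx = (x2 - x1) * ratio / 2
    dy = (y2 - y1) * ratio / 2
    return (
        max(0.0, x1 - dx),
        max(0.0, y1 - dy),
        min(float(width), x2 + dx),
        min(float(height), y2 + dy),
    )
```

With scores of `[0.95, 0.4, 0.02]`, a threshold of 0.5 keeps only the first detection, while a threshold of 0.01 keeps all three. This is why overly strict thresholds can lead to an empty `faces` list.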

These adjustments can help fine-tune detection results. Please update to version 0.5.1 and let me know if these changes help or if you need further guidance!

tomas-gajarsky added the bug Something isn't working label Nov 9, 2024
