Universal Reader having issues with anything but image paths #77

Open
mintary opened this issue Aug 16, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@mintary

mintary commented Aug 16, 2024

Hi there. Fair warning: I have very little experience with image processing or Torch, and I have not touched the configuration files. I am trying to pass URLs to the analyzer, but I keep running into the following error:

File "C:\Users\win1\AppData\Local\Programs\Python\Python312\Lib\site-packages\torchvision\transforms\_functional_tensor.py", line 926, in normalize
    if std.ndim == 1:
        std = std.view(-1, 1, 1)
    return tensor.sub_(mean).div_(std)
           ~~~~~~~~~~~ <--- HERE
RuntimeError: The size of tensor a (4) must match the size of tensor b (3) at non-singleton dimension 1
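
For readers puzzling over the wording of that error: it follows from the broadcasting rule, where shapes are aligned from the right and each pair of sizes must either match or contain a 1. Below is a minimal pure-Python sketch of that rule, not torchvision's code. The shapes are assumptions for illustration: one 740x662 image batched as (1, C, 662, 740), and a three-value mean reshaped by view(-1, 1, 1) to (3, 1, 1). The name broadcast_shape is hypothetical.

```python
def broadcast_shape(a, b):
    """Return the broadcast shape of two shapes, or raise ValueError.

    Shapes are aligned on the right; missing leading dimensions count as 1.
    """
    ndim = max(len(a), len(b))
    padded_a = (1,) * (ndim - len(a)) + tuple(a)
    padded_b = (1,) * (ndim - len(b)) + tuple(b)
    result = []
    for dim, (x, y) in enumerate(zip(padded_a, padded_b)):
        if x != y and x != 1 and y != 1:
            raise ValueError(
                f"size {x} must match size {y} at non-singleton dimension {dim}"
            )
        # A size of 1 stretches to the other size.
        result.append(y if x == 1 else x)
    return tuple(result)


if __name__ == "__main__":
    mean_shape = (3, 1, 1)  # assumed: three per-channel means after view(-1, 1, 1)

    # Three channels (RGB) line up with the three means.
    print(broadcast_shape((1, 3, 662, 740), mean_shape))

    # Four channels (e.g. RGBA) clash with the three means at dimension 1.
    try:
        broadcast_shape((1, 4, 662, 740), mean_shape)
    except ValueError as err:
        print(err)
```

Under these assumptions, the 4 in the error would be the channel count of the input and the 3 the number of normalization values, which points at an extra channel in the image rather than at the batch size or image size.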

With the configuration variables replaced by their values for batch_size, fix_img_size, return_img_data, and include_tensors, this is my current endpoint:

@app.post("/predict/")
async def predict(url: str):
    try: 
        response = analyzer.run(
            image_source=url,
            batch_size=8,
            fix_img_size=True,
            return_img_data=False,
            include_tensors=True,
            path_output=None,
        )

Here's an example output:

{"asctime": "2024-08-16 14:57:30,765", "levelname": "INFO", "message": "Running FaceAnalyzer", "taskName": "Task-12"}
{"asctime": "2024-08-16 14:57:30,766", "levelname": "INFO", "message": "Reading image", "taskName": "Task-12", "input": "https://image-cdn.essentiallysports.com/wp-content/uploads/20200606234527/the-rock-dwayne-johnson-muscles-740x662.png"}
{"asctime": "2024-08-16 14:57:33,463", "levelname": "INFO", "message": "Detecting faces", "taskName": "Task-12"}
INFO:     127.0.0.1:63490 - "POST /predict/?url=https%3A%2F%2Fimage-cdn.essentiallysports.com%2Fwp-content%2Fuploads%2F20200606234527%2Fthe-rock-dwayne-johnson-muscles-740x662.png HTTP/1.1" 500 Internal Server Error

I also tried passing a PIL Image.Image object directly. I first have to convert it to RGB, which makes sense; the same dimension-mismatch error appears if I do not. Even though the input type is accepted, the detector does not find any faces:

@app.post("/predict/")
async def predict(file: UploadFile = File(...)):
    try: 
        image = Image.open(BytesIO(await file.read())).convert('RGB')

        response = analyzer.run(
            image_source=image,
            batch_size=8,
            fix_img_size=True,
            return_img_data=False,
            include_tensors=True,
            path_output=None,
        )

With output:

{"asctime": "2024-08-16 15:21:51,393", "levelname": "INFO", "message": "Running FaceAnalyzer", "taskName": "Task-7"}
{"asctime": "2024-08-16 15:21:51,393", "levelname": "INFO", "message": "Reading image", "taskName": "Task-7", "input": "<PIL.Image.Image image mode=RGB size=978x605 at 0x2446A6248C0>"}
{"asctime": "2024-08-16 15:21:51,678", "levelname": "INFO", "message": "Detecting faces", "taskName": "Task-7"}
{"asctime": "2024-08-16 15:22:00,407", "levelname": "INFO", "message": "Number of faces: 0", "taskName": "Task-7"}
Response(faces=[], version='0.5.0')

Would appreciate any help! The reader works perfectly fine if an image path is given.

@tomas-gajarsky tomas-gajarsky added the bug Something isn't working label Nov 9, 2024
@tomas-gajarsky
Owner

Thank you for the detailed report! The runtime error seems to stem from mismatched image channels, likely due to an alpha channel (RGBA) in some inputs. While you correctly convert PIL images to RGB, this might not be happening for images loaded from URLs. I’ve updated the read_pil_image and read_numpy_array methods in version 0.5.1 to ensure all inputs are properly converted to RGB before processing, which should resolve this issue.
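
To confirm whether a particular PNG actually carries an alpha channel, one option is to read the color type from its IHDR header using only the standard library. This is a rough sketch that assumes the raw bytes of the file are already in hand; png_has_alpha_channel is a hypothetical helper and not part of facetorch. In the PNG format, color types 4 (grayscale with alpha) and 6 (truecolor with alpha) include an alpha channel. Palette images (color type 3) can carry transparency in a separate tRNS chunk, which this sketch does not inspect.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
ALPHA_COLOR_TYPES = {4, 6}  # grayscale+alpha, truecolor+alpha


def png_has_alpha_channel(data: bytes) -> bool:
    """Return True if the PNG header declares a color type with an alpha channel."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("data does not start with the PNG signature")
    # Layout after the 8-byte signature: 4-byte chunk length, 4-byte chunk type,
    # then the IHDR fields (width, height, bit depth, color type, ...).
    if len(data) < 26 or data[12:16] != b"IHDR":
        raise ValueError("IHDR chunk not found where expected")
    _width, _height, _bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return color_type in ALPHA_COLOR_TYPES


if __name__ == "__main__":
    # Hand-built header for an 8-bit RGBA image; no pixel data is needed
    # because only the header is read.
    ihdr = struct.pack(">IIBBBBB", 740, 662, 8, 6, 0, 0, 0)
    header = PNG_SIGNATURE + struct.pack(">I", len(ihdr)) + b"IHDR" + ihdr
    print(png_has_alpha_channel(header))
```

The same check can be run on bytes fetched with urllib.request.urlopen(url).read(); only the first 26 bytes are examined.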

Regarding the detector not finding faces: This could be due to the current configuration of the RetinaFace detector postprocessor. I suggest trying different settings to better match your use case. Specifically, you might want to adjust the following parameters in your configuration file:

  • confidence_threshold: Increasing it filters out low-confidence detections; if no faces are found at all, try lowering it so fewer candidates are discarded.
  • score_threshold: Experiment with lowering it to allow detections with lower confidence scores.
  • expand_box_ratio: Increase it slightly if faces near the image edges are being missed.
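
To make the effect of these three parameters concrete, here is a simplified pure-Python illustration. It is not facetorch's postprocessor: the Detection class, the two-stage filter, and the box expansion formula are assumptions chosen only to show the general idea that a detection has to pass both thresholds, and that a box is grown by a fraction of its size and clamped to the image.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    score: float
    box: tuple  # (x_min, y_min, x_max, y_max) in pixels


def filter_detections(detections, confidence_threshold, score_threshold):
    """Keep detections that pass both thresholds (hypothetical two-stage filter)."""
    # Stage 1: drop raw candidates at or below the confidence threshold.
    candidates = [d for d in detections if d.score > confidence_threshold]
    # A real detector would typically run non-maximum suppression here.
    # Stage 2: drop remaining detections below the score threshold.
    return [d for d in candidates if d.score >= score_threshold]


def expand_box(box, expand_box_ratio, img_width, img_height):
    """Grow a box by a fraction of its size on every side, clamped to the image."""
    x_min, y_min, x_max, y_max = box
    dx = (x_max - x_min) * expand_box_ratio
    dy = (y_max - y_min) * expand_box_ratio
    return (
        max(0, x_min - dx),
        max(0, y_min - dy),
        min(img_width, x_max + dx),
        min(img_height, y_max + dy),
    )


if __name__ == "__main__":
    detections = [
        Detection(0.95, (60, 40, 160, 90)),
        Detection(0.55, (10, 10, 50, 50)),
        Detection(0.10, (0, 0, 20, 20)),
    ]
    # Lowering score_threshold lets the 0.55 detection through.
    print(len(filter_detections(detections, 0.02, 0.6)))
    print(len(filter_detections(detections, 0.02, 0.5)))
    # A high confidence_threshold removes it again, whatever score_threshold is.
    print(len(filter_detections(detections, 0.7, 0.5)))
    print(expand_box((60, 40, 160, 90), 0.5, 200, 100))
```

In this simplified model the stricter of the two thresholds decides what survives, so lowering only one of them may change nothing.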

These adjustments can help fine-tune detection results. Please update to version 0.5.1 and let me know if these changes help or if you need further guidance!
