Performance problem with UMat #488

Closed
Daniel-Alievsky opened this issue Dec 6, 2017 · 22 comments

Comments

@Daniel-Alievsky

I expected UMat processing to be substantially faster on a computer with a GPU. I tested the GaussianBlur function on a 1000x1000 RGB image with a 91x91 kernel, and in C++ (Windows) the difference is impressive: a Mat is processed in 100 ms, a UMat in 6 ms. (On a colleague's computer, the UMat is processed in about 1 ms, while the Mat still takes 100 ms.)

The JavaCPP OpenCV bridge supports UMat, but I don't see any difference in speed (javacpp-presets, opencv, 3.3.1-1.3.4-SNAPSHOT). Why? Maybe the OpenCV DLLs inside the Windows JAR opencv-3.3.1-1.3.4-20171206.015716-63-windows-x86_64.jar are built too "conservatively", with GPU and multi-core CPU optimization disabled? If so, could you provide alternative builds (for example, with another "classifier" or another artifactId) where all possible optimizations are enabled?

Below is my test:

```java
import org.bytedeco.javacpp.opencv_core;
import org.bytedeco.javacpp.opencv_imgproc;

import static org.bytedeco.javacpp.opencv_core.ACCESS_READ;
import static org.bytedeco.javacpp.opencv_imgcodecs.imread;
import static org.bytedeco.javacpp.opencv_imgcodecs.imwrite;

public final class SimpleJavaCPPOpenCV {
    private static final int KERNEL_SIZE = 91;

    public static void main(String[] args) {
        if (args.length == 0) {
            System.out.printf("Usage: %s source_image%n", SimpleJavaCPPOpenCV.class);
            return;
        }
        final String sourceFile = args[0];
        final opencv_core.Mat mat = imread(sourceFile);
        final opencv_core.Mat result = mat.clone();
        final opencv_core.Size ksize = new opencv_core.Size(KERNEL_SIZE, KERNEL_SIZE);

        // Color image, Mat: one warm-up call, then the average of 10 timed calls
        opencv_imgproc.GaussianBlur(mat, result, ksize, 0.0); // warm-up
        long t1 = System.nanoTime();
        for (int k = 0; k < 10; k++) {
            opencv_imgproc.GaussianBlur(mat, result, ksize, 0.0);
        }
        long t2 = System.nanoTime();
        System.out.printf("Mat %s (at 0x%x) blurred by %dx%d in %.3f ms%n",
            mat, mat.address(), ksize.width(), ksize.height(), (t2 - t1) * 1e-6 / 10);

        // Color image, UMat
        opencv_core.UMat umat = mat.getUMat(ACCESS_READ);
        opencv_core.UMat uresult = umat.clone();
        opencv_imgproc.GaussianBlur(umat, uresult, ksize, 0.0); // warm-up
        t1 = System.nanoTime();
        for (int k = 0; k < 10; k++) {
            opencv_imgproc.GaussianBlur(umat, uresult, ksize, 0.0);
        }
        t2 = System.nanoTime();
        System.out.printf("UMat %s (at 0x%x) blurred by %dx%d in %.3f ms%n",
            umat, umat.address(), ksize.width(), ksize.height(), (t2 - t1) * 1e-6 / 10);
        imwrite(sourceFile + ".javacpp.ublur.png", uresult);

        // Grayscale image, Mat
        opencv_imgproc.cvtColor(mat, mat, opencv_imgproc.COLOR_BGR2GRAY);
        t1 = System.nanoTime();
        for (int k = 0; k < 10; k++) {
            opencv_imgproc.GaussianBlur(mat, result, ksize, 0.0);
        }
        t2 = System.nanoTime();
        System.out.printf("Mat %s (at 0x%x) blurred by %dx%d in %.3f ms%n",
            mat, mat.address(), ksize.width(), ksize.height(), (t2 - t1) * 1e-6 / 10);

        // Grayscale image, UMat
        umat = mat.getUMat(ACCESS_READ);
        uresult = umat.clone();
        t1 = System.nanoTime();
        for (int k = 0; k < 10; k++) {
            opencv_imgproc.GaussianBlur(umat, uresult, ksize, 0.0);
        }
        t2 = System.nanoTime();
        System.out.printf("UMat %s (at 0x%x) blurred by %dx%d in %.3f ms%n",
            umat, umat.address(), ksize.width(), ksize.height(), (t2 - t1) * 1e-6 / 10);
        imwrite(sourceFile + ".javacpp.ublur.gray.png", uresult);
    }
}
```

And its results on my computer for a 1000x1000 RGB test image:

Mat org.bytedeco.javacpp.opencv_core$Mat[width=1000,height=1000,depth=8,channels=3] (at 0xb7f310) blurred by 91x91 in 102.501 ms
UMat org.bytedeco.javacpp.opencv_core$UMat[address=0xb8d040,position=0,limit=1,capacity=1,deallocator=org.bytedeco.javacpp.Pointer$NativeDeallocator[ownerAddress=0xb8d040,deallocatorAddress=0x7ff8604663e0]] (at 0xb8d040) blurred by 91x91 in 99.260 ms
Mat org.bytedeco.javacpp.opencv_core$Mat[width=1000,height=1000,depth=8,channels=1] (at 0xb7f310) blurred by 91x91 in 32.711 ms
UMat org.bytedeco.javacpp.opencv_core$UMat[address=0xb8d760,position=0,limit=1,capacity=1,deallocator=org.bytedeco.javacpp.Pointer$NativeDeallocator[ownerAddress=0xb8d760,deallocatorAddress=0x7ff8604663e0]] (at 0xb8d760) blurred by 91x91 in 32.568 ms
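The test above repeats the same measurement pattern four times: one untimed warm-up call, then the mean of several timed calls. As a rough sketch, that pattern could be factored into a small helper. The class name `BlurTiming` and the method `averageMillis` are made up for illustration, and the workload in `main` is only a placeholder standing in for a `GaussianBlur` call, so the sketch uses nothing beyond the standard library.

```java
import java.util.Arrays;

public final class BlurTiming {
    /**
     * Runs the task once as a warm-up (not measured), then runs it
     * `iterations` times and returns the mean wall-clock time per call
     * in milliseconds.
     */
    public static double averageMillis(Runnable task, int iterations) {
        if (iterations <= 0) {
            throw new IllegalArgumentException("iterations must be positive");
        }
        task.run(); // warm-up call, excluded from the measurement
        long start = System.nanoTime();
        for (int k = 0; k < iterations; k++) {
            task.run();
        }
        long elapsedNanos = System.nanoTime() - start;
        return elapsedNanos * 1e-6 / iterations;
    }

    public static void main(String[] args) {
        // Placeholder workload (assumption): in the test above this would be
        // a lambda wrapping opencv_imgproc.GaussianBlur(...).
        double[] data = new double[1_000_000];
        double ms = averageMillis(() -> Arrays.fill(data, 1.0), 10);
        System.out.printf("placeholder workload: %.3f ms per call%n", ms);
    }
}
```

With such a helper, each of the four cases in the test would reduce to a single call, and every case would get the same warm-up treatment.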

@saudet
Member

saudet commented Dec 6, 2017

We would probably need to enable OpenCL to let it use the GPU. That is indeed possible with the new "extensions" feature (see #416, for example), but someone needs to take the time to make it happen.

@Daniel-Alievsky
Author

Sorry, I don't quite understand how to use this feature. Should I add anything to my pom file? Currently I have

        <dependency>
            <groupId>org.bytedeco.javacpp-presets</groupId>
            <artifactId>opencv</artifactId>
            <version>3.2.0-1.3</version>
        </dependency>
        <dependency>
            <groupId>org.bytedeco.javacpp-presets</groupId>
            <artifactId>opencv</artifactId>
            <classifier>linux-x86_64</classifier>
            <version>3.2.0-1.3</version>
        </dependency>
        <dependency>
            <groupId>org.bytedeco.javacpp-presets</groupId>
            <artifactId>opencv</artifactId>
            <classifier>windows-x86_64</classifier>
            <version>3.2.0-1.3</version>
        </dependency>

(or something similar for 3.3.1). What must I change?

@saudet
Member

saudet commented Dec 6, 2017

Like I said, someone needs to work on that before we can use it...

@Daniel-Alievsky
Author

OK, thank you. I'd appreciate it if you could let me know here when this issue is resolved.

@saudet
Member

saudet commented Dec 18, 2017

I've merged support for OpenCL with commit 681ca06 .

@Daniel-Alievsky
Author

Daniel-Alievsky commented Dec 18, 2017

I've re-checked the speed with the current snapshot, 3.3.1-1.3.4-SNAPSHOT.
However, the results are the same: UMat is exactly as fast as Mat.

@saudet
Member

saudet commented Dec 18, 2017 via email

@Daniel-Alievsky
Author

It is not urgent, but I'm interested in a stable solution that works from Java with the Maven JavaCPP modules. Can I hope that this will be resolved soon?

@saudet
Member

saudet commented Dec 18, 2017 via email

@saudet
Member

saudet commented Dec 21, 2017

Binaries are now available! You can test them with something like this in your pom.xml file:

        <dependency>
            <groupId>org.bytedeco.javacpp-presets</groupId>
            <artifactId>opencv-platform</artifactId>
            <version>3.3.1-1.3.4-SNAPSHOT</version>
        </dependency>
        <dependency>
            <groupId>org.bytedeco.javacpp-presets</groupId>
            <artifactId>opencv</artifactId>
            <version>3.3.1-1.3.4-SNAPSHOT</version>
            <classifier>linux-x86_64-gpu</classifier>
        </dependency>
        <dependency>
            <groupId>org.bytedeco.javacpp-presets</groupId>
            <artifactId>opencv</artifactId>
            <version>3.3.1-1.3.4-SNAPSHOT</version>
            <classifier>windows-x86_64-gpu</classifier>
        </dependency>

@Daniel-Alievsky
Author

I've switched to this version in the POM and re-ran my test from IntelliJ IDEA.
The results are the same:

Mat org.bytedeco.javacpp.opencv_core$Mat[width=1000,height=1000,depth=8,channels=3] (at 0xafde10) blurred by 91x91 in 108.459 ms
UMat org.bytedeco.javacpp.opencv_core$UMat[address=0xb0c560,position=0,limit=1,capacity=1,deallocator=org.bytedeco.javacpp.Pointer$NativeDeallocator[ownerAddress=0xb0c560,deallocatorAddress=0x7ffc3ede69b0]] (at 0xb0c560) blurred by 91x91 in 99.832 ms
Mat org.bytedeco.javacpp.opencv_core$Mat[width=1000,height=1000,depth=8,channels=1] (at 0xafde10) blurred by 91x91 in 36.198 ms
UMat org.bytedeco.javacpp.opencv_core$UMat[address=0xb0c740,position=0,limit=1,capacity=1,deallocator=org.bytedeco.javacpp.Pointer$NativeDeallocator[ownerAddress=0xb0c740,deallocatorAddress=0x7ffc3ede69b0]] (at 0xb0c740) blurred by 91x91 in 32.259 ms

What else should I do to enable GPU support?

@saudet
Member

saudet commented Dec 22, 2017 via email

@saudet
Member

saudet commented Dec 22, 2017

And we might also need to call setUseOpenCL(true), but I forgot to include ocl.hpp. Here it is: d41b05e
If haveOpenCL() returns false though, it probably means that your driver doesn't support OpenCL.

@Daniel-Alievsky
Author

I've reloaded the Maven library, but I cannot find Java methods called haveOpenCL() or setUseOpenCL().

Also, what are OpenCL drivers, and how must I specify them when using JavaCPP OpenCV? I have native C++ software that successfully uses OpenCV UMat with high performance. It works without any problems on my Windows 8.1 machine.
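When an IDE still resolves an older snapshot, one way to tell whether the loaded bindings expose these methods at all is to look them up by reflection. The sketch below is only a diagnostic idea, not part of JavaCPP: the class `OpenClProbe` and its `probe` method are hypothetical, and the class name `org.bytedeco.javacpp.opencv_core` is taken from the output printed earlier in this thread. It compiles without OpenCV on the classpath and reports what it finds.

```java
import java.lang.reflect.Method;

public final class OpenClProbe {
    // Class name as printed in the benchmark output above (1.3.x presets).
    static final String OPENCV_CORE = "org.bytedeco.javacpp.opencv_core";

    /**
     * Looks up a public static no-argument method by name and returns a
     * one-line report: its return value, or the reason it could not be called.
     */
    public static String probe(String className, String methodName) {
        try {
            Class<?> cls = Class.forName(className);
            Method method = cls.getMethod(methodName);
            return methodName + "() = " + method.invoke(null);
        } catch (ClassNotFoundException e) {
            return className + " is not on the classpath";
        } catch (NoSuchMethodException e) {
            return methodName + "() is missing from " + className;
        } catch (ReflectiveOperationException | LinkageError | RuntimeException e) {
            // Covers failures while loading native libraries or invoking the method.
            return methodName + "() failed: " + e;
        }
    }

    public static void main(String[] args) {
        System.out.println(probe(OPENCV_CORE, "haveOpenCL"));
        System.out.println(probe(OPENCV_CORE, "useOpenCL"));
    }
}
```

A "missing" report would suggest that the project is still compiled against bindings built before ocl.hpp was included, rather than a driver problem.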

@Daniel-Alievsky
Author

Daniel-Alievsky commented Dec 22, 2017

For comparison, we've created a very simple C++ test:

// gauss.cpp
//

#include "stdafx.h"

#include <iostream>
#include <chrono>
#include <cstdlib> // for system()

#include <opencv2/core.hpp>
#include <opencv2/imgcodecs.hpp>
#include <opencv2/imgproc/imgproc.hpp>

int main() {
	std::cout << "reading file 'example.png'" << std::endl;
	cv::Mat cvMat = cv::imread("example.png");
	cv::Mat cvMatResult;
	int kernel = 91;

	cv::GaussianBlur(cvMat, cvMatResult, cv::Size(kernel, kernel), 0); // warming

	std::cout << "Gauss blur with kernel " << kernel << " with cv::Mat" << std::endl;

	auto clock_t1 = std::chrono::steady_clock::now();
	cv::GaussianBlur(cvMat, cvMatResult, cv::Size(kernel, kernel), 0);
	auto clock_t2 = std::chrono::steady_clock::now(); 
	double clock_mcs = static_cast<double>(std::chrono::duration_cast<std::chrono::milliseconds>(clock_t2 - clock_t1).count()); 
	std::cout << "Mat: " << cvMat.cols << "x" << cvMat.rows << " Duration: " << clock_mcs << "(ms)" << std::endl; 

	cv::imwrite("cvMatResult.jpg", cvMatResult);

	cv::UMat cvUMat = cvMat.getUMat(cv::ACCESS_WRITE);
	cv::UMat cvUMatResult;
	cv::GaussianBlur(cvUMat, cvUMatResult, cv::Size(kernel, kernel), 0); // warming
	std::cout << "Gauss blur with kernel " << kernel << " with cv::UMat" << std::endl;

	clock_t1 = std::chrono::steady_clock::now();
	cv::GaussianBlur(cvUMat, cvUMatResult, cv::Size(kernel, kernel), 0);
	clock_t2 = std::chrono::steady_clock::now();
	clock_mcs = static_cast<double>(std::chrono::duration_cast<std::chrono::milliseconds>(clock_t2 - clock_t1).count());
	std::cout << "UMat: " << cvUMat.cols << "x" << cvUMat.rows << " Duration: " << clock_mcs << "(ms)" << std::endl;

	cv::imwrite("cvUMatResult.jpg", cvUMatResult);

	system("pause");

    return 0;
}

It works fine without any additional drivers; it requires only opencv_world331.dll.
The results for the same picture:

reading file 'example.png'
Gauss blur with kernel 91 with cv::Mat
Mat: 1000x1000 Duration: 98(ms)
Gauss blur with kernel 91 with cv::UMat
UMat: 1000x1000 Duration: 14(ms)
Press any key to continue . . .

@saudet
Member

saudet commented Dec 22, 2017

Usually there is no need to install anything special. It works fine here on Linux:

    public static void main(String[] args) {
        System.out.println(haveOpenCL());
        System.out.println(useOpenCL());
    }

outputs

true
true

@Daniel-Alievsky
Author

Where can I find haveOpenCL() and useOpenCL() methods? In which package/class?

@saudet
Member

saudet commented Dec 23, 2017 via email

@Daniel-Alievsky
Author

OK, after fully reloading the new Maven libraries (3.3.1-1.3.4-SNAPSHOT, windows-x86_64-gpu) I see these methods. Unfortunately, both methods return false, and calling opencv_core.setUseOpenCL(true) directly does not help. At the same time, the C++ test works fine.

What can I do now? It seems that IntelliJ IDEA fails to load something completely. But in the C:\Users\Daniel\.m2\repository\org\bytedeco\javacpp-presets\opencv\3.3.1-1.3.4-SNAPSHOT directory I see the correct JAR, opencv-3.3.1-1.3.4-SNAPSHOT-windows-x86_64.jar, with the full set of DLLs...
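One thing that can be checked from inside the running program is which OpenCV JARs the JVM actually received, since a JAR sitting in the local repository is not necessarily the one on the runtime classpath. The following sketch lists classpath entries whose file name contains "opencv" and reports whether any of them carries the "-gpu" classifier suffix used in the pom.xml snippet above. The class `ClasspathCheck` and its methods are hypothetical names, and the check assumes the IDE passes the full classpath through the `java.class.path` system property.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public final class ClasspathCheck {
    /** Returns the file names of classpath entries that contain "opencv". */
    public static List<String> opencvEntries(String classPath, String separator) {
        List<String> hits = new ArrayList<>();
        for (String entry : classPath.split(Pattern.quote(separator))) {
            String name = new File(entry).getName();
            if (name.contains("opencv")) {
                hits.add(name);
            }
        }
        return hits;
    }

    /** True if any of the given file names has a "-gpu" classifier suffix. */
    public static boolean hasGpuBuild(List<String> names) {
        for (String name : names) {
            if (name.contains("-gpu")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> names = opencvEntries(
            System.getProperty("java.class.path"), File.pathSeparator);
        names.forEach(System.out::println);
        System.out.println("GPU build on classpath: " + hasGpuBuild(names));
    }
}
```

If only the plain windows-x86_64 JAR shows up, the run configuration is probably still using the non-GPU artifact.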

@saudet
Member

saudet commented Dec 23, 2017 via email

@Daniel-Alievsky
Author

Now it works, thank you! :)

@saudet
Member

saudet commented Jan 17, 2018

Version 1.4 has been released with GPU-enabled binaries for OpenCV 3.4.0:
http://search.maven.org/#search%7Cga%7C1%7Cbytedeco%20opencv
Enjoy and thanks for testing this out for me!

@saudet saudet closed this as completed Jan 17, 2018