Question about Debian packaging #254
Comments
@cdluminate Thank you for your interest in BLIS. We have been eagerly waiting for the right person from the Debian community to come along and "sponsor" our project in the Debian/Ubuntu universe. :)
My first instinct is to answer with an emphatic "yes." However, the framing of your question is ultimately subjective, and also dependent on the actual implementation referred to by The lead developers of the BLIS project exercise great care in their approach to software development. One of our guiding principles is to try to "get it right" the first time so we are much less likely to have to revisit/fix it in the future, and we believe this methodical approach pays dividends in the long run. We still make mistakes from time to time, but I think the public record here on GitHub shows that we are quite responsive to the community's feedback, especially with bug reports. We sometimes fix issues within hours of them being reported, and we have multiple tools for checking correctness at our disposal, including a comprehensive BLIS testsuite, a C translation of the netlib BLAS test drivers, and integration with Travis CI that uses the former two mechanisms to test multiple hardware configurations via Intel's software development emulator (SDE). So, to conclude, the answer to your question is "probably," but it depends on what you expect of
Yes. We have performed many performance experiments that compare BLIS against OpenBLAS. Last week, at our annual BLIS Retreat--a workshop here at UT-Austin centered around BLIS-related topics--Devangi Parikh (@dnparikh) presented performance results for multiple level-3 BLAS operations, floating-point datatypes, and problem sizes, and she did so for Intel Haswell/Broadwell, Intel SkylakeX, and Cavium ThunderX2 (ARMv8). The overall story of the performance results is that BLIS is remarkably competitive and consistent in its performance. Devangi: could you link our guest to PDFs of your graphs from the Retreat? Also calling out to @nschloe so he can comment about BLIS in general, if he likes. |
I found BLIS as I was looking for BLAS operations on C-ordered arrays for NumPy. BLIS has that, but even better is the fact that it's developed in the open using a more modern language than Fortran.
Plots about that should definitely go into the main README. |
@nschloe Thanks for your comments, Nico. I agree that the time has come for us to include some basic plots in the source distribution. |
Comments:
Robert |
@fgvanzee This is the background for my first question: Debian/Ubuntu have an alternatives system, by which the user can switch the BLAS implementation smoothly without recompiling any software, e.g.
All the alternative candidates for the If BLIS's CBLAS implementation is mature enough, it could be used as a drop-in replacement of
That's good to hear.
I have sufficient permission to upload packages to Debian. However, when Debian is about to release a new version, e.g.
It won't be hard for me to update a Debian package as long as there is no significant ABI/API change and no significant change in the build system.
+1 And thanks @rvdg for the plot. |
One more question about packaging: As shown in #255, BLIS has the best performance with openmp threading, as long as
Is this correct? And does it make sense to provide another |
Seems right, yes. You may want to do a thorough review of all configure options. For example, the BLAS integer size option is important to some people.
Could you clarify this question? |
@fgvanzee I spotted a significant performance drop when a program used iomp and gomp at the same time. Besides, Intel MKL provides a sequential (single-threaded) module. |
@cdluminate Sounds like you are trying to link your @devinamatthews @jeffhammond You guys have more experience with using EDIT: BTW everyone, I'm taking the day off today. :) Thanks for your patience, @cdluminate . Hopefully others can step in and help us figure out your issues. |
@cdluminate I would guess that when using both iomp5 and gomp at the same time that you are ending up with N^2 threads. What happens when you run with OMP_NUM_THREADS=N and BLIS_NUM_THREADS=1 (assuming there is meaningful multithreading in the calling program)? But, even if it "works" right now, mixing two different OpenMP runtimes is a recipe for disaster. In the context of a Debian package, I would think that gomp would make a sensible default. I guess that pthreads would be even better but as noted above we don't have a thread pool implementation and so performance suffers. |
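To make the arithmetic behind the N^2 estimate concrete, here is a minimal sketch. It assumes that every thread of the calling program invokes a BLAS routine that starts its own team, and that the two runtimes do not share a thread pool. The function name `worst_case_threads` is hypothetical; only the `OMP_NUM_THREADS` and `BLIS_NUM_THREADS` variable names come from the comment above. The fallback from `BLIS_NUM_THREADS` to `OMP_NUM_THREADS` is also an assumption of the sketch.

```python
import os


def worst_case_threads(env=None):
    """Upper bound on live threads when each caller thread starts its own BLAS team."""
    env = os.environ if env is None else env
    # Threads created by the calling program's OpenMP runtime.
    outer = int(env.get("OMP_NUM_THREADS", "1"))
    # Assumption for this sketch: when BLIS_NUM_THREADS is unset,
    # the library falls back to the same count as the caller.
    inner = int(env.get("BLIS_NUM_THREADS", outer))
    return outer * inner


# Two independent runtimes, both told to use N = 8 threads.
oversubscribed = worst_case_threads({"OMP_NUM_THREADS": "8", "BLIS_NUM_THREADS": "8"})
# The experiment suggested above: threading only in the caller.
caller_only = worst_case_threads({"OMP_NUM_THREADS": "8", "BLIS_NUM_THREADS": "1"})
print(oversubscribed, caller_only)
```

Under these assumptions the first configuration can reach 64 threads on an 8-way setup, while the second stays at 8, which is why the `BLIS_NUM_THREADS=1` experiment is a useful diagnostic.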
Intel OpenMP runtime defines the GOMP API so if you link it into a GCC program, you should not end up linking against |
@fgvanzee @devinamatthews @jeffhammond Thanks for the pointers. I'm sorry, though: that observation came from some old C++ code of mine that doesn't use BLIS, where using different threading libraries at the same time resulted in the creation of too many threads. GCC uses gomp by default but clang doesn't ... Anyway, I think the Debian package should ship the OpenMP version first. |
Just a NOTE: I registered libblis as an alternative candidate to libblas.so.3 and libblas.so, and assigned blis priority 37. I would be happy to raise the priority of BLIS if there is strong proof that BLIS is better than OpenBLAS in terms of cblas_* performance. The default priority chain for Debian and Ubuntu will look like this:
Please ignore the lowest priority |
@cdluminate That's great news. We appreciate your willingness to include BLIS in the Debian universe. I'm sure many of our users will be pleased to be able to enable BLIS via the distribution's internal package management tools.
It all depends on your hardware, operation, datatype, problem size, and how much threading you need, and maybe other factors as well. That said, Robert has pointed you to evidence that we gathered very recently on ThunderX2 and SkylakeX--two architectures designed for high performance. (Devangi didn't include OpenBLAS on the Haswell graphs because we couldn't get it working in multithreaded mode, despite intense efforts by deeply experienced individuals.) For now, OpenBLAS outperforms BLIS for small problems (less than about 100), but that is one of the only use cases for which OpenBLAS outpaces BLIS. The evidence suggests that, for almost all larger problems, almost all operations of almost all datatypes seem to yield better performance when executed via BLIS than with OpenBLAS. Also consider that there are other metrics by which to measure "goodness" or "betterness" than raw performance. We believe strongly that BLIS stacks up well against OpenBLAS on virtually all of these other measures of software quality. Most important among these metrics: BLIS provides BLAS (and CBLAS) APIs, but unlike most other BLAS libraries, BLIS provides much more than just BLAS APIs. (The BLAS APIs are quite limiting for many individuals and applications, and BLIS contains APIs that attempt to break free of those limitations and expand the space of parameterization and storage formats.) For a full list of features that BLIS provides, I invite you to read our main GitHub page, whose content is stored in the |
@cdluminate Clang uses the LLVM OpenMP runtime, which is the Intel OpenMP runtime and thus contains the GOMP symbols so it too should not lead to O(n^2) threads when layered with GCC. Pthreads+OpenMP can definitely hit this, however, and it is the expected behavior. |
For interested parties, the GOMP symbols in KMP (Intel/LLVM OpenMP runtime) can be observed here: https://github.com/llvm-mirror/openmp/blob/master/runtime/src/kmp_gsupport.cpp |
Another question: What's the correct configuration parameter to compile the 64-bit-index version of BLIS (or in MKL's term, ILP64 interface)? Is it |
It depends on what you want. If you only care about the integer size in the BLAS API, then you only need |
One could argue that |
Could you explain the precise circumstances under which |
@fgvanzee Do you support the case where a 64-bit BLAS integer exceeds (e.g. dot product on a vector of 17 GB of floats) |
@jeffhammond No. I don't do anything fancy--just regular typecasts. I assume the developer/user knows what he's doing when he assigns, implicitly or explicitly, the integer size of both BLAS and BLIS integers. |
@fgvanzee I encourage you to take a random sample of computational scientists you encounter in ICES or online to see if that assumption is valid. There is a use case for dangerous truncation, but it is a dubious one. NWChem always uses 64-bit integers because these are used as offsets into distributed arrays (i.e. Global Arrays) and it is easy to have 1D arrays that contain more than A good compromise here is to disallow 64b BLAS + 32b BLIS by default but have an override (e.g. |
Thanks Jeff. I've opened issue #274 to track this. |
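The truncation concern in this exchange can be illustrated with a short sketch. It is not BLIS code: the function `narrow_to_int32` and its `allow_truncation` flag are hypothetical names, and the wraparound arithmetic imitates what a plain typecast typically does on two's-complement machines. The element count assumes 4-byte floats and takes 1 GB as 10**9 bytes.

```python
INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1


def narrow_to_int32(n, allow_truncation=False):
    """Convert a 64-bit BLAS-style integer to 32 bits, refusing silent truncation by default."""
    if INT32_MIN <= n <= INT32_MAX:
        return n
    if not allow_truncation:
        raise OverflowError(f"{n} does not fit in a 32-bit integer")
    # Explicit override: keep the low 32 bits and reinterpret them as signed,
    # which is what a plain typecast usually does on two's-complement hardware.
    return (n + 2**31) % 2**32 - 2**31


# Length of a vector holding 17 GB of 4-byte floats.
n_elements = 17 * 10**9 // 4
print(n_elements > INT32_MAX)                             # the length needs more than 32 bits
print(narrow_to_int32(n_elements, allow_truncation=True)) # the value a silent cast would pass on
```

Raising an error by default while keeping an explicit opt-in mirrors the compromise proposed above: the dangerous narrowing stays possible, but only for a caller who asks for it.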
update: blis 0.5.0 was built for six architectures on Ubuntu disco. The packaging has been significantly changed. In short, the source package yields the following binary packages:
BLIS has been compiled in 6 different configurations as above. The 64-bit variants are similar to those with 32-bit indices. Its soname has been modified to If this looks good to you, I'll upload the package to Debian unstable shortly after your ack. |
@cdluminate This all sounds great. Thanks so much for your contributions. If I were able to put together a bugfix release (0.5.1) in short order (24 hours?), would it be better for me to do that before you move forward so we can get all the latest commits into the Debian package? (Hopefully slipping in a new version prior to upload won't be too disruptive to you.) |
@fgvanzee Just take your time. I'm fine with uploading 0.5.1 again after 24 hours. My only concern is that, if 0.5.0 has a severe regression or something similar, please tell me to stop the upload. |
@cdluminate Thanks. I don't think there are any really bad bugs in 0.5.0--mostly they are more benign improvements--but I'd have to check during my commit review (which is when I write the ReleaseNotes entry) to have a better sense of what bugs were fixed. |
https://ftp-master.debian.org/new/blis_0.5.0-1.html Uploaded and pending review by the FTP team. |
@cdluminate Sounds good, thanks! I'm working towards 0.5.1 today, tomorrow at the latest. |
FYI: BLIS was added to the Gentoo main repository: gentoo/gentoo@5c3ae58 |
@cdluminate Very cool. Thanks for letting us know! |
I see Nico is working on BLIS packaging. I'm interested in packaging BLIS for Debian based on Nico's work. However I have some questions before doing the actual work: