Optimized BlackWhiteRaster/Added Needed RasterablePart Function #102
Conversation
Looks reasonable to me. However, since this is a rather big change, did you run some performance comparisons to make sure this does not slow down existing drivers significantly?
You can also just pull in the changes for the RasterPart; those are just a couple of functions that rely on the current functions and the state of the boolean, and they can be clearly identified as safe pretty quickly. Gist of the test: source and timing of all dithering algorithms applied to a bunch of real images. tl;dr: dithering will take only 39% as much time on average using the new version of the code as it does using the old version. I think that's the most time-intensive process the raster is asked to do. The timing includes the time to perform the dithering itself (so really the class on its own is going to be even faster). If you raster an image where the dither time is noticeable, you should feel the difference. Going 2.5x faster strongly implies it shouldn't slow down the existing drivers. It should show pretty strong increases in speed doing the core operations of RasterPart too. Namely, you're currently getting scanline data one byte from each array while iterating the scanline, and that's going to turn the memory access into a crapshow. There are also some faster shorthand functions in the new raster that will outperform RasterPart's code because it cheats and knows how the data is structured. -- Though all the actual raster code seems to use getRasterLine(), which will have the memory cache-miss problem. But, admittedly, I don't have ample coverage for either testing or profiling the change there, and I should certainly do both before it's accepted.
Really though, I just wrote a driver that did rastering and I wish the RasterizableJobPart was usable, but it was not. In fact it's only used by LaserCutter.java to convert RasterableParts into VectorParts. And I couldn't even just call that function either; I'd have to make a LaserCutter instance and overload getRasterPadding(). I would rejigger those classes: I'd cut RasterizableJobPart and just make it a utility class.

Merely fixing the indexing on the BlackWhiteRaster should actually speed things up. But if you really wanted it to go fast, you'd be better off doing a number of reasonable optimizations. Everybody calls RasterizablePart.getRasterLine(), but regardless of the whole "this saves memory allocation" thing, it categorically does not. The 800-pound gorilla in that function is ByteArrayList moving the bytes at all. Really you should return a list that is merely a window into the data, which remains where it is, so that getRasterLine() just gets you a reasonable iterator you can use the way you need to; one that directly calls the functions within the BlackWhiteRaster class rather than copying everything off to a sort of strange ArrayList wannabe class. You shouldn't make a copy of the data; that's a huge operation. You should return a List object that simply delegates the get() method to the getByte() method within the raster.

But after a while you're optimizing with diminishing returns. Still, if the speed of the RasterPart builders is a thing that concerns you, it wouldn't be too hard to fix them. Or expand BlackWhiteRaster to encode greys (after all, the difference is implementing bit_depth). It changes the math, but it's all still pretty easy. (I've done it before: https://github.com/K40Nano/K40Nano/blob/master/k40nano/PngRaster.py). Really the important thing would be a utility class that actually helps. It would be nice to fix some of these design issues. That class just kinda hurt me.
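The windowed-List idea can be sketched in a few lines. This is an illustration only, using a stand-in class rather than the real BlackWhiteRaster; the name `SimpleRaster` and the flat byte layout are assumptions:

```java
import java.util.AbstractList;
import java.util.List;

// Stand-in for BlackWhiteRaster: bytes packed in one flat array,
// one scanline after another.
class SimpleRaster {
    final byte[] data;
    final int scanlineBytes;

    SimpleRaster(byte[] data, int scanlineBytes) {
        this.data = data;
        this.scanlineBytes = scanlineBytes;
    }

    byte getByte(int x, int y) {
        return data[y * scanlineBytes + x];
    }

    // O(1): no bytes are copied. The returned List is just a window
    // whose get() delegates straight back to getByte() on the raster.
    List<Byte> getRasterLine(final int y) {
        return new AbstractList<Byte>() {
            @Override public Byte get(int x) { return getByte(x, y); }
            @Override public int size() { return scanlineBytes; }
        };
    }
}
```

The window stays valid as long as the raster does, and reversing or re-reading a line costs nothing because nothing was ever copied.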
But more pressingly, the other issue made it so my code will run farther to the left than it needs to, even though it knows the laser doesn't need to be there, because there wasn't anything in the helper bits that helped. Though if you wanna see what I mean by implementing the list itself there, I made an example where I implemented getLine() in BlackWhiteRaster (untested): https://gist.github.com/tatarize/e91c077f8759d9a9b5165b782ec6a6fd You get the List of bytes, but you have access to the BlackWhiteRaster.PixelWindow class and you can tell it to give you the next line, or the reversed next line, or whatever, and it's an instant constant-time operation, as the only thing it does is delegate.
Though really, where your problems actually start is BufferedImageAdapter. If you know what type the BufferedImage is, rather than making as many calls to getRGB() as you have pixels, just call the bulk function and get all the pixels at once. As it stands, you are tanking your speed there. What you actually want to do is get rid of most of that stuff and just use BufferedImages. Rather than getGreyscale() for that stuff, you can just make a greyscaled BufferedImage and use that. Also, you use like one method to convert to greyscale; there are actually about six solid, different algorithms you could use. When you dither, make a dithered black-and-white BufferedImage. And when you iterate the pixels, write a helper class that gets the underlying BufferedImage memory and builds an interface to the data that helps you iterate through the pixels in a highly relevant way. Like how the very lovely VectorCommand API works. That is beautiful. It would be nice if you'd crush up the rasterization into very tiny pieces like that rather than giving all drivers a chance to reinvent the wheel. Rasterizations can need a couple of different properties: DPI, buffer_padding, clip_white_pixels from the end, bidirectional/unidirectional, can_skip_blank_lines. Then you feed my driver some commands like:
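The bulk-read point can be shown against plain java.awt. Nothing here is LibLaserCut's actual adapter; the class name is illustrative, but both methods produce identical pixel arrays:

```java
import java.awt.image.BufferedImage;

class BulkRead {
    // Slow: one bounds-checked getRGB() call per pixel.
    static int[] perPixel(BufferedImage img) {
        int w = img.getWidth(), h = img.getHeight();
        int[] out = new int[w * h];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                out[y * w + x] = img.getRGB(x, y);
        return out;
    }

    // Fast: one bulk call fetches every pixel at once
    // (scansize = w gives the same row-major layout as above).
    static int[] bulk(BufferedImage img) {
        int w = img.getWidth(), h = img.getHeight();
        return img.getRGB(0, 0, w, h, null, 0, w);
    }
}
```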
I'm expected to get lines of bytes and do this stuff myself. As a result my driver is missing some cool features other ones have, like the ability to clip off those extra white bytes. In reality, with the data moved around and refigured, this is just a fancy reading operation on the BufferedImage, and it can be done without allocating any memory. You only need to allocate memory for the greyscale conversion and the dithering conversion (and really you could chain those together). You shouldn't ever need to copy it again. But you seem to do no fewer than two extra copies of all the data just to format it in a way that the driver can't actually use very easily. The VectorCommand methodology there is beautiful. The API for the raster should be very similar. And it could be done in what amounts to O(1).
While your code looks good, it changes the output of the Epilog driver, which should not be the case. Saving to file shows different output in the engrave data. Probably you mixed up some index, so that the pixels are not stored correctly. Regarding the discussion on the overall interface structure for raster engraving: Looking at the drivers we already have, they seem to fall into two categories with very few exceptions:
Your K40 seems to be the rare exception with a very large set of low-level instructions that doesn't really fit any of the above, so I wouldn't recommend optimizing the interface too much for your use-case.
And the timings with getRasterLine() are done. Original: Optimized: The access time for getRasterLine() is pretty unaffected.
My testing coverage for getRasterLine() seems pretty low; odds are good I messed up something there. In fact it's almost certain that I messed up the endianness. Yup. Okay, I'll fix that.
getByte() should read
If you wanna test it before I finish the push.
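The corrected snippet itself didn't survive in this transcript. Purely as an illustration of the endianness fix being discussed, here is what an MSB-first 1-bit getByte/setBlack pair could look like; every name and the packing convention are assumptions, not the actual patch:

```java
// Hypothetical 1-bit raster with MSB-first packing:
// bit 7 of byte 0 is pixel x = 0.
class OneBit {
    final byte[] data;
    final int scanlineBytes;

    OneBit(int width, int height) {
        scanlineBytes = (width + 7) / 8;
        data = new byte[scanlineBytes * height];
    }

    // byteX is a byte index along the scanline, not a pixel index.
    byte getByte(int byteX, int y) {
        return data[y * scanlineBytes + byteX];
    }

    void setBlack(int x, int y, boolean black) {
        int idx = y * scanlineBytes + x / 8;
        int mask = 0x80 >> (x % 8);          // MSB-first: pixel 0 -> bit 7
        if (black) data[idx] |= mask; else data[idx] &= ~mask;
    }
}
```

Getting this shift direction wrong (LSB-first vs MSB-first) is exactly the kind of endianness mix-up that would change a driver's saved output while still looking plausible in isolation.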
The board itself has more in common with some of the other ones that use USB. If you adapted the library to controlling Ruida devices via USB, some of the core elements would be a bit more similar. It's more that the library doesn't do such things currently; it seems alien by exclusion. Ruida does a lot of low-level stuff too. It's not really as cryptic, but a USB Ruida driver would include a number of similar things. In fact, if you wanted to control that device properly with LibLaserCut, my driver would make for a nicer starting-off point. You supported the easier stuff, and that stuff is generally similar. But a lot of the other stuff will be connecting to this thing, sending this weird ass alien format to this place, and jiggling the bits like this. It's pretty clear LibLaserCut kinda isn't set up to expect a driver like the NanoK40 one. It would be entirely reasonable to be a lot more liberal with what you expect. If getting a bunch of bytes for the raster is a thing, any good implementation of RasterPart should let you do that. But for my uses it would have been nice to have the option to have it pre-chewed and spoonfed to my driver. Optimizing the backend stuff where it makes too many copies and accesses BufferedImages in wrong ways, that's all just window dressing. Doing the right thing in 200 ms when you could have done it in 2 ms hardly matters if the whole job will take 2,000,000 milliseconds (33 minutes).
@tatarize thank you for all your work. You seem to be really into this stuff. It's a shame that I am only reading through pull requests in my spare time instead of having the time to hack on it, because my day job is far away from VisiCut.
@t-oster scars of a misspent youth. I spent more Friday nights ensuring my functions didn't allocate any unnecessary memory than I care to admit. Still, to this day one of my hobbyhorses is railing against the flawed belief that you need to allocate memory to perform a convolution on image data. --- "You fools! That's only a requirement if the result pixel is in the center, and that's arbitrary; if you place it in the upper-left-hand corner of the kernel you can perform the algorithm as a scanline operation within its own memory footprint!"
With regard to the conversations about the problems with these classes, also see #104. It's likely too drastic, but it shows how I would optimally correct the problems: namely, nuke lots of it, move the stuff that's left to VisiCut, then just have RasterPart and Raster3dPart take BufferedImages. That way that stuff can actually be sped up, rather than moving a couple of loop structures around so the code fits the memory layout a bit better.
I tested again and now everything is okay; "save to file" leads to exactly the same results.
Mg's suggestion in the other pull request pegs this as actually low-hanging fruit. I'd actually prefer a different change that will similarly fix the timing: namely, back this class with a BufferedImage. So just delegate everything properly to a BufferedImage that is created to hold the data. That way, if you want, you can just get BlackWhiteRaster.image and draw it to a screen, or copy memory from it, etc. It seems much easier to optimize against.
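A minimal sketch of that delegation idea, assuming a TYPE_BYTE_BINARY image as the backing store; the class and method names are hypothetical, not the real BlackWhiteRaster API:

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

// Sketch: a 1-bit raster whose storage is simply a BufferedImage,
// so the image can be drawn to a screen or copied from directly.
class ImageBackedRaster {
    final BufferedImage image;

    ImageBackedRaster(int width, int height) {
        image = new BufferedImage(width, height, BufferedImage.TYPE_BYTE_BINARY);
        // TYPE_BYTE_BINARY zero-initializes to black; start from a white page.
        Graphics2D g = image.createGraphics();
        g.setColor(Color.WHITE);
        g.fillRect(0, 0, width, height);
        g.dispose();
    }

    boolean isBlack(int x, int y) {
        // Ignore the alpha byte; any zero RGB renders as black.
        return (image.getRGB(x, y) & 0xFFFFFF) == 0;
    }

    void setBlack(int x, int y, boolean black) {
        image.setRGB(x, y, black ? 0xFF000000 : 0xFFFFFFFF);
    }
}
```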
I just remembered why I avoided BufferedImage in the first place: it is (or was?) not available on Android! So before the changes, a native Android app could use liblasercut. https://stackoverflow.com/questions/5392695/bufferedimage-in-android
I write Android too, and have written Java/Android libraries before. My actual reason for wanting this change is generally to get rid of the classes altogether, and interoperability with Android would somewhat support this idea. The Bitmap class has other ways to pack the data (see https://developer.android.com/reference/android/graphics/Bitmap.Config.html); the only real overlap with BufferedImage is that ARGB_8888 is the same as TYPE_INT_ARGB. And you're doing a lot of image manipulation, and I sort of think the correct amount to do is none. One of the more pressing differences within Android, though, is that memory is at a premium; you sort of can't store more than one bitmap in memory.

For a minimum amount of work, I think maybe it should be rewritten so that RasterPart and Raster3dPart simply accept an array of bytes and a scanline value, with the expectation that any manipulation has already been done. They could do the highly efficient data chunking themselves, and the image manipulation could be done better and faster and spun off, likely by just taking a bunch of ARGB ints and turning them into proper bytes. I'll head that direction in my never-to-be-approved break_everything pull request.

-- Also, the Usb4Java utility doesn't work great on Android; the security stuff is really tight and it's basically impossible to get it running right, since it natively packs up some libusb elements in jars. Android, however, does have USB (https://developer.android.com/guide/topics/connectivity/usb/host) that would certainly work, and I did strongly muse over the idea of controlling the stock K40 laser cutter with a USB-OTG cable. It might be kind of neat to do. But for interoperability, the K40Usb class within the K40Nano driver would need a second form for working with Android, and the two could not exist in the same world. Much as my EmbroideryIO project spins off a different version of EmbMatrix for the relevant system, as AffineTransform and Matrix don't play nice together.
However, with that stuff in mind, don't approve either version. I'll do a quick rewrite to use byte[], scanline, width, and bitdepth. It seems like the bitpacking doesn't make much difference; the biggest point is not messing up the scanline order. Also, there's a fun method by which even the K40 can do a 3D raster, basically even without power control and certainly without wanting to constantly change the speed: I can do passes. So the first pass burns all the pixels between 0-64 of value, then 0-128 of value, then 0-192 of value. I might as well nudge that possibility in there, even if there's no ultimate desire to add it.
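The multi-pass scheme described above could be sketched like this. A hypothetical helper, not code from this PR; the thresholds follow the 0-64 / 0-128 / 0-192 example, so darker pixels get burned on more passes and pure white is never burned:

```java
class MultiPass {
    // Decide whether a pixel burns on a given pass. With 3 passes on
    // 8-bit grey, pass 0 burns grey < 64, pass 1 burns grey < 128,
    // pass 2 burns grey < 192. Lower (darker) values burn on more
    // passes, so they receive more total energy even without power control.
    static boolean burnOnPass(int grey, int pass, int totalPasses) {
        int threshold = 256 / (totalPasses + 1) * (pass + 1);
        return grey < threshold;
    }
}
```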
Okay, I basically implemented a proper raster in raw Java. Still stored correctly, but backed by bytes for raw data chunking. Also, it can store any bit depth or sample count (that comes in at less than 32). So you could store 17-bit greyscale pixels if you wanted, or 10-bit, 3-sample RGB color; it basically just fixes the locations. You could even tack on some interleaving byte, throw it through zlib, and save this thing as a PNG (I've implemented PNG before). The important thing is you can have 1-sample 1-bit greys and 1-sample 8-bit greys in the same structure. The fake grey-raster stuff at the end there isn't just for show anymore; it actually works correctly. If you tell it you have a bitDepth of 8, nothing changes except all the greys have that bit depth. When you read the raster line you get what you'd expect without having to change the structure anywhere: you get that line of 8-pixel bytes, and you get that line of 1-pixel 8-bit grey bytes. It's just handing you the bytes. Also, you can have more or less bitDepth (even depths that cross byte boundaries). Pure Java, and it will port to anything. You could store RGBA or int8888 in it without issue (4 samples, bitDepth of 8). It's like how I packed it in ints and the previous version packed it in bytes, except that it's solved for the general case.
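A sketch of the general-case bit packing being described, with samples of arbitrary bit depth packed MSB-first across byte boundaries. The conventions and names are assumptions, not the submitted code:

```java
// General-case packed raster: any bitDepth (< 32), any sample count,
// packed MSB-first so values can cross byte boundaries.
class PackedRaster {
    final byte[] data;
    final int bitDepth, samples, scanlineBytes;

    PackedRaster(int width, int height, int bitDepth, int samples) {
        this.bitDepth = bitDepth;
        this.samples = samples;
        int bitsPerLine = width * bitDepth * samples;
        this.scanlineBytes = (bitsPerLine + 7) / 8;  // round up to whole bytes
        this.data = new byte[scanlineBytes * height];
    }

    // Absolute bit position of the first bit of (x, y, sample).
    long bitPos(int x, int y, int sample) {
        return (long) y * scanlineBytes * 8
             + (long) x * bitDepth * samples
             + (long) sample * bitDepth;
    }

    int getSample(int x, int y, int sample) {
        long pos = bitPos(x, y, sample);
        int value = 0;
        for (int i = 0; i < bitDepth; i++, pos++) {
            int shift = 7 - (int) (pos & 7);                 // MSB-first in each byte
            int bit = (data[(int) (pos >>> 3)] >> shift) & 1;
            value = (value << 1) | bit;
        }
        return value;
    }

    void setSample(int x, int y, int sample, int value) {
        long pos = bitPos(x, y, sample) + bitDepth - 1;      // write LSB last-bit-first
        for (int i = 0; i < bitDepth; i++, pos--) {
            int idx = (int) (pos >>> 3);
            int mask = 1 << (7 - (int) (pos & 7));
            if ((value & 1) != 0) data[idx] |= mask; else data[idx] &= ~mask;
            value >>= 1;
        }
    }
}
```

With bitDepth=1, samples=1 this degenerates to the familiar packed black-and-white raster; with bitDepth=8, samples=4 it holds RGBA, all through the same two functions.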
Also, since it's a real raster, you could actually just use the same routines for RasterPart and Raster3dPart; the important thing is the number of bits per pixel and using those bits to control the properties. You don't actually need RasterPart to give you the depth; the raster itself would have a depth. And I should likely take several of those useful functions in RasterPart and move them to BlackWhiteRaster in a coherent fashion. In 8-bit mode, getRightmostBlackPixel() not only performs as if it were in 1-bit mode, it doesn't even make sense. It really should be the rightmost non-white pixel.
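In 8-bit mode such a search would compare against white rather than testing a single bit. A hedged sketch, where 0xFF is taken to mean white and the name is hypothetical:

```java
class Rightmost {
    // Rightmost pixel on an 8-bit grey scanline that is not pure white
    // (0xFF assumed to be white). Returns -1 if the whole line is white,
    // which tells the driver it can skip the line entirely.
    static int rightmostNonWhite(byte[] greyLine) {
        for (int x = greyLine.length - 1; x >= 0; x--) {
            if ((greyLine[x] & 0xFF) != 0xFF) return x;
        }
        return -1;
    }
}
```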
There's some incongruity cropping up with black = 1 in 1-bit and black = 0 in 8-bit. I'll have to leave it as it is. It would work for a general greyscale raster too, which is great, but doing that also requires having it know what black is in a consistent manner. There's no piecemeal way to fix that, especially with drivers that take the bytes and send them off to places.
Graphics should always be iterated looping y and then x, the reason being that the memory is laid out sequentially, so it is always preferred to read across the scanlines, as that is what all modern processors are built around. And most of the graphics formats specifically use scanlines. The elements of the array should try to avoid any unneeded overhead, such as arrays of arrays. They should be packed effectively to allow shorthand operations like checking 32 bits at once for whether or not they are blank.

I also added firstBlackPixelOnThisOrNextLine and lastBlackPixelOnThisOrNextLine. The RasterablePart needed a helper for this; it was a sticking point of my driver's raster optimization. I was only allowed one transition from right-to-left each scan line (required by the hardware), so I cannot transition before the last black pixel on either the current line or the next line. But I also couldn't know whether I was going left or right (it's private) and thus couldn't perform the correct operation of either min or max, as the correct extreme depends on the direction.
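The y-then-x point can be demonstrated with two loops that compute the same result with very different memory access patterns (illustrative only, over a row-major int array):

```java
class ScanOrder {
    // Row-major arrays store one scanline after another, so iterating
    // y in the outer loop and x in the inner loop walks memory
    // sequentially: each access is adjacent to the last.
    static long sumRowMajor(int[] pixels, int width, int height) {
        long sum = 0;
        for (int y = 0; y < height; y++)
            for (int x = 0; x < width; x++)
                sum += pixels[y * width + x];   // sequential, cache friendly
        return sum;
    }

    // Same result, but x-outer / y-inner jumps `width` elements per step
    // and thrashes the cache on large images.
    static long sumColumnMajor(int[] pixels, int width, int height) {
        long sum = 0;
        for (int x = 0; x < width; x++)
            for (int y = 0; y < height; y++)
                sum += pixels[y * width + x];   // strided, cache hostile
        return sum;
    }
}
```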
Also expanded the test for BlackWhiteRaster to cover the added shorthand functions there.
No attempt was made to actually use those functions in RasterablePart, which has many of the same functions, but they will be slower there due to the abstraction.