Optimized transform.scale2x()
#2859
How would switching from a macro to a static inline function have any impact on performance? They both compile into the caller.
I'm not sure, but it is probably a combination of factors, especially because the macros repeat ternary operations and other calculations, whereas the functions do not. In any case, I tested this, and removing the macros gives a 30-35% performance improvement.
Take the READINT24 macro, for example. It expands to this code:
((srcpix + ((((0) > (looph - 1)) ? (0) : (looph - 1)) * srcpitch) + (3 * loopw))[0] << 16 |
(srcpix + ((((0) > (looph - 1)) ? (0) : (looph - 1)) * srcpitch) + (3 * loopw))[1] << 8 |
(srcpix + ((((0) > (looph - 1)) ? (0) : (looph - 1)) * srcpitch) + (3 * loopw))[2])
So it does the same calculation three times. With a function, we pass in that pointer with the calculation done once and just index it, so we save 2 calculations. So since we have 5 calls, we get 20 calculations with macros and 5 without. The same applies to the WRITEINT24 macro.
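To make the recomputation argument concrete, here is a minimal Python sketch; it is not the pygame-ce C code. `row_address`, the two `read_int24_*` functions and the `address_calcs` counter are hypothetical names that stand in for the pointer arithmetic in the expansion quoted above.

```python
# Hypothetical counter: how many times the address expression is evaluated.
address_calcs = 0


def row_address(looph: int, loopw: int, srcpitch: int) -> int:
    """Byte offset of a 3-byte pixel in the row above looph (clamped at row 0)."""
    global address_calcs
    address_calcs += 1
    return max(0, looph - 1) * srcpitch + 3 * loopw


def read_int24_macro_style(pix: bytes, looph: int, loopw: int, srcpitch: int) -> int:
    # Mirrors textual macro expansion: the address expression appears three times.
    return (
        pix[row_address(looph, loopw, srcpitch)] << 16
        | pix[row_address(looph, loopw, srcpitch) + 1] << 8
        | pix[row_address(looph, loopw, srcpitch) + 2]
    )


def read_int24_function_style(pix: bytes, looph: int, loopw: int, srcpitch: int) -> int:
    # Compute the address once, then index it three times.
    base = row_address(looph, loopw, srcpitch)
    return pix[base] << 16 | pix[base + 1] << 8 | pix[base + 2]


pixels = bytes(range(36))  # 3 rows, pitch of 12 bytes, 4 pixels of 3 bytes each

address_calcs = 0
macro_value = read_int24_macro_style(pixels, 2, 1, 12)
macro_calcs = address_calcs

address_calcs = 0
function_value = read_int24_function_style(pixels, 2, 1, 12)
function_calcs = address_calcs
```

Both versions return the same value; only the number of address evaluations differs. Whether a C compiler already merges the repeated expressions depends on the compiler and its flags, so the measured timings remain the deciding evidence.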
What are these so-called “macros”? And by the way, the dash-board is really cool 😁
Spent some time investigating the restrict keyword as it was new to me, but I think we'll be alright with it. Technically, CPython 3.7 to 3.10 only promise to support 'some' features of C99, but in practice I don't think you'll find a real compiler that can build those CPython versions and doesn't also support restrict. Perhaps we should clarify somewhere in our docs which C versions can build pygame-ce. Interestingly, 3.11 onwards support C11.
Anyway, the code seems alright to me. I don't see any problem with replacing macros with functions; in fact, I generally prefer functions, all else being equal. In this case it seems to help performance rather than hinder it, so... great!
My testing results comparing different scaling functions doubling the size of a 64x64 pixel image 1 million times:
Current main branch 2x scale speeds:
-----------------------------
SCALE: 10.366427599918097
SCALE2X: 17.69909530004952
SCALE_BY: 10.473752800025977
SMOOTHSCALE: 18.78695980005432
SMOOTHSCALE_BY: 18.953008799930103
This PR branch 2x scale speeds:
-----------------------------
SCALE: 10.371418899972923
SCALE2X: 6.727054100017995
SCALE_BY: 10.522077800007537
SMOOTHSCALE: 18.166397199966013
SMOOTHSCALE_BY: 18.299656499992125
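As a quick reading aid, the ratios between the two runs can be computed directly from the numbers above. This is only arithmetic on the reported timings (in seconds); the dictionary names are made up for the sketch.

```python
# Timings in seconds, copied from the two result listings above.
main_branch = {
    "SCALE": 10.366427599918097,
    "SCALE2X": 17.69909530004952,
    "SCALE_BY": 10.473752800025977,
    "SMOOTHSCALE": 18.78695980005432,
    "SMOOTHSCALE_BY": 18.953008799930103,
}
pr_branch = {
    "SCALE": 10.371418899972923,
    "SCALE2X": 6.727054100017995,
    "SCALE_BY": 10.522077800007537,
    "SMOOTHSCALE": 18.166397199966013,
    "SMOOTHSCALE_BY": 18.299656499992125,
}

# A ratio above 1 means the PR branch is faster for that function.
speedups = {name: main_branch[name] / pr_branch[name] for name in main_branch}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x")
```

On these numbers scale2x comes out roughly 2.6 times faster on the PR branch, while the other functions stay within a few percent of their previous timings.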
Test program:
from timeit import timeit

import pygame
from pygame.transform import scale, scale_by, smoothscale, smoothscale_by, scale2x

pygame.init()

WIDTH, HEIGHT = 500, 500
win = pygame.display.set_mode((WIDTH, HEIGHT))

IMAGE = pygame.image.load("images/glow.png").convert()

GLOB = {
    "scale": scale,
    "IMAGE": IMAGE,
    "scale2x": scale2x,
    "scale_by": scale_by,
    "smoothscale": smoothscale,
    "smoothscale_by": smoothscale_by,
}


def test(test_str: str):
    print(timeit(test_str, globals=GLOB, number=100000))


print("SCALE: ", end="")
test("scale(IMAGE, (128, 128))")
print("SCALE2X: ", end="")
test("scale2x(IMAGE)")
print("SCALE_BY: ", end="")
test("scale_by(IMAGE, 2.0)")
print("SMOOTHSCALE: ", end="")
test("smoothscale(IMAGE, (128, 128))")
print("SMOOTHSCALE_BY: ", end="")
test("smoothscale_by(IMAGE, 2.0)")
So this seems to restore the promise of scale2x being faster than scale. I think previous optimisations to scale had left scale2x slower as a doubler than the baseline scale.
Approved!
It is interesting to see that pre-this-pr the performance of |
@Gabryel-lima macros are a concept in C and C++. https://www.geeksforgeeks.org/macros-and-its-types-in-c-cpp/
@itzpr3d4t0r I don't think we can use restrict here, because technically there could be other instances of things accessing the surface pixels at the same time, right? Like with surfarray? I'd be interested to see if there's actually a significant difference between using restrict and not using restrict.
Uint8 *src_row = srcpix + looph * srcpitch;
Uint8 *dst_row0 = dstpix + looph * 2 * dstpitch;
Uint8 *dst_row1 = dstpix + (looph * 2 + 1) * dstpitch;

Uint8 *src_row_prev = srcpix + MAX(0, looph - 1) * srcpitch;
Uint8 *src_row_next =
    srcpix + MIN(height - 1, looph + 1) * srcpitch;
So in your PR description you said you had optimized using pointers and using functions instead of macros, but reading this code it seems like the main optimization is actually some manual Loop-invariant code motion.
Very clever! This reminds me of your Rect union PR, also a great way of looking at existing code and making it faster in a straightforward way.
It’s a shame the compiler doesn’t do that automatically in these cases.
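For readers who want to see the hoisting in isolation, here is a minimal pure-Python sketch of scale2x over a list-of-rows image, following the rules published at scale2x.it. It is a reference sketch, not the pygame-ce C implementation: the function name and the list-based image type are assumptions made for illustration, and the edge clamping mirrors the `MAX`/`MIN` expressions in the snippet under review.

```python
from typing import List

Image = List[List[int]]


def scale2x_reference(src: Image) -> Image:
    """Scale2x on a list-of-rows image; edges clamp to the nearest pixel."""
    height = len(src)
    width = len(src[0]) if height else 0
    dst = [[0] * (width * 2) for _ in range(height * 2)]

    for y in range(height):
        # Loop-invariant with respect to x: resolve every row once per y,
        # instead of recomputing the clamped row lookup for each pixel.
        row = src[y]
        row_prev = src[max(0, y - 1)]
        row_next = src[min(height - 1, y + 1)]
        out0 = dst[2 * y]
        out1 = dst[2 * y + 1]

        for x in range(width):
            e = row[x]                      # centre pixel
            b = row_prev[x]                 # above
            h = row_next[x]                 # below
            d = row[max(0, x - 1)]          # left
            f = row[min(width - 1, x + 1)]  # right

            if b != h and d != f:
                out0[2 * x] = d if d == b else e
                out0[2 * x + 1] = f if b == f else e
                out1[2 * x] = d if d == h else e
                out1[2 * x + 1] = f if h == f else e
            else:
                out0[2 * x] = out0[2 * x + 1] = e
                out1[2 * x] = out1[2 * x + 1] = e
    return dst


corner = scale2x_reference([[0, 1], [1, 1]])
```

The five row lookups at the top of the outer loop play the role of `src_row`, `src_row_prev`, `src_row_next`, `dst_row0` and `dst_row1` in the C snippet: they depend only on the row index, so computing them inside the inner loop would repeat identical work for every pixel.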
How exactly does this happen? Shouldn't locking/unlocking by surfarray and this function prevent multiple threads from accessing the memory at the same time? Is there another way that this memory could be accessed simultaneously, outside of a multithreaded context, that I'm blanking on? My computer science educational background is sometimes spotty.
I have to admit |
@Starbuck5 Wow, very interesting! Thanks!
It should, I'm just being cautious... Maybe if you constructed a Surface with shared memory from another object you could edit it while running something else? I don't have a solid scenario in mind.
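As background for the restrict discussion: in C, the keyword is a promise to the compiler that the memory behind a pointer is not accessed through any other pointer for the lifetime of that pointer. The following plain-Python sketch only illustrates what such aliasing looks like with two views over one buffer; it does not use pygame, and the variable names are made up for the example.

```python
# One backing buffer, two views that overlap it.
backing = bytearray([10, 20, 30, 40])
view_a = memoryview(backing)
view_b = memoryview(backing)[1:]  # same memory, shifted by one byte

before = view_a[1]  # read through the first view
view_b[0] = 99      # write through the second view...
after = view_a[1]   # ...and the first view observes the change
```

If two pointers in the C code could refer to overlapping memory in this way, the restrict promise would be broken, which is the scenario being guarded against here.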
Gave this an exhaustive battery of tests of different surface depths, formats, and opacities to verify that this gives the same results before and after, and it does. Nice work!
Test script
import pygame
import random
import hashlib

random.seed(36)


def make_surfaces(size):
    return [
        pygame.Surface(size, depth=8),
        pygame.Surface(size, depth=16),
        pygame.Surface(size, depth=24),
        # SDL_PIXELFORMAT_XRGB8888
        pygame.Surface(
            size, depth=32, masks=(0x00FF0000, 0x0000FF00, 0x000000FF, 0x00000000)
        ),
        # SDL_PIXELFORMAT_RGBX8888
        pygame.Surface(
            size, depth=32, masks=(0xFF000000, 0x00FF0000, 0x0000FF00, 0x00000000)
        ),
        # SDL_PIXELFORMAT_XBGR8888
        pygame.Surface(
            size, depth=32, masks=(0x000000FF, 0x0000FF00, 0x00FF0000, 0x00000000)
        ),
        # SDL_PIXELFORMAT_BGRX8888
        pygame.Surface(
            size, depth=32, masks=(0x0000FF00, 0x00FF0000, 0xFF000000, 0x00000000)
        ),
        # SDL_PIXELFORMAT_ARGB8888
        pygame.Surface(
            size,
            pygame.SRCALPHA,
            depth=32,
            masks=(0x00FF0000, 0x0000FF00, 0x000000FF, 0xFF000000),
        ),
        # SDL_PIXELFORMAT_RGBA8888
        pygame.Surface(
            size,
            pygame.SRCALPHA,
            depth=32,
            masks=(0xFF000000, 0x00FF0000, 0x0000FF00, 0x000000FF),
        ),
        # SDL_PIXELFORMAT_ABGR8888
        pygame.Surface(
            size,
            pygame.SRCALPHA,
            depth=32,
            masks=(0x000000FF, 0x0000FF00, 0x00FF0000, 0xFF000000),
        ),
        # SDL_PIXELFORMAT_BGRA8888
        pygame.Surface(
            size,
            pygame.SRCALPHA,
            depth=32,
            masks=(0x0000FF00, 0x00FF0000, 0xFF000000, 0x000000FF),
        ),
    ]


def populate_surf(surf):
    for y in range(surf.get_height()):
        for x in range(surf.get_width()):
            surf.set_at(
                (x, y),
                (
                    random.randint(0, 255),
                    random.randint(0, 255),
                    random.randint(0, 255),
                    random.randint(0, 255),
                ),
            )


hashes = []
expected_hashes = [
"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
"07dd4345d1c64db0b8c339a7d5ce501747d80e87ad1feeb08ad5a633a4c56a7a",
"7a7274e2eb5ab40b777402950e21c536d8e8dd9b53e0de4a27eda350d0f5ea1a",
"5a292773e067c5ac87d7a7f88ca73aa6cdc812b4a7a9ef41e85b623b6479f9da",
"5fd5ab7eb26484c778adb0c63d5afd90d53297923f2963aff27db3a679ca45ba",
"e11f4c01f77ee6d769c8bb03906a3f2564dcc265e6aa04912b7eb6839774b14d",
"49980c6c196eac008b0dd2df243d3b7c5cd3a08df8657096d4a66731d33c5e43",
"c041a7aaab37eee55c6e33039ae7e2f9ef9f72c26c8f2c1fd201766060bac5cb",
"023ec57431e52f755dd5ee6b1511d3d16a3f3a721b58670080e5b7b3e872b4e2",
"b11cd8665dac1418d3df4db44a77576fef08d9b52837b61353d0874150b56777",
"98d74bd3bd28e4024ee935a9989baedf8e4536e519792be8829414a9d010955a",
"13623a27d013c805c66585d462a7e327c360bf5b4e64810c4be6f59f27ed4644",
"d08cfd3963bdc6dc9506d45fe9cab27f12883648c5aa961f0125ab3802a85013",
"b9708a1a0826b8770298fcc5d877a86453f8e6e1efffea6968e0439333e52ec6",
"f6ea89a84915fe270258f5966b551c089737030492d949b13bc3ee587262ae18",
"b620f47215f03bf52e9935185ed99f3ecc284484f96176adb6f5a31988d7fc7a",
"be86f36e9a7ec4a4720e4a351b24c6fd842941907e5971be90def8648ca61348",
"8d83096742b76dd6aace1b528d3a77c0f487a843f101e54fd33d8f3c74e42b6b",
"14eb95be7d420e08127804968f8ef2209b196633d8fe9f99b6ca653aa78d6e37",
"96c70430b8d6a3c747ad489fd27affbe32de6ed25741fe244bc542c5aa80a9d5",
"67cb035d06d738fc0066b4ebea7a8b016294e9c55dcca561d0a8bb530f81491e",
"2b00bcac9b1403787526cf08d852b3ef18e49dd4224c9d6d94e0a7cfeed0def2",
"d326c56afc5c4f77dacffe0fc0a71b58279025cad631b71e5c7e7114120ab583",
"7e0eb44be702d2e7b44ffbb2ec6ab785fbcdfd00ec1fd48368a275bc2748b83a",
"02f0dbf676ec24d2c52087e26a2c6fbefac752a5a0be9ad15f494b2bc5bd0947",
"34f1132407853091b7644dd683df1646c0c766e14310d268801c9db7524dbfd6",
"20d5941a6a3dd75389813b04db1af5fc89a9d1ea70cae102abf5519985149a76",
"43752418959a7301a29685720cd2072f04e56af0e30412b8cb69997ca1d53e4c",
"666a3524bdc8a94f64a5ef6ee8a8c9439011a7f4994e88a9944117f9c5d80abf",
"a6033c810fbc519187db0b3f841e96996b81a0a52efa05f636fb5d1cb582c380",
"a8f643a1cd09bfbe05713c6f8bc0af218441ebe89d44d48bfdd3426d869342b4",
"165191ec5630361a8b3b639dc86569cfec4feb6ccf52c505cf2d303d50bb6346",
"67874fd3cf12be0ded721c2d1e217ad209fff4279b21e212b3a85bc9f0b15085",
"a2db659ed1a43809ff2672d6adc38586893aab7da60aeb34ee8e21ba3d884488",
"2ef0b00133371211f0feabd05bd35448bae67016e491657141553a25ff9583c7",
"a6bb69e866c0fbb8113aefc0103364d5f2dbcd2766409eed955ec4caf9e88ed5",
"f105538c2b09a857807b831183d9023ec463d347c7318c7938814f659c21ad4a",
"6784989dc604a107134d0a2264b58b6b209c4ccab00aca2bb69324fc4142187a",
"90603b55c0bb439db2a1b140077ba52a99c9a087bdee365ae44b26800d85712e",
"5ba8bf393bf55bcaccb2572e8e22aa44d6086fbe787df2615c089d2638bf1e6d",
"3e78d69a1bcb3c6269712a8bf23eb884e206279d86330693595a3347940f0b3a",
"a49311f6e3f995ff3f887407c9acce21cd3c54c947a18384cda699f71192aed9",
"9336fecee2804871be09b33f20008405c0006d2a9ed53e392f68c2c79b24b87b",
"230f1a1e4dba953276833251ef25bb57ef2e336b0b013d2f5c0b4b354a83fff9",
"3a34093ccfcf0fabfee8692bb11430672d477cdf0a1c22d79fe103bbe0c6cb04",
"b3add4070aa303393373a97977418c040c0ccdc901c6437c2e57cd6a761c4be0",
"e3fb3394076e2aadc3b807c70b7f1db7b49d81b4c167be81cbbc3d2d994054ce",
"78d5c92e8164d91360ed857fe35b0c7d0014f67a27964e1520f91db8d8322b04",
"5b20885a17bc23e92e6487d6689d1c4d2f1daab974e6f3f415fc6f965027fe63",
"3387400b23627651660fdc50d88550c14bfff0fd79cd5e0433a5792bb033458d",
"370b64d0b3a01c6179c3ca6962c660af8d2113feeb01e33185663882e0b75bb2",
"85694d3d77670f12c9b9de91b7d79cefe633997c9cac9472ca64c54f55e7ab21",
"c6c2b536861ac8b0f00ad39f3ba6bb3ac801e0643c1c6c42af8fc86a6c057de2",
"5c9f48abf47e5adc36510ddb4f8fe76b1c0bb2516cff134dfd34899425dcde8a",
"ef1b2984e6e86470095bac34ebb9ab3da5bdccea17e33a3328af37d33664370b",
"9543f2ea21c1ab3bdac5d568e6baf530127384bf1e0487f3376f05621df6a3f4",
"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
"b5fcab43e5732c36c493a607b0063eee59bc7d27a5500238d5b4d69e0249a35f",
"5c84db7d237a2c14eaa7d32ac36c393b7b4f41a3d06fa0eeca96e447d4425575",
"5448e7b8bdb7c9b24672df1a97f2e687786247db667ac006133be6d619b3fcdc",
"8d9c05da498cef64f7a26c99d3f1d7142903b9baa8665dd96a3a4ee512a612fc",
"471629c911ba1fcc3d67551a4d9730d492ba6c5ed483122d4ab07b226d73035b",
"06778b49fedd4abbbc9b983f39b339bb31a6c9cecd4872273aa7647b6b307ff1",
"a3d66608e8c57325d85cb74951a7344d8ec710d10d40458d4d821123d4f93194",
"3eda1c1d443a20b78d857ef3c3a9c2c9b92e811d3e5bfe50488571bff0f0a6de",
"369d9a42964896c883357f731e4c45d0eaeb391062d2f672e5742f9bc22ffec1",
"9bb494ee6286478869b02f36628e3ad5ad6e1084a5d5e1120fb67b7997fd8d95",
"69d0d5122fedbe82064076400086b9b1e4e515097ba4035e6a7adeea9ddaf41f",
"bdd218413ebc1a23f25ba91686d4e9f3aeacfb0bceb8069a151b57c7b51002c2",
"b41a727c0d11085617593c0a0c8fbca763b0cdb5990c7d20cee4f75c4d8d6994",
"f7cf4233252b585c24648935baf9a3b41b1ab18c64ac00cb1d3608d6f50771d3",
"aea1a225a824cd161619480603464c1c52f4f7fd853ade676843445607534c8b",
"108a3d90ec596b448765e1479f144fc6966e33fcd64c6331746636e05841ce24",
"70e85ac264a78ebe53f663fd4d626b22d9d9d5313a99cffb9d7858d2d72a82c8",
"bdf936a7f30e73234bd7492648a8b0741c7c71c246e9022539696cc62a08b7f0",
"5059837102570c517bbdf9c7d564c6ec0db3ba69afa9e61ffecd22e4adabca96",
"d2a0e15ad8cd44acfb3c3322fd05ba4f4c8a9484b6e3b713a39cf2e710a73bf7",
"39a7a2110c4a7a5946f6bc87c21af33a58504ca509901ff01eca551a682a4eac",
"f289a0adff77ec383947a4b3d105ee99eddb5d8d6c65a440d77270846b70d989",
"98a501ae221fb75fa53be651eda9cd198ccf6c62a90d25db1976b59cfda8ff50",
"aca289c40bbd7a2c54ffbc408c755a81570f2e7d530737166978feb18ecf4404",
"bcffb698ab7e60a0d4d49a9c4de4b61fda9126061e34fbe4461d35f47677e043",
"96bc8c33ac108d228096cf776fc50a6eb07931e83494274ed28f6bec4447ac75",
"6544fe38bcf43329f9a16593880c7bcc7921e9a8fab6f33a81e690466c15503a",
"da6a79f82b70703c262f5b91886502b7abdaa6d846c1ddce960a04a4d6734bff",
"89084dac9b8ff825672d217341d03984f7a2a57458b53c251604606ec0f4105b",
"0493a64bd06d28f3cb8cd01da4e75983ea440f26034f799ccfe847fd5c4aa3bb",
"6f33c0cf04e310224d79837a971dd0b7e72275616cd4a905045b3a047f43abb7",
"ab9864a661792b5cc502bb58658f3388626b672bae8d8cb97878af50bf04089f",
"052110483e2eb2e23504d75aa0c1988667666cfdc25b8d4daff73a7f39daf052",
"eaf09c2868d22d3664fd2cc74092cec08df14ffd85977cbf2a38ba1ff679b7df",
"1c37e06b8acd0222892ccd6c9aa5f0cfae6d4effbdcd43533d545a83ec295c33",
"7482f35748119bf0460e2a11497c259f6f49b73f5688152351740716d1ad4cac",
"423b715a2b31bda2790029c816f3ac8ee9e8a765f79124e1a9c298999eaa18cd",
"4eeaeb7c7da928a7f2b05c5cfb066642781a3e9bbd426bfbd0f84f7513134b70",
"d37a62725f8f6f8aa083fcad73dc7a6dfe0c15d735b857258545a4ab50fc07f3",
"a23efe543d1a6fcb5b76589dd13c1f29512327e8d6237fdb8ecffa6046594f11",
"f592aa372bc3e3e902b90fc19331c7657831911d82667fdd7a841b6b3838686b",
"bcbcfa67fa27b662bb6533922fdd8c6d7be9d91ba86d1722f332647cb23c96a9",
"578271ccc74c00e144f1f0839493c208b270f2786bee5c721f4126fdf40a6593",
"c7fa39a3887419effc5af8f7629a15fd3384152b4b796095ca149bed708bce46",
"3876b1ff204c87b7ded03550afdc6c3bc990eeaff3a6f0268c3f6438830e381d",
"fc01a6d64bf57fe1bfbbe34c26bca052bc3e73722f0853f633a72d82494ea731",
"a18efed92cacf613cc79b0632980ec691095cdf95e098c2e6694b6d687bc5d59",
"375bd25f342a3b559d1f31c755e005b6b7d2766c666f57ceeaf417b187cceff5",
"ef525c951c47768c7dad6ce72a8ad62e718ac18fa156c29f0410ff567f7634b1",
"8c18d28327c466afe4e79d9d4363e83765105259ac73c581e525a2a0b87e8cc8",
"4aeaeebc2c8888a416c73bd379fce848a13197165b6d4e582f6f5b868f9ba5d2",
"a48981cfce1be50c15b6f917291dbffb69dec384dd9a6449940e5355fbff9fee",
"a120add0bd9f78a6d73e14ed5fb45122c7ad109c01b560d92d1182168a092dea",
"fc4d04eacb266998a51ee21d71b5fb8ce98a10eb431d0706d1527c89b6b9a799",
"71bd9208904f988977a1b96ad158d265da2a64dd84ea6b733c8897ce55fabe6b",
]
import time

start = time.time()
for size in [
    (0, 0),
    (1, 1),
    (2, 2),
    (3, 3),
    (4, 4),
    (59, 10),
    (1, 0),
    (0, 1),
    (15, 29),
    (64, 64),
    (10, 2),
    (129, 12),
    (60, 79),
]:
    for surface in make_surfaces(size):
        populate_surf(surface)
        scaled = pygame.transform.scale2x(surface)
        sha256 = hashlib.sha256()
        sha256.update(pygame.image.tobytes(scaled, "RGBA"))
        digest = sha256.hexdigest()
        assert digest == expected_hashes.pop(0)
        hashes.append(digest)
print(time.time() - start)
print(hashes)
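One detail of the expected hashes that may look odd at first: the value starting with `e3b0c442` is the SHA-256 digest of empty input. It appears for the zero-area sizes, presumably because a scaled surface with no pixels converts to an empty byte string (an inference from the script's assertions passing, not something stated in the thread). A small standard-library check, with constant names made up for the sketch:

```python
import hashlib

# Digest that repeats in expected_hashes for the zero-area sizes.
EMPTY_SHA256 = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"

digest_of_nothing = hashlib.sha256(b"").hexdigest()

# Sizes from the test loop that have zero area, and the number of
# surfaces make_surfaces() builds for each size.
zero_area_sizes = [(0, 0), (1, 0), (0, 1)]
surfaces_per_size = 11
expected_empty_entries = len(zero_area_sizes) * surfaces_per_size

print(digest_of_nothing == EMPTY_SHA256, expected_empty_entries)
```

Those entries therefore verify only that scaling an empty surface does not crash and yields no pixel data; the remaining entries carry the actual pixel comparisons.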
* optimized all Bpp cases of scale2x
* finish 3Bpp case
* removed restrict keyword, added Bpp var
* format
This PR optimizes all bpp cases for the transform.scale2x() function. It does so through better use of pointers, a switch from macros that repeat some calculations to static inline functions, and one optimization suggested in https://www.scale2x.it/algorithm. Performance improvements vary on a case-by-case basis depending on surface size and bpp, so I'll just leave my results here (top one is 32bpp):