Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PC-BASIC 2 graphics do not update between PSET calls #166

Closed
JDoucette opened this issue Nov 14, 2021 · 14 comments
Closed

PC-BASIC 2 graphics do not update between PSET calls #166

JDoucette opened this issue Nov 14, 2021 · 14 comments
Labels

Comments

@JDoucette
Copy link

Bug report

See: https://www.youtube.com/watch?v=9J9DtrcoDXc

Problem

Steps

  1. run below program
  2. expect graphics updates to happen in real-time
  3. they do not
  4. add A$=INKEY$ inside the loop, and it works

Program

10 SCREEN 1
20 FOR Y=0 TO 199:FOR X=0 TO 319
30 PSET(X,Y),INT(RND*4)
40 NEXT X:NEXT Y

Notes

PC-BASIC version: 2.0.4
Operating system version: Windows 10

@JDoucette
Copy link
Author

This also occurs in text mode.

5 SCREEN 0,0,0:WIDTH 80:CLS
10 I!=I!+.1:J#=J#+.1#:LOCATE 1,1:PRINT I!;J#;(I!-J#):GOTO 10

Simple program that adds 0.1 repeatedly and draws the results.
The screen should be updating blazingly fast.
But it stalls, and updates randomly.

p.s. Unrelated to the bug: This reminiscent of a very old program I made as a kid where I was amazed that adding 0.1 over and over again would not result in accurate results, and furthermore, you could watch the velocity of the accumulative error, and the acceleration changes as it would jump into positive flow, then negative flow.

robhagemans added a commit that referenced this issue Jun 30, 2022
…nderstood workaround of double sleep(0) call
@robhagemans
Copy link
Owner

Thanks for reporting, fixed with issue 8d1ae6f. The fix is to call time.sleep(0) twice. I have no idea why this works or more precisely, why it is necessary.

@theruler
Copy link

The issue is still present when using more than one core due to, I think, python's GIL.
If you set the affinity to one or two cpu the bug is gone and the screen updates fast without the A$=INKEY$ workaround.
But if you leave default (ALL) CPU affinity, the update is "choppy" and have to use A$.

i76700 - 1 CPU, 4 CORES, 8 THREADS
Win 11 - pcbasic 2.0.6

@robhagemans
Copy link
Owner

I'm sorry @theruler can you please explain what you're doing? What is "affinity"? I do not see this problem anymore on my machine (which is also multi-core) so would need more information to be able to reproduce.

@theruler
Copy link

theruler commented Aug 24, 2022

Yes, sure.
Tried on three modern multicore PC with windows (2x win10 and 1x win11):
If I paste the code on PCBASIC 2.0.6 without the A$=INKEY$ workaround:

5 SCREEN 0,0,0:WIDTH 80:CLS
10 I!=I!+.1:J#=J#+.1#:LOCATE 1,1:PRINT I!;J#;(I!-J#):GOTO 10

I get the same results of jdoucette, the output is slow and stalls updating randomly.

To solve the issue OS side, you have to open Task Manager, "details" tab, find pcbasicw.exe process, click the right mouse button on it and select "process affinity".
There you can allocate HW resurces (logic CPUs) for the process. If you set 1 CPU only, the problem is gone. On some PC you can set 2 logic CPU and still solve the issue and have a minor speed increase. 3 or more CPUs makes pcbasic "lag", I think due to Python's Global Interpreter Lock (GIL) feature.

Hope this was of help to you.

Regards
Stefano

@theruler
Copy link

theruler commented Aug 25, 2022

I did some timing tests with and without A$=INKEY$ and using single or multi cpu affinity.
i5-1135G7@2.4GHz
Windows 10
PC-BASIC 2.0.6

10 CLS:T=TIMER
20 FOR X= 1 TO 10000
30 I!=I!+.1:J#=J#+.1#:LOCATE 1,1:PRINT I!;J#;(I!-J#)
40 NEXT
50 ? TIMER-T

multi: 18 sec (choppy output)
single: 35 sec (smooth output)

10 CLS:T=TIMER
20 FOR X= 1 TO 10000
30 I!=I!+.1:J#=J#+.1#:LOCATE 1,1:PRINT I!;J#;(I!-J#)
35 A$=INKEY$
40 NEXT
50 ? TIMER-T

multi: 87 sec (smooth output)
single: 122 sec (smooth output)

@robhagemans
Copy link
Owner

Thanks @theruler for the metrics.

So this is actually a slightly different issue as the original issue occurred on single-core as well, and led to a complete block of screen updates. That has been resolved, however what we're seeing here is choppy updates on multi-core machines, at least under Windows.

You are probably right that this is related to the GIL. It looks similar to this: https://www.rfk.id.au/blog/entry/a-gil-adventure-threading2/ . The analysis in that case is that when the threads execute on different cores, they race to acquire the GIL and this leads to slower performance on multiple cores than on single core. In-depth explanation in David Beazley's slides here http://www.dabeaz.com/python/UnderstandingGIL.pdf

However:

  • In contrast to these links, it seems that in our case the multi-core variant runs faster, but does not update the screen as much. Which seems to imply that the engine thread simply crowds out the display thread.
  • The links all discuss behaviour of Python 2, and indicate that this "GIL battle" should be improved in Python 3. I'm assuming you're running this on Python 3 - if it's the packaged Windows version it is Python 3.

The root cause of the issue I think is that (given the GIL in CPython) the display thread simply has too much CPU work to do to keep up with the engine thread, which unfortunately takes rather big changes to address and is something I'd defer until we have better test coverage.

It may be possible to work around the issue by explicitly setting the affinity from the code - however for some reason os.sched_setaffinity is only available on Unix so this likely requires linking this thing in kernel32.dll: https://docs.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-setprocessaffinitymask

@robhagemans robhagemans reopened this Aug 25, 2022
@robhagemans
Copy link
Owner

Further update

  • this is not Windows specific but also occurs on Linux (not tested on Mac but I assume it will occur as well)
  • the issue only occurs in Python 3, not in Python 2.

For reference, on Linux, these are "smooth":

python2 ./run-pcbasic.py update.bas 
taskset -c 0 python3 ./run-pcbasic.py update.bas

while this is "choppy":

python3 ./run-pcbasic.py update.bas

Here, update.bas is the two-line program above at #166 (comment)

@theruler
Copy link

theruler commented Aug 25, 2022

I'm assuming you're running this on Python 3 - if it's the packaged Windows version it is Python 3.

correct.

I am not concerned about speed per se, what I think pc-basic should achieve is the same speed as an old 386, on all system independently.
adding A$=INKEY$, surprisingly, make high computational program behave similarly on different machines.

@theruler
Copy link

theruler commented Aug 26, 2022

Further tests OS side (windows 10).
It's interesting how different behave python 3 and python 2.
I've used the same program here Single CPU affinity, with A$=INKEY$ (first half of the chart) and without (second half).
the darkest area is Windows KERNEL TIME.

windist - PC-basic 2.0.6 - Python 3:
kernel time p3
------- 122 sec ---- 35 sec

pure python PC-basic 2.0.6 - Python 3:
(python3 ./run-pcbasic.py update.bas)
purepython3
------- 167 sec ------ 12 sec
Here is an interesting behaviour:
the first part shows all kernel time and the execution is very slow, the worst performance actually.
In the second part, the one without the A$=INKEY$ workaround, as you noticed the first 7 seconds kernel time is not present, then it kicks in for the rest of the run. The program in the first part is VERY slow, then it speeds up to full speed, exactly when kernel time kicks in.

PC-basic 1.2.15 - Python 2:
kernel time p2
--------- 55 sec ------- 12 sec

quote from microsoft technet:

The kernel is that part of the operating system that acts on behalf of all the applications and services running on the system. The kernel is the only part that of the operating system that has access to the physical resources of the computer. Access to the kernel is handled through very specific, highly privileged, API calls.

The most common culprit of run-away kernel time is a poorly written device driver. Another could be a failing hardware device that is causing a flood of hardware interrupts - hardware interrupts are handled only by the kernel.

In a system that is not really running any applications, the kernel continues to perform its functions, which includes all sorts of 'house-keeping' tasks to ensure the system is running correctly. Therefore, if no user processes are running, kernel time will be a higher percentage of overall usage. But as the number of user processes kick in, you will see the percentage of kernel versus user time go down.

Lastly, if you have to ask what kernel time is, you are most likely not the person to 'debug' it. There are very few people who work in kernel development and troubleshooting.

robhagemans added a commit that referenced this issue Aug 28, 2022
@theruler
Copy link

theruler commented Aug 29, 2022

You've fixed it!
with your changes single and multi cpu speed are almost identical and the CPU load is back to normal.
multi 24.39 sec
single
single 23.75 sec

The output by the way features pseudo-random accelerations. You can tell looking at the counter running.

I think it is worth paste your code comment in eventcycle.py:

    # what I think is happening here is that the sdl2 interface thread,
    # in its loop to process a single queue item, calls C library functions
    # which do not need the GIL. so it releases it. this allows the engine thread to pick up
    # and produce more work for the interface thread. sleep(0) does not wait but it does release
    # the GIL back, so we need a few sleep(0) calls to allow the interface to get through
    # individual items.
    # this would not be a problem if the interface thread did not need the GIL at all
    # (perhaps with numba, nuitka, cython, pypy or jython)
    # but note that the video queue is a Python object so may require the GIL
    # or if it held the GIL for a full cycle
    # wait to for the queue to drain if it excceds a threshold value
    # this allows the interface to catch up with video updates

I've tried to run the code deleting the lib folder where SDL2.DLL and SDL2_gfx.DLL are, and it run anyways.
Does this mean that SDL is not loaded or pcbasic is just using SDL in other PATH folder?

here are the warnings:

c:\Intel\pcbasic-master>python run-pcbasic.py
[09:49:05.0646] WARNING: Failed to load library sdl2: could not find any library for SDL2 (looked in paths: ('C:\Intel\pcbasic-master\pcbasic\lib',))
[09:49:05.0646] WARNING: Failed to load library sdl2_gfx: could not find any library for SDL2_gfx (looked in paths: ('C:\Intel\pcbasic-master\pcbasic\lib',))
[09:49:05.0646] INFO: Could not initialise video plugin sdl2: Module sdl2 not found
[09:49:05.0646] WARNING: The pygame interface is deprecated, please use the graphical interface instead.
[09:49:05.0646] INFO: Could not initialise video plugin pygame: Module pygame not found
[09:49:05.0662] WARNING: Failed to load library sdl2: could not find any library for SDL2 (looked in paths: ('C:\Intel\pcbasic-master\pcbasic\lib',))
[09:49:05.0662] WARNING: Failed to load library sdl2_gfx: could not find any library for SDL2_gfx (looked in paths: ('C:\Intel\pcbasic-master\pcbasic\lib',))
[09:49:05.0662] INFO: Could not initialise audio plugin ansi: Module sdl2 not found

@robhagemans
Copy link
Owner

By the looks of that log, PC-BASIC would have ran in text mode without using SDL2. You'd notice though, it looks rather different, running in the command prompt window rather than opening a window of its own.

@theruler
Copy link

@robhagemans do you have any eta of a new version that would include this fix?

@robhagemans
Copy link
Owner

robhagemans commented Mar 16, 2023 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants