Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

devdraw: acme missed screen updates #268

Closed
rsc opened this issue Jun 12, 2019 · 15 comments · Fixed by #270
Closed

devdraw: acme missed screen updates #268

rsc opened this issue Jun 12, 2019 · 15 comments · Fixed by #270

Comments

@rsc
Copy link
Contributor

rsc commented Jun 12, 2019

[Copied from https://github.com//pull/215#issuecomment-501052017.]

I can't yet tell if it's related to #225, but since this was merged into master, I've noticed that acme will sometimes not update the window after edits, or only update a portion of the window. It doesn't seem like it's reverting, or it's reverting so quickly that I can't see it as it happens. This happens in both windowed and fullscreen modes. For example (using | to represent the cursor and ---- for the top and bottom of the in-acme window):

----
line 1|
line 2
line 3
line 4

----

Typing Enter will sometimes not be reflected in the window until I click on the window (or scroll with the arrow keys, etc.). In other words, it looks the same, even though the buffer did change. It also happens when typing backspace with the cursor at the beginning of the line.

Another thing I've seen is that a narrower-than-correct "rectangle" of the window gets updated after typing. So for the example above, typing Enter could result in a window that looks like this:

----
line 1
|
line 2
line 4

----

Again, clicking or scrolling redraws everything properly. I've also seen these glitches for horizontally-oriented edits (e.g. adding or deleting a character), and things like win where I'll run a command and not see any of the output until I click or scroll.

edit: I should mention that I'm currently running p9p built from d4e16c8 earlier today

Originally posted by @mccoyst in #215 (comment)

@rsc
Copy link
Contributor Author

rsc commented Jun 12, 2019

I've been seeing exactly this too but I realized I have one laptop with #215 and one without, and I wasn't sure which one I have seen it on. I'm going to keep watching more closely.

Sometimes I see only the upper right triangle of a particular rectangle get redrawn too. That really confuses me. There isn't a triangle in sight in any of our code.

@jxy
Copy link
Contributor

jxy commented Jun 12, 2019

I started seeing this either after upgrading to macOS 10.14.5 or after a rebuilt against the new Xcode/SDK. Perhaps a combination of both.

@jacobvosmaer
Copy link
Contributor

jacobvosmaer commented Jun 12, 2019

IIRC the old devdraw used a refresh timer to periodically redraw regardless of whether there were changes. That feels like a silly brute force solution but it could be an option.

Edit: I was thinking of

static void
autoflushwin(int set)
{
static NSTimer *t;
if(set){
if(t)
return;
/*
* We need "NSRunLoopCommonModes", otherwise the
* timer will not fire during live resizing.
*/
t = [NSTimer
timerWithTimeInterval:0.033
target:[appdelegate class]
selector:@selector(callflushwin:) userInfo:nil
repeats:YES];
[[NSRunLoop currentRunLoop] addTimer:t
forMode:NSRunLoopCommonModes];
}else{
[t invalidate];
t = nil;
win.deferflush = 0;
}
}

@rsc
Copy link
Contributor Author

rsc commented Jun 13, 2019

I saw this triangular shearing again on my 10.14.5 mac without #215 applied. I typed enter a few times after the word func and the screen looked like this:

Screen Shot 2019-06-12 at 8 05 37 PM

but should have looked like this:

Screen Shot 2019-06-12 at 8 06 19 PM

Like I said, it's hard to see how we did that. The best theory I have is that there is a race in our screen updates and we are trying to do two at the same time, which might not work out well.

@jxy
Copy link
Contributor

jxy commented Jun 13, 2019

It looks like delaying the command buffer commit helps to alleviate the problem, as below.

diff --git a/src/cmd/devdraw/cocoa-screen-metal.m b/src/cmd/devdraw/cocoa-screen-metal.m
index 959df59b..c1cc19f7 100644
--- a/src/cmd/devdraw/cocoa-screen-metal.m
+++ b/src/cmd/devdraw/cocoa-screen-metal.m
@@ -810,6 +810,7 @@ - (void)display
                }else
                        LOG(@"command buffer finishes present drawable");
        }];
+       [NSThread sleepForTimeInterval:1e-2];
        [cbuf commit];
 
        LOG(@"display commit");

I thought it was a race condition between updating the texture and reading it in GPU.
However, adding a qlock does not help, as below.

diff --git a/src/cmd/devdraw/cocoa-screen-metal.m b/src/cmd/devdraw/cocoa-screen-metal.m
index 959df59b..6d315cbc 100644
--- a/src/cmd/devdraw/cocoa-screen-metal.m
+++ b/src/cmd/devdraw/cocoa-screen-metal.m
@@ -67,6 +67,7 @@ @interface DrawLayer : CAMetalLayer
 static id<MTLDevice> device;
 static id<MTLCommandQueue> commandQueue;
 static id<MTLTexture> texture;
+static QLock texturel;
 
 static Memimage *img = NULL;
 
@@ -788,6 +789,7 @@ - (void)display
 
        LOG(@"display got drawable");
 
+       qlock(&texturel);
        blit = [cbuf blitCommandEncoder];
        [blit copyFromTexture:texture
                sourceSlice:0
@@ -807,8 +809,10 @@ - (void)display
                if(cmdBuff.error){
                        NSLog(@"command buffer finished with error: %@",
                                cmdBuff.error.localizedDescription);
-               }else
+               }else{
                        LOG(@"command buffer finishes present drawable");
+                       qunlock(&texturel);
+               }
        }];
        [cbuf commit];
 
@@ -974,11 +978,13 @@ - (void)display
        }
 
        @autoreleasepool{
+               qlock(&texturel);
                [texture
                        replaceRegion:MTLRegionMake2D(r.min.x, r.min.y, Dx(r), Dy(r))
                        mipmapLevel:0
                        withBytes:byteaddr(img, Pt(r.min.x, r.min.y))
                        bytesPerRow:img->width*sizeof(u32int)];
+               qunlock(&texturel);
                [AppDelegate
                        performSelectorOnMainThread:@selector(callsetNeedsDisplayInRect:)
                        withObject:[NSValue valueWithRect:NSMakeRect(r.min.x, r.min.y, Dx(r), Dy(r))]

Now I'm very puzzled.

@mccoyst
Copy link
Contributor

mccoyst commented Jun 17, 2019

Here's another possible clue: I've been noticing this with ⌘-V pasting. When I replace some highlighted text, I sometimes see the original text get deleted (and any following text move over to the cursor position) but then don't see the new text until forcing an update.

@jacobvosmaer
Copy link
Contributor

I've been trying to make sense of this and I'm also very puzzled. This documentation page suggests the locking above is needed... https://developer.apple.com/documentation/metal/mtltexture/1515464-replaceregion?language=objc

I can't reproduce the problem on both my Macs; it happens with my "newer" macbook pro on its external low-dpi monitor, but not on my "older" macbook air. (The "newer" mac is a 2014 model so they're both old.)

A more or less reliable way to trigger the error is to click 'New' twice in an acme column. After you do this one of the new windows will be invisible. Drag it down just a little bit to make it visible; the lower border of the tag does not get drawn.

If I create three acme columns, and create a rendering problem in the right most one, then type some text in the left most column, the right most column gets fixed. That suggests that the write made it into the devdraw img buffer, but also into the MTLTexture texture. But then it never makes it onto the screen.??

Could something be coalescing the blitting commands?

@jxy
Copy link
Contributor

jxy commented Jun 18, 2019

I did some tests. I think it is just the [MTLTexture replaceRegion:...] and [MTLBuffer didModifyRange:...] (I tried to use buffer too) are not synchronized with subsequent command queue executions on the GPU. If I wait for more than 5 ms after replaceRegion and before commit the blit command, everything is fine. Otherwise, when there are large amount of memory to be transferred to GPU (replaceRegion supposedly should do it on the CPU side and transfer the updated memory to the GPU side, something like cudaMemcpy or omp_target_memcpy), the texture memory on the GPU does not get fully updated.

One simple workaround is that [NSThread sleepForTimeInterval:1e-2]; shown in my previous comment.

If you care about latency (10 ms is too much if we just want to update a small region like displaying a letter from a key press event), the following is another workaround that simply commit a new blit command and display again after 10 ms.

diff --git a/src/cmd/devdraw/cocoa-screen-metal.m b/src/cmd/devdraw/cocoa-screen-metal.m
index 959df59b..87e46d9c 100644
--- a/src/cmd/devdraw/cocoa-screen-metal.m
+++ b/src/cmd/devdraw/cocoa-screen-metal.m
@@ -812,6 +812,40 @@ - (void)display
        }];
        [cbuf commit];
 
+       dispatch_async(dispatch_get_main_queue(), ^(void){@autoreleasepool{
+               id<MTLCommandBuffer> c;
+               id<MTLBlitCommandEncoder> b;
+               id<CAMetalDrawable> d;
+
+               LOG(@"display again");
+               c = [commandQueue commandBuffer];
+               d = [layer nextDrawable];
+               b = [c blitCommandEncoder];
+               [b copyFromTexture:texture
+                       sourceSlice:0
+                       sourceLevel:0
+                       sourceOrigin:MTLOriginMake(0, 0, 0)
+                       sourceSize:MTLSizeMake(texture.width, texture.height, texture.depth)
+                       toTexture:d.texture
+                       destinationSlice:0
+                       destinationLevel:0
+                       destinationOrigin:MTLOriginMake(0, 0, 0)];
+               [b endEncoding];
+
+               [c presentDrawable:d];
+               d = nil;
+               [c addCompletedHandler:^(id<MTLCommandBuffer> cmdBuff){
+                       if(cmdBuff.error){
+                               NSLog(@"command buffer finished with error: %@",
+                                       cmdBuff.error.localizedDescription);
+                       }else
+                               LOG(@"command buffer finishes present drawable again");
+               }];
+               [NSThread sleepForTimeInterval:1e-2];
+               [c commit];
+               LOG(@"display commit again");
+       }});
+
        LOG(@"display commit");
 }
 

I'll try to figure out a better solution when I have time. (Perhaps we can work with textures for each Memdata like the X11 devdraw works with X pixmap, and offload all memdraw(3) to be based on blit commands and directly executed on GPU — Too much work though.)

At the mean time, please let me know if you have any other suggestions.

@jacobvosmaer
Copy link
Contributor

Some ideas:

@jacobvosmaer
Copy link
Contributor

I just upgraded a machine that did not have the problem from macos 10.14.3 to 10.14.5 and now it does have the problem. The other machine where I've had the problem before was already on 10.14.5.

@jacobvosmaer
Copy link
Contributor

jacobvosmaer commented Jun 18, 2019

(synchronously) calling replaceRegion on the main Cocoa thread

I tried this:

diff --git a/src/cmd/devdraw/cocoa-screen-metal.m b/src/cmd/devdraw/cocoa-screen-metal.m
index 959df59b..de11db7e 100644
--- a/src/cmd/devdraw/cocoa-screen-metal.m
+++ b/src/cmd/devdraw/cocoa-screen-metal.m
@@ -46,6 +46,7 @@ usage(void)
 + (void)makewin:(NSValue *)v;
 + (void)callkicklabel:(NSString *)v;
 + (void)callsetNeedsDisplayInRect:(NSValue *)v;
++ (void)callreplaceRegion:(NSValue *)v;
 + (void)callsetcursor:(NSValue *)v;
 @end
 @interface DevDrawView : NSView<NSTextInputClient>
@@ -221,6 +222,18 @@ threadmain(int argc, char **argv)
 	[myContent enlargeLastInputRect:r];
 }
 
++ (void)callreplaceRegion:(NSValue *)v
+{
+	NSRect r;
+
+	r = [v rectValue];
+	[texture
+		replaceRegion:MTLRegionMake2D(r.origin.x, r.origin.y, r.size.width, r.size.height)
+		mipmapLevel:0
+		withBytes:byteaddr(img, Pt(r.origin.x, r.origin.y))
+		bytesPerRow:img->width*sizeof(u32int)];
+}
+
 typedef struct Cursors Cursors;
 struct Cursors {
 	Cursor *c;
@@ -974,11 +987,10 @@ _flushmemscreen(Rectangle r)
 	}
 
 	@autoreleasepool{
-		[texture
-			replaceRegion:MTLRegionMake2D(r.min.x, r.min.y, Dx(r), Dy(r))
-			mipmapLevel:0
-			withBytes:byteaddr(img, Pt(r.min.x, r.min.y))
-			bytesPerRow:img->width*sizeof(u32int)];
+		[AppDelegate
+			performSelectorOnMainThread:@selector(callreplaceRegion:)
+			withObject:[NSValue valueWithRect:NSMakeRect(r.min.x, r.min.y, Dx(r), Dy(r))]
+			waitUntilDone:YES];
 		[AppDelegate
 			performSelectorOnMainThread:@selector(callsetNeedsDisplayInRect:)
 			withObject:[NSValue valueWithRect:NSMakeRect(r.min.x, r.min.y, Dx(r), Dy(r))]

The result was a deadlock (beachball) not long after using acme, and on at least one occasion I still saw a render error. So that is a dead end I think.

@jacobvosmaer
Copy link
Contributor

It feels wrong but the sleep approach seems like the pragmatic solution. I can't find any indication of what we are "doing wrong", or of what changed since macos 10.14.3 to cause synchronization to go bad.

One thing that might be nice is to not put a sleep in the main thread; we could maybe use dispatch_after instead?

diff --git a/src/cmd/devdraw/cocoa-screen-metal.m b/src/cmd/devdraw/cocoa-screen-metal.m
index 959df59b..5fb17053 100644
--- a/src/cmd/devdraw/cocoa-screen-metal.m
+++ b/src/cmd/devdraw/cocoa-screen-metal.m
@@ -766,6 +766,16 @@ struct Cursors {
 @implementation DrawLayer
 
 - (void)display
+{
+	// Delay GPU drawing because of memory sync bugs https://github.com/9fans/plan9port/issues/268
+	dispatch_time_t when = dispatch_time(DISPATCH_TIME_NOW, 5 * NSEC_PER_MSEC);
+
+	dispatch_after(when, dispatch_get_main_queue(), ^(void){
+		[self _display];
+	});
+}
+
+- (void)_display
 {
 	id<MTLCommandBuffer> cbuf;
 	id<MTLBlitCommandEncoder> blit;

I'm going to try this patch and see if the latency affects me. Acme latency is so poor to begin with that I'm not sure if I'll notice.

@jxy
Copy link
Contributor

jxy commented Jun 19, 2019

Thanks @jacobvosmaer, in that case, the following should have minimal performance impact.

diff --git a/src/cmd/devdraw/cocoa-screen-metal.m b/src/cmd/devdraw/cocoa-screen-metal.m
index 959df59b..1588992f 100644
--- a/src/cmd/devdraw/cocoa-screen-metal.m
+++ b/src/cmd/devdraw/cocoa-screen-metal.m
@@ -212,12 +212,19 @@ + (void)callkicklabel:(NSString *)s
 + (void)callsetNeedsDisplayInRect:(NSValue *)v
 {
        NSRect r;
+       dispatch_time_t time;
 
        r = [v rectValue];
        LOG(@"callsetNeedsDisplayInRect(%g, %g, %g, %g)", r.origin.x, r.origin.y, r.size.width, r.size.height);
        r = [win convertRectFromBacking:r];
        LOG(@"setNeedsDisplayInRect(%g, %g, %g, %g)", r.origin.x, r.origin.y, r.size.width, r.size.height);
        [layer setNeedsDisplayInRect:r];
+
+       time = dispatch_time(DISPATCH_TIME_NOW, 8 * NSEC_PER_MSEC);
+       dispatch_after(time, dispatch_get_main_queue(), ^(void){
+               [layer setNeedsDisplayInRect:r];
+       });
+
        [myContent enlargeLastInputRect:r];
 }
 

I'm going to make a PR with this.

@rsc
Copy link
Contributor Author

rsc commented Jun 19, 2019

Thanks for #270. All of us seem to be seeing this on 10.14.5. I wonder if it is just buggy, and I wonder if 10.14.6 will fix it. I have not had the best luck reporting bugs to Apple, unless I find a way to dress them up as security problems.

@rsc rsc closed this as completed in #270 Jun 19, 2019
@mccoyst
Copy link
Contributor

mccoyst commented Jun 21, 2019

👍 It's working for me, too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants