Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backport Surfman GL Backend to Hal 0.5 #3216

Closed

Conversation

zicklag
Copy link
Contributor

@zicklag zicklag commented Apr 16, 2020

This backports #3151 to 0.5. Only the changes to gfx-backend-gl have been backported. No changes have been made to the examples or to CI or the Makefile. The examples still work using the glutin backend like they are already doing. 🎉

PR checklist:

  • make succeeds (on *nix)
  • make reftests succeeds ( can't test because I don't have vulkan )
  • tested examples with the following backends: gl
  • rustfmt run on changed code

Copy link

@monocodus monocodus bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is autogenerated code-style review, new suggestions: 2

src/backend/gl/src/window/surfman.rs Show resolved Hide resolved
src/backend/gl/src/window/surfman.rs Show resolved Hide resolved
@zicklag
Copy link
Contributor Author

zicklag commented Apr 16, 2020

Oops, I just realized that I copied the whole tree over from 0.5 and therefore am including some commits from 0.6 in my name. One second while I fix that. I was trying to figure out how the line additions and deletions were higher than the PR on the other branch 🤔 LOL

@zicklag
Copy link
Contributor Author

zicklag commented Apr 16, 2020

NVM, actually I don't think there is any way to do that properly in Git without getting all the other changes to other files from 0.6. I guess its stuck that way.

@kvark
Copy link
Member

kvark commented Apr 16, 2020

Typically I just do it by cherry-picking the commits on top of the release branch:

git fetch upstream hal-0.5
git checkout hal-0.5
git log master
git checkout -b 0.5-your-work-name
git cherry-pick <hash1>
git cherry-pick <hash1>
...
<edit Cargo.toml to bump the patch version>
<edit CHANGELOG.md>
make
git push origin HEAD

Is there a reason you couldn't do that for this PR?

@zicklag
Copy link
Contributor Author

zicklag commented Apr 16, 2020

It's because those commits have changes to the examples in them, too, that go along with the changes to the backend. I could do an interactive rebase afterwards to get rid of the changes to the examples, though. 🤔 I'll try that.

@zicklag
Copy link
Contributor Author

zicklag commented Apr 16, 2020

Do you know a way to grab the contents of a file at a given commit in git history? NVM 🤦‍♂️ . git checkout <ref? <file> duh.

@zicklag zicklag force-pushed the gl-via-surfman-0.5-backport branch from ed10177 to 646ae26 Compare April 16, 2020 02:46
Copy link

@monocodus monocodus bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is autogenerated code-style review, new suggestions: 2

src/backend/gl/src/window/surfman.rs Show resolved Hide resolved
src/backend/gl/src/window/surfman.rs Show resolved Hide resolved
@zicklag zicklag marked this pull request as draft April 16, 2020 02:47
@zicklag
Copy link
Contributor Author

zicklag commented Apr 16, 2020

Well I did it, but don't merge it yet because it is panicking on wgpu examples now. I'm not sure if there was some other changes to the gl backend that needed to be in there that I missed from hal 0.6. That was more trouble that I thought it would be.

I'll try to finish it off tomorrow.

zicklag added 8 commits April 16, 2020 20:26
Adds a surfman backend to gfx-backend-gl and makes it the default
backend for unix platforms.
- Re-order dependencies in examples Cargo.toml
- Make sure that GL is tested in reftests
- Remove cfg_aliases `use` from `build.rs`
- Make surfman take precedence over glutin when both features are
  enabled.
- Remove unused winit dependency
- Only include winapi when using the `wgl` feature
- Take out DEPTH and STENCIL flags from GL context
- Remove unnecessary drop() call
- Remove unneeded GL `cfg_attr`'s from warden
Fixes a segfault that could happen when enumerating adapters and fixed
the issue where only one adapter could be listed.
Re-adds some "unused" re-exports that were removed in the surfman
update. Other than the default features, this makes the surfman update
backward compatible with `hal-0.5`.
Added the missing `Headless` re-export for the glutin backend.
This finishes of the backport of the surfman changes to hal-0.5.
@zicklag zicklag force-pushed the gl-via-surfman-0.5-backport branch from 646ae26 to 65ca71d Compare April 17, 2020 01:38
Copy link

@monocodus monocodus bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is autogenerated code-style review, new suggestions: 2

src/backend/gl/src/window/surfman.rs Show resolved Hide resolved
src/backend/gl/src/window/surfman.rs Show resolved Hide resolved
@zicklag
Copy link
Contributor Author

zicklag commented Apr 17, 2020

Ah, bummer, I just found out that my attempt at fixing the context issues where the correct context was not being set current was flawed somehow and now the WGPU examples will panic with an invalid command error. That means that the master version will need to be updated too.

I think I might have to figure out why the screen turns black when I set the context current on dereference. Either way, somehow I need to figure out a solid pattern on when to set which contexts to the current context. This could be a bit more work, but oh well, still more to learn! 😃

@kvark
Copy link
Member

kvark commented May 1, 2020

@zicklag how is it going? Anything we could help with?

@zicklag
Copy link
Contributor Author

zicklag commented May 2, 2020

I've been really busy last few weeks and had a couple other weekend projects I wanted to touch on, but I might be able to take a new look at this today or tomorrow.

I think what needs to happen is we need to set the context in the main GlContainer to the current context on dereference, but there is some point in backend where that should not happen and the context needs to be accessed directly to avoid making it current. I'm assuming that is the reason that the screen renders black when setting the context current on dereference.

I don't really know of a good way to figure out why the screen is rendering black or where to avoid making the context the current context.

My only thought is to set a breakpoint on the make_current function and check the backtrace each time it gets called to try and find a place where it shouldn't be called.

I could probably use some help figuring out where it is going wrong, but I might need to get the code back to where it is making the context current on dereference first.

@kvark
Copy link
Member

kvark commented May 2, 2020

When you had problems with the blit function, and I was looking into this, precisely this situation was a problem: main GL context used to set itself up as active on dereference, and the blit was broken because of it. Maybe it's just the same problem here? Should be fairly trivial to address: just in a few places (during the blit) we'd need to go directly into the context without making it active. I can help investigating if you are stuck.

@zicklag
Copy link
Contributor Author

zicklag commented May 2, 2020

Yeah, it is actually exactly the same problem as the black screen with the blit we had earlier. The solution I used at the time was just to not set the context current on dereference.

The issue with not setting it current on dereference is that we were running into segfaults if you did things in the wrong order because we weren't setting the context to current when we needed to.

So I think we need to bring back setting it current on derefernce, but find out where we shouldn't be doing it. Like you showed when we looked at this, we need to not do it during the blit, but there must be somewhere else as well that we need to not make it current so that the screen doesn't render black.

It would be great to get some help on this. 😃

@zicklag
Copy link
Contributor Author

zicklag commented May 2, 2020

I'll work right not on getting this PR to the point where I think it should be, other than the black screen which needs to be investigated.

@zicklag zicklag force-pushed the gl-via-surfman-0.5-backport branch from 65ca71d to cea1730 Compare May 2, 2020 17:14
@zicklag
Copy link
Contributor Author

zicklag commented May 2, 2020

OK, I just pushed a commit to get the PR where I want it ( other than the black screen ).

Because this is the backport branch the examples are still using the glutin backend, so you will have to grab the quad example from master ( which uses surfman ) and update the gl features in the Cargo.toml:

git checkout master examples/quad
[features]
# Change the gl features
gl = ["gfx-backend-gl", "gfx-backend-gl/surfman", "gfx-backend-gl/surfman-x11"]

Then you can run the quad example and see the black screen.

@kvark
Copy link
Member

kvark commented May 3, 2020

Looking at this code now. Have a few questions.

Since we have a separate GL context per surface, and we need to blit from one to another, we have to have a place somewhere at the surface creation where we say "please use shared resources from that other context", where the other context belongs to the instance. I'm not seeing where this happens, so maybe you don't have the contexts sharing anything in this PR?

The docs on surfman 0.2 appear to only describe a scenario where you'd want to get the results of a surface context into another one. Our scenario is the opposite though, it's sad that it's not covered in any way.

I do recall though that this missing piece did present in the code you merged to master. Could it have been lost somehow?

@kvark
Copy link
Member

kvark commented May 3, 2020

I tried to cleanup and re-implement the present_by_copy logic, and I found out that this line was particularly suspicious: https://github.com/gfx-rs/gfx/pull/3216/files#diff-11d4b77124a4a2d39bbc81d17b96e104R278

            gl.surfman_device
                .read()
                .present_surface(&gl.surfman_context.read(), &mut surface)
                .expect("TODO");

You are presenting surface with the context of the instance? Documentation clearly states that this has to be the same context that was used for surface creation, which is swapchain.context.read()

Interestingly, it doesn't work in my own try here - kvark@21edb4f#diff-11d4b77124a4a2d39bbc81d17b96e104R288
The present crashes with:

thread 'main' panicked at 'invalid context', /Users/dmalyshau/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/io-surface-0.12.1/src/lib.rs:141:17

I don't have an explanation for this. So that's the second mystery.

@zicklag
Copy link
Contributor Author

zicklag commented May 4, 2020

Thanks for taking a look at this.

Since we have a separate GL context per surface, and we need to blit from one to another, we have to have a place somewhere at the surface creation where we say "please use shared resources from that other context", where the other context belongs to the instance. I'm not seeing where this happens, so maybe you don't have the contexts sharing anything in this PR?

I think that the resources created by the same device are shared or something like that, maybe.

The surfman README says:

OpenGL's share context or share lists feature allows you to share textures across contexts. However, this often exposes driver bugs, and, even if it works, it causes most operations to take mutex locks. Efficient texture sharing requires the use of platform-specific APIs, which surfman abstracts over.

So it looks like it only allows you to share textures across contexts? I'm assuming that contexts created with the same surfman::Device are automatically configured with the share lists or whatever is necessary for resource sharing, but again, maybe that is only textures.

I do recall though that this missing piece did present in the code you merged to master. Could it have been lost somehow?

I don't think I lost anything. We'd have to double check. We could look back at the PR to master to see if anything is in there, but I don't think there is anything extra on the master one.

You are presenting surface with the context of the instance? Documentation clearly states that this has to be the same context that was used for surface creation, which is swapchain.context.read()

Yeah, that looks wrong, it should probably just be:

        // Present the surfman surface
        #[cfg(surfman)]
        {
            let mut surface = gl
                .surfman_device
                .read()
                .unbind_surface_from_context(&mut swapchain.context.write())
                .expect("TODO")
                .expect("TODO");
            gl.surfman_device
                .read()
                // Present the swapchain context instead, which is the one that
                // is created when `create_surface` is called.
                .present_surface(&swapchain.context.read(), &mut surface)
                .expect("TODO");
            gl.surfman_device
                .read()
                .bind_surface_to_context(&mut swapchain.context.write(), surface)
                .expect("TODO")
        }

When I do that, though, I still get a black screen.

@zicklag
Copy link
Contributor Author

zicklag commented May 4, 2020

Not discounting other potential problems, but I think that the we can essentially prove that the current issue with the black screen has solely to do with calling make_context_current with the proper timing.

In the current state of the PR, if you simply comment out this line:

https://github.com/gfx-rs/gfx/pull/3216/files#diff-b42bd9be6c2e5cf60482e1230ad3007cR152

impl Deref for GlContainer {
    type Target = GlContext;
    fn deref(&self) -> &GlContext {
        #[cfg(not(wasm))]
        //self.make_current();
        &self.context
    }
}

And add a line here: https://github.com/gfx-rs/gfx/pull/3216/files#diff-3b1612b9f8249907b53c59e1c50b70bdR305

    fn supports_queue_family(&self, _: &QueueFamily) -> bool {
        self.device.write().make_context_current(&self.context.read()).expect("TODO");
        true
    }

That will work regardless of the order of these two lines in the quad demo:

       let adapters = instance.enumerate_adapters();
       let surface = unsafe {
            instance
            .create_surface(&window)
            .expect("Failed to create a surface!")
        };

Without that line in the supports_queue_family function, though, the quad demo break if you re-order those lines and enumerate the adapters after creating a surface: 'Framebuffer does not have any image attached'.

Even with that line in the supports_queue_family function, though, the WGPU cube demo crashes with a segfault. The only way to fix the segfault in WGPU so far is to re-enable the make_current on GlContainer deref, which presents us with the black screen again.

@kvark
Copy link
Member

kvark commented May 4, 2020

Hmm, this all sounds like magic to me. I want to see where we are actually telling the contexts that they need to share resources.

@zicklag
Copy link
Contributor Author

zicklag commented May 4, 2020

We'll probably have to look at the surfman source code, because I don't think the surfman API even has anything to do explicit sharing. It is all automatic as far as I know.

@kvark
Copy link
Member

kvark commented May 4, 2020

Hmm. In glutin, if you want to make the context shared, you have to specify, which context you are sharing with - https://docs.rs/glutin/0.23.0/glutin/struct.ContextBuilder.html#method.with_shared_lists
Seems strange that surfman could imply that somehow.

@zicklag
Copy link
Contributor Author

zicklag commented May 4, 2020

Well you have to create an manage all contexts with a Device, so could it make sure that all contexts created by the device are shared with some internal context inside of device maybe?

@kvark
Copy link
Member

kvark commented May 7, 2020

@zicklag I played with it a bit more and made some progress, but haven't resolved the problem yet. Want to share it to see if you have any ideas.

The code is in https://github.com/kvark/gfx/tree/surfman-try
It does 3 interesting things:

  1. it refactors the lifecycle of the surfman surface. I don't think it fixes anything, but it's closer to what we want in the end. The surface lives on our Surface, and the procedure of binding it to surfman context, blitting into it, and then unbinding it, is totally local. So anywhere outside of the present_surface_by_copy the surface is not bound.
  2. when it blits the frame into the surface, it also clears a small rectangle with green color. This is to test if the presentation of the target framebuffer works. And it does! I think this is good news. Now we need to figure out why the blit doesn't result in anything.
  3. right before presenting, we are also clearing the swapchain's FBO with a small red rectangle. This is to make sure that the source of the blit contains something. It doesn't show up in the blit though!

My take from these experiments is that the contexts don't properly share resources, and thus the swapchain.fbos[0] changed in one context doesn't show the contents in another. Please have a look at try to make sense of it, maybe you'll be able to advance from there.

@zicklag
Copy link
Contributor Author

zicklag commented May 7, 2020

That sounds like great progress! I'll mess with it when I get the chance, hopefully over the weekend.

right before presenting, we are also clearing the swapchain's FBO with a small red rectangle. This is to make sure that the source of the blit contains something. It doesn't show up in the blit though!

Hmm, I wonder if what was actually happening this whole time when it was properly displaying was that the surface's context was the active context and all draw's were actually happening directly to the surface's context, leaving the blit to never actually do anything because what we were rendering was already being written to the surface.

Then when we actually fixed the blit and therefore covered up anything that we were rending directly to the surface and ended up with a blank screen. ( I'm not actually sure if that's how that works, just a thought. )


I wonder if by sharing textures like we thought about in the first place to avoid the blit would actually fix this issue...:thinking:

Anyway I'll check it out, thanks!

@kvark
Copy link
Member

kvark commented May 7, 2020

Alright, shortly after writing the findings I figured that the lack of sharing is the only explanation. And this is something I asked about earlier in #3216 (comment) . Inspecting surfman code showed that it doesn't even try to share anything, at least on macos/CGL. I prototyped the sharing in https://github.com/kvark/surfman/tree/share-surface and got it working:
Screen Shot 2020-05-07 at 11 23 12

I don't know if this is in scope of surfman, so I filed pcwalton/surfman#65 . Hoping to hear back from Patrick soon.

P.S. here are the relevant backend changes

@zicklag
Copy link
Contributor Author

zicklag commented May 7, 2020

Wow, that's sweet! I wonder if it is the same on X11. Or maybe your branch without surfman changes will work on X11. I'll try to test it tonight if I can.

@kvark
Copy link
Member

kvark commented Jun 12, 2020

@zicklag fwiw, surfman now contains the context sharing that I merged. We can make it work.
I think we should do so on master first though.

@zicklag
Copy link
Contributor Author

zicklag commented Jun 12, 2020

I think we should do so on master first though.

I agree. It'll probably be rather easy to update with the sharing because it is pretty much in place already, so I'll probably just do that sometime.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants