New API proposal : extension.getContexts() #334

Closed

justinlulejian

@hanguokai
Member

this can be used to target messages to send using runtime.sendMessage()

Yes, this API is very useful in this scenario, because sendMessage() sends a message to all contexts but only gets one (the first) response. In some use cases, I want to find a target first (usually a tab), then communicate only with it.

Currently, I use some workarounds. For example:

// This page can be in tab mode or popup mode
chrome.runtime.onMessage.addListener(function(request, sender, sendResponse) {
  if (request.type === "SomeType") {
    chrome.tabs.getCurrent(tab => {
      if (tab) { // in tab mode
        sendResponse('ok');
      } else { // in popup mode
        // no response is sent from the popup
      }
    });
    return true; // keep the message channel open for the asynchronous sendResponse
  }
});

Another example: find an extension tab from the foreground by using getViews and checking location.pathname.

let tabWindows = chrome.extension.getViews({type: "tab"});
let win = tabWindows.find(w => w.location.pathname == '/a_specific_page.html');
if(win) {
  win.someMethod(); // call some method on it
} else {

}

A typical use case is implementing single-tab mode: open a page in only one tab, and if it is already open, activate it (rather than opening multiple copies of the same page).

Another problem with the current API is that chrome.tabs.query({url: 'a current extension page url'}) doesn't work without the "tabs" permission.
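
A minimal sketch of the single-tab mode described above, assuming the extension does hold the "tabs" permission so that tabs.query({url}) can match the extension page. The `api` parameter stands in for the `chrome` namespace (with promise-returning methods), and `focusOrOpenPage` is a hypothetical helper name, not part of any extension API.

```javascript
// Hypothetical helper: focus the extension page if it is already open in a
// tab, otherwise open it in a new tab.
// `api` is expected to look like the promise-based `chrome` namespace.
async function focusOrOpenPage(api, pagePath) {
  const url = api.runtime.getURL(pagePath);
  // Assumes the "tabs" permission; without it, querying by URL finds nothing.
  const [existing] = await api.tabs.query({ url });
  if (existing) {
    await api.tabs.update(existing.id, { active: true });
    await api.windows.update(existing.windowId, { focused: true });
    return existing;
  }
  return api.tabs.create({ url });
}
```

In an extension this would be called as focusOrOpenPage(chrome, 'a_specific_page.html').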

@hanguokai
Member

So I suggest that chrome.runtime.sendMessage() support specifying a context as the target.

@justinlulejian
Author

So I suggest that chrome.runtime.sendMessage() support specifying a context as the target.

@hanguokai Hi Jackie! Agreed that this could be a useful place to allow Contexts to be specified and something that should be done. I've committed a section to future work since it seems beyond scope for the initial implementation. PTAL and let me know if the wording accurately reflects your suggestion.

Member

@dotproto dotproto left a comment


This review contains a number of suggestions to

  • normalize the use of HTML in GitHub Flavored Markdown
  • clean up minor grammar issues

More substantive issues will be addressed in separate comments.

@Rob--W
Member

Rob--W commented Dec 19, 2022

Here's my feedback, grouped by three themes:

  • Use cases?
  • Prior art
  • Properties and TOCTOU

Use cases?

What are the use cases depending on this proposal?
From the proposal I can extract 3 motivations:

  1. Primarily, an attempt to fill the gap left behind by the absence of extension.getViews in workers,
  2. A way to query the active views of an extension,
  3. A way to target them for messaging purposes.

1 and 2 are vague, and it would be nice to include more background on why such functionality is sufficiently important to be a necessary part of the extension APIs, and not already achievable in other (not super hacky) ways.

3 is useful on its own, and the need for a way to uniquely target a specific destination has been requested before (#294). It is even a prerequisite to improve the performance of extension messaging (elaborated at #293 (comment)).

Prior art

The proposal here is a successor to extension.getViews(), to fill a gap in Service worker-based extensions.
On the web platform, there is already an API designed for that (i.e. Client) to enable SW to communicate with web pages and other workers that share the same origin (the opposite direction is via navigator.serviceWorker). Despite the existence of these web platform APIs, I still see value in an extension-specific API to query the views for messaging purposes, for at least two reasons:

  1. the API works for both service worker-based and event page-based extensions (Proposal: Limited Event Pages for MV3 #134), and
  2. allows for extension-specific restrictions/relaxations.
    For example, "incognito" (or private browsing) generally offers isolation between otherwise same-origin contexts. In extensions, this line is more blurry, and extensions may intentionally want to reach other contexts.

"Context" and "ContextType" are quite generic. There is even already a contextMenus.ContextType type. If possible it would be nice to have a more specific name, such as ExtensionContext and extensionContextType.

Properties and TOCTOU

The proposed API is async and resolves to an object whose properties are conceptually not immutable. Without clear use cases, this may be a recipe for TOCTOU bugs. So let's take a critical look at the fields to determine whether they are really necessary (which is dependent on the use cases).

| Property name | Immutable | Always well-defined | Alternative (in the target context) |
| --- | --- | --- | --- |
| tabId | yes | no, e.g. background contexts, potentially ambiguous for devtools/sidebar | extension messaging + MessageSender or tabs.getCurrent() |
| windowId | no, e.g. tab can be moved to different window. | no, e.g. background contexts | extension messaging + MessageSender or windows.getCurrent() |
| documentId | yes | no, e.g. workers | webNavigation API (see Chrome blog post about Instant Navigation) |
| frameId | yes | no, e.g. workers | extension messaging + MessageSender or even (synchronous) runtime.getFrameId() (MDN) |
| contextType | yes | no, e.g. embedded options page, devtools_page, etc. | Generally the (target) context knows where it is running, potentially by duck typing |
| url | no | no; "current URL of the tab." does not work for non-tab contexts. Mostly yes if "url" reflects the context's URL (mostly, because the URL can change at any point in time). | web platform APIs: document.URL or location |
| origin | yes, provided that the document did not unload, e.g. documentId did not change. | yes | web platform API: origin |
| incognito | no | yes | extension.inIncognitoContext |

@justinlulejian
Author

justinlulejian commented Dec 22, 2022

@Rob--W Apologies, I think that by committing I prevented a direct reply to your comment; responding below.

Here's my feedback, grouped by three themes:

  • Use cases?
  • Prior art
  • Properties and TOCTOU

Use cases: a possible example is to check that a popup is open. Otherwise the hacky way is to have Contexts send keepalive messages back to the background script which maintains a table of what is open or not at any given time. A popup is a common example I can think of, but any other Context an extension wants to often know is open without having to manually keep track of it would benefit from this. Unless there's another way of keeping track of the idea of a Context, every extension would implement this differently and doing such state tracking sounds prone to bugs when the tracked state doesn't match what's actually running. If that seems like sufficient use case I can add a section to the proposal to have that for future reference.

For uniquely targeting a specific destination, this brings us closer to that goal. Currently there's just broadcast, but now an extension could check if a Context is running before attempting to send a message to it to be a bit more efficient.

I'll change Context/ContextType to ExtensionContext/extensionContextType. Thank you for the suggestion; that seems much less overloaded.

TOCTOU:
Since the API is asynchronous, there is a chance for these bugs to occur, and it is something to keep in mind. We'd perhaps have to consider how to handle the case where, for example, the windowId is used in another API call but has since changed, which would be very confusing to extension developers when it happens. Is there prior art for handling this in other parts of the extension ecosystem, since these are not new properties?

Something I probably didn't elaborate on when describing these properties is that they might be empty/not defined for the items where it doesn't make sense -- but altogether would cover the various extension contexts that might be returned.

tabId: for non-tab contexts (e.g. background contexts, etc.) this would be empty
windowId: same as above
documentId: empty for workers
frameId: same as above
frameUrl: empty for background contexts

origin: With the web platform API origin property, this wouldn't be that useful until/if we support content scripts, which would have an origin outside the extension. So this could depend on whether we want to include them in a (future?) iteration of the proposal.

For incognito: can an ExtensionContext change its private status? I thought of it as static but I'm new to the extension ecosystem. The idea for this property came from the case of spanning mode where you might get multiple ExtensionContexts back in and out of private contexts and want to treat them differently.

@Rob--W
Member

Rob--W commented Dec 27, 2022

@Rob--W Apologies, I think that by committing I prevented a direct reply to your comment

You didn't break anything; I posted a standalone comment because my feedback was not tied to one specific line.
I was asking about use cases because I felt that something was missing from the API. I'll respond to your last comments in line, and elaborate on the gaps in the end.

Use cases: a possible example is to check that a popup is open. Otherwise the hacky way is to have Contexts send keepalive messages back to the background script which maintains a table of what is open or not at any given time.

That hacky approach wouldn't even work because it implies some persistent in-memory state in the background, whereas service workers are forcibly shut down at some point (currently 5 minutes in Chrome).

A popup is a common example I can think of, but any other Context an extension wants to often know is open without having to manually keep track of it would benefit from this. Unless there's another way of keeping track of the idea of a Context, every extension would implement this differently and doing such state tracking sounds prone to bugs when the tracked state doesn't match what's actually running. If that seems like sufficient use case I can add a section to the proposal to have that for future reference.

Even without querying the state of the popup, the existing extension messaging APIs (runtime.sendMessage / runtime.connect) can already be used to communicate with the popup if desired (and even track whether the context disappears - with port.onDisconnect). It is not strictly necessary to call extension.getContexts() first.
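
As a rough sketch of the port-based approach mentioned above: the background can listen for connections and use port.onDisconnect to notice when the other end goes away. The `api` parameter stands in for the `chrome` namespace, `watchPopup` is a hypothetical helper name, and the port name 'popup' is a convention assumed for this example (the popup would call runtime.connect({ name: 'popup' }) when it loads).

```javascript
// Background side: react to popup connections instead of polling for them.
// Assumes the popup calls chrome.runtime.connect({ name: 'popup' }) on load;
// the port name is a convention chosen for this example.
function watchPopup(api, onChange) {
  api.runtime.onConnect.addListener((port) => {
    if (port.name !== 'popup') return; // ignore unrelated ports
    onChange(true);
    // Fires when the other end of the port goes away, e.g. the popup closes.
    port.onDisconnect.addListener(() => onChange(false));
  });
}
```

The callback reacts to events as they happen rather than relying on a table kept in the background's memory, which would be lost when a service worker is shut down.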

A common use of extension.getViews() (and extension.getBackgroundPage()) is to get a direct reference to the window object of the background or other views, in order to directly manipulate the DOM or global (e.g. caching values, rendering DOM). This use case cannot be supported by the proposed extension.getContexts() method because the functionality is incompatible with service workers.

For uniquely targeting a specific destination, this brings us closer to that goal. Currently there's just broadcast, but now an extension could check if a Context is running before attempting to send a message to it to be a bit more efficient.

Non-broadcast semantics is desirable and requested at #294. It is currently missing a way to unambiguously describe any extension context (whether a tab, worker, content script, popup, sidebar/sidepanel, devtools panel, ...). Solving that would be valuable.

For incognito: can an ExtensionContext change its private status? I thought of it as static but I'm new to the extension ecosystem. The idea for this property came from the case of spanning mode where you might get multiple ExtensionContexts back in and out of private contexts and want to treat them differently.

A context cannot change its private status. It is fixed at the start of the context (dependent on where the context is loaded) and immutable afterwards. Note that the table from my previous comment (#334 (comment)) lists the APIs that the recipient of the message can use to get the information to respond to the sender.

Something I probably didn't elaborate on when describing these properties is that they might be empty/not defined for the items where it doesn't make sense -- but altogether would cover the various extension contexts that might be returned.

I asked for use cases because the value of the getContexts() method and returned properties is dependent on their use cases. The PR mentions the following use cases:

  1. (from PR) "reading and/or modifying a settings page" - this is not supported by getContexts(); extensions would have to use extension messaging anyway because they cannot directly access the DOM of UI pages such as the popup, options_ui, tab, etc.
  2. (from PR) "determining if a toolbar popup is open" - requested at https://crbug.com/1322432.
  3. (from PR) OFFSCREEN_DOCUMENTS type - referenced at https://source.chromium.org/chromium/chromium/src/+/main:extensions/common/api/offscreen.idl;l=63-70;drc=bcffcd423659a28620ece7e155d05c6b82ed13db

    Determines whether the extension has an active document. (...) Instead of this, we should integrate offscreen documents into a service worker-compatible getViews() alternative.

  4. (from PR) "avoid various workarounds needed to target specific pages with messages."

Use cases 2 and 3 do not need any properties.
The two examples for use case 4 do not involve service workers: example 1 can continue to use getViews(). Example 2 is about direct access to methods of window and is not supported by getContexts(). The broader interpretation of use case 4 (the ability to target a specific context for messaging) is #294.

Another use case that I have encountered in practice is determining whether there is already an instance of an extension tab, and opening it if not (or focusing it if there is). There are already various UI-specific APIs for this scenario (designed and implemented after extension.getViews()): action.openPopup(), runtime.openOptionsPage() and tabs.query().

Gaps

If I rewind, and start from the premise of "What gaps does the absence of extension.getViews() leave", then there are mainly two gaps:

  1. extension.getViews() offered the ability to find all active extension contexts that can be interacted with.
  2. "interacted with" is broadly defined as invoking any DOM API or extension API in the target context.
  • with the additional note that the target context is unambiguously identifiable due to the (synchronous) access to all globals/DOM in that context.
  • with the small note that we cannot support synchronous access due to the desire to support the method in service workers.

The first one is easy to satisfy by merely introducing the extension.getContexts() method proposed here.
The second one is very broad, but the most generic way to cover that is to somehow exchange messages between the current context (e.g. background) and the target context (e.g. popup), so that the target context can do something useful on demand. And for that, we would need an unambiguous way to describe the target context (for use with #294). The proposal currently returns a bag of properties, but the connection between these properties and a mechanism to reach the target context is unclear. It is currently not directly possible to determine which of the contexts is the current context (where the script is executing; for comparison, with getViews() the other views can be found with chrome.extension.getViews().filter(w => w !== window)).

An easy answer could be to include "contextId" in the property set, and then support contextId in runtime.sendMessage and/or runtime.connect. The advantage of this is that this can finally fill the gap in extension APIs of uniquely describing a specific extension context, as tabId+frameId / documentId cannot describe non-tab contexts such as workers, devtools pages, sidebar, etc. A (minor?) disadvantage is that we'd be introducing yet another way to describe a context (after the recent introduction of "documentId"), and the mapping between contextId and the actual context may not be very obvious. But if such a unique ID were to be introduced, the following use cases could easily be covered as a bonus:

A more complicated answer would be to extend APIs that target a context (runtime.sendMessage, etc.) with a targetContext option that takes a dictionary whose keys are a subset of the ExtensionContext (Context) type described here. This is more complex, because we would have to define the properties that can uniquely identify a single context. The main advantage of this mechanism is that it would also be usable outside of privileged extension contexts (i.e. content scripts).
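
To illustrate why the dictionary-based option needs a definition of which properties uniquely identify a context, here is a minimal sketch of how a targetContext filter could be resolved. Everything in it is hypothetical: `resolveTargetContext` is an illustrative name, the property names follow the bag of properties discussed in this thread, and the behavior on ambiguous filters (throwing) is just one possible choice.

```javascript
// Hypothetical: resolve a `targetContext` filter against a list of context
// descriptors. Returns the single match, or throws when the filter matches
// zero or several contexts (i.e. it does not identify a unique target).
function resolveTargetContext(contexts, targetContext) {
  const matches = contexts.filter((ctx) =>
    Object.entries(targetContext).every(([key, value]) => ctx[key] === value));
  if (matches.length !== 1) {
    throw new Error(
      `targetContext matched ${matches.length} contexts, expected exactly 1`);
  }
  return matches[0];
}
```

With two tabs showing the same extension page, a filter such as { contextType: 'TAB' } is ambiguous and would be rejected, while adding tabId narrows it to one context.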

I think that it's a must-have to define how extension messaging is intended to be used with this getContexts() method, in order to reach a state where the API serves as a useful alternative to extension.getViews().

@hanguokai
Member

I think the ultimate goal should be two parts:

  1. Provide a way to find the target.
  2. Communicate with this specific target.

Provide a way to find the target

My first example in #334 (comment) is that the service worker communicates with the popup page by sending a message (note: this popup page can also be opened as a tab). This is a workaround for two problems: 1) extension.getViews() can't be used in a service worker; 2) runtime.sendMessage()'s broadcast behavior.

My second example is finding a specific extension page first, then communicating with it by calling a method on its window object, because if you have the window, it is unnecessary to send a message (and sendMessage is a broadcast, not sent only to the target).

The point of these two examples is how to find a specific target, not how to communicate. Here communication is the means to find the target.

Current problems with finding the target

  • extension.getViews() can't be used in a service worker.
  • sendMessage() is not an effective means of finding the target.
  • runtime.openOptionsPage() only works for the options page and can't be used for other extension pages.
  • Querying by URL with browser.tabs.query({url}) doesn't work if you don't have the "tabs" permission, but this permission should be unnecessary for finding the extension's own pages.

I know the Client API is another workaround for both finding the target and communicating with it.

Communicate with this specific target

@Rob--W thinks this is unclear at present. He thinks this should be clarified in this proposal and proposes #294. I think the main purpose of finding a target is to communicate with it, so it should be clear how to do that in this proposal.

@rdcronin
Collaborator

rdcronin commented Jan 5, 2023

Thank you all for the thoughtful feedback.

I agree with most of what's been said here. In my mind, I think there are two main benefits to this API:

  1. It provides a canonical way to determine if a given context is active. At the outset, a "context" here would be limited to contexts that commit to the extension origin (tabs, popups, background contexts, offscreen docs, etc), but we'd like to expand this in the future to include contexts like content scripts, as well.
  2. It paves the way (but does not itself directly implement) for different APIs to leverage a way to uniquely identify a context. The most obvious here is message passing (via sendMessage or connect), but there could be others in the future.

As has been mentioned, for 1), there are workarounds to this that exist today: predominantly, message passing itself (send a message from a created context and track them in the interested context; or, send out a message from the interested context and see if there's a reply). This, to me, seems hacky and undesirable — it pollutes the message channels (which are currently broadcast everywhere), it's a bad fit for ephemeral contexts like service workers and event pages (since a message port can extend the lifetime of that context, and any incoming messages could result in the background context being spun up, even if it's not needed), and just feels generally "clunky". getViews() can be used in some cases, but a) is only available in non-worker contexts and b) is fundamentally incompatible with any contexts that may run in different processes (content scripts, incognito).

There are also existing requests for a way to tell if a context is open, such as determining if an action popup is open and determining if an offscreen document is present. Rather than requiring either the use of message sending to determine if these contexts are active or introducing different methods for each individual context type we might be interested in (action.hasOpenPopup(), offscreen.hasDocument(), etc), it would be cleaner to introduce a canonical way to determine this.

This API is targeted at achieving 1).
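
For illustration, per-type checks such as the hypothetical action.hasOpenPopup() above could be written on top of a single context query. This is only a sketch under assumptions: `getContexts` is passed in as a parameter because the method's final name and signature are not settled here, `hasContextOfType` is an illustrative name, and the `contextType` value 'POPUP' used in the usage note is an assumed placeholder.

```javascript
// Hypothetical: answer "is a context of this type currently active?" using a
// single query that resolves to an array of context descriptors.
// `getContexts` is an injected async function standing in for the proposed API.
async function hasContextOfType(getContexts, contextType) {
  const contexts = await getContexts();
  return contexts.some((ctx) => ctx.contextType === contextType);
}
```

A popup check would then read hasContextOfType(getContexts, 'POPUP'), and an offscreen document check would differ only in the type passed in.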

After we have this API established, we'd like to use it to solve 2) — targeted messaging. I'd like to avoid blocking this proposal on fully spec'ing what that change looks like, because I think it has its own set of considerations (do we want to more fully differentiate between "broadcast" methods and non-"broadcast" methods? Are there other implications in the existing messaging APIs that we'd need to reconsider?).

Proposal: We introduce contextId to the bag-of-properties returned. I was originally thinking this would be done as part of 2) (since it's not necessary for 1)), but I can understand the desire for it to be added here. We agree that the messaging changes will, in essence, be that "we provide a way for an extension to specify a contextId to which a message should be sent", without firm commitments on the method signature or related aspects.

Given that, does this proposal seem reasonable at a high level? I.e., providing an asynchronous method to determine which contexts are active for an extension by providing a bag-of-properties descriptor for each, which can later be used to target specific messages.

If so, I agree there are other aspects we may want to discuss more. E.g.:

  • Rob called out (and Justin and I were also discussing) that URL is ambiguous. I think we'll probably want to change this to be documentUrl or similar, and have it be undefined for service worker contexts. (We could potentially introduce a scriptUrl for worker contexts, which could also be set for content script contexts in the future, if we felt that was valuable.) We'd probably want to do something similar for origin.
  • We may re-brand context type, if we feel it's ambiguous.
    We can definitely iterate on these, but I'd like to get consensus on the general shape of the API first.

Regarding TOCTOU:

In short, yes, there are TOCTOU risks. Given extension contexts can run on different threads (worker vs main) and in different processes (content scripts, incognito contexts), I think asynchronous behavior is largely unavoidable. (And, even if the extension were able to synchronously determine if a context were active, subsequent actions — like sending a message — are also async, and so have the same issue.) Rather than trying to fully avoid TOCTOU issues, I think it's better to ensure we provide developers with the proper tools to cope with asynchronous APIs and events. For instance, documentId and contextId can be used to help avoid issues that may arise.
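
One way documentId and contextId might help, sketched under assumptions: before acting on a descriptor obtained earlier, query again and only proceed if the same contextId still hosts the same documentId, then use the fresh mutable values (such as windowId) instead of the stale ones. `getContexts` is an injected stand-in for the proposed method and `refreshContext` is a hypothetical helper name. This narrows the window for stale data but cannot close it, since the follow-up action is itself asynchronous.

```javascript
// Hypothetical: re-validate a previously fetched context descriptor.
// Returns the up-to-date descriptor, or null if the context is gone or now
// hosts a different document than the one seen in `snapshot`.
async function refreshContext(getContexts, snapshot) {
  const current = await getContexts();
  const fresh = current.find((ctx) => ctx.contextId === snapshot.contextId);
  if (!fresh || fresh.documentId !== snapshot.documentId) {
    return null;
  }
  return fresh; // callers should read windowId etc. from this object
}
```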

@Rob--W
Member

Rob--W commented Jan 12, 2023

In my mind, I think there are two main benefits to this API:

  1. It provides a canonical way to determine if a given context is active. At the outset, a "context" here would be limited to contexts that commit to the extension origin (tabs, popups, background contexts, offscreen docs, etc), but we'd like to expand this in the future to include contexts like content scripts, as well.

  2. It paves the way (but does not itself directly implement) for different APIs to leverage a way to uniquely identify a context. The most obvious here is message passing (via sendMessage or connect), but there could be others in the future.

(...)
This API is targeted at achieving 1).
(...)

Given that, does this proposal seem reasonable at a high level? I.e., providing an asynchronous method to determine which contexts are active for an extension by providing a bag-of-properties descriptor for each, which can later be used to target specific messages.

This looks reasonable at a high level.

If so, I agree there are other aspects we may want to discuss more. E.g.:

  • Rob called out (and Justin and I were also discussing) that URL is ambiguous. I think we'll probably want to change this to be documentUrl or similar, and have it be undefined for service worker contexts.

Agreed. "url" / "frameUrl" should be renamed to "documentUrl".

(We could potentially introduce a scriptUrl for worker contexts, which could also be set for content script contexts in the future, if we felt that was valuable.)

There may be multiple scripts running in a worker or content script world. The only stable choice would be the first script that is running in it.

We'd probably want to do something similar for origin.

I think that origin should reflect the self.origin value of that execution environment. For documents and workers that's already well-defined in the web platform. In content scripts it's the origin of the document.

So far we've mostly explored the returned properties. I'd also like to examine the input to getContexts. The current specified behavior is to "return all contexts, optionally with some filters".
When expanded to include content scripts, the API's default behavior would be very expensive to run. That issue could be solved by adding a parameter in getContexts() to opt in to including cross-origin results, and default to only return contexts that are same-origin with the extension.

With extension.getViews(), there was an unambiguous way to identify the current context. The current proposal does not have that, but the result can be approximated by the context properties (contextType, documentUrl, etc.). Do we consider that enough, or do we want to introduce an extra parameter to getContexts() (e.g. currentContext: true) or even a runtime.getContextId() method (also available to content scripts?).
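
A sketch of the approximation mentioned above, to show where it falls short. The property names contextType and documentUrl are the ones discussed in this thread; `withoutCurrentContext` is a hypothetical helper, and `current` is a descriptor the calling context would have to build for itself.

```javascript
// Hypothetical: approximate "all contexts except the current one" by dropping
// every descriptor that looks like the current context.
function withoutCurrentContext(contexts, current) {
  return contexts.filter((ctx) =>
    !(ctx.contextType === current.contextType &&
      ctx.documentUrl === current.documentUrl));
}
```

If the same extension page is open in two tabs, both descriptors match the current context and both are dropped, which is the ambiguity that a currentContext option or a runtime.getContextId() method would remove.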

proposals/extension_get_contexts.md

```
extension.ContextProperties = {
tabId? = int, // Find context by tab id, omitted returns all tabs.
```
Member

I suggest rephrasing "returns all tabs": there are contexts that are not tabs. Similarly for "windows".


`sandbox` documents and iframes have an
[opaque origin](https://html.spec.whatwg.org/multipage/browsers.html#concept-origin-opaque),
should we return them? What if they come from within the extension (or outside
Member

IMO: no. Sandboxed documents have a "null" origin and don't have access to extension APIs. They're hardly different from regular sandboxed iframes from the web.

run in a separate
[Renderer](https://developer.chrome.com/blog/inside-browser-part3/) (and
process) from the extension process. This proposal would not return those
`Context`s since this adds significant complexity to the design in querying all
Member

s/design/implementation/.

@rdcronin
Collaborator

Hey folks — just a quick status update here. This has been on the back burner for the last couple weeks, but we still plan on making more progress on it this quarter.

@rdcronin
Collaborator

rdcronin commented Mar 3, 2023

I'm going to be taking over this work for now. Since GitHub doesn't (I think?) support taking over someone else's PR, there's a new PR for this here:

#358

@oliverdunk
Member

Closing this per Devlin's comment.

@oliverdunk oliverdunk closed this Mar 3, 2023