
Need an attribute for direct interaction elements on a touch screen #1215

Open
minorninth opened this issue Mar 18, 2020 · 53 comments

@minorninth

Screen readers with touch screen support typically include a "touch exploration" mode where you can tap or slowly drag around the screen and listen to feedback on what you're touching, before it activates. To actually activate, you double-tap.

There are a few cases where this is undesirable - like a virtual keyboard, or a signature pad. In those cases you want gestures to be passed through directly.

Some native accessibility APIs already have a way to specify this, like UIAccessibilityTraitAllowsDirectInteraction on iOS.

We should have a similar ARIA role or attribute for this.

I think at one point I suggested role="key" for a keyboard key, but I think the concept is more generic. Besides the signature pad, it could also be useful for a musical instrument, a game, or many other things.

It seems related to aria-interactive. The idea behind aria-interactive is that the control has its own keyboard support on a desktop computer; the idea here is that the control has its own touch event support on a mobile touch device. They're quite similar!

So one idea would be: aria-interactive="touch keyboard mouse", etc. where you choose from various tokens.

Or, we could make it separate, like:

role="touch"
role="directtouch"
aria-touch="announce" vs aria-touch="activate"
aria-touchactivate="true"
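
As a rough illustration of what authoring might look like, here's a minimal sketch of a signature pad, assuming one hypothetical spelling (none of these names are standardized, and no browser implements any of them):

```ts
// Hypothetical sketch only: "aria-touchpassthrough" stands in for whichever
// name the WG eventually picks. Element id is made up for the example.
const pad = document.getElementById("signature-pad") as HTMLCanvasElement;
pad.setAttribute("aria-label", "Signature pad");
pad.setAttribute("aria-touchpassthrough", "true"); // proposed: AT passes touches through

const ctx = pad.getContext("2d")!;
pad.addEventListener("pointerdown", (e) => {
  ctx.beginPath();
  ctx.moveTo(e.offsetX, e.offsetY);
});
pad.addEventListener("pointermove", (e) => {
  // With passthrough in effect, a screen-reader user's drag would reach the
  // page directly instead of being consumed by touch exploration.
  if (e.buttons === 1) {
    ctx.lineTo(e.offsetX, e.offsetY);
    ctx.stroke();
  }
});
```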

@jnurthen jnurthen added F2FCandidate Candidate topics for F2F (or Virtual F2F) meeting NeedsExplainer In order to progress this a more detailed explainer needs to be created labels Mar 18, 2020
@cookiecrook
Contributor

cookiecrook commented Aug 14, 2020

IMO, this should not be a role... It would be on a container (with its own role) that may have one or more accessible elements inside the container.

One native example is the real-time instrumentation view of GarageBand on iOS. You can touch once with VoiceOver to explore the layout (e.g. spatial placement of piano keys or drums), then subsequent touches pass through directly to let you play the instrument in real time... Touching outside the container resets the behavior, so you need to select the container again in order to get subsequent real-time touches passed through.

@minorninth
Author

Great example!

Can you provide any more technical details on how GarageBand implements this? Does GarageBand just toggle UIAccessibilityTraitAllowsDirectInteraction after you explore the instrument the first time? Or is UIAccessibilityTraitAllowsDirectInteraction always set and it changes what events it fires? Or is there some more advanced attribute that makes VoiceOver behave that way?

Any ideas for the attribute name?

It seems like great low-hanging fruit, since it'd be relatively straightforward to implement and ship on both iOS and Android without requiring any new native APIs.

@cookiecrook
Contributor

Can you provide any more technical details on how GarageBand implements this?

GarageBand just exposes the trait on the container (and leaves it), and VoiceOver does the rest.

@cookiecrook
Contributor

Here's a video demo that might work as an explainer. https://youtu.be/P056zcubhxQ

@cookiecrook
Contributor

cookiecrook commented Aug 14, 2020

Any ideas for the attribute name?

This interaction style is unlikely to be limited to touch (an eye-tracker pass-through for example), but I don't like any of the other names. aria-pointer? aria-manipulate? I'm hopeful a better name will arise.

aria-touch: undefined | direct | ... (... open ended for future expansion if needed)

We have a VoiceOver design principle of "safe exploration" so users don't accidentally trigger unwanted or unknown behavior. For example, I would still expect VoiceOver and other SRs to announce the element on first touch (e.g. hear "signature" the first time then touch again to sign). I wouldn't want authors to be able to bypass VoiceOver's "touch to explore" behavior without at least the initial selection confirmation.

We should also consider safety restrictions... For example, there's risk that a web dev could put this on the body element and therefore break the VO user's experience for the whole page. However, there might be some legitimate reason for doing that, if the application is entirely self voicing.

@cookiecrook
Contributor

aria-manipulation is growing on me.

Some of this may be complementary to the "activation point" discussion in #788.

@cookiecrook cookiecrook self-assigned this Aug 15, 2020
@cookiecrook cookiecrook added this to the ARIA 1.3 milestone Aug 15, 2020
@carmacleod
Contributor

carmacleod commented Aug 15, 2020

Please also read through the (very up-in-the-air, but has some points) discussion about aria-interactive in #746.
If it is possible to merge the ideas into one concept, then maybe that would be the most universally useful?

What role would that piano keyboard have in a web app? (Heh, "application"? With roledescription="keyboard"?) :)

@cookiecrook
Contributor

What role would that piano keyboard have in a web app?

Probably a container role (main in the case of that specific UI) with individual buttons for each piano key.

@minorninth
Author

I'm not convinced that the overlap with aria-interactive is that high. None of the use cases in the aria-interactive bug would likely need direct touch support.

aria-manipulation is an interesting idea for a name, what would the possible values be?

I think I'm feeling more strongly that either "touch" should be in the name, or the value. This really is specific to touch.

@cookiecrook
Contributor

@minorninth wrote:

aria-manipulation is an interesting idea for a name, what would the possible values be?

Same as above?
aria-manipulation: undefined | direct | ...

Open ended values for future expansion if needed.... For example, iOS VO's keyboard typing modes are somewhat like variants of direct touch.

I think I'm feeling more strongly that either "touch" should be in the name, or the value. This really is specific to touch.

I think we could live with aria-touch, but is it really specific to touch? Electronic document signatures as a use case came up again recently (e.g. DocuSign)… Would use of a stylus or a laptop trackpad count as "touch"?

@cookiecrook
Contributor

Or maybe aria-manipulate?

@cookiecrook
Contributor

@jnurthen @carmacleod this issue has a "NeedsExplainer" label on it. What should that cover that isn't explained in the description? If this thread covers it sufficiently already, I can draft a PR, or we can make it an agenda item on an upcoming weekly call.

@jnurthen
Member

jnurthen commented Dec 9, 2020

@cookiecrook if you think there is enough to draft a PR then please go ahead. I think it would be handy to have a little more detail so AT know what they would need to do with such a feature - but that can certainly be added later.

@minorninth
Author

minorninth commented Dec 9, 2020 via email

@cookiecrook
Contributor

@dlibby- mentioned considering this in the context of CSS touch-action https://developer.mozilla.org/en-US/docs/Web/CSS/touch-action
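
For comparison, CSS touch-action lets authors opt out of the browser's default touch behaviors (panning, pinch-zoom) on an element, but it doesn't change whether an AT like VoiceOver or TalkBack intercepts touches, which is the gap discussed here. A minimal sketch, with a hypothetical element:

```ts
// touch-action controls the browser's gesture handling, not the AT's.
// Setting it to "none" stops panning/zooming on the element, but a screen
// reader's touch exploration still captures the user's touches.
const surface = document.querySelector<HTMLElement>("#draw-surface")!; // hypothetical element
surface.style.touchAction = "none";
```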

@cookiecrook
Contributor

@cookiecrook for the case of DocuSign, can you think of a different behavior that you'd want to enable with any existing AT and a mode other than touch? For example, does VoiceOver on Mac have support for signing on the trackpad or anything like that?

VO on Mac has Trackpad Commander, which still uses touch but works a little differently in that it's not tied to a finite spatial layout like a touch screen... The trackpad coordinates are relative, mapped to the coordinates of the element in the VO cursor, without regard to aspect ratio.

@cookiecrook
Contributor

cookiecrook commented Dec 9, 2020

@minorninth wrote:

@cookiecrook for the case of DocuSign, can you think of a different behavior that you'd want to enable with any existing AT and a mode other than touch?

I thought of one more this morning... Switch Control on iOS has a freehand path feature that is somewhat deep in submenus by default, because its usage isn't common; it's mainly used for drawing apps.

Surfacing the "direct touch" nature of an element would allow the AT to surface those lesser-used AT features more conveniently. For example, the freehand (and multi-touch) options could be moved to a temp space in the main Switch Control menu, similar to how we surface Actions when available.

@cookiecrook
Contributor

cookiecrook commented Dec 9, 2020

I haven't considered fully, but there may be a case for multiple mix-and-match values, and an all value (equivalent to direct). I'm not certain how the implementations would differ though, or if this is necessary.

aria-manipulate: undefined | freehand | multitouch | … | all

@ckundo

ckundo commented Dec 10, 2020

Another use case I wanted to share, maybe an extension of drawing, is dragging and transforming objects in a 2D or 3D canvas context. Ideally the author would have keyboard handling for these kinds of operations as well, but on touch, or for low-vision users using screen readers, it'd be helpful to have this feature.

@fightliteracy

Having a distinction between one-finger-only and multi-finger requirements would probably be useful. If an area requires multiple fingers, other system-wide gestures have to be ignored.

Would freehand correspond to "single finger" mode?

@cookiecrook
Contributor

cookiecrook commented Dec 11, 2020

Having a distinction between one-finger-only and multi-finger requirements would probably be useful. If an area requires multiple fingers, other system-wide gestures have to be ignored.

Good point. Maybe single versus multi-touch is the only distinction that matters... I'm coming back around to Dominic's initial attribute name aria-touch... it's less likely to be misunderstood by web authors. Values might be undefined | single | multiple? multipoint?

Do we anticipate that any element implementing this should be self voicing? All the examples I can think of should "self voice" either through sound (e.g. the piano keys) or speech via an ARIA live region.

Also, draft should include a note reminding web authors to respect touchcancel events.
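
A minimal sketch of what respecting touchcancel means in practice (element id and the note helpers are hypothetical): when the OS or the AT reclaims an in-progress touch, the author should abandon the interaction rather than commit it.

```ts
const keys = document.querySelector<HTMLElement>("#piano")!; // hypothetical element
let activeNote: number | null = null;

// Hypothetical helpers standing in for real audio start/stop logic.
function startNote(t: Touch): number { return Math.floor(t.clientX / 40); }
function stopNote(note: number): void { /* end audio for this note */ }

keys.addEventListener("touchstart", (e) => {
  activeNote = startNote(e.touches[0]);
});
keys.addEventListener("touchend", () => {
  if (activeNote !== null) stopNote(activeNote);
  activeNote = null;
});
keys.addEventListener("touchcancel", () => {
  // The AT or OS reclaimed the gesture: release held state silently,
  // and do not treat the cancelled touch as a completed tap.
  if (activeNote !== null) stopNote(activeNote);
  activeNote = null;
});
```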

@cookiecrook
Contributor

cookiecrook commented Dec 11, 2020

@ckundo wrote:

dragging and transforming objects in a 2D or 3D canvas context.

If object-based (with sub-DOM nodes or TBD AOM virtual nodes), that could be a use case for Issue #762, aka user actions. But yes, canvas would have to be entirely self-implemented with the current API, so a "direct touch" equivalent could assist with that.

@fightliteracy

aria-touchpassthrough would be clearer to me.

Single / multiple then makes sense (to me)

I think most applications would require some form of aria-live feedback, but I can also imagine that something that just needs you to enter a signature might not need any self-voicing/aria-live.

@cookiecrook
Contributor

@minorninth @carmacleod @jnurthen What do you think about touchpassthrough? Verbose, but definitely the most author-understandable suggestion so far.

@minorninth
Author

minorninth commented Dec 11, 2020 via email

@fightliteracy

fightliteracy commented Feb 19, 2021 via email

@minorninth
Author

Would there be a token that could account for all input passthroughs?

You mean, like aria-passthrough="all"?

I guess my question is, are there any platforms where mouse clicks are not currently passed through, even when a screen reader is running? I thought that only touch inputs were captured by the screen reader. I'm not as sure about stylus.

I think we should leave open the possibility of "mouse" or "all", but not actually specify something that we couldn't implement in practice yet, or that wouldn't actually be helpful or useful in practice.

@patrickhlauke
Member

Maybe describe more what the characteristic of the element is, rather than what the AT/UA should do? Maybe something like aria-allowsdirectmanipulation="true" (I've been toying with calling this sort of thing "direct manipulation" over in the Pointer Events spec, FWIW... an imperfect name still, but a bit more generic).

@minorninth
Author

One worry I'd have about that is that someone could plausibly argue that a slider supports direct manipulation. But in practice, what users might actually want is a way to set the slider to a discrete value in a well-controlled way - which is not necessarily via direct manipulation. Adding this attribute to a slider could actually make it less accessible because it'd be incredibly difficult for some users to focus it without accidentally modifying it.

So we only want this to apply to something that requires direct manipulation, not just something that supports it.

It is somewhat related to aria-interactive. One problem we were trying to solve there is that Windows screen readers are modal and often intercept keys, but some controls need all keys passed through. role=application is an imperfect solution; what's really needed sometimes is for a screen reader to treat the control similarly to a text box or list box, where most keys still go to that control while it's focused, and the screen reader automatically enters "focus" mode / forms mode when it's focused.

I still think they're similar but not the same thing, but I'm open.

If we wanted to combine them, we could say:

aria-interactive="touch mouse keyboard", which would mean that touch, mouse, and keyboard events should not be intercepted by AT if at all possible and should be passed through to this element when it would be the event target.

Another difference, though, is that "touch" could be implemented now by user agents because many platforms already support some way to achieve touch passthrough, whereas aria-interactive=keyboard would require screen readers to buy in and choose to implement it.

@patrickhlauke
Member

So we only want this to apply to something that requires direct manipulation, not just something that supports it.

Yeah, it would be opt-in with the attribute. Maybe dropping the "s" on "allows" and making it aria-allowdirectmanipulation?

@minorninth
Author

To be more concise, how about:

aria-directinput

So possible values might be:

aria-directinput="touch"
aria-directinput="keyboard"
aria-directinput="touch mouse keyboard"

@patrickhlauke
Member

Does it need to specify any input mechanism at all? (If so, there's also pen.)
As an aside, for Windows-based AT, this starts to sound a lot like role="application"'s effect for keyboard control.

@fightliteracy

fightliteracy commented Feb 20, 2021 via email

@fightliteracy

fightliteracy commented Feb 20, 2021 via email

@patrickhlauke
Member

A difference is that this is usually used on a small region that’s not meant to impact the rest of the screen.

you'd likely want the same here as well, as otherwise you'd get in the way of users actually being able to confidently use touch-AT gestures?

@pkra pkra modified the milestones: ARIA 1.3, ARIA 1.4 Jan 10, 2022
@pkra
Member

pkra commented Jan 10, 2022

Moving this issue to the 1.4 milestone (matching #1319)

mjfroman pushed a commit to mjfroman/moz-libwebrtc-third-party that referenced this issue Oct 14, 2022
This is an experimental proposed new ARIA attribute that would allow
users to directly interact with something like a virtual keyboard or
signature pad via a direct tap or drag on a touch screen, rather than
the default indirect interaction via touch exploration.

Spec bug: w3c/aria#1215
Design doc: go/aria-touchpassthrough

This change just adds a runtime enabled feature flag and maps the ARIA
attribute to an AXNodeData boolean attribute. Follow-ups will complete
the implementation for Android and Chrome OS.

Bug: 1169823
AX-Relnotes: N/A (behind a flag)
Change-Id: I19777695eb27a6602e4d70fdccf79a0adf8d1139
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2645214
Reviewed-by: Kent Tamura <tkent@chromium.org>
Reviewed-by: Chris Hall <chrishall@chromium.org>
Reviewed-by: Mike West <mkwst@chromium.org>
Reviewed-by: Meredith Lane <meredithl@chromium.org>
Commit-Queue: Mike West <mkwst@chromium.org>
Auto-Submit: Dominic Mazzoni <dmazzoni@chromium.org>
Cr-Commit-Position: refs/heads/master@{#848068}
GitOrigin-RevId: 25d18060e1bab22e6547fb664c15824fc8d0f704
@cookiecrook
Contributor

This stalled a bit, but in general I think aria-directinput and aria-passthrough are good names. Slight preference for "directinput" since my hunch is that it's less prone to typographical errors. To address the comment about how to include all the variants, the WG could consider a catch-all token like all or true:

aria-directinput: [none] | all | touch | mouse | keyboard | …

@cookiecrook
Contributor

The more I look at this, the more I think it's unlikely web authors will get the modality-specific values right, especially a catch-all. Let's reconsider whether we need this now, or if we do, consider a more general value.

aria-directinput: [ undefined | direct ]

@jnurthen jnurthen removed the F2FCandidate Candidate topics for F2F (or Virtual F2F) meeting label Mar 2, 2023
aarongable pushed a commit to chromium/chromium that referenced this issue Jun 1, 2023
This CL removes the aria-touchpassthrough logic, which did not make it
into the aria spec.

The touchpassthrough concept is interesting, but the traction on it has
stalled, and implementation details are still being debated. The logic
added in Android created a regression, and since it is not being used
anywhere, we have decided to remove it rather than trying to make it
fit into the current framework when we don't even know the final spec
for the attribute, if there ever is one.

aria ticket: w3c/aria#1215


AX-Relnotes: N/A
Bug: 1430202, b/265493191
Change-Id: I1d82ea384761db271abead5796171e7987ba9c76
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/4544402
Reviewed-by: Aaron Leventhal <aleventhal@chromium.org>
Reviewed-by: David Tseng <dtseng@chromium.org>
Code-Coverage: Findit <findit-for-me@appspot.gserviceaccount.com>
Reviewed-by: Bo Liu <boliu@chromium.org>
Reviewed-by: Mustaq Ahmed <mustaq@chromium.org>
Reviewed-by: Theresa Sullivan <twellington@chromium.org>
Reviewed-by: Nico Weber <thakis@chromium.org>
Commit-Queue: Mark Schillaci <mschillaci@google.com>
Reviewed-by: Will Harris <wfh@chromium.org>
Cr-Commit-Position: refs/heads/main@{#1151986}
@frastlin

frastlin commented Aug 8, 2023

Hello,
We currently need to create an entire set of mobile apps just to add direct touch, which pretty much defeats the point of using the browser.
I think role="application" should require mobile screen readers to offer the direct touch option, or, when you tap on an area with role="application", it should activate direct touch, with a screen-reader-specific way of exiting the application. I really think VO and TalkBack should allow the user to enter direct touch or pass through gestures whenever they want, but that's unrelated.
I don't think aria-passthrough is needed on top of role="application"; there's no reason for both to exist. I don't think there's a need for half of the gestures to work with a screen reader and the other half not to work. As long as it's easy to switch in and out of the direct input mode, it would be like the input mode that's activated on desktop.
Using role="application" also doesn't require the API to change, and should be a pretty easy fix for browser developers to implement. In fact, the documentation on role="application" mentions touch, but iOS, as of today, doesn't provide direct touch as an option in the rotor when an element in the application container is focused. Here is the description from MDN: "The application document structure role, indicates to assistive technologies that this part of the web content contains elements that do not conform to any other known HTML element or WAI-ARIA widget. Any sort of special interpretation of HTML structures and widgets should be suspended, and control should be completely handed over to the browser and web application to handle mouse, keyboard, or touch interaction."
It would be useful to reiterate that role="application" also includes touch on mobile devices.

@jnurthen jnurthen added the feature may add new concept(s) to ARIA which will require implementations or APG changes label Sep 12, 2023
@cookiecrook
Contributor

cookiecrook commented Sep 21, 2023

We are currently needing to create an entire set of mobile apps just to add direct touch, which pretty much defeats the point of using the browser.

@frastlin Can you explain more about this use case? I understand the technical need (and gave a music app as a user interface example above), but I think it would help if you could explain the user interface need you are abstractly referencing… potentially with pointers to the specific apps, if appropriate.

@frastlin

frastlin commented Sep 21, 2023

We are making an inclusive digital map viewer that's accessible to blind users. Currently it works with a Bluetooth keyboard attached to an iPhone, but we can't activate direct touch with VoiceOver (VO) on iOS to pass touch gestures to our app:
https://audiom.net
We would like to embed this component into iOS apps through a web view, but if we can't activate direct touch, we're going to need to make a special native app for each mobile platform just to interact with touch screens.
There are also browser games that have done different work-arounds to bypass the browser's lack of flexibility with touch gestures and VO:
https://www.iamtalon.me/cyclepath/
They use the gyroscope and accelerometer instead of the touchscreen for the actual game play, which is fine, but is not what someone wants when they're viewing a map. We've done co-designs with blind users and they want to just be able to tap in a direction and move in that direction (which VO doesn't allow).
Similarly, Google sheets and Google Docs are limited in their usefulness on touch devices for VO users because the VO experience for HTML tables and rich text areas is horrendous, and there's nothing developers can do to make the experience better through custom gestures.

Numerous applications built with native Swift or Objective-C code on iOS have direct touch, including:
The Invisible Puzzle
All The OBJECTIVE ED Games
All the Blindfold Games
MBraille
Ariadne GPS
etc. Let me know if you need more, these are just the apps I can think of that are on my phone.

@frastlin

Hello,
Note that with NVDA, there is also no touch passthrough by default. To get touch passthrough, the user needs to go into Preferences / Settings / Touch Interaction and uncheck "Enable touch interaction support". There is no NVDA touch command to enable or disable this interaction. The user also needs to go into touch settings on Windows and uncheck 3- and 4-finger touch gestures.
As a developer and user, I would assume all of this would be done when I enter an application area where all keyboard input is being sent to the application. Why would I think touch gestures receive special treatment?
I haven't tried JAWS yet, but I would assume it would be equal to or worse than NVDA for this.

@zphrs

zphrs commented Feb 27, 2024

Hi!
I'm currently working on creating a universally accessible digital deck of cards using WebGL, the gyroscope, voice synthesis, and touch gestures. Overall after reading through this thread I agree with Dominic's proposal:

To be more concise, how about:

aria-directinput

So possible values might be:

aria-directinput="touch"
aria-directinput="keyboard"
aria-directinput="touch mouse keyboard"

I think that having a catch-all would lead to developers overstating their app's abilities, resulting in apps which claim support for "all" but might be missing support for certain input types. I also think that, with the addition of new forms of interaction to the web (like gamepad controllers and Apple's eye tracking), it would make sense for developers to manually opt into any interactions their app supports, encouraging developers to think about whether their app actually properly supports that gesture type. A "catch-some" value like aria-directinput="pointer", which would enable passthrough for all pointer events, could allow an opt-in for touch, mouse, and pen without the developer having to specify all three.

I also agree that one-finger gesture support is plenty to allow applications to do what they need to do. This would also mean that if a developer supports touch then they also essentially support mouse, pen, and any other single-pointer interface as well. One UI pattern that I could think of which would allow more gesture-based quick actions would be a press and drag to open a flower menu, where dragging outward from the initial touchpoint toward one of the 8 cardinal directions would select one of the options in the flower menu. A press and hold without a drag could activate the menu and read out the options as well as the corresponding direction which it is located at.

For my application, for now, I'm just going to recommend that people disable their accessibility tool before starting the app, and ensure that I manually call dictation when needed. Ideally, though, I would use this new API to avoid having to tell users to disable their accessibility settings, even temporarily.

@frastlin

"I also agree that one-finger gesture support is plenty to allow applications to do what they need to do."
I'm a Voice Over user and can tell you this is absolutely not the case and should never be considered. The point of having this direct input functionality is to override the existing Voice Over gestures e.g., 2 finger double tap, 3 finger single tap, swipe to the right with 1 finger and hold, and 4 finger triple tap. It's really difficult for a Voice Over user to use a touchscreen application with only 1 finger. It would be like having a single fingertip sized hole you can see the app interface through on the screen. You would need to move that fingertip around the entire screen every time to find the next control. That would take hours.

The problem with telling a VoiceOver user to disable their screen reader is that the browser has all kinds of junk (bookmarks, tabs, the address bar) on the top and bottom of the screen, and the user will tap on these junk areas without meaning to, which will take them out of the app.

Our current workaround is to create an app with a native WebView with direct touch enabled for the view, and enter a URL into the web view. This allows both direct touch and non-direct touch within the web view, but it's a bit janky: in order to exit direct touch, the user needs to start activating the home screen by swiping up from the bottom, then, before lifting their finger, start using the rotor to deactivate direct touch. VoiceOver doesn't have a way for users to easily stop direct touch. On the keyboard, Caps Lock is used as the universal screen reader key, where the user can activate or deactivate direct keyboard input to games and other applications.

@zphrs

zphrs commented Feb 27, 2024

"I also agree that one-finger gesture support is plenty to allow applications to do what they need to do." I'm a Voice Over user and can tell you this is absolutely not the case and should never be considered. The point of having this direct input functionality is to override the existing Voice Over gestures e.g., 2 finger double tap, 3 finger single tap, swipe to the right with 1 finger and hold, and 4 finger triple tap. It's really difficult for a Voice Over user to use a touchscreen application with only 1 finger. It would be like having a single fingertip sized hole you can see the app interface through on the screen. You would need to move that fingertip around the entire screen every time to find the next control. That would take hours.

To address this point I proposed a flower-like UI element where essentially the UI moves to wherever you touch. Maybe I didn't explain that point enough. To enable the flower you tap and hold anywhere and a voice will read out commands associated with gestures. For my card game for instance, it might read out:
"Up - hear board, Down - hear hand, Left - flip card, Right - open settings. Drag your finger against the screen in one of the directions.".
Then if you drag your finger in a direction - maybe up - it will say:
"hear board - release finger to confirm".
Additionally using other input such as gyroscope and keyboard inputs can further improve degrees of freedom in input, as you stated above.
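
A rough sketch of that flower-menu idea (the element ids and the speak() helper are made up; speak() stands in for updating an aria-live region, and a real version would gate the menu on a hold timeout):

```ts
type Direction = "up" | "down" | "left" | "right";
const options: Record<Direction, string> = {
  up: "hear board", down: "hear hand", left: "flip card", right: "open settings",
};

const surface = document.querySelector<HTMLElement>("#game")!;      // hypothetical element
const live = document.querySelector<HTMLElement>("#announcer")!;    // hypothetical aria-live="assertive" region
const speak = (text: string) => { live.textContent = text; };

// Snap a drag vector to whichever of the four directions dominates.
function directionOf(dx: number, dy: number): Direction {
  if (Math.abs(dx) > Math.abs(dy)) return dx > 0 ? "right" : "left";
  return dy > 0 ? "down" : "up";
}

let start: { x: number; y: number } | null = null;
surface.addEventListener("pointerdown", (e) => {
  start = { x: e.clientX, y: e.clientY };
  // Press anywhere reads out the menu of direction-to-command mappings.
  speak(Object.entries(options).map(([d, o]) => `${d} - ${o}`).join(", "));
});
surface.addEventListener("pointerup", (e) => {
  if (!start) return;
  const dx = e.clientX - start.x;
  const dy = e.clientY - start.y;
  // Releasing after a deliberate drag confirms the selected option.
  if (Math.hypot(dx, dy) > 30) speak(`${options[directionOf(dx, dy)]} selected`);
  start = null;
});
```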

I agree that ideally web developers would have full access to multi-touch, but I worry that could disorient users of assistive technology without very careful effort on the developer's side to explain how else to tab away from the element with direct input enabled (such as swiping from the bottom). One similar case is where CodeMirror gives the option to override the Tab key's behavior in a code editor so the user can hit Tab to indent. To ensure this isn't an accessibility nightmare, the developer must add text to the page instructing the user how to tab away from the text input, typically by pressing the Escape key immediately followed by pressing either Tab or Shift+Tab.

The problem with telling a Voice Over user to disable their screen reader is that the browser has all kinds of junk (Bookmarks, tabs, the address bar) on the top and bottom of the screen, and the user will tap on these junk areas without meaning to which will take them out of the app.

That's a great point and it's why I made my app a Progressive Web App which means you can add it to your home screen to remove all of the browser UI elements from the interface. I can still see the annoyances with not having notifications read out while playing the game and with the swipe up from bottom gesture (to exit the app) being more sensitive. Like I said, ideally direct input will get merged into the browser and I would be able to use this API.

Our current work-around is to create an app with a native WebView with direct touch enabled for the view, and enter a URL into the web view. This allows both direct touch and non-direct touch within the web view, but it's a bit janky, as in order to exit out of direct touch, the user needs to start activating the home screen by swiping up from the bottom, then before lifting their finger, they need to start using the rotor to deactivate direct touch. Voice Over doesn't have a way for users to easily stop direct touch. On the keyboard capslock is used as the universal screen reader key where the user can activate or deactivate direct keyboard input to games and other applications.

It's awesome that you figured out a solution which works for your use case by making a native app to wrap a web view. Unfortunately, I have ideological and logistical reasons as to why that solution doesn't work for me. I love developing on the web because whatever I make is intensely accessible. There is no app install process necessary (aside from optionally adding it to your home screen) and there is no third party store which imposes certain rules, restrictions, and approval processes. All I need to do as a developer is upload my code to a CDN and now my app is accessible to anyone who is able to visit my website. Separately the restrictions that the web browser imposes on apps with sandboxing means that users, including me, generally trust visiting a website way more than downloading an app.

I think that the cautious flexibility of the web is great and I think that if you need more control than what the browser gives you then you should create a dedicated app. I think that breaking built in VoiceOver multi-touch commands is not worth having multi-touch available to web developers. I totally understand that a map app usually has some form of multi-touch for zooming and panning, but I would argue that a map app should also probably support single-touch gestures, just in case you don't have two fingers free while navigating the world. Apple Maps for instance uses a tap quickly followed by a drag to support zooming in and out with one finger. I have been trying to think of places where multi-touch is absolutely a must-have and I've been generally drawing a blank. If you have an example of a feature that would require custom multi-touch for intuitive interaction in your map app please by all means do share, I would love to hear about it.

@frastlin

There are no apps that only require one finger to use with VoiceOver. A basic news app has a 4-finger single tap to get to the top of the screen, a 2-finger swipe down to read content, a 2-finger single tap to stop speech, and the rotor (rotating with 2 fingers) to adjust your reading level. Obviously, if you're asking VoiceOver to pass through its gestures, it is on you to provide the functionality the VoiceOver user needs to use your app. This feature should only be used for non-semantic interfaces, and VoiceOver users should always have a way to exit direct touch.
I think an all-or-nothing experience is very much desired for games, like your card app. Of course you're going to need a guide, like the other 829 audio games have, to help people learn your gestures, but that's normal and expected. If you want that one-finger-swipe functionality for your game, that's great. Our map has over 60 key commands, and power users are going to want access to those commands through custom gestures on the interface. Someone can slowly use our interface with one finger, but it's slow, and people who use it frequently are going to want custom gestures. I even have blind power users who hate the fact that WCAG requires focus to be at the top of the page when you open dialogs. They want minimal keystrokes to do the most things as fast as possible.
As long as VoiceOver users can exit direct touch mode at any time, there should be no problem with passing through almost all gestures to an app. This is what happens when you focus a button in the browser now: all keystrokes get passed to that button. This is perfectly fine, and our team of blind developers and testers wants a way to do the same for touch gestures.

@zphrs

zphrs commented Feb 28, 2024

Thank you for explaining the demand, the convenience, and the normality of supporting multi-touch inputs. Now I totally get where you're coming from, and I support having multi-touch as part of the spec.

I guess my question then is: what default gesture could be a good alternative to "Tab" for moving focus away from the element on touch devices? Naturally, tapping with one finger elsewhere on the display would work well for touch areas that do not take up the whole display, and I assume the typical Tab key will suffice on keyboard devices to tab away from the direct input, unless of course that Tab key is listened to and preventDefault() is called on that keypress event, in which case the developer would need to call blur() on the directinput element to manually reimplement that tab-away behavior.

You mentioned that in your app you use a swipe from the bottom combined with a rotor gesture. I guess another question I have for you is whether that secondary rotor gesture is also really needed to release input from a usability standpoint or whether just a swipe up could suffice to break the user out of that touch area.

I recognize that ultimately the implementation will come down to how Android and iOS handle it, but I think a good option is that whatever the typical "go home" gesture on the touch device is (in addition to the Tab input for touch devices with a keyboard) will deactivate the direct input and resume treating inputs with the typical browser behavior. I also assume that developers could add additional triggers to re-enable typical pointer input by calling focus() on a different element or calling blur() on the directinput element. I also assume that, similar to the keyboard, developers could call preventDefault() on the touch/pointer event if they want to prevent the default deactivation triggered by the typical "go home" swipe.
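
A speculative sketch of the author-side piece of this (the attribute is not implemented anywhere; the element id and attribute name are placeholders): the element opts in, and blur() is the escape hatch back to normal AT handling, mirroring the CodeMirror Escape-then-Tab convention mentioned above.

```ts
const widget = document.querySelector<HTMLElement>("#direct-area")!; // hypothetical element
widget.tabIndex = 0;
widget.setAttribute("aria-directinput", "touch"); // proposed attribute, not shipped anywhere

widget.addEventListener("keydown", (e) => {
  if (e.key === "Escape") {
    // Release direct input so the next Tab (or AT gesture) moves focus away
    // instead of being captured by the widget.
    widget.blur();
  }
});
```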

What are your thoughts on this proposal @frastlin, and is there a better option that you can think of for exiting the directinput mode?

@frastlin

Entering and exiting direct touch should really be a screen reader's responsibility, but yes, I agree that the "go home" command would be a good gesture for exiting direct touch mode. Direct touch currently allows this already, but it will minimize the application. To enter direct touch, the user would just activate it from the rotor like normal.
Direct touch needs to be the entire screen; it can't be just one area, which is why it's important to have an easy way of entering and exiting direct touch. There's no way for the blind user to know where the direct touch area is, especially since they're probably swiping right to the element instead of finding it on the screen. All audio games use the entire screen to handle gestures, so again, it's expected that direct touch will send all gestures to the target element, no matter where on the screen they're performed.
