Prototype software using a new generation of hand tracking technology is showing promise for facilitating some forms of sign language learning and communication, regardless of physical distance.
Methods to track the movements of hands and fingers have been pursued for decades, and reliable hand tracking is seen as one of the most challenging problems to solve when it comes to consumer VR. The barriers to translating these movements reliably and affordably are staggering, given the number of potential configurations for your digits and the environmental conditions that can wreak havoc on hardware systems.
Recent implementations like Ultraleap (previously known as Leap Motion), HoloLens, and the Valve Index VR controllers recognize the movement of all five fingers to some degree, with hardware priced to appeal to businesses or consumers. There’s also a supportive community of signers in VRChat who’ve adapted various handshapes to work around the grip of a held object.
Oculus Quest Hand Tracking ASL Test
The latest tests come by way of Facebook’s Oculus Quest, which started accepting store submissions for hand tracking apps at the end of May 2020. The standalone headset doesn’t require any external hardware to work, though that wasn’t entirely true when it originally launched in May 2019, because input then required a pair of Oculus Touch controllers, one held in each hand. Now the headset can be operated with gestures like pinching and pointing, and a handful of apps using the bare-handed input system are available for purchase. A free app designed to teach up to 23 handshapes from German Sign Language’s manual alphabet was also just made available on SideQuest.
Over the last two weeks I also used an experimental piece of software made by Daniel Beauchamp (aka @pushmatrix) to play rock paper scissors and explore the basics of social expression possible with my hands represented in VR. As soon as the video showing our initial interview hit the Internet, a number of people started asking whether it might be useful for signed language conversations. My network of contacts put me in touch with a number of helpful folks, including three who met up in VR this week to test the feature and offer feedback on its current state and future potential.
All three participants know American Sign Language and one is deaf. The deaf participant, Christopher Roe, is represented as a green sphere in the video below. He explained that he hasn’t been in close contact with other deaf people for many years, so his ASL was out of practice.
Based on the test, a number of severe limitations to the current state of the technology on Oculus Quest are clear. Hands held in front of one another can block the view of the headset cameras used to track them, resulting in tracking loss or misrepresentation. Ambient lighting can affect tracking quality, fingers on one hand can’t cross one another, and a number of fundamental handshapes used to represent letters like P, Q, K, M, N, and E were hard to sign or distinguish.
Roe expressed some of his feelings following the virtual meetup. He sent me the following comment:
“Even as rough as it was, it’s awesome. Sure, you could have some kind of group video chat or FaceTime dealie instead, but what VR brings is a sense of actual proximity and presence. That’s something I remember missing a lot in my youth, because I went to a residential school for the deaf in Riverside, CA.
“The student body there came from all over South California, and frequently, students are the only deaf people from their respective hometowns or only know a few deaf people in their hometowns. So, while you’re at school, you’ve got all these friends you see 5 days a week for 16 hours a day, and then you go home…and it’s mostly just family/hearing people/the couple other deaf people you may or may not know in your hometown. I hated weekends, they were boring and lonely and I couldn’t wait for the Sunday bus back to school so I could once again be around people I could actually communicate with and who understood me as a fellow deaf person rather than some broken handicapped object of pity.
“When I graduated, I had a moment of severe anxiety because where do you go from there? There weren’t any prospects for me in my hometown of Desert Hot Springs, and I certainly didn’t know any other deaf people there. I ended up moving to Manhattan Beach and later Torrance to be closer to the friends I’d made in school. This was in the ’90s, cellphones and FaceTime weren’t a thing.
“A fully functioning VR sign language chat system would make the world much smaller and far more comfortable for a lot of deaf people who grew up under similar circumstances. They’d get the feeling of being WITH people, not just signing at a Brady Bunch grid of choppy webcam streams on a tiny screen. Throw in customizable environments and stuff like that, and you’ve got a virtual party venue where deaf people can actually communicate as first citizens rather than struggling with awkward text inputs or being completely left out of spoken conversations because nobody else wants to mess with crappy virtual keyboards either.
“VRChat and Altspace make my communication anxiety peg HARD because I’m trapped in a world full of ambiguous avatars who I can’t lip read or sign to or easily communicate with, so it’s literally a personalized hell for me, and bringing sign language or even a better virtual keyboard that uses hand tracking would be a massive QOL improvement for deaf people in social VR apps.”
Another participant in the test, Shannon Putman, was able to recognize that the third participant, Cy Wise, had learned to sign in Austin, Texas, spotting what was described as a kind of regional accent distinctive to the Austin signing community.
Over the course of their time in the software — less than an hour total — the participants started recognizing where the Quest’s outward-facing cameras would lose the ability to accurately track their movements, which led to better expression by staying within those bounds. They also learned that slowing down their signing could help with expression.
State of Research
The super simple avatar system in the informal test could convey some basic movements of the head — like nodding or shaking — along with a mouth shape driven by the volume of the voice speaking into the headset’s microphone. With no representation of the body, though, and no actual face tracking, there are severe limits on the large parts of signed language expression that rely on those movements, “such as constructed actions, important grammar information in facial expressions and body movements, and the most unique aspect of signed languages, the spatial grammar,” explained PhD researcher Athena Willis.
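To give a concrete sense of how minimal that mouth representation is, here is a rough sketch of the general approach (an assumption on my part, not Beauchamp’s actual code): measure the loudness of each frame of microphone audio and use it to drive a single mouth-open value. All function names and thresholds below are illustrative.

```python
import math

def mouth_openness(samples, floor=0.01, ceiling=0.2):
    """Map one frame of microphone samples (floats in -1..1) to a 0..1 mouth-open value.

    Illustrative only: a volume-driven mouth shape like the one described above
    presumably does something along these lines, with no face tracking involved.
    """
    if not samples:
        return 0.0
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))  # frame loudness
    # Normalize between a noise floor and a "loud speech" level, then clamp to 0..1.
    t = (rms - floor) / (ceiling - floor)
    return max(0.0, min(1.0, t))


class MouthShape:
    """Smooths the per-frame value so the simple mouth shape doesn't flicker."""

    def __init__(self, smoothing=0.8):
        self.smoothing = smoothing
        self.openness = 0.0

    def update(self, samples):
        target = mouth_openness(samples)
        self.openness = self.smoothing * self.openness + (1 - self.smoothing) * target
        return self.openness


# Example: a quiet frame barely opens the mouth, a louder frame opens it further.
mouth = MouthShape()
print(mouth.update([0.02] * 512))
print(mouth.update([0.3] * 512))
```

Even a perfected version of this mapping only signals how loudly someone is speaking; it can’t carry the facial grammar and body movement Willis describes.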
A paper Willis shared with me from 2019’s ACM SIGACCESS Conference on Computers and Accessibility provides an interdisciplinary perspective on the challenges of sign language recognition, generation, and translation, pointing out that “in some cases the movement of the sign has a grammatical function. In particular, the direction of movement in verbs can indicate the subject and object of the sentence. Classifiers represent classes of nouns and verbs – e.g., one handshape in ASL is used for vehicles, another for flat objects, and others for grabbing objects of particular shapes. The vehicle handshape could be combined with a swerving upward movement to mean a vehicle swerving uphill, or a jittery straight movement for driving over gravel. Replacing the handshape could indicate a person walking instead. These handshapes, movements, and locations are not reserved exclusively for classifiers, and can appear in other signs. Recognition software must differentiate between such usages.”
When it comes to the consumerization of VR and AR technology, major platform companies have deep pockets and well-paid research teams working on a wide range of ideas: predictive keyboards that work by touching your thumb to different fingers, facial expression-sensing headset liners or cameras directed at the face, and hyper-realistic full-body avatars.
Sony, for example, recently showed a hand tracking system that seemed to convey a lot of subtle expression.
Facebook also employs some of the world’s leaders in machine learning, which could be critical to filling in gaps in tracking data and enabling better communication via hands even in the face of strict battery power constraints and varying environmental conditions.
In particular, Facebook recently leaked the existence of a prototype headset codenamed “Del Mar” that’s likely a next-generation Quest. Indications found in code suggest it might have cameras that sample the surroundings at a higher rate than the current-generation hardware, which could lead to better hand tracking. In future hardware, infrared light emitted in a pattern toward the area directly in front of the wearer might also enable more robust hand tracking in the absence of visible light.
Tech giants are also competing to lower the cost, weight, and power consumption of future VR and AR systems. Facebook, for example, has shown reluctance to deploy features like eye tracking that could enable huge leaps in comfort and visual detail in VR, but which might not yet be reliable or fast enough to work all the time. At the same time, some of Facebook’s VR headsets, like Oculus Go and Rift S, feature lenses set at a fixed width that fits the average distance between people’s eyes, a limitation that may make the image blurry or less comfortable for people outside the average range.
So there are competing priorities weighing on researchers and designers building next-generation VR (and AR) systems, and it is an open question whether robust expression for signing languages like ASL is near the top of those lists.
Is Sign Language A Priority At Major Tech Companies?
I reached out to Facebook to ask whether it is a goal to robustly support signing languages like ASL (there are hundreds of other signing languages according to the World Federation of the Deaf) in future headsets, or in upcoming software updates to the current Quest. I also asked whether it has any deaf employees working on the hand tracking team. Lastly, I asked if the team responsible for this feature is consulting with members of the Deaf community to consider better support for signing in VR and in Facebook’s forthcoming VR-based social network, Horizon.
Facebook sent over the following prepared statement in response:
“We believe that by enabling people to use their real hands in VR, hand tracking opens up many new capabilities and use cases. We’ve been glad to see how the community has been experimenting with hand tracking to prove out these new possibilities, including several interesting projects related to ASL or sign language…We have an Accessibility Task Force within AR/VR at Facebook that was created by individuals across the business who want to improve the accessibility of our products, software, and content while making them more inclusive of our community, including those who are hearing impaired. There are people with disabilities who participate in this taskforce. No particular product plans to share right now related to sign language, but as we’ve said, we’re very invested in driving new modes of input and social interaction in VR, so we think it’s an area worth exploring.”
While that statement seems generally supportive of these efforts, and it is understandably non-specific so as not to reveal future product plans, the phrasing “worth exploring” doesn’t establish how high a priority Facebook assigns sign language in its future designs.
“When the movies made the switch from silent films to talkies, deaf community lost broad and convenient access to movie theaters until the recent decade,” Willis explained in a direct message. “Now a new generation of technology, especially virtual agents and smart home agents, are threatening to leave signers behind again.”
Beauchamp plans to make the software he created for this test part of a sample scene for Normcore, the networking software development kit he used to build the project for Oculus Quest.