Multimodal Appearance-based Gaze-Controlled Virtual Keyboard with Synchronous-Asynchronous Interaction for Low-Resource Settings

Salvi, Manish

doi:10.1109/SMC58881.2025.11342549


Source

IEEE International Conference on Systems, Man, and Cybernetics (SMC 2025)

ISSN

1062-922X

Date Issued

2025-01-01

Author(s)

Meena, Yogesh Kumar

Salvi, Manish

DOI

10.1109/SMC58881.2025.11342549

Abstract

Over the past decade, demand for communication devices has increased among individuals with mobility and speech impairments. Eye-gaze tracking has emerged as a promising solution for hands-free communication; however, traditional appearance-based interfaces often suffer from limited accuracy, involuntary eye movements, and difficulty handling large command sets. This work presents a multimodal appearance-based gaze-controlled virtual keyboard that utilises deep learning with standard camera hardware and supports both synchronous and asynchronous modes for command selection. The virtual keyboard application uses menu-based selection with nine commands, enabling users to spell and type up to 56 English characters, including uppercase and lowercase letters, punctuation, and a delete function for corrections. The proposed system was evaluated with twenty able-bodied participants who completed specially designed typing tasks using three input modalities: (i) a mouse, (ii) an eye-tracker, and (iii) an unmodified webcam. Typing performance was measured in terms of speed and information transfer rate (ITR) at both the command and letter levels. Average typing speeds were 18.3±5.31 letters/min (mouse), 12.60±2.99 letters/min (eye-tracker, synchronous), 10.94±1.89 letters/min (webcam, synchronous), 11.15±2.90 letters/min (eye-tracker, asynchronous), and 7.86±1.69 letters/min (webcam, asynchronous). With the webcam in synchronous mode, ITRs were 80.29±15.72 bits/min at the command level and 63.56±11 bits/min at the letter level. The system demonstrated good usability and low workload with webcam input, highlighting its user-centred design and promise as an accessible communication tool in low-resource settings.
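The abstract reports ITR at the command and letter levels but does not state the formula used. A common choice for selection-based interfaces is the Wolpaw definition, where the bits per selection for N equiprobable targets at accuracy P are B = log2 N + P log2 P + (1 - P) log2((1 - P)/(N - 1)), and ITR is B multiplied by selections per minute. The minimal sketch below assumes that definition; the function names are hypothetical, and whether the authors used exactly this formula, and which accuracy P they applied, is an assumption rather than something stated in the record.

```python
import math


def bits_per_selection(n: int, p: float) -> float:
    """Wolpaw bits per selection for n equiprobable targets at accuracy p.

    Assumption: the Wolpaw ITR definition; the record does not name its formula.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    if p <= 1.0 / n:
        # At or below chance accuracy no information is transferred; clamp to 0.
        return 0.0
    bits = math.log2(n)
    if p < 1.0:
        # Penalty terms for imperfect accuracy (both terms are <= 0).
        bits += p * math.log2(p) + (1.0 - p) * math.log2((1.0 - p) / (n - 1))
    return bits


def itr_bits_per_min(n: int, p: float, selections_per_min: float) -> float:
    """ITR in bits/min: bits per selection times selection rate."""
    return bits_per_selection(n, p) * selections_per_min


if __name__ == "__main__":
    # Letter level: 56 characters, assuming error-free output (p = 1.0),
    # at the webcam synchronous mean rate of 10.94 letters/min.
    print(f"{itr_bits_per_min(56, 1.0, 10.94):.2f} bits/min")
```

Under these assumptions, log2(56) × 10.94 gives about 63.53 bits/min, close to the reported letter-level value of 63.56 bits/min. A small gap would be expected if the reported figure is a mean of per-participant ITRs rather than the ITR of the mean speed, but that too is an assumption about the analysis.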


URI

https://repository.iitgn.ac.in/handle/IITG2025/34944