Users spontaneously suggested gesture controls, and a
remote control built into a smartphone app could give
smartphone users immediate and familiar controls.
Since the gateway can accept multiple input
mechanisms for a media player, a smartphone app
receiving gesture controls could be connected at the
same time as Alexa. Media discovery commands (such
as “Alexa, Ask ImAc to list content”), which are
suited to a voice control interface, could then go
through Alexa, while media control inputs (such as
play, pause, skip forward and skip backward) come
from a smartphone app or remote control.
Using a gateway architecture provides an abstraction
layer that allows personalised control interfaces or
access technology to be used without the media player
needing to be aware of the precise control mechanism.
This includes, but is not limited to, joystick
control, sip-and-puff systems, eye gaze, single-button
interfaces, sign language or basic manual gestures,
or even EEG (brainwave detection). It also allows
future technologies to be developed and used to
control a media player that has no knowledge of them.
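The separation between input mechanism and media player described above can be illustrated with a minimal sketch. It is not the ImAc implementation: the class names, the adapter functions, the gesture names and the command vocabulary are all assumptions made for illustration. The point is only that each input source is translated into a shared command set before it reaches the player, so the player never sees where a command came from.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical command vocabulary; the text names play, pause, skip and list only.
VALID_COMMANDS = {"play", "pause", "skip_forward", "skip_backward", "list_content"}


@dataclass
class MediaPlayer:
    """Stand-in player that simply records the commands it receives."""
    log: List[str] = field(default_factory=list)

    def handle(self, command: str) -> None:
        self.log.append(command)


class Gateway:
    """Translates source-specific input events into player commands."""

    def __init__(self, player: MediaPlayer) -> None:
        self.player = player
        self.adapters: Dict[str, Callable[[str], str]] = {}

    def register(self, source: str, adapter: Callable[[str], str]) -> None:
        # Any number of input mechanisms can be connected at the same time.
        self.adapters[source] = adapter

    def submit(self, source: str, raw_event: str) -> bool:
        adapter = self.adapters.get(source)
        if adapter is None:
            return False  # unknown input mechanism
        command = adapter(raw_event)
        if command not in VALID_COMMANDS:
            return False  # event has no meaning for the player
        self.player.handle(command)  # the player sees only the command
        return True


def voice_adapter(utterance: str) -> str:
    """Hypothetical voice adapter handling one discovery phrase."""
    return "list_content" if "list content" in utterance.lower() else ""


def gesture_adapter(gesture: str) -> str:
    """Hypothetical mapping from smartphone gestures to media controls."""
    mapping = {
        "tap": "play",
        "double_tap": "pause",
        "swipe_right": "skip_forward",
        "swipe_left": "skip_backward",
    }
    return mapping.get(gesture, "")


player = MediaPlayer()
gateway = Gateway(player)
gateway.register("alexa", voice_adapter)
gateway.register("phone", gesture_adapter)
gateway.submit("alexa", "Alexa, Ask ImAc to list content")
gateway.submit("phone", "tap")
```

Under these assumptions, supporting a new access technology such as a sip-and-puff switch would mean registering one more adapter, with no change to the player.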
This abstraction layer also provides other options.
Compound controls could use a single input trigger
from the user to command multiple devices. For
example, when the main content is played, lights
could be dimmed, phones muted, and access services
downloaded and streamed synchronously from a
companion device.
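A compound control of this kind could be sketched as a table that maps one trigger to an ordered list of device actions. The device names and action descriptions below are hypothetical placeholders, and the actions only record what they would do rather than contacting real devices.

```python
from typing import Callable, Dict, List

# Records what each device action would have done (illustration only).
actions_taken: List[str] = []


def make_action(description: str) -> Callable[[], None]:
    """Build a stand-in device action that logs its description."""
    def action() -> None:
        actions_taken.append(description)
    return action


# Hypothetical compound control: one trigger fans out to several devices.
COMPOUND_CONTROLS: Dict[str, List[Callable[[], None]]] = {
    "play": [
        make_action("player: play main content"),
        make_action("lights: dim"),
        make_action("phone: mute"),
        make_action("companion device: stream access services"),
    ],
}


def trigger(name: str) -> int:
    """Run every action bound to a trigger; return how many were run."""
    steps = COMPOUND_CONTROLS.get(name, [])
    for step in steps:
        step()
    return len(steps)


count = trigger("play")
```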
Machine-to-machine interactions could allow trusted
devices such as phones or doorbells to pause the main
content.
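As a rough sketch of such a machine-to-machine rule, the gateway could hold a list of trusted devices and the events that are allowed to interrupt playback. The device identifiers and event names here are assumptions chosen for the example.

```python
# Hypothetical identifiers for devices the user has marked as trusted.
TRUSTED_DEVICES = {"doorbell", "phone"}
# Hypothetical events that should interrupt playback.
INTERRUPTING_EVENTS = {"ring", "incoming_call"}


class Player:
    """Stand-in media player holding only a playback state."""

    def __init__(self) -> None:
        self.state = "playing"

    def pause(self) -> None:
        self.state = "paused"


def on_device_event(player: Player, device_id: str, event: str) -> bool:
    """Pause playback if a trusted device reports an interrupting event."""
    if device_id in TRUSTED_DEVICES and event in INTERRUPTING_EVENTS:
        player.pause()
        return True
    return False
```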
Interpreted commands could enable people to watch
content personalised to them. Access services (such
as subtitles or AD) could be always enabled or
disabled depending on the user; access controls (such
as age restrictions) could be put in place; and the
device on which the output is shown (TV, tablet or VR
headset) could depend on which is closest or on the
user's preference.
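One way to picture an interpreted command is a function that combines a generic "play" request with a stored user profile. The profile fields, the age check and the device selection rule below are illustrative assumptions, not a description of an existing ImAc feature.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class UserProfile:
    """Hypothetical per-user preferences held by the gateway."""
    name: str
    age: int
    subtitles: bool = False
    audio_description: bool = False
    preferred_device: Optional[str] = None


@dataclass
class PlayRequest:
    content_id: str
    min_age: int
    nearby_devices: List[str]  # assumed to be ordered nearest first


def interpret_play(user: UserProfile, request: PlayRequest) -> Dict[str, object]:
    """Turn a generic play request into a personalised playback plan."""
    if user.age < request.min_age:
        return {"allowed": False, "reason": "age restriction"}
    if user.preferred_device in request.nearby_devices:
        device = user.preferred_device  # user preference wins if available
    elif request.nearby_devices:
        device = request.nearby_devices[0]  # otherwise the closest device
    else:
        device = None
    return {
        "allowed": True,
        "content_id": request.content_id,
        "device": device,
        "subtitles": user.subtitles,
        "audio_description": user.audio_description,
    }
```

The same spoken or gestured "play" would therefore produce different results for different users, without the input device or the player holding any profile data.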
The idea of such an abstraction layer is in line with
work being done at the W3C to enable a Web of Things
(WoT)¹³. A Thing Description (TD)¹⁴, which details the
API of the media player, would be published either by
the media player or by the gateway on its behalf.
Authenticated WoT-aware controllers or device chains
could then send commands to the media player via the
gateway. The gateway and media player do not need to
know where or how the command originated, only that
the command is valid and authorised.
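To make this concrete, the sketch below builds a small Thing Description for a media player and checks an incoming command against it. The top-level members (`@context`, `title`, `securityDefinitions`, `security`, `actions`, `forms`) follow the W3C TD vocabulary; the title, action names, `gateway.example` URLs and token check are hypothetical and stand in for whatever the gateway would actually publish and enforce.

```python
import json

# Hypothetical Thing Description for the media player, expressed as a dict.
THING_DESCRIPTION = {
    "@context": "https://www.w3.org/2019/wot/td/v1",
    "title": "ImAcMediaPlayer",  # assumed name
    "securityDefinitions": {"bearer_sc": {"scheme": "bearer"}},
    "security": ["bearer_sc"],
    "actions": {
        name: {
            "forms": [
                # Placeholder endpoint exposed by the gateway.
                {"href": f"https://gateway.example/player/actions/{name}"}
            ]
        }
        for name in ("play", "pause", "skipForward", "skipBackward")
    },
}

# Stand-in for real authentication; tokens here are illustrative only.
AUTHORISED_TOKENS = {"demo-token"}


def accept(action: str, token: str) -> bool:
    """Accept a command only if it is described in the TD and authorised.

    The origin of the command is deliberately not part of the decision.
    """
    return action in THING_DESCRIPTION["actions"] and token in AUTHORISED_TOKENS


# Serialised form that the player or gateway could publish.
td_json = json.dumps(THING_DESCRIPTION, indent=2)
```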
                                                           
¹³ https://www.w3.org/WoT/
ACKNOWLEDGEMENTS 
This work has been conducted as part of the ImAc
project, which has received funding from the European
Union’s Horizon 2020 research and innovation
programme under grant agreement 761974.
REFERENCES 
Agulló, B., Montagud, M. and Fraile, I. (2019). Making
interaction with virtual reality accessible: rendering
and guiding methods for subtitles. AI EDAM (Artificial
Intelligence for Engineering Design, Analysis and
Manufacturing). doi: 10.1017/S0890060419000362
Agulló, B. and Matamala, A. (2019). The challenge of
subtitling for the deaf and hard-of-hearing in
immersive environments: results from a focus group.
The Journal of Specialised Translation 32, 217–235.
http://www.jostrans.org/issue32/art_agullo.php
Greco, G. (2016). “On Accessibility as a Human Right,
with an Application to Media Accessibility.” In
Matamala, A. and Orero, P. (eds), Researching Audio
Description: New Approaches. London: Palgrave
Macmillan, 11–33.
Hughes, C.J., Montagud, M. and tho Pesch, P. (2019).
“Disruptive Approaches for Subtitling in Immersive
Environments.” Proceedings of the 2019 ACM
International Conference on Interactive Experiences
for TV and Online Video – TVX ’19.
doi: 10.1145/3317697.3325123
Kelly, J.E. and Hamm, S. (2013). Smart Machines: IBM's
Watson and the Era of Cognitive Computing. Columbia
Business School Publishing.
Lamere, P., Kwok, P., Gouvêa, E., Raj, B., Singh, R.,
Walker, W., Warmuth, M. and Wolf, P. (2003). “The CMU
SPHINX-4 Speech Recognition System.”
Montagud, M., Fraile, I., Meyerson, E., Genís, M. and
Fernández, S. (2019). “ImAc Player: Enabling a
Personalized Consumption of Accessible Immersive
Content.” ACM TVX 2019, June, Manchester (UK).
Montagud, M., Orero, P. and Matamala, A. (2020).
“Culture 4 all: accessibility-enabled cultural
experiences through immersive VR360 content.” Personal
and Ubiquitous Computing, 1–19.
Remael, A., Orero, P. and Carroll, M. (eds) (2014).
Audiovisual Translation and Media Accessibility at the
Crossroads. Amsterdam/New York: Rodopi.
Romero-Fresco, P. (2013). “Accessible filmmaking:
Joining the dots between audiovisual translation,
accessibility and filmmaking.” The Journal of
Specialised Translation 20, 201–223.
¹⁴ https://www.w3.org/TR/wot-thing-description/