MANAGING MULTIPLE MEDIA STREAMS IN HTML5
The IEEE 1599-2008 Case Study
S. Baldan, L. A. Ludovico and D. A. Mauro
LIM - Laboratorio di Informatica Musicale, Dipartimento di Informatica e Comunicazione (DICo)
Università degli Studi di Milano, Via Comelico 39/41, I-20135 Milan, Italy
Keywords:
HTML5, Web player, Media streaming, IEEE 1599.
Abstract:
This paper deals with the problem of managing multiple multimedia streams in a Web environment. The multimedia types to support are pure audio, video with no sound, and audio/video. The data streams refer to the same event or performance; consequently, they are mutually synchronized and this synchronization must be maintained. Moreover, a Web player should be able to play different multimedia streams simultaneously, as well as to switch from one to another in real time. The clarifying example of a music piece encoded in the IEEE 1599 format is presented as a case study.
1 INTRODUCTION
The problem of managing digital multimedia docu-
ments including audio and video in a Web environ-
ment is still a challenging matter. Often this problem
is solved by requiring the installation of extensions
and third-party plugins into Web browsers. As ex-
plained in Section 2, the draft of HTML5 addresses the problem by adopting a multimedia-oriented architecture based on ad hoc tags and APIs. It is worth recalling that HTML5 is currently not a standard but a draft; however, it is under development by a W3C working group, and the most recent versions of the main browsers on the market already support its key features. Among others, we can cite Google Chrome, Mozilla Firefox 4, Opera 11, and Microsoft Internet Explorer 9.
HTML5 provides Web designers with some syn-
tactical instruments to support multimedia streams
natively. An interesting aspect that has not been
deeply explored yet is the simultaneous transmission
over the Web of a number of multimedia streams,
all related to the same object and having constraints
of mutual synchronization. On one hand, traditional
players for audio/video streams implemented through
HTML5 are already available, as the environment
offered by this (future) standard makes this activity
easy. On the other hand, the problem analyzed here goes one step further, as the multimedia player has to support multiple synchronized streams and implement advanced seek functions.
2 RELATED WORKS
In this section, some related works, based either on HTML5 or on other platforms, are presented. Considering the relative novelty of HTML5, most approaches focus on the use of competing technologies such as the Adobe Flash and Microsoft Silverlight platforms.
- In (Daoust et al., 2010) the authors first analyze the new possibilities opened by the <video> tag, then they focus on the different strategies for streaming media on the Web, such as HTTP Progressive Delivery, HTTP Streaming, and HTTP Adaptive Streaming;
- (Harjono et al., 2010) studies the matter in depth, pointing out how open standards can reduce the dependency on third-party products such as Adobe Flash or Microsoft Silverlight;
- MMP Video Editor is a browser-based video editor that consumes all Silverlight-supported codecs and produces a project file that can be transformed into an EDL (Edit Decision List) or into a Smooth Streaming Composite Stream Manifest (CSM). It relies on Microsoft components such as IIS (Internet Information Services) and Windows Media Services in order to stream media. For instance, this software framework was used in an initiative where videos of theatrical performances are presented in a synchronized fashion with related materials (Baratè et al., 2011);
- (Faulkner et al., 2010) presents a survey on Web
video editors, comparing in particular EasyClip to YouTube Video Editor;
- (Vaughan-Nichols, 2010) is a general introduction to the new possibilities offered by HTML5;
- In (Laiola Guimarães et al., 2010) the main concern is the annotation within videos. For our goals, this feature can be used for the presentation of lyrics synchronized with audio/video contents. The paper compares the possibilities of HTML5 with respect to NCL and SMIL;
- (Pfeiffer and Parker, 2009) focuses on the accessibility of the <video> tag.
What clearly emerges from this overview is the
absence of scientific works focusing on multiple me-
dia streams, e.g. the simultaneous presence of an au-
dio and a video track to be synchronized within a me-
dia player, as well as the management of a number of
alternative media.
3 THE IEEE 1599 FORMAT
IEEE 1599-2008 is a format to describe single music
pieces. For example, an IEEE 1599 document can
be related to a pop song, to an operatic aria, or to a
movement of a symphony.
Based on XML (eXtensible Markup Language),
it follows the guidelines of IEEE P1599, “Recom-
mended Practice Dealing With Applications and Rep-
resentations of Symbolic Music Information Using
the XML Language”. This IEEE standard has been
sponsored by the Computer Society Standards Activ-
ity Board and it was launched by the Technical Com-
mittee on Computer Generated Music (IEEE CS TC
on CGM) (Baggi, 1995).
The innovative contribution of the format is pro-
viding a comprehensive description of music and
music-related materials within a unique framework.
In fact, the symbolic score - intended here as a se-
quence of music symbols - is only one of the many
descriptions that can be provided for a piece. For in-
stance, all the graphical and audio instances (scores
and performances) available for a given piece are fur-
ther descriptions; but also text elements (e.g. cata-
logue metadata, lyrics, etc.), still images (e.g. photos,
playbills, etc.), and moving images (e.g. video clips,
movies with a soundtrack, etc.) can be related to the
piece itself. Please refer to (Haus and Longari, 2005)
for a complete treatment of the subject. Among other
applications, such a rich description allows the design
and implementation of advanced browsers.
Figure 1: The XML stub corresponding to the IEEE 1599
multi-layer structure.
3.1 Definition of Music Event
The mentioned comprehensiveness in music descrip-
tion is realized in IEEE 1599 through a multi-layer
environment. The XML format provides a set of rules
to create strongly structured documents. IEEE 1599
implements this characteristic by arranging music and
music-related contents within six layers (Ludovico,
2008):
General - music-related metadata, i.e. catalogue
information about the piece;
Logic - the logical description of score in terms of
symbols;
Structural - identification of music objects and
their mutual relationships;
Notational - graphical representations of the
score;
Performance - computer-based descriptions and
executions of music according to performance
languages;
Audio - digital or digitized recordings of the piece.
In IEEE 1599 code, this six-layer layout corresponds to the one shown in Figure 1, where the root element IEEE 1599 presents six sub-elements.
Since contents are distributed over various layers, what is the device that keeps heterogeneous descriptions together and allows one to jump from one description to another? The Logic layer contains an ad hoc data structure that answers the question. When a user encodes a piece in IEEE 1599 format, he/she must specify a list of music events to be organized in a linear structure called “spine”. Please refer to Figure 2 for a simplified example of spine. Inside this structure, music events are uniquely identified by the id attribute, and located in the space and time dimensions through the hpos and timing attributes, respectively.
Each event is spaced from the previous one in
a relative way. In other words, a 0 value means si-
multaneity in time and vertical overlapping in space,
whereas a double value means a double distance from
the previous music event with respect to a virtual unit.
The measurement units are intentionally unspecified,
as the logical values expressed in spine for time and
space can correspond to many different absolute val-
ues in the digital objects available for the piece.
Let us consider the example shown in Figure 2, interpreting it as a music composition. Event e3 forms a chord together with e2, belonging either to the same or to another part/voice, as the attributes' values of the former are 0s. Similarly, we can affirm that event e3 happens after e0 (and e1), as e4 occurs after 2 time units whereas e1 (and e2) occurs after only 1 time unit. For further details please refer to the official document about the IEEE 1599 standard (IEEE TCCGM, 2008).
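To make the relative spacing concrete, here is a minimal JavaScript sketch of how a player might turn the relative timing values of spine events into cumulative logical positions; the event list and its numeric values are purely illustrative and do not reproduce the actual content of Figure 2.

// Hypothetical spine events with relative timing/hpos values, as described above.
// A timing of 0 means simultaneity with the previous event.
var spine = [
  { id: "e0", timing: 0, hpos: 0 },
  { id: "e1", timing: 1, hpos: 2 },
  { id: "e2", timing: 1, hpos: 2 },
  { id: "e3", timing: 0, hpos: 0 },  // simultaneous with e2: the two form a chord
  { id: "e4", timing: 2, hpos: 4 }
];

// Convert relative spacing into cumulative logical positions.
function cumulate(events) {
  var result = [];
  var time = 0, space = 0;
  for (var i = 0; i < events.length; i++) {
    time += events[i].timing;
    space += events[i].hpos;
    result.push({ id: events[i].id, time: time, space: space });
  }
  return result;
}

cumulate(spine);  // e3 ends up at the same logical time as e2, after e0 and e1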
In conclusion, the role of the structure known as
spine is central for an IEEE 1599 encoding: it pro-
vides a complete and sorted list of events which will
be described in their heterogeneous meanings and
forms inside other layers. Please note that only a correct identification inside the spine structure allows an event to be described elsewhere in the document; this is realized through references from other layers to its unique id (see Section 3.2). Inside the spine structure, only the entities of some interest for the encoding have to be identified and sorted, ranging from a very high to a low degree of abstraction.
In the context of music encoding in IEEE 1599,
how can a music event be defined from a semantic
point of view? One of the most relevant aspects of
the format, which confers both descriptive power and
flexibility, consists in the loose but versatile defini-
tion of event. In the music field, which is the typi-
cal context where the format is used, a music event
is a clearly recognizable music entity, characterized
by well-defined features, which presents aspects of
interest for the author of the encoding. This defini-
tion is intentionally vague in order to embrace a wide
range of situations. A common case is represented by a score where each note and rest is considered a music event. The corresponding spine will list such events through as many XML sub-elements (also referred to as spine events).
However, the interpretation of the concept of mu-
sic event can be relaxed. A music event could be the
occurrence of a new chord or tonal area, in order to
describe only the harmonic path of a piece instead of
its complete score, made of notes and rests.
Figure 2: An example of simplified spine.
3.2 Events in a Multi-layer
Environment
After introducing the concept of spine event, and after
the creation of the spine structure, events are ready to
be described in the multi-layer environment provided
by IEEE 1599.
This section will show how the concept of heterogeneous description is implemented in IEEE 1599, namely through multiple descriptions of each event contained in the spine. As stated in Section 3, the format includes six
layers. While heterogeneity is supported by the vari-
ety of layers, inside each layer we find homogeneous
contents, namely contents of the same type. The Au-
dio layer, for example, can link n different perfor-
mances of the same piece. Not all the layers must be filled in order to obtain a valid IEEE 1599 document; however, their presence enriches the description.
In musical terms, the layer-based mechanism al-
lows heterogeneous descriptions of the same piece.
For a composition, not only its logical score, but also
the corresponding music sheets, performances, etc.
can be described. In this context instead, heterogene-
ity is employed in order to provide a wide range of au-
dio descriptions of the same environment. This con-
cept will become clear in the following.
Now let us focus on the presence and meaning of
events inside each layer. The General layer contains
mainly catalogue metadata that are not referable to
single music events (e.g. title, authors, genre, and so
on).
The Logic layer, which is the core of the format,
provides music description from a symbolic point of
view: it contains both the spine, i.e. the main time-
space construct aimed at the localization and synchro-
nization of events, and the symbolic score in terms of
pitches, durations, etc.
Originally, the Structural layer was designed to
contain the description of music objects and their
causal relationships, from both the compositional and
the musicological point of view. This layer is aimed at
the identification of music objects as aggregations of
music events and it defines how music objects can be
described as a transformation of previously described
objects. As usual, event localization in time and space
is realized through spine references.
For the remaining layers, the meaning of events is
more straightforward. The Notational layer describes
and links the graphical implementations of the logic
score, where music events - identified by their spine
id - are located on digital objects by absolute space
units (e.g. points, pixels, millimeters, etc.). In the case
of environmental sounds, the places where they are
recorded can be identified over a map. These maps
can be the counterpart of the graphical scores as re-
gards our work.
The Performance layer is devoted to computer-
based performances of a piece, typically in sub-
symbolic formats such as Csound, MIDI, and
SASL/SAOL. This layer is not used for our goals.
In the Audio layer events are described and linked
to audio digital objects. Multiple audio tracks and
video clips, in a number of different formats, are sup-
ported. The device used to map audio events is based
on absolute timing values expressed in milliseconds,
frames, or other time units.
Finally, let us concentrate layer by layer on the
cardinalities supported for events. In the Logic/Spine
sub-layer, the cardinality is 1 - namely the presence
is strictly required - as all the events must be listed
within the spine structure. In the Audio layer, on the
contrary, the cardinality is [0..n] as the layer itself can
be empty (0 occurrences), it can encode one or more
partial tracks where the event is not present (0 oc-
currences), it can link a complete track without rep-
etitions (1 occurrence), a complete track with repeti-
tions (n occurrences), and finally a number of differ-
ent tracks with or without repetitions (n occurrences).
Similarly, the Notational layer supports [0..n] occur-
rences.
In the case study described in this paper, we will
concentrate on an IEEE 1599 document having mul-
tiple videos, namely multiple instances inside the Au-
dio layer.
4 THE IEEE 1599 WEB PLAYER
The IEEE 1599 format has been already used with
success to synchronize multiple and heterogeneous
media streams within domain-specific applications, in
an offline scenario. In this section we will see how
Javascript and HTML5 together (especially the new
audio and video APIs) can be used to build a stream-
ing Web player, which is able to support advanced
multimedia synchronization features in a Web envi-
ronment, running inside a common browser.
4.1 Preparing IEEE 1599 for the Web
The IEEE 1599 standard supports a huge variety of
container formats and codecs for its media streams,
greater than what can be handled by any HTML5
browser implementation. An almost complete list
of what is currently supported by the most popular
browsers can be found in (Pilgrim, 2010). For this
reason, audio and video streams referenced inside an IEEE 1599 file must first be converted into a format that browsers can consume.
For this particular case study a Python script has
been written to load the IEEE 1599 file inside a DOM
(Document Object Model). It looks for any refer-
ence to audio or video files and automatically con-
verts them into Theora video and Vorbis audio format
inside an Ogg container. This particular combination
has been chosen because its components are all open
standards and also because it is well supported inside
browsers such as Mozilla Firefox, Google Chrome
and Opera. Other combinations may be supported in the near future to extend compatibility to a wider range of browsers.
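The authors describe this preparation step as a Python script; purely as an illustrative sketch, and keeping all examples in this paper in JavaScript, the same idea can be expressed as a Node.js script that shells out to ffmpeg. The element name track, the file_name attribute, the file paths, and the use of the @xmldom/xmldom package are assumptions made for the example, not details taken from the IEEE 1599 specification.

// Sketch only: rewrite media references in an IEEE 1599 document and
// re-encode the referenced files to Theora/Vorbis in an Ogg container.
var fs = require("fs");
var execFileSync = require("child_process").execFileSync;
var xmldom = require("@xmldom/xmldom");   // any DOM-style XML parser would do

var xml = fs.readFileSync("piece.xml", "utf8");
var doc = new xmldom.DOMParser().parseFromString(xml, "text/xml");
var tracks = doc.getElementsByTagName("track");

for (var i = 0; i < tracks.length; i++) {
  var src = tracks[i].getAttribute("file_name");        // hypothetical attribute name
  if (!src || !fs.existsSync(src)) continue;            // leave broken references to the cleanup step
  var dst = src.replace(/\.[^.]+$/, ".ogg");
  // Re-encode to Theora video and Vorbis audio inside an Ogg container.
  execFileSync("ffmpeg", ["-y", "-i", src, "-c:v", "libtheora", "-c:a", "libvorbis", dst]);
  tracks[i].setAttribute("file_name", dst);             // update the reference
}

fs.writeFileSync("piece.xml", new xmldom.XMLSerializer().serializeToString(doc));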
Other related media are not converted, because
they mainly consist of images (e.g. booklets, music
sheets, etc.) which have been supported by almost ev-
ery browser for quite a long time. All media files are
then organized inside a uniform folder structure and
their references inside IEEE 1599 document are up-
dated. Broken references are deleted, as well as the
parts of the IEEE 1599 specification which are not
useful for this application.
IEEE 1599 files can be huge: most of them are several megabytes in size and contain thousands of lines of XML tags. This is a problem for a streaming player
both if we parse the whole document at once and store
it inside a DOM and if we use event-driven parsing
techniques such as SAX: in the former case we have
to download the whole IEEE 1599 file, causing an an-
noying delay before actually playing something; in
the latter we can stream the XML together with me-
dia, but this adds a large amount of overhead and considerably reduces the bandwidth available for the actual contents.
Compression can help us solve the problem. XML
is an extremely verbose and redundant text-based for-
mat, so dictionary-based compression works great
on this kind of information (usually more than 95%
compression rate). Every modern browser support-
ing HTML5 also supports HTTP compression, so ex-
changing compressed data needs little effort. During
the preparation phase explained above, IEEE 1599
files are gzip-compressed and stored with a custom
file extension (.xgz). The web server is configured to
associate the right MIME-type (text/xml) and content
encoding (x-gzip) to that file extension. When a client
requests an IEEE 1599 file, the server HTTP response
instructs the browser to enable its HTTP compression
features in order to decompress the file before using
it. In this way all the advantages of DOM parsing can
be exploited and no overhead is added to the stream,
at the cost of a little initial delay.
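For illustration, a minimal sketch of how the player might request such a file follows; the browser decompresses the response transparently thanks to the Content-Encoding header, and the text/xml MIME type makes the parsed DOM available directly. The file name and the callback body are illustrative assumptions.

// Request the gzip-compressed IEEE 1599 file; decompression is transparent.
function loadIEEE1599(url, callback) {
  var xhr = new XMLHttpRequest();
  xhr.open("GET", url, true);
  xhr.onreadystatechange = function () {
    if (xhr.readyState === 4 && xhr.status === 200) {
      callback(xhr.responseXML);   // already a parsed XML Document
    }
  };
  xhr.send();
}

loadIEEE1599("pieces/example.xgz", function (doc) {
  var tracks = doc.getElementsByTagName("track");
  // ... build the per-track event lists described in Section 4.3 ...
});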
4.2 Infrastructure and Application
Design
After preparing the material to make it usable by
a HTML5 browser, let us choose how to make it
available over the net. There are two main alterna-
tives: either using a common Web server and adopt-
ing the progressive download approach, like almost
every “streaming” player for the Web, or setting up
a full fledged streaming server. While the latter op-
tion permits the use of protocols specifically designed
for streaming (like RTP and RTSP) and may therefore
be more flexible, our choice has fallen on the former
one because it is effective for our purposes, easier to
implement and widely used in similar application do-
mains. Moreover, HTTP/TCP traffic is usually bet-
ter accepted by the most common firewall configu-
rations, and less subject to NAT traversal problems.
Finally, Web users seem to be less annoyed by occasional short pauses during playback than by quality degradation or loss of information.
On the client side, the fundamental choice is what
media streams to request, when to request them and
how to manage them without clogging the wire or the
buffer. Three possible cases have been studied:
One Stream at a Time. Among all the available
contents, just the stream currently chosen by the
user for watching or listening is requested and
buffered. When another stream is selected, the
audio (or video) buffer is emptied and the new
stream is loaded. The main advantage of this solu-
tion is that only useful data are sent on the wire: at any time, the user receives just the stream he/she requested. The principal drawback, on the other hand, is that every time the user decides to watch or listen to other media streams, he/she has to wait a
considerable amount of time for the new contents.
All the Streams at the Same Time. All the available
contents are requested by the client and sent over
the net. This approach drastically reduces delays
when jumping from one media stream to another,
at the cost of a huge waste of bandwidth caused
by the dispatch of unwanted streams. In order to
reduce network traffic, contents which are not cur-
rently selected by the user may be sent in a low-
quality version, and upgraded to full quality only
when selected. With this approach, a smooth tran-
sition occurs: when the user selects a new media
stream, the client instantly plays the degraded ver-
sion, and switches to full quality as soon as possi-
ble, namely when the buffer is sufficiently full.
Custom Packetized Streams. Borrowing some
principles from the piggyback forward error cor-
rection technique (Perkins et al., 1998), streams
can be served all together inside a single packet,
containing the active streams in full quality and
the inactive ones in low quality. This implies the
existence of a “smart” server, which does all the
synchronization and packing work, and a “dumb”
client which does not even need to know anything
about IEEE 1599 and its structure.
Even if the last option presents a certain interest, it
requires a custom server-side application and burdens
the server with lots of computation for each client.
In this paper we will focus on the first two scenarios, using the first (which is also the simplest) to concentrate on the synchronization aspects, then evolving to the second to support multiple media streams simultaneously.
4.3 Audio and Video Synchronization
One of the key features of the IEEE 1599 format is the
description of information which can be used to syn-
chronize otherwise asynchronous and heterogeneous
media. As presented in Section 3.1, every musical
event of a certain interest (notes, time/clef/key sig-
nature changes etc.) should have its own unique id
inside the spine. Those identifiers can be used to ref-
erence the occurrence of a particular event inside the
various resources available for the piece: the area cor-
responding to a note inside an image of the music
sheet, a word in a text file representing the lyrics, a
particular frame of a video capturing the performance,
a given instant in an audio file, and so on.
For the IEEE 1599 streaming Web player, the Au-
dio layer is the most interesting. Each related audio
or video stream is represented by the tag <track>,
whose attributes give information about its URI and
encoding format. Inside each <track> there are many <track_event> tags, which are the actual references
to the events in the spine. Every <track_event> has
two main attributes: the unique id of the related event
in the spine, and the time (usually expressed in sec-
onds) at which that event occurs inside the audio or
video stream. A known limitation of the application
is the following: each event must be referenced inside
each track, even if it does not actually take place in
that track, otherwise the synchronization mechanism
will not work properly (as explained later). For exam-
ple, the piano reduction track of a symphonic piece
should contain the references to all the events related
to the orchestra, even if they do not actually happen in
that track. A check can be enforced server-side during
the preparation step in order to guarantee this condi-
tion.
The HTML5 audio and video API already offers
all the features needed to implement a synchronized
system: media files can be played, paused, and sought.
On certain implementations (later versions of Google
Chrome, for example) there is even the possibility
to modify the playback rate, thus slowing down or
speeding up the track performance. It is sufficient to use
the information stored inside the IEEE 1599 file to
drive via JavaScript the audio and video objects in-
side the Web page.
The currently active media stream is used as the
master timing source: while it is playing, the client constantly keeps track of the last event that occurred inside the track. When another stream is selected, the client
loads it, looks for the event of the new track with the
same id as the last one occurred, then uses the time
encoded for this event to seek the new stream. Conse-
quently, the execution resumes exactly from the same
logical point where we left it, and media synchroniza-
tion is achieved. Since event identifiers are used as a
reference to jump to the exact instant inside the new
stream, it should now be clear why all the events re-
lated to the whole piece should be present inside each
track. Otherwise, we could fall back to the previous (or next) event in common to both tracks, potentially very distant from the exact synchronization point. Please note that such a common event might not even exist.
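A minimal sketch of this switching logic is given below, assuming the per-track event lists and id-to-index maps described in the following paragraphs have already been built; the names eventsByTrack, indexById, urlByTrack and the track name are ours, introduced only for the example.

var currentTrack = "audio1";      // hypothetical name of the active track
var lastEventIndex = 0;

// Called from the "timeupdate" event of the active <audio>/<video> element:
// a forward linear scan that keeps track of the last event occurred.
function updateLastEvent(media) {
  var events = eventsByTrack[currentTrack];
  while (lastEventIndex + 1 < events.length &&
         events[lastEventIndex + 1].time <= media.currentTime) {
    lastEventIndex++;
  }
}

// Switch to another stream, resuming from the same logical point.
function switchTo(trackName, media) {
  var lastId = eventsByTrack[currentTrack][lastEventIndex].id;
  currentTrack = trackName;
  lastEventIndex = indexById[trackName][lastId];
  media.src = urlByTrack[trackName];
  media.addEventListener("loadedmetadata", function seek() {
    media.removeEventListener("loadedmetadata", seek);
    media.currentTime = eventsByTrack[trackName][lastEventIndex].time;
    media.play();
  });
  media.load();
}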
As stated before, the event list can be large, so we have to use efficient algorithms to navigate it and find the events we need in a short time. De-
pending on the situation, certain techniques may be
more convenient than others:
During continuous playback of a single stream, a linear search over the event list is probably the most appropriate for keeping track of occurring events. Of course, the list has to be sorted by the occurrence instant of each event;
When the user seeks, movements inside the
stream become non-linear, thus making linear
search inefficient. In this case, a binary search
is preferable. It is rare to find an event exactly
at the seeking position, so event search must be
approximated to the event immediately before (or
immediately after) the current instant;
When changing media streams, events are
searched by their unique identifier. Therefore, a dictionary keyed by the identifier becomes really useful.
After the IEEE 1599 file has been downloaded and
stored as a DOM by the player, an event list is pop-
ulated for each track and quick-sorted by occurrence
time. Then, for every list, an associative array is used
to bind the index of each event to its unique identi-
fier. This data structure enables the use of all the three
search techniques mentioned above, thus making the
retrieval of events efficient in every possible case.
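As a sketch (the attribute names event_ref and start_time and the function names are assumptions made for illustration), the per-track index described above might be built as follows:

// Build, for one track, a time-sorted event list plus an id -> index map.
function buildTrackIndex(trackElement) {
  var nodes = trackElement.getElementsByTagName("track_event");
  var events = [];
  for (var i = 0; i < nodes.length; i++) {
    events.push({
      id: nodes[i].getAttribute("event_ref"),
      time: parseFloat(nodes[i].getAttribute("start_time"))
    });
  }
  events.sort(function (a, b) { return a.time - b.time; });   // sort by occurrence time

  var indexById = {};
  for (var j = 0; j < events.length; j++) indexById[events[j].id] = j;

  return { events: events, indexById: indexById };
}

// Binary search used on seek: index of the last event at or before time t.
function findEventIndex(events, t) {
  var lo = 0, hi = events.length - 1, best = 0;
  while (lo <= hi) {
    var mid = (lo + hi) >> 1;
    if (events[mid].time <= t) { best = mid; lo = mid + 1; }
    else { hi = mid - 1; }
  }
  return best;
}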
4.4 Handling Simultaneous Media
The principles explained in the previous subsection
can also be applied when handling and playing multi-
ple media streams at a time. In particular, we are re-
ferring to a number of video or audio tracks to be per-
formed at the same time. In this case, the track which
acts as master will be played smoothly and without
interruption, while every other slave track will be pe-
riodically sought and/or stretched to keep it synchro-
nized with the master. The choice of the master track
can deeply influence the quality of the playback. In
particular, it is well known that human perception is more disturbed by discontinuities in acoustic than in visual information. If both of them are present
at the same time, it is better to choose one of the audio
tracks as master.
Resynchronization of slave tracks can happen at
fixed intervals or when needed. In the former case a
timer is set and all the slave tracks are re-aligned when
it expires, following the same procedure explained
above for changing media streams. In the latter case,
whenever an event occurs in the master track, the cor-
responding event is searched in all the slave tracks.
Repositioning of a slave track occurs when the abso-
lute difference between its playing position and the
occurring time of the related event is greater than a
certain given threshold.
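A sketch of the fixed-interval variant follows, reusing the findEventIndex helper and per-track indexes sketched in Section 4.3, and assuming that master and each entry of slaves bundle a media element with the index built for its track; the threshold value is an illustrative assumption.

var DRIFT_THRESHOLD = 0.5;   // seconds; an illustrative tuning value

setInterval(function () {
  var masterIndex = findEventIndex(master.events, master.media.currentTime);
  var eventId = master.events[masterIndex].id;
  var offset = master.media.currentTime - master.events[masterIndex].time;

  for (var i = 0; i < slaves.length; i++) {
    var idx = slaves[i].indexById[eventId];
    if (idx === undefined) continue;               // event missing in this track
    var expected = slaves[i].events[idx].time + offset;
    // Reposition the slave only when the drift exceeds the threshold.
    if (Math.abs(slaves[i].media.currentTime - expected) > DRIFT_THRESHOLD) {
      slaves[i].media.currentTime = expected;
    }
  }
}, 2000);   // e.g. a 2-second period, as experimented with in Section 5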
5 CONCLUSIONS AND FUTURE
WORKS
In this paper we have illustrated an application to
manage multiple media streams through HTML5 in
a synchronized way. The IEEE 1599 format rep-
resented for us a test case with demanding require-
ments, such as the presence of heterogeneous mate-
rials to keep mutually synchronized. For this kind
of applications, our approach based on HTML5 and
JavaScript has proved to be effective.
As regards future works, multiple media repro-
duction in the IEEE 1599 Web player is not supported
yet. An experiment of fixed interval repositioning has
been made, choosing a period of 2 seconds. While the
results of resynchronization of video over a continu-
ous and smooth audio playback are quite acceptable,
the audio over audio case produced glitches which
are annoying to listen to. This problem could be re-
duced by using the volume control in the HTML5 au-
dio and video API in order to rapidly fade out just
before a repositioning and fade in again immediately
after. Another improvement may be obtained by dy-
namically changing the playback rate of the slaves
(through the implementations that support this fea-
ture). Playback rate can be estimated by comparing
the time difference between the next and last event in
the master with the time difference between the two
corresponding events in the slave.
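As a small sketch of this estimate (event times taken from the per-track lists sketched above; function and variable names are ours):

// Ratio between the slave's and the master's duration for the same pair of events.
function estimatePlaybackRate(masterLast, masterNext, slaveLast, slaveNext) {
  var masterSpan = masterNext - masterLast;
  var slaveSpan = slaveNext - slaveLast;
  return masterSpan > 0 ? slaveSpan / masterSpan : 1.0;
}

// On implementations that support it (e.g. recent versions of Google Chrome):
// slave.media.playbackRate = estimatePlaybackRate(mLast, mNext, sLast, sNext);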
Other future works may include the implementa-
tion of further features like synchronizing and high-
lighting graphic elements (musical sheets, lyrics etc.),
or using information of the General layer for music
information retrieval and linking purposes.
ACKNOWLEDGEMENTS
The authors wish to acknowledge the partial support
of this work by the International Scientific Cooper-
ation project EMIPIU (Enhanced Music Interactive
Platform for Internet User).
REFERENCES
Baggi, D. L. (1995). Technical committee on computer-
generated music. TC, 714:821–4641.
Baratè, A., Haus, G., Ludovico, L. A., and Mauro, D. A.
(2011). A web-oriented multi-layer model to interact
with theatrical performances. In International Work-
shop on Multimedia for Cultural Heritage (MM4CH
2011), Modena, Italy.
Daoust, F., Hoschka, P., Patrikakis, C., Cruz, R., Nunes,
M., and Osborne, D. (2010). Towards video on the
web with HTML5. NEM Summit 2010.
Faulkner, A., Lewis, A., Meyer, M., and Merchant, N.
(2010). Creating an HTML5 web based video editor
for the general user. CGT 411, Group 13.
Harjono, J., Ng, G., Kong, D., and Lo, J. (2010). Build-
ing smarter web applications with HTML5. In Pro-
ceedings of the 2010 Conference of the Center for
Advanced Studies on Collaborative Research, pages
402–403. ACM.
Haus, G. and Longari, M. (2005). A multi-layered, time-
based music description approach based on XML.
Computer Music Journal, 29(1):70–85.
Laiola Guimarães, R., Cesar, P., and Bulterman, D. (2010).
Creating and sharing personalized time-based annota-
tions of videos on the web. In Proceedings of the 10th
ACM symposium on Document engineering, pages
27–36. ACM.
Ludovico, L. A. (2008). Key concepts of the IEEE 1599
standard. In Proceedings of the IEEE CS Conference
The Use of Symbols To Represent Music And Multime-
dia Objects, IEEE CS, Lugano, Switzerland.
Perkins, C., Hodson, O., and Hardman, V. (1998). A sur-
vey of packet loss recovery techniques for streaming
audio. Network, IEEE, 12(5):40–48.
Pfeiffer, S. and Parker, C. (2009). Accessibility for the
HTML5 <video> element. In Proceedings of the
2009 International Cross-Disciplinary Conference on
Web Accessibility (W4A), pages 98–100. ACM.
Pilgrim, M. (2010). HTML5: Up and Running. O'Reilly Media, Inc.
IEEE TCCGM (2008). IEEE Std 1599-2008 - IEEE Rec-
ommended Practice for Defining a Commonly Accept-
able Musical Application Using XML. The Institute of
Electrical and Electronics Engineers, Inc.
Vaughan-Nichols, S. (2010). Will HTML 5 restandardize
the web? Computer, 43(4):13–15.