HTML5, MICRODATA AND SCHEMA.ORG
Towards an Educational Social-semantic Web for the Rest of Us?
Lars Johnsen
Institute of Business Communication and Information Science, University of Southern Denmark
Engstien 1, 6000 Kolding, Denmark
Keywords: Metadata, HTML5, Microdata, Schema.org, The Educational Social-semantic Web.
Abstract: The aim of this short position paper is to argue that the combination of the HTML5 platform, embedded
microdata and schema.org vocabularies may pave the way for an educational social-semantic web, a
network of structured learning content and supporting compliant software to which “the rest of us” may
contribute.
1 THE EDUCATIONAL
SEMANTIC WEB
For some years now the vision of the educational
semantic web has been a popular topic in some parts
of the e-learning community. The educational
semantic web may be described as a subset of the
semantic web for educational purposes comprising a
web of structured data, rather than hyperlinked
documents, and services exposing these data to
various kinds of software such as search engines,
virtual agents, etc. The idea of a semantic web of
learning content and supporting software is
attractive to many educational technologists because
of its potential benefits: improved discoverability of
learning resources, re-use of learning objects, more
sophisticated content adaptation based on user
models and user actions, seamless data integration
across disparate platforms and systems, etc.
Until now, work in the field has mainly focused
on two areas:
The application of semantic technologies
like RDF, Topic Maps and Linked Data to
e-learning resources, tools and systems
The design, construction and publication of
learning metadata models, taxonomies and
ontologies such as LOM, SCORM or
ALOCOM
Although much useful work has been done and
considerable progress made, the vision of the
educational semantic web is still to a great extent
confined to academic environments. To the average
teacher or professor the idea of semantically
encoding his or her teaching materials and exposing
them via a web service to compliant software is still
somewhat alien. There are obviously many reasons
for this, but one key factor is, I believe, the
conceptual and technical barriers inherent in current
semantic web models, technologies and tools. This
in turn means that a critical mass of semantically
enriched learning content on the web, let alone an
educational semantic web, is nowhere near its
realization.
The aim of this short position paper is to argue
that the introduction of HTML5, microdata, a
standard for marking up structured data in HTML5
documents, and the launch of schema.org, a set of
general purpose vocabularies for semantic
annotation of web content, may well be just the
thing needed to kick-start developments towards an
educational semantic web for “the rest of us” – a
kind of social-semantic web to which the ordinary
teacher or professor can contribute.
2 HTML5, MICRODATA AND
SCHEMA.ORG
HTML5 will no doubt be the presentation format of
most web-based learning resources in the years to
come. This latest version of HTML has improved
functionality to embed multimedia objects like video
and audio files and natively supports various types
of interactivity, for example drag-and-drop actions.
101
Johnsen L..
HTML5, MICRODATA AND SCHEMA.ORG - Towards an Educational Social-semantic Web for the Rest of Us?.
DOI: 10.5220/0003895901010104
In Proceedings of the 4th International Conference on Computer Supported Education (CSEDU-2012), pages 101-104
ISBN: 978-989-8565-06-8
Copyright
c
2012 SCITEPRESS (Science and Technology Publications, Lda.)
Likewise, it is likely to play a major role in m-
learning as it can be employed for the creation of
mobile web apps through the application of
Javascript libraries like jQuery Mobile. As the
format becomes accepted as the standard for user-
oriented web publishing, it will be underpinned by
an increasing number of development tools ranging
from simple text editors, many accessible online,
open source plug-ins facilitating design activities
such as canvas drawing to full-fledged development
environments aimed at widget construction or the
like.
To make possible simple forms of semantic
encoding within HTML5 documents, a syntax called
microdata has been developed and published
alongside HTML5. Called HTML5’s “best-kept
secret” by one blogger (Gilbertson, 2010), microdata
have been relatively unknown beyond web
developer circles until recently when Google,
Microsoft and Yahoo jointly announced schema.org,
a set of standardized categories and properties for
identifying and describing general purpose
semantics in web pages using microdata (events,
persons, organization, places, etc.). The aim of
schema.org is primarily to enhance the performance
and presentation capabilities of the three major
search engines, thereby hopefully improving user
experiences and satisfaction (and ultimately, no
doubt, increasing revenues).
Both the microdata standard and schema.org
have been met with a certain amount of criticism.
For example, it can be argued that microdata lack
the expressiveness and flexibility of RDFa, an
already existing standard for encoding meaning in
web documents and does not therefore constitute any
real added value in the context of semantic mark-up.
And as for schema.org, the entire project may in a
way be seen as a step away from open standards
with their insistence on implementation in open
forums and permanent availability. For instance, the
three companies in question can alter, delete parts
of, or altogether remove the documentation at
schema.org at any time if they choose to do so.
Nonetheless, the combination of microdata and
schema.org vocabularies have the potential, in my
view, to become the tool of the trade for many
learning content creators willing to add semantic
metadata to their web-based materials but reluctant
to delve into the finer details of specialized learning
metadata models often couched in slightly arcane
XML dialects. Here are some reasons why I think
this is so:
3 WHY USE MICRODATA?
Firstly, microdata (based on schema.org
vocabularies) are simple and relatively easily
learned. Microdata encode so-called items, entities
or objects, categorize them in one or more classes
and assign property values to them. The content of
an HTML5 element (section, paragraph, heading,
etc.) may thus be marked up to indicate that it deals
with, say, an item of the type “person” which has the
name property of “Shakespeare”:
<p itemscope
itemtype="http://schema.org/Person">
<span
itemprop="name">Shakespeare</span> was
born in …
</p>
Because microdata are embedded directly, but
unobtrusively, in HTML5 elements, they are also
accessible to local programming scripts and may in
this way be used for content adaptation or user
interaction purposes. A simple example would be
the visual foregrounding or extraction of all items of
a certain semantic type or the assignment of
behaviours to items with certain semantic properties.
But this is not all. Since semantic encoding is done
in a standardized way, useful scripts may be
developed, shared and employed on a global scale,
and, equally significantly, across disparate subject
matters, disciplines and subjects. For instance,
learning content developers embedding microdata
about places and locations in their materials might
be be able to download, or point to, scripts, such as
jQuery files, mapping these microdata to Google
Maps while authors textually describing subject
matter concepts and concept relations might be able
to utilize available plug-ins to visualize these as
concept maps.
Arguably, this is a somewhat novel way of
thinking about learning content metadata: Here
metadata is not conceived of as ancillary
information, detached from the content itself, but as
an integral part of its actual learning design,
possibly initially hidden from the user but ready to
be “activated” for specific communicative or
didactic purposes, such as catering for different
learning styles among users. In semiotic terms,
microdata may thus be characterized as embedded
resources or vehicles for making meaningful
changes in learning material. They may aid in
transforming learning content, i.e. making changes
in the same representational mode, say text, or
CSEDU2012-4thInternationalConferenceonComputerSupportedEducation
102
contribute to the transduction of learning content,
i.e. the transfer of material across representational
modes, say from text to visual (see Bezemer &
Kress, 2008). Metaphorically speaking, microdata
may act as “semiotic enzymes” in learning designs.
Secondly, microdata allow authors to mark up
digital and non-digital entities alike. This means that
a teacher or professor may not only attach metadata
to learning objects such as videos, graphics and
images providing information about their production
history, copyright, intended audience or context of
use, etc. but may also specify in some detail what
these learning objects are really about. In other
words, learning content developers can link learning
objects to learning topics and more generally
documents to domains:
<div itemscope
itemtype="http://schema.org/ImageObj
ect">
<img
src="upload.wikimedia.org/wikipedia/
commons/a/a2/Shakespeare.jpg"
itemprop="contentURL" />
<p itemprop="about" itemscope
itemtype="http://schema.org/Person">
<span
itemprop="name">Shakespeare</span>
</p>
</div>
In this simple example, it is indicated that an
item of the type “image object” located at a specific
web address is about an item of the “person” type
having the property value of “Shakespeare”. But of
course much richer semantic descriptions can be
attached to the central topic of the image (or any
other learning object) if need be. One obvious thing
to do is to assign taxonomic classes to domain topics
(“Shakespeare is a playwright”) and/or to relate
domain topics in conceptual structures using
relevant associative relations (“Shakespeare wrote
Hamlet”).
Thirdly, microdata can refer to any vocabulary,
taxonomy or ontology class or property with a
unique URL on the web and even mix different ones
in the same document. So, for instance, one could
point to the ALOCOM vocabulary to label the
didactic functions of the individual document
components of an HTML5 file (glossary, exercise,
learning objective and so on) while referring to
schema.org for the identification of the domain
categories, properties and topics of the subject
matter.
This functionality makes it possible, at least in
theory, to mark up a variety of meaning types, their
representations, and the way these interact. Authors
may encode ideational semantics, i.e. what the text
is about (“the subject matter”), textual semantics, i.e.
the communicative functions of individual document
parts (“introduction”, “abstract”, “summary”, etc.)
and interpersonal semantics, i.e. meanings relating
to the relationship between the writer and his or her
audience (“claim”, “argument”, “evidence” and so
on). To make explicit the interaction of ideational,
textual and interpersonal meaning in learning
materials, a relational genre model along the lines of
Martin & Rose (2008) might be applied. In their
approach, so-called educational micro-genres
recurrent goal-oriented configurations of meaning in
larger texts (aka macro-genres) – can be defined and
categorized, in part, according to the way they
communicate about “entities” and “events”
respectively. While stories and histories are
examples of event-oriented micro-genre families,
reports of various types classify, describe and
explain objects and phenomena, real or imagined.
By encoding and exposing micro-genres, their
subject matter, and their modes (text, image, video,
etc.), we may, eventually, not only be able to search
for embedded learning content about a particular
topic but also specify its representational
characteristics, its didactic function and design and
the learning or teaching activities it is intended to
support.
4 WHY USE SCHEMA.ORG?
The immediate attraction of schema.org for learning
content developers, besides of course the obvious
fact that it is supported by the major search engines,
is no doubt that it emerges, at least at first glance, as
a one-stop shop for descriptive tools. The average
teacher or professor is unlikely to want to spend a lot
of time trawling the web for learning content
metadata schemes as well as vocabularies for
detailing what the metadata is actually about. But if
he or she only has to look in one place, the task
seems manageable. However, the categories for
labelling and describing things and events currently
provided by schema.org are, needless to say, very
general in nature and as such do not meet the
requirements of most disciplines within the field of
education: There is neither a vocabulary for
specifying properties of chemical substances nor one
for encoding the military rank of historical persons.
There are ways of alleviating such problems, though.
HTML5,MICRODATAANDSCHEMA.ORG-TowardsanEducationalSocial-semanticWebfortheRestofUs?
103
In addition to “importing” properties from other
vocabularies, schema.org classes can be specialized.
For example, it might not be sufficient for a
historian to categorize George Armstrong Custer as
an instance of http://schema.org/Person in his or her
textbook on American history. In this case, the
author is free to expand the type specification into,
say, http://schema.org/Person/Officer or even http://
schema.org/Person/Officer/MajorGeneral. These
new categories are unknown to search engines but
may be of use to other processing software such as
local presentation scripts.
Equally importantly, work is under way to
support educational metadata at schema.org. The
Learning Resource Metadata Initiative
(http://www.lrmi.net/), co-led by the Association of
Educational Publishers and Creative Commons,
have drafted a specification comprising a limited
number of properties describing web resources
specifically designed for learning, teaching and
education. Some of these properties focus on the
intended use of these resources – assignment,
exercise or group work, say - while others are
centred round the task of aligning learning content
with competencies (one piece of content may require
a certain competency while another may teach a
specific one). Once in place, these properties should
be attached to existing schema.org vocabularies
denoting general web resources like pages, videos,
images and so on.
Some final remarks: by applying HTML5,
microdata and pointers to shared terminologies at
schema.org to our web-based learning materials, we
positively respond to the call for the “unbundling” of
learning resources on the web, i.e. the release of
“micro pieces of knowledge into the open Net”
(Breck, 2008). By marking up “didactic data”, so to
speak, in our web-based materials, we can better
unlock content and make it findable and accessible
for reuse and repurposing in new contexts and across
multiple platforms. And if microdata and
vocabularies like those published at schema.org are
widely adopted in the field of education, an
increasing amount of structured data sets will
eventually be exposed on the web and consumed by
an increasing number of e-learning systems and
tools in real educational settings. This does not, of
course, happen overnight. In the shorter term,
however, we can hope to see better search results
when we look for learning resources on the web
using our favourite search engine. And this is in
itself not such a bad start. Also, it goes without
saying that the combination of HTML5, microdata
and schema.org is no panacea and there are
challenges on the educational semantic web which
are clearly better addressed using core semantic
technologies like RDF, Linked Data, Topic Maps,
OWL, etc. Issues pertaining to subject identity,
inference and data integration may be a case in
point. But even in “traditional” semantic web circles
the impact of microdata and schema.org is being
felt. Only very recently, for example, RDFa Lite, a
less complex version of RDFa was released. RDFa
Lite is a lightweight syntax for embedding RDF
data, so-called triples, in web pages. It is expected
that RDFa Lite will be supported by schema.org in
due course giving content developers the choice
between microdata or RDF. Such developments
surely help lower the barriers of the educational
semantic web – especially for the rest of us.
ACKNOWLEDGEMENTS
I am greatly indebted to four anonymous reviewers
for their valuable comments.
REFERENCES
Allsopp, J., 2009. Developing with Web Standards.
Pearson Education.
Bezemer, J., Kress, G., 2008. Writing in Multimodal
Texts: A Semiotic Account of Designs for Learning.
Written Communication. Vol. 25, No. 2.
Breck, J., 2008. Unbundling Online Educational
Resources. In: Bruck, P.A. & Lindner, M. (eds.) 2008.
Microlearning and Capacity Building: Proceedings of
the 4th International Microlearning 2008 Conference.
Innsbruck University Press.
Gilbertson, S., 2010. Microdata: HTML5’s Best-Kept
Secret. http://www.webmonkey.com/2010/09/
microdata-html5s-best-kept-secret/ (visited 2011-11-
11).
Martin, J.R., Rose, D., 2008. Genre Relations. Mapping
Culture, Equinox Publishing Ltd
Pilgrim, M., 2010. HTML5: Up and Running, O’Reilly
Media, Inc.
Robertson, R.J., 2011. Technical standards in education,
Part 5: Take advantage of metadata. http://
www.ibm.com/developerworks/industry/library/ind-
edustand5/index.html?cmp=dw (visited 2011-11-11).
CSEDU2012-4thInternationalConferenceonComputerSupportedEducation
104