FRAMEWORK FOR REPRESENTING AND PROCESSING

ARBITRARY MATHEMATICS

Arnold Neumaier and Peter Schodl

Fakult

at f

ur Mathematik, University of Vienna,Nordbergstr. 15, A-1090 Wien, Austria

Keywords:

Knowledge representation, Human-machine coorperation.

Abstract:

While mathematicians already beneﬁt from the computer as regards numerical problems, visualization, sym-

bolic manipulation, typesetting, etc., there is no common facility to store and process information, and math-

ematicians usually have to communicate the same mathematical content multiple times to the computer. We

are in the process of creating and implementing a framework that is capable of representing and interfacing

optimization problems, and we argue that this framework can be used to represent arbitrary mathematics and

contribute towards a universal mathematical database.

1 INTRODUCTION

Mathematicians nowadays rely heavily on comput-

ers. They use them to communicate with colleagues,

search the web for information, create documents

they want to publish, perform numerical and symbolic

computations, check their proofs, store the work they

have done, etc. However, since mathematicians ad-

dress very diverse parties with their writing, a mathe-

matician usually has to formulate the same idea (e.g.,

a proof, a numerical problem, a conjunction) multi-

ple times, depending on the recipient: for a student

in great detail, for a foreign colleague working in the

same ﬁeld in less detail but a common language, for a

publication in a document markup language, for a nu-

merical solver in an algebraic modeling language, for

a proof checker in a special language and at a tremen-

dous level of detail.

Our vision is that, when represented in an adequate

way, the same mathematical content only needs to be

communicated to the computer once, and the machine

can then extract the information in different formats,

depending on the addressee. We have achieved some

promising results representing and reformulating op-

timization problems into different formal and (con-

trolled) natural languages, and we envision that our

framework used for optimization problems is general

enough for representing and communicating arbitrary

mathematics.

If a general representation with rich possibilities to

interface the information proves feasible, it may also

contribute to a huge electronic database containing

essential amounts of the known mathematics. This

vision is not new, it dates back at least to the QED

project (Boyer, 1994). Its goal was to represent all im-

portant mathematical knowledge, conforming to the

highest standards of mathematical rigor. Another vi-

sion in this direction was the universal automated in-

formation system for all sciences (Andrews, 2003).

This is even more ambitious than a universal math-

ematical database, but the prominent role of mathe-

matics in such a system, also discussed by Andrews,

would make a mathematical database probably a cor-

ner stone of such a system and could be a starting

point.

2 THE SEMANTIC MEMORY

The semantic memory is the data structure that is,

in our opinion, most adequate to represent arbitrary

mathematics. It is not a ﬁxed ﬁle format like an Excel-

sheet or an XML-ﬁle, but rather an abstract concept

of a data structure, like a binary search tree or a heap.

The semantic entities we want to refer to are called

objects. The set of objects is not ﬁxed, but frequently

used mathematical notions like set, function, deriva-

tive, manifold, R, π etc. may be objects. All the in-

formation is then stored in a way that is akin to the

semantic web, namely we store relations of the form

object1.object2 = object3 (1)

with the only restriction that

476

Neumaier A. and Schodl P..

A FRAMEWORK FOR REPRESENTING AND PROCESSING ARBITRARY MATHEMATICS.

DOI: 10.5220/0003119104760479

In Proceedings of the International Conference on Knowledge Engineering and Ontology Development (KEOD-2010), pages 476-479

ISBN: 978-989-8425-29-4

 2010 SCITEPRESS (Science and Technology Publications, Lda.)

object1.object2 = object3 and

object1.object2 = object4 implies

object3 = object4.

One possible way to store such relations is in the form

of a matrix, where the relation (1) is represented as

an entry

object3

in the row with name

object1

and

column with name

object2

. We call this matrix the

semantic matrix.

Such relations can also be stored and visualized in

a labeled, directed graph, where the relation (1) is

represented as an arc from vertex

object1

to ver-

tex

object3

, labeled with

object2

. Such a directed

graph will be called a semantic graph.

For example, the information 7 + 5 = 12 may be rep-

resented by the following set of relations:

equation1.RHS = 12

equation1.LHS = term_lhs

equation1.OP = EQUAL

term_lhs.1 = 7

term_lhs.2 = 5

term_lhs.OP = PLUS

This corresponds to the semantic matrix given in Fig-

ure 1 and to the semantic graph in Figure 2.

LHS RHS OP 1 2

equation1 term lhs 12 EQUAL

term lhs PLUS 7 5

Figure 1: 7 + 5 = 12 represented in a semantic matrix.

equation1

term_lhs

LHS

EQUAL

RHS

PLUS

Figure 2: 7 + 5 = 12 represented in a semantic graph.

A semantic graph is a special case of a concept map,

a graphical tool to organize and represent knowledge,

but that is not deﬁned rigorously (Novak and Ca

nas,

2006).

3 THINGS THAT ALREADY

EXIST

We list some of the facilities that already provide

valuable aid for mathematicians. Interfaces to these

systems would be desirable.

Semantic Markup Languages for Mathematics.

This is a group of languages that represent mathemat-

ics as annotated text. They are used to communicate

formulas between machines, display formulas on the

web, etc. Examples are OMDoc (Kohlhase, 2001) and

OpenMath (Abbott et al., 1996).

Computer Algebra Systems. These are programs

that are able to perform symbolic manipulations on

terms, solve equations, plot graphs, etc. Examples

for widely used CAS’s are Mathematica, Maple and

Maxima.

Algebraic Modeling Languages. Numerical opti-

mization problems can be expressed quite comfort-

able in algebraic modeling languages. The machine

reads the description of the problem and the numeri-

cal data and is able to interface a variety of solvers.

Examples are AMPL, GAMS, GLPK and NOP-2

(Kallrath, 2004).

Proof Assistants. A proof assistant is software that

checks the validity of a proof, expressed in a special,

highly detailed and annotated language. Since trans-

lating a proof into such a language is a lot of work,

proof checkers are not widely used among mathemati-

cians.

4 NATURAL LANGUAGE IN- AND

OUTPUT

For being attractive for a working mathematician, the

ability to interface existing systems is one key fea-

ture, another one is communication in an almost nat-

ural language.

There is a consensus among mathematicians and lin-

guists that the communication of mathematics to a

computer is much easier than the communication of

arbitrary content. This has several reasons:

• Mathematical discourse has a well-deﬁned do-

main, is highly structured, and has relatively small

set of discourse relations. The reasoning patterns

applied in mathematics are widely studied and un-

derstood (Zinn, 2004). Building an ontology for,

say, number theory, is much easier than for a nat-

ural domain, because mathematicians deﬁne con-

cepts before they use them. It was even claimed

that “[. . . ] if we fail to construct an understander

for mathematical discourse, then we will also fail

to write one for other (non-trivial) domains”, see

p. 8 in (Zinn, 2004).

A FRAMEWORK FOR REPRESENTING AND PROCESSING ARBITRARY MATHEMATICS

477

• Due to the fact that mathematicians want to com-

municate unambiguously, they tend to use a rela-

tively small set of phrases to express their ideas,

and there is a standard interpretation for these

phrases. About 700 phrases sufﬁce for the es-

sential part of mathematics (deﬁnitions, theorems,

proofs, etc.) but this does not include the more in-

formal motivational part (Trzeciak, 1995).

• Mathematicians use words and phrases in a very

rigid way. The language of mathematics is sim-

ple: very few variety in time, person, etc. (Gane-

salingam, 2009).

• Another reason why mathematics is apt to be

represented by a machine is that in mathematics

we are in the (probably unique) position that ev-

ery meaningful rigorous statement can, at least

in principle, be translated into a formal language.

Therefore, it is possible for a machine to faithfully

represent the complete content of an arbitrary (but

meaningful) mathematical statement.

However, we do not intend to allow general natural

language as input, even though we expect only rela-

tively simple sentences, but we intend to exploit the

fact that mathematical language is simple by deﬁning

a controlled natural language (CNL) that is expressive

enough to fulﬁll the needs of mathematicians, while

still sounding like natural language.

For formulas, since L

X has been de facto-standard

in the mathematical community for decades, we en-

vision a reasonable subset of L

X as the main input

format.

5 THINGS ALREADY

IMPLEMENTED

• Optimization problems can be represented in the

semantic memory, and a description of the prob-

lem can then be automatically generated in the al-

gebraic modeling language AMPL and in almost

natural language. Below is an example of a simple

optimization problem, in Figure 3 formulated as a

mathematician would do, and then in in Figure 4

in the AMPL-format. Both texts have been gener-

ated automatically from a common representation

in the semantic memory comprising about 550 re-

lations of the form of (1).

• For the semantic memory we have two implemen-

tations, one written in Matlab where the seman-

tic memory is a sparse matrix and the objects are

natural numbers, and another one in Soprano, a

framework for RDF data.

Multi-dimensional knapsack.

Let integer N be number of contract , let integer M

be number of budget, let c

be contract volume of

project j for j = 1, . . . , N, let A

i, j

be estimated cost

of budget i for project j for i = 1, . . . , M and j =

1, . . . , N, let B

be available amount of budget i for

i = 1, . . . , M and let x

= 1 if project j is selected,

and let x

= 0 otherwise for j = 1, . . . , N.

Problem : Given integer N, integer M , vector c,

matrix A and vector B ﬁnd binary vector x such that

∑

j=1

is maximal under the constraint

∑

j=1

i, j

≤ B

for i = 1, . . . , M.

Figure 3: The knapsack-problem in (almost) natural mathe-

matical language.

param N ;

param M ;

param c

{

j in 1..N

}

;

param A

{

i in 1..M , j in 1..N

}

;

param B

{

i in 1..M

}

;

var x

{

j in 1..N

}

binary ;

maximize target : sum

{

j in 1..N

}

(c[j]

* x[j]);

subject to constraint 3014

{

i in 1..M

}

: sum

{

j in 1..N

}

(A[i , j] * x[j]) <=

B[i];

Figure 4: The knapsack-problem in AMPL.

• We created an interface to the controlled natu-

ral language of the Naproche project (K

uhlwein

et al., 2009), a project carried out at the univer-

sity of Bonn that enables proof checking of proofs

written in a controlled natural language.

• Creation of L

X-output of simple general mathe-

matical text represented in the semantic memory:

basic forms of deﬁnitions, assumptions, interfer-

ences, etc.

• Grammatically correct text-output is generated

via an interface to the Grammatical Framework

(Ranta, 2004), a programming language and soft-

ware package for multilingual grammar applica-

tions.

• We implemented a parser for problem ﬁles of

the TPTP (Thousands of Problems for Theorem

Provers, available at

http://www.tptp.org/

and parsed and represented large parts in the se-

mantic memory, adding up to several thousand

KEOD 2010 - International Conference on Knowledge Engineering and Ontology Development

478

formulas. These have been represented in the se-

mantic memory and written into L

X-ﬁles.

6 CONCLUSIONS

We gave our arguments why we think that our frame-

work to represent optimization problems can be a

starting point for a more general system to repre-

sent and process arbitrary mathematics. We think that

one of the key features of a universal mathematical

database, that will actually be used, is the retrieval

of information in many different forms, together with

the ability to communicate with the user in a way nat-

ural to the human. There is already a huge amount

of mathematical knowledge on the web, e.g., in on-

line encyclopedias, in archives for scientiﬁc papers,

libraries for proof assistants etc., but as long as they

only serve single purposes, they will stay separated.

ACKNOWLEDGEMENTS

Support by the Austrian Science Fund (FWF) under

contract number P20631 is gratefully acknowledged.

REFERENCES

Abbott, J., D

ıaz, A., and Sutor, R. (1996). A report on open-

math: a protocol for the exchange of mathematical in-

formation. In SIGSAM Bulletin 30 Nr. 1. ACM.

Andrews, P. (2003). A universal automated information sys-

tem for science and technology. In First Workshop

on Challenges and Novel Applications for Automated

Reasoning.

Boyer, R. e. a. (1994). The qed manifesto. In Automated

Deduction–CADE 12. Springer.

Ganesalingam, M. (2009). The Language of Mathematics.

PhD thesis, University of Cambridge.

Kallrath, J. e. (2004). Modeling languages in mathematical

optimization (Applied Optimization Vol. 88). Kluwer

Academic Publisghers, Boston, Dordrecht, London.

Kohlhase, M. (2001). Omdoc: Towards an internet stan-

dard for the administration, distribution, and teaching

of mathematical knowledge. In Artiﬁcial Intelligence

and Symbolic Computation. Springer.

uhlwein, D., Cramer, M., Koepke, P., and Schr

oder, B.

(2009). The naproche system. In Intelligent Computer

Mathematics. Springer.

Novak, J. and Ca

nas, A. (2006). The theory underlying con-

cept maps and how to construct and use them. In Tech-

nical report IHMC CmapTools 2006-01. Florida Insti-

tute for Human and Machine Cognition, Pensacola Fl.

Ranta, A. (2004). Grammatical framework. In Journal of

Functional Programming, 14 Nr. 2. Cambridge Uni-

versity Press.

Trzeciak, J. (1995). Writing mathematical papers in En-

glish: a practical guide. Gda

nsk Teacher’s Press,

Gda

nsk.

Zinn, C. (2004). Understanding informal mathematical dis-

course. PhD thesis, University of Erlangen-N

urnberg.

A FRAMEWORK FOR REPRESENTING AND PROCESSING ARBITRARY MATHEMATICS

479