Numl: A Strongly Typed Language for Numerical Accuracy †

Matthieu Martel

Laboratoire de Mathématiques et Physique (LAMPS),

Université de Perpignan Via Domitia, France

matthieu.martel@univ-perp.fr

Abstract. It is well-known that numerical computations may sometimes lead to

wrong results because of roundoff errors. We propose an ML-like type system

(strong, implicit, polymorphic) for numerical computations in ﬁnite precision,

in which the type of an expression carries information on its accuracy. We use

dependent types and a type inference which, from the user point of view, acts

like ML type inference. Basically, our type system accepts expressions for which

it may ensure a certain accuracy on the result of the evaluation and it rejects

expressions for which a minimal accuracy on the result of the evaluation cannot

be inferred. The soundness of the type system is ensured by a subject reduction

theorem and we show that our type system is able to type implementations of

usual simple numerical algorithms.

1 Introduction

It is well-known that numerical computations may sometimes lead to wrong results

because of the accumulation of roundoff errors [10]. Recently, much work has been

done to detect these accuracy errors in ﬁnite precision computations [1], by static [8,

11, 24] or dynamic [9] analysis, to ﬁnd the least data formats needed to ensure a certain

accuracy (precision tuning) [14, 16, 23] and to optimize the accuracy by program trans-

formation [7, 20]. All these techniques are used late in the software development cycle,

once the programs are entirely written.

In this article, we explore a different direction. We aim to detect and correct numerical accuracy errors at software development time, i.e. during the programming phase. From a software engineering point of view, the advantages of our approach are many since it is well-known that late bug detection is time and money consuming. We also aim to rely on widely used techniques recognized for their ability to discard run-time errors. This choice is motivated by efficiency reasons as well as by end-user adoption reasons.

We propose an ML-like type system (strong, implicit, polymorphic [21]) for numer-

ical computations in which the type of an arithmetic expression carries information on

† This work is supported by the Office for Naval Research Global under Grant NICOP N62909-18-1-2068 (Tycoon project). https://www.onr.navy.mil/en/Science-Technology/ONR-Global

Martel, M.

Numl - A Strongly Typed Language for Numerical Accuracy.

DOI: 10.5220/0008862701770200

In OPPORTUNITIES AND CHALLENGES for European Projects (EPS Portugal 2017/2018 2017), pages 177-200

ISBN: 978-989-758-361-2


its accuracy. We use dependent types [22] and a type inference which, from the user

point of view, acts like ML [17] type inference [21] even if it slightly differs in its im-

plementation. While type systems have been widely used to prevent a large variety of

software bugs, to our knowledge, no type system has been targeted to address numeri-

cal accuracy issues in ﬁnite precision computations. Basically, our type system accepts

expressions for which it may ensure a certain accuracy on the result of the evaluation

and it rejects expressions for which a minimal accuracy on the result of the evaluation

cannot be inferred.

In our type system, unification requires solving sets of constraints made of propositional logic formulas and relations between affine expressions over integers (and only integers). Indeed, these relations remain linear even if the term to be typed contains non-linear computations. As a consequence, these constraints can be easily checked by an SMT solver (we use Z3 in practice [18]).

Let us insist on the fact that we use a dependent type system. Consequently, the

type corresponding to a function of some argument x depends on the type of x itself.

The soundness of our type system relies on a subject reduction theorem introduced in

Section 4. Based on an instrumented operational semantics computing both the ﬁnite

precision and exact results of a numerical computation, this theorem shows that the error

on the result of the evaluation of some expression e is less than the error predicted by

the type of e. Obviously, as any non-trivial type system, our type system is not complete

and rejects certain programs that would not produce unbounded numerical errors. Our

type system has been implemented in a prototype language Numl and we show that, in

practice, our type system is expressive enough to type implementations of usual simple

numerical algorithms [2] such as the ones of Section 6. Let us also mention that our type

system represents a new application of dependent type theory motivated by applicative

needs. Indeed, dependent types arise naturally in our context since accuracy depends on

values.

This article is organized as follows. Section 2 informally introduces our type system and shows how it is used in our implementation of an ML-like programming language, Numl. The formal definitions of the types and of the inference rules are given in Section 3. Section 3.1 introduces the type system itself while Section 3.2 introduces the types of the primitives of the language. A soundness theorem is given in Section 4. The implementation of the type system is discussed in Section 5. Section 5.1 presents the unification algorithm and Section 6 presents experimental results. Section 7 discusses the special case of the IEEE754 floating-point arithmetic [1]. Section 8 describes related work and Section 9 concludes.

2 Programming with Types for Numerical Accuracy

In this section, we present informally how our type system works throughout a pro-

gramming sequence in our language, Numl. First of all, we use real numbers r{s, u, p}

where r is the value itself, and {s, u, p} the format of r. The format of a real number is

made of a sign s ∈ Sign and integers u, p ∈ Int such that u is the unit in the ﬁrst place

of r, written ufp(r) and p the precision (i.e. the number of digits of the number). For

inputs, p is either explicitly specified by the user or set by default by the system. For


Format      Name                  p     e bits   e_min     e_max
Binary16    Half precision        11    5        −14       +15
Binary32    Single precision      24    8        −126      +127
Binary64    Double precision      53    11       −1022     +1023
Binary128   Quadruple precision   113   15       −16382    +16383

Fig. 1. Basic binary IEEE754 formats.

outputs, p is inferred by the type system. We have Sign = {0, ⊕, ⊖, ⊤} and sign(r) = 0 if r = 0, sign(r) = ⊕ if r > 0 and sign(r) = ⊖ if r < 0. The set Sign is equipped with the partial order relation ≺ ⊆ Sign × Sign defined by 0 ≺ ⊕, 0 ≺ ⊖, ⊕ ≺ ⊤ and ⊖ ≺ ⊤. The ufp of a number x is

ufp(x) = min{ i ∈ N : 2^{i+1} > x } = ⌊log_2(x)⌋ . (1)

The term p defines the precision of r. Let ε(r) be the absolute error on r; we assume that ε(r) < 2^{u−p+1}. The errors on the numerical constants arising in programs are

speciﬁed by the user or determined by default by the system. The errors on the computed

values can be inferred by propagation of the initial errors. Similarly to Equation (1), we

also deﬁne the unit in the last place (ulp) used later in this article. The ulp of a number

of precision p is deﬁned by

ulp(x) = ufp(x) − p + 1 . (2)
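Equations (1) and (2) can be illustrated with a minimal Python sketch. The function names `ufp` and `ulp_exponent` are chosen here for illustration and are not part of Numl. The sketch also assumes that the ufp is taken on the magnitude |x|, whereas Equation (1) is written for x itself.

```python
import math

def ufp(x: float) -> int:
    """Unit in the first place of a nonzero x, i.e. floor(log2(|x|))."""
    if x == 0.0:
        raise ValueError("ufp is undefined for 0")
    # math.frexp returns (m, e) with x = m * 2**e and 0.5 <= |m| < 1,
    # so the exponent of the leading bit is e - 1.
    _, e = math.frexp(x)
    return e - 1

def ulp_exponent(x: float, p: int) -> int:
    """Equation (2): ufp(x) - p + 1 for a number of precision p."""
    return ufp(x) - p + 1

print(ufp(1.234))               # 0
print(ulp_exponent(1.234, 53))  # -52
```

Using `math.frexp` rather than `math.log2` avoids rounding issues when x is very close to a power of two.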

For example, the type of 1.234 is real{+, 0, 53} since ufp(1.234) = 0 and since

we assume that, by default, the real numbers have the same precision as in the IEEE754

double precision ﬂoating-point format [1] (see Figure 1). Other formats may be spec-

iﬁed by the programmer, as in the example below. Let us also mention that our type

system is independent of a given computer arithmetic. The interpreter only needs to implement the formats given by the type system, using floating-point numbers, fixed-point numbers [12], multiple precision numbers, etc., in order to ensure that the finite precision operations are computed exactly. The special case of IEEE754 floating-point arithmetic, which introduces additional errors due to the roundoff on results of operations, can also be treated by slightly modifying the equations of Section 3.

> 1.234 ;; (* precision of 53 bits by default *)
- : real{+,0,53} = 1.234000000000000
> 1.234{4};; (* precision of 4 bits specified by the user *)
- : real{+,0,4} = 1.2

Notice that, in Numl, the type information is used by the pretty printer to display only

the correct digits of a number and a bound on the roundoff error.

Note that accuracy is not a property of a number in isolation: it is a number that states how closely a particular finite-precision number matches some ideal true value. For example, using the basis β = 10 for the sake of simplicity, the floating-point value 3.149 represents π with an accuracy of 3. It itself has a precision of 4. It represents the real number 3.14903 with an accuracy of 4. As in ML, our type system admits parameterized types [21].

(Footnote on multiple precision numbers: https://gmplib.org/)

> let f = fun x -> x + 1.0 ;;

val f : real{’a,’b,’c} -> real{<expr>,<expr>,<expr>} = <fun>

> verbose true ;;

- : unit = ()

> f ;;

- : real{’a,’b,’c} -> real{(SignPlus ’a ’b 1 0),((max ’b 0) +_ (sigma+ ’a 1)),

((((max ’b 0) +_ (sigma+ ’a 1)) -_ (max (’b -_ ’c) -53))-_ (iota (’b -_ ’c) -53))} = <fun>

In the example above, the type of f is a function of an argument whose parameterized type is real{’a, ’b, ’c}, where ’a, ’b and ’c are three type variables. The return type of the function f is real{e_0, e_1, e_2} where e_0, e_1 and e_2 are arithmetic expressions containing the variables ’a, ’b and ’c. By default these expressions are not displayed by the system (just like higher order values are not explicitly displayed in ML implementations) but we may force the system to print them. In Numl, we write +, -, * and / for the operators over real numbers. Integer expressions have type int and we write +_, -_, *_ and /_ for the elementary operators over integers. The expressions arising in the type of f are explained in Section 3. As shown below, various applications of f yield results of various types, depending on the type of the argument.

> f 1.234 ;;

- : real{+,1,53} = 2.234000000000000

> f 1.234{4} ;;

- : real{+,1,5} = 2.2

If the interpreter detects that the result of some computation has no significant digit, then an error is raised. For example, it is well-known that in IEEE754 double precision (10^16 + 1) − 10^16 = 0. Our type system rejects this computation.

> (1.0e15 + 1.0) - 1.0e15 ;;

- : real{+,50,54} = 1.0

> (1.0e16 + 1.0) - 1.0e16 ;;

Error: The computed value has no significant digit. Its ufp is 0 but

the certified value is 1
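The behaviour that Numl guards against here can be reproduced in any language using IEEE754 double precision. The following Python sketch is an illustration outside Numl, and the helper name `residual` is hypothetical. It evaluates the same two expressions with ordinary binary64 floats, without any accuracy tracking.

```python
def residual(big: float) -> float:
    """Compute (big + 1) - big in IEEE754 double precision."""
    return (big + 1.0) - big

print(residual(1.0e15))  # 1.0: 1e15 + 1 is still exactly representable
print(residual(1.0e16))  # 0.0: the added 1 is absorbed by rounding
```

Plain floating-point arithmetic silently returns 0.0 in the second case, whereas the session above shows Numl raising an error instead.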

Last but not least, our type system accepts recursive functions. For example, we have:

> let rec g x = if x < 1.0 then x else g (x * 0.07) ;;

val g : real{+,0,53} -> real{+,0,53} = <fun>

> g 1.0 ;;

- : real{+,0,53} = 0.07000000000000

> g 2.0 ;;

Error: This expression has type real{+,1,53} but an expression was

expected of type real{+,0,53}

In the above session, the type system uniﬁes the return type of the function with

the type of the conditional. The types of the then and else branches also need to be

uniﬁed. Then the return type is real{+,0,53} which corresponds to the type of the


value 1.0 used in the then branch. The type system also uniﬁes the return type with

the type of the argument since the function is recursive. Finally, we obtain that the type

of g is real{+,0,53} -> real{+,0,53}. As a consequence, we cannot call g

with an argument whose ufp is greater than ufp(1.0) = 0. To overcome this limitation,

we introduce new comparison operations for real numbers. While the standard com-

parison operator < has type ’a -> ’a -> bool, the operator <{s,u,p} has type

real{s,u,p} -> real{s,u,p} -> bool. In other words, the compared values are cast in the format {s, u, p} before performing the comparison. Now we can write

the code:

> let rec g x = if x <{*,10,15} 1.0 then x else g (x * 0.07) ;;
val g : real{*,10,15} -> real{*,10,15} = <fun>
> g 2.0 ;;
- : real{*,10,15} = 0.1
> g 456.7 ;;
- : real{*,10,15} = 0.1
> g 4567.8 ;;
Error: This expression has type real{+,12,53} but an expression was
expected of type real{*,10,15}

Interestingly, unstable functions (for which the initial errors grow with the number

of iterations) are not typable. This is a desirable property of our system.

> let rec h n = if (n=0) then 1.0 else 3.33 * (h (n -_ 1)) ;;
Error: This expression has type real{+,-1,-1} but an expression was
expected of type real{+,-3,-1}

Stable computations should ideally always be accepted by our type system. Obviously, this is not the case and, like any non-trivial type system, our type system rejects some correct programs. The challenge is then to accept enough programs to be useful from an end-user point of view. We end this section by showing another example representative of what our type system accepts. More examples are given later in this article, in Section 6. The example below deals with the implementation of the Taylor series 1/(1−x) = Σ_{n≥0} x^n. The implementation gives rise to a simple recursion, as shown in the programming session below.

> let rec taylor x{*,-1,25} xn i n = if (i > n) then 0.0{*,10,20}
  else xn + (taylor x (x * xn) (i +_ 1) n) ;;
val taylor : real{*,-1,25} -> real{*,10,20} -> int -> int -> real{*,10,20} = <fun>
> taylor 0.2 1.0 0 5;;
- : real{*,10,20} = 1.249
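As a cross-check of the session above, the same recursion can be transcribed into Python. This is only a sketch with plain binary64 floats: it carries none of the Numl format annotations, so it says nothing about the certified accuracy. It only confirms the value being approximated.

```python
def taylor(x: float, xn: float, i: int, n: int) -> float:
    """Sum xn + x*xn + x^2*xn + ... for indices i..n (same recursion as above)."""
    if i > n:
        return 0.0
    return xn + taylor(x, x * xn, i + 1, n)

approx = taylor(0.2, 1.0, 0, 5)   # 1 + 0.2 + 0.2^2 + ... + 0.2^5
exact = 1.0 / (1.0 - 0.2)         # closed form 1/(1-x)
print(approx, exact)
```

The partial sum is close to 1.24992, which agrees with the digits 1.249 displayed by Numl. The gap to the closed form 1.25 is the method error of truncating the series.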

Obviously, our type system computes the propagation of the errors due to finite precision but does not take care of the method error intrinsic to the implemented algorithm (the Taylor series instead of the exact formula 1/(1−x) in our case). All the programming sessions introduced above as well as the additional examples of Section 6 are fully interactive in our system, Numl, i.e. the type judgments are obtained instantaneously (about 0.01 second on average following our measurements), including the most complicated ones.


3 The Type System

In this section, we introduce the formal definition of our type system for numerical accuracy. First, in Section 3.1, we define the syntax of expressions and types and we introduce a set of inference rules. Then we define in Section 3.2 the types of the primitives for the operators among real numbers (addition, product, etc.). These types are crucial in our system since they encode the propagation of the numerical accuracy information.

3.1 Expressions, Types and Inference Rules

In this section, we introduce the expressions, types and typing rules for our language.

For the sake of simplicity, the syntax introduced hereafter uses notations à la lambda calculus instead of the ML-like syntax employed in Section 2. In our system, expressions and types are mutually dependent. They are defined inductively using the grammar of Equation (3).

Expr ∋ e ::= r{s, u, p} ∈ Real_{u,p} | i ∈ Int | b ∈ Bool | id ∈ Id
           | if e_0 then e_1 else e_2 | λx.e | e_0 e_1 | rec f x.e | t
Typ  ∋ t ::= int | bool | real{i_0, i_1, i_2} | α | Πx : e_0.e_1
IExp ∋ i ::= int | op ∈ Id_I | α | i_0 i_1                                (3)

In Equation (3), the e terms correspond to expressions. Constants are integers i ∈ Int,

booleans b ∈ Bool and real numbers r{s, u, p} where r is the value itself, s ∈ Sign is

the sign as deﬁned in Section 2 and u, p ∈ Int the ufp (see Equation (1)) and precision

of r. For inputs, the precision p is given by the user by means of annotations or chosen

by default by the system. Then p is inferred for the outputs of programs. The term p

deﬁnes the precision of r. Let ε(r) be the absolute error on r, we assume that

ε(r) < 2^{u−p+1} . (4)

The errors on the numerical constants arising in programs are speciﬁed by the user or

determined by default by the system. The errors on the computed values can be inferred

by propagation of the initial errors.

In Equation (3), identifiers belong to the set Id and we assume a set of pre-defined identifiers +, −, ×, ≤, =, . . . related to primitives for the logical and arithmetic operations. We write +, −, × and ÷ the operations on real numbers and +_, −_, ×_ and ÷_ the operations among integers. The language also admits conditionals, functions λx.e, applications e_0 e_1 and recursive functions rec f x.e where f is the name of the function, x the parameter and e the body. The language of expressions also includes type expressions t defined by the second production of the grammar of Equation (3).

The definition of expressions and types is mutually recursive. Type variables are denoted α, β, . . . and Πx : e_0.e_1 is used to introduce dependent types [22]. Let us notice that our language does not explicitly contain function types t_0 → t_1 since they are encoded by means of dependent types. Let ≡ denote the syntactic equivalence, we have

t_0 → t_1 ≡ Πx : t_0.t_1 with x not free in t_1 . (5)


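(INT)    Γ ⊢ i : int

(BOOL)   Γ ⊢ b : bool

(REAL)   sign(r) ≺ s    ufp(r) ≤ u
         ------------------------------
         Γ ⊢ r{s,u,p} : real{s, u, p}

(ID)     Γ(id) = t
         ------------
         Γ ⊢ id : t

(COND)   Γ ⊢ e_0 : bool    Γ ⊢ e_1 : t_1    Γ ⊢ e_2 : t_2    t = t_1 ⊔ t_2
         ------------------------------------------------------------------
         Γ ⊢ if e_0 then e_1 else e_2 : t

(ABS)    Γ, x : t_1 ⊢ e : t_2
         --------------------------
         Γ ⊢ λx.e : Πx : t_1.t_2

(REC)    Γ, x : t_1, f : Πy : t_1.t_2 ⊢ e : t_2    y not free in t_2
         ------------------------------------------------------------
         Γ ⊢ rec f x.e : Πx : t_1.t_2

(APP)    Γ ⊢ e_1 : Πx : t_0.t_1    Γ ⊢ e_2 : t_2    t_2 ⊑ t_0
         ------------------------------------------------------
         Γ ⊢ e_1 e_2 : t_1[x ↦ e_2]

Fig. 2. Typing rules for our language.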

For convenience, we also write λx_1 . . . x_n.e instead of λx_1.λx_2 . . . λx_n.e and Πx_1 : t_1 . . . x_n : t_n.e instead of Πx_1 : t_1.Πx_2 : t_2 . . . Πx_n : t_n.e.

The types of constants are int, bool and real{i_0, i_1, i_2} where i_0, i_1 and i_2 are integer expressions denoting the format of the real number. Integer expressions of IExp ⊆ Expr are a subset of expressions made of integer numbers, integer primitives of Id_I ⊆ Id (such as +_, ×_, etc.), type variables and applications. Note that this definition significantly restricts the set of expressions which may be written inside real types.

The typing rules for our system are given in Figure 2. These rules are mostly classical. The type judgment Γ ⊢ e : t means that in the type environment Γ, the expression e has type t. A type environment Γ : Id → Typ maps identifiers to types. We write Γ, x : t the environment Γ in which the variable x has type t. The typing rules (INT) and (BOOL) are trivial. Rule (REAL) states that the type of a real number r{s,u,p} is real{s, u, p} assuming that the actual sign of r is less than s (with respect to ≺) and that the ufp of r is at most u. Following Rule (ID), an identifier id has type t if Γ(id) = t. Rules (COND), (ABS) and (REC) are standard rules for conditionals, abstractions and recursive functions, respectively. The rule for application, (APP), requires that the first expression e_1 has type Πx : t_0.t_1 (which is equivalent to t_0 → t_1 if x is not free in t_1) and that the argument e_2 has some type t_2 ⊑ t_0. The sub-typing relation ⊑ is introduced for real numbers. Intuitively, we want to allow the argument of some function to have a smaller ulp than what we would require if we used t_0 = t_2 in Rule (APP), provided that the precision p remains as good with t_2 as with t_0. This relaxation allows more terms to be typed without invalidating the type judgments. Formally, the relation ⊑ is defined by

real{s_1, u_1, p_1} ⊑ real{s_2, u_2, p_2} ⟺ s_1 ⊑ s_2 ∧ u_2 ≥ u_1 ∧ p_2 ≤ u_2 − u_1 + p_1 . (6)

In other words, the sub-typing relation of Equation (6) states that it is always correct to add zeros before the first significant digit of a number, as illustrated in Figure 3.

Fig. 3. The sub-typing relation ⊑ of Equation (6).

3.2 Types of Primitives

In this section, we introduce the types of the primitives of our language. As mentioned earlier, the arithmetic and logic operators are viewed as functional constants of the language. The type of a primitive for an arithmetic operation among integers ∗_ ∈ {+_, −_, ×_, ÷_} is

t_{∗_} = Πx : int.y : int.int . (7)

The types of the comparison operators ⋈ ∈ {=, ≠, <, >, ≤, ≥} are polymorphic, with the restriction that they reject the type real{s, u, p}, which requires special comparison operators:

t_⋈ = Πx : α.y : α.bool    α ≠ real{s, u, p} . (8)

For real numbers, we use comparisons at a given accuracy defined by the operators ⋈_{u,p} ∈ {<_{u,p}, >_{u,p}}. We have

t_{⋈_{u,p}} = Πs : int, u : int, p : int.real{s, u, p + 1} → real{s, u, p + 1} → bool .

Notice that the operands of a comparison ⋈_{u,p} must have p + 1 bits of accuracy. This is to avoid unstable tests, as detailed in the proof of Lemma 3 in Section 4. An unstable test is a comparison between two approximate values such that the result of the comparison is altered by the approximation error. For instance, if we reuse an example of Section 2, in IEEE754 double precision, the condition 10^16 + 1 = 10^16 evaluates to true. We need to avoid such situations in our language in order to preserve our subject reduction theorem (we need the control flow to be the same in the finite precision and exact semantics). Let us also note that our language does not provide an equality relation =_{u,p} for real values. Again, this is to avoid unstable tests. Given values x and y of type real{s, u, p}, the programmer is invited to use |x − y| < 2^{u−p+1} instead of x = y in order to get rid of the perturbations of the finite precision arithmetic.
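The recommended replacement for equality can be sketched in Python as follows. The function name `close_at` and the chosen values of u and p are assumptions made for this illustration. In Numl the format {s, u, p} would come from the inferred types rather than being passed by hand.

```python
def close_at(x: float, y: float, u: int, p: int) -> bool:
    """Test |x - y| < 2^(u - p + 1), the tolerance suggested in place of x = y."""
    return abs(x - y) < 2.0 ** (u - p + 1)

x = 0.1 + 0.2
y = 0.3
print(x == y)                  # False: exact equality is perturbed by roundoff
print(close_at(x, y, -2, 50))  # True: equal up to the tolerance 2^(-51)
```

Here u = −2 is the ufp of 0.3 and p = 50 is an arbitrary precision picked for the example.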

The types of primitives for real arithmetic operators are fundamental in our system since they encode the propagation of the numerical accuracy information. They are defined in Figures 4 and 5. The type t_∗ of some operation ∗ ∈ {+, −, ×, ÷} is a pi-type which takes six arguments s_1, u_1, p_1, s_2, u_2 and p_2 of type int corresponding to the sign, ufp and precision of the two operands of ∗ and which produces a type

real{s_1, u_1, p_1} → real{s_2, u_2, p_2} → real{S_∗(s_1, u_1, s_2, u_2), U_∗(s_1, u_1, s_2, u_2), P_∗(s_1, u_1, p_1, s_2, u_2, p_2)} (9)

where S_∗, U_∗ and P_∗ are functions which compute the sign, ufp and precision of the result of the operation ∗ as a function of s_1, u_1, p_1, s_2, u_2 and p_2. These functions extend the functions used in [16].

The functions S_∗ determine the sign of the result of an operation as a function of the signs of the operands and, for additions and subtractions, as a function of the ufp of the operands. The functions U_∗ compute the ufp of the result. Notice that U_+ and U_− use


t_∗ = Πs_1 : int, u_1 : int, p_1 : int, s_2 : int, u_2 : int, p_2 : int.
        real{s_1, u_1, p_1} → real{s_2, u_2, p_2}
        → real{S_∗(s_1, u_1, s_2, u_2), U_∗(s_1, u_1, s_2, u_2), P_∗(s_1, u_1, p_1, s_2, u_2, p_2)}

U_+(s_1, u_1, s_2, u_2) = max(u_1, u_2) + σ_+(s_1, s_2)
P_+(s_1, u_1, p_1, s_2, u_2, p_2) = max(u_1, u_2) + σ_+(s_1, s_2) − max(u_1 − p_1, u_2 − p_2) − ι(u_1 − p_1, u_2 − p_2)

U_−(s_1, u_1, s_2, u_2) = max(u_1, u_2) + σ_−(s_1, s_2)
P_−(s_1, u_1, p_1, s_2, u_2, p_2) = max(u_1, u_2) + σ_−(s_1, s_2) − max(u_1 − p_1, u_2 − p_2) − ι(u_1 − p_1, u_2 − p_2)

U_×(s_1, u_1, s_2, u_2) = u_1 + u_2 + 1
P_×(s_1, u_1, p_1, s_2, u_2, p_2) = u_1 + u_2 + 1 − max(u_1 + u_2 + 1 − p_1, u_1 + u_2 + 1 − p_2) − ι(p_1, p_2)

U_÷(s_1, u_1, s_2, u_2) = u_1 − u_2 + 1
P_÷(s_1, u_1, p_1, s_2, u_2, p_2) = P_×(s_1, u_1, p_1, s_2, u_2, p_2) − 1

ι(x, y) = 1 if x = y, 0 otherwise.

Fig. 4. Types of the primitives corresponding to the elementary arithmetic operations ∗ ∈ {+, −, ×, ÷}. The functions S_∗ and σ_∗ are defined in Figure 5.

the functions σ_+ and σ_−, respectively. These functions are defined in the bottom right corner of Figure 5 to increment the ufp of the result of some addition or subtraction in the relevant cases only. For example, if a and b are two positive real numbers then ufp(a + b) is possibly max(ufp(a), ufp(b)) + 1, but if a > 0 and b < 0 then ufp(a + b) is not greater than max(ufp(a), ufp(b)). The functions P_∗ compute the precision of the result. Basically, they compute the number of bits between the ufp and the ulp of the result.
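To make the functions U_+ and P_+ concrete, here is a small Python sketch of the addition case. It is not the Numl implementation. The names `sigma_plus`, `iota`, `u_plus` and `p_plus` are chosen for this illustration, signs are encoded as the strings "0", "+", "-" and "T" (for ⊤), and the σ_+ table is transcribed from Figure 5.

```python
SIGNS = ["0", "+", "-", "T"]  # "T" stands for the top element of Sign

# sigma_+ table of Figure 5: rows indexed by s1, columns by s2 (order of SIGNS).
_SIGMA_PLUS_ROWS = {
    "0": [0, 0, 0, 0],
    "+": [0, 1, 0, 1],
    "-": [0, 0, 1, 1],
    "T": [0, 1, 1, 1],
}

def sigma_plus(s1: str, s2: str) -> int:
    """1 when the sum may carry into a new leading bit, 0 otherwise."""
    return _SIGMA_PLUS_ROWS[s1][SIGNS.index(s2)]

def iota(x: int, y: int) -> int:
    """1 if x = y, 0 otherwise."""
    return 1 if x == y else 0

def u_plus(s1: str, u1: int, s2: str, u2: int) -> int:
    """ufp of the result of an addition."""
    return max(u1, u2) + sigma_plus(s1, s2)

def p_plus(s1: str, u1: int, p1: int, s2: str, u2: int, p2: int) -> int:
    """Precision of the result of an addition."""
    return u_plus(s1, u1, s2, u2) - max(u1 - p1, u2 - p2) - iota(u1 - p1, u2 - p2)

# f 1.234 with f = fun x -> x + 1.0, both operands of type real{+,0,53}
print(u_plus("+", 0, "+", 0), p_plus("+", 0, 53, "+", 0, 53))  # 1 53
# f 1.234{4}: operands real{+,0,4} and real{+,0,53}
print(u_plus("+", 0, "+", 0), p_plus("+", 0, 4, "+", 0, 53))   # 1 5
```

Both results agree with the types real{+,1,53} and real{+,1,5} reported for f 1.234 and f 1.234{4} in Section 2.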

We end this section by exhibiting some properties of the functions P_∗. Let ε(x) denote the error on x ∈ Real_{u,p}. We have ε(x) < 2^{u−p+1} = ulp(x). Let us start with addition. Lemma 1 relates the accuracy of the operands to the accuracy of the result of an addition between two values x and y. Lemma 2 is similar to Lemma 1 for product.

Lemma 1. Let x and y be two values such that ε(x) < 2^{u_1−p_1+1} and ε(y) < 2^{u_2−p_2+1}. Let z = x + y,

u = U_+(s_1, u_1, s_2, u_2) ,
p = P_+(s_1, u_1, p_1, s_2, u_2, p_2) . (10)

Then ε(z) < 2^{u−p+1}.

Proof. The errors on addition may be bounded by e_+ = ε(x) + ε(y). Then the most significant bit of the error has weight ufp(e_+) and the accuracy of the result is p = ufp(x + y) − ufp(e_+). Let u = ufp(x + y) = max(u_1, u_2) + σ_+(s_1, s_2) = U_+(s_1, u_1, s_2, u_2). We need to over-approximate e_+ in order to ensure p. We have ε(x) < 2^{u_1−p_1+1} and ε(y) < 2^{u_2−p_2+1} and, consequently, e_+ < 2^{u_1−p_1+1} + 2^{u_2−p_2+1}. We introduce the function ι(x, y) also defined in Figure 4 and which is equal to 1 if x = y and 0 otherwise. We have

ufp(e_+) < max(u_1 − p_1 + 1, u_2 − p_2 + 1) + ι(u_1 − p_1, u_2 − p_2)
         ≤ max(u_1 − p_1, u_2 − p_2) + ι(u_1 − p_1, u_2 − p_2)


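S_+(s_1, u_1, s_2, u_2), rows s_1, columns s_2:

  s_1 \ s_2 |  0     +     −     ⊤
  ----------+------------------------
      0     |  0     +     −     ⊤
      +     |  +     +    (a)    ⊤
      −     |  −    (b)    −     ⊤
      ⊤     |  ⊤     ⊤     ⊤     ⊤

  (a) + if u_2 < u_1, − if u_1 < u_2, ⊤ otherwise
  (b) + if u_1 < u_2, − if u_2 < u_1, ⊤ otherwise

S_× and S_÷, rows s_1, columns s_2:

  s_1 \ s_2 |  0     +     −     ⊤
  ----------+------------------------
      0     |  0     0     0     0
      +     |  0     +     −     ⊤
      −     |  0     −     +     ⊤
      ⊤     |  0     ⊤     ⊤     ⊤

S_−(s_1, u_1, s_2, u_2), rows s_1, columns s_2:

  s_1 \ s_2 |  0     +     −     ⊤
  ----------+------------------------
      0     |  0     −     +     ⊤
      +     |  +    (c)    +     ⊤
      −     |  −     −    (d)    ⊤
      ⊤     |  ⊤     ⊤     ⊤     ⊤

  (c) − if u_1 < u_2, + if u_2 < u_1, ⊤ otherwise
  (d) − if u_2 < u_1, + if u_1 < u_2, ⊤ otherwise

σ_+(s_1, s_2), rows s_1, columns s_2:

  s_1 \ s_2 |  0   +   −   ⊤
  ----------+----------------
      0     |  0   0   0   0
      +     |  0   1   0   1
      −     |  0   0   1   1
      ⊤     |  0   1   1   1

σ_−(s_1, s_2), rows s_1, columns s_2:

  s_1 \ s_2 |  0   +   −   ⊤
  ----------+----------------
      0     |  0   0   0   0
      +     |  0   0   1   1
      −     |  0   1   0   1
      ⊤     |  0   1   1   1

Fig. 5. Operators used in the types of the primitives of Figure 4.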

Let us write p = u − max(u_1 − p_1, u_2 − p_2) − ι(u_1 − p_1, u_2 − p_2) = P_+(s_1, u_1, p_1, s_2, u_2, p_2). We conclude that u = U_+(s_1, u_1, s_2, u_2), p = P_+(s_1, u_1, p_1, s_2, u_2, p_2) and ε(z) < 2^{u−p+1}. □

Lemma 2. Let x and y be two values such that ε(x) < 2^{u_1−p_1+1} and ε(y) < 2^{u_2−p_2+1}. Let z = x × y, u = U_×(s_1, u_1, s_2, u_2) and p = P_×(s_1, u_1, p_1, s_2, u_2, p_2). Then ε(z) < 2^{u−p+1}.

Proof. For product, we have p = ufp(x × y) − ufp(e_×) with e_× = x · ε(y) + y · ε(x) + ε(x) · ε(y). Let u = u_1 + u_2 + 1 = U_×(s_1, u_1, s_2, u_2). We have, by definition of ufp, 2^{u_1} ≤ x < 2^{u_1+1} and 2^{u_2} ≤ y < 2^{u_2+1}. Then e_× may be bounded by

e_× < 2^{u_1+1} · 2^{u_2−p_2+1} + 2^{u_2+1} · 2^{u_1−p_1+1} + 2^{u_1−p_1+1} · 2^{u_2−p_2+1}
    = 2^{u_1+u_2−p_2+2} + 2^{u_1+u_2−p_1+2} + 2^{u_1+u_2−p_1−p_2+2} . (11)

Since u_1 + u_2 − p_1 − p_2 + 2 < u_1 + u_2 − p_1 + 2 and u_1 + u_2 − p_1 − p_2 + 2 < u_1 + u_2 − p_2 + 2, we may get rid of the last term of Equation (11) and we obtain that

ufp(e_×) < max(u_1 + u_2 − p_1 + 2, u_1 + u_2 − p_2 + 2) + ι(p_1, p_2)
         ≤ max(u_1 + u_2 − p_1 + 1, u_1 + u_2 − p_2 + 1) + ι(p_1, p_2) .

Let us write p = u − max(u_1 + u_2 − p_1 + 1, u_1 + u_2 − p_2 + 1) − ι(p_1, p_2) = P_×(s_1, u_1, p_1, s_2, u_2, p_2). Then u = U_×(s_1, u_1, s_2, u_2), p = P_×(s_1, u_1, p_1, s_2, u_2, p_2) and ε(z) < 2^{u−p+1}. □


Note that, by reasoning on the exponents of the values, the constraints resulting from a product become linear. The equations for subtraction and division are almost identical to the equations for addition and product, respectively. Note that the result of a division has one less bit than the result of a product. This is due to the fact that, even if the operands are finite numbers, the result of the division may not be representable with finitely many digits and possibly needs to be truncated. We conclude this section with the following theorem which summarizes the properties of the types of the result of the four elementary operations.

Theorem 1. Let x and y be two values such that ε(x) < 2^{u_1−p_1+1} and ε(y) < 2^{u_2−p_2+1} and let ∗ ∈ {+, −, ×, ÷} be an elementary operation. Let z = x ∗ y, let

u = U_∗(s_1, u_1, s_2, u_2) ,
p = P_∗(s_1, u_1, p_1, s_2, u_2, p_2) . (12)

Then ε(z) < 2^{u−p+1}.

Proof. The cases of addition and product correspond to Lemma 1 and Lemma 2, respectively. The cases of subtraction and division are similar. □
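As a side remark, the definitions of P_× and P_÷ in Figure 4 can be simplified algebraically, which makes the remark on the lost bit of division explicit. The derivation below only uses the identity −max(−a, −b) = min(a, b); it is a worked illustration and not a statement taken from the original text.

```latex
\begin{aligned}
P_\times(s_1,u_1,p_1,s_2,u_2,p_2)
  &= u_1+u_2+1 - \max(u_1+u_2+1-p_1,\; u_1+u_2+1-p_2) - \iota(p_1,p_2) \\
  &= -\max(-p_1,\,-p_2) - \iota(p_1,p_2) \\
  &= \min(p_1,p_2) - \iota(p_1,p_2), \\[4pt]
P_\div(s_1,u_1,p_1,s_2,u_2,p_2)
  &= \min(p_1,p_2) - \iota(p_1,p_2) - 1 .
\end{aligned}
```

For instance, with p_1 = p_2 = 53 this gives a precision of 52 for the product and 51 for the quotient, independently of u_1 and u_2.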

Numl uses a modified Hindley-Milner type inference algorithm. Linear constraints among integers are generated (even for non-linear expressions). They are solved using an SMT solver. For space limitation reasons, the details of this algorithm are out of the scope of this article.

4 Soundness of the Type System

In this section, we introduce a subject reduction theorem proving the consistency of our type system. We use two operational semantics →_F and →_R for the finite precision and exact arithmetics, respectively. The exact semantics is used for proofs. Obviously, in practice, only the finite precision semantics is implemented. We write → whenever a reduction rule holds for both →_F and →_R (in this case, we assume that the same semantics →_F or →_R is used in the lower and upper parts of the same sequent). Both semantics are displayed in Figure 6. They concern the subset of the language of Equation (3) which does not deal with types.

EvalExpr ∋ e ::= r{s, u, p} ∈ Real_{u,p} | i ∈ Int | b ∈ Bool | id ∈ Id
               | if e_0 then e_1 else e_2 | λx.e | e_0 e_1 | rec f x.e | e_0 ∗ e_1    (13)

In Equation (13), ∗ denotes an arithmetic operator ∗ ∈ {+, −, ×, ÷, +_, −_, ×_, ÷_}. In Figure 6, Rule (FVal) of →_F transforms a syntactic element describing a real number r{s, u, p} in a certain format into a value v_F. The finite precision value v_F is an approximation of r with an error less than the ulp of r{s, u, p}. In the semantics →_R, the real number r{s, u, p} simply produces the value r without any approximation by Rule (RVal). Rules (Op1) and (Op2) evaluate the operands of some binary operation and Rule (Op) performs an operation ∗ ∈ {+, −, ×, ÷, +_, −_, ×_, ÷_} between two values v_0 and v_1.


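(FVal)      |r − v_F| < 2^{u−p+1}    ufp(r) ≤ u    sign(v_F) ≺ s
            -----------------------------------------------------
            r{s, u, p} →_F v_F

(RVal)      v_R = r
            --------------------
            r{s, u, p} →_R v_R

(Op1)       e_0 → e_0'
            ---------------------------
            e_0 ∗ e_1 → e_0' ∗ e_1

(Op2)       e_1 → e_1'
            ---------------------------        ∗ ∈ {+, −, ×, ÷, +_, −_, ×_, ÷_}
            v ∗ e_1 → v ∗ e_1'

(Op)        v = v_0 ∗ v_1
            ----------------                   ∗ ∈ {+, −, ×, ÷, +_, −_, ×_, ÷_}
            v_0 ∗ v_1 → v

(Rec)       rec f x.e → λx.e⟨rec f x.e/f⟩

(Cmp1)      e_0 → e_0'
            ---------------------------
            e_0 ⋈ e_1 → e_0' ⋈ e_1

(Cmp2)      e_1 → e_1'
            ---------------------------        ⋈ ∈ {<_{u,p}, >_{u,p}, <, >}
            v ⋈ e_1 → v ⋈ e_1'

(FCmp)      b = (2^{u−p+1} ⋈ v_1 − v_0)
            ------------------------------
            v_0 ⋈_{u,p} v_1 →_F b

(RCmp)      b = (v_0 ⋈ v_1)
            ------------------------------     ⋈_{u,p} ∈ {<_{u,p}, >_{u,p}}
            v_0 ⋈_{u,p} v_1 →_R b

(App1)      e_0 → e_0'
            ----------------------
            e_0 e_1 → e_0' e_1

(App2)      e_1 → e_1'
            ----------------------
            v e_1 → v e_1'

(Red)       (λx.e) v → e⟨v/x⟩

(ACmp)      b = v_0 ⋈ v_1
            ----------------                   ⋈ ∈ {=, ≠, <, >, ≤, ≥}
            v_0 ⋈ v_1 → b

(Cond)      e_0 → e_0'
            -------------------------------------------------------------
            if e_0 then e_1 else e_2 → if e_0' then e_1 else e_2

(CondTrue)  v = true
            ---------------------------------
            if v then e_1 else e_2 → e_1

(CondFalse) v = false
            ---------------------------------
            if v then e_1 else e_2 → e_2

Fig. 6. Operational semantics for our language.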

Rules (Cmp1), (Cmp2) and (ACmp) deal with comparisons. They are similar to Rules (Op1), (Op2) and (Op) described earlier. Note that the operators <, >, =, ≠ concerned by Rule (ACmp) are polymorphic except that they do not accept arguments of type real. Rules (FCmp) and (RCmp) are for the comparison of real values. Rule (FCmp) is designed to avoid unstable tests by requiring that the distance between the two compared values is greater than the ulp of the format in which the comparison is done. With this requirement, a condition cannot be invalidated by the roundoff errors. Let us also note that, with this definition, x <_{u,p} y ⇏ y >_{u,p} x or x >_{u,p} y ⇏ y <_{u,p} x. For the semantics →_R, Rule (RCmp) simply compares the exact values.

The other rules are standard and are identical in →_F and →_R. Rules (App1), (App2) and (Red) are for applications and Rule (Rec) is for recursive functions. We write e⟨v/x⟩ the term e in which v has been substituted for the free occurrences of x. Rules (Cond), (CondTrue) and (CondFalse) are for conditionals.
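The idea of evaluating the same expression under a finite precision semantics and an exact semantics can be illustrated with a short Python sketch. It is only an analogy: it uses IEEE754 binary64 floats for the finite precision side and `fractions.Fraction` for the exact side, whereas the semantics of Figure 6 are parameterized by the format {s, u, p}. The function name `eval_add` is hypothetical.

```python
from fractions import Fraction

def eval_add(a: str, b: str):
    """Evaluate a + b both in binary64 and exactly, and return the gap.

    The operands are given as decimal strings so that the exact side
    starts from the true decimal values rather than from rounded floats.
    """
    finite = float(a) + float(b)            # finite precision evaluation
    exact = Fraction(a) + Fraction(b)       # exact rational evaluation
    error = abs(Fraction(finite) - exact)   # Fraction(float) is an exact conversion
    return finite, exact, error

finite, exact, error = eval_add("0.1", "0.2")
print(finite)         # finite precision result
print(exact)          # 3/10
print(float(error))   # small but nonzero discrepancy
```

A subject reduction argument in the style of this section bounds such a discrepancy by the accuracy recorded in the type, instead of measuring it after the fact.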

The rest of this section is dedicated to our subject reduction theorem. First of all, we need to relate the traces of →_F and →_R. We introduce new judgments

Γ |= (e_F, e_R) : t . (14)

Intuitively, Equation (14) means that expression e_F simulates e_R up to accuracy t. In this case, e_F is syntactically equivalent to e_R up to the values which, in e_F, are approximations of the values of e_R. The value of the approximation is given by type t.


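$$\Gamma \models (i, i) : \texttt{int}\ \text{(INT)} \qquad \Gamma \models (b, b) : \texttt{bool}\ \text{(BOOL)} \qquad \dfrac{\Gamma(id) = t}{\Gamma \models (id, id) : t}\ \text{(ID)}$$

$$\dfrac{\mathrm{sign}(r) \prec s \qquad \mathrm{ufp}(r) \leq u}{\Gamma \models (r\{s,u,p\}, r\{s,u,p\}) : \texttt{real}\{s, u, p\}}\ \text{(SREAL)} \qquad \dfrac{|v_F - v_R| < 2^{u-p+1}}{\Gamma \models (v_F, v_R) : \texttt{real}\{s, u, p\}}\ \text{(VREAL)}$$

$$\dfrac{\Gamma \models (e_{F,1}, e_{R,1}) : \texttt{real}\{s_1, u_1, p_1\} \qquad \Gamma \models (e_{F,2}, e_{R,2}) : \texttt{real}\{s_2, u_2, p_2\} \qquad \ast \in \{+, -, \times, \div\}}{\Gamma \models (e_{F,1} \ast e_{F,2},\ e_{R,1} \ast e_{R,2}) : \texttt{real}\{S_\ast(s_1, u_1, s_2, u_2),\ U_\ast(s_1, u_1, s_2, u_2),\ P_\ast(s_1, u_1, p_1, s_2, u_2, p_2)\}}\ \text{(ROP)}$$

$$\dfrac{\Gamma \models (e_{F,1}, e_{R,1}) : \texttt{real}\{s_1, u, p+1\} \qquad \Gamma \models (e_{F,2}, e_{R,2}) : \texttt{real}\{s_2, u, p+1\} \qquad \bowtie\ \in \{<, >\}}{\Gamma \models (e_{F,1} \bowtie_{u,p} e_{F,2},\ e_{R,1} \bowtie_{u,p} e_{R,2}) : \texttt{bool}}\ \text{(RCMP)}$$

$$\dfrac{\Gamma \models (e_{F,1}, e_{R,1}) : \texttt{int} \qquad \Gamma \models (e_{F,2}, e_{R,2}) : \texttt{int} \qquad \ast \in \{+\_, -\_, \times\_, \div\_\}}{\Gamma \models (e_{F,1} \ast e_{F,2},\ e_{R,1} \ast e_{R,2}) : \texttt{int}}\ \text{(INTOP)}$$

$$\dfrac{\Gamma \models (e_{F,1}, e_{R,1}) : t \qquad \Gamma \models (e_{F,2}, e_{R,2}) : t \qquad t \neq \texttt{real}\{s, u, p\} \qquad \bowtie\ \in \{=, \neq, <, >, \leq, \geq\}}{\Gamma \models (e_{F,1} \bowtie e_{F,2},\ e_{R,1} \bowtie e_{R,2}) : \texttt{bool}}\ \text{(ACMP)}$$

$$\dfrac{\Gamma \models (e_{F,0}, e_{R,0}) : \texttt{bool} \qquad \Gamma \models (e_{F,1}, e_{R,1}) : t_1 \qquad \Gamma \models (e_{F,2}, e_{R,2}) : t_2 \qquad t = t_1 \sqcup t_2}{\Gamma \models (\texttt{if}\ e_{F,0}\ \texttt{then}\ e_{F,1}\ \texttt{else}\ e_{F,2},\ \texttt{if}\ e_{R,0}\ \texttt{then}\ e_{R,1}\ \texttt{else}\ e_{R,2}) : t}\ \text{(COND)}$$

$$\dfrac{\Gamma, x : t_1 \models (e_F, e_R) : t_2}{\Gamma \models (\lambda x.e_F,\ \lambda x.e_R) : \Pi x : t_1 . t_2}\ \text{(ABS)} \qquad \dfrac{\Gamma, x : t_1, f : \Pi y : t_1 . t_2 \models (e_F, e_R) : t_2}{\Gamma \models (\texttt{rec}\ f\ x.e_F,\ \texttt{rec}\ f\ x.e_R) : \Pi x : t_1 . t_2}\ \text{(REC)}$$

$$\dfrac{\Gamma \models (e_{F,1}, e_{R,1}) : \Pi x : t_0 . t_1 \qquad \Gamma \models (e_{F,2}, e_{R,2}) : t_2 \qquad t_2 \sqsubseteq t_0}{\Gamma \models (e_{F,1}\ e_{F,2},\ e_{R,1}\ e_{R,2}) : t_1[x \mapsto e_{F,2}]}\ \text{(APP)}$$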

Fig. 7. Simulation relation |= used in our subject reduction theorem.

Formally, $\models$ is defined in Figure 7. These rules are similar to the typing rules of Figure 2 except that they operate on pairs $(e_F, e_R)$. They are also designed for the language of Equation (13) and, consequently, deal with the elementary arithmetic operations +, −, × and ÷ as well as the comparison operators. The difference between the rules of Figure 2 and Figure 7 is in Rule (VREAL), which states that a real value $v_R$ is correctly simulated by a value $v_F$ up to accuracy real{s, u, p} if $|v_F - v_R| < 2^{u-p+1}$. It is easy to show, by examination of the rules of Figure 2 and Figure 7, that

$$\Gamma \models (e_F, e_R) : t \implies \Gamma \vdash e_F : t . \tag{15}$$
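The side condition of Rule (VREAL) is a plain distance test. The OCaml sketch below states it on double-precision floats, which stand in for both the finite precision value and the exact value (an assumption made only for illustration); `error_bound` and `simulates` are hypothetical names.

```ocaml
(* Largest admissible distance 2^(u-p+1) for a value of type real{s,u,p}. *)
let error_bound ~u ~p = ldexp 1.0 (u - p + 1)

(* Side condition of Rule (VREAL): v_f simulates v_r up to real{s,u,p}.
   Doubles approximate the exact value here, which is an assumption. *)
let simulates ~u ~p v_f v_r = abs_float (v_f -. v_r) < error_bound ~u ~p
```

With u = 0 and p = 10 the bound is 2^-9, so `simulates ~u:0 ~p:10 1.0 1.001` holds while `simulates ~u:0 ~p:10 1.0 1.01` does not.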

We now introduce Lemma 3, which asserts the soundness of the type system for one reduction step. Basically, this lemma states that types are preserved by reduction and that, concerning the values of type real, the distance between the finite precision value and the exact value is less than the ulp given by the type.

Lemma 3 (Weak subject reduction). If $\Gamma \models (e_F, e_R) : t$ and if $e_F \to_F e_F'$ and $e_R \to_R e_R'$ then $\Gamma \models (e_F', e_R') : t$.

Proof. By induction on the structure of expressions and case examination on the possible transition rules of Figure 6.


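– If $e_F \equiv e_R \equiv r\{s, u, p\}$ then $\Gamma \models (r\{s,u,p\}, r\{s,u,p\}) : \texttt{real}\{s, u, p\}$ and, from the reduction rules (FVal) and (RVal) of Figure 6, $r\{s, u, p\} \to_F v_F$ and $r\{s, u, p\} \to_R v_R$ with $|v_F - v_R| < 2^{u-p+1}$. So $\Gamma \models (v_F, v_R) : \texttt{real}\{s, u, p\}$.

– If $e_F \equiv e_{F,1} \ast e_{F,2}$ and $e_R \equiv e_{R,1} \ast e_{R,2}$ then several cases must be distinguished.

• If $e_F \equiv v_{F,1} \ast v_{F,2}$ and $e_R \equiv v_{R,1} \ast v_{R,2}$ then, by induction hypothesis, $\Gamma \models (v_{F,1}, v_{R,1}) : \texttt{real}\{s_1, u_1, p_1\}$, $\Gamma \models (v_{F,2}, v_{R,2}) : \texttt{real}\{s_2, u_2, p_2\}$ and, consequently, from Rule (VREAL),

$$|v_{F,1} - v_{R,1}| < 2^{u_1 - p_1 + 1} \quad \text{and} \quad |v_{F,2} - v_{R,2}| < 2^{u_2 - p_2 + 1} . \tag{16}$$

Following Figure 4, the type t of e is

$$\begin{aligned} t &= \big(\Pi s_1 : \texttt{int}, u_1 : \texttt{int}, p_1 : \texttt{int}, s_2 : \texttt{int}, u_2 : \texttt{int}, p_2 : \texttt{int}.\ \texttt{real}\{s_1, u_1, p_1\} \to \texttt{real}\{s_2, u_2, p_2\} \\ &\qquad \to \texttt{real}\{S_\ast(s_1, u_1, s_2, u_2), U_\ast(s_1, u_1, s_2, u_2), P_\ast(s_1, u_1, p_1, s_2, u_2, p_2)\}\big)\ s_1\ u_1\ p_1\ s_2\ u_2\ p_2 \\ &= \texttt{real}\{S_\ast(s_1, u_1, s_2, u_2), U_\ast(s_1, u_1, s_2, u_2), P_\ast(s_1, u_1, p_1, s_2, u_2, p_2)\} \\ &= \texttt{real}\{s, u, p\} \end{aligned}$$

By Rule (Op), $e_F \to_F v_F$ and $e_R \to_R v_R$ and, by Theorem 1, with the assumptions of Equation (16), we know that $|v_F - v_R| < 2^{u-p+1}$. Consequently, $\Gamma \models (v_F, v_R) : \texttt{real}\{s, u, p\}$.

• If $e_F \equiv v_{F,1} \ast v_{F,2}$ and $e_R \equiv v_{R,1} \ast v_{R,2}$ with $\Gamma \models (v_{F,1}, v_{R,1}) : \texttt{int}$ then, by Rule (Op), $e \to (v, v)$ and, by Equation (7), $\Gamma \vdash v : \texttt{int}$. If $e \equiv e_1 \ast e_2$ then, by Rule (Op1), $e \to e_1' \ast e_2$ and we conclude by induction hypothesis. The case $e \equiv e_1 \ast v_2$ is similar to the former one.

– If $e_F \equiv e_{F,1} \bowtie_{u,p} e_{F,2}$ and $e_R \equiv e_{R,1} \bowtie_{u,p} e_{R,2}$ then several cases have to be examined.

• If $e_F \equiv v_{F,1} \bowtie_{u,p} v_{F,2}$ and $e_R \equiv v_{R,1} \bowtie_{u,p} v_{R,2}$ then, by Rules (FCmp) and (RCmp), $e_F \to_F b_F$ and $e_R \to_R b_R$ with $b_F = (v_{F,1} - v_{F,2} \bowtie 2^{u-p+1})$ and $b_R = (v_{R,1} - v_{R,2} \bowtie 0)$. By Rule (RCMP) of Figure 7, $\Gamma \models (v_{F,1}, v_{R,1}) : \texttt{real}\{s_1, u, p+1\}$ and $\Gamma \models (v_{F,2}, v_{R,2}) : \texttt{real}\{s_2, u, p+1\}$. Consequently, by Rule (VREAL), $|v_{F,1} - v_{R,1}| < 2^{u-p}$ and $|v_{F,2} - v_{R,2}| < 2^{u-p}$. By combining the former equations, we obtain that $|(v_{R,1} - v_{R,2}) - (v_{F,1} - v_{F,2})| < 2^{u-p+1}$. Consequently, $b_F = b_R$ and we conclude that $\Gamma \models (b_F, b_R) : \texttt{bool}$.

• The other cases for $e_F \equiv e_{F,1} \bowtie_{u,p} e_{F,2}$ are similar to the cases $e_F \equiv v_{F,1} \ast v_{F,2}$ examined previously.

– The other cases simply follow the structure of the terms, by application of the induction hypothesis.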

Let $\to_F^*$ (resp. $\to_R^*$) denote the reflexive transitive closure of $\to_F$ (resp. $\to_R$). Theorem 2 expresses the soundness of our type system for sequences of reduction of arbitrary length.

Theorem 2 (Subject reduction). If $\Gamma \models (e_F, e_R) : t$ and if $e_F \to_F^* e_F'$ and $e_R \to_R^* e_R'$ then $\Gamma \models (e_F', e_R') : t$.

Proof. By induction on the length of the reduction sequence, using Lemma 3.

Theorem 2 asserts the soundness of our type system. It states that the evaluation of an expression of type real{s, u, p} yields a result of accuracy $2^{u-p+1}$.


let rec unifyReal s1 u1 p1 s2 u2 p2 =
  match (!s1, !u1, !p1) with
    (Int(s1'), Int(u1'), Int(p1')) ->
      (match (!s2, !u2, !p2) with
          (Int(s2'), Int(u2'), Int(p2')) ->
            let s = if (s1' = s2') then s1' else 2 in
            let u = max u1' u2' in
            let p = if (u1' >= u2') then min p1' (u1' - u2' + p2')
                    else min p2' (u2' - u1' + p1')
            in if (p > 0) then
                 (s1 := Int(s) ; s2 := Int(s) ; u1 := Int(u) ;
                  u2 := Int(u) ; p1 := Int(p) ; p2 := Int(p))
               else raise (Error ("Type "
                      ^ (printExpr (TFloat(s1,u1,p1))) ^ " is not compatible with type "
                      ^ (printExpr (TFloat(s2,u2,p2)))))
        | (TypeVar(refS,strS), TypeVar(refU,strU),
           TypeVar(refP,strP)) -> refS := Some(!s1) ;
                                  refU := Some(!u1) ; refP := Some(!p1)
        | _ -> solveLT !s1 !u1 !p1 !s2 !u2 !p2)
  | (TypeVar(refS,strS), TypeVar(refU,strU),
     TypeVar(refP,strP)) ->
      ((match !refS with
           None -> refS := Some(!s2)
         | Some(s1') -> unify s1' !s2) ;
       (match !refU with
           None -> refU := Some(!u2)
         | Some(u1') -> unify u1' !u2) ;
       (match !refP with
           None -> refP := Some(!p2)
         | Some(p1') -> unify p1' !p2))
  | _ -> (match (!s2, !u2, !p2) with
             (TypeVar(refS,strS), TypeVar(refU,strU),
              TypeVar(refP,strP)) ->
               (* similar to previous case *)
           | _ -> if ((s1 = s2) && (u1 = u2) && (p1 = p2)) then ()
                  else solve !s1 !u1 !p1 !s2 !u2 !p2)

Fig. 8. Uniﬁcation procedure for types real.

5 Type System Implementation

In this section, we give some details about the implementation of our type system in

Numl. Section 5.1 deals with the uniﬁcation algorithm and Section 6 presents examples

of typable programs in complement to the introductory examples of Section 2.


Fig. 9. The supremum operator ⊔ of Equation (17).

5.1 Uniﬁcation Algorithm

In this section, we describe how the type system introduced in Section 3 is implemented. Basically, we use a unification-based type inference in which type variables are represented by reference cells. The type real also stores the format {s, u, p} in reference cells, so that it can be modified when unifying two terms of type real.
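The representation of type variables by reference cells can be sketched as follows. This is a deliberately simplified OCaml illustration: the type `expr` keeps only integers and type variables, `resolve` and `unify_field` are hypothetical helpers, and two integer fields are simply required to coincide, so the supremum computed when two real types are unified is not modelled here.

```ocaml
(* Simplified format field: an integer or a type variable.
   A type variable carries its value cell and its name. *)
type expr =
  | Int of int
  | TypeVar of expr option ref * string

(* Follow bound type variables until an integer or a free variable is reached. *)
let rec resolve e =
  match e with
  | TypeVar ({ contents = Some e' }, _) -> resolve e'
  | _ -> e

(* Unify two fields by destructively updating reference cells. *)
let unify_field a b =
  match resolve a, resolve b with
  | Int i, Int j -> if i <> j then failwith "incompatible format fields"
  | TypeVar (r, _), other | other, TypeVar (r, _) ->
      (match other with
       | TypeVar (r', _) when r' == r -> ()   (* same variable: nothing to do *)
       | _ -> r := Some other)
```

Because binding is done in place, unifying a variable `'b` with `'a` and later `'a` with an integer makes both variables resolve to that integer without any explicit substitution pass.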

The type inference and unification algorithms are classical except for the unification of two real types, done by the function unifyReal displayed in Figure 8, which requires, in certain cases, a call to an SMT solver (in practice we use Z3 [18]). The function unifyReal takes as arguments the formats $\phi_1 = \{s_1, u_1, p_1\}$ and $\phi_2 = \{s_2, u_2, p_2\}$ of the types to be unified.

The function unifyReal calls in a mutually recursive way the function unify on terms. It also refers to type variables, which correspond to the constructor TypeVar. The fields of TypeVar are the value itself and a string corresponding to the name of the variable. The value may be either None when the type variable is not constrained or some reference to an expression when a type has been given to the variable by unification. The function solve performs a partial evaluation of the expressions occurring in the equations, in order to simplify them, translates them for Z3, calls the SMT solver and then assigns the values of the solution to the relevant type variables. The function solveLT acts just like the function solve but requires that the precision of the second expression is greater than or equal to the precision of the first expression instead of a strict equality. Several cases are distinguished in the function unifyReal of Figure 8:

– If $\phi_1$ and $\phi_2$ are fully instantiated, i.e. $s_i$, $u_i$ and $p_i$, $1 \leq i \leq 2$, are integers then we assign $\phi = \phi_1 \sqcup \phi_2$ to $\phi_1$ and $\phi_2$. The supremum ⊔ refers to the order ⊑ introduced in Equation (6). Formally, we have:

$$\phi_1 \sqcup \phi_2 = \{ s_1 \uplus s_2,\ \max(u_1, u_2),\ p \} \quad \text{with} \quad p = \begin{cases} \min(p_1,\ u_1 - u_2 + p_2) & \text{if } u_1 \geq u_2 \\ \min(p_2,\ u_2 - u_1 + p_1) & \text{otherwise.} \end{cases} \tag{17}$$

In Equation (17), ⊎ computes the supremum of two values of Sign. Figure 9 illustrates the effect of the operator ⊔.

– If $\phi_1$ is fully instantiated and $\phi_2$ is made of three type variables then $\phi_1$ is assigned to $\phi_2$.


– If $\phi_1$ is fully instantiated and $\phi_2$ is neither fully instantiated nor a triple of type variables then $\phi_2$ is made of three integer expressions containing type variables, $\phi_2 = \{e_s, e_u, e_p\}$. We have to solve the system

$$(S) : \begin{cases} s_1 = e_s \\ u_1 = e_u \\ p_1 = e_p \end{cases} \tag{18}$$

We call the SMT solver Z3 to solve this system of equations. Recall that $e_s$, $e_u$ and $e_p$ come from the types of primitives introduced in Figure 4. These expressions are linear and are easy to solve for an SMT solver.

– If both $\phi_1$ and $\phi_2$ are made of type variables then we identify them in a pairwise manner.

– If both $\phi_1$ and $\phi_2$ are integer expressions then $\phi_1 = \{e_{s_1}, e_{u_1}, e_{p_1}\}$ and $\phi_2 = \{e_{s_2}, e_{u_2}, e_{p_2}\}$. We have to solve the system

$$(S) : \begin{cases} e_{s_1} = e_{s_2} \\ e_{u_1} = e_{u_2} \\ e_{p_1} = e_{p_2} \end{cases} \tag{19}$$

Again, we call Z3 to solve this system of linear equations.

– The other cases are symmetric to the ones detailed above and are treated similarly.
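The first case, the supremum of two fully instantiated formats given by Equation (17), can be written as a small OCaml function. In this sketch the record type `format`, the exception `Incompatible` and the function `sup` are hypothetical names; signs are encoded as integers and the join of two different signs is the value 2, mirroring the `else 2` branch of Figure 8 rather than a full sign lattice.

```ocaml
(* A fully instantiated format {s, u, p}. Signs are integers; 2 is assumed
   to stand for an unknown sign, as in the "else 2" branch of Figure 8. *)
type format = { s : int; u : int; p : int }

exception Incompatible

(* Supremum of two formats following Equation (17). *)
let sup f1 f2 =
  let s = if f1.s = f2.s then f1.s else 2 in
  let u = max f1.u f2.u in
  let p =
    if f1.u >= f2.u then min f1.p (f1.u - f2.u + f2.p)
    else min f2.p (f2.u - f1.u + f1.p)
  in
  (* Figure 8 rejects the unification when no positive precision remains. *)
  if p > 0 then { s; u; p } else raise Incompatible
```

For example, `sup { s = 1; u = 5; p = 10 } { s = -1; u = 2; p = 4 }` gives `{ s = 2; u = 5; p = 7 }`: the exponent of the ulp of the result, 5 − 7 + 1 = −1, is the larger of the two input ulp exponents (−4 and −1), so the result is no more accurate than the least accurate argument.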

For example, the equations sent to the SMT solver for the call newton 9.0 0.0

g gprime ;; to the newton function of Section 2 are given in Equation (20).

(S) :











(

a, max(

b, −7) + S

(

a, 1), 1, −7) = 2

(max(

b, −7) + S

(

b, 1, −7), 1) = 10

max(max(

b, −7) + S

(

b, 1, −7), 1), −7)

(

a, max(

b, −7) + S

(

a, 1), 1, −7), 1)

− max(max(

b, −7) + S

(

b, 1, −7), 1) −

c, −60)

−ι(max(

b, −7) + S

(

b, 1, −7), 1) −

c, −60) ≤ 21

(20)

These equations are encoded in Z3 by expanding the operators max, S

, S

, and ι

following the deﬁnitions of Figure 4. For example, the Z3 encoding of the ﬁrst equality

of Equation (20) is displayed in Figure 10. In total, the encoding of the three equations of Equation (20) is a 1007-line Z3 file. Z3 solves these equations in 0.215 seconds (average time measured over 5 executions).
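The expansion of operators into nested `ite` terms can be illustrated on max alone. The OCaml sketch below prints integer expressions in the prefix syntax used in Figure 10; the type `iexpr` and the function `to_smt` are hypothetical, and only constants, variables, addition and max are covered.

```ocaml
(* Integer expressions over type variables, restricted to what this sketch needs. *)
type iexpr =
  | Const of int
  | Var of string
  | Add of iexpr * iexpr
  | Max of iexpr * iexpr

(* Print an expression in prefix syntax, expanding max into an if-then-else. *)
let rec to_smt = function
  | Const n -> string_of_int n
  | Var x -> x
  | Add (a, b) -> Printf.sprintf "(+ %s %s)" (to_smt a) (to_smt b)
  | Max (a, b) ->
      let a' = to_smt a and b' = to_smt b in
      Printf.sprintf "(ite (>= %s %s) %s %s)" a' b' a' b'
```

Here `to_smt (Max (Var "b", Const (-7)))` produces `(ite (>= b -7) b -7)`, the sub-term that occurs repeatedly in Figure 10. Since each operand of max is printed twice, nested operators grow quickly, which is consistent with the size of the generated file.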

6 Experiments

In this section, we report some experiments showing how our type system behaves in practice. Section 6.1 presents Numl implementations of usual mathematical formulas while Section 6.2 introduces a larger example demonstrating the expressive power of our type system.


(assert (= 2

(ite (and (= a 0) (= 1 0)) 0

(ite (and (= a 1) (= 1 1)) 1

(ite (and (= a -1) (= 1 -1)) -1

(ite (and (= a 1) (= 1 0)) 1

(ite (and (= a 0) (= 1 1)) 1

(ite (and (= a -1) (= 1 0)) -1

(ite (and (= a 0) (= 1 -1)) -1

(ite (and (= a 1) (= 1 -1))

(ite (> (+ (ite (>= b -7) b -7)

(ite (and (= a 0) (= 1 0)) 0

(ite (and (= a 1) (= 1 -1)) -1

(ite (and (= a -1) (= 1 1)) -1

(ite (and (= a -1) (= 1 -1)) 1

(ite (and (= a 1) (= 1 1)) 1

(ite (and (= a 1) (= 1 0)) 0

(ite (and (= a 0) (= 1 1)) 0

(ite (and (= a -1) (= 1 0)) 0

(ite (and (= a 0) (= 1 -1)) 0

2)))))))))) -7) 1

(ite (< (+ (ite (>= b -7) b -7)

(ite (and (= a 0) (= 1 0)) 0

(ite (and (= a 1) (= 1 -1)) -1

(ite (and (= a -1) (= 1 1)) -1

(ite (and (= a -1) (= 1 -1)) 1

(ite (and (= a 1) (= 1 1)) 1

(ite (and (= a 1) (= 1 0)) 0

(ite (and (= a 0) (= 1 1)) 0

(ite (and (= a -1) (= 1 0)) 0

(ite (and (= a 0) (= 1 -1)) 0

2)))))))))) -7) -1 2))

(ite (and (= a -1) (= 1 1))

(ite (< (+ (ite (>= b -7) b -7)

(ite (and (= a 0) (= 1 0)) 0

(ite (and (= a 1) (= 1 -1)) -1

(ite (and (= a -1) (= 1 1)) -1

(ite (and (= a -1) (= 1 -1)) 1

(ite (and (= a 1) (= 1 1)) 1

(ite (and (= a 1) (= 1 0)) 0

(ite (and (= a 0) (= 1 1)) 0

(ite (and (= a -1) (= 1 0)) 0

(ite (and (= a 0) (= 1 -1)) 0

2)))))))))) -7) 1

(ite (> (+ (ite (>= b -7) b -7)

(ite (and (= a 0) (= 1 0)) 0

(ite (and (= a 1) (= 1 -1)) -1

(ite (and (= a -1) (= 1 1)) -1

(ite (and (= a -1) (= 1 -1)) 1

(ite (and (= a 1) (= 1 1)) 1

(ite (and (= a 1) (= 1 0)) 0

(ite (and (= a 0) (= 1 1)) 0

(ite (and (= a -1) (= 1 0)) 0

(ite (and (= a 0) (= 1 -1)) 0

2)))))))))) -7) -1 2))

2)))))))))))

Fig. 10. Z3 encoding of the ﬁrst equality of Equation (20).

6.1 Usual Mathematical Formulas

Our ﬁrst examples concern usual mathematical formulas, to compute the volume of

geometrical objects or formulas related to polynomials. These examples aim at showing

that usual mathematical formulas are typable in our system. We start with the volume

of the sphere and of the cone.

> let sphere r = (4.0 / 3.0) * 3.1415926{+,1,20} * r * r * r ;;
val sphere : real{'a,'b,'c} -> real{<expr>,<expr>,<expr>} = <fun>
> sphere 1.0 ;;
- : real{+,7,20} = 4.188
> let cone r h = (3.1415926{+,1,20} * r * r * h) / 3.0 ;;
val cone : real{'a,'b,'c} -> real{'a,'b,'c}
-> real{<expr>,<expr>,<expr>} = <fun>
> cone 1.0 1.0 ;;
- : real{+,4,20} = 1.0472

We now redefine the function sphere with more precision in order to show the impact on the accuracy of the results. Note that the results now have 15 digits instead of the former 5 digits.


> let sphere r = (4.0 / 3.0) * 3.1415926535897932{+,1,53} * r * r * r ;;
val sphere : real{'a,'b,'c} -> real{<expr>,<expr>,<expr>} = <fun>
> sphere 1.0 ;;
- : real{+,7,52} = 4.1887902047863
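To relate the two result types real{+,7,20} and real{+,7,52} to decimal digits, one can compute the bound $2^{u-p+1}$ of Theorem 2 and count how many decimals it guarantees. The OCaml helpers below, `error_bound` and `decimals`, are hypothetical names used for this back-of-the-envelope check only.

```ocaml
(* Error bound 2^(u-p+1) carried by a type real{s,u,p}. *)
let error_bound ~u ~p = ldexp 1.0 (u - p + 1)

(* Number of digits after the decimal point that lie above the error bound. *)
let decimals ~u ~p = int_of_float (floor (-. log10 (error_bound ~u ~p)))
```

For {7,20} the bound is 2^-12 ≈ 2.4e-4 and `decimals` returns 3; for {7,52} the bound is 2^-44 ≈ 5.7e-14 and `decimals` returns 13. These counts agree with the number of decimals printed for 4.188 and 4.1887902047863, although the text does not state how the interpreter chooses its output width.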

The next examples concern polynomials. We start with the computation of the discriminant of a second degree polynomial.

> let discriminant a b c = b * b - 4.0 * a * c ;;
val discriminant : real{'a,'b,'c} -> real{'d,'e,'f} -> real{'g,'h,'i}
-> real{<expr>,<expr>,<expr>} = <fun>
> discriminant 2.0 -11.0 15.0 ;;
- : real{+,8,52} = 1.000000000000

Our last example concerning usual formulas is the Taylor series development of the sine function. In the code below, observe that the accuracy of the result is correlated to the accuracy of the argument. As mentioned in Section 2, method errors are neglected: only the errors due to the finite precision are calculated (indeed, sin(π/8) = 0.382683432...).

> let sin x = x - ((x * x * x) / 3.0) + ((x * x * x * x * x) / 120.0) ;;
val sin : real{'a,'b,'c} -> real{<expr>,<expr>,<expr>} = <fun>
> sin (3.14{1,6} / 8.0) ;;
- : real{*,0,6} = 0.3
> sin (3.14159{1,18} / 8.0) ;;
- : real{*,0,18} = 0.37259

6.2 Newton-Raphson Method

In this section, we introduce a larger example to compute the zero of a function using the Newton-Raphson method. This example, which involves several higher-order functions, shows the expressiveness of our type system. In the programming session below, we first define a higher-order function deriv which takes as argument a function and computes its numerical derivative at a given point. Then we define a function g and compute the value of its derivative at point 2.0. Next, by partial application, we build a function computing the derivative of g at any point. Finally, we define a function newton which searches for the zero of a function. The newton function is also a higher-order function taking as arguments the function for which a zero has to be found and its derivative.

> let deriv f x h = ((f (x + h)) - (f x)) / h ;;
val deriv : (real{<expr>,<expr>,<expr>} -> real{'a,'b,'c})
-> real{<expr>,<expr>,<expr>} -> real{'d,'e,'f}
-> real{<expr>,<expr>,<expr>} = <fun>
> let g x = (x * x) - (5.0 * x) + 6.0 ;;
val g : real{'a,'b,'c} -> real{<expr>,<expr>,<expr>} = <fun>
> deriv g 2.0 0.01 ;;
- : real{*,5,51} = -0.9900000000000
> let gprime x = deriv g x 0.01 ;;
val gprime : real{<expr>,<expr>,<expr>} -> real{<expr>,<expr>,<expr>} = <fun>
> let rec newton x xold f fprime = if ((abs (x-xold))<0.01{*,10,20}) then x
else newton (x-((f x)/(fprime x))) x f fprime ;;
val newton : real{*,10,21} -> real{0,10,20} -> (real{*,10,21} -> real{'a,'b,'c})
-> (real{*,10,21} -> real{'d,'e,'f}) -> real{*,10,21} = <fun>
> newton 9.0 0.0 g gprime ;;
- : real{*,10,21} = 3.0001

We call the newton function with our function g and its derivative computed by partial application of the deriv function. We obtain a root of our polynomial g with a guaranteed accuracy. Note that while the Newton-Raphson method converges quadratically in the reals, numerical errors may perturb the process [6].
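For comparison, the same session can be transcribed into plain OCaml on double-precision floats. This is only a sketch of the algorithm: the `float` type carries none of the accuracy information inferred by Numl, and the stopping threshold 0.01 is an ordinary constant rather than a constant with a format.

```ocaml
(* Numerical derivative of f at x with step h. *)
let deriv f x h = (f (x +. h) -. f x) /. h

(* The polynomial of the session above; its roots are 2 and 3. *)
let g x = (x *. x) -. (5.0 *. x) +. 6.0

(* Derivative of g at any point, by partial application of deriv. *)
let gprime x = deriv g x 0.01

(* Newton-Raphson iteration, stopped when two successive iterates are close. *)
let rec newton x xold f fprime =
  if abs_float (x -. xold) < 0.01 then x
  else newton (x -. (f x /. fprime x)) x f fprime

let root = newton 9.0 0.0 g gprime
```

Starting from 9.0, the iterates decrease towards the root 3 and `root` ends up within 0.01 of it, in line with the value 3.0001 reported by Numl.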

7 Case of the IEEE754 Floating-Point Arithmetic

In this section, we introduce modified versions of the types of primitives introduced in Section 3.2. These modified versions are specific to the IEEE754 floating-point arithmetic [1]. The types introduced in Figure 4 for the primitives corresponding to the elementary operations +, −, × and ÷ are not tailored for a specific arithmetic. They only assume that the system has enough bits to perform the operations in the format given by the types so that the results of the operations are not rounded. The Numl interpreter fulfills this requirement by performing all the numerical computations in multiple precision, using the GNU Multiple Precision Arithmetic library GMP. Indeed, the type inference makes it possible to determine a priori the precision needed by GMP for the values and arithmetic operations. An optimization would consist of also detecting when the computations fit into hardware formats (generally the formats of the IEEE754 arithmetic introduced in Figure 1) in order to avoid the calls to GMP when possible. The type information also makes it possible to generate code for fixed-point arithmetic [12]. In this case, if the precision of the formats corresponds to the types, no additional roundoff errors have to be added and the general equations of Figure 4 hold again. In future work, we plan to develop a compiler for our language (in addition to the current interpreter) which, based on the formats given by the types, generates code using either the IEEE754 or the multiple precision arithmetic (only when necessary). This compiler would also generate code for fixed-point arithmetic.
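The optimization mentioned above, falling back to a hardware format whenever the inferred precision fits, could look like the following OCaml sketch. The type `target` and the function `choose_target` are hypothetical; only the precision p of the inferred format is examined, using the IEEE754 precisions 11, 24, 53 and 113, and the exponent range of each format is ignored, which a real implementation would also have to check.

```ocaml
(* Possible arithmetic back-ends for a value whose inferred precision is p. *)
type target =
  | Half
  | Single
  | Double
  | Quad
  | MultiplePrecision of int   (* number of bits requested from the library *)

(* Pick the smallest IEEE754 precision holding p bits, else multiple precision.
   Exponent ranges are deliberately not checked in this sketch. *)
let choose_target p =
  if p <= 11 then Half
  else if p <= 24 then Single
  else if p <= 53 then Double
  else if p <= 113 then Quad
  else MultiplePrecision p
```

With this rule a result of type real{+,7,20} would be computed in single precision and a result of type real{+,7,52} in double precision.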

In practice, in many cases, one wants to use the IEEE754 floating-point arithmetic and not multiple precision libraries, for efficiency reasons or because these libraries are not available in certain contexts. In this case, the values and the results of the operations do not necessarily fit inside the IEEE754 formats of Figure 1 and must be rounded. The IEEE754 Standard defines five rounding modes for elementary operations over floating-point numbers. These modes are towards −∞, towards +∞, towards zero, to the nearest ties to even and to the nearest ties to away and we write them $\circ_{-\infty}$, $\circ_{+\infty}$, $\circ_0$, $\circ_{\sim_e}$ and $\circ_{\sim_a}$, respectively. The semantics of the elementary operations ∗ ∈ {+, −, ×, ÷} is then defined by

$$f_1 \ast_\circ f_2 = \circ(f_1 \ast f_2) \tag{21}$$


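$$\varrho \in \{11, 24, 53, 113\}$$

$$P_{+,\varrho}(s_1, u_1, p_1, s_2, u_2, p_2) = \max(u_1, u_2) + \sigma_+(s_1, s_2) - \max\big(u_1 - p_1,\ u_2 - p_2,\ U_+(s_1, u_1, s_2, u_2) - \varrho\big) - \iota\big(u_1 - p_1,\ u_2 - p_2,\ U_+(s_1, u_1, s_2, u_2) - \varrho\big)$$

$$P_{-,\varrho}(s_1, u_1, p_1, s_2, u_2, p_2) = \max(u_1, u_2) + \sigma_-(s_1, s_2) - \max\big(u_1 - p_1,\ u_2 - p_2,\ U_-(s_1, u_1, s_2, u_2) - \varrho\big) - \iota\big(u_1 - p_1,\ u_2 - p_2,\ U_-(s_1, u_1, s_2, u_2) - \varrho\big)$$

$$P_{\times,\varrho}(s_1, u_1, p_1, s_2, u_2, p_2) = u_1 + u_2 + 1 - \max\big(u_1 + u_2 + 1 - p_1,\ u_1 + u_2 + 1 - p_2,\ u_1 + u_2 + 1 - \varrho\big) - \iota\big(u_1 + u_2 + 1 - p_1,\ u_1 + u_2 + 1 - p_2,\ u_1 + u_2 + 1 - \varrho\big)$$

$$P_{\div,\varrho}(s_1, u_1, p_1, s_2, u_2, p_2) = P_{\times,\varrho}(s_1, u_1, p_1, s_2, u_2, p_2)$$

$$\iota(x, y, z) = \begin{cases} 1 & \text{if } x = y \lor x = z \lor y = z, \\ 0 & \text{otherwise.} \end{cases}$$

Fig. 11. Types of the IEEE754 floating-point arithmetic operators in precision ϱ.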

where $\circ \in \{\circ_{-\infty}, \circ_{+\infty}, \circ_0, \circ_{\sim_e}, \circ_{\sim_a}\}$ denotes the rounding mode. Equation (21) states that the result of a floating-point operation $\ast_\circ$ done with the rounding mode ◦ returns what we would obtain by performing the exact operation ∗ and next rounding the result using ◦. The IEEE754 Standard also specifies how the square root function must be rounded in a similar way to Equation (21) but does not specify the roundoff of other functions like sin, log, etc.
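Equation (21) can be mimicked in OCaml by computing a result and then rounding it to a given number of significand bits. The sketch below is an approximation under stated assumptions: `round_to_prec` and `add_rounded` are hypothetical names, the rounding mode is to the nearest with ties away from zero (the behaviour of `Float.round`, available since OCaml 4.07), the "exact" sum is replaced by the double-precision sum, which is only adequate when the target precision is well below 53 bits, and only finite inputs are considered.

```ocaml
(* Round x to a significand of prec bits, to nearest with ties away from zero. *)
let round_to_prec prec x =
  if x = 0.0 then x
  else
    let m, e = frexp x in   (* x = m * 2^e with 0.5 <= |m| < 1 *)
    ldexp (Float.round (ldexp m prec)) (e - prec)

(* Addition followed by rounding, in the spirit of Equation (21).
   The double-precision sum stands in for the exact sum (an assumption). *)
let add_rounded prec x y = round_to_prec prec (x +. y)
```

For instance, with 3 bits of significand, `add_rounded 3 1.0 0.125` rounds the sum 1.125, which lies halfway between 1.0 and 1.25, to 1.25.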

In the IEEE754 arithmetic, additional errors arise compared to the general context of Section 3.2 and the types of the primitives of Figure 4 must be modified to correctly model the errors of this specific arithmetic. The types of the IEEE754 primitives in precision ϱ ∈ {11, 24, 53, 113}, i.e. in half, single, double or quadruple precision, are given in Figure 11. We assume that the rounding mode is $\sim\ \in \{\sim_e, \sim_a\}$ (to the nearest). These equations model the fact that the accuracy of the result is dominated by either the error on the first operand, the error on the second operand or the rounding of the result in precision ϱ. For example, the error on $x +_\sim y$ is $e = \varepsilon(x) + \varepsilon(y) + \circ(x + y)$ with, by Equation (21),


◦(x + y) <

ulp(x + y) =

ufp(x + y) − ϱ . (22)

The types of the other operators are obtained in a similar way to the addition. Let us also note that in the IEEE754 floating-point arithmetic the constants may no longer be in any precision. They must fit one of the formats given by the standard.
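The idea that the precision of a sum is dominated by the largest of three error sources can be sketched in OCaml. The code assumes the following reading of the addition case: the precision of the result is its ufp minus the largest of the three error exponents, minus ι when two of them coincide. The function `p_add_ieee` is a hypothetical name and takes the ufp of the result as an input instead of computing it from the signs.

```ocaml
(* iota(x, y, z) = 1 when at least two of the three arguments coincide. *)
let iota x y z = if x = y || x = z || y = z then 1 else 0

(* Precision of a sum in IEEE754 precision rho, under the assumed reading.
   u is the ufp of the result; (u1, p1) and (u2, p2) describe the operands. *)
let p_add_ieee ~rho ~u (u1, p1) (u2, p2) =
  let e1 = u1 - p1 and e2 = u2 - p2 and er = u - rho in
  u - max e1 (max e2 er) - iota e1 e2 er
```

With rho = 24 and two accurate operands, `p_add_ieee ~rho:24 ~u:3 (2, 40) (1, 40)` returns 24: the rounding of the result dominates and the sum has exactly the precision of the format. With rho = 53 and two operands of format (2, 20), the operand errors dominate and coincide, so one more bit is lost and the result has precision 20.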

8 Related Work

Several approaches have been proposed to determine the best floating-point formats as a function of the expected accuracy on the results. Darulova and Kuncak use a forward static analysis to compute the propagation of errors [8]. If the computed bound on the accuracy satisfies the post-conditions then the analysis is run again with a smaller format until the best format is found. Note that in this approach, all the values have the same format (contrary to our framework where each control-point has its own format). While Darulova and Kuncak develop their own static analysis, other static techniques [11, 24] could be used to infer the suitable formats from the forward error propagation. Chiang et al. [5] have proposed a method to allocate a precision to the terms of an arithmetic expression (only). They use a formal analysis via Symbolic Taylor Expansions and error analysis based on interval functions. In contrast to our linear constraints, they solve a quadratically constrained quadratic program to obtain annotations.

Other approaches rely on dynamic analysis. For instance, the Precimonious tool tries to decrease the precision of variables and checks whether the accuracy requirements are still fulfilled [19, 23]. Lam et al. instrument binary codes in order to modify their precision without modifying the source codes [14]. They also propose a dynamic search method to identify the pieces of code where the precision should be modified.

Finally, other work focuses on formal methods and numerical analysis. A first related research direction concerns formal proofs and the use of proof assistants to guarantee the accuracy of finite-precision computations [3, 13, 15]. Another related research direction concerns the compile-time optimization of programs in order to improve the accuracy of the floating-point computation as a function of given ranges for the inputs, without modifying the formats of the numbers [7, 20].

9 Conclusion

In this article, we have introduced a dependent type system able to infer the accuracy of numerical computations. Our type system allows one to type non-trivial programs corresponding to implementations of classical numerical analysis methods. Unstable computations are rejected by the type system. The consistency of typed programs is ensured by a subject reduction theorem. To our knowledge, this is the first type system dedicated to numerical accuracy. We believe that this approach has many advantages, ranging from early debugging to compiler optimizations. Indeed, we believe that the type float offered by usual ML implementations, which is a simple clone of the type int, is too poor for numerical computations. We also believe that this approach is a credible alternative to static analysis techniques for numerical precision [8, 11, 24].


For the developer, our type system introduces few changes in the programming style, limited to giving the accuracy of the inputs or the accuracy of comparisons to allow the typing of certain recursive functions.

A first perspective to the present work is the implementation of a compiler for Numl. We aim at using the type information to select the most appropriate formats (the IEEE754 formats of Figure 1, multiple precision numbers of the GMP library when needed or requested by the user, or fixed-point numbers). In the longer term, we also aim at introducing safe compile-time optimizations based on type preservation: an expression may be safely (from the accuracy point of view) substituted for another expression as long as both expressions are mathematically equivalent and the new expression has a greater type than the older one in the sense of Equation (6). Finally, a second perspective is to integrate our type system into other applicative languages. In particular, it would be of great interest to have such a type system inside a language used to build critical embedded systems such as the synchronous language Lustre [4]. In this context numerical accuracy requirements are strong and difficult to obtain. Our type system could be integrated naturally inside Lustre or similar languages.

References

1. ANSI/IEEE: IEEE Standard for Binary Floating-point Arithmetic (2008)

2. Atkinson, K.: An Introduction to Numerical Analysis, 2nd Edition. Wiley (1989)

3. Boldo, S., Jourdan, J., Leroy, X., Melquiond, G.: Verified compilation of floating-point computations. J. Autom. Reasoning 54(2), 135–163 (2015)

4. Caspi, P., Pilaud, D., Halbwachs, N., Plaice, J.: Lustre: A declarative language for programming synchronous systems. In: POPL. pp. 178–188. ACM Press (1987)

5. Chiang, W., Baranowski, M., Briggs, I., Solovyev, A., Gopalakrishnan, G., Rakamaric, Z.: Rigorous floating-point mixed-precision tuning. In: POPL. pp. 300–315. ACM (2017)

6. Damouche, N., Martel, M., Chapoutot, A.: Impact of accuracy optimization on the convergence of numerical iterative methods. In: LOPSTR'15. LNCS, vol. 9527, pp. 1–18. Springer (2015)

7. Damouche, N., Martel, M., Chapoutot, A.: Improving the numerical accuracy of programs by automatic transformation. STTT 19(4), 427–448 (2017)

8. Darulova, E., Kuncak, V.: Sound compilation of reals. In: POPL'14. pp. 235–248. ACM (2014)

9. Denis, C., de Oliveira Castro, P., Petit, E.: Verificarlo: Checking floating point accuracy through Monte Carlo arithmetic. In: ARITH'16. pp. 55–62. IEEE (2016)

10. Franco, A.D., Guo, H., Rubio-González, C.: A comprehensive study of real-world numerical bug characteristics. In: ASE. pp. 509–519. IEEE (2017)

11. Goubault, E.: Static analysis by abstract interpretation of numerical programs and systems, and FLUCTUAT. In: SAS. LNCS, vol. 7935, pp. 1–3. Springer (2013)

12. Graphics, M.: Algorithmic C Datatypes, software version 2.6 edn. (2011), http://www.mentor.com/esl/catapult/algorithmic

13. Harrison, J.: Floating-point verification. J. UCS 13(5), 629–638 (2007)

14. Lam, M.O., Hollingsworth, J.K., de Supinski, B.R., LeGendre, M.P.: Automatically adapting programs for mixed-precision floating-point computation. In: Supercomputing, ICS'13. pp. 369–378. ACM (2013)

15. Lee, W., Sharma, R., Aiken, A.: On automatically proving the correctness of math.h implementations. PACMPL 2(POPL), 47:1–47:32 (2018)


16. Martel, M.: Floating-point format inference in mixed-precision. In: NFM. LNCS, vol. 10227, pp. 230–246 (2017)

17. Milner, R., Harper, R., MacQueen, D., Tofte, M.: The Definition of Standard ML. MIT Press (1997)

18. de Moura, L.M., Bjørner, N.: Z3: an efficient SMT solver. In: TACAS. LNCS, vol. 4963, pp. 337–340. Springer (2008)

19. Nguyen, C., Rubio-González, C., Mehne, B., Sen, K., Demmel, J., Kahan, W., Iancu, C., Lavrijsen, W., Bailey, D.H., Hough, D.: Floating-point precision tuning using blame analysis. In: ICSE. ACM (2016)

20. Panchekha, P., Sanchez-Stern, A., Wilcox, J.R., Tatlock, Z.: Automatically improving accuracy for floating point expressions. In: PLDI. pp. 1–11. ACM (2015)

21. Pierce, B.C.: Types and programming languages. MIT Press (2002)

22. Pierce, B.C. (ed.): Advanced Topics in Types and Programming Languages. MIT Press (2004)

23. Rubio-González, C., Nguyen, C., Nguyen, H.D., Demmel, J., Kahan, W., Sen, K., Bailey, D.H., Iancu, C., Hough, D.: Precimonious: tuning assistant for floating-point precision. In: HPCNSA. pp. 27:1–27:12. ACM (2013)

24. Solovyev, A., Jacobsen, C., Rakamaric, Z., Gopalakrishnan, G.: Rigorous estimation of floating-point round-off errors with Symbolic Taylor Expansions. In: FM. LNCS, vol. 9109, pp. 532–550. Springer (2015)
