Box Constrained Low-rank Matrix Approximation with Missing Values
Manami Tatsukawa¹ and Mirai Tanaka²
¹Department of Industrial Engineering and Economics, Tokyo Institute of Technology,
2-12-1 Ookayama, Meguro-ku, Tokyo 152-8550, Japan
²Department of Mathematical Analysis and Statistical Inference, The Institute of Statistical Mathematics,
10-3 Midori-cho, Tachikawa, Tokyo 190-8562, Japan
Keywords:
Low-Rank Matrix Approximation, Missing Data, Matrix Completion with Noise, Principal Component
Analysis with Missing Values, Collaborative Filtering, Block Coordinate Descent Method.
Abstract:
In this paper, we propose a new low-rank matrix approximation model for completing a matrix with missing
values. Our proposed model contains a box constraint that arises from the context of collaborative filtering.
Although it is unfortunately NP-hard to solve our model with high accuracy, we can construct a practical
algorithm to obtain a stationary point. Our proposed algorithm is based on alternating minimization and
converges to a stationary point under a mild assumption.
1 INTRODUCTION
1.1 Background
Low-rank matrix approximation is commonly used
for feature extraction. This technique embeds high-
dimensional data into a lower dimensional space, be-
cause relevant data in a high-dimensional space often
lie in a lower dimensional space. Feature extraction
enables us to identify potential features that may help
to increase prediction accuracy when data are incom-
plete or contain some noise. We provide two exam-
ples of techniques for approximating a data matrix
with missing values.
1.1.1 PCA with Missing Values
Principal component analysis (PCA) is a classical
technique used for extracting features. This tech-
nique embeds high-dimensional data into lower di-
mensional space. The components embedded into
lower dimensional space are called principal compo-
nents. To handle incompleteness of input data we
often use nonlinear models; however, these models
cause problems such as overfitting and bad locally
optimal solutions. Tipping and Bishop (1999) intro-
duced a probabilistic formulation of PCA. The prob-
abilistic PCA is known to provide a good foundation
for handling missing values (Ilin and Raiko, 2010).
Probabilistic PCA is often solved by an expectation-
maximization (EM) algorithm.
1.1.2 Collaborative Filtering
Low-rank matrix approximation is utilized in recom-
mendation systems such as those found on iTunes,
Amazon, and Netflix. In these services, music, book,
and movie recommendations are provided to users.
Users rate items they have listened to, bought, or
watched. Based on these ratings, items are recommended according to the user's preferences or the items' novelty to the user.
Let us consider the following situation: there are m users and n items. Every user rates some of the items on a scale of one to five, where one is the lowest score and means that the user does not like the item. We represent these ratings with an m × n matrix in which each row represents a user and each column represents an item. We show an example of a 3 × 5 matrix below:
$V = \begin{array}{c|ccccc} & a & b & c & d & e \\ \hline A & 2 & 5 & ? & 4 & 1 \\ B & 1 & ? & 1 & 3 & 2 \\ C & ? & 1 & 4 & ? & 5 \end{array}$ (1)
Matrix $V$ represents, for instance, that user A rates item b as five and does not rate item c, which is indicated by the symbol "?". In what follows, we use the symbol "?" to represent missing values. In the example shown in Equation (1), users A and B seem to have similar tastes. This means that an item that user A rates highly may also be preferred by user B. For example, item b, which user A rates five, may also be rated highly by
user B. If we complete the missing values of the matrix, we can predict how much users will like items that they have not evaluated yet. This prediction technique can be used in recommendation systems.
A technique used in recommendation systems is
collaborative filtering (CF). CF algorithms have three
main categories: memory-based, model-based, and
hybrid (Su and Khoshgoftaar, 2009). Memory-based
CF algorithms calculate similarities between users or
items to predict users’ preferences. Model-based CF
algorithms learn a model in order to make predic-
tions. Hybrid CF algorithms combine several CF
techniques.
In memory-based CF algorithms, similarities between users or items are used to make predictions. As measures of similarity, the vector cosine correlation and the Pearson correlation are often used. However, when many values are missing, it is difficult to compute similarities between users. In fact, the number of items might be greater than the number of users, and each user may evaluate only a small number of items. As a result, many items are evaluated by only a few users, while some users do not submit any evaluations at all.
To overcome the weakness in memory-based CF
algorithms, model-based CF algorithms have been
investigated. Model-based CF approaches use data
mining or machine learning algorithms. One of the
techniques of model-based CF is dimensionality re-
duction, such as PCA or singular value decomposition
(SVD). As we mentioned above, high-dimensional
data is thought to be expressed by lower dimensional
data because related data lie in a lower dimensional
space.
1.2 Related Work
Low-rank matrix completion and approximation have been studied in the literature. In this section, we briefly review some of the work on these techniques.
Let us consider completing a matrix $V \in (\mathbb{R} \cup \{?\})^{m \times n}$ with missing values, where the symbol ? indicates that the corresponding value is missing. That is, $V_{ij} = {?}$ indicates that $V_{ij}$ is missing. Candès and Recht (2009) proposed rank minimization to complete matrix $V$. They considered the following problem:

$\begin{array}{ll} \text{minimize} & \mathrm{rank}(X) \\ \text{subject to} & X_{ij} = V_{ij} \quad ((i,j) \in \Omega), \end{array}$ (2)

where $\Omega$ is the set of indices of observed entries, i.e., $\Omega = \{(i,j) : V_{ij} \in \mathbb{R}\}$. Problem (2) is NP-hard because it contains the $\ell_0$-norm minimization problem. This difficulty essentially arises from the nonconvexity and discontinuity of the rank function. Hence, they introduced nuclear norm minimization as a convex relaxation of Problem (2). Nuclear norm minimization can be recast as a semidefinite optimization problem (SDP). There are many efficient algorithms and high-quality software packages available for solving SDPs, including interior-point methods. However, the computation time for solving an SDP is very sensitive to the instance size, which makes this approach unsuitable for large instances.
Olsson and Oskarsson (2009) and Gillis and Glineur (2011) studied the following problem to complete $V$:

$\begin{array}{ll} \text{minimize} & \sum_{i=1}^{m} \sum_{j=1}^{n} W_{ij} (X_{ij} - V_{ij})^2 \\ \text{subject to} & \mathrm{rank}(X) \le r, \end{array}$ (3)

where the decision variable $X \in \mathbb{R}^{m \times n}$ is a completed matrix of $V$ and $W \in \{0, 1\}^{m \times n}$ is a given weight matrix corresponding to the observations, i.e., $W_{ij} = 1$ for $V_{ij} \in \mathbb{R}$ and $W_{ij} = 0$ otherwise. Using $\Omega$, we obtain the following equivalent formulation of Problem (3):

$\begin{array}{ll} \text{minimize} & \sum_{(i,j) \in \Omega} (X_{ij} - V_{ij})^2 \\ \text{subject to} & \mathrm{rank}(X) \le r. \end{array}$ (4)

Our formulation is similar to this one, and this model is helpful for understanding ours. Olsson and Oskarsson (2009) proposed a heuristic based on an approximated continuous (but nonconvex) formulation of Problem (3). Gillis and Glineur (2011) proved the NP-hardness of Problem (3) and, equivalently, of Problem (4).
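As a concrete illustration, the objective of Problem (3) can be evaluated in a few lines of MATLAB, assuming the missing entries of $V$ are stored as NaN; the variable names here are ours, not from the cited papers:

    % Weighted squared error of Problem (3) for a candidate X.
    % W is the 0-1 observation mask; missing entries of V are NaN.
    W = ~isnan(V);                 % W(i,j) = 1 iff V(i,j) is observed
    R = X - V;                     % residual (NaN at missing entries)
    R(~W) = 0;                     % missing entries do not contribute
    obj = sum(R(:).^2);            % = sum over Omega of (X(i,j) - V(i,j))^2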
On the other hand, the low-rank matrix approximation problem is easily solved when no values are missing. In fact, when $\Omega$ is the entire set of indices, Problem (4) is equivalent to the following problem:

$\begin{array}{ll} \text{minimize} & \|X - V\|_F^2 \\ \text{subject to} & \mathrm{rank}(X) \le r, \end{array}$

where $\| \cdot \|_F$ denotes the Frobenius norm defined by $\|A\|_F = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} A_{ij}^2}$ for $A \in \mathbb{R}^{m \times n}$. This problem is nonconvex; however, a global optimal solution is obtained by the truncated SVD of $V$. Specifically, it is well known that an optimal solution of this problem can be written as $\sum_{l=1}^{r} \sigma_l p_l q_l^\top$ (Trefethen and Bau, 1997, Theorem 5.9), where $\sigma_l$, $p_l$, and $q_l$ respectively represent the $l$-th largest singular value and the corresponding singular vectors of $V$. The computation of all singular values and singular vectors of $V$ is expensive. More specifically, it requires a computation time in $O(\min\{m^2 n, m n^2\})$, which can be too heavy for a
large instance. Instead, the r largest singular val-
ues and corresponding singular vectors can be quickly
computed by an iterative method if r is small.
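For reference, a minimal MATLAB sketch that computes this best rank-r approximation of a fully observed matrix using the built-in truncated SVD routine svds (the variable names are ours):

    % Best rank-r approximation of a fully observed matrix V via truncated SVD.
    [P, S, Q] = svds(V, r);        % r largest singular values and vectors
    Xr = P * S * Q';               % X_r = sum_{l=1}^r sigma_l * p_l * q_l'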
When we use the truncated SVD for a matrix containing missing values, we first need to complete the matrix. One method for doing this completes the input matrix with the averages of the non-missing entries in the same rows or columns (Sarwar et al., 2000). Specifically, their method comprises the following steps (a code sketch is given after the list):
1. Temporarily fill in the missing values of the incomplete input matrix $V$ with the average of the non-missing entries in the same column. We call the completed matrix $\hat{V}$. That is, $\hat{V}_{ij} = (1 / |\{i' : (i', j) \in \Omega\}|) \sum_{i' : (i', j) \in \Omega} V_{i'j}$ for $(i, j) \notin \Omega$.

2. Compute the row average vector $\mu$. The $i$-th element $\mu_i$ of $\mu$ is the average of the non-missing entries in the $i$-th row of $V$. That is, $\mu_i = (1 / |\{j : (i, j) \in \Omega\}|) \sum_{j : (i, j) \in \Omega} V_{ij}$.

3. Compute the best rank-$r$ approximation matrix $V_r$ of $\hat{V} - \mu \mathbf{1}^\top$ by using the truncated SVD, where $\mathbf{1}$ is the vector of all ones.

4. Return $V_r + \mu \mathbf{1}^\top$ as a low-rank approximation of the input matrix $V$.
1.3 Our Contribution and Structure of
This Paper
The remainder of this paper is organized as follows. Section 2 proposes a new model for low-rank matrix approximation that contains a box constraint arising from the context of CF. Rank minimization models were previously proposed by Candès and Recht (2009) and Olsson and Oskarsson (2009). Their models can be recast as SDPs, and although many software packages are available for solving SDPs, this approach is unsuitable for large instances. Our model includes Problem (4), which was proved to be NP-hard by Gillis and Glineur (2011). Nevertheless, we can compute a stationary point of our model with an algorithm whose main computational tool is the truncated SVD. Moreover, our model is more suitable for CF than previous models owing to the box constraint; it need not be recast as an SDP and is suitable for large instances. Section 3 proves the NP-hardness of our proposed model. Section 4 proposes an algorithm for solving our proposed model and proves its convergence. Section 5 reports preliminary numerical results to assess our proposed model and algorithm. Section 6 presents concluding remarks.
2 MODEL
In this section, we propose a new model for low-rank matrix approximation that is easier to handle computationally. Here, matrix $X \in \mathbb{R}^{m \times n}$ denotes a completed matrix of matrix $V \in (\mathbb{R} \cup \{?\})^{m \times n}$ with missing values. We consider minimizing the difference between $V$ and $X$. This idea is also used in the previous studies described above.
We consider imposing a box constraint $L \le X \le U$, where $L \in (\mathbb{R} \cup \{-\infty\})^{m \times n}$ and $U \in (\mathbb{R} \cup \{+\infty\})^{m \times n}$ satisfy $L \le U$ and the inequalities are entrywise. For example, $L \le X$ means $L_{ij} \le X_{ij}$ for all $i, j$. The box constraint is important when applying low-rank matrix approximation with missing values to CF, because users evaluate items within some fixed range. For example, an Amazon user evaluates an item by assigning it one to five stars. We model such an evaluation range as the box constraint: matrices $L$ and $U$ represent the lower and upper bounds of the evaluation range, respectively. This constraint is helpful in making exact or approximate predictions and in eliminating outliers.

To avoid the case in which the rank constraint and the box constraint cannot be fulfilled simultaneously by a single variable, we let separate variables fulfill each constraint. For this reason, we introduce another variable, $Y \in \mathbb{R}^{m \times n}$, and impose the box constraint on $Y$. We then add the squared Frobenius norm $\|X - Y\|_F^2$ of the difference between $X$ and $Y$ to the objective function as a penalty.
Summarizing the previous argument, we formulate low-rank matrix approximation with missing values as the following optimization problem:

$\begin{array}{ll} \text{minimize} & \|X - Y\|_F^2 + \lambda \sum_{(i,j) \in \Omega} (Y_{ij} - V_{ij})^2 \\ \text{subject to} & \mathrm{rank}(X) \le r, \\ & L \le Y \le U, \end{array}$ (5)

where $\lambda$ is a parameter that determines the weight of the two objectives. When we use a large $\lambda$, the second term in the objective function is emphasized, so we expect to obtain $Y$ close to $V$ while permitting a small violation of the low-rank constraint. When we use a small $\lambda$, the first term in the objective function is emphasized, so we expect to obtain $Y$ that satisfies the two constraints almost simultaneously while permitting a small difference from $V$.
3 HARDNESS
In this section, we show the NP-hardness of Prob-
lem (5). Specifically, we prove the following result:
Theorem 1. When $V \in ([0, 1] \cup \{?\})^{m \times n}$ and $r = 1$, it is NP-hard to find an approximate solution to Problem (5) with an objective function accuracy of less than $2^{-12} (mn)^{-7}$.
To prove this theorem, we employ the following
theorem:
Theorem 2 (Gillis and Glineur (2011, Theorem 1.2)). When $V \in ([0, 1] \cup \{?\})^{m \times n}$ and $r = 1$, it is NP-hard to find an approximate solution to Problem (4) with an objective function accuracy of less than $2^{-12} (mn)^{-7}$.
Proof of Theorem 1. In Problem (5), we set $L_{ij} = U_{ij} = V_{ij}$ for $(i, j) \in \Omega$ and $L_{ij} = -\infty$, $U_{ij} = +\infty$ otherwise. Then, for $(i, j) \in \Omega$, $Y_{ij}$ is fixed to $V_{ij}$ and the second term of the objective function vanishes. Thus, the resulting objective function can be written as $\sum_{(i,j) \in \Omega} (X_{ij} - V_{ij})^2 + \sum_{(i,j) \notin \Omega} (X_{ij} - Y_{ij})^2$. The latter sum also vanishes because $Y_{ij}$ for $(i, j) \notin \Omega$ is unconstrained and can be chosen to coincide with $X_{ij}$. As a result, we obtain Problem (4). In this procedure, we have reduced Problem (4) to Problem (5), and this reduction clearly runs in polynomial time. Hence, the hardness of Problem (4) also holds for Problem (5).
Remark 1. Theorem 1 is easily generalized to any r
because Theorem 2 is generalized to any r (Gillis and
Glineur, 2011, Remark 3).
4 ALGORITHM
We propose an alternating minimization algorithm for solving Problem (5). In this section, we use the following extended-real-valued functions:

$f_0(X, Y) = \|X - Y\|_F^2 + \lambda \sum_{(i,j) \in \Omega} (Y_{ij} - V_{ij})^2,$
$f_1(X) = \iota(\mathrm{rank}(X) \le r),$
$f_2(Y) = \iota(L \le Y \le U),$
$f(X, Y) = f_0(X, Y) + f_1(X) + f_2(Y),$

where $\iota$ is the indicator function. That is, $f_1(X) = 0$ if $\mathrm{rank}(X) \le r$ and $f_1(X) = +\infty$ otherwise, and $f_2(Y) = 0$ if $L \le Y \le U$ and $f_2(Y) = +\infty$ otherwise. Note that the minimization of $f(X, Y)$ is equivalent to Problem (5). We consider the alternating minimization of $f(X, Y)$ as shown in Algorithm 1. Each iteration of this algorithm is easily computed, as we see below.
Algorithm 1: Alternating minimization algorithm for solving Problem (5).
  Take an initial guess $(X^{(0)}, Y^{(0)}) \in \mathrm{dom}\, f_1 \times \mathrm{dom}\, f_2$.
  for $k = 0, 1, 2, \dots$ until convergence:
    Update $X^{(k+1)} = \mathrm{argmin}_{X} f(X, Y^{(k)})$.
    Update $Y^{(k+1)} = \mathrm{argmin}_{Y} f(X^{(k+1)}, Y)$.

In the update of $X$, we set $X^{(k+1)}$ to the best rank-$r$ approximation of $Y^{(k)}$. That is, $X^{(k+1)}$ is an optimal solution of the following subproblem:

$\begin{array}{ll} \text{minimize} & \|X - Y^{(k)}\|_F^2 \\ \text{subject to} & \mathrm{rank}(X) \le r. \end{array}$

Although this subproblem is a nonconvex optimization problem, the optimal solution is easily computed by the truncated SVD of $Y^{(k)}$.
The update of $Y$ is also easily computed. In fact, we solve the following subproblem to update $Y$:

$\begin{array}{ll} \text{minimize} & \|X^{(k+1)} - Y\|_F^2 + \lambda \sum_{(i,j) \in \Omega} (Y_{ij} - V_{ij})^2 \\ \text{subject to} & L \le Y \le U. \end{array}$ (6)

Note that this subproblem is separable. The separated subproblem for each entry reduces to the minimization of a univariate convex quadratic function over a closed interval. Specifically, we solve

$\begin{array}{ll} \text{minimize} & (1 + \lambda) Y_{ij}^2 - 2 (X^{(k+1)}_{ij} + \lambda V_{ij}) Y_{ij} \\ \text{subject to} & L_{ij} \le Y_{ij} \le U_{ij} \end{array}$

for each $(i, j) \in \Omega$ and

$\begin{array}{ll} \text{minimize} & Y_{ij}^2 - 2 X^{(k+1)}_{ij} Y_{ij} \\ \text{subject to} & L_{ij} \le Y_{ij} \le U_{ij} \end{array}$

for each $(i, j) \notin \Omega$. These have the following closed-form solution:

$Y^{(k+1)}_{ij} = \begin{cases} L_{ij} & (A_{ij} \le L_{ij}), \\ A_{ij} & (L_{ij} < A_{ij} < U_{ij}), \\ U_{ij} & (U_{ij} \le A_{ij}), \end{cases}$

where

$A_{ij} = \begin{cases} (X^{(k+1)}_{ij} + \lambda V_{ij}) / (1 + \lambda) & ((i, j) \in \Omega), \\ X^{(k+1)}_{ij} & ((i, j) \notin \Omega). \end{cases}$

Hence, we can solve Subproblem (6) by thresholding, which requires a computation time in $O(mn)$.
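Combining the two updates, a minimal MATLAB sketch of Algorithm 1 might look as follows; it assumes that missing entries of $V$ are stored as NaN, that $L$ and $U$ are finite matrices of the same size as $V$, and that a fixed iteration count is used instead of a convergence test (function and variable names are ours):

    % Sketch of Algorithm 1: alternating minimization for Problem (5).
    % Missing entries of V are assumed to be NaN; L and U are entrywise bounds.
    function [X, Y] = box_lra(V, L, U, r, lambda, maxIter)
        Omega = ~isnan(V);
        Y = min(max(zeros(size(V)), L), U);   % any Y with L <= Y <= U works as Y^(0)
        Y(Omega) = min(max(V(Omega), L(Omega)), U(Omega));
        for k = 1:maxIter
            [P, S, Q] = svds(Y, r);           % X-update: truncated SVD of Y
            X = P * S * Q';
            A = X;                            % Y-update: closed-form entrywise minimizer
            A(Omega) = (X(Omega) + lambda * V(Omega)) / (1 + lambda);
            Y = min(max(A, L), U);            % thresholding onto [L, U]
        end
    end

The per-iteration cost is dominated by the truncated SVD; the Y-update is the O(mn) entrywise thresholding described above.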
A sequence generated by Algorithm 1 converges to a stationary point of $f$ under a mild assumption. Here, we say that a point $(\bar{X}, \bar{Y})$ is a stationary point of $f$ in the sense of Tseng (2001) if $(\bar{X}, \bar{Y}) \in \mathrm{dom}\, f = \{(X, Y) : f(X, Y) < +\infty\}$ and

$f'(\bar{X}, \bar{Y}; \Delta X, \Delta Y) \ge 0 \quad (\forall (\Delta X, \Delta Y)),$

where $f'(\bar{X}, \bar{Y}; \Delta X, \Delta Y)$ is the lower directional derivative of $f$ at $(\bar{X}, \bar{Y})$ in the direction $(\Delta X, \Delta Y)$, i.e.,
$f'(\bar{X}, \bar{Y}; \Delta X, \Delta Y) = \liminf_{\varepsilon \downarrow 0} \frac{f(\bar{X} + \varepsilon \Delta X, \bar{Y} + \varepsilon \Delta Y) - f(\bar{X}, \bar{Y})}{\varepsilon}.$
Note that this definition works even if f is nonsmooth.
Specifically, the following theorem holds.
Theorem 3. Let $\{(X^{(k)}, Y^{(k)})\}$ be a sequence generated by Algorithm 1 and assume that the level set $\mathcal{L} = \{(X, Y) : f(X, Y) \le f(X^{(0)}, Y^{(0)})\}$ is bounded. Then, $\{(X^{(k)}, Y^{(k)})\}$ has at least one cluster point. In addition, every cluster point is a stationary point of $f$.
To prove this theorem, the following technical
lemma is required.
Lemma 1. The effective domain $\mathrm{dom}\, f$ of $f$ is closed.

Proof. Clearly, $\mathrm{dom}\, f = \mathrm{dom}\, f_1 \times \mathrm{dom}\, f_2$ and $\mathrm{dom}\, f_2$ is closed. Thus, we only need to show the closedness of $\mathrm{dom}\, f_1$. Take $X \notin \mathrm{dom}\, f_1$ arbitrarily. Then, the $(r+1)$-th largest singular value $\sigma_{r+1}(X)$ of $X$ is positive. We arbitrarily take $E$ such that $\|E\|_2 < \sigma_{r+1}(X)$. From Golub and van Loan (2013, Corollary 8.6.2), we can easily prove that $\sigma_l(X + E) \ge \sigma_l(X) - \|E\|_2 > 0$ for $l = 1, \dots, r+1$, so that $X + E \notin \mathrm{dom}\, f_1$. This indicates that the complement of $\mathrm{dom}\, f_1$ is open, i.e., $\mathrm{dom}\, f_1$ is closed.
Here, we provide a proof of Theorem 3. The fol-
lowing proof is essentially based on the discussion in
Tseng (2001, Sections 3 and 4).
Proof of Theorem 3. From the optimality in each update, the sequence $\{(X^{(k)}, Y^{(k)})\}$ is contained in $\mathcal{L} \cap \mathrm{dom}\, f$, so $\{(X^{(k)}, Y^{(k)})\}$ is bounded. Hence, $\{(X^{(k)}, Y^{(k)})\}$ has at least one cluster point in $\mathrm{dom}\, f$ because $\mathrm{dom}\, f$ is closed from Lemma 1.

Let $\{(X^{(k_j)}, Y^{(k_j)})\}$ be a subsequence of $\{(X^{(k)}, Y^{(k)})\}$ that converges to a cluster point $(\bar{X}, \bar{Y})$. To show that $(\bar{X}, \bar{Y})$ is a stationary point, we only have to prove $f'(\bar{X}, \bar{Y}; \Delta X, \Delta Y) \ge 0$ for any $(\Delta X, \Delta Y)$ because $(\bar{X}, \bar{Y}) \in \mathrm{dom}\, f$. From the optimality in each update, we obtain

$f(X^{(k_{j+1})}, Y^{(k_{j+1})}) \le f(X^{(k_j + 1)}, Y^{(k_j)}) \le f(X, Y^{(k_j)}) \quad (\forall X).$

Taking the limit as $j$ tends to infinity, we obtain

$f(\bar{X}, \bar{Y}) \le f(X, \bar{Y}) \quad (\forall X)$ (7)

because $f$ is continuous on $\mathrm{dom}\, f$. In addition,

$f(X^{(k_j)}, Y^{(k_j)}) \le f(X^{(k_j)}, Y) \quad (\forall Y)$

holds. Taking the limit as $j$ tends to infinity, we obtain

$f(\bar{X}, \bar{Y}) \le f(\bar{X}, Y) \quad (\forall Y).$ (8)

Because $f_0$ is differentiable, the following relationship holds for any $(\Delta X, \Delta Y)$:

$\begin{aligned} f'(\bar{X}, \bar{Y}; \Delta X, \Delta Y) &= \langle \nabla f_0(\bar{X}, \bar{Y}), (\Delta X, \Delta Y) \rangle + \liminf_{\varepsilon \downarrow 0} \left[ \frac{f_1(\bar{X} + \varepsilon \Delta X) - f_1(\bar{X})}{\varepsilon} + \frac{f_2(\bar{Y} + \varepsilon \Delta Y) - f_2(\bar{Y})}{\varepsilon} \right] \\ &\ge \langle \nabla_X f_0(\bar{X}, \bar{Y}), \Delta X \rangle + \langle \nabla_Y f_0(\bar{X}, \bar{Y}), \Delta Y \rangle + \liminf_{\varepsilon \downarrow 0} \frac{f_1(\bar{X} + \varepsilon \Delta X) - f_1(\bar{X})}{\varepsilon} + \liminf_{\varepsilon \downarrow 0} \frac{f_2(\bar{Y} + \varepsilon \Delta Y) - f_2(\bar{Y})}{\varepsilon} \\ &= \langle \nabla_X f_0(\bar{X}, \bar{Y}), \Delta X \rangle + \langle \nabla_Y f_0(\bar{X}, \bar{Y}), \Delta Y \rangle + f_1'(\bar{X}; \Delta X) + f_2'(\bar{Y}; \Delta Y) \\ &= \liminf_{\varepsilon \downarrow 0} \frac{f(\bar{X} + \varepsilon \Delta X, \bar{Y}) - f(\bar{X}, \bar{Y})}{\varepsilon} + \liminf_{\varepsilon \downarrow 0} \frac{f(\bar{X}, \bar{Y} + \varepsilon \Delta Y) - f(\bar{X}, \bar{Y})}{\varepsilon} \\ &\ge 0, \end{aligned}$

where the last inequality holds because of inequalities (7) and (8).
Remark 2. The boundedness of the level set $\mathcal{L} = \{(X, Y) : f(X, Y) \le f(X^{(0)}, Y^{(0)})\}$ holds if, for example, $L_{ij} > -\infty$ and $U_{ij} < +\infty$ for all $i, j$.
Remark 3. We can extend our proposed algorithm to a general closed convex constraint on $Y$ instead of the box constraint. If the subproblem to update $Y$ can be solved efficiently, the algorithm works well. In practice, this subproblem can often be solved faster than the subproblem to update $X$, which requires the truncated SVD of $Y$.
We proved convergence to a stationary point rather than to an optimal solution. The set of stationary points contains every locally optimal solution and, of course, the globally optimal solution. Thus, in practice, we run our algorithm from multiple initial points and select the best stationary point it provides.
5 PRELIMINARY EXPERIMENTS
We show preliminary numerical results on a synthetic dataset to investigate the convergence behavior of our algorithm and the effect of different initial points. We executed all experiments on a machine running macOS Sierra 10.12.6 with a 1.1 GHz Intel Core m3 processor and 8 GB of physical memory. We implemented our algorithms in MATLAB (R2017b).
[Figure 1: Comparison of iteration numbers — objective value (log scale, from $10^{-6}$ to $10^{4}$) versus iteration number for initial points generated by SKKR, perturb + SKKR, low rank rand, and rand.]

We generated a matrix $A \in \mathbb{R}^{20 \times 100}$ such that $\mathrm{rank}(A) = 10$ and $0.5 \le A_{ij} \le 5.5$ for all $(i, j)$ by multiplying two matrices $B \in \mathbb{R}^{20 \times 9}$ and $C \in \mathbb{R}^{9 \times 100}$ and applying some scaling, where all entries of $B$ and $C$ are identically sampled from the standard normal distribution. Then, we rounded all entries of $A$, i.e., $\bar{A}_{ij} \in \{1, 2, \dots, 5\}$, and randomly treated 80% of them as missing. We used the resulting incomplete matrix as $V$. In our algorithm, we set the parameters to $r = 10$ for computing the truncated SVD and $\lambda = 1$ for the weight parameter in the objective function.
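Under our reading of the description above, the synthetic data could be generated as follows; the affine scaling used to map the entries into [0.5, 5.5] is our own choice and is not specified in the paper:

    % Generate the synthetic rating matrix (the scaling constants are our choice).
    B = randn(20, 9);
    C = randn(9, 100);
    A = B * C;
    A = 0.5 + 5 * (A - min(A(:))) / (max(A(:)) - min(A(:)));   % map entries into [0.5, 5.5]
                                                               % the shift raises rank(B*C) <= 9 to at most 10
    Abar = min(max(round(A), 1), 5);                           % round to {1,...,5}, clamping boundary cases
    missing = rand(20, 100) < 0.8;                             % treat about 80% of the entries as missing
    V = Abar;
    V(missing) = NaN;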
In this experiment, we compared four methods of generating initial points:

SKKR: We applied the method proposed by Sarwar et al. (2000) to the given $V$ and used the output as an initial point of our algorithm;

perturb + SKKR: We applied the method proposed by Sarwar et al. (2000) to perturbed copies of $V$ and used the outputs as initial points of our algorithm;

low rank rand: We generated initial points of our algorithm in the same way as we generated $A$;

rand: We generated $20 \times 100$ matrices whose entries were identically sampled from the standard normal distribution and used them as initial points.

For each of perturb + SKKR, low rank rand, and rand, we generated 10 initial points and obtained 10 different solutions.
The numbers of iterations necessary to converge to stationary points are compared in Figure 1. The horizontal axis is the iteration number and the vertical axis is the objective value. From Figure 1, we can see that when we ran our algorithm from initial points generated by perturb + SKKR and low rank rand, it converged faster than from the initial point provided by SKKR. On the other hand, it took much longer to converge when it started from initial points generated by rand.
The objective values at the resulting stationary points are shown in Figure 2. The vertical axis is the objective value.

[Figure 2: Comparison of objective values at the stationary points obtained from initial points generated by perturb + SKKR, low rank rand, and rand (vertical axis: objective value, scaled by $10^{-5}$).]

The black horizontal line around $2 \times 10^{-5}$ depicts the objective value of the stationary point provided by our algorithm when started from the initial point generated by SKKR. From Figure 2, we can see that all solutions provided by our algorithm starting from initial points generated by perturb + SKKR and low rank rand are better than the solution obtained from SKKR. Some of the solutions resulting from rand are also smaller than that from SKKR.
6 CONCLUSION
In this paper, we proposed a low-rank matrix approx-
imation model for completing a matrix with missing
values. Our proposed model utilizes not only a rank
constraint but also a box constraint. Owing to the
box constraint, this model shows promise for use in
recommendation systems. In addition, we proposed
an alternating minimization algorithm for solving our
proposed model, and proved that a sequence gener-
ated by our proposed algorithm converges to a sta-
tionary point under a mild assumption. Our numerical
results are preliminary, however, we showed that our
proposed algorithm converges quickly and provides
better solution than the existing method proposed by
Sarwar et al. (2000) in a specific case. The perfor-
mance of our algorithm depends on an initial guess.
Hence, we should try multiple initial guesses and take
the best one. In our experiments, we fixed parame-
ters r and λ. However, we should employ cross vali-
dation to decide appropriate values for such parame-
ters. We left extensive numerical experiments as our
future work.
ACKNOWLEDGEMENTS
We thank Prof. Tomomi Matsui and the anonymous referees for their helpful comments.
REFERENCES
Candès, E. J. and Recht, B. (2009). Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6):717–772.
Gillis, N. and Glineur, F. (2011). Low-rank matrix ap-
proximation with weights or missing data is NP-hard.
SIAM Journal on Matrix Analysis and Applications,
32(4):1149–1165.
Golub, G. H. and van Loan, C. F. (2013). Matrix Compu-
tations. The Johns Hopkins University Press, fourth
edition.
Ilin, A. and Raiko, T. (2010). Practical approaches to
principal component analysis in the presence of miss-
ing values. Journal of Machine Learning Research,
11(Jul):1957–2000.
Olsson, C. and Oskarsson, M. (2009). A convex approach
to low rank matrix approximation with missing data.
In Salberg, A.-B., Hardeberg, J. Y., and Jenssen, R.,
editors, Proceedings of the 16th Scandinavian Confer-
ence on Image Analysis (SCIA ’09), pages 301–309.
Sarwar, B., Karypis, G., Konstan, J., and Riedl, J. (2000).
Application of dimensionality reduction in recom-
mender system—a case study. In Kohavi, R., Masand,
B., Spiliopoulou, M., and Srivastava, J., editors, Pro-
ceedings of the ACM WEBKDD 2000 Workshop.
Su, X. and Khoshgoftaar, T. M. (2009). A survey of col-
laborative filtering techniques. Advances in Artificial
Intelligence, 2009(421425):1–19.
Tipping, M. E. and Bishop, C. M. (1999). Probabilistic prin-
cipal component analysis. Journal of the Royal Statis-
tical Society. Series B, 61(3):611–622.
Trefethen, L. N. and Bau, D. (1997). Numerical Linear
Algebra. SIAM.
Tseng, P. (2001). Convergence of a block coordinate
descent method for nondifferentiable minimization.
Journal of Optimization Theory and Applications,
109(3):475–494.