Is (President, ) a Correct Sense Pair?

Linking and Creating Bilingual Sense Correspondences

Fumiyo Fukumoto, Yoshimi Suzuki and Attaporn Wangpoonsarp

Graduate Faculty of Interdisciplinary Research, University of Yamanashi, Japan

Keywords:

Bilingual Sense Correspondence, Domain Speciﬁc Senses.

Abstract:

This paper presents a method of linking and creating bilingual sense correspondences between English and

Japanese noun word dictionaries. Locally, we extracted bilingual noun words using sentence-based similarity.

Globally, for each monolingual dictionary, we identiﬁed domain-speciﬁc senses using a textual corpus with

category information. We incorporated these, i.e., we assigned a sense to each noun word of the extracted

bilingual words keeping domain (category) consistency. Evaluation on the WordNet 3.0 and EDR Japanese

dictionaries using Reuters and Mainichi Japanese newspaper corpora showed 23.1% improvement of bilingual

noun word extraction over the baseline with local data view only. Moreover, we found that the extracted

bilingual noun senses can be used as a lexical resource for the machine translation.

1 INTRODUCTION

There has long been a great deal of interest in retrieval

of bilingual lexicons with the availability of a number

of large-scale corpora. The extracted bilingual lexi-

con is widely used for cross-lingual NLP applications,

such as machine translation (MT), cross-lingual infor-

mation retrieval (CLIR), multilingual topic tracking,

text classiﬁcation and summarization. Much of the

previous work on ﬁnding bilingual lexicons has fo-

cused on word correspondence, rather than meaning.

Interpretation of whether the extracted bilingual lex-

icon is correct relies on human intervention. For ex-

ample, in most cases, we can accept a pair of English

word “president” and Japanese word “ ” as a

bilingual lexicon, while the noun “president” has at

least six different senses in WordNet, and “ ”

has two senses in the EDR dictionary.

In this paper, we propose a method for bilingual

noun word sense correspondence. We focused on

manually constructed monolingual dictionaries in dif-

ferent languages because dictionaries such as Word-

Net (Miller et al., 1990; Miller, 1995), ACQUILEX

(Briscoe, 1991), COMLEX (Grishman et al., 1994),

and EDR Japanese dictionaries

are ﬁne-grained and

they are successfully utilized not only for NLP but

also for NLP applications. We used them to make

up the deﬁcit of corpus statistics obtained from com-

www2.nict.go.jp/ipp/EDR/ENG/indexTop.html

parable corpora. We identiﬁed bilingual noun word

senses based on local and global data views. Here,

local data view indicates relevance between English

and Japanese sentences. We extracted bilingual noun

words by applying sentence-based similarity.

Global data view refers to domain/category infor-

mation. The assumption behind this is that predom-

inant sense of a word can depend on the domain or

source of a document (Gale et al., 1992; McCarthy

et al., 2007). We identiﬁed domain-speciﬁc senses us-

ing a corpus with category information. This method

ﬁrst identiﬁes each sense of a word in the dictionary to

its corresponding category. For each category, we cre-

ated a graph, where each node refers to each sense of

a word, and edges between nodes indicate similarity

between two nodes. We applied a Markov Random

Walk (MRW) model to the graph and ranked scores

for each sense. Finally, we incorporated the local

and global information, i.e., we assigned a domain-

speciﬁc sense to each noun word of the extracted

bilingual words maintaining domain/category consis-

tency.

The rest of the paper is organized as follows. The

next section describes our approach, especially local

and global data views. Section 3 reports some exper-

iments with a discussion of evaluation. Finally, we

summarize existing related work and conclude in Sec-

tion 5.

Fukumoto F., Suzuki Y. and Wangpoonsarp A.

Is (President,åd’

gç¸t

sé

aŸ) a Correct Sense Pair? - Linking and Creating Bilingual Sense Correspondences.

DOI: 10.5220/0006489900390048

In Proceedings of the 9th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (KEOD 2017), pages 39-48

ISBN: 978-989-758-272-1

2 SYSTEM DESIGN

The method consists of three steps: (1) bilingual noun

word extraction, (2) identiﬁcation of domain-speciﬁc

senses, and (3) bilingual sense correspondence. Here-

after, we describe bilingual noun words as BN words

and bilingual noun word senses as BNSs.

2.1 Local Data View

The ﬁrst step is to extract BN words from the corpora.

This process consists of two sub-steps: (i) retrieval

of relevant documents, and (ii) BN word extraction

based on sentence level similarity.

2.1.1 Retrieval of Relevant Documents

We used Reuters’96 and the Mainichi Japanese news-

paper documents. Let d

(1 ≤ i ≤ s) and d

(1 ≤ j ≤

t) be a Mainichi document and a Reuters document,

respectively. Each Reuters document d

is trans-

lated into a Japanese document d

E mt

using English-

Japanese MT software. We calculated similarity be-

tween two documents using BM25 (Robertson and

Walker, 1994), which is widely used in information

retrieval studies

. BM25 is given by:

BM25(d

E mt

) =

∑

w∈d

E mt

(1)

+ 1)t f

K +t f

+ 1)qt f

+ qt f

, (1)

where w is a word within d

E mt

; w

(1)

= log

(t−n+0.5)

(n+0.5)

is the weight of w; t is the number of Mainichi doc-

uments; n is the number of documents containing w;

K refers to k

((1 − b) + b

avdl

); k

, b, and k

are pa-

rameters set to 1, 1, and 1,000, respectively; dl

the document length of d

; avdl is the average doc-

ument length in words; and t f

and qt f

are the fre-

quencies of occurrence of the w (∈ d

E mt

) in d

, and

E mt

, respectively. If the similarity value between d

and d

E mt

is higher than the lower bound L

, these are

regarded as relevant documents, and a document pair

is created.

2.1.2 Bilingual Word Extraction

We assume that BN word correspondences obtained

using relevant documents are unreliable as many noun

words appear in a pair of relevant documents. There-

fore, we applied sentence level retrieval. First, we ap-

plied simple χ

statistics to the extracted pairs of rel-

evant documents and extracted Mainichi noun word

and Reuters noun word w

pairs with χ

values

http://trec.nist.gov/

Table 1: Variables of χ

statistics.

¬w

f (w

) = a f (w

,¬ w

) = b

¬w

f (¬ w

) = c f (¬ w

, ¬ w

) = d

greater than zero. The χ

statistic measures the lack

of independence between w

and w

, and can be com-

pared to the χ

distribution with one degree of free-

dom to judge extremeness (Yang and Pedersen, 1997).

It is deﬁned by:

) =

(ad − bc)

(a +b)(a + c)(b + d)(c + d)

. (2)

a, b, c, and d in Eq. (2) are shown in Table 1. f (x, y)

in Table 1 refers to the co-occurrence frequencies of

x and y on the Japanese and English sides, respec-

tively. Next, for each w

and w

pair, we apply sen-

tence level similarity given by:

S sim(w

) =

max

S w

∈Set

,S w

∈Set

sim(S w

,S w

where

sim(S w

,S w

) =

| S w

∩ S

| S w

| + | S

| −2× | S w

∩ S

| +2

. (3)

Set

and Set

are sets of sentences that include w

and

, respectively; |X| is the number of noun words in a

sentence X ; | S w

∩ S

| refers to the number of

noun words that appear in both S w

and S

; and

is the translation result of S w

. The larger

value of sim(S w

,S w

) indicates the more similar

these two sentences S w

and S w

. As shown in Eq.

(4), we retrieved w

and w

as the BN word such that

the similarity between w

and w

∈ BP(w

) are the

largest value. Here, BP(w

) is a set of BN word pairs,

each of which includes w

on the Japanese side.

i = arg max

i∈BP(w

)

sim(w

). (4)

2.2 Global Data View

The second step is to identify domain-speciﬁc senses

for each noun word. Globally, we used bilingual cate-

gory/domain correspondences, Reuters and Mainichi

categories. The process is applied independently to

the English (WordNet) and Japanese (EDR) dictionar-

ies. We used the MRW model to ranking the senses.

For each Reuters category c

and Mainichi category

such as “sports” and “economy” assigned to the

Reuters (Mainichi) documents, we created a Graph G

= (S, E) that reﬂects the relationships between senses

in a set S. S refers to a set of noun word senses in the

Reuters (Mainichi) documents assigned to the cate-

gory c

). Each sense s

∈ S is represented by a

vector. Each dimension of a vector corresponds to

each word appearing in syn ∪ gloss, where syn in-

dicates a synset and gloss refers to a gloss text in

a dictionary. Each element of a dimension is a fre-

quency count of the word in syn ∪ gloss. E is a set

of edges, which is a subset of S ∗ S. Each edge e

i j

∈

E is associated with an afﬁnity weight aw(i → j). We

used two afﬁnity types between i and j. Directed re-

lation is a hyponym relation, and undirected relation

is a synonym relation. The weight is computed using

the standard cosine measure between two senses. The

transition probability from s

to s

is then deﬁned by

normalizing the corresponding afﬁnity weight:

p(i → j) =











aw(i→ j)

|S|

∑

k=1

aw(i→k)

, if Σaw 6= 0

0 , otherwise.

(5)

We used the row-normalized matrix U =

i j

)

|S|∗|S|

to describe G with each sense correspond-

ing to the transition probability, where U

i j

= p(i → j).

To make U a stochastic matrix, the rows with all zero

elements are replaced by a smoothing vector with all

elements set to

|S|

. The matrix form of the saliency

score Score(s

) can be formulated in a recursive form

as in the MRW model:

λ = µU

λ +

(1−µ)

|S|

~e, where

= [Score(s

)]

|S|∗1

is a vector of saliency score for the

senses. ~e is a column vector with all elements equal

to 1. µ is a damping factor. We set µ to 0.85, as in the

PageRank (Brin and Pagee, 1998). The ﬁnal asym-

metric transition matrix M is given by:

M = µU

(1 − µ)

| S |

~e~e

(6)

Each sense score in a speciﬁc category is obtained

by the principal eigenvector of the new transition ma-

trix M. We applied the algorithm for each category.

For implementation, we used the Eigen library.

chose a vector with the largest eigenvalues. We nor-

malized a vector, and obtained rank scores of senses.

2.3 Bilingual Sense Correspondence

The ﬁnal step is to retrieve BNSs using both results

obtained by local and global data views. Let hw

The principal eigenvector is obtained by the power

method and inverse iteration method.

http://eigen.tuxfamily.org/index.php?title=Main Page

∈ Set

}

be a pair of nouns obtained by the BN

word extraction procedure where w

has the number

of m senses w

J s

(1 ≤ i ≤ m) and w

has the n senses

E s

(1 ≤ j ≤ n), respectively. We retrieved BNS

pair w

J s

and w

E s

J s

E s

)

s.t. w

J s

∈ Set

E s

∈ Set

i satisﬁes RNN. (7)

We recall that the corpora used to identify domain-

speciﬁc senses are Reuters and Mainichi documents,

each of which has different categories. Therefore,

we estimated the category correspondence accord-

ing to the χ

) statistics shown in Eq. (2).

In Eq. (2), we simply replaced w

by c

and w

by c

as each document of relevant document pairs

has category information. The subscript (c

) of

J s

E s

)

in formula (7) refers to the re-

sult of category correspondence between Reuters and

Mainichi obtained by Eq. (2). Set

and Set

show

ranked sense lists for c

, and c

, respectively, ob-

tained by the MRW model. RNN is the so-called Re-

ciprocal Nearest Neighbors in that two noun senses

are each other’s most similar noun (Hindle, 1990).

3 EXPERIMENTS

We evaluated each of the three procedures in the ex-

periments, (1) BN extraction by using the results of

relevant documents retrieval, (2) retrieving domain

speciﬁc senses, and (3) BNS correspondences.

3.1 Local Data View

In this subsection, we report the results of BN extrac-

tion.

3.1.1 Experimental Setup

We used Reuters’96 and Mainichi Japanese newspa-

per corpora from 20 August 1996 to 19 August 1997.

The Reuters’96 corpus consists of 806,792 documents

organized into three types of coarse-grained cate-

gories, i.e., topic, industry, and region. Each consists

of 126, 870, and 366 categories. The Mainichi cor-

pus consists of 119,051 documents organized into 16

categories. The difference in dates between Reuters

and Mainichi documents is less than ±3 days, e.g.,

when the date of the Reuters document is 27 August,

the corresponding Mainichi data is from 24 to 30 Au-

gust. We set a small date difference because if some

Table 2: Bilingual noun extraction.

Pairs Eng Jap # of bilingual # of nouns IRS

nouns (top 1,000) (top 1,000)

Docs

196,368 133,854 20,882

172,895 162 2.32

Docs & Sent 115,918 329 4.83

event occurred at some speciﬁc time period, the press

of all countries report it on one of these days. We used

English-Japanese MT software (Internet Honyakuno-

Ousama for Linux, Ver.5, IBM Corp.). Each Reuters

document was translated into a Japanese document,

and BM25 was applied.

The training data for choosing the lower bound L

were Reuters 20 August 1996 and Mainichi from 17

to 23 August 1996. The total number of English and

Japanese documents collected were 2,586 and 2,137,

respectively, and the number of relevant documents

collected manually for evaluation was 157. The clas-

siﬁcation was determined to be correct if the two hu-

man judges agreed on the evaluation. We used the

F-score for evaluation of relevant document retrieval,

which is a measure that balances precision (Prec) and

recall (Rec). Let cSet be a set of correct document

pairs. The deﬁnitions of Prec and Rec are given by:

Prec =

| {(d

) | (d

) ∈ cSet, BM25(d

) ≥ L

} |

| {(d

) | BM25(d

) ≥ L

Rec =

| {(d

) | (d

) ∈ cSet, BM25(d

) ≥ L

} |

| {(d

) | (d

) ∈ cSet |

The best performance of F-score was 0.564 when the

value was 150. We used the value (L

= 150) to

extract BN word pairs. We used one year of Reuters

and Mainichi documents except for the training data

that were used to estimate L

values. The difference

in dates between them was less than ±3 days.

3.1.2 Results

The results of BN words extraction are shown in Table

2. We compared the results obtained by our method,

“Docs & Sent” with the results obtained by only ap-

plying χ

statistics to the results of relevant docu-

ments, “Docs” to examine how the sentence-based

retrieval inﬂuences the performance. “Pairs” in Ta-

ble 2 shows the number of bilingual document pairs.

“Eng” and “Jap” show the numbers of Reuters and

Mainichi documents within the pairs satisfying the

similarity lower bound L

= 150, respectively. “# of

nouns” shows the number of correct BN in the top-

most 1,000 according to the χ

statistics (Docs) and

sentence-based similarity (Docs & Sent).

As shown in Table 2, sentence-based retrieval con-

tributed to a reduction in the number of useless BN

words without a decrease in accuracy, as about 67%

(115,918/172,895) of the size obtained by “Docs” was

retrieved, while about 2.03 (329/162) times the num-

ber of correct BN words were obtained in the topmost

1,000 nouns. “IRS” (Inverse Rank Score) is a mea-

sure of system performance by considering the rank

of correct BN words within the candidate nouns; it is

the sum of the inverse rank of each matching noun,

e.g., correct BN words by manual evaluation matches

at ranks 2 and 4 give an IRS of

= 0.75. A higher

IRS value indicates better system performance. Table

2 shows that sentence-based retrieval also contributes

to ranking performance compared with the results ob-

tained by applying χ

statistics only.

3.2 The Global Data View

Secondly, we report the results of retrieving domain

speciﬁc senses from the WordNet and EDR dictionar-

ies.

3.2.1 WordNet 3.0

We assigned the Reuters categories to each sense of

words in WordNet. We selected 38 Reuters cate-

gories, each of which is assigned to more than 80,000

documents. For each category, we collected noun

words with frequencies ≥ 5 within a document from

the one-year Reuters corpus. There are no existing

sense-tagged data for Reuters categories that could be

used for evaluation. Therefore, we selected a limited

number of words and evaluated these words qualita-

tively. To do this, we used SFC resources (Magnini

and Cavaglia, 2000), which annotate WordNet 2.0

synsets with domain labels. We selected 4 cate-

gories that are easy to manually identify correspond-

ing Reuters and SFC categories. Statistics of data

and the results are shown in Table 3. “Words” shows

the number of different words with frequency ≥ 5

within a document assigned to the category shown

in “Reu,” and “Sense” shows their total number of

senses. “SFC” refers to the number of senses appear-

ing in the SFC resource. “DSS” (Domain-Speciﬁc

Senses) is the number of senses assigned by our

method. “DSS of the # of Correct S” shows the num-

ber of correct senses in the topmost 1,000, and “D&S”

refers to the number of correct senses appearing in

both DSS and SFC.

Table 3: The results of sense assignments (WordNet).

Reu / SFC Words Senses SFC DSS

# of Correct S

IRS

DSS D&S

Economics / Economy 10,284 32,550 1,032 6,355 623 120 2.31

Sports / Sports 7,437 26,478 339 4,457 573 158 2.01

War / Military 7,681 26,831 913 4,366 510 68 1.41

Politics / Politics 10,349 32,536 793 5,981 612 103 2.21

Average 8,938 29,598 769 5,290 580 112 1.99

Table 4: DSS against the FS heuristic (WordNet).

Cat Sense DSS FS

Economics 5.6 0.73 0.68

Sports 4.5 0.71 0.69

War 4.9 0.52 0.30

Politics 4.2 0.75 0.68

Average 4.8 0.68 0.59

As shown in Table 3, the numbers of correct

senses obtained by “DSS” and “D&S” did not exactly

match. This is not surprising because our method

used the Reuters corpus, while the SFC resource con-

sists of 96% of the WordNet synsets, each of which

is manually annotated using 115 different SFC. The

IRS depends on the category, and the average IRS

was 1.99. It is interesting to note that some senses of

words that were obtained correctly by our approach

did not appear in the SFC resource. For example, the

words “shot” and “strike” in the Sport category, and

“liberty” and “military hospital” in the war/military

category were obtained by our approach but did not

appear in the SFC. This is because we used WordNet

3.0, while SFC was based on WordNet 2.0. These

observations clearly support the usefulness of our au-

tomated method.

In the WSD task, a ﬁrst sense (FS) heuristic is of-

ten applied because of its power and lack of a require-

ment for expensive hand-annotated data sets (Cotton

et al., 1998; McCarthy et al., 2007). We compared the

results obtained by DSS to those obtained by the FS

heuristic. For each category, we randomly selected 10

words from the senses assigned by DSS, and selected

10 sentences from the documents belonging to each

corresponding category. As a result, we tested 100

sentences for each category. Table 4 shows the results.

“Sense” refers to the number of average senses per

word. The average precision (Avg) of our method was

0.68, while the result obtained by the FS was 0.59.

3.2.2 EDR Dictionary

We assigned categories from Mainichi Japanese doc-

uments to each sense of words in the EDR Japanese

Table 5: The results of sense assignments (EDR).

Cat Words Senses Cor IRS

Economics 15,906 30,869 740 6.39

Sports 17,556 33,595 559 2.77

International 13,906 27,239 451 5.63

Average 15,789 30,568 583 4.93

word dictionary

. We selected 11 out of 16 categories,

each of which had a sufﬁcient number of documents.

All documents were tagged by the morphological an-

alyzer Chasen (Matsumoto et al., 2000), and noun

words were extracted. We choose three categories for

which it is easy to manually create correct data. Ta-

ble 5 shows the statistics of the data and the results

of assignment. “Cor” shows the number of senses as-

signed by our approach correctly in the topmost 1,000

senses. The average IRS obtained by the EDR was

4.93 and Reuters was 1.99. This is reasonable as

the assignment task using EDR is easy compared to

WordNet, i.e., the average number of senses per word

of the former is 1.93, while that of the latter is 3.31.

Table 6: DSS against the FS heuristic (EDR).

Cat Sense DSS FS

Economics 2.793 0.79 0.60

Sports 2.873 0.65 0.52

International 2.873 0.68 0.58

Average 2.846 0.70 0.57

In the WSD task, we randomly selected 30 words

from the senses assigned by DSS. For each word, we

selected 10 sentences from the documents belonging

to each corresponding category. The FS in the EDR is

determined based on the EDR corpus. Table 6 shows

the results. As can be seen in Table 6, DSS was also

better than the FS heuristics in Japanese data. The

overall performance for the FS (0.57) was not better,

similar to the case for the English data (0.59), while

the number of senses per word in the Japanese re-

source was smaller than that in WordNet. There were

many senses that did not occur in the EDR corpus, i.e.,

www2.nict.go.jp/out-promotion/techtransfer/EDR/

index.html?

Table 7: The results of bilingual sense correspondences.

Approach Candidates # of Correct ext.(1,000) RNN (1,071) IRS

BN BNS BN BNS BN BNS BN BNS

Local 171,895 —– 329 —– 431 —– 2.32 —–

Local & Global 171,895 115,918 437 312 679 580 4.10 4.35

Table 8: Examples of bilingual noun senses.

Cat pair # of Pairs # of Correct Examples

(Mai, Reu) Cand RNN senses(%) Sense id (gloss text)

(International, 269 10 9 (90.0) EDR: 01 (an irregular group of soldiers

given to sneak attacks)

Government) WordNet: guerrilla 01 (a member of an irregular

armed force)

(Economics, 4,964 58 44 (75.9) EDR: 09 (stock, share)

Economics) WordNet: stock 01 (the capital raised by a corpora-

tion through the issue of shares)

(Sports,

Sports)

6,560 68 48 (70.6) EDR: 02 (in golf, the average number of

strokes for playing around a course)

WordNet: par 01 (the number of strokes set for each

hole on a golf course)

(Local news, 2,903 34 23 (67.6) EDR: 02 (the act of killing someone)

Crime) WordNet: killer 01 (someone who causes the death of

a person)

62,460 nouns appeared in both EDR and Mainichi

newspapers (from 1991 to 2000), 164,761 senses in

all. Of these, 114,267 senses did not appear in the

EDR corpus. This also demonstrates that automatic

DSS works well compared to the frequency-based FS

heuristics.

3.3 Bilingual Sense Correspondence

Finally, we evaluated the performance of BNS corre-

spondences.

3.3.1 Experimental Setup

The data for BNS correspondence were the Reuters

and Mainichi corpora from the same period, i.e., 20

August 1996 to 19 August 1997. The total num-

bers of documents were 806,791, and 119,822, re-

spectively. Locally, we extracted BN words us-

ing sentence-based similarity. We retrieved cross-

lingually relevant Japanese documents with English

documents. The difference in dates between them

was less than ±3 days. We used these documents

to extract BN words. Globally, we assigned domain-

speciﬁc senses for each of the 38 categories for Word-

Net 3.0 and 11 categories for EDR. We estimated cat-

egory correspondences, and retrieved BNSs accord-

ing to their correspondences. We obtained 92 cate-

gory correspondences in all with χ

values larger than

zero. From these data, we extracted BNSs.

3.3.2 Results

Table 7 shows the results of BNS correspondences.

“Local” indicates the results using only sentence-

based similarity, and “Local & Global” shows the

results obtained by our method. “# of Correct ext.”

refers to the number of correct extractions within the

topmost 1,000, and “RNN” shows the number of cor-

rect extractions by applying RNN. “BN” refers to the

number of BN words, and “BNS” is their senses.

As can be clearly seen in Table 7, the results with

integration of local and global data views improved

overall performance of BN word extraction compared

to local data only because 10.8% (437−329)/1,000

improvement within the topmost 1,000, and 23.1%

(679−431)/1,071 improvement with RNN. We ob-

tained 312 BNSs within the topmost 1,000, although

bilingual sense correspondence is a difﬁcult task.

Moreover, RNN is effective for BNS correspon-

dences. We obtained a total of 1,071 BNSs, which

580 were correct, while that without RNN was 312

within the topmost 1,000.

Table 8 lists examples of BNSs for each category

correspondence. (x, y) of the category pair refers to

the Mainichi and Reuters category correspondence.

“Cand” denotes the number of extracted word pairs,

and “RNN” shows word pairs obtained by RNN.

“Sense id” shows the sense and its order of appear-

ance in each dictionary, WordNet and EDR. Table 8

Table 9: Constituents of bilingual sense of words.

Eng authority, board, budget, business, case

coast, company deﬁcit, ﬁnance, group, issue

money, opposition people, power, president

seat, space, state, trial

Jap , , , , ,

, , , , , , ,

, , , , , ,

, ,

Table 10: Examples of bilingual senses.

Cat pair Sense pair

(Mai, Reu) (Mai, Reu)

(top news, industrial) ( 01, authority 01)

(international, international) ( 02, authority 05)

(top news, election) ( 02, case 01)

(top news, violence) ( 01, case 02)

(top news, funding) ( 01, deﬁcit 01)

(top news, politics) ( 02, deﬁcit 02)

shows that the ﬁrst three senses of words, “guerrilla,”

“stock,” and “par” corresponded correctly. However,

“the act of killing someone” in the EDR was in-

correctly identiﬁed by “killer” (person) in WordNet.

Our approach for identifying BNSs is based on term-

based corpus statistics. It will be necessary to inves-

tigate other types of lexicon, such as verb and subjec-

tive/objective noun collocations, for further improve-

ment. Our method for category correspondence is

very simple as it is based on the χ

statistics. Much of

the previous work on text categorization indicated that

hierarchical structure (e.g., MeSH, Yahoo!, LookS-

mart) improves the accuracy of text categorization,

and it is worth attempting to use a hierarchical struc-

ture to determine corresponding categories with high

accuracy.

Finally, we compared the results of bilingual sense

correspondences with a machine readable bilingual

dictionary and two English-Japanese MT systems.

We randomly choose 80 BNSs from 580 correct bilin-

gual senses obtained by RNN. There were 40 different

category pairs, and as shown in Table 9, the number

of different English and Japanese words were 20, and

32, respectively. Table 10 shows examples of BNSs

and its corresponding category pairs.

We choose titles of documents as a translation test

data. Because they are short sentences, and hard to be

affected by a syntactic analyzer in MT systems which

enable to approximate a fair comparison. For each of

the 80 BNS pairs, we created translation test data.

For example, in the word, “authority” of BNS pair

( 01, authority 01) in Table 10, we randomly se-

lected 10 titles sentences including “authority” from

the Reuters documents assigned to the category “in-

Table 11: Comparison against a dictionary and MT systems.

# of Correct senses (%)

Eijirou 329 / 800 (41.2)

Honyakuno-Ousama 521 / 800 (65.1)

SYSTRANet 550 / 800 (68.8)

RNN 562 / 800 (70.3)

dustrial”. We translated these test data by using two

MT systems

, and compared the translation result of

“authority”. We also compared the results obtained by

our method with English-Japanese dictionary, Eijirou

on the WEB

. In the English-Japanese dictionary, we

used the ﬁrst-sense heuristic (choosing the ﬁrst sense

of a word). The comparative results are shown in Ta-

ble 11.

As we can see from Table 11 that the results ob-

tained by RNN was 70.3%, and it was better to the

results obtained by bilingual dictionary (41.2%), and

slightly better to the results obtained by SYSTRANet

(68.8%). Moreover, it works well compared to the

Honyakuno-Ousama (MT) that we used in the process

of BN word extraction as local data view. It can be ob-

served from these results that the extracted BNSs can

be used as a lexical resource for MT system.

4 RELATED WORK

Our approach to link bilingual word senses in dictio-

naries can be regarded as a type of ontology align-

ment. The earliest such attempts were Chimaera

(McGuinness et al., 2000) and PROMPT (Noy and

Musen, 2000). Suchanek et al. developed an ontol-

ogy alignment system called PARIS which relies on

instance overlap-based cues to align instances, cate-

gories, and relations from two knowledge bases. They

reported that the method is effective, although it re-

mains insufﬁcient to align ontologies that share few

or no data entries in common. Wijaya et al. focused

on the problem, and presented a method of aligning

ontologies called PIDGIN that employs a very large

natural language text as interlingua and graph-based

self-supervised learning (Wijaya et al., 2013). The

use of corpora is similar to our method, although the

target of integration is quite different, i.e., PIDGIN

aimed at relation and category alignment, while our

method aligned noun word senses.

In the context of bilingual lexicon extraction,

much of the previous work used comparable cor-

pora. One attempt involved directly retrieving bilin-

We used Internet Honyakuno-Ousama, and SYSTRANet

English-Japanese MT software. www.systranet.com/translate

www.alc.co.jp

gual lexicons from corpora (Fung and Cheung, 2004;

Gaussier et al., 2004). The alternative approach con-

sists of two steps: ﬁrst, cross-lingual relevant doc-

uments are retrieved from comparable corpora, and

then bilingual term correspondences within these rel-

evant documents are estimated. Much of the previ-

ous work in ﬁnding relevant documents used MT sys-

tems or existing bilingual lexicons to translate one

language into another (Utsuro et al., 2003). Docu-

ment pairs are then retrieved based on document sim-

ilarity. Another approach to retrieving relevant doc-

uments involves the collection of relevant document

URLs from the WWW (Resnik and Smith, 2003).

Munteanu et al. proposed a method for extracting par-

allel sub-sentential fragments from very non-parallel

bilingual corpora (Munteanu and Marcu, 2006). All

of these methods successfully extracted bilingual lex-

icons but they ignored the meanings of words. One at-

tempt to deal with the meaning of words is Kusner et

al.’s method (Kusner et al., 2015). They presented the

Word Mover’s Distance (WMD) between text doc-

uments by utilizing word2vec embeddings (Mikolov

et al., 2013a; Mikolov et al., 2013b). Word2vec learns

a vector representation for each word using a neural

network architecture consisting of three layers, i.e. an

input layer, a projecting layer, and an output layer to

predict nearby words.

There also has been a lot of work where bilin-

gual word vectors are induced using parallel corpora

(Brown et al., 1993; Haghighi et al., 2008; Dyer et al.,

2013; Kocisky et al., 2014). Dyer et al. presented

an alignment model called FASTALIGN model which

uses an alignment distribution deﬁned by a single pa-

rameter that measures how close the alignment is to

the diagonal (Dyer et al., 2013). Blunsem et al. ex-

tended Dyer et al.’s model for learning bilingual word

representations. They marginalize out the alignments

which enable to capture more bilingual semantic con-

text (Kocisky et al., 2014). Gouws et al. proposed

a simple and computationally efﬁcient model called

BioBOWA (Bilingual Bag-of-Words without Align-

ments) for learning bilingual distributed representa-

tions of words which can scale to large monolingual

datasets, and does not require word-aligned parallel

training data (Gouws et al., 2015). The method re-

quires monolingual data which trains directly, and ex-

tracts a bilingual signal from a smaller set of raw-text

sentence-aligned data. They evaluated the induced

cross-lingual embeddings on the two tasks, i.e. doc-

ument classiﬁcation and lexical translation task, and

the results showed that the method outperforms cur-

rent state-of-the-art methods, especially it contributes

to reduce computational cost.

In the context of domain-speciﬁc senses of a

word, Magnini et al. presented a lexical resource

where WordNet 2.0 synsets were annotated with Sub-

ject Field Codes (SFC) by a procedure that exploits

the WordNet structure (Magnini and Cavaglia, 2000;

Bentivogli et al., 2004). 96% of the WordNet synsets

of the noun hierarchy could have been annotated us-

ing 115 different SFC, while identiﬁcation of the do-

main labels for senses required a considerable amount

of hand-labeling. McCarthy et al. presented a method

to ﬁnd domain-speciﬁc predominant noun senses au-

tomatically using a thesaurus acquired from raw tex-

tual corpora and the WordNet similarity package (Mc-

Carthy et al., 2007). They tested two domains,

“Sports” and “Finance.”

indent To our knowledge, there have been only a few

previous work on bilingual sense extraction (Plous

and Ji, 2003). One approach is word translation dis-

ambiguation presented by Li et al. (Li and Li, 2004).

Their method is based on one-to-many sense map-

ping. They used a machine learning technique that

repeatedly constructs classiﬁers in the two languages

in parallel. They reported that the approach sig-

niﬁcantly outperformed existing methods using two

nouns (Pedersen, 2000), and seven of the twelve En-

glish words studied in WSD research by (Yarowsky,

1995). Their method requires a small number of

sense-tagged training data in both of the two lan-

guages, while our method requires documents as-

signed to categories and a dictionary with gloss text

which unfortunately hinders a direct and fair compar-

ison between their system and ours.

Our method has three novel aspects. First, we pro-

pose a method to integrate different data views to im-

prove the quality of bilingual sense correspondence.

Second, from the perspective of existing knowledge-

based integration, we propose a method for corre-

sponding senses between two monolingual dictionar-

ies via many-to-many sense mapping. Finally, from

the perspective of robustness, the method is auto-

mated, and requires only documents assigned to do-

mains/categories and a dictionary with gloss text. It

can be applied easily to a new domain, sense in-

ventory, or different languages given sufﬁcient doc-

uments.

5 CONCLUSION

We have developed an approach for linking and cre-

ating bilingual sense correspondences by combining

local and global data views. Future work will in-

clude: (i) investigating other types of lexicon for fur-

ther improvement, (ii) extending the method to use hi-

erarchical structures of categories for category corre-

spondence, (iii) retrieving other parts of speech word

senses, and (iv) evaluation of the method using dic-

tionaries other than English WordNet and Japanese

EDR.

ACKNOWLEDGEMENTS

The authors would like to thank Bernardo Magnini for

providing SFC resources, and the anonymous review-

ers for their comments and suggestions. This work

was supported by the Telecommunications Advance-

ment Foundation, and Support Center for Advanced

Telecommunications Technology Research, Founda-

tion.

REFERENCES

Bentivogli, L., Forner, P., Magnini, B., and Pianta, E.

(2004). Revising the wordnet domains hierarchy: Se-

mantics, coverage and balancing. In Proceedings of

the Workshop on Multilingual Linguistic Ressources,

pages 101–108.

Brin, S. and Pagee, L. (1998). Lexical Issues in Natural

Language Processing. In Proc. of the 7th International

Conference on World Wide Web 7, pages 107–117.

Briscoe, E. J. (1991). Lexical Issues in Natural Language

Processing. In Natural Language and Speech. Pro-

ceedings of the Symposium on Natural Language and

Speech, pages 39–68.

Brown, P. F., Pietra, V. J. D., Pietra, S. A. D., and Mercer,

R. L. (1993). The Mathematics of Statistical Machine

Translation: Parameter Estimation. 19(2):263–311.

Cotton, S., Edmonds, P., Kilgarriff, A., and

Palmer, M. (1998). SENSEVAL-2,

http://www.sle.sharp.co.uk/senseval2/.

Dyer, C., Chahuneau, V., and Smith, N. A. (2013). A Sim-

ple, Fast, and Effective Reparameterization of IBM

Model 2. In Proc. of the 2013 Conference of the

North American Chapter of the Association for Com-

putational Linguistics: Human Language Technolo-

gies, pages 644–648.

Fung, P. and Cheung, P. (2004). Mining Very Non-Parallel

Corpora: Parallel Sentence and Lexicon Extraction

via Bootstrapping and EM. In Proc. of 2004 Con-

ference on Empirical Methods in Natural Language

Processing, pages 57–63.

Gale, W. A., Church, K. W., and Yarowsky, D. (1992). One

Sense Per Discourse. In Proc. of Speech and Natural

Language Workshop 1992, pages 233–237.

Gaussier, E., Renders, H.-M., Matveeva, I., Goutte, C., and

ejean, H. (2004). A Geometric View on Bilingual

Lexicon Extraction from Comparable Corpora. In

Proc. of the 42nd Annual Meeting of the Association

for Computational Linguistics, pages 527–534.

Gouws, S., Bengio, Y., and Corrado, G. (2015). Bil-

BOWA:Fast Bilingual Distributed Representations

without Word Alignments. In Proc. of the 32nd In-

ternational Conference on Machine Learning, pages

748–756.

Grishman, R., MacLeod, C., and Meyers, A. (1994). COM-

LEX Syntax: Building a Computational Lexicon. In

Proc. of the 15th International Conference on Compu-

tational Linguistics, pages 268–272.

Haghighi, A., Liang, P., Kirkpatrick, T. B., and Klein, D.

(2008). Learning Bilingual Lexicons from Monolin-

gual Corpora. In Proc. of 46th Annual Meeting of

the Association for Computational Linguistics: Hu-

man Language Technologies, pages 771–779.

Hindle, D. (1990). Noun Classiﬁcation from Predicate-

Argument Structures. In Proc. of the 28th Annual

Meeting of the Association for Computational Lin-

guistics, pages 268–275.

Kocisky, T., Hermann, K. M., and Blunsom, P.

(2014). Learning Bilingual Word Representations by

Marginalizing Alignments. In Proc. of the 52nd An-

nual Meeting of the Association for Computational

Linguistics, pages 224–229.

Kusner, M. J., Sun, Y., Kolkin, N. I., and Weinberger, K. Q.

(2015). From Word Embeddings to Document Dis-

tances. In Proc. of the 32nd International Conference

on Machine Learning, pages 957–966.

Li, H. and Li, C. (2004). Word Translation Disambigua-

tion Using Bilingual Bootstrapping. In Computational

Linguistics, pages 1–22.

Magnini, B. and Cavaglia, G. (2000). Integrating Subject

Field Codes into WordNet. In Proc. of LREC-2000.

Matsumoto, Y., Kitauchi, A., Yamashita, T., Hirano, Y.,

Matsuda, Y., Takaoka, K., and Asahara, M. (2000).

Japanese Morphological Analysis System ChaSen

Version 2.2.1. In NAIST Technical Report NAIST.

McCarthy, D., Koeling, R., Weeds, J., and Carroll, J.

(2007). Unsupervised Acquisition of Predominant

Word Senses. In Computational Linguistics, volume

33(4), pages 553–590.

McGuinness, D. L., Fikes, R., Rice, J., and Wilder, S.

(2000). An Environment for Merging and Testing

Large Ontologies. In Proc. of the Conference on Prin-

ciples of Knowledge Representation and Reasoning,

pages 483–493.

Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013a).

Efﬁcient estimation of word representations in vector

space. CoRR, abs/1301.3781.

Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., and

Dean, J. (2013b). Distributed Representations of

Words and Phrases and Their Compositionality. In

Proc. of NIPS, pages 3111–3119.

Miller, G. A. (1995). Wordnet: A lexical database for en-

glish. Commun. ACM, 38:39–41.

Miller, G. A., Beckwith, R. T., Fellbaum, C. D., Gross, D.,

and Miller, K. J. (1990). WordNet: An on-line Lexical

Database. In International Journal of Lexicography,

volume 3(4), pages 235–244.

Munteanu, D. S. and Marcu, D. (2006). Extracting Parallel

Sub-Sentential Fragments from Non-Parallel Corpora.

In Proc. of the 21st International Conference on Com-

putational Linguistics and 44th Annual Meeting of the

Association for Computational Linguistics, pages 81–

88.

Noy, N. F. and Musen, M. A. (2000). Prompt: Algo-

rithm and Tool for Automated Ontology Merging and

Alignment. In Proc. of the 17th National Conference

on Artiﬁcial Intelligence, pages 450–455.

Pedersen, T. (2000). A Simple Approach to Building En-

sembles of Naive Bayesian Classiﬁers for Word Sense

Disambiguation. In Proc. of the 2nd Conference on

Empirical Methods in Natural Language Processing,

pages 197–207.

Plous, S. and Ji, H. (2003). A Model for Matching Se-

mantic Maps between Language (French/English, En-

glish/French). In Computational Linguistics, pages

155–178.

Resnik, P. and Smith, N. A. (2003). The Web as a Parallel

Corpus. In Computational Linguistics, volume 29(3),

pages 349–380.

Robertson, S. E. and Walker, S. (1994). Some Simple Ef-

fective Approximations to the 2-Poisson Model for

Probabilistic Weighted Retrieval. In Proc. of the

17th Annual International ACM SIGIR Conference on

Research and Development in Information Retrieval,

pages 232–241.

Utsuro, T., Horiuchi, T., Hamamoto, T., Hino, K., and

Nakayama, T. (2003). Effect of Cross-Language IR in

Bilingual Lexicon Acquisition from Comparable Cor-

pora. In Proc. of the 10th Conference of the Euro-

pean Chapter of the Association for Computational

Linguistics, pages 355–362.

Wijaya, D., Talukdar, P. P., and Mitchell, T. (2013). PID-

GIN: Ontology Alignment using Web Text as Inter-

lingua. In Proc. of ACM Conference on Information

and Knowledge Management, pages 589–598.

Yang, Y. and Pedersen, J. O. (1997). A Comparative

Study on Feature Selection in Text Categorization. In

Proc. of the 14th International Conference on Ma-

chine Learning, pages 412–420.

Yarowsky, D. (1995). Unsupervised Word Sense Disam-

biguation Rivaling Supervised Methods. In Proc. of

the 33rd Annual Meeting of the Association for Com-

putational Linguistics, pages 189–196.