A Branch-and-Bound Approach to Efﬁcient Classiﬁcation and Retrieval

of Documents

Kotaro Ii

, Hiroto Saigo

and Yasuo Tabei

School of Information Science and Electrical Engineering, Kyushu University, Fukuoka, Japan

Center for Advanced Intelligence Project, RIKEN, Tokyo, Japan

Keywords:

Text Classiﬁcation, Text Retrieval, Sufﬁx Tree, Branch and Bound, SEQL, fastText.

Abstract:

Text classiﬁcation and retrieval have been crucial tasks in natural language processing. In this paper, we

present novel techniques for these tasks by leveraging the invariance of feature order to the evaluation results.

Building on the assumption that text retrieval or classiﬁcation models have already been constructed from

the training documents, we propose efﬁcient approaches that can restrict the search space spanned by the test

documents. Our approach encompasses two key contributions. The ﬁrst contribution introduces an efﬁcient

method for traversing a search tree, while the second contribution involves the development of novel pruning

conditions. Through computational experiments using real-world datasets, we consistently demonstrate that

the proposed approach outperforms the baseline method in various scenarios, showcasing its superior speed

and efﬁciency.

1 INTRODUCTION

Text classiﬁcation has been an important task in nat-

ural language processing, where it is used for text re-

trieval, spam ﬁltering, sentiment analysis and opinion

mining (Pang and Lee, 2008; Medhat et al., 2014).

Great recent advances have been made using neural

networks (Vaswani et al., 2017; Johnson et al., 2017).

Among text classiﬁcation methods, linear classiﬁers

has long been popular due to their simplicity and

interpretability (Joachims, 1998; Fan et al., 2008).

Their properties are studied well, and known to of-

ten perform quite well if the correct features are pro-

vided (Wang and Manning, 2012). Moreover, they

can easily scale to very large corpora (Agarwal et al.,

2014). For instance, fastText has scaled linear classi-

ﬁers so that a billion of words can be trained (Joulin

et al., 2017). They also found that the use of N-

grams has a potential to improve the classiﬁcation

accuracy by implicitly considering the local word or-

ders. Although the use of N-grams as features con-

fronts a combinatorial explosion problem, there exists

efﬁcient implementations such as SEQL (Ifrim et al.,

2008; Ifrim and Wiuf, 2010).

In this paper, we introduce a simple yet efﬁcient

approach for evaluating text classiﬁcation models by

making use of the invariance of the feature order to

the evaluation results. The proposed method has var-

ious applications, but the most useful applications

would be; i) fast evaluation of document classiﬁcation

model, and ii) fast retrieval of the relevant documents.

In computational experiments, we demonstrate the ef-

ﬁciency of our approach in evaluating new test doc-

uments in both classiﬁcation and retrieval tasks. We

employ two existing methods for building text clas-

siﬁcation models; 1) SEQL is a logistic regression-

based classiﬁer with L1/L2 norm regularization. A

notable ability of SEQL is its efﬁcient extraction of

N-gram features by a branch-and-bound strategy. 2)

fastText is an another linear text classiﬁer, but is able

to employ pre-trained word vectors in a similar way

as in word2vec (Mikolov et al., 2013). We utilized

three commonly used datasets (Review, Spam, EC)

for computational evaluation and conﬁrmed the en-

hanced efﬁciency achieved by our approach in all

cases.

The contributions of our approach are the follow-

ings:

1. Proposal of an efﬁcient way of traversing a search

tree.

2. Development of new bounding conditions for

information retrieval and classiﬁcation.

3. Demonstration of the effectiveness in computa-

tional experiments using three real-world datasets.

Ii, K., Saigo, H. and Tabei, Y.

A Branch-and-Bound Approach to Efﬁcient Classiﬁcation and Retrieval of Documents.

DOI: 10.5220/0012310600003654

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 13th International Conference on Pattern Recognition Applications and Methods (ICPRAM 2024), pages 205-214

ISBN: 978-989-758-684-2; ISSN: 2184-4313

205

This paper is organized as follows. Section 2 gives

a brief background about linear models in text classiﬁ-

cation. In Section 3, we describe details about our pro-

posed methods. In Section 4, we describe the datasets

used in the experiments, and experimental conditions.

Section 5 gives experimental results. Related works

and the differences from our approach are described

in Section 6. Section 7 summarizes the paper. Sec-

tion 8 overviews the possible impact of this work, and

Section 9 shows the limitation.

2 TEXT CLASSIFICATION WITH

BAG OF WORDS

The bag of words (BoW) is a simple and an efﬁ-

cient approach for representing sentences, where a

text is represented as the collection of its words. Typ-

ically, linear models such as linear / logistic regres-

sion or SVM are trained on the BoW representation.

Consider a dataset be {(x

, y

), (x

, y

), ..., (x

, y

)},

where x

represents the i-th text represented in the bag

of features, and y

∈ {0, 1} is the corresponding class

label. After training linear models, we can obtain a set

of features w

w and the corresponding weights β

β. The

resulting model for predicting the response of a new

text x

can be represented as

f (x

) =

∑

j=1

I(w

∈ x

), (1)

where I(w

∈ x

) is an indicator function that returns

1 if w

appears in x

, and 0 otherwise. Here, m is the

number of learned features. Equation (1) is readily

used for regression or ranking problems, but can be

used for a classiﬁcation tasks, since plugging it into

the sigmoid function σ(x

x) =

1+e

results in logistic re-

gression. This representation can accommodate word

embedding models of the form σ(W

W E

x)), where E

denotes a pre-trained word embedding matrix, W

W de-

notes the weight matrix speciﬁc to each task, and x

denotes a one-hot vector. We can observe that once

pre-trained word embedding E

E is obtained, then the

model is linear in W

W .

Below we consider a situation of evaluating the

test dataset D by a learned linear function f (.).

3 METHODS

In this section, we present our novel approach for efﬁ-

ciently evaluating linear models. Firstly, we introduce

the proposed data structure based on a trie. Next, we

Figure 1: Illustration of a trie storing {a, ai, at, o, on, of}.

The order of traversing the tree is shown next to each node.

Figure 2: Illustration of our proposed trie. Apart from the

original trie, each node stores not only a string, but also a

real number γ. Based on the value of γ, the order of tree

traversal changes. The order of traversing the tree is shown

next to each node.

describe our novel algorithms designed for the efﬁ-

cient evaluation of prediction models. Our focus is on

document classiﬁcation and retrieval, where we pro-

pose distinct algorithms with different pruning condi-

tions.

3.1 A Modiﬁed Trie Data Structure

A sufﬁx tree, also known as a trie, is a tree-shaped

data structure commonly used for string matching. In

a trie, each node represents a string, and all the chil-

dren of a node share a common preﬁx associated with

the parent node. The root of the trie corresponds to the

empty string. The basic trie data structure is shown in

Figure 1. In our approach, each node in a trie not only

stores a string but also a weight obtained from a clas-

siﬁcation algorithm (Figure 2). These weights play

a crucial role in efﬁcient classiﬁcation. During the

construction of a trie, we aggregate the weights of all

the descendant nodes, and record it as γ. Then, while

traversing the trie, we prioritize nodes with larger ab-

solute γ. In this manner, we can efﬁciently traverse

the part of a trie with the large weights, and terminate

the traversal before visiting nodes with small weights.

It therefore allows us to focus on the most relevant

parts of the trie and avoid unnecessary traversals.

3.2 Document Classiﬁcation

A naïve document classiﬁcation algorithm is shown in

Algorithm 1. For each data point, the algorithm exam-

ines whether each feature is present or not. If a feature

ICPRAM 2024 - 13th International Conference on Pattern Recognition Applications and Methods

206

is included, the corresponding weight is summed up,

as shown in Equation (1). At the end of evaluation,

classiﬁcation results are determined by the signs of

the function values.

Data: D

D, w

w, β

Result: y

for i = 1, . . . , n do

s = 0;

for j = 1, . . . , m do

if w

∈ x

then

// string matching

end

s += β

;

end

if s ≥ 0 then

= 1;

end

else

= 0;

end

Algorithm 1: Naïve document classiﬁer.

An efﬁcient document classiﬁer is shown in Al-

gorithm 2, in which we employ a modiﬁed trie data

structure. The overall preprocessing step consists of

summing up the weights and sorting the weights, to-

gether requires O(m log m).

Next, we elaborate on our novel tree pruning con-

dition that can skip unnecessary search space. For

a notational simplicity, we call a function value as a

score s below, and the function value evaluated up to

the j-th feature in the i-th document as s

i j

, such that

i j

∑

k=1

I(w

∈ x

)

Theorem 1. Suppose that up to the j-th features are

evaluated in the i-th document, and let the current esti-

mate of score be s

i j

. If the lowerbound of s

i j

is strictly

positive, then one can determine the prediction for the

i-th document as positive, and safely skip the evalua-

tion of the rest of the features, where the lowerbound

is given as:

i j

∑

{k| j<k,β

<0}

. (2)

Proof. We split the score s

i j

into the already evaluated

part and the not yet evaluated part.

i j

∑

{k| j<k}

≥ s

i j

∑

{k| j<k,β

<0}

Data: D

D, w

w, β

Result: y

−

∑

{k|β

<0}

;

∑

{k|β

>0}

;

for l = 1, . . . , n do

−

∑

{k|β

<0,k∈subtree(l)}

;

∑

{k|β

>0,k∈subtree(l)}

;

end

for i = 1, . . . , n do

for j = 1, . . . , m do

need

i j

= True;

end

Sort(β

β, key = abs(γ

−

) + γ

);

for i = 1, . . . , n do

s = 0;

for j = 1, . . . , m do

if need

i j

= False then

continue;

end

if w

∈ x

then

s += β

;

if β

> 0 then

−= β

;

end

else

−

+= β

;

end

else

−= γ

;

−

+= γ

−

;

for k ∈ subtree( j) do

need

k j

= False;

end

if s ≥ 0 and Equation (2) holds then

= 1;

break;

end

if s < 0 and Equation (3) holds then

= 0;

break;

end

Algorithm 2: Efﬁcient document classiﬁer.

Then the lowerbound can be obtained by discarding

the positive weights. Once the lowerbound turns out

to be strictly positive, then it is guaranteed that the

prediction is positive without evaluating the rest of the

A Branch-and-Bound Approach to Efﬁcient Classiﬁcation and Retrieval of Documents

207

features.

Similarly, if the upperbound below is strictly neg-

ative, then the prediction of the i-th document is guar-

anteed to be negative.

i j

−

∑

{

>0}

. (3)

3.3 Document Retrieval

A naïve algorithm for document retrieval is shown in

Algorithm 3, where the goal is to retrieve the docu-

ment with the smallest score. For each data point, the

algorithm veriﬁes whether a speciﬁc string (feature) is

included or not. If a string (feature) is included, then

the corresponding weight is summed up.

Data: D

D, w

w, β

Result: i

∗

s = 0

for j = 1, . . . , m do

for i = 1, . . . , n do

if w

∈ x

then

+= β

;

end

∗

= arg min

return i

∗

;

Algorithm 3: Naïve document retriever.

The proposed efﬁcient document retrieval algo-

rithm is shown in Algorithm 4. Similarly to the doc-

ument classiﬁcation case, we employ a modiﬁed trie

as a basic data structure (Figure 2), and restrict the

search space by employing pruning conditions.

Theorem 2. Suppose that up to the j-th features are

evaluated, and the current minimum score s

∗

l j

is iden-

tiﬁed in the l-th document . If the following condition

is satisﬁed, then one can safely discard the document

i for further evaluation without losing the optimality;

∗

l j

∑

{k| j<k,β

>0}

< s

i j

∑

{k| j<k,β

<0}

. (4)

Proof. Similarly to the document classiﬁcation case,

we split s

i j

into the already evaluated part and the not

yet evaluated part, and estimate the lowerbound as:

i j

∑

{k| j<k}

≥ s

i j

∑

{k| j<k,β

<0}

. (5)

On the other hand, the upperbound of the current min-

imum score s

∗

l j

with respect to the i-th document can

Data: D

D, w

w, β

Result: i

∗

−

∑

{k|β

<0}

;

∑

{k|β

>0}

;

−

∑

{k|β

<0,k∈subtree(l)}

;

∑

{k|β

>0,k∈subtree(l)}

;

for i = 1, . . . , n do

for j = 1, . . . , m do

need

i j

= True;

end

Sort(β

β, key = abs(γ

−

) + γ

);

s = 0;

I = {1, 2, . . . , n};

for j = 1, . . . , m do

∗

= ∞;

for i ∈ I do

if need

i j

= False then

continue;

end

else if w

∈ x

then

+= β

;

if β

> 0 then

−= β

;

end

else

−

+= β

;

end

else

−= γ

;

−

+= γ

−

;

for k ∈ subtree( j) do

need

k j

= False;

end

if s

∗

> s

then

∗

= s

;

∗

= i;

end

for i ∈ I do

if Equation (4) holds then

remove i from I;

end

if len(I) == 1 then

∗

= i;

Break;

end

return i

∗

;

Algorithm 4: Efﬁcient document retriever.

ICPRAM 2024 - 13th International Conference on Pattern Recognition Applications and Methods

208

be estimated in a similar fashion as:

∗

l j

∑

{k| j<k}

≤ s

∗

l j

∑

{k| j<k,β

>0}

. (6)

If the lowerbound of s

i j

turns out to be larger than

the upperbound of s

∗

l j

, the it is guaranteed that the i-th

document never obtains smaller score than that of the

l-th document, so one can skip the further evaluation

of the i-th document without losing optimality.

The time complexity for the proposed approach

remains unchanged from that of text classiﬁcation and

remains O(mlogm). Note that it is straightforward to

extend our approach to retrieve the top-k documents

instead of retrieving the best document.

4 EXPERIMENTS

In this section, we demonstrate the effectiveness of

the proposed approach in three real-world datasets.

Our proposed approach can be applied to any linear

model, and enjoy the improved efﬁciency, however, in

the experiments below, we employ SEQL (Ifrim et al.,

2008; Ifrim and Wiuf, 2010)

and fastText (Joulin

et al., 2017)

due to their public availability and rich

features in text modeling.

4.1 Text Modeling by SEQL

SEQL is a linear model, and its objective function is

given as

min

∑

i=1

ξ(y

, x

, β

β) +Cα∥β

β∥

+C(1 − α)∥β

β∥

, (7)

where C and α are regularization parameters, and ξ(.)

denotes a loss function chosen either from hinge loss

or logistic loss. In our experiments, we set C to 1 and

α to 0. In addition, we choose logistic loss.

4.2 Text Modeling by fastText

fastText is a popular library for text classiﬁcation. In

fastText, the possibility that the i-th data is classiﬁed

to a class c is given as p(y

= c) = so f tmax

W E

where W

W is weight matrix and E

E is a pre-trained word

embedding matrix. The model is trained to minimize

the negative log-likelihood over the classes;

min

−

∑

i=1

log( f (W

W E

)) (8)

Available from https://github.com/heerme/seql-

sequence-learner

Available from https://fasttext.cc/

4.3 Dataset

The ﬁrst dataset used in the experiment is the IMDB

movie review dataset. The label indicates whether the

text is positive or negative. There are 25,000 movie

review texts with positive and negative contents, re-

spectively. The second dataset used in the experiment

is spam ham dataset available on Kaggle. The label

indicates whether the text is spam or ham. The data

size is 5171, and there are 1499 spam texts and 3672

ham texts. The third dataset used in the experiment

is EC dataset available on kaggle. The label indicates

whether the text is about household or books. The

data size is 31133, and there are 19313 texts about

household and 11820 texts about books.

4.4 Settings

For matching strings in texts, we employed a popu-

lar Boyer-Moore (BM) algorithm (Boyer and Moore,

1977). However, one can alternatively employ Knuth-

Morris-Pratt (KMP) (Knuth et al., 1977) as well. Ex-

periments were conducted to evaluate the pruning ef-

ﬁciency and time savings achieved by the proposed

method. So we investigate the extent to which the

method could efﬁciently prune branches within the

search space and quantify the resulting reduction in

processing time. In order to investigate the effect of

the size of the dataset, we ﬁxed the number of features

to 1000, and selected {20, 40, 60, 80, 100} samples

randomly from each dataset. We ran the above proce-

dures 100 times in each condition, and the averages

are reported.

5 RESULTS

Table 1 summarizes the efﬁciency gained for docu-

ment classiﬁcation problem. We can conﬁrm that our

pruning strategy was effective in all the datasets. It

appears that fastText slightly beneﬁts more from prun-

ing than SEQL. Table 2 summarizes the efﬁciency

gained for document retrieval problem. We can again

conﬁrm the effectiveness of our approach in all the

datasets. By comparing Tables 1 and 2, we observed

that that the document retrieval task beneﬁts slightly

more than the classiﬁcation task.

In the following subsections, we discuss the re-

sults obtained by each method in each dataset into de-

tails.

A Branch-and-Bound Approach to Efﬁcient Classiﬁcation and Retrieval of Documents

209

Table 1: Pruning efﬁciency in document classiﬁcation problem.

SEQL fastText

With Pruning Without Pruning With Pruning Without Pruning

Traversed[%] time[s] Traversed[%] time[s] Traversed[%] time[s] Traversed[%] time[s]

review 85.1 5935.8 100 6798.9 67.2 6770.2 100 9632.0

spam 73.8 374.1 100 485.9 68.4 765.5 100 984.9

EC 86.9 2282.9 100 2538.3 66.3 3831.7 100 5655.7

Table 2: Pruning efﬁciency in document retrieval problem.

SEQL fastText

With Pruning Without Pruning With Pruning Without Pruning

Traversed[%] time[s] Traversed[%] time[s] Traversed[%] time[s] Traversed[%] time[s]

review 71.6 5150.1 100 6717.8 59.6 6279.6 100 9911.8

spam 60.0 334.9 100 485.9 64.6 706.9 100 925.1

EC 82.8 2246.9 100 2500.8 57.0 3458.5 100 5636.6

Figure 3: Rise of the execution time in classifying doc-

uments modeled by SEQL with respect to the increase

in the dataset size in the review dataset (top), the spam

dataset (middle) and the EC dataset (bottom).

5.1 Document Classiﬁcation

Figure 3 and Figure 4 shows the rise of the execution

time and the rise of the number of prunings, respec-

Figure 4: Rise of the number of the prunings in classify-

ing documents modeled by SEQL with respect to the in-

crease in the dataset size in the review dataset (top), the

spam dataset (middle) and the EC dataset (bottom).

tively, in classifying documents modeled by SEQL

with respect to the increase in the dataset size in three

real datasets. The corresponding results for fastText

ICPRAM 2024 - 13th International Conference on Pattern Recognition Applications and Methods

210

are shown in Figure 7 and Figure 8. We can observe

in all the case that our pruning approach was more efﬁ-

cient than the brute-force method over varying dataset

sizes.

Figure 5: Rise of the execution time in retrieving the best

document modeled by SEQL with respect to the increase

in the dataset size in the review dataset (top), the spam

dataset (middle) and the EC dataset (bottom).

5.2 Document Retrieval

Figure 5 and Figure 6 shows the rise of the execu-

tion time and the rise of the number of prunings, re-

spectively, in retrieving documents modeled by SEQL

with respect to the increase in the dataset size in three

real datasets. The corresponding results for fastText

are shown in Figure 9 and Figure 10. We can ob-

serve in all the case that our pruning approach was

more efﬁcient than the brute-force method over vary-

ing dataset sizes. This observation suggests that the

proposed method enjoys greater effectiveness when

applied to larger datasets.

Figure 6: Rise of the number of the prunings in retrieving

the best document modeled by SEQL with respect to the

increase in the dataset size in the review dataset (top), the

spam dataset (middle) and the EC dataset (bottom).

6 RELATED WORKS

In this subsection, we ﬁrst give related works and elab-

orate on the difference from this work. Retrieval of

top-k documents has been studied in the information

retrieval (Ding and Suel, 2011; Dimopoulos et al.,

2013). In this context, our approach is close to a

Term-At-A-Time (TAAT) traversal with early termi-

nation. However, there exists substantial differences.

First of all, in information retrieval, the goal is to re-

trieve relevant documents as fast as possible, so the

documents are pre-processed such that the appearance

of key terms in each document is recorded in inverted

lists. Although it plays a key role in fast information

retrieval, not-yet processed documents are kept out of

the scope. On the other hand, our approach has a built-

in string matching engine, which enables us to handle

raw streaming data such as those in SNS. As a result,

one can pre-train a classiﬁer using hate speech dataset,

A Branch-and-Bound Approach to Efﬁcient Classiﬁcation and Retrieval of Documents

211

Figure 7: Rise of the execution time in classifying doc-

uments modeled by fastText with respect to the increase

in the dataset size in the review dataset (top), the spam

dataset (middle) and the EC dataset (bottom).

then use it to identify suspicious posts quickly. In ad-

dition to the aforementioned contributions, we also

proposed an efﬁcient method for classifying all the

documents, which sets our approach apart from exist-

ing methods in text retrieval.

7 CONCLUSION AND

DISCUSSION

In this paper, we have proposed efﬁcient ap-

proaches for document classiﬁcation and document

retrieval.Our approach encompasses two key contribu-

tions. The ﬁrst one introduces an efﬁcient method for

traversing a search tree, while the second one involves

the development of novel pruning conditions. Com-

putational experiments using the real world dataset

conﬁrmed the effectiveness of the pruning consis-

tently over three real-world datasets. One of our fu-

Figure 8: Rise of the number of the prunings in classify-

ing documents modeled by fastText with respect to the in-

crease in the dataset size in the review dataset (top), the

spam dataset (middle) and the EC dataset (bottom).

ture work is to investigate the effectiveness of our ap-

proach when applied to pre-trained word embedding

model, especially in very large datasets. Lastly, our

approach is applicable to any linear model and its ex-

tensions. Examples include logistic regression (Har-

rell, 2001; Hosmer and Lemeshow, 2000; Chen et al.,

2017), linear SVM (Demiriz et al., 2002), naive

Bayes (Kim et al., 2000), decision tree, and boosted

linear model (Mamitsuka and Naoki, 1998; Schapire

and Singer, 2000).

8 IMPACT OF THIS WORK

The evaluation of a linear model is indeed a funda-

mental task in machine learning, and our approach

can be beneﬁcial for various learning tasks. Exam-

ples of these tasks include regression, classiﬁcation,

and ranking. Our method is particularly useful for lin-

ICPRAM 2024 - 13th International Conference on Pattern Recognition Applications and Methods

212

Figure 9: Rise of the execution time in retrieving the best

document modeled by fastText with respect to the increase

in the dataset size in the review dataset (top), the spam

dataset (middle) and the EC dataset (bottom).

ear models where the features have structures such as

texts, trees, and graphs, as evaluating such models re-

quires repeated matching of structures. The beneﬁts

of our approach would be maximized when evaluat-

ing graphs, as it involves solving the subgraph isomor-

phism problem (Kudo et al., 2005).

9 LIMITATION

In this paper, the focus of the document classiﬁcation

problem was restricted to a binary case. While it is

relatively straightforward to extend our approach to

handle the multi-class case, the efﬁciency beneﬁts in

that scenario may be less pronounced. The same phe-

nomena would likely be observed for the top-k docu-

ment retrieval problem, especially when dealing with

a large value of k.

Figure 10: Rise of the number of the prunings in retrieving

the best document modeled by fastText with respect to the

increase in the dataset size in the review dataset (top), the

spam dataset (middle) and the EC dataset (bottom).

REFERENCES

Agarwal, A., Chapelle, O., Dudikib, M., and Langford,

J. (2014). A reliable effective terascale linear learn-

ing system. Journal of Machine Learning Research,

15:1111—-1133.

Boyer, R. and Moore, J. S. (1977). A fast string searching

algorithm. Comm. ACM. New York: Association for

Computing Machinery, 20(10):762—-772.

Chen, W., Xie, X., Wang, J., Pradhan, B., Hong, H., Bui,

D., Duan, Z., and Ma, J. (2017). A comparative study

of logistic model tree, random forest, and classiﬁca-

tion and regression tree models for spatial prediction

of landslide susceptibility. Catena, 151:147––160.

Demiriz, A., Bennet, K., and Shawe-Taylor, J. (2002). Lin-

ear programming boosting via column generation. Ma-

chine Learning, 46(1-3):225–254.

Dimopoulos, C., Nepomnyachiy, S., and Suel, T. (2013).

Faster top-k document retrieval using block-max in-

A Branch-and-Bound Approach to Efﬁcient Classiﬁcation and Retrieval of Documents

213

dexes. In Proceedings of the ACM International Con-

ference on Web Search and Data Mining (WSDM).

Ding, S. and Suel, T. (2011). Faster top-k document re-

trieval using block-max indexes. In Proceedings of

the Annual International ACM SIGIR Conference on

Research and Development in Information Retrieval

(SIGIR).

Fan, R., Chang, K., Hsieh, C., X.R., W., and Lin, C.

(2008). Liblinear: A library for large linear classiﬁca-

tion. Journal of Machine Learning Research, 9:1871–

–1874.

Harrell, F. (2001). Ordinal logistic regres-

sion. (In Regression Modeling Strategies).

Springer:Berlin/Heidelberg,Germany.

Hosmer, D. and Lemeshow, S. (2000). R.X. Applied Logis-

tic Regression. John Wiley and Sons: Hoboken, NJ,

USA.

Ifrim, G., Bakir, G., and Weikum, G. (2008). Fast logistic

regression for text categorization with variable-length

n-grams. In Proceedings of the International Con-

ference on Knowledge Discovery and Data Mining

(KDD).

Ifrim, G. and Wiuf, C. (2010). Bounded coordinate-descent

for biological sequence classiﬁcation in high dimen-

sional predictor space. In Proceedings of the Interna-

tional Conference on Knowledge Discovery and Data

Mining (KDD).

Joachims, T. (1998). Text categorization with support vector

machines: Learning with many relevant features. In

In Proceedings of the 10th European Conference on

Machine Learning, pages 137—-142.

Johnson, M., Schuster, M., Le, Q., Krikun, M., Wu, Y.,

Chen, Z., Thorat, N., Viégas, F., Wattenberg, M., Cor-

rado, G., Hughes, M., and Dean, J. (2017). Google’s

multilingual neural machine translation system: En-

abling zero-shot translation. Transactions of the As-

sociation for Computational Linguistics, pages 339–

351.

Joulin, A., Grave, E., Bojanowski, P., and Mikolov, T.

(2017). Bag of tricks for efﬁcient text classiﬁcation.

In In Proceedings of the 15th Conference of the Eu-

ropean Chapter of the Association for Computational

Linguistics.

Kim, Y., Hahn, S., and Zhang, B. (2000). Text ﬁltering

by boosting naive bayes classiﬁers. In In Proceedings

of the 23rd Annual International ACM SIGIR Confer-

ence on Research and Development in Information Re-

trieval.

Knuth, D., Morris, J., and Pratt, V. (1977). Fast pattern

matching in strings. SIAM Journal on Computing,

6(2):323—-350.

Kudo, T., Maeda, E., and Matsumoto, Y. (2005). An applica-

tion of boosting to graph classiﬁcation. In Advances

in Neural Information Processing Systems 17 , pages

729–736. MIT Press.

Mamitsuka, H. and Naoki, A. (1998). Query learning strate-

gies using boosting and bagging. In Proceedings of the

Fifteenth International Conference on Machine Learn-

ing (ICML98).

Medhat, W., Hassan, A., and Korashy, H. (2014). Sentiment

analysis algorithms and applications: A survey. Ain

Sham Engineering Journal, 5:1093—-1113.

Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013).

Efﬁcient estimation of word representations in vector

space. In In Proceedings of the 1st International Con-

ference on Learning Representations.

Pang, B. and Lee, L. (2008). Opinion mining and sentiment

analysis. Foundations and Trends in Information Re-

trieval, 2(1-2):1––135.

Schapire, R. and Singer, Y. (2000). Boostexter: A boosting-

based system for text categorization. Mach. Learn.,

39:135—-168.

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones,

L., Gomez, A., Kaiser, L., and Polosukhin, I. (2017).

Attention is all you need. In In Proceedings of the

31th Conference on Neural Information Processing

Systems.

Wang, S. and Manning, C. (2012). Baselines and bigrams:

Simple, good sentiment and topic classiﬁcation. In In

Proceedings of the 50th Annual Meeting of the Associ-

ation for Computational Linguistics, pages 90––94.

ICPRAM 2024 - 13th International Conference on Pattern Recognition Applications and Methods

214