Ofﬂine Text-Independent Arabic and Chinese Writer Identiﬁcation

Using a Multi-Segmentation Codebook-Based Strategy

Mohamed Nidhal Abdi

1,2 a

and Maher Khemakhem

3 b

Mir@cl Laboratory, FSEG, University of Sfax, Tunisia

Institut Sup

erieur des Math

ematiques Appliqu

ees et de l’Informatique, ISMAI, University of Kairouan, Tunisia

Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia

Keywords:

Writer Identiﬁcation, Grapheme Codebook, Segmentation, K-Medoid Clustering, Feature Combination.

Abstract:

Many approaches rely on segmentation for ofﬂine text-independent writer identiﬁcation. Segmentation

schemes based on contours, junctions and projections are widely used and are very effective with Latin al-

phabet handwriting. However, these schemes seem to be less consistent in capturing writer individuality with

Arabic and Chinese. As writing systems, the latter languages are morphologically different and are con-

sidered more complex than Latin alphabet languages. In this paper, four different segmentation techniques

are tested for the identiﬁcation of Arabic and Chinese writers. Then, these techniques are combined to in-

crease the accuracy of identiﬁcation. Experiments were realized on handwriting samples by 300 writers from

Arabic IFN/ENIT dataset and 300 writers from Chinese HIT-MW dataset. An additional 300 writers from

English/German CVL dataset were used as a control group. Taken separately, these segmentation techniques

that gave good results with CVL (Top1% = 99.00%) were not as conclusive with IFN/ENIT and HIT-MW.

Nevertheless, the use of different types of segmentation in combination proved to be highly efﬁcient for Ara-

bic and Chinese with Top1% = 96.33% and Top1% = 91.33%, respectively.

1 INTRODUCTION

Arabic and Chinese are considered complex scripts

by learners and scholars alike (Guellil et al., 2021;

Tahsildar, 2019). Although its writing system is 14

centuries old, the Arabic language changed little over

the years and is still readable in its ancient form by

contemporary readers. In both its handwritten and

printed forms, it is a right-to-left semi-cursive script

that exhibits unique morphological features. Writers

are allowed to merge certain letters according to their

handwriting style, simplify others and arbitrarily in-

sert elongations between connected letters. A letter

can have up to four different forms depending on its

position in the word (Amara and Bouslama, 2003).

Moreover, Arabic calligraphy is well known for being

particulary permissive with letter shapes (Alshahrani,

2008).

On the other hand, Chinese is a 30-centuries-

old logographic writing system where visually com-

plex handwritten characters are composed using di-

https://orcid.org/0009-0002-8783-1642

https://orcid.org/0000-0002-1287-1634

rectional strokes (Wang et al., 1999). Most words

consist of two or more characters from approx-

imately 50000 available ones, hence the lack of

the grapheme-phoneme correspondence characteriz-

ing alphabet-based languages (Taylor and Taylor,

2014). Chinese characters are in majority (80%)

phonetic-logographic characters, formed of combina-

tions and recombination of radical components for

meaning, and phonetic components for pronuncia-

tion. The remaining character categories are pic-

tographs, ideographs, denotation of events, ﬁgura-

tive extension of meaning and phonetic loans (Chen,

1996).

Up until recently, the automatic identiﬁcation of

Arabic and Chinese writers was not addressed as ex-

tensively as with the English language. In the con-

text of ofﬂine text-independent identiﬁcation, this pa-

per aims at assessing the extent to which segmenta-

tion schemes that proved efﬁcient for Latin alphabet

languages work for Arabic and Chinese. Thus, four

different segmentation techniques are studied. Exper-

iments are conducted with Arabic IFN/ENIT, Chinese

HIT-MW and compared to the English/German CVL

dataset results.

Abdi, M. and Khemakhem, M.

Ofﬂine Text-Independent Arabic and Chinese Writer Identiﬁcation Using a Multi-Segmentation Codebook-Based Strategy.

DOI: 10.5220/0012297600003654

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 13th International Conference on Pattern Recognition Applications and Methods (ICPRAM 2024), pages 613-619

ISBN: 978-989-758-684-2; ISSN: 2184-4313

613

In the remainder of this paper, section 2 surveys

the recent literature of Arabic and Chinese writer

identiﬁcation. Section 3 summarizes the approach

proposed. In sections 4 and 5, segmentation tech-

niques are presented and codebook generation is ex-

plained. Experimental results are detailed in section 6

and the conclusions are drawn in the last section.

2 RELATED WORKS

A review of writer identiﬁcation for Arabic is hereby

presented. Al-Ma’adeed et al. combined multi-

scale edge-hinge and grapheme features for Arabic

writer identiﬁcation in (Al-Ma’adeed et al., 2008).

Djeddi and Souici-Meslati applied artiﬁcial immune

recognition system (AIRS) in (Djeddi and Souici-

Meslati, 2011) with grey level co-occurrence matri-

ces (GLCM). kNN, SVM and na

ıve Bayes were also

tested from classiﬁcation. For historical writer iden-

tiﬁcation, Fecker et al. (Fecker et al., 2014) intro-

duced multiple features, namely histograms of ori-

ented gradients (HOG), oriented basic image (OBI)

features and scale-invariant feature transform (SIFT).

Feature combination was achieved using averaging

and voting schemes. Synthetic grapheme code-

books were proposed for Arabic in (Abdi and Khe-

makhem, 2015) with a model-based segmentation-

free approach. Bagged discrete cosine transform

(BDCT) descriptors were utilized in an approach by

Khan et al. (Khan et al., 2017) that was trained and

tested on IFN/ENIT and AHTID/MW datasets. The

textural features of local binary patterns (LBP), local

ternary patterns (LTP) and local phase quantization

were extracted in (Chahi et al., 2019). As for classi-

ﬁcation, Chahi et al. opted for a 1-nearest neighbor

classiﬁer. Semma et al. presented an approach based

on deep learning in (Semma et al., 2021), where fea-

tures of interest are determined with accelerated seg-

ment test (FAST) key points and Harris corner de-

tector. Rasoulzadeh and BabaAli (Rasoulzadeh and

BabaAli, 2022) tested a neural network architecture

inspired by the vector of locally aggregated descrip-

tors (VLAD) on Arabic KHATT dataset among other

datasets. Finally, Ahmed et al. explained concav-

ity/convexity of contours (CON

) and contour point

curve angle (CPCA) codebooks for Arabic writer

identiﬁcation in (Ahmed et al., 2023).

As for Chinese writer identiﬁcation, He et al. pro-

posed two-dimensional wavelet textural features and

hidden Markov tree (HMT) for similarity classiﬁca-

tion (He et al., 2008). Tan et al. explained in (Tan

et al., 2011) sixteen morphological handwriting fea-

tures extracted from characters bounding and TBLR

quadrilateral boxes. Four more features also used

adaptative similarity adjustment. In a bag-of-features

approach (Hu et al., 2014), Hu et al. compared SIFT

descriptors in the form of improved Fisher kernels

(IFK) and locality-constrained linear coding (LLC) to

hard voting (HV) and vector quantization (VQ). Yang

et al. employed deep convolutional neural network

(DCNN) for writer identiﬁcation (Yang et al., 2015),

trained on an augmented dataset. Another deep learn-

ing approach that relies on deep multi-stream CNN

is explained in (Xing and Qiao, 2016). Xiong and

Lu used contour-directional features (CDF) and char-

acter pair similarity measurement (CPSM) in (Xiong

and Lu, 2017). A global identiﬁcation scheme based

on edge co-occurrence features (ECF) was enhanced

in (Xiong et al., 2019) with displacement ﬁeld-based

similarity (DFS) to conﬁrm characters’ authorship.

Local phase quantization (LPQ) features were fused

with PCA-reduced deep learning features in another

approach (Xu et al., 2021). Finally, Semma et al.

(Semma et al., 2022) tested features learned from

DCNN and encoded with VLAD on multiple lan-

guages including Chinese.

3 PROPOSED APPROACH

Our proposed approach encompasses a training phase

and a testing phase. In the ﬁrst phase, handwriting is

segmented into graphemes that are clustered with the

K-medoid algorithm to form a grapheme codebook.

Codebooks serve to encode handwriting samples into

grapheme histograms prior to similarity comparisons.

In the second phase, classiﬁcation is performed using

the City-Block metric, where unknown samples are

matched against samples of established authorship.

Finally, writer normalized distances obtained with in-

dividual codebooks are combined using a log-based

weighting formula to enhance identiﬁcation outcome.

Results of Arabic IFN/ENIT and Chinese HIT-MW

datasets are compared and interpreted in relation to

those of English/German CVL dataset.

4 SEGMENTATION

Handwriting is binarized using Otsu’s thresholding

algorithm (Otsu, 1979). To yield separate graphemes,

segmentation operates by disrupting ink continuity at

speciﬁc morphological features. Four basic segmen-

tation techniques are considered, where handwriting

is segmented at the level of (a) contours, (b) morpho-

logical points of interest (MPI), (c) vertical projection

maxima and (d) ﬁlled lower halves minima.

ICPRAM 2024 - 13th International Conference on Pattern Recognition Applications and Methods

614

4.1 Contours

The external layout of handwriting shapes are sepa-

rated and taken as segmentation output. Hence, dis-

ruption of ink continuity is realized at full connected-

component contours. As illustrated in Figure 1,

resulting graphemes may consist in Arabic cursive

words or piece of words (PAW), Chinese logographs

or parts of them, and Roman letters or group of letters.

Figure 1: Contour-level segmentation of Arabic, Chinese

and English.

4.2 Morphological Points of Interest

Branch points, cross-points and corner-points are the

MPI marking segmentation disruption points (Abdi

et al., 2009). They are detected in thinned handwrit-

ing skeletons. The segmentation product is the con-

nected components left after the deletion of MPI pix-

els along with their 8-connected neighbors and the re-

moval of residual small components. Figure 2 shows

the result consisting of detached Arabic segments and

ligatures, pen stokes from Chinese logographs and

Roman letter fragments.

Figure 2: MPI-level segmentation of Arabic, Chinese and

English.

4.3 Vertical Projection Maxima

Handwriting is vertically projected and its curve

smoothed with a moving average of 8 pixels. Lo-

cal projection maxima are detected with a δ = 2

threshold. These maxima, excluding the ﬁrst and

the last ones, are used to divide the script in a ver-

tical manner (Figure 3). Morphologically speaking,

the segmentation lines obtained tend to coincide with

the height peak of certain characters in Arabic, lo-

gographs’ largest vertical lines in Chinese and Roman

letter halves.

Figure 3: Segmentation of Arabic, Chinese and English

based on vertical projection maxima.

4.4 Filled Lower Halves Minima

Handwriting shapes are processed as connected com-

ponents. First, enclosed regions in the shape are ﬁlled.

Then, using the position of its largest horizontal pro-

jection line, the upper half of its bounding box is ﬁlled

so that only the protruding structures of the lower

half remain intact. After vertical projection, curve

smoothing occurs with a window of 8 pixels. Fi-

nally, the local minima x-coordinates, expect for the

ﬁrst and last ones, are detected with a δ = 1 threshold

and retained for vertical segmentation of the original

handwriting. An example of segmentation results is

depicted in Figure 4.

Figure 4: Segmentation of Arabic, Chinese and English

based on ﬁlled lower halves minima.

Each previously described segmentation tech-

nique concludes with a post-processing step, which

starts with a morphological dilation using a struc-

turing element of 5 pixels and ends with extract-

ing the contours of the resulting shapes as the ﬁnal

graphemes to be retained.

5 CODEBOOK GENERATION

Codebooks are created in two steps: graphemes are

ﬁrst encoded into feature vectors (FV), then processed

with a clustering algorithm.

Feature encoding consists in subsampling

grapheme shapes to 50 evenly spaced points and

encoding them into 100-dimensional FVs of polar

coordinates. An angular coordinate of a point is taken

in [0,90

◦

] with origin (0, 0) being the upper-left cor-

ner of the grapheme’s bounding box, and its distance

Ofﬂine Text-Independent Arabic and Chinese Writer Identiﬁcation Using a Multi-Segmentation Codebook-Based Strategy

615

(a) Arabic VerticalSeg (b) Chinese MPISeg (c) English LowerSeg

Figure 5: Best performing segmentation-issued codebooks for (a) IFN/ENIT, (b) HIT-MW and (c) CVL datasets.

coordinate is noted in pixels. FVs are standardized

feature-wise to mean zero and standard deviation one

according to the training dataset’s global values.

Clustering of encoded graphemes is realized on

the training dataset with the K-medoid algorithm

(Kaur et al., 2014). The latter takes graphemes as

cluster centers (medoids) instead of graphemes’ mean

values (centroids) and is more resilient to noise and

outliers than K-means. For initial medoid attribution,

the K-means++ algorithm is retained (Arthur and Vas-

silvitskii, 2007). Average medoid dissimilarity is iter-

atively minimized until convergence. Therefore, four

codebooks of 144 graphemes are computed for Ara-

bic, Chinese and English for each segmentation tech-

nique explained earlier.

6 EXPERIMENTAL RESULTS

Experimentations are realized on handwriting sam-

ples by 300 writers from Arabic IFN/ENIT (Pechwitz

et al., 2002) and 300 writers from Chinese HIT-MW

(Su et al., 2007). An additional 300 writers from En-

glish/German CVL (Kleber et al., 2013) are retained

as a control group for results comparison (Figure 6).

Handwriting data is divided into equal training

and testing parts per writer. To classify a testing

sample, the training samples are ranked according to

their similarity yielding one (Top1) or N (TopN) most

probable matches. Distances are computed with the

City-Block metric as follows:

d(u,v) =

∑

i=1

− v

| (1)

where u is the testing sample, v is the training sample

and n is the feature vector dimensionality (100).

6.1 Writer Identiﬁcation

Writer identiﬁcation results of the codebooks associ-

ated to the different segmentation techniques are re-

ae87

ai38

aq41

(a)

b04010709

b04091301

b04052002

(b)

0023-7

0164-3

0084-1

(c)

Figure 6: Handwriting samples from (a) IFN/ENIT, (b)

HIT-MW and (c) CVL datasets.

ported for IFN/ENIT, HIT-MW and CVL. In Table 1,

writer identiﬁcation accuracy is presented in the form

of Top1 and Top5 percentages (Top1% and Top5%).

ContourSeg, MPISeg, VerticalSeg and LowerSeg are

the codebooks based on contours, MPI, vertical pro-

jection maxima and ﬁlled lower halves minima seg-

mentations, respectively.

All four segmentation techniques worked well for

CVL with a Top1% ranging from 98.00% to 99.00%.

Lower performances were obtained for Arabic and

Chinese with IFN/ENIT giving a Top1% in the 58.33-

70.67% range and HIT-MW a Top1% in the 45.00-

59.00% range. The best results were achieved with

VerticalSeg for Arabic (Top1% = 70.67%) using the

codebook shown in Figure 5a, and with MPISeg for

Chinese (Top1% = 59.00%) using the codebook in

Figure 5b. On the other hand, LowerSeg did not per-

form well for Arabic (Top1% = 58.33%) and Chinese

(Top1% = 45.00%), despite being the best performing

codebook with CVL (Top1% = 99.00%) as illustrated

in Figure 5c.

ICPRAM 2024 - 13th International Conference on Pattern Recognition Applications and Methods

616

Table 1: Writer identiﬁcation results.

IFN/ENIT HIT-MW CVL (control)

Codebook▽ Top1% Top5% Top1% Top5% Top1% Top5%

ContourSeg 59.33 79.67 49.67 73.67 98.33 99.00

MPISeg 62.33 86.33 59.00 80.00 98.67 99.33

VerticalSeg 70.67 88.67 47.67 71.67 98.00 99.67

LowerSeg 58.33 82.00 45.00 68.00 99.00 100.00

Table 2: Writer identiﬁcation combination (IFN/ENIT).

Codebook combination▽ Top1% Top5%

VerticalSeg & MPISeg 95.67 98.67

VerticalSeg & MPISeg & ContourSeg 95.67 98.67

VerticalSeg & MPISeg & ContourSeg & LowerSeg 96.33 98.00

Table 3: Writer identiﬁcation combination (HIT-MW).

Codebook combination▽ Top1% Top5%

MPISeg & ContourSeg 89.67 97.00

MPISeg & ContourSeg & VerticalSeg 90.33 97.67

MPISeg & ContourSeg & VerticalSeg & LowerSeg 91.33 98.33

For morphological reasons, the codebooks that

succeeded in capturing inter-writer variability of CVL

writers were not as biometrically signiﬁcant with

IFN/ENIT and HIT-MW. To address this issue, a

multi-segmentation approach to Arabic and Chinese

is proposed.

6.2 Codebooks Combination

After classiﬁcation, the training/testing distance ma-

trix obtained with each codebook is min-max nor-

malized in [0,150]

N, using the global values of the

training dataset inter-sample distance matrix. Matri-

ces are ordered by their Top1% in a descending or-

der. The following reciprocal log weighting formula

is used to merge matrices element-wise:

d =

∑

i=1

(

1/log(1 + i/10)

∑

i=1

1/log(1 + i/10)

) (2)

where

d is the merged distance, n is the codebooks

count (up to 4) and d

the training/testing distance

according to codebook number i. Table 2 and Ta-

ble 3 present the combination results for IFN/ENIT

and HIT-MW, respectively.

Codebook combination substantially increased

identiﬁcation rates for IFN/ENIT and HIT-MW. For

Arabic, the combination of the two best performing

codebooks, i.e., VerticalSeg and MPISeg, increased

Top1% by 25 percentage points. For Chinese, Top1%

gained 30.67 percentage points by combining MP-

ISeg and ContourSeg. The best Top1% values are ob-

tained by combining all codebooks, and are 96.33%

and 91.33% for IFN/ENIT and HIT-MW, respectively.

7 CONCLUSIONS

This paper addressed the efﬁciency of segmentation

for Arabic and Chinese writer identiﬁcation in com-

parision to Latin alphabet handwriting. Segmenta-

tion schemes based on contours, MPI, vertical pro-

jection maxima and ﬁlled lower halves minima were

considered. Then, the issued graphemes were sub-

sampled and their polar coordinates encoded into

feature vectors. Codebooks were generated using

the K-medoid clustering algorithm with experimenta-

tions performed on Arabic IFN/ENIT, Chinese HIT-

MW and English/German CVL datasets. Taken in-

dividually, the studied segmentation techniques did

not perform as well for Arabic (Top1% = 70.67%)

and Chinese (Top1% = 59.00%) as they did for En-

glish/German (Top1% = 99.00%). However, combin-

ing codebook results at the level of normalized train-

ing/testing distance matrices substantially enhanced

writer identiﬁcation, with Top1% = 96.00% for Ara-

bic and Top1% = 91.33% for Chinese. Further inves-

tigations are being conducted to assess the impact of

other aspects such as preprocessing and clustering on

Arabic and Chinese writer identiﬁcation.

REFERENCES

Abdi, M. N. and Khemakhem, M. (2015). A model-based

approach to ofﬂine text-independent arabic writer

identiﬁcation and veriﬁcation. Pattern Recognition,

48(5):1890–1903.

Abdi, M. N., Khemakhem, M., and Ben-Abdallah, H.

Ofﬂine Text-Independent Arabic and Chinese Writer Identiﬁcation Using a Multi-Segmentation Codebook-Based Strategy

617

(2009). A novel approach for off-line arabic writer

identiﬁcation based on stroke feature combination. In

2009 24th International Symposium on Computer and

Information Sciences, pages 597–600. IEEE.

Ahmed, B. Q., Hassan, Y. F., and Elsayed, A. S. (2023).

Ofﬂine text-independent writer identiﬁcation using

a codebook with structural features. Plos one,

18(4):e0284680.

Al-Ma’adeed, S., Al-Kurbi, A.-A., Al-Muslih, A., Al-

Qahtani, R., and Kubisi, H. A. (2008). Writer

identiﬁcation of arabic handwriting documents using

grapheme features. In 2008 IEEE/ACS International

Conference on Computer Systems and Applications.

IEEE.

Alshahrani, A. A. (2008). Arabic script and the rise of ara-

bic calligraphy. Online Submission.

Amara, N. E. B. and Bouslama, F. (2003). Classiﬁcation

of arabic script using multiple sources of information:

State of the art and perspectives. Document Analysis

and Recognition, 5:195–212.

Arthur, D. and Vassilvitskii, S. (2007). K-means++ the ad-

vantages of careful seeding. In Proceedings of the

eighteenth annual ACM-SIAM symposium on Discrete

algorithms, pages 1027–1035.

Chahi, A., Ruichek, Y., Touahni, R., et al. (2019). An effec-

tive and conceptually simple feature representation for

off-line text-independent writer identiﬁcation. Expert

Systems with Applications, 123:357–376.

Chen, M. J. (1996). An overview of the characteristics of

the chinese writing system. Asia Paciﬁc Journal of

Speech, Language and Hearing, 1(1):43–54.

Djeddi, C. and Souici-Meslati, L. (2011). Artiﬁcial immune

recognition system for arabic writer identiﬁcation. In

International Symposium on Innovations in Informa-

tion and Communications Technology. IEEE.

Fecker, D., Asit, A., M

argner, V., El-Sana, J., and Fin-

gscheidt, T. (2014). Writer identiﬁcation for histori-

cal arabic documents. pages 3050–3055, Stockholm,

Sweden. IEEE.

Guellil, I., Sa

adane, H., Azouaou, F., Gueni, B., and Nou-

vel, D. (2021). Arabic natural language process-

ing: An overview. Journal of King Saud University-

Computer and Information Sciences, 33(5):497–507.

He, Z., You, X., and Tang, Y. Y. (2008). Writer identiﬁca-

tion of chinese handwriting documents using hidden

Markov tree model. Pattern Recognition, 41(4):1295–

1307.

Hu, Y., Yang, W., and Chen, Y. (2014). Bag of features

approach for ofﬂine text-independent chinese writer

identiﬁcation. In 2014 IEEE International Conference

on Image Processing (ICIP), pages 2609–2613. IEEE.

Kaur, N. K., Kaur, U., and Singh, D. (2014). K-Medoid

clustering algorithm-a review. Int. J. Comput. Appl.

Technol, 1(1):42–45.

Khan, F. A., Tahir, M. A., Kheliﬁ, F., Bouridane, A., and

Almotaeryi, R. (2017). Robust off-line text indepen-

dent writer identiﬁcation using bagged discrete cosine

transform features. Expert Systems with Applications,

71:404–415.

Kleber, F., Fiel, S., Diem, M., and Sablatnig, R. (2013).

CVL-database: An off-line database for writer re-

trieval, writer identiﬁcation and word spotting. In

2013 12th international conference on document anal-

ysis and recognition, pages 560–564. IEEE.

Otsu, N. (1979). A threshold selection method from gray-

level histograms. IEEE Trans. Syst. Man Cybern.,

9(1):62–66.

Pechwitz, M., Maddouri, S. S., M

argner, V., Ellouze, N.,

Amiri, H., et al. (2002). IFN/ENIT-database of hand-

written arabic words. In Proc. of CIFED, volume 2,

pages 127–136. Citeseer.

Rasoulzadeh, S. and BabaAli, B. (2022). Writer identiﬁ-

cation and writer retrieval based on NetVLAD with

re-ranking. IET Biometrics, 11(1):10–22.

Semma, A., Hannad, Y., Siddiqi, I., Djeddi, C., and El Ket-

tani, M. E. Y. (2021). Writer identiﬁcation using deep

learning with fast keypoints and harris corner detector.

Expert Systems with Applications, 184:115473.

Semma, A., Hannad, Y., Siddiqi, I., Lazrak, S., and El Ket-

tani, M. E. Y. (2022). Feature learning and encod-

ing for multi-script writer identiﬁcation. International

Journal on Document Analysis and Recognition (IJ-

DAR), 25(2):79–93.

Su, T., Zhang, T., and Guan, D. (2007). Corpus-based

HIT-MW database for ofﬂine recognition of general-

purpose chinese handwritten text. International Jour-

nal of Document Analysis and Recognition (IJDAR),

10:27–38.

Tahsildar, M. N. (2019). Chinese language complexities

among international students in china. Education

Quarterly Reviews, 2(1):67–76.

Tan, J., Lai, J.-H., Wang, C.-D., and Feng, M.-S. (2011).

A stroke shape and structure based approach for

off-line chinese handwriting identiﬁcation. Interna-

tional Journal of Intelligent Systems and Applications,

3(2):1.

Taylor, M. M. and Taylor, I. (2014). Writing and literacy in

chinese, korean and japanese. Writing and Literacy in

Chinese, Korean and Japanese, pages 1–506.

Wang, J., Chen, H.-C., Radach, R., and Inhoff, A. (1999).

Reading Chinese script: A cognitive analysis. Psy-

chology Press.

Xing, L. and Qiao, Y. (2016). Deepwriter: A multi-stream

deep CNN for text-independent writer identiﬁcation.

In 2016 15th international conference on frontiers

in handwriting recognition (ICFHR), pages 584–589.

IEEE.

Xiong, Y.-J., Liu, L., Lyu, S., Wang, P. S., and Lu, Y. (2019).

Improving text-independent chinese writer identiﬁca-

tion with the aid of character pairs. International Jour-

nal of Pattern Recognition and Artiﬁcial Intelligence,

33(02):1953001.

Xiong, Y.-J. and Lu, Y. (2017). Chinese writer identiﬁcation

using contour-directional feature and character pair

similarity measurement. In 2017 14th IAPR Interna-

tional Conference on Document Analysis and Recog-

nition (ICDAR), volume 1, pages 119–124. IEEE.

Xu, Y., Chen, Y., Cao, Y., and Zhao, Y. (2021). A deep

learning method for chinese writer identiﬁcation with

ICPRAM 2024 - 13th International Conference on Pattern Recognition Applications and Methods

618

feature fusion. In Journal of Physics: Conference Se-

ries, volume 1883, page 012142. IOP Publishing.

Yang, W., Jin, L., and Liu, M. (2015). Chinese character-

level writer identiﬁcation using path signature feature,

DropStroke and deep CNN. In 2015 13th Interna-

tional Conference on Document Analysis and Recog-

nition (ICDAR), pages 546–550. IEEE.

Ofﬂine Text-Independent Arabic and Chinese Writer Identiﬁcation Using a Multi-Segmentation Codebook-Based Strategy

619