is imperative in the development of Amharic IR. 
Although some efforts have been made to develop 
Amharic IR systems using stems, their effectiveness 
with respect to the use of various forms has not been 
systematically analyzed thus far. Therefore, this 
research analyzes the use of stems and roots for 
content representation and investigates their effects 
on Amharic IR. 
The rest of this paper is organized as follows. 
Section 2 describes the Amharic language and its 
morphology. Section 3 discusses related work, and 
Section 4 presents how documents and queries are 
represented in the Amharic IR system. Experimental 
results and evaluation are discussed in Section 5. 
Section 6 concludes the paper and outlines the way 
forward in Amharic IR. 
2 AMHARIC LANGUAGE 
Amharic is the official language of the government 
of Ethiopia. Although several languages are spoken 
in Ethiopia, Amharic is spoken as a mother tongue 
by a sizeable proportion of the country's population, 
which is currently estimated to be over 110 million. 
Within the Semitic language family, it is the second 
most widely spoken language in the world, after 
Arabic. Owing to its historical significance and 
official status, Amharic has long served as the 
lingua franca of the country. As a result, many 
literary works, government documents, educational 
materials, religious works, etc. are predominantly 
produced in Amharic. Amharic is written in the 
Ethiopic script, which has 34 base characters (with 
the vowel ኧ /ə/), each of which is modified to yield 
six other orders representing the vowels ኡ /u/, 
ኢ /i/, ኣ /a/, ኤ /e/, እ /ɨ/, and ኦ /o/, in that order. 
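The order system described above can be illustrated with a short Python sketch. It assumes the Unicode encoding of the Ethiopic script, in which most consonant series occupy eight consecutive code points starting at a multiple of eight, with the first seven positions holding the seven orders. The function name `orders_of` is hypothetical, and the layout assumption does not hold for every series (for instance, ኧ itself is encoded in the eighth position of the glottal series), so this is an illustration rather than a general-purpose routine.

```python
# Vowel labels of the seven orders, in the order listed in the text.
ORDER_VOWELS = ["ə", "u", "i", "a", "e", "ɨ", "o"]


def orders_of(char: str) -> list[str]:
    """Return the seven orders of the series that `char` belongs to.

    Assumption: the series starts at a code point that is a multiple
    of 8 and its first seven code points are the seven orders.
    """
    base = ord(char) & ~0x7  # round down to the start of the series
    return [chr(base + i) for i in range(7)]


if __name__ == "__main__":
    # Print each order of the g-series next to its vowel label.
    for vowel, syllable in zip(ORDER_VOWELS, orders_of("ግ")):
        print(f"{syllable}  /g{vowel}/")
```

Any order of a series can be passed in; the sixth-order form ግ is used here because verbal roots in this paper are written with sixth-order characters.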
Like other Semitic languages, complex 
morphological processes are carried out on Amharic 
word classes such as verbs, nouns and adjectives 
(Yimam, 2001). Amharic verbs are the most 
complex word class and can be generated by 
attaching affixes to verbal stems. Verbal stems, in 
turn, can be generated from verbal roots by 
inserting vowels between radicals. For example, the 
verbal stem ገደል- /gədəl-/ is derived from the verbal 
root ግ-ድ-ል /g-d-l/. Moreover, verbal stems (e.g. 
ተገደል- /təgədəl-/) can be derived from other verbal 
stems (e.g. ገደል- /gədəl-/) by affixing morphemes. 
The verb formation process is usually completed by 
attaching a verbal stem with person, gender, number, 
case, tense/aspect and mood markers. For example, 
from the verbal stem ገደል- /gədəl-/  the following 
verbs can be generated: ገደልኩ  /gədəlku  'I killed'/, 
ገደልኩህ /gədəlkuh 'I killed you'/, ገደልን /gədəln 'we 
killed'/, ተገደልኩ /təgədəlku 'I was killed'/, ገደለች 
/gədələtʃ 'she killed'/, etc.  As verbs are marked for 
subject and object, they alone can represent a 
complete sentence. For example, the word አልሰበረንም 
/ʔəlsəbərənɨm  'he did not break us'/, which is 
constructed from the morphemes ʔəl-səbər-ə-nɨ-m, is 
a complete sentence with the following linguistic 
information:  ʔəl-…-m /not/, -səbər- /did break/, -ə- 
/he/ and -nɨ- /us/. Accordingly, thousands of verbs 
can be derived from a verbal root through a complex 
morphological process carried out by attaching a 
combination of person, case, gender, number, tense, 
aspect, mood and other markers (Abate and Assabie, 2014; 
Assabie, 2017). 
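The derivations in this paragraph can be traced with a minimal Python sketch that works on the phonemic transliteration only. It is an illustration under simplifying assumptions, not a morphological analyzer: the helper names `stem_from_root` and `attach` are hypothetical, the vowel patterns and affixes are limited to those in the examples above, and phonological adjustments that occur in real word formation are ignored.

```python
from itertools import zip_longest


def stem_from_root(radicals: list[str], vowels: list[str]) -> str:
    """Insert one vowel after each radical; missing vowels mean none."""
    pairs = zip_longest(radicals, vowels, fillvalue="")
    return "".join(radical + vowel for radical, vowel in pairs)


def attach(stem: str, prefix: str = "", suffix: str = "") -> str:
    """Concatenate a prefix and a suffix onto a stem."""
    return prefix + stem + suffix


# Root g-d-l with the vowel pattern ə, ə gives the stem gədəl-.
stem = stem_from_root(["g", "d", "l"], ["ə", "ə"])
# A stem derived from another stem by prefixing tə-.
derived_stem = attach(stem, prefix="tə")

# Surface verbs from the text, all sharing the same root.
verbs = [
    attach(stem, suffix="ku"),          # 'I killed'
    attach(stem, suffix="kuh"),         # 'I killed you'
    attach(derived_stem, suffix="ku"),  # 'I was killed'
    attach(stem, suffix="ətʃ"),         # 'she killed'
]

# ʔəl-səbər-ə-nɨ-m: negation circumfix around stem + subject + object.
negated = attach(
    stem_from_root(["s", "b", "r"], ["ə", "ə"]) + "ə" + "nɨ",
    prefix="ʔəl",
    suffix="m",
)

if __name__ == "__main__":
    print(stem, derived_stem, verbs, negated)
```

Every surface form generated here contains the same three radicals, which is why indexing on stems or roots can conflate many word variants into a single term.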
Based on their morphological structure, Amharic 
nouns and adjectives can be either derived or non-
derived. For example, the words መሬት /məret 'earth'/ 
and ዛፍ /zaf 'tree'/ are non-derived nouns, whereas 
words like ስብራት /sɨbɨrat 'the state of being broken'/ 
and  ደግነት  /dəgɨnət 'generosity'/ are nouns derived 
from the verbal root ስ-ብ-ር  /s-b-r 'to break'/ and the 
adjective ደግ  /dəg 'generous'/, respectively. Derived 
nouns are generated from other word classes through 
morphological processes. In general, Amharic nouns 
can be derived from verbal roots, adjectives and 
other nouns by affixing vowels or bound 
morphemes. Derived adjectives can be formed from 
verbal roots by infixing vowels between consonants 
(e.g. ክ-ብ-ድ /k-b-d 'to become heavy'/ → ከባድ /kəbad 
'heavy'/), from nouns by suffixing bound morphemes such 
as -ኧኛ /-ʔəɲa/ (e.g. ጉልበት /gulbət 'power'/ → ጉልበተኛ 
/gulbətəɲa 'powerful'/) and from verbal stems by prefixing 
or suffixing bound morphemes (e.g. ደካም-  /dəkam-/ 
→  ደካማ  /dəkama 'weak'/). Although the 
morphological process of derivation of nouns and 
adjectives is complex by itself, even more 
complexity arises from their inflections. Amharic 
nouns and adjectives are inflected for number by 
suffixing  -ኦች  /-ʔotʃ/ or -ዎች  /-wotʃ/, definiteness by 
suffixing  -ኡ  /-ʔu/ or -ዉ /-wu/, objective case by 
suffixing  -ን /-n/, possessive case by suffixing 
different morphemes depending on the subject, and 
gender by suffixing -ኢት /-ʔit/. These inflections can 
appear alone or in combination, along with 
prepositions and negation markers, which leads to 
the generation of thousands of word forms 
from a single noun or adjective. For example, 
ያለባለቤቶቹ  /jaləbaləbetotʃu 'without the owners of the 
house'/ is generated from the morphemes jə-ʔələ-
balə-bet-otʃ-u (jə preposition  'of/with', ʔəl negation 
marker 'not/without', balə possessive marker 'owner 
of', bet noun 'house', otʃ plural marker, and u definite