Developing a Contextual English Proficiency Test: The Case of English Test for Islamic Community (ETIC)

Siti Nurul Azkiyah and Abdurrosyid

Faculty of Educational Sciences, UIN Syarif Hidayatullah Jakarta, Indonesia

Keywords: Contextual test; English proficiency test; ETIC; test development

Abstract: Assessment is widely recognized as playing a very important role in education, and the assessment of language proficiency is one such form. For English, TOEFL, IELTS and TOEIC have been widely used. However, these tests have been criticized for the dominance of English native-speaker culture and for the cost of taking them. This study is therefore intended to develop an English proficiency test that is contextual for Indonesia, especially for the Islamic community as the largest community in Indonesia. The test is named ETIC (English Test for Islamic Community), and its development strictly follows the principles and stages of an international standardized test. Three domains, namely academic, Islam, and Indonesia, are the main content areas of the test, while the test specification is built on a thorough study of the competencies tested in TOEFL and IELTS as well as those described in the CEFR. The study is expected not only to produce a unique and contextual English proficiency test but also to contribute to the development of the intellectual property rights of UIN Syarif Hidayatullah Jakarta as the locus of the study.

1 INTRODUCTION

It is widely believed that assessment has a very important role in education. A large number of studies have consistently indicated that assessment is essential not only for students but also for teachers and institutions (e.g. Alduais, 2012; Creemers & Kyriakides, 2008; Creemers, 1994; Hughes, 1989). Furthermore, Creemers and Kyriakides (2008) argue that assessment is vital in informing teachers about what they should do to improve their teaching and learning activities. In other words, the data gathered from assessment should enable teachers to identify the high and low achievers in their classrooms, to analyze the strengths and weaknesses of their teaching methods, and to identify the materials that need to be retaught. These data are considered necessary for teachers to decide how to improve their teaching and learning strategies.

There are different types of assessment, one of which is the proficiency test. According to experts such as Bachman and Palmer (1996), Carr (2011), Hughes (1989), and Kopriva (2008), a proficiency test is expected to show a person's overall proficiency in a language regardless of his or her educational background. Examples of language proficiency tests include TOEFL (Test of English as a Foreign Language), IELTS (International English Language Testing System), and TOEIC (Test of English for International Communication). These tests are used not only for academic purposes but also for business and other general purposes.

The continued development of English proficiency tests is in high demand, since no test can work perfectly at all times and in all contexts. This view is supported by Kim (2018), who demonstrates the existence of bias in the English proficiency test developed by Seoul National University. Some experts also consider that self-developed English proficiency tests can do justice to test takers by accommodating their needs and local values (Franco & Galvis, 2012) and are likely to be less biased, because tests such as TOEFL and IELTS belong to native speakers of English and favor native English contexts. In addition, the continual development of English proficiency tests helps to avoid washback on English learning, even though not entirely (Thaidan, 2015).

Azkiyah, S. and Abdurrosyid. Developing a Contextual English Proficiency Test: The Case of English Test for Islamic Community (ETIC). In Proceedings of the 1st International Conference on Recent Innovations (ICRI 2018), pages 2940-2944. DOI: 10.5220/0009926529402944. ISBN: 978-989-758-458-9.

The claim that most English tests are too “Anglobalized” is justified, as can be seen from the test framework and from the values and ideologies in all parts of the tests (Thomas & Breidlid, 2015). The tests have been criticized as biased because they contain the values and contexts of native English speakers. Even an English test developed in a non-English-speaking country such as Indonesia has been found to use far more English culture than source (Indonesian) culture, at 46% and 17% respectively (Azkiyah & Setiono, 2018). This is likely because the concept of native-speaker competence was widely applied prior to the emergence of communicative competence (Chinh, 2013).

However, many studies have found that familiarity is a crucial factor affecting students' performance on a test. Sabatin (2012), for instance, mentions that the dominance of native English-speaker culture may affect test validity because of test takers' lack of familiarity. Similarly, Sosa (2012) considers that including too much cultural content may throw off some students and is thus unfair. This is because individual contexts, class, race, gender, and cultural experience all play a role in reading: readers do not passively receive texts (Tomlinson, 2010). Finally, another crucial problem is the high price of the TOEFL or IELTS test, which is comparable to a semester's tuition fee at many universities in Indonesia.

Given the importance of English, UIN Syarif Hidayatullah Jakarta, as the context of the study, requires all students to achieve a minimum standard of proficiency in both English and Arabic. Considering the arguments on the necessity of accommodating the local values and needs of the students in the test, and the high price of the aforementioned English proficiency tests, the university decided, through its language center, to develop its own test, which was then named English Test for Islamic Community (ETIC).

Therefore, this paper is intended to describe the stages of developing ETIC, the unique characteristics of ETIC, and the quality of its items. In line with these objectives, three questions are raised as follows:

1) What are the stages of developing ETIC?
2) What specific characteristics should be attached to ETIC to make it unique and relevant to the context of Indonesia and an Islamic university?
3) What is the quality of the items in ETIC?

It is expected that the results of the study will not only produce a unique English proficiency test but also contribute to the development of the intellectual property rights of UIN Syarif Hidayatullah Jakarta.

2 METHOD

2.1 Research Design

This research used both qualitative and quantitative methods. The qualitative approach was used to address questions 1 and 2, while the quantitative approach was employed to answer question 3. Firstly, the specifications of TOEFL and IELTS were analyzed as a benchmark for developing the items in ETIC. This analysis was important for developing the unique specification of ETIC, which was then used to develop the items. The quantitative approach was then used to analyze the test items through a pilot study that examined their validity, reliability, difficulty level, and item discrimination. This analysis was important to make sure that every single item in the test met the requirements of a quality test.

2.2 Study Subject

This study used both human and non-human subjects: the non-human subjects were the TOEFL and IELTS tests, while the human participants consisted of the developers of ETIC and students of UIN Syarif Hidayatullah Jakarta. The team that developed ETIC consisted of 8 lecturers of English from both the Education and Literature Departments of UIN Syarif Hidayatullah Jakarta, and the student participants were 300 students who voluntarily took part in the pilot study intended to examine the quality of the items included in ETIC.

2.3 Data Analysis

Regarding the first and second research questions, which involved the analysis of the TOEFL and IELTS specifications and the development of the ETIC specification, two TOEFL books and two IELTS books were analyzed to understand the specifications of these two well-known English proficiency tests. The analysis covered Listening, Structure and Written Expression, and Reading. Descriptive and analytical approaches were used to understand the specifications of TOEFL and IELTS.

For the third research question, upon the completion of the ETIC items, the responses of the students who joined the pilot study were analyzed using SPSS to examine the validity, reliability, difficulty level, and item discrimination of ETIC.
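The analyses in this study were run in SPSS, and the formulas behind the difficulty level and item discrimination are not reported. As an illustration only, the two indices can be sketched in Python under classical test theory, assuming dichotomous scoring (1 = correct, 0 = incorrect). The function names, the 27% upper and lower groups, and the response matrix are assumptions made for this sketch, not settings or data from the ETIC pilot.

```python
def item_difficulty(scores):
    """Proportion of test takers who answered each item correctly (p-value)."""
    n = len(scores)
    n_items = len(scores[0])
    return [sum(row[j] for row in scores) / n for j in range(n_items)]


def discrimination_index(scores, fraction=0.27):
    """Upper-lower index D = p_upper - p_lower for each item.

    Test takers are ranked by total score; the top and bottom `fraction`
    of them (an assumed, conventional 27%) form the two groups.
    """
    ranked = sorted(scores, key=sum, reverse=True)
    k = max(1, round(len(ranked) * fraction))
    upper, lower = ranked[:k], ranked[-k:]
    n_items = len(scores[0])
    return [
        (sum(r[j] for r in upper) - sum(r[j] for r in lower)) / k
        for j in range(n_items)
    ]


# Hypothetical 0/1 responses: 6 test takers (rows) x 4 items (columns)
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

difficulty = item_difficulty(responses)
discrimination = discrimination_index(responses)
```

A difficulty value close to 1 marks an item that almost everyone answered correctly, a value close to 0 marks an item that almost no one answered correctly, and a larger D indicates that stronger test takers answered the item correctly more often than weaker ones.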

3 FINDINGS AND DISCUSSION

3.1 Steps in Developing ETIC

The study took a long time because it involved various activities, from the development of the test specification to the pilot study and the revision of the items. The details of each stage are presented below.

The Development of Test Specification. The development of the test specification started in March 2016. Following the structure of TOEFL, the team was divided into three groups, namely listening, structure and written expression, and reading comprehension. Each group analyzed the competencies examined in both IELTS and TOEFL, the lengths of the texts, the types of questions, the topics of the texts, the number of questions, and the way the questions were posed.

The Development of the Items. Another time-consuming activity in the study was the development of the items. Each person in the team was assigned to develop 50–85 items so that there would be many more items than in the actual ETIC item package. This activity took place from April to August 2016, since constructing items is very difficult.

The Review and the Revision of the Items. After each member of the team finished constructing the items, the draft was sent to reviewers for the first validation of the content. The results of the review were sent to the corresponding member of the team. This activity took place from July to September 2016.

The Pilot Study of the Items. After receiving the feedback of the reviewers, each member of the team revised the items. When the revision was finished, the next big agenda was the pilot study, conducted in September 2016.

The Analysis of the Items. The items were analyzed after the pilot study. The data were first entered into SPSS and analyzed both descriptively and statistically. The descriptive analysis dealt with the frequency of each response for each item, which was important for understanding the distribution of responses for each item. The statistical analysis concerned the reliability, validity, and difficulty level of each item. This activity was conducted in September 2016.

The Revision of the Items. A second round of revision was scheduled to revise the items based on the pilot study. It was found that students did not select certain answer options offered in some items. In addition, some items were either too easy or too difficult and therefore needed to be improved. The revision was scheduled for October 2016.
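The check for answer options that no student selected can be illustrated with a short Python sketch. It is not the SPSS procedure used in the study: the four-option format (A to D), the function names, and the answer sheets below are all assumptions made for illustration.

```python
from collections import Counter


def option_counts(answers, options="ABCD"):
    """Count how often each option was chosen, item by item."""
    n_items = len(answers[0])
    table = []
    for j in range(n_items):
        chosen = Counter(sheet[j] for sheet in answers)
        table.append({opt: chosen.get(opt, 0) for opt in options})
    return table


def unselected_options(table):
    """Map item number (1-based) to the options that nobody chose."""
    flagged = {}
    for item_number, counts in enumerate(table, start=1):
        unused = [opt for opt, count in counts.items() if count == 0]
        if unused:
            flagged[item_number] = unused
    return flagged


# Hypothetical answer sheets: 5 test takers, 3 items each
sheets = ["ABD", "ACD", "BDD", "CAD", "DBD"]

frequency_table = option_counts(sheets)
flagged = unselected_options(frequency_table)
```

In this made-up data every option of items 1 and 2 attracts at least one test taker, while item 3 is answered D by everyone, so its options A, B, and C would be candidates for rewriting.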

Applying for the Intellectual Property Rights. The last activity in the study was applying for the intellectual property rights to the Ministry of Law and Human Rights. The document for this application was sent to the Ministry on October 20, 2016.

3.2 The Specification and the Uniqueness of ETIC

As previously mentioned, in designing the specification of ETIC, the skills examined in TOEFL (paper-based) and IELTS were analyzed, and the results served as the framework for developing the specification of ETIC. In principle, the development of ETIC considered the contexts of Indonesia, Islam, and the academic sphere, which became the main themes of the content of ETIC. In measuring the results, ETIC adhered to the CEFR (Common European Framework of Reference for Languages), which has been widely used as a reference in testing language proficiency.

Regarding its uniqueness, ETIC has several characteristics that distinguish it from TOEFL and IELTS. The first is the topics or themes embedded in the test: Islam, Indonesia, and academic are the core themes. The topic of Islam, as one of the three topics in the test content of ETIC, is a means of ensuring justice and fairness for Muslims when taking an English test; it covers general knowledge and habits of Muslims around the world, ranging from clothing culture and rules to the architectural styles dominating the Islamic world. In addition, Islam is the second largest religion in the world, with 1.6 billion adherents or 23 percent of the global population as of 2010 (Chappell, 2015). The test content also demonstrates the linguistic interplay between English, along with its native speakers, and Islamic civilization (Abdurrosyid, 2017). The theme of Indonesia is pivotal to the test since the country has the largest Islamic population in the world, with 222 million adherents (“Top 10 largest”, 2018), and is the locus where ETIC was designed and developed. Academic knowledge and information is the third main content area of ETIC because the test is also intended to accommodate academic test takers in general, besides Muslim communities.

The second characteristic is the style of constructing the items. Instead of questions, ETIC uses statements to start the items included in the test. This style is used to expose test takers to more varied ways of testing their English proficiency besides the common styles of questioning in English, which mostly use to be, auxiliary verbs, and wh-questions.

3.3 The Quality of the Items

The items were analyzed after the pilot study. The data were first entered into SPSS and analyzed both descriptively and statistically. The descriptive analysis dealt with the frequency of each response for each item, which was important for understanding the distribution of responses. The statistical analysis concerned the reliability, validity, and difficulty level of each item.

With respect to the validity analysis, we found that some items should be improved because their corrected item-total correlation coefficient was below .3. Another quantitative analysis conducted in this study was descriptive statistics, which reported the distribution of answers and the difficulty level. The findings of this analysis showed that, for some items, at least one option was not selected by any student, which was likely because the option was obviously wrong.
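For readers unfamiliar with the corrected item-total correlation, the following Python sketch shows one way to compute it: each item is correlated with the total score of the remaining items, so that the item does not inflate its own correlation. This is a sketch under stated assumptions rather than the SPSS routine used in the study; the function names and the response matrix are hypothetical, and the cut-off of 0.3 is a commonly used rule of thumb.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (NaN if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def corrected_item_total(scores):
    """Correlate each item with the total score of all *other* items."""
    totals = [sum(row) for row in scores]
    correlations = []
    for j in range(len(scores[0])):
        item = [row[j] for row in scores]
        rest = [total - value for total, value in zip(totals, item)]
        correlations.append(pearson(item, rest))
    return correlations


# Hypothetical 0/1 responses: 6 test takers (rows) x 4 items (columns)
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]

r = corrected_item_total(responses)
# Item numbers (1-based) whose coefficient falls below the assumed 0.3 cut-off
weak_items = [j for j, value in enumerate(r, start=1) if value < 0.3]
```

In this made-up data, items 1 and 2 correlate at about .45 with the rest of the test, while items 3 and 4 fall below the cut-off and would be flagged for revision.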

Concerning the reliability analysis, when each component was analyzed separately, the findings revealed that, among the three components, listening had the highest reliability coefficient (α = .749), while those of reading and of structure and written expression were below .6. However, the reliability of the three components of ETIC combined was very good (α = .828), which was accepted in this study because ETIC was intended to measure language proficiency across all items included in the instrument.
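The α reported above is Cronbach's alpha, which for $k$ items is $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i \sigma_i^2}{\sigma_T^2}\right)$, where $\sigma_i^2$ is the variance of item $i$ and $\sigma_T^2$ is the variance of the total score. A minimal Python sketch of this formula is given below; the function name and the response matrix are assumptions for illustration and do not reproduce the ETIC data or the SPSS output.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a matrix of item scores (rows = test takers)."""
    k = len(scores[0])
    item_variances = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_variance = pvariance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical 0/1 responses: 6 test takers (rows) x 4 items (columns)
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

alpha = cronbach_alpha(responses)
```

Because alpha generally rises with the number of items when the items are of comparable quality, a full test can reach a higher coefficient than any of its shorter components, which is consistent with the pattern reported here.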

In short, several actions were taken to improve the quality of some items. Firstly, we changed the options that were not selected by participants. In addition, for the reading section, we also reviewed the texts and modified them slightly to make sure that the words used were not ambiguous. Finally, it should be noted that the items included in the pilot study were exactly the items we used for ETIC, and consequently there were no spare items that could be removed to improve the quality.

4 CONCLUSION

The main goals of this study were to develop a contextual English proficiency test and to apply for its intellectual property rights. To start the process, the steps and principles of test construction were strictly followed by establishing a team that was divided into three groups following the components of TOEFL (paper-based).

The first step was the development of the test specification, which was followed by item construction. In the specification, Islam, Indonesia, and academic were chosen as the core themes of the test, and statements instead of questions were used to start the items. Before the items were distributed in the pilot study, relevant and authoritative experts were asked to review the quality of the items qualitatively, followed by corresponding revision. The next step was a pilot study conducted with students of the university from various majors, whose data were analyzed in terms of difficulty level, validity, and reliability. The findings indicated that the items included in ETIC were valid and reliable for examining English proficiency.

Finally, the process and the findings of this study lead to some important suggestions. Firstly, it should be noted that test development is not easy work. It involves many steps, materials, and much time, and requires strong commitment and coordination. Therefore, it is suggested to prepare a solid team that is committed to the hard work. The team should consist not only of test developers but also of reviewers and proofreaders. Since the test developers in this study were all Indonesian, it is deemed important to have proofreaders to make sure that every single word, phrase, and sentence, as well as every collocation, is correct. Secondly, in this study, the number of items included in the pilot study was exactly the number of items that would be included in the test. There were no spare items in case deletion was necessary during the analysis, and consequently no item was deleted. Although both the texts and the options were improved, it is considered important to prepare more items to leave room for the deletion of inappropriate items. Hence, it is highly suggested to have spare items during the pilot study and to keep enriching the test bank.

REFERENCES

Abdurrosyid. (2017). The Islamic Entries in three major English Dictionaries. Insaniyat: Journal of Islam and Humanities, Vol. 2, No. 1, p. 41–49.

Bachman, L. F. & Palmer, A. S. (1996). Language Testing in Practice. Oxford: Oxford University Press.

Brown, H. D. (2004). Language assessment: principles and classroom practices. New York: Longman.

Carr, N. T. (2011). Designing and Analyzing Language Tests. Oxford: Oxford University Press.

Coombe, C. & Davidson, P. (2014). Common Educational Proficiency Assessment (CEPA) in English. Language Testing, Vol. 31, No. 2, p. 269–276.

Council of Europe. (2011). Manual for language test development and examining. ALTE.

Cohen, A. S. & Wollack (2006). Handbook on test development. Madison: University of Wisconsin.

Chae, E. Y. & Shin, J. A. (2015). A study of a timed cloze test for evaluating L2 proficiency. English Teaching, Vol. 70, No. 3, p. 117–125.

Chappell, B. (2015). World’s Muslim Population will surpass Christians this century, Pew Says. Retrieved from https://www.npr.org/sections/thetwo-way/2015/04/02/397042004/muslim-population-will-surpass-christians-this-century-pew-says

Chen, Y. & Puttitanun, T. (2015). Intellectual property rights and innovation in developing countries. Journal of Development Economics, Vol. 78, p. 474–493.

Cheng, L. & Watanabe, Y. (2008). Washback in language testing. London: Lawrence Erlbaum Associates Publishers.

Chapelle, C. & Douglas, D. (2006). Assessing Language through Computer Technology. Cambridge: Cambridge University Press.

Creemers, B. P. M. (1994). The effective classroom. London: Cassell.

Creemers, B. & Kyriakides, L. (2008). The Dynamics of Educational Effectiveness: A contribution to Policy, Practice and Theory in Contemporary schools. New York & London: Routledge Taylor & Francis Group.

Cho, H. & Brutt-Griffler, J. (2015). Integrated reading and writing: a case of Korean English language learners. Reading in a Foreign Language, Vol. 27, No. 2, p. 242–261.

Crystal, D. (2007). English as a Global Language. Cambridge: Cambridge University Press.

Fulcher, G. (2010). Practical Language Testing. London: Hodder Education.

Franco, C. P. & Galvis, A. H. (2012). The role of situational context and linguistics context when testing EFL Vocabulary Knowledge in a Language Teacher Education Program: A preliminary Approach. Colombian Applied Linguistics Journal, Vol. 15, No. 1, p. 85–99.

Hamid, M. O. (2014). World Englishes in International proficiency tests. World Englishes, Vol. 33, No. 2, p. 263–277.

Heaton, J. B. (1990). Writing English language test. New York: Longman.

Hughes, A. (2003). Language testing for language teachers. Oxford: Oxford University Press.

Kim, E-Y. J. (2018). Utility and bias in a Korean standardized test of English: the case of i-TEPS (Test of English Proficiency developed by Seoul National University). Asian Englishes. DOI: 10.1080/13488678.2018.1463346

Kopriva, R. J. (2008). Improving Testing for English Language Learners. New York: Routledge Taylor & Francis Group.

Lado, R. (1961). Language Testing. London: Longman.

Marcus, R. (2004). Strategic management of Intellectual property. MIT Sloan Management Review, Spring.

McNamara, T. & Roever, C. (2006). Language Testing: The Social Dimension. London: Blackwell.

Reviere, R., Berkowitz, S., Carter, C. C., & Ferguson, C. G. (Eds.) (1996). Needs Assessment: A Creative and Practical Guide for Social Scientists. Washington: Taylor and Francis.

Sabatin, I. (2013). The Effect of cultural background knowledge on learning English language. International Journal of Science, Culture and Sport, 1(4): 22-32.

Soler, E. A. & Jorda, M. P. S. (2007). Introduction. In E. A. Soler & M. P. S. Jorda (Eds.), Intercultural Language Use and Language Learning (pp. 1-6). Springer.

Sosa, K. (2012). Standardized testing and cultural bias. Retrieved from http://www.brighthubeducation.com/student-assessment-tools/65699-standardized-testing-and-cultural-bias/

Tomlinson, B. (2010). Research in Materials Development for Language Teaching. London: Continuum.

Top 10 Largest Muslim Populations in the world. (2018). Retrieved from https://support.muslimpro.com/hc/en-us/articles/115002006087-Top-10-Largest-Muslim-Populations-In-The-World

Thaidan, R. (2015). Washback in Language Testing. Education Journal, Vol. 4, No. 1, p. 5–8.

Thomas, P. & Breidlid, A. (2015). In the shadow of ‘Anglobalization’: national test in English in Norway and the making of a new English underclass. Journal of Multicultural Discourses, Vol. 10, No. 2, p. 349–368.

Phillipson, R. (1993). Linguistic Imperialism. Oxford: Oxford University Press.

Yan, X. (2014). An examination for rater performance on a local oral English proficiency test: A mixed-methods approach. Language Testing, Vol. 31, No. 4, p. 501–527.