From our third assumption, all the u_{ij} are i.i.d. samples. Considering all the assumptions, the probability of the vector of item responses for a given player is given by the likelihood function
\[
\mathrm{Prob}(U_j \mid \theta_j) \;=\; \prod_{i=1}^{n} P_{ij}^{\,u_{ij}}(\theta_j)\, Q_{ij}^{\,1-u_{ij}}(\theta_j). \tag{3}
\]
This yields the log-likelihood function
\[
L \;=\; \log \mathrm{Prob}(U_j \mid \theta_j) \;=\; \sum_{i=1}^{n} \bigl[\, u_{ij} \log P_{ij}(\theta_j) + (1 - u_{ij}) \log Q_{ij}(\theta_j) \,\bigr].
\]
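As a concrete sketch, the log-likelihood above can be evaluated directly once an item response function is fixed. The two-parameter logistic form below, with discrimination a_i and difficulty b_i, is a standard IRT assumption rather than a form specified in this excerpt, and the names `p_2pl` and `log_likelihood` are illustrative:

```python
import math

def p_2pl(theta, a, b):
    """Assumed two-parameter logistic item response function:
    probability of a correct response given ability theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def log_likelihood(theta, responses, items):
    """L = sum_i [u_ij log P_ij(theta) + (1 - u_ij) log Q_ij(theta)]
    for one player, with known item parameters (a_i, b_i)."""
    L = 0.0
    for u, (a, b) in zip(responses, items):
        p = p_2pl(theta, a, b)
        L += u * math.log(p) + (1 - u) * math.log(1.0 - p)
    return L

# Example: three items with known parameters, responses u_ij in {0, 1}.
items = [(1.0, -0.5), (1.2, 0.0), (0.8, 1.0)]
responses = [1, 1, 0]
L0 = log_likelihood(0.0, responses, items)
```

Since the item parameters are treated as known, L is a function of the single ability parameter, which is what makes the per-player estimation in the next step tractable.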
Since the item parameters for all n items are known, only the derivative of the log-likelihood with respect to a given ability needs to be taken:
\[
\frac{\partial L}{\partial \theta_j} \;=\; \sum_{i=1}^{n} u_{ij} \frac{1}{P_{ij}(\theta_j)} \frac{\partial P_{ij}(\theta_j)}{\partial \theta_j} \;+\; \sum_{i=1}^{n} (1 - u_{ij}) \frac{1}{Q_{ij}(\theta_j)} \frac{\partial Q_{ij}(\theta_j)}{\partial \theta_j}. \tag{4}
\]
Applying Newton-Raphson iteration to find the root of Equation (4), i.e., to maximize L, yields the maximum-likelihood ability estimate \(\hat{\theta}_j\) for the player. The value of \(\hat{\theta}_j\) quantifies the ability of the decision-maker.
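A minimal Newton-Raphson sketch for the ability estimate follows, again assuming the two-parameter logistic item response function (a standard IRT choice, not specified in this excerpt). For that model the first and second derivatives of L reduce to the closed forms used inside the loop; the function name and parameters are illustrative:

```python
import math

def p_2pl(theta, a, b):
    # Assumed two-parameter logistic item response function.
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def newton_raphson_theta(responses, items, theta=0.0, tol=1e-8, max_iter=50):
    """Maximum-likelihood ability estimate: iterate
    theta <- theta - L'(theta) / L''(theta) until the update is tiny.
    For the 2PL model, L' = sum_i a_i (u_i - P_i) and
    L'' = -sum_i a_i^2 P_i Q_i (exact, since dP/dtheta = a P Q)."""
    for _ in range(max_iter):
        d1 = d2 = 0.0
        for u, (a, b) in zip(responses, items):
            p = p_2pl(theta, a, b)
            d1 += a * (u - p)          # first derivative of L
            d2 -= a * a * p * (1 - p)  # second derivative of L
        step = d1 / d2
        theta -= step
        if abs(step) < tol:
            break
    return theta
```

With a symmetric item pair at difficulties -1 and +1 and one correct, one incorrect response, the estimate converges to 0 by symmetry; the log-likelihood of the 2PL model is globally concave in theta, so the iteration is well behaved (except in the degenerate all-correct or all-incorrect case, where the MLE diverges).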
5 CONCLUSION AND PROSPECTS
In this position paper, we have outlined several measurement procedures to quantify the quality of human decisions. Our proposed model predicts the choices made by any decision-maker on any problem, and ranks decision-makers by the quality of the decisions they make. The model is built from evaluations generated by an AI agent of supreme strength, which serve as the knowledge base for the IA to analyze the problem.
These procedures can also be employed to model an IA that mimics a decision-maker, by tuning the agent down to match the decision-maker's native characteristics. Numerous aspects, such as the speed-accuracy trade-off, the effect of procrastination, and the impact of time pressure, can also be analyzed, and their effect on decision-makers' performance can be tested. Other fields where this model can be applied include, but are not limited to, economics, psychology, test-taking, sports, stock-market trading, and software benchmarking. Of these fields, test-taking has the closest formal correspondence to our chess model.
We aim thereby to shed light on the following problems, for application domains such as test-taking for which we can establish a correspondence to our chess model: Do the intrinsic criteria for mastery transferred from the chess domain align with extrinsic criteria inferred from population and performance data in the application's own domain? How close is the agreement, and what other scientific regularities, performance mileposts, and assessment criteria may be inferred from it? What does this say about distributions, outliers, and the effort needed for mastery?
Designing Intelligent Agents to Judge Intrinsic Quality of Human Decisions