Towards Quality Assessment of AI Systems: A Case Study

Jesús Oviedo Lama

1,2 a

, Jared David Tadeo Guerrero Sosa

, Moisés Rodríguez Monje

1,2 c

Francisco Pascual Romero Chicharo

and Mario Piattini Velthuis

AQCLab Software Quality, Ciudad Real, Spain

Instituto Tecnologías y Sistemas de Información y Escuela Superior de Informática,

Universidad de Castilla-La Mancha, Ciudad Real, Spain

Keywords: Software Quality, Quality of Artificial Intelligence System, ISO/IEC 25000, ISO/IEC 25059.

Abstract: Artificial Intelligence is being a lever of change at all levels of society, both in public administration, in

companies, organizations and even for the daily activities of individuals. Therefore, it is necessary, as in

software, that Artificial Intelligence Systems obtain the results expected by the users and for this purpose,

their functionality must be controlled, and their quality must be assured. This article presents the results of a

functional suitability evaluation of a real Artificial Intelligence System by applying an evaluation environment

based on ISO/IEC 25059 and ISO/IEC 25040 standards.

1 INTRODUCTION

Artificial Intelligence is a key part of the digital

transformation that is growing both at the

organisational level, where it is facilitating the

robotisation and automation of processes, as well as

in everyday activities: such as personal assistants,

recommendations in internet searches, household

appliances, etc. This growth in artificial intelligence

is evidenced by objective data such as the growth in

the number of artificial intelligence projects hosted on

GitHub, where in the last decade the number of

projects has increased from 845 in 2011 to 1.8 million

in 2023. Having a significant growth of 59.3%

between 2022 and 2023. It can also be seen with the

number of research papers that have tripled between

2010 and 2022 (Maslej, et al., 2024).

This great impact of artificial intelligence on our

society makes it necessary to generate trust in users

(European Commission, 2020). For this reason, the

European Union has worked to create a regulatory

https://orcid.org/0000-0001-7962-1042

https://orcid.org/0000-0001-7999-9870

https://orcid.org/0000-0003-2155-7409

https://orcid.org/0000-0001-9116-2870

https://orcid.org/0000-0002-7212-8279

https://www.sei.cmu.edu/our-work/artificial-

intelligence-engineering/

framework that gives rise to an ‘ecosystem of trust’,

and the result of this work is the Artificial Intelligence

Act - Regulation (EU) 2024/1689, which classifies AI

systems according to their risk and establishes the

controls that must be implemented. However, the Act

focuses on more legal and ethical aspects and does not

cover technical aspects related to the quality of AI

systems.

Therefore, in order for users to achieve complete

confidence in AI systems, it is necessary that they are

developed with adequate quality criteria for which

mature development techniques, such as software

engineering techniques, must be employed (Serban,

et al., 2020). The use of software engineering

techniques in the development of AI systems is an

emerging area of ‘IS for AI’, i.e. how to use software

engineering techniques to build intelligent systems

with adequate quality and productivity. In fact, the

‘1st Workshop on AI Engineering - Software

Engineering for AI’ was launched by IEEE and ACM

in 2021. The SEI calls this research area ‘Artificial

Intelligence Engineering

’ the result of ‘combining

254

Oviedo Lama, J., Guerrero Sosa, J. D. T., Rodríguez Monje, M., Romero Chicharo, F. P., Piattini Velthuis and M.

Towards Quality Assessment of AI Systems: A Case Study.

DOI: 10.5220/0013521700003964

In Proceedings of the 20th International Conference on Software Technologies (ICSOFT 2025), pages 254-261

ISBN: 978-989-758-757-3; ISSN: 2184-2833

the principles of systems engineering, software

engineering, computer science and human-centred

design to create AI systems according to human needs

for mission outcomes’.

The use of software engineering techniques for

the construction of AI systems is based on the

principles established by Horneman (Horneman, et

al., 2019), who can be called the father of this new

area, considering that AI systems are software-

intensive systems, and therefore:

 ‘Established principles for designing and

deploying quality software systems that meet

their mission objectives on time should be

applied to engineering AI systems.’

 ‘Teams should strive to deliver functionality on

time and with quality, design for important

architectural requirements (such as security,

usability, reliability, performance and

scalability), and plan for system maintenance

throughout the life of the system.’

While ‘Artificial intelligence (AI) is pervasive in

today's landscape, there is still a lack of software

engineering expertise and best practices in this field’

(van Oort, et al., 2021), although developers of AI-

intensive systems are aware of the need to employ

these techniques, they also recognise that traditional

software engineering methods and tools are neither

adequate nor sufficient on their own and need to be

adapted and extended (Lavazza & Morasca, 2021).

For all these reasons, it is necessary to evolve the

methods for quality assurance of ‘software 2.0’

towards what can be called ‘MLware’ (Borg, 2021).

Although techniques or practices for quality

assurance of ML models or ML-based systems are

already beginning to emerge (Hamada, et al., 2020),

such as the use of white box and black box testing

(Gao, et al., 2019), it is necessary to ensure that the

tests performed on these AI systems follow

established quality standards and criteria:

performance, reliability, scalability, security, etc.

Therefore, quality assurance of AI-based systems is

an area that has not yet been well explored and

requires collaboration between the SE and AI

research communities and currently, there is a lack of

(standardised) approaches for quality assurance of

AI-based systems, being essential for their practical

use (Feldererer & Ramler, 2021).

Therefore, in the new area of Artificial

Intelligence Engineering, challenges have arisen

related to the definition and guarantee of the

behavioural and quality attributes of these systems

and applications, which have led us to work on the

development of an environment that allows the

evaluation of the functional suitability of AI systems.

Therefore, the main objective of this article is to

present the results obtained in the evaluation of the

functional suitability of an artificial intelligence

system using the built environment (Oviedo, et al.,

2024). In addition, the differences and difficulties

encountered in the evaluation of an AI system,

compared to the evaluation of a traditional software

system, are also presented. For all this, the rest of the

article is structured as follows: section 2 presents a

summary of the importance of quality in AI systems

and the proposals that exist for evaluating the quality

of these systems. Section 3 describes the evaluation

environment based on ISO/IEC 25000 family

standards used to carry out the evaluation. Section 4

presents the AI system evaluated, and the assessment

performed. Section 5 presents the conclusions

obtained with this work and the future lines of work

and research.

2 QUALITY IN AI SYSTEMS

For the quality assurance of artificial intelligence

systems, as is done in Software Engineering, it is

necessary to consider several dimensions (Borg,

2021), such as the quality of the data, the quality of

the AI system itself, the quality of the development

processes, etc. However, as mentioned above,

existing work on software quality needs to be adapted

to the particularities of artificial intelligence. This is

why in recent years new standards have emerged that

are adapted to the particularities of AI systems to

address their quality (Oviedo, et al., 2024) (Piattini,

2024).

At the level of AI system development processes,

the new ISO/IEC 5338 standard ‘Life cycle processes

for AI systems’ (ISO/IEC, 2023) has emerged, based

on ISO/IEC 12207, which lays the foundations for the

relevant processes that organisations should follow to

ensure the proper development of AI systems. This

standard defines 33 processes separated into four

groups, where it adapts the software lifecycle

processes to specific aspects of AI. To this end, 23 of

the 33 processes have been directly adapted from

ISO/IEC 12207 and three new processes have been

defined: Knowledge Acquisition Process, AI Data

Engineering Process and the Continuous Validation

Process (Márquez, et al., 2024).

On the other hand, at the level of AI system

product quality, several models, techniques and, in

some cases, tools have been proposed to assess and

ensure the quality of AI systems. However, these

works are based on software engineering standards

Towards Quality Assessment of AI Systems: A Case Study

255

that have not been adapted to the particularities of AI

systems or are focused on evaluating the quality in

use and data of the systems (Oviedo, 2024).

Therefore, in order to address the evaluation of AI

systems, as has happened with other cases

(sustainability of software products (Calero, et al.,

2021), cloud services (Navas, 2016), e-learning

systems (Rahman, 2019), etc.), it has been necessary

to adapt the quality model defined in the ISO/IEC

25010 standard for software products to the special

characteristics of AI systems, thus giving rise to a

new quality model contained in the ISO/IEC 25059

standard (ISO/IEC, 2023) in accordance with the

framework of the ISO/IEC 25000 standards.

This new ISO/IEC 25059 standard includes the

quality characteristics that any AI system must

comply with, based on those proposed by ISO/IEC

25010, but presenting certain changes and adaptations

in some of the characteristics and sub-characteristics

(Oviedo, et al., 2024).

The proposal made by ISO/IEC 25059 is a starting

point that must be completed to be applicable in the

industry by defining metrics and thresholds to

determine the quality values of each of the sub-

characteristics and characteristics defined in the

standard. In addition, to address the evaluation of AI

systems, it is necessary to have an evaluation process

and a technological environment to allow the

collection of data necessary for the calculation of the

new metrics and to obtain the quality value of each of

the characteristics that make up the quality model for

the AI system.

3 FUNCTIONAL SUITABILITY

ASSESSMENT ENVIRONMENT

FOR AI SYSTEMS

In order to address the evaluation of AI systems, it is

necessary to adapt existing software quality

evaluation environments. On the other hand, as there

is no work based on the new ISO/IEC 25059 standard

(ISO/IEC, 2023), nor where metrics are defined to

know the quality of the AI system itself (Oviedo, et

al., 2024), it has been necessary to develop an

evaluation environment composed of a quality model

and metrics, an evaluation process and a set of

measurement and evaluation tools, focusing on one of

the characteristics defined in the ISO/IEC 25059

standard, Functional Suitability.

3.1 Quality Model for the Functional

Suitability of AI Systems

The first element of the developed environment is the

quality model for Functional Suitability (Oviedo, et

al., 2024), which is based on the ISO/IEC 25059

standard and complemented with a set of properties

and metrics that allow obtaining the level of quality

from values of the AI system itself. The definition of

properties and metrics has been based on the

functional suitability model defined in (Rodriguez, et

al., 2016), but its adaptation has been necessary given

that Functional Suitability is one of the quality

characteristics of the ISO/IEC 25059 standard that

present changes with respect to the ISO/IEC 25010

standard (ISO/IEC, 2011). Specifically:

 The sub-characteristic Functional Correctness

has been modified given that in AI systems

correct results are not produced in the totality

of the executions, which has led to the need to

establish new metrics for its evaluation, as well

as to adapt the measurement procedure.

 The sub-characteristic Functional Adaptability

is a new sub-characteristic, so it has been

necessary to define new Quality Properties and

metrics to address the evaluation of this sub-

characteristic.

 At the Characteristic level it has been necessary

to make changes in the thresholds and ranges in

order to adapt to the changes made in the sub-

characteristics.

Figure 1 presents the quality model defined for the

functional suitability of AI systems, showing each of

the properties that influence each sub-characteristic.

3.2 Evaluation Process

The second element of the environment is the

Assessment Process (Oviedo, et al., 2024). The

assessment process is based on the ISO/IEC 25040

standard (ISO/IEC, 2023) which already proposes an

assessment model, although this process has had to be

adapted to address the assessment of the Functional

Suitability of IA Systems, in Figure. 2 presents the

activities and tasks that make up this process.

3.3 Assessment Tool

The third element is the software environment

(Oviedo, et al., 2024), developed to automate the

measurement and evaluation of functional suitability

in AI systems. The environment has four

components:

ICSOFT 2025 - 20th International Conference on Software Technologies

256

Figure 1: Functional Suitability Quality Model for AI Systems.

Figure 2: Functional Suitability Assessment Process for AI Systems (Oviedo, et al., 2024).

 Measurement tool: it oversees recording the

information of the AI Systems to obtain the

necessary metrics for the evaluation of the

functional suitability of the AI systems.

 Assessment tool: this tool includes the

implementation of the model for assessing

functional suitability and a program that

performs the necessary calculations to carry out

the assessment based on the results of the

measurement tool.

 Visualisation tool: this is a visualisation

environment where the results are presented.

 Database: where all the information of the

evaluations is stored to be used by the rest of the

tools of the AI systems evaluation environment.

4 EVALUATION ENVIRONMENT

USE CASE

After the development of the evaluation environment,

it was necessary to test its applicability on real AI

systems to ensure that the metrics, properties, and

thresholds defined in the evaluation environment are

appropriate. In addition, the use case allowed for the

identification of improvements or issues that may

exist during the evaluation of an AI system.

To carry out the case study, three AI systems were

available as a starting point, whose characteristics are

described in Table 1.

Table 1: AI Systems for Case Studies.

Project Industry

AI System

Typology

Accuracy

Requirements

Automotive

Strike Location

Automotive fuzzy rules > 60%.

Skills through

videos

- fuzzy rules No value

Predict Total

Power

Consumption

for a Customer

Type

Energy

Distribution

fuzzy rules > 60%.

In the first place, the ‘Skills Assessment through

videos’ system was discarded for the use case given

the large use of computer resources that had to be

Towards Quality Assessment of AI Systems: A Case Study

257

used to process the video and the subjectivity of the

results as it is an assessment of aspects of personality.

The system ‘Automotive Strike Location’ was also

rejected due to the difficulty of obtaining all the

necessary documentation to carry out the evaluation

due to the confidentiality of the project.

Finally, the system ‘Prediction of the total power

consumption for a type of customer’ was selected as

the candidate for the case study given its viability

when carrying out the evaluation.

4.1 Presentation of the AI System

The system Prediction of total power consumption for

a customer type is a deterministic AI system, based

on fuzzy rules. The objective of the system is to

predict, from the S02

load profiles of the customers

of a transformation centre, the level and direction of

the effect on power consumption for each of the

different types of customers (Public lighting, Family

house, Garage, etc.) for a specific day. The

predictions are inferred from the fuzzy rules that

define the AI model, these rules consider the type of

customer and information related to the date such as

meteorological and social aspects. The system has

two functional requirements, one to determine the

level of affectation and the other to determine the

direction of affectation. All AI system characteristics

of the evaluated product are listed in Table 2.

4.2.1 Defining, Designing and Planning the

Evaluation

As this is a use case for validating the functional

suitability environment for AI systems, the first three

activities of the evaluation process were carried out in

parallel and together with the development team of

the AI system used for the evaluation.

Initially, the functional suitability model for AI

systems was presented to the AI system development

team, and each of the measurement elements forming

the model (characteristic, sub-characteristic,

properties, etc.) were explained in detail. Then, a

planning of the activities to be carried out to address

the case study was elaborated.

Hourly load curve incremental values

Table 2: Characteristics of the assessed AI system.

System Type Fuzzy Rule-based Deterministic

Nº Fuzzy Rules 266

Affected

Variables

14 variables, including:

 Season of the year.

 Temperatures: mean, minimum,

maximum.

 Precipitation.

 Hours of sunshine.

 Type of day: If it is a specific

holiday or the day before or after it,

if it is a weekend or if it is a

working day or if there was any

incident.

 Type of customer: Street lighting,

Family house, Garage, Business,

Sporadic, Church, Industrial

uilding.

Nº Requirements 2 Accuracy: >60%

Tests Case 2 Run:

338 runs of each

test case

Subsequently, work was undertaken with the

development team to identify the elements of the AI

system that would be included in the scope of the

evaluation and to ensure that these would allow the

evaluation of the selected AI system for the use case

of the evaluation environment to be addressed. The

elements that were identified to carry out the

evaluation were: the AI system deployed to be able to

be executed, requirements specification, semantic

rules, existing documentation of the application and

test cases.

In addition, in this initial activity, meetings were

also held with the AI system development team to

define the dataset needed for the execution of the test

cases, since several executions of the same test case

must be performed, as established in the evaluation

model (Oviedo, et al., 2024). Specifically, to create

the dataset for the test cases, it was decided that for

each of the 7 types of customers (indicated in Table

2) four dates would be selected at random for each

month of the year, with at least one of these dates

being on a weekend. This provided a representative

dataset that considered different aspects of each of the

variables affecting the AI system, as listed in Table 2.

4.2.2 Execute Evaluation

The next activity, the execution of the evaluation, was

the most important activity of the use case. It started

by recording the requirements specification and the

ICSOFT 2025 - 20th International Conference on Software Technologies

258

specification of the test cases (provided by the AI

System developer) in the measurement tool of the

evaluation environment. The 338 test cases were then

run on the system Prediction of total power

consumption for one type of customer. Once the two

test cases were executed with the different data

defined for the tests, the results of these executions

were recorded.

Table 3: Metric results in the measurement of Functional

Suitability.

METRICS MEASURE 1 MEASURE 2

Nº of Functional

Requirements Specified

2 2

No. of Requirements

Implemented

2 2

No. of Requirements

Tested

2 2

Coverage 34,61% 41,35%

No. of Exact Functional

Requirements

2 2

No. of Improved

Requirements

2 2

Specified Usage Targets 1 1

Correct Usage Targets 1 1

After recording the information in the

measurement tool, the base metrics used to carry out

the evaluation were obtained; these metrics are shown

in Table 3. After analysing the results of this first

measurement, it was found that most of the metrics

obtained adequate results by presenting that the

requirements implemented and tested were equal to

the number of requirements specified and the degree

of accuracy was slightly higher than that established

as requirements for the system, which we should

remember was 60%. However, most of the fuzzy

rules defining the AI model of the evaluated system

had not been covered by the test cases that had been

executed, despite having performed more than 300

executions of each of the test cases with different

data. Therefore, the development team carried out a

new review of the AI model rules to define new data

and to be able to perform a greater number of runs of

the two test cases defined for the AI system in order

to validate aspects defined in the rules that had not

been taken into account in the first measurement; a

total of 338 runs were performed with different data

for each of the test cases, thus achieving a slight

improvement in this second measurement. Table 3

shows the results of both measurements.

At this point, a meeting was held with the system

development team to analyse whether it was possible

to obtain new data to cover a greater number of tests,

given that after the second measurement only 41.35%

of the rules could be tested, given the planning of the

evaluation and the deadlines available to the

development team, it was decided to evaluate the

system with the data from the second measurement

and see what results were obtained. The results of the

second measurement were processed with the

evaluation tool, where the evaluation criteria

specified in the model were applied, which made it

possible to obtain the quality values of the properties,

sub-characteristics and finally the Functional

Suitability. Table 4 shows the quality results

obtained.

Table 4: Results obtained in the assessment of Functional

Suitability.

MEASURING ELEMENT VALUE

PROPERTIES

Completeness of Functional

Implementation

100

Completeness of Functional Tests 0

Functional Accuracy 100

Appropriate Functionality

100

Appropriate Learning

100

SUB-

CARACTERISTICS

Functional Completeness

Functional Correctness

100

Functional Appropriateness

100

Functional Adaptability

100

CARACTERISTIC: Functional Suitability

As can be seen, based on the quality values

achieved in the sub-characteristics, a result of 3 out of

5 has been reached for Functional Suitability.

Although not an optimal result, it is not a bad result

either because 3 out the 4 sub-characteristics have

reached the optimum value (100), that the system was

developed without considering that it was going to be

subjected to a functional suitability evaluation

process and the deadlines available to the AI system

development team.

4.2.3 Conclude the Evaluation

In the last activity of the evaluation process, the

evaluation report was generated, where the results

were collected and presented and delivered to the AI

system developer. The evaluation results were then

reviewed with the AI system developer.

The review of the evaluation results identified the

problem with the AI system in relation to Functional

Suitability, which was revealed by the quality values

obtained in the evaluation of the functional

completeness sub-characteristics and their properties.

The Functional Completeness sub-characteristic

reached an intermediate value due to the result

Towards Quality Assessment of AI Systems: A Case Study

259

obtained in one of its properties, namely Functional

Test Completeness, whose quality level was 0. The

reason for this result is because the defined test plan,

despite having more than 300 runs with different data,

had only managed to cover 41.35% of the semantic

rules defined for the AI system. Therefore, the testing

process to which the AI system has been subjected

does not allow us to ensure that it has been fully

tested.

Then, to conclude the evaluation, an analysis of

the problem encountered was carried out to identify

possible improvements that could be made to the AI

system in the future, where it was found that more

data was needed to be able to carry out a greater

number of test case executions. It was found that the

different temperatures that can occur throughout a

season were not considered. For example, it was

taken into account that in autumn and spring the days

would be mild and it has been found that there are

rules for these seasons where the temperature is cold

or hot.

This improvement was shared with the company

that owns the AI system, which found it interesting

and committed to carry out an in-depth analysis of the

semantic rules to improve its testing process for

future developments. However, in this case the

company decided not to address in the short term the

definition of new datasets to increase the number of

executions of the defined test cases mainly for two

reasons:

 The users of the AI system were satisfied with

the results obtained by the system and therefore

did not consider it necessary.

 They did not have enough staff and time to

tackle these tasks.

Finally, a review of the results of the case study

and all the general experience derived from it was

carried out to draw the conclusions presented in the

following section.

5 CONCLUSIONS AND FUTURE

WORK

The rise of Artificial Intelligence in all facets of

everyday life makes it necessary to employ software

engineering techniques, such as quality assessment of

these systems. While traditional software engineering

methods and tools are a starting point for AI systems,

they must be adapted to the particularities of these

systems. Therefore, this article presents a use case of

an evaluation of an AI system using a functional

suitability assessment environment, which has been

developed considering the particularities of these

systems.

Through this case study it has been possible to

verify the applicability of the evaluation environment

developed to evaluate the functional suitability in AI

systems, specifically in deterministic AI systems

based on rules. It has also allowed us to verify the

permissibility of the evaluation environment, and the

need to review and establish more restrictive

thresholds that prevent good levels of functional

suitability (3 out of 5) from being reached with values

of Property to 0.

On the other hand, the use case has allowed us to

detect problems related to the use of the evaluation

environment. Among the problems detected is the

fact that the organisations of these systems do not

have a formal specification of the requirements and

test cases built to test the AI system, due to the

immaturity in the application of software engineering

processes in the development of AI systems. Another

difficulty encountered in addressing the assessment

of functional suitability in AI systems is the large

number of executions that must be performed on the

defined test cases. This requires a set of input data to

ensure that the rules defined for the AI system are

covered with a sufficient degree of confidence.

Another drawback that has been detected during the

use case is related to the different technologies used

for the development of AI systems, due to the fact that

aspects specific to that technology can make the

evaluation more complex. Specifically, in this case,

where the system evaluated is a deterministic system

based on rules, there have been aspects that have

facilitated the evaluation. On the one hand, defining

the AI model by means of rules allows us to know the

degree of coverage of the tests carried out and, on the

other hand, as the results of the system do not change

as long as the set of rules of the AI model is not

modified, it has facilitated the evaluation of the

subcharacteristics of functional adaptability of the

system, due to the fact that it has only been necessary

to carry out one execution of the set of test cases

defined for the system, as there were not going to be

any changes in the results. However, these aspects

should be considered for future evaluations where the

AI systems are not defined by rules, as a greater

number of test case executions will be necessary,

which could be a possible cause of deviation from the

evaluation planning.

With the completion of this study and the

conclusions reached, the door is open to carry out

other use cases with AI systems with different

characteristics, with the aim of adapting and

ICSOFT 2025 - 20th International Conference on Software Technologies

260

improving the functional suitability assessment

environment for AI systems.

ACKNOWLEDGEMENTS

This research has been supported by the ADAGIO

project (Alarcos' DAta Governance framework and

systems generation) financed by JCCM Consejería de

Educación, Cultura y Deportes, and FEDER funds

(SBPLY/21/180501/000061), the CASIA project:

Calidad de Sistemas de Inteligencia Artificial (EXP.

13/23/IN/002) funded by the Junta de Comunidades

de Castilla La Mancha, the AIMM project: Artificial

Intelligent Maturity Model (EXP. 13/24/IN/057)

funded by the Junta de Comunidades de Castilla La

Mancha and the UNION project within the

framework of the UCLM Research Plan, 85% co-

financed by the European Regional Development

Fund (ERDF) UNION (2022-GRIN-34110).

REFERENCES

Borg, M., 2021. The AIQ meta-testbed: pragmatically

bridging academic AI testing and industrial Q needs.

Vienna, Springer, pp. 66-77.

Calero, C., Moraga, M. & Piattini, M., 2021. Software

Sustainability. 1 ed. Cham: Springer.

Comisión Europea, 2020. LIBRO BLANCO sobre la

Inteligencia Artificial. Un enfoque europeo orientado a

la excelencia y confianza, Bruselas: COM.

Felderer, M. & Ramler, R., 2021. Quality Assurance for AI-

Base Systems: Overview and Challeges (Introduccion

to Interactive Session).. Viena, Springer, pp. 33-42.

Gao, J., Tao, C., Jie, D. & Lu, S., 2019. What is AI software

testing? and Why.. San Francisco, IEEE.

Hamada, K. I. F. M. S. M. M. et al., 2020. Guidelines for

Quality Assurance of Machine Learning-based

Artificial Intelligence.. Pittsburgh, USA, Word

Scientific, pp. 335 -341.

Horneman, A., Melligener, A. & Ozkava, I., 2019. AI

Engineering: 11 Foundational Practises.

Recommendations for decision makers from expert in

software engineering, cibersecurity, and applied

artificial intelligence, Pittsburgh: Carnegie Mellon

University Software Engineering Institute.

ISO/IEC, 2011. ISO/IEC 25010:2011. Ginebra:

International Organization for Standardization.

ISO/IEC, 2023. ISO/IEC 25040:2023. Ginebra:

International Organization for Standardization.

ISO/IEC, 2023. ISO/IEC 25059:2023. Ginebra:

International Organization for Standardization.

ISO/IEC, 2023. ISO/IEC 5338:2023. Ginebra: International

Organization for Standardization.

Lavazza, L. & Morasca, S., 2021. Understanding and

Modeling AI-Intensive System Development. Madrid,

IEEE, pp. 55-61.

Márquez, R., Rodríguez, M., Romero, F. P. & Piattini, M.,

2024. An Artificial Intelligence Maturity Assessment

Framework Based on Interantional Standard. Enviado a

Engineering Applications of Artificial Intelligence.

Maslej, N. et al., 2024. Intelligence Index Report 2024,

Stanford: Stanford University.

Navas, R., 2016. Modelos de Calidad para servicios en el

Cloud. Valencia: Universidad Politécnica de Valencia.

Oviedo, J., 2024. CSIA: Evaluación de la Adecuación

Funcional de Sistemas de Inteligencia Artificial,

Ciudad Real: Grupo Alarcos UCLM.

Oviedo, J., Rodríguez, M. & Piattini, M. 2024. An

Environment for the Assessment of the Functional

Suitability of AI Systems. Pisa, Springer, pp. 21-34.

Oviedo, J., Rodríguez, M. & Piattini, M., 2024. Entorno

para la Evaluación de la Adecuación Funcional en

Sistemas IA. Curitiba (Brasil), Curran Associates, pp.

364-371.

Oviedo, J. y otros, 2024. ISO/IEC quality standards for AI

engineering. Computer Science Review, 54(1), p.

100681.

Piattini, M., 2024. Gobierno, Gestión y Calidad de la

Inteligencia Artificial. 1 ed. Puertollano (Ciudad Real):

Grupo Alarcos.

Rahman, A., 2019. Quality Consideration for e-Learing

System based on ISO/IEC 25000 Quality Standards.

Putrajaya (Malasia), DBLP Computer Science

Bibliography, pp. 12-17.

Rodríguez, M., Oviedo, J. & Piattini, M., 2016. Evaluation

of Software Product Functional Suitability. A Case

Study. Software Quality Professional, 18(3), pp. 18-29.

Serban, A., Koen, v. d. B., Holger, H. H. & Visser, J., 2020.

Adoption and Effects of Software Engineering Best

Practise in Machine Learning. Bari (Italia), Computer

Science, Engineering.

van Oort, B., Cruz, L., Aniche, M. & van Deursen, A., 2021.

The Prevalence of Code Smells in Machine Learning

projects. Madrid, IEEE, pp. 35-42.

Towards Quality Assessment of AI Systems: A Case Study

261