Perception and Acceptance of an Autonomous Refactoring Bot

Marvin Wyrich¹ᵃ, Regina Hebig²ᵇ, Stefan Wagner¹ᶜ and Riccardo Scandariato²ᵈ

¹ University of Stuttgart, Germany
² Chalmers, University of Gothenburg, Sweden

ᵃ https://orcid.org/0000-0001-8506-3294
ᵇ https://orcid.org/0000-0002-1459-2081
ᶜ https://orcid.org/0000-0002-5256-8429
ᵈ https://orcid.org/0000-0003-3591-7671
Keywords:
Software Bot, Refactoring, Human Agent Interaction, Collaborative Development, Software Engineering.
Abstract:
The use of autonomous bots for automatic support in software development tasks is increasing. In the past,
however, they were not always perceived positively and sometimes experienced a negative bias compared to
their human counterparts. We conducted a qualitative study in which we deployed an autonomous refactoring
bot for 41 days in a student software development project. In between and at the end, we conducted semi-
structured interviews to find out how developers perceive the bot and whether they are more or less critical
when reviewing the contributions of a bot compared to human contributions. Our findings show that the bot
was perceived as a useful and unobtrusive contributor, and developers were no more critical of it than of
their human colleagues, but only a few team members felt responsible for the bot.
1 INTRODUCTION
Refactoring has been defined as “the process of
changing a software system in such a way that it does
not alter the external behavior of the code yet im-
proves its internal structure” (Fowler, 1999, p. xvi). It
is essential to continuously go through this process to
improve the quality and maintainability of the source
code, thereby increasing the productivity of develop-
ers (Moser et al., 2008) and avoiding the accumula-
tion of technical debt in the system (Avgeriou et al.,
2016). Some static code analysis tools identify refactoring opportunities in the form of code smells, that is, program code that functions but is poorly structured (Fowler, 1999). Manually removing these code
smells is error-prone, tedious and sometimes chal-
lenging (Bavota et al., 2012; Kim et al., 2011, 2012).
The effort and associated costs may also become too
high to be justified to a client or project manager,
which prevents developers from having the necessary
resources or organizational support to manually im-
prove the quality of the source code (Yamashita and
Moonen, 2013).
To make the removal of code smells more efficient
and more effective, Wyrich and Bogner (2019) imple-
mented an autonomous bot that automatically refac-
tors code and submits its changes to the development
team for asynchronous review in the form of pull re-
quests. A recent study has shown that such changes, which automatically fix code smells, are generally accepted by developers (Marcilio et al., 2019). How-
ever, pull requests in that study were proposed man-
ually and only a single time after prior consultation
with the project maintainers. Furthermore, we know
that contributions are not only evaluated on their con-
tent, but also on the social characteristics of the con-
tributor (Terrell et al., 2017; Ford et al., 2019). In
the case of contributing bots, identifying them as bots
can be sufficient to observe a negative bias compared
to contributions from humans (Murgia et al., 2016).
We therefore introduced the Refactoring-Bot in a
software development team and had it continuously
contribute refactoring suggestions for 41 days. The
developers knew it was a bot and could interact with
it. In between and at the end, we conducted semi-
structured interviews to answer the following research
questions:
RQ1: How do developers perceive the participa-
tion of a refactoring bot in their project?
RQ2: Are developers more or less critical when
reviewing the contributions of a refactoring bot
compared to human contributions?
RQ3: How do developers think an autonomous
refactoring bot should ideally be designed?
2 RELATED WORK
Wyrich and Bogner (2019) describe the Refactoring-
Bot as “an autonomous bot that integrates into the
team like a human developer via the existing ver-
sion control platform”. It currently supports a hand-
ful of refactoring operations to eliminate code smells
reported by the static code analysis tool SonarQube.
These operations are removing unused method pa-
rameters, unused private fields and commented-out
code, correcting the wrong order of modifiers, adding
missing @Override annotations and immediately return-
ing an expression instead of assigning it to a new vari-
able. It is also possible to interact with the bot via
comments in its pull requests, for example to instruct
further refactoring operations. While the authors carefully describe their design decisions and discuss potential success factors for acceptance among developers, an evaluation of the bot is still missing.
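To make these operations concrete, consider the following small Java sketch. It is purely illustrative (the class and identifiers are invented, not taken from the bot's codebase) and shows two of the listed operations: adding a missing @Override annotation and immediately returning an expression instead of assigning it to a temporary variable, the same operation as in the pull request in Figure 1.

    // Hypothetical example; identifiers are invented for illustration.
    // Before: two of the smells the bot targets.
    class Order {
        private final double net;

        Order(double net) {
            this.net = net;
        }

        public String toString() {                 // smell: missing @Override
            String result = "Order(" + net + ")";  // smell: assign, then return
            return result;
        }
    }

    // After the bot's refactorings: the annotation is added, and the
    // expression is returned immediately instead of first being assigned
    // to the temporary variable 'result'.
    class OrderRefactored {
        private final double net;

        OrderRefactored(double net) {
            this.net = net;
        }

        @Override
        public String toString() {
            return "Order(" + net + ")";
        }
    }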
Wessel et al. (2018) analyzed 351 open-source projects and found that 93 (26%) use bots that
complement other developers’ work. The authors in-
terviewed project maintainers to investigate, among
other things, how contributors and integrators per-
ceive bots during the pull request submission pro-
cess. Most respondents perceived bots as helpful for
most of the tasks and more than 90% of them high-
lighted the relevance of quality assurance tasks. How-
ever, among 14 identified bots for code or pull request
review there was no bot that automatically provides
refactoring suggestions.
Spence et al. (2014) conducted an experiment in which participants were told either that they were going to communicate with a human or that their interaction partner would be a robot. They then mea-
sured the uncertainty about the upcoming interaction,
anticipated interpersonal liking, and social presence.
Spence et al. (2014) found that participants who be-
lieved they were to communicate with another person
had higher expectations of liking, lower levels of un-
certainty, and higher expectations of social presence
than those who believed they were to communicate
with a robot.
Related to the liking of bots compared to humans, Murgia et al. (2016) found in an experiment with developers on Stack Overflow that an answer bot is perceived significantly more negatively when it reveals its identity as a bot: developers rated the answers of a supposed human more positively than the identical answers of the self-identified bot. This attitude towards bots could also have a de-
cisive influence in our evaluation study. We address
this in particular with RQ2.
Marcilio et al. (2019) evaluated the acceptance
of automatic refactoring proposals. They fixed code
smells in 12 projects and proposed 920 fixes in 38
pull requests. 84% of the pull requests were ac-
cepted, 95% of them without modifications. The code
changes were performed automatically, but they were proposed manually, within a short period of time, and only to projects that had previously expressed interest. The maintainers' reviews and correction requests were also answered manually. This differs from the continuous work of an autonomous bot, which submits such code changes on its own and is limited in its interaction possibilities.
3 METHODS
To answer the research questions, we had the Refactoring-Bot work in a student software development project over a period of 41 days without manual intervention by the authors. After 11 days,
intermediate interviews were conducted with the de-
velopers who had interacted with the bot during that
time, and based on the interview results, we modified
the parameters of the Refactoring-Bot for the remain-
ing 30 days. After 41 days the final interviews with
all project participants took place.
3.1 Participants and Project
The team consisted of 11 bachelor students of Soft-
ware Engineering, of whom eight were male and three
were female. As part of their studies, they must par-
ticipate in a six-month joint software project. In this
case, they had to further develop an existing code base
with about 18k SLOC. During the first two months the
students were introduced to the project and introduced
each other to technologies and methods. This was fol-
lowed by the development period, during which the
bot was also introduced. Students were in the fourth
or fifth semester of their studies and had already de-
veloped software together in a smaller team in the
past.
The backend of the software to be developed was
written in Java and the code was hosted as a pri-
vate GitLab project. We use the term “pull request”
throughout the paper, meaning suggestions for code
changes, although GitLab’s terminology uses the term
“merge request”. The functionality is the same.
3.2 Research Design
At the beginning of the six-month project, we asked
the students if they would agree to the introduction
of a refactoring bot to the team at a later date as part
of a study. About two months later, we informed the
team that the bot would from now on help to improve
the code quality and that we would ask them for their
feedback later.
The first phase began, in which we had the bot
create exactly one pull request per day at 3pm. The
bot also checked every minute to see if there were
any new comments on its pull requests that it had
to respond to. After 11 days, we conducted semi-
structured interviews with the only two participants
who had interacted with the bot up to that point, to capture their first impressions and to collect optimization suggestions for the second and
longer phase of the study.
The findings from the 45-minute interim interview with both participants gave us initial answers to the research questions, and a few of the suggestions for improvement made by the interviewees could be implemented immediately before the start of the second phase. From then on, the bot created a pull request every half hour on Tuesdays and Thursdays between 9am and 6pm, provided that fewer than four of its pull requests were open at the same time. We chose this timing because the team meetings always took place on Tuesdays and Thursdays, so we assumed that all team members would be on-site and could better focus on the pull requests during this period.
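The phase-two rule amounts to a condition checked every half hour. The following Java sketch is hypothetical (class and method names are ours, not the bot's actual implementation) and merely encodes the parameters described above:

    import java.time.DayOfWeek;
    import java.time.LocalDateTime;

    // Hypothetical sketch of the phase-two scheduling rule.
    class PhaseTwoSchedule {
        private static final int MAX_OPEN_BOT_PULL_REQUESTS = 4; // limit from the study

        // Called every half hour; returns true if a new pull request may be created.
        static boolean mayCreatePullRequest(LocalDateTime now, int openBotPullRequests) {
            DayOfWeek day = now.getDayOfWeek();
            boolean meetingDay = day == DayOfWeek.TUESDAY || day == DayOfWeek.THURSDAY;
            boolean withinHours = now.getHour() >= 9 && now.getHour() < 18; // 9am to 6pm
            return meetingDay && withinHours
                    && openBotPullRequests < MAX_OPEN_BOT_PULL_REQUESTS;
        }
    }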
After the first phase, we also slightly changed the
description of the pull requests to include a link to
the list of available commands and interaction op-
tions. This was requested by the participants in the
interviews. An example pull request description cre-
ated by the bot is shown in Figure 1. The team was
informed about all changes before the second phase
started.
After 41 days the operation of the bot ended and
we conducted semi-structured interviews again. This
time nine people took part, even though not all of them had demonstrably reviewed a pull request.
However, to evaluate the acceptance of the bot, it is
equally important to discuss the attitude and experi-
ence of those who consciously or unconsciously did
not interact with the bot. On average, the interviews
lasted 20 minutes.
3.3 Interview Guideline
We conducted the interim interview and the final in-
terviews following the same procedure. Participants
were invited and informed about the background of
the interview and then had the choice to sign a con-
sent form or not to participate. The interviews were
recorded, then transcribed, manually analysed, trans-
lated into English and summarised with regard to an-
swering the research questions.
During the interview process, we were guided by
an interview guideline. To avoid interrupting the flow of the conversation, we adapted the order of the questions to
the respective answers. Occasionally the interviewees
also came back to previously asked questions at a later
time and added something to their comments.
The contents of the interview guidelines only dif-
fered slightly between the interim interview and the
final interviews and both served to answer the three
research questions. While the interim interview served to gain a first impression and to start the second phase with small, immediately implementable optimizations, the final interviews specifically addressed these changes. The interview
guidelines were structured as follows:
The first question encouraged the interviewees to
reflect openly on their interaction with the bot. A sec-
ond question followed and asked directly how the de-
velopers felt about the bot’s behaviour and its pull
requests in terms of quality, usefulness, frequency
and timing. In the final interviews, we mentioned
as a third point that we had experimented with frequency and timing during the study, and we asked how the participants perceived this and which variant
they would prefer. We also pointed out the different
observations in the two phases (see section 4.2) and
asked the participants for their opinion. The participants' answers to these three questions provided most of the insights for answering RQ1. The
fourth question served to answer RQ2 and aimed at
how the participants perceived the pull requests of
the bot compared to those of human developers and
whether they experienced a difference in trust, for ex-
ample. The fifth and last question was derived from RQ3 and asked whether the interviewee wanted to use the bot in the future and, if so, what the ideal refactoring bot in their future team would look like.
4 RESULTS
In the following we describe the results of our study,
grouped by each research question.
4.1 Perception of the Bot (RQ1)
“The problem with our IDE is that it points to
bad things, but some developers don’t want to
fix those things or forget about it or are sim-
ply too lazy. And the bot would always do it.
That’s a real benefit.” – P02
Figure 1: Description of a pull request proposed by the Refactoring-Bot, Samantha Bo. The merge request is titled "Immediately return the expression assigned to variable 'result'." and its description reads: "Hi, I'm a refactoring bot. I found and fixed a code smell and think that it improves the readability of the code. If something needs to be changed, you can commit to this merge request or tell me what to change in a line specific comment inside the 'Changes' tab. Here is a list of commands I can understand. Do not forget to tag me (using @) inside the comment. If you are okay with the changes you can accept the merge request as usual."
The team members perceived the bot and its refactor-
ing suggestions as useful and repeatedly stated that it
offered additional value because it tirelessly improved
code quality, which a developer may not be able to
do because other things seem to be more important.
The bot would break down a big block of refactoring work into occasional suggestions, so that the team could work on new functionality at the same time. The bot was also described as unobtrusive.
One participant was amused that the first pull re-
quest of the bot changed code that the team had never
touched. Otherwise, the quality of the suggestions
was found to be good and correct.
Overall, the team rarely communicated with the bot via comments in pull requests. However, one participant negatively noted the limited interaction possibilities, because he expected the Refactoring-Bot to have the simple communication skills he knew from chat bots. Indeed, from the way the bot describes itself, one could assume that it is possible to talk to it in the same way as with an intelligent virtual assistant. Another par-
ticipant wished that one could ask the bot why ex-
actly it made a particular change as part of its refac-
toring. For example, it could communicate that it had
to change method calls as a result of removing an un-
used parameter from a signature. One could also bet-
ter understand in larger refactorings why something
was implemented in a certain way and perhaps even
learn something.
Although we did not ask for it, two interviewees
commented that they liked the bot's name and profile
picture.
RQ1. The bot was perceived as a useful and unob-
trusive contributor. Interaction possibilities should be extended to allow better understanding of specific changes, and developers expected simple chatbot functionalities.
4.2 Acceptance of the Bot (RQ2)
“People also make mistakes sometimes, and
we have proven this often enough.” – P04
Before we answer the question of whether develop-
ers are more or less critical about the bot’s contribu-
tions than those of humans, we investigate the overall
acceptance rate of its pull requests. Figure 2 shows
that a total of 14 pull requests were created, of which
eleven were created in the first phase and three in the
second phase, which started on day 16. Ten pull re-
quests were accepted (71.4%) and four were still open
at the end of the study period (28.6%). In two of them,
the bot removed an unused method parameter that ac-
tually should have been used in the method body. The
bot unknowingly pointed out a defect to the develop-
ers that could not be corrected in the short term. They
therefore intentionally left the pull requests open as a
reminder.
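To illustrate the situation with a hypothetical Java fragment (the class, method and parameter names are invented): the parameter is syntactically unused, so the bot proposes removing it, but the method body shows that it should have been applied.

    class PriceCalculator {
        // Hypothetical illustration of the defect behind the two open pull
        // requests: 'discount' is never read, so the bot proposes removing it,
        // even though the method was actually supposed to apply it.
        double finalPrice(double basePrice, double discount) {
            return basePrice; // bug: should be basePrice * (1 - discount)
        }
    }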
As described in 3.2, during the first eleven days
one pull request was created each day. After that, pull
requests were created every half hour on Tuesdays
and Thursdays, as long as fewer than four were open
at the same time. While on average two accepted pull
requests per week and no rejections indicate a gen-
eral acceptance of the Refactoring-Bot, the changes after the first phase to the frequency of suggestions and the limit on simultaneously open ones led to a noticeable decrease in the number of processed pull requests. As a reminder, these changes were im-
plemented based on the responses of the team mem-
bers from the interim interviews. When we addressed
these changes in the final interviews, we could ob-
serve different reactions. Two of the participants did
not notice the changes at all. Others liked the idea of proposing pull requests when developers are on-site, and the idea of having a limit, since developers could more easily motivate themselves to work through only a few open pull requests and it would be easier to keep up with the reviews. However, the majority of respondents did not consider this change useful in retrospect. Even the participants of the interim interview, who had explicitly expressed their preference for limiting the bot, changed their views in favour of optimizing the bot's effectiveness: if the bot had much to improve, it should also make many suggestions for improvement. And as soon as many pull requests accumulated in the list, the team would want someone to take care of them so that their own pull requests remain easy to find.

Figure 2: Pull requests created by the bot during the 41-day study period (horizontal axis: study day). Two solid vertical lines mark the end of the first phase and the beginning of the second phase. Each green bar represents the time frame from the creation of a pull request to its acceptance. No pull requests were rejected. Yellow bars represent pull requests that were neither accepted nor rejected by the end of the study period. Grey circles stand for times when the bot was prevented from creating a new pull request by the limit of four simultaneously open ones.
The change in parameters revealed the underlying
problem that only a few team members felt responsi-
ble for reviewing the bot’s pull requests. In the inter-
views, we learned that it was also difficult for human
developers to find reviewers for their own changes.
Completing one’s own tasks was a higher priority for
team members for various reasons. The bot simply
had the disadvantage of not being able to directly con-
tact the other team members and persuade them to re-
view its changes, just as the other team members did
for their own pull requests. In addition, some team
members were inexperienced with the version con-
trol system and did not dare to merge into the master
branch because they were afraid to break something.
In the end, however, most of the bot’s pull requests
were accepted. We asked specifically whether
it made a difference for the developers to review the
changes of a bot compared to those of a human, for
example in terms of trust. The interviewees replied
that, in general, it makes no difference whether the
pull request comes from a bot or a human and that this
attitude has not changed over time. The bot would
suggest simple changes that the team could be sure
would not affect functionality. And both human and bot could sometimes make mistakes.
All participants felt that it was more important to look at the content of a pull request and to review it thoroughly. However, some interviewees
also noted that their way of approaching the review
would depend on their previous experience with the
person or bot who proposed the changes. “There are
people where I look at the functionality and do not
have to worry about code quality. But then there are
some people for which I have to have a closer look at
their pull requests, because of my recent experience
with their work” – P07. The same would apply to
the Refactoring-Bot. Furthermore, the changes of a
bot would be reviewed with a different focus. They
would know the contributor was a bot that, in turn,
would not know what the original code was for.
Therefore, they primarily checked the code changes
for logical errors. The only danger they saw was that
the bot might suggest simple refactorings for too long
and at some point propose something riskier that
would then not be reviewed sufficiently well.
RQ2. Ten out of 14 pull requests were accepted,
four were still open at the end of the study pe-
riod. Only a few people felt responsible for reviewing
pull requests. The bot was disadvantaged because
it could not approach individual team members di-
rectly and ask them for a review. However, the par-
ticipants were neither more nor less critical about
the bot than about their human teammates. The pre-
vious experience with the contributor was decisive
for the thoroughness of the review, and the bot's changes were mostly examined for logical errors.
4.3 The Ideal Refactoring Bot (RQ3)
“Later, in professional life, I am sure someone
will take care of it.” – P08
All interviewees explicitly responded that they would
like to use the bot in a future project. The answers to the question of what the ideal refactoring bot would look like can be summarized in three points.
First, the frequency of pull requests should be
carefully evaluated. Some considered a fixed limit of open pull requests to be useful, but would probably have set it higher than four and relative to the number of team members. Others generally argued against a limit, since the changes were small and the code quality could only be improved effectively without one.
Second, there was the request that the changes of the bot should be grouped in a meaningful way, so that fewer but bigger pull requests are created. At the beginning, smaller pull requests were good for getting used to the bot. Later, however, grouping changes would have the advantage that fewer refactoring commits appear in the history and that it would be more worthwhile to test the code of a pull request before merging it into the master branch.
By far the most frequently raised point was that
one or more people within the team should be made
responsible for reviewing the bot’s pull requests in
a coordinated way. Respondents could also imagine
the bot itself assigning someone to review, ideally the developer who is familiar with that part of the code or even the one who introduced the code smell. In any case,
it would be important to find a responsible person,
and some of the team members also think that this
would be easier in a more professional environment.
RQ3. The participants of the study were all in
favour of using the bot in a future project. How often the bot should make suggestions remained controversial. Participants considered it appropriate to group
changes into fewer and slightly larger pull requests,
which again would affect the preferred frequency of
the bot’s suggestions. They agreed that the success-
ful operation of the bot also required someone to
feel responsible for reviewing the pull requests.
5 DISCUSSION
In the following, we compare our results with those
from existing literature, describe the limitations of our
study and conclude the section with implications that
raise interesting questions for future work.
5.1 Results
We saw that the bot was accepted by the developers,
and they explicitly stated that they were neither more
nor less critical in reviewing the contributions of the
bot compared to those of humans. This is contrary to
the results of Murgia et al. (2016), who had observed
a clear negative bias against their answer bot on Stack
Overflow.
The approval of 71.4% of the proposed pull requests, with none rejected, confirms the general acceptance and is in line with the re-
sults of Marcilio et al. (2019) that such code changes
are accepted by developers.
Social aspects of the contributor played a role in
recent studies (e.g., Terrell et al., 2017; Ford et al., 2019) and we can confirm that there are at least a
few consciously perceived differences from the de-
veloper’s point of view as to whether the contrib-
utor is human or bot. In the latter case, the pro-
posed changes were mostly reviewed for logic errors.
Furthermore, two participants commented unprompted
that they liked the profile picture and the name of the
bot, which confirms the findings to the extent that so-
cial aspects are actually perceived. We also found that
previous experience with the bot plays a role in evalu-
ating its code changes. According to the participants,
however, this is no different with humans.
Finally, we have seen that there is potential for im-
provement in the implementation of the bot. Wessel
et al. (2018) analyzed the usage of bots in open source
projects. Most often, respondents in their study proposed making bots smarter and enhancing user inter-
action. Additionally, they mentioned the lack of in-
formation on how to interact with them. This is con-
sistent with our findings that developers were unsure
how to communicate with the bot and expected it to
have at least the capabilities of a simple chat bot. Moreover, the most common suggestion to improve bots in their study from an integrator perspective was
that notification and awareness should be improved.
This was necessary, for example, to remind develop-
ers of unresolved issues.
Improving awareness, especially the sense of re-
sponsibility for the bot’s pull requests, might be one
of the most important insights for the successful use
of the Refactoring-Bot. As long as manual interven-
tion by developers is necessary and the intrinsic mo-
tivation of individuals is not high enough, this poses
a threat to the acceptance of such maintenance bots.
One explanation could be the diffusion of responsibil-
ity, a sociopsychological phenomenon in which a per-
son may feel less responsible for actions or inactivity
when others are present (Kassin et al., 2019).
In our scenario, all participants found the bot use-
ful and understood its value. Because everyone could
have reviewed the pull requests, few people might
have felt responsible for doing so. Of the nine team mem-
bers interviewed, six stated that they relied on a few
specific people because they had also reviewed most
of the pull requests of the other team members.
5.2 Limitations
The results of this study should be seen in the light of
some limitations. First, the participants in our study
were relatively inexperienced in developing software
when compared to professional software developers.
This was particularly apparent from responses indicating that some participants were uncertain about processing pull requests. In addition, the low sense of responsibility for reviewing pull requests could be a limitation generally associated with studies involving students.
This might be different in a team of professionals.
Second, the development time of a software project
is typically longer than the time the bot was deployed
in our study. We were limited by the length of the
development phase in the student project and would
have preferred to deploy the bot for a longer period of
time. The long-term use of the bot needs to be inves-
tigated in future studies, as we do not know whether
perception and acceptance will improve or deterio-
rate over time and which design-critical aspects result
from the long-term use.
In addition, the limited duration of the study did
not allow us to wait for the open pull requests from
the first phase to be processed. As a result, four pull
requests were already open at the beginning of the
second phase, thus limiting the comparability of the
two phases. Since we mainly made general statements
about the perception and acceptance of the bot over
the entire period, we do not consider the core find-
ings to be affected by this. On the contrary, we only
came to an essential understanding of responsibility
issues because these pull requests from the first phase
limited the further work of the bot.
The choice of the bot used in the study and its
current state of implementation also have limitations.
The Refactoring-Bot currently only supports compar-
atively simple refactorings. The review behavior as
well as the requirements for frequency and size of
pull requests can change with increasing complexity
of the refactorings. However, a look at the code smells
reported by static code analysis tools such as Sonar-
Qube shows that most of them require rather uncom-
plicated code improvements.
In general, the choice of a refactoring bot and
the context in which it was operated could be a rea-
son for some of the observed differences with related
work, for example with that of Murgia et al. (2016),
in which an answer bot was evaluated on Stack Over-
flow, where the participating developers could as-
sume interaction only with other human developers.
This deviation from expectation could then have led
to the negative bias observed in the study (Spence
et al., 2014). In contrast, the participation of bots
in software projects is becoming increasingly com-
mon (Wessel et al., 2018) and may therefore generate
greater acceptance, as was the case in our study.
5.3 Implications
Developers May Accept Bots – Even if They Are
Identified as Such
We saw that the bot in our study was accepted by the
developers, even to the extent that the review of
the bot’s contributions was no more critical than that
of human developers. This should create confidence
in the bot community that the automation of software
engineering tasks via a bot interface generally has the
potential to be accepted by developers. Since this has
been different in previous studies in other contexts,
the impact of a bot’s operating context on its design
and acceptance should be investigated more closely.
Bots Need to Be Process-sensitive
In addition to context-dependent design, the question
arose as to how process-sensitive a bot must be. In
this case, the bot did not conform to the team's process, in which the contributor of changes must find a reviewer themselves. This could easily
be resolved by automatically assigning a suitable re-
viewer to the bot’s pull requests. However, we do not
know if this would actually be enough, or if the un-
derlying problem is that few people feel responsible
for dealing with the bot. The smarter advice might be: when designing a demonstrably useful bot whose effectiveness depends on human intervention, make it difficult to ignore.
Bots Need to Be Smart
Finally, bots must be intelligent enough to provide de-
velopers with a satisfying experience of interaction.
This includes describing exactly how to interact with
them and providing a basic repertoire of chat bot
functionalities that are common in many applications
nowadays. The bot should also be able to explain its
activities upon request. In our case, the requirement
arose to be able to explain individual code changes of
a refactoring.
6 CONCLUSION
A refactoring bot was developed in the past to over-
come the drawbacks of removing code smells manu-
ally and to make the process more efficient and effec-
tive. As similar bots did not always meet with high
acceptance in the context of software engineering and
since we have little qualitative knowledge about the
perception and acceptance of development bots in
general, we conducted a qualitative study deploying
a refactoring bot to investigate how it is perceived and
accepted in a software development team.
In semi-structured interviews we found that the
bot is perceived as a useful and unobtrusive contribu-
tor. Its contributions are not reviewed more critically
than those of human developers but are more inten-
sively analyzed for logical errors. Although all team
members intend to continue using the bot, only a few
felt responsible for it during the study.
The results have implications for the design of
bots to ensure their successful use. Even a useful bot
will be ignored if its perceived benefit for the indi-
vidual is not great enough and there is a diffusion of
responsibility in the team. Developers accept bots,
but they must be smart and adapted to the context and
process of the development team.
Future work could explore which context vari-
ables predict the acceptance of a bot and investigate
more bots qualitatively. Their long-term use should
be investigated along with how the developers’ per-
ception changes over time. In the context of refac-
toring bots, the effects of proposing more complex
and diverse refactorings on the behavior of developers
should be studied. The findings would also be of in-
terest to other maintenance bots which regularly pro-
pose very similar changes, possibly leading to a kind
of review blindness. Finally, the results can be un-
derstood as an indication that people appreciated that
a socially unpleasant task was taken over by the bot,
which was to remind people that code quality needs
to be improved. It would be interesting to further ex-
plore the potential of this type of bot contribution.
REFERENCES
Avgeriou, P., Kruchten, P., Ozkaya, I., and Seaman, C. (2016). Managing Technical Debt in Software
Engineering. Dagstuhl Reports, 6(4):110–138.
Bavota, G., De Carluccio, B., De Lucia, A., Di Penta, M.,
Oliveto, R., and Strollo, O. (2012). When Does a
Refactoring Induce Bugs? An Empirical Study. In
2012 IEEE 12th International Working Conference on
Source Code Analysis and Manipulation, pages 104–
113. IEEE.
Ford, D., Behroozi, M., Serebrenik, A., and Parnin, C.
(2019). Beyond the code itself: How programmers
really look at pull requests. In Proceedings of the
41st International Conference on Software Engineer-
ing: Software Engineering in Society, ICSE-SEIS ’19,
pages 51–60, Piscataway, NJ, USA. IEEE Press.
Fowler, M. (1999). Refactoring: Improving the Design of
Existing Code. Addison-Wesley Reading.
Kassin, S., Fein, S., Markus, H. R., McBain, K. A., and
Williams, L. (2019). Social Psychology Australian &
New Zealand Edition. Cengage AU.
Kim, M., Cai, D., and Kim, S. (2011). An empirical inves-
tigation into the role of API-level refactorings during
software evolution. In Proceeding of the 33rd interna-
tional conference on Software engineering - ICSE ’11,
page 151, New York, New York, USA. ACM Press.
Kim, M., Zimmermann, T., and Nagappan, N. (2012). A
field study of refactoring challenges and benefits. In
Proceedings of the ACM SIGSOFT 20th International
Symposium on the Foundations of Software Engineer-
ing - FSE ’12, volume 3, page 1, New York, New
York, USA. ACM Press.
Marcilio, D., Furia, C. A., Bonifácio, R., and Pinto, G. (2019). Automatically generating fix suggestions in response to static code analysis warnings. In 2019 19th International Working Conference on Source Code Analysis and Manipulation (SCAM). IEEE.
Moser, R., Abrahamsson, P., Pedrycz, W., Sillitti, A., and
Succi, G. (2008). A Case Study on the Impact of
Refactoring on Quality and Productivity in an Agile
Team. In Balancing Agility and Formalism in Soft-
ware Engineering, pages 252–266. Springer Berlin
Heidelberg.
Murgia, A., Janssens, D., Demeyer, S., and Vasilescu, B.
(2016). Among the Machines: Human-Bot Interac-
tion on Social Q&A Websites. In Proceedings of the
2016 CHI Conference Extended Abstracts on Human
Factors in Computing Systems - CHI EA ’16, pages
1272–1279, New York, New York, USA. ACM Press.
Spence, P. R., Westerman, D., Edwards, C., and Edwards,
A. (2014). Welcoming our robot overlords: Initial ex-
pectations about interaction with a robot. Communi-
cation Research Reports, 31(3):272–280.
Terrell, J., Kofink, A., Middleton, J., Rainear, C., Murphy-
Hill, E., Parnin, C., and Stallings, J. (2017). Gender
differences and bias in open source: Pull request ac-
ceptance of women versus men. PeerJ Computer Sci-
ence, 3:e111.
Wessel, M., de Souza, B. M., Steinmacher, I., Wiese,
I. S., Polato, I., Chaves, A. P., and Gerosa, M. A.
(2018). The Power of Bots: Understanding Bots in
OSS Projects. Proceedings of the ACM on Human-
Computer Interaction, 2(CSCW):1–19.
Wyrich, M. and Bogner, J. (2019). Towards an autonomous
bot for automatic source code refactoring. In Pro-
ceedings of the 1st International Workshop on Bots in
Software Engineering, BotSE ’19, pages 24–28, Pis-
cataway, NJ, USA. IEEE Press.
Yamashita, A. and Moonen, L. (2013). Do developers
care about code smells? An exploratory survey. In
2013 20th Working Conference on Reverse Engineer-
ing (WCRE), pages 242–251. IEEE.