“Give Me the Code”: Log Analysis of First-Year CS Students’
Interactions with GPT
Pedro Alves (https://orcid.org/0000-0003-4054-0792) and Bruno Pereira Cipriano (https://orcid.org/0000-0002-2017-7511)
Lusófona University, Portugal
{pedro.alves, bcipriano}@ulusofona.pt

Keywords: Large Language Models, Programming, GPT, Interaction Log Analysis.
Abstract:
The impact of Large Language Models (LLMs) in computer science (CS) education is expected to be profound.
Students now have the power to generate code solutions for a wide array of programming assignments. For
first-year students, this may be particularly problematic since the foundational skills are still in development
and an over-reliance on generative AI tools can hinder their ability to grasp essential programming concepts.
This paper analyzes the prompts used by 69 freshman undergraduate students to solve a specific programming
problem within a project assignment, without giving them prior prompt training. We also present the rules
of the exercise that motivated the prompts, designed to foster critical thinking skills during the interaction.
Despite using unsophisticated prompting techniques, our findings suggest that the majority of students suc-
cessfully leveraged GPT, incorporating the suggested solutions into their projects. Additionally, half of the
students demonstrated the ability to exercise judgment in selecting from multiple GPT-generated solutions,
showcasing the development of their critical thinking skills in evaluating AI-generated code.
1 INTRODUCTION
Large Language Models (LLMs) have been shown to
have the capacity to generate computer code from nat-
ural language specifications (Xu et al., 2022; Destefa-
nis et al., 2023). Currently, there are multiple avail-
able LLM-based tools which display that behaviour.
Two examples of such tools are OpenAI's ChatGPT
(https://chatgpt.com) and Google's Gemini
(https://gemini.google.com).
The implications of such tools for CS education
are obvious: students now have access to tools that
can generate code to solve programming assignments,
with the capacity to obtain full marks or close to it
(Prather et al., 2023; Savelka et al., 2023; Cipriano
and Alves, 2024b; Finnie-Ansley et al., 2023; Ouh
et al., 2023; Reeves et al., 2023).
There has been extensive research into how com-
puter science teachers should respond to LLMs,
adapting their teaching methods, assessments, and
more (Lau and Guo, 2023; Leinonen et al., 2023; Lif-
fiton et al., 2023; Finnie-Ansley et al., 2022; Sridhar
et al., 2023). Some resist (fight), contemplating ways
to prevent students from using chatbots like ChatGPT,
such as blocking access or employing detection tools
for AI-generated text with questionable effectiveness
(OpenAI, 2023). Others embrace this new paradigm,
adapting exercises so that students are encouraged to
make the most of LLMs, with presentations/discus-
sions or non-text-based prompts (Denny et al., 2024),
or analyzing the tool’s capacity to help students (Hel-
las et al., 2023).
Regardless of the instructional strategy adopted by
teachers, be it one of resistance or acceptance, stu-
dents will inevitably turn to ChatGPT, perhaps with
a certain degree of naivety, lacking knowledge about
prompting techniques and placing exaggerated faith in
its responses. Moreover, most of their interactions will
occur without direct teacher oversight. With this
study, we want to understand what happens in these
cases: whether they will succeed in their endeavors
or fail miserably, and how teachers can help in their
journey.
We investigate two research questions:
RQ1. Can first-year students take advantage of LLMs
for code generation without any specific or formal
training?
RQ2. Are students able to incorporate solutions pro-
vided by ChatGPT into their projects?
This work makes the following contributions:
- Proposes an exercise template that teachers may
  use to foster their students' critical thinking skills
  when interacting with LLM-based tools;
- Analyzes the interactions of 69 first-year CS stu-
  dents with ChatGPT to produce code using the
  aforementioned template, but without having re-
  ceived any formal training on the matter. This
  analysis is based on log files provided by the stu-
  dents documenting their attempts to solve a spe-
  cific exercise within the context of their course
  project;
- Presents the results of a post-exercise survey
  (N=52) to find out the students' perceptions of its
  usefulness.
2 RELATED WORK
Some recent studies evaluated students’ interactions
with LLMs for code generation. For example,
(Denny et al., 2024) analyzed the interactions be-
tween students and ChatGPT in several dimensions
such as prompt lengths and number of attempts, fol-
lowing a methodology where instead of asking the
students to write the code themselves, they had to
write prompts to generate the code for them. In
(Prather et al., 2023), researchers studied how first-
year students use Copilot (a GPT-based code gener-
ation tool, trained on code publicly available on
GitHub) to solve a typical assignment in an intro-
ductory programming course. They found that these
novice students struggle to understand and use Copi-
lot and are wary of the tool's implications (such as
not being able to understand the generated code),
but are optimistic about integrating the tool into their
future development workflow. Another study (Prasad
et al., 2023) analyzed the GPT interactions of students
in an upper-level course on applied logic and formal
methods, using an IDE plugin. The authors of (Babe
et al., 2023a) asked 80 students with a single semester
of programming background to write prompts for 48
problems and found that the students’ prompts can be
ambiguous to LLMs. Finally, (Kazemitabaar et al.,
2023) analyzed the interaction logs of learners aged
10 to 17, recruited from programming bootcamps,
and verified how the availability of Codex (a GPT-
based LLM) impacted their ability to independently
learn Python. They found evidence of learners' self-
regulation, with some learners actively adding code to
test the AI-generated code, as well as over-reliance,
with some learners submitting the assignment's in-
structions in a single prompt.
Our study differs from these works since all of
them analysed students' interactions using introduc-
tory programming exercises as the basis for the study.
In our study, students had to generate code for a spe-
cific part of a large project (developed over the course
of approximately 3 months), meaning that the GPT-
generated code had to be integrated with the rest of
the students' code, adding an extra layer of com-
plexity.
3 EXERCISE TEMPLATE
This section describes a template for programming
exercises to be solved using Generative AI tools (e.g.,
ChatGPT, Gemini). The structured approach culti-
vates critical thinking (Naumova, 2023) and requires
no prior training in prompt engineering techniques.
The problem can be stated in any form (textual,
diagram, etc.) and the students must interact with an
LLM to obtain a good solution that they can effectively
apply (e.g., incorporate into their course project). Since
LLMs are prone to failing (Babe et al., 2023b; Denny
et al., 2023), students must ask the LLM for two al-
ternative implementations, a well-known prompting
technique (Treude, 2023), which must be documented
in a log file, to be submitted alongside the exercise
solution. This log file must also include a concluding
section identifying the chosen implementation and the
rationale behind the selection. In summary, students
must follow these
steps:
1. State the problem by asking a question to Chat-
GPT;
2. Analyze the provided solution;
3. Ask ChatGPT for an alternative solution, option-
ally providing further information or constraints;
4. Compare both solutions and write a description of
their findings;
5. Select one of the versions (i.e. the best one);
6. Integrate the selected version into their own
project.
All these interactions should be recorded for fur-
ther analysis.
To ensure uniformity in format and to clarify the
expected nature of interaction, an exemplar log file
was provided (shown in Listing 1).
Listing 1: Log format example, supplied to the students.

Human: Please implement a function in Java that can identify the largest number in an array of integers.
--------
ChatGPT: Sure thing! Here's a possible implementation of a function in Java, which finds the largest number in an array of integers:

public static int findLargest(int[] numbers) {
    int largest = numbers[0]; // start with the first number in the array
    for (int i = 1; i < numbers.length; i++) {
        if (numbers[i] > largest) {
            largest = numbers[i]; // update the largest number
        }
    }
    return largest;
}

This function takes an array of integers (...)
--------
Human: Implement an alternative version of this function that doesn't initialize the "largest" variable with the first number in the array.
--------
ChatGPT: Certainly! Here's a possible alternative Java implementation to find the largest number in an array of integers without initializing the largest variable with the first number in the array:

public static int findLargest(int[] numbers) {
    int largest = Integer.MIN_VALUE; // start with the smallest possible value for an integer
    for (int i = 0; i < numbers.length; i++) {
        if (numbers[i] > largest) {
            largest = numbers[i]; // update the largest number
        }
    }
    return largest;
}

This function begins by setting (...)
--------
GROUP'S CONCLUSION: We chose the first option because it avoids an extra comparison at the beginning of the loop (the loop can start at position 1 instead of position 0).


4 EXPERIMENTAL CONTEXT

This study was performed in the scope of a Data
Structures and Algorithms (DSA) course belonging to
a Computer Engineering degree, during the 2022/23
school year. The course takes place in the second
semester of the first year, which means that most stu-
dents have only been exposed to one semester of pro-
gramming.
The course’s main assignment is the development
of a project, typically a command line application
that performs queries on a very large data set, pro-
vided in the form of multiple CSV files. The queries
must be implemented using efficient data structures
and algorithms—for example, some queries will re-
quire the usage of hash tables to achieve good perfor-
mance. Its development is split into two parts: in the
first part, students mainly need to implement code for
reading and parsing the input files, and, in the second
part, they have to focus on implementing the different
queries. The project is implemented either in groups
of two students or individually.
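
As an illustration of the kind of data-structure choice involved in the second part, the sketch below indexes songs by artist identifier with a hash table so that per-artist queries avoid scanning the whole data set. It is only a rough sketch: the Song record, its field names, and the query are our own assumptions, since the actual project statement is not reproduced here.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical record representing one parsed song (field names are assumptions).
record Song(String id, String title, List<String> artistIds) {}

public class SongIndex {

    // Maps an artist id to the songs that artist participates in.
    private final Map<String, List<Song>> songsByArtist = new HashMap<>();

    public void add(Song song) {
        for (String artistId : song.artistIds()) {
            songsByArtist.computeIfAbsent(artistId, k -> new ArrayList<>()).add(song);
        }
    }

    // Expected O(1) lookup instead of scanning the whole song list for every query.
    public List<Song> songsOf(String artistId) {
        return songsByArtist.getOrDefault(artistId, List.of());
    }
}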
The project's topic varies from year to year, as
does the input file structure and the required queries.
In this particular year, the project was about songs
and artists, using data extracted from the Million Song
data set (http://millionsongdataset.com/pages/getting-dataset/).
Note that, in this course, the professors and in-
structors allow and even encourage students to in-
teract with GPT. Also, the course’s main project ex-
plicitly asks students to use GPT in some of the re-
quirements. However, students were warned that they
would not be allowed to use GPT during the project’s
defense at the end of the semester. The following sec-
tion describes those requirements.
4.1 The ChatGPT Assignment
To foster controlled student experimentation with
ChatGPT's capabilities and its integration into the ed-
ucational framework, the project's first part includes
a specific ChatGPT exercise.
Students were instructed to employ ChatGPT as
an aid in reading and parsing a large CSV file con-
taining information about artists. The file's format
is available at https://doi.org/10.5281/zenodo.8430808.
Some of the challenges related to this file were han-
dling two different formats for single-artist and
multiple-artist songs, handling multiple artists asso-
ciated with the same song across distinct lines, and
dealing with invalid lines, among others.
There were also technical restrictions: they couldn’t
use hash-based data structures (e.g., HashMap) in the
first part.
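
To make the task more concrete, the sketch below parses the artist list out of one line without using hash-based structures. It is only an illustration: the real file format is the one linked above, and the assumed line shape (an id, an '@' separator, and a bracketed list of quoted names, as in the log excerpt shown later in Listing 2), the method name, and the error handling are our own; the single-artist variant and the aggregation of artists across multiple lines are omitted.

import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the kind of line parsing the exercise asked for, without
// hash-based structures. The assumed (hypothetical) line shape is:
//   <songId> @ ['Name1', 'Name2']   or   <songId> @ ['Name1']
// Invalid lines are reported as null so the caller can skip them.
public class ArtistLineParser {

    public static List<String> parseArtists(String line) {
        String[] parts = line.split("@", 2);
        if (parts.length != 2) {
            return null; // invalid line: missing separator
        }
        String names = parts[1].trim();
        if (!names.startsWith("[") || !names.endsWith("]")) {
            return null; // invalid line: artist list is not bracketed
        }
        names = names.substring(1, names.length() - 1); // drop the brackets
        List<String> artists = new ArrayList<>();
        for (String name : names.split(",")) {
            String cleaned = name.trim().replaceAll("^'|'$", ""); // drop surrounding quotes
            if (!cleaned.isEmpty()) {
                artists.add(cleaned);
            }
        }
        return artists.isEmpty() ? null : artists;
    }
}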
Students validated their code using an automated
assessment tool (AAT). They could submit multiple
times without penalty and received the execution re-
sults of several unit tests.
To guide their interaction with ChatGPT, this exer-
cise followed the template proposed in Section 3 and
resulted in a log file detailing all their interactions to
achieve a good solution. Incorporating this log file
within the project was optional but carried a weight
of 0.5 points on the evaluation scale ranging from 0 to
20.
This strategic integration of ChatGPT served not
only to motivate students to explore its usage under
controlled conditions but also to encourage a struc-
tured evaluation process.
5 METHODOLOGY
To better understand how the students interacted
with ChatGPT to solve the ChatGPT assignment
(see Section 4.1), we manually analyzed their inter-
action logs, categorizing them along several dimen-
sions. Furthermore, a brief survey was administered
after the assignment to gauge the students' sentiments
towards the task.
5.1 Categorization
Since the students had been instructed to interact with
ChatGPT using the exercise template described in
Section 3, their logs mostly followed these steps: (1)
ask a question; (2) ask for an alternative solution; and
(3) provide a conclusion. Therefore, we decided to
analyze each step individually: initial prompt, sec-
ond prompt and conclusion. We added a fourth di-
mension called problem, to capture the diversity of
the problems students were asking GPT to help them
with.
We then characterized each dimension using the
following variables:
Problem - What problem were students asking GPT
for help with?
>Abstraction level (high or low) - Some students
asked very concrete and direct questions such as
'How to remove spaces and quotes from a string in
Java?', while others asked more abstract questions
such as 'Implement a function in Java that reads a
.txt file with the following format (...)'.
>Nature (generic or domain-specific) - Whether the
stated problem was specific to the project. For ex-
ample, 'How to remove spaces and quotes from a
string in Java?' is a generic problem, since it can be
applied to a variety of problems and not specifically
to the 'artists file parsing' problem.
Initial prompt
>Language - Since our students are native Por-
tuguese speakers, the majority of the interactions
were done in Portuguese, although a small per-
centage of students used English
>Type - This defines the goal of the prompt
Ask for code - These prompts usually included
terms like 'implement a function...' or 'give me
the code...', which inspired the title of this paper
Explain how - Ask GPT to explain how they
could solve a certain problem, without explic-
itly asking for code
Help with error - Students provided an error
they were struggling with (e.g. a compilation
error which they couldn’t understand)
Analyze code - Students provided code from
their project and asked GPT for errors or possi-
ble improvements in their code.
>Provided context? - Some students copied or
adapted part of the project statement explaining
the rules behind the artists file, including it in
the initial prompt.
>Provided restrictions - Here we analyzed if
the students provided ChatGPT with technical re-
strictions. In particular, we looked for the words
’java’ and ’hashset/hashmap’ in the prompt. The
first was the language the project had to be devel-
oped in and the second was an explicit prohibition
in the project statement (since at this point, they
hadn’t yet learnt these data structures).
>Gave examples? - For prompts of the type ’ask
for code’, we analyzed if the students provided
any examples to guide the model. We considered
examples of input (e.g., lines in the artists file)
and/or output (e.g., given these parameter values,
the function should return this).
>Function signature - For prompts of the type
’ask for code’, we analyzed if the students pro-
vided the function signature beforehand (i.e., the
name of the function along with its parameters
and return type).
Second prompt
>Type
Inexistent - Although the exercise template di-
rected students to ask for alternatives, some stu-
dents provided only the initial prompt
Give me an alternative - We analyzed if the
students asked for an alternative solution. This
was one of the goals of this exercise.
Clarify initial prompt - We analyzed if the
second prompt was just a clarification of the
first prompt. This could be an attempt to gen-
eralize the previous solution or provide more
information.
Ask different question - Some students didn’t
comply with the exercise statement and used
the second prompt to ask a different question.
"Give Me the Code": Log Analysis of First-Year CS Students’ Interactions with GPT
201
>Guided alternative? - For the 'give me an alterna-
tive' cases only, we wanted to find out whether the
students provided some guidance (e.g., 'Implement
again without using replaceAll()') when asking for
an alternative solution.
Conclusion
>Type
Decided for one solution - We analyzed if the
students achieved the main goal of the exer-
cise, by choosing one of the alternative solu-
tions given by ChatGPT
Useful but didn't use the solution - Some stu-
dents didn't use any of the solutions but wrote
that they still found the interaction useful (e.g.,
'We didn't use any of the solutions. However, we
used some of the suggested instructions, such as
replaceAll()')
Not useful - Some students didn’t find any of
the solutions useful
Inexistent/invalid conclusion - These are the
cases where the students didn't write any con-
clusion, or the conclusion has no apparent con-
nection with the given prompts.
>Decision criteria - This metric was applied only
to the conclusions of the type ’Decided for one
solution’. All the decisions fell into one of these
categories: simpler/more compact, more flexible/-
generic, and easier to understand.
>Used GPT solution - Here we wanted to find
out if the students really used the GPT solution
(completely or partially) in their projects. We veri-
fied this by manually analyzing the project’s code
for each group, searching for code that is similar
to one of the GPT solutions.
5.1.1 Classification Example
As an example, the interaction shown in Listing 2 was
classified like this:
Problem: low abstraction level; generic.
First Prompt: ’ask for code’ type; no context given;
provided a restriction (“must be in Java”); provided
an example but didn’t provide the requested function
signature.
Second Prompt: inexistent.
Conclusion: useful but didn’t directly use the pro-
vided solution; we confirmed that their project actu-
ally used one of the suggestions made by ChatGPT
regarding the use of replaceAll().
Listing 2: Partial example of log file supplied by the stu-
dents (group 22), translated from Portuguese.
Human: How to remove the white spaces between the symbols [, ', and ] in Java?
1434545625 @ ['The National']
--------
ChatGPT: To remove the white spaces between the symbols [, ', and ] in a string in Java, you can use regular expressions and string replacement. Here's an example of code that uses regular expressions to remove these white spaces:
(...)
Explanation:
The first replaceAll replaces the sequence (...)
--------
Group's Conclusion: We didn't use what ChatGPT provided, but we ended up using the function that it suggested: replaceAll()
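
The generated code itself is elided in the log above. Purely as an illustration of the kind of replaceAll-based approach ChatGPT suggested (and not the group's actual code), stripping the white space adjacent to the [, ' and ] symbols with regular expressions could look like this:

public class WhitespaceCleaner {
    // Illustrative only: removes spaces right after '[' or before ']',
    // and around the single quotes, e.g. "[ 'The National' ]" -> "['The National']".
    public static String clean(String line) {
        return line
                .replaceAll("\\[\\s+", "[")   // space after the opening bracket
                .replaceAll("\\s+\\]", "]")   // space before the closing bracket
                .replaceAll("\\s+'", "'")     // space before a quote
                .replaceAll("'\\s+", "'");    // space after a quote
    }
}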
5.2 Survey
To complement the log analysis results, in particular
regarding the usefulness of the ChatGPT assignment
(see Section 4.1), we conducted an anonymous ques-
tionnaire a few weeks after the completion of the as-
signment (Cipriano and Alves, 2024a). Of the 154
enrolled students, 52 responded to the questionnaire,
corresponding to a 33.77% participation rate.
The questionnaire was composed of the following
4 questions:
Q1 - In part 1 of the project, you were asked to inter-
act with GPT. Did you? [Yes/No];
Q2 - If you answered “No” to the previous ques-
tion, what is the reason for not interacting with GPT?
[Open-ended];
Q3 - If not for the exercise, would you still have used
GPT? [Yes/No];
Q4 - How useful do you think this exercise was (ask-
ing ChatGPT for help in processing the artists’ file)?
[Scale:1-5].
6 RESULTS
In total, 65 groups (corresponding to 122 students)
submitted a project that passed at least half of the as-
sessment tests. Of those, 37 groups (corresponding to
69 students) submitted a log file of their interaction
with ChatGPT.
We manually analyzed each of those files both
quantitatively and qualitatively. A public repository
of these files is available at
https://doi.org/10.5281/zenodo.8430808.
Table 1: Log analysis results: problem.
  abstraction level: high 52.8%; low 47.2%
  nature:            generic 35.1%; domain-specific 64.9%
6.1 Quantitative
We classified all the ChatGPT interaction logs follow-
ing the criteria outlined in Section 5, for the four di-
mensions involved: problem, initial prompt, second
prompt and conclusion.
As illustrated by Table 1, the abstraction level of
the problem presented to ChatGPT was evenly dis-
tributed between low and high, with a slight incli-
nation toward the high level (52.8%). The majority
of students posed domain-specific questions (64.9%)
rather than generic ones. These results were expected,
given the context of using ChatGPT within a specific
project.
Regarding the initial prompt (see Table 2), most
of the students used their native language (91.7%)
and asked for code (72.2%). Still, a small minority
preferred to ask ChatGPT to explain how they could
solve the problem (19.4%), and an even smaller frac-
tion asked for help identifying errors (8.3%) or ana-
lyzing code (5.6%). Most students didn’t provide any
context (81.8%) but provided restrictions, mainly the
restriction that it had to be in Java (81.3%). Also,
the majority of the students didn’t provide examples
(71.9%) and almost no students provided the signa-
ture of the intended function (92.9% did not). Notice
that these last two criteria were only applied to inter-
actions where the students asked for code (which most did).
In reference to the second prompt (see Table 3),
a significant number of students didn’t provide it
(27.8%), which was surprising since the students were
explicitly instructed to do so. Still, the majority
asked for an alternative solution in the second prompt
(61.1%), as instructed. Interestingly enough, a few
students opted to use the second prompt to clarify
what they wanted in the initial prompt (11.1%) and
even fewer students just asked a different question
(5.6%). For cases in which students requested an
alternative solution, it is noteworthy that the major-
ity did not provide any specific guidance for their re-
quest, merely asking for a generic alternative (71.4%).
Only in 28.6% of the cases did the students provide
some guidance by adding more information.
Table 2: Log analysis results: initial prompt.
  language:              Portuguese 91.7%; English 8.3%
  type:                  ask for code 69.4%; explain how 19.4%; help with error 8.3%; analyze code 2.9%
  provided context?:     yes, copied 9.1%; yes, adapted 9.1%; no 81.8%
  provided restrictions: must be in Java 81.3%; do not use hashset 9.1%; none 21.2%
  gave examples?:        no 71.9%; just one 9.4%; several 18.7%
  function signature?:   no 92.9%; yes 7.1%

Finally, with respect to the conclusion (see Table 4),
almost half of the students accepted one of the solu-
tions provided by ChatGPT (47.2%), and of those that
hadn't, a significant portion still found the interaction
to be useful (25%), with only a small minority not
getting any value from the interaction (11.1%).
For the cases where the students accepted one of the
solutions, the findings unveil a balanced distribution
of the acceptance criteria. A substantial 35.7% of re-
spondents valued simplicity and compactness as guid-
ing factors. Similarly, an equal 35.7% sought flexibil-
ity and generality. Finally, 28.6% prioritized ease of
understanding. Regardless of what the students wrote
in the conclusion, we found that 72.2% really used
(fully or partially) one of the solutions provided by
ChatGPT.
General Insights. The exercise template encour-
aged a rigid methodology for using ChatGPT: stu-
dents were instructed to use it as assistance in reading
and parsing the artist file, obtain two alternative im-
plementations, and apply critical thinking to choose
one while justifying their decision. However, how
students utilized it varied considerably. Some chose
to focus on a highly specific problem (low abstraction
level), as seen in Listing 2, while others posed more
generic problems, with no clear trend towards either
option.
Table 3: Log analysis results: second prompt.
  type:                inexistent 27.8%; give me an alternative 61.1%; clarify initial prompt 11.1%; ask different question 5.6%
  guided alternative?: yes (more information, restrictions, ...) 28.6%; no, just "give me an alternative" 71.4%

Table 4: Log analysis results: conclusion.
  type:              decided for one solution 47.2%; useful but didn't use the solution 25%; was not useful 11.1%; inexistent/invalid conclusion 13.9%
  decision criteria: simpler/more compact 35.7%; more flexible/generic 35.7%; easier to understand 28.6%
  used GPT solution: yes 44.4%; partially 27.8%; no 27.8%

The initial prompting methods employed by stu-
dents also displayed a significant degree of diversity.
This was expected as they had no formal training, and
the project statement did not provide any guidance
in this regard. Nonetheless, the majority of prompts
were aimed at obtaining code, although a substan-
tial portion of students also requested explanations on
how to proceed. Due to their limited training, these
prompts tended to be unsophisticated, lacking con-
text, examples, or the intended function signature.
Even in terms of restrictions, while most students re-
membered to specify Java, few remembered to indi-
cate that the use of hashsets was not allowed. How-
ever, this didn’t stop most students from getting use-
ful results (RQ1): specifically, 47.2% of participants
accepted one of the provided solutions, and an addi-
tional 25% found the interaction useful even though
they did not use the generated code.
A considerable number of students (38.9%) did
not adhere to the project’s instructions and failed to
request an alternative implementation. The reasons
for this remain unclear, but we suspect it may be due
to (1) students not carefully reading the project state-
ment, and (2) asking for alternatives being an uncom-
mon way of using ChatGPT.
For the students who did request an alternative so-
lution, it is interesting to note that a substantial por-
tion (28.6%) attempted to guide the solution by spec-
ifying constraints, despite receiving no indication to
do so.
In the conclusions, it was expected that students
would employ their critical thinking skills to choose
one of the implementations, which occurred in half
of the cases. Even in instances where this did not
occur, a substantial portion of students found the in-
teraction to be valuable. The effectiveness of LLMs
in assisting students is evident, with only 27.8% not
being able to incorporate solutions provided by Chat-
GPT into their projects (RQ2).
Also, half of the students managed to obtain two
alternative solutions, with no clear winner among the
decision criteria.
6.2 Qualitative
During the classification process, we found some
cases worth discussing individually.
6.2.1 The Database Prompt
Group 36 provided the following initial prompt:
I must create a Java database that receives three
files: songs, artists, and artist details. I need to asso-
ciate songs with artists because there must be some-
thing that relates them, as one artist can have mul-
tiple songs, and multiple songs can belong to other
artists, meaning it is a many-to-many (N:N) relation-
ship. How do I program this?
This prompt mixes concepts from Programming
and Databases; it seems the student wanted to create
a database, yet it ends with 'How do I program this?'.
ChatGPT answered with a program that interacts with
a relational database, which was not the goal of the
project. What we found interesting in this prompt is
its high abstraction level, certainly the highest among
all of the prompts used by the students. However, this
led to a solution that, while correct, could not be used
in the project. Notably, this student was the only one
who had taken the Databases course, which is taught
in the second year, while most of these students were
first-year students.
6.2.2 Incomplete First Prompts
Some groups used the second prompt to complement
the first prompt, since the results were not satisfac-
tory. This was accounted for in the 'clarify initial
prompt' item in Table 3.
For example, Group 9 issued the following
prompt: I want to read a line that has this format:
"['Name1', 'Name2']". How can I check in Java if
the given String follows this format?
Since ChatGPT's response was specific to lines
with two names, the group issued a second prompt
asking for a more general solution: What if I have to
handle more than two names, what do I do?
This second prompt could have been avoided if the
students had asked for a general solution in the first
prompt, or had provided several examples instead of
just one. This confirms that some students cannot get
the best results out of ChatGPT without formal train-
ing (RQ1).
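
For reference, the kind of general check the group was after can be expressed with a single regular expression. The pattern below is our own illustration, assuming single-quoted names separated by commas inside square brackets; it is not taken from the students' log.

import java.util.regex.Pattern;

public class ArtistListFormat {
    // One or more single-quoted names, separated by commas, inside square brackets,
    // e.g. "['Name1']", "['Name1', 'Name2']", "['Name1', 'Name2', 'Name3']".
    private static final Pattern ARTIST_LIST =
            Pattern.compile("\\[\\s*'[^']*'(\\s*,\\s*'[^']*')*\\s*\\]");

    public static boolean isValid(String s) {
        return ARTIST_LIST.matcher(s).matches();
    }
}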
6.2.3 Arm Wrestling with ChatGPT
Group 32 entered into a kind of “arm wrestling” with
ChatGPT since they were not getting the answer they
needed. They asked ChatGPT to explain what could
be improved in a function they developed and that
was probably not working as expected. ChatGPT sug-
gested that they could remove some code duplication.
However, this wasn’t the cause of the error; in fact,
there was no code duplication at all. After several
prompts from the students, ChatGPT kept insisting
that there was code duplication (a possible "hallucina-
tion") and the students kept rebutting it, without suc-
cess. Here is one of the prompts:
so, the issue is on a function that I showed you
after you told me there was code duplication? Are you
sure about this? what about the first piece of code i
showed you? (...)
Our analysis of the students’ function revealed no
code duplication issues, indicating that ChatGPT’s
answers were misleading. Interestingly, the students
couldn’t find the code duplication nor conceive that
ChatGPT was wrong, so they kept pushing for more
information.
6.2.4 Why Am I Failing the Automated
Assessment Tests?
Group 4's motivation for asking for help was the fact
that they weren't passing an automated assessment
test and could not understand why. That particular
test exercised edge cases in the requirements which
were not properly handled in their implementation,
such as duplicate ids and missing information. No-
tice that the source code of the auto-
mated tests wasn’t available to the students.
After describing the artists file format, the stu-
dents asked ChatGPT for examples of input files that
would cover all possible scenarios. Although Chat-
GPT provided some examples, they were not exhaus-
tive, so they did not help them pass the tests. The stu-
dents kept pushing ChatGPT for more scenarios, with
no success. At a certain point, they switched their
strategy, explaining to ChatGPT how they had im-
plemented the function and asking what could be
missing. Still, ChatGPT wasn’t able to help them.
We found it interesting that this group opted to im-
plement the function themselves, without assistance
from ChatGPT. Subsequently, upon
encountering difficulties in passing the automated as-
sessment tests, they reached out to ChatGPT, albeit
without success.
6.3 Survey
This section presents the results of the survey con-
ducted after the students completed the exercise
(N=52).
84.6% (44) of the students replied that they inter-
acted with GPT during the project (Q1). The students
that did not interact (15.4%, 8) provided two main
reasons for not doing so (Q2): 6 students replied ”I
didn’t feel the need to”, while 2 students replied ”I
didn’t know how to”.
"Give Me the Code": Log Analysis of First-Year CS Students’ Interactions with GPT
205
Question Q3 tried to understand whether the students
would have used GPT even if there were no require-
ment to do so. Most students, 69.2% (36), in-
dicated that they would have used GPT anyway, but
30.8% (16) indicated that they would not have used
GPT unless asked to do so. This surprised us, as we
expected more students to indicate they would have
used GPT anyway.
The fourth question (Q4) aimed to assess the per-
ceived utility of the ChatGPT assignment within the
framework of students’ projects. Among the respon-
dents, 5.8% (3) selected Option 1 (‘Useless’), while
21.2% (11) opted for Option 2 (‘Slightly useful’). Op-
tions 3 and 4 were each chosen by 30.8% (16) of par-
ticipants. Additionally, Option 5 (‘Very useful’) was
selected by 11.5% (6) of students. These findings in-
dicate a prevalent perception of usefulness among the
majority of students, although opinions vary consid-
erably. However, it is noteworthy that a substantial
proportion (27%) regarded the exercise as minimally
or not useful. We hypothesize that this subgroup may
include students who struggled to elicit useful solu-
tions from GPT to apply in their projects.
In conclusion, a vast majority of students (84.6%)
will use GPT in an assignment if directed to do so,
and most would use it even when not asked to, al-
though the percentage is lower (69.2%). Most
students (73%) found the assignment to be useful or
very useful.
7 LESSONS LEARNT
Based on this experiment, we leave some recommen-
dations for teachers wanting to incorporate LLMs in
their classes.
Integrate prompt training into the curriculum,
focusing on teaching students to provide richer infor-
mation to LLMs, such as the problem context, exam-
ples, function signatures, etc.
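
As a hypothetical illustration of such a richer prompt (not taken from any student log, and assuming the simplified line shape and method signature used in the earlier sketches), combining context, a restriction, an example line, and the desired signature:

Human: I am parsing a text file with artist information for a Java course project.
Each line has a numeric id, an '@' separator, and a bracketed list of quoted artist
names, e.g. 1434545625 @ ['The National']. Invalid lines must be skipped, and I am
not allowed to use HashMap or HashSet. Please implement the method
public static List<String> parseArtists(String line), returning the artist names
or null for an invalid line.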
Incorporate designated LLM-based exercises
into students’ projects, encouraging the use of these
tools as aids for tackling complex problems. Cer-
tain students may experience discomfort in utilizing
LLMs, either due to a lack of familiarity with their
operation or apprehension regarding their appropri-
ateness for use. In such cases, they may benefit from
gentle encouragement or guidance to overcome these
barriers.
Use exercise templates that guide students into ap-
proaching LLMs with a critical thinking mindset.
This can be done by asking for multiple solutions and
selecting one of them using criteria such as flexibil-
ity, compactness or ease of understanding. Evaluating
multiple LLM-generated solutions is a skill that we,
as well as other researchers (Treude, 2023; Alves and
Cipriano, 2023), believe will be of great importance
in the future.
8 LIMITATIONS
GPT’s behaviours are not deterministic. Furthermore,
research has shown that they are dynamic, and can
vary greatly over time (Chen et al., 2023). As such, it
is hard to generalize conclusions, since some students
might have had better results than others, not due to
differences in 'prompting skill', but due to the models
themselves.
We expect most of the students to have used Chat-
GPT with GPT model version 3.5, since it was free.
However, some students might have used the GPT-4
model (GPT-4 was released in Europe on March 14,
2023, while our course was running). Since we do
not have this information, our results were not con-
trolled for it.
9 CONCLUSIONS
The proposed exercise template mostly achieved its
goal of fostering students’ critical thinking when in-
teracting with LLMs, enabling them to generate more
refined solutions. Our log analysis shows that most
students can effectively use GPT without formal train-
ing. With 72.2% of students incorporating ChatGPT’s
solutions into their projects, we consider the exer-
cise successful. Additionally, our survey (N=52) indi-
cates that 73% of students found the exercise useful,
with some stating they wouldn’t have used GPT other-
wise. However, the limited sophistication in prompt-
ing highlights the need for further training, as students
often required many prompts or failed to achieve sat-
isfactory results.
More research is needed to devise effective strate-
gies for instructing CS students in the proper utiliza-
tion of these tools. It is crucial to convey an awareness
of their limitations and discourage over-reliance, ulti-
mately better preparing students for professional life.
ACKNOWLEDGEMENTS
This research has received funding from the European
Union’s DIGITAL-2021-SKILLS-01 Programme un-
der grant agreement no. 101083594.
REFERENCES
Alves, P. and Cipriano, B. P. (2023). The centaur
programmer–How Kasparov’s Advanced Chess spans
over to the software development of the future. arXiv
preprint arXiv:2304.11172.
Babe, H. M., Nguyen, S., Zi, Y., Guha, A., Feldman, M. Q.,
and Anderson, C. J. (2023a). Studenteval: A bench-
mark of student-written prompts for large language
models of code. arXiv preprint arXiv:2306.04556.
Babe, H. M., Nguyen, S., Zi, Y., Guha, A., Feldman, M. Q.,
and Anderson, C. J. (2023b). StudentEval: A Bench-
mark of Student-Written Prompts for Large Language
Models of Code. arXiv:2306.04556 [cs].
Chen, L., Zaharia, M., and Zou, J. (2023). How is Chat-
GPT’s behavior changing over time? arXiv preprint
arXiv:2307.09009.
Cipriano, B. P. and Alves, P. (2024a). ”ChatGPT Is Here
to Help, Not to Replace Anybody” - An Evaluation
of Students’ Opinions On Integrating ChatGPT In CS
Courses. arXiv preprint arXiv:2404.17443.
Cipriano, B. P. and Alves, P. (2024b). LLMs Still
Can’t Avoid Instanceof: An investigation Into GPT-
3.5, GPT-4 and Bard’s Capacity to Handle Object-
Oriented Programming Assignments. In Proceedings
of the IEEE/ACM 46th International Conference on
Software Engineering: Software Engineering Educa-
tion and Training (ICSE-SEET).
Denny, P., Kumar, V., and Giacaman, N. (2023). Conversing
with Copilot: Exploring Prompt Engineering for Solv-
ing CS1 Problems Using Natural Language. In Pro-
ceedings of the 54th ACM Technical Symposium on
Computer Science Education V. 1, pages 1136–1142,
Toronto ON Canada. ACM. SIGCSE 2023.
Denny, P., Leinonen, J., Prather, J., Luxton-Reilly, A.,
Amarouche, T., Becker, B. A., and Reeves, B. N.
(2024). Prompt Problems: A New Programming Ex-
ercise for the Generative AI Era. In Proceedings of the
55th ACM Technical Symposium on Computer Science
Education V. 1, pages 296–302.
Destefanis, G., Bartolucci, S., and Ortu, M. (2023). A Pre-
liminary Analysis on the Code Generation Capabili-
ties of GPT-3.5 and Bard AI Models for Java Func-
tions. arXiv preprint arXiv:2305.09402.
Finnie-Ansley, J., Denny, P., Becker, B. A., Luxton-Reilly,
A., and Prather, J. (2022). The robots are com-
ing: Exploring the Implications of OpenAI Codex
on Introductory Programming. In Proceedings of the
24th Australasian Computing Education Conference,
pages 10–19.
Finnie-Ansley, J., Denny, P., Luxton-Reilly, A., Santos,
E. A., Prather, J., and Becker, B. A. (2023). My ai
wants to know if this will be on the exam: Testing
openai’s codex on cs2 programming exercises. In Pro-
ceedings of the 25th Australasian Computing Educa-
tion Conference, pages 97–104.
Hellas, A., Leinonen, J., Sarsa, S., Koutcheme, C., Kujan-
pää, L., and Sorva, J. (2023). Exploring the Responses
of Large Language Models to Beginner Program-
mers' Help Requests.
Kazemitabaar, M., Hou, X., Henley, A., Ericson, B. J.,
Weintrop, D., and Grossman, T. (2023). How novices
use llm-based code generators to solve cs1 coding
tasks in a self-paced learning environment.
Lau, S. and Guo, P. (2023). From ”Ban it till we understand
it” to ”Resistance is futile”: How university program-
ming instructors plan to adapt as more students use AI
code generation and explanation tools such as Chat-
GPT and GitHub Copilot.
Leinonen, J., Denny, P., MacNeil, S., Sarsa, S., Bernstein,
S., Kim, J., Tran, A., and Hellas, A. (2023). Compar-
ing Code Explanations Created by Students and Large
Language Models. arXiv preprint arXiv:2304.03938.
Liffiton, M., Sheese, B., Savelka, J., and Denny, P.
(2023). CodeHelp: Using Large Language Models
with Guardrails for Scalable Support in Programming
Classes.
Naumova, E. N. (2023). A mistake-find exercise: a
teacher’s tool to engage with information innovations,
ChatGPT, and their analogs. Journal of Public Health
Policy, 44(2):173–178.
OpenAI (2023). How can educators respond to students
presenting ai-generated content as their own? https://
shorturl.at/rUDlh. [Online; last accessed 03-October-
2023].
Ouh, E. L., Gan, B. K. S., Shim, K. J., and Wlodkowski,
S. (2023). Chatgpt, can you generate solutions for my
coding exercises? an evaluation on its effectiveness
in an undergraduate java programming course. arXiv
preprint arXiv:2305.13680.
Prasad, S., Greenman, B., Nelson, T., and Krishnamurthi,
S. (2023). Generating Programs Trivially: Student
Use of Large Language Models. In Proceedings of the
ACM Conference on Global Computing Education Vol
1, pages 126–132, Hyderabad India. ACM.
Prather, J., Reeves, B. N., Denny, P., Becker, B. A.,
Leinonen, J., Luxton-Reilly, A., Powell, G., Finnie-
Ansley, J., and Santos, E. A. (2023). “It’s Weird That
it Knows What I Want”: Usability and Interactions
with Copilot for Novice Programmers. ACM Transac-
tions on Computer-Human Interaction, 31(1):1–31.
Reeves, B., Sarsa, S., Prather, J., Denny, P., Becker, B. A.,
Hellas, A., Kimmel, B., Powell, G., and Leinonen, J.
(2023). Evaluating the performance of code genera-
tion models for solving parsons problems with small
prompt variations. In Proceedings of the 2023 Confer-
ence on Innovation and Technology in Computer Sci-
ence Education V. 1, pages 299–305.
Savelka, J., Agarwal, A., An, M., Bogart, C., and Sakr, M.
(2023). Thrilled by Your Progress! Large Language
Models (GPT-4) No Longer Struggle to Pass Assess-
ments in Higher Education Programming Courses.
Sridhar, P., Doyle, A., Agarwal, A., Bogart, C., Savelka, J.,
and Sakr, M. (2023). Harnessing llms in curricular
design: Using gpt-4 to support authoring of learning
objectives.
Treude, C. (2023). Navigating Complexity in Software En-
gineering: A Prototype for Comparing GPT-n Solu-
tions. arXiv:2301.12169 [cs].
Xu, F. F., Alon, U., Neubig, G., and Hellendoorn, V. J.
(2022). A Systematic Evaluation of Large Language
Models of Code. In Proceedings of the 6th ACM
SIGPLAN International Symposium on Machine Pro-
gramming, pages 1–10, San Diego CA USA. ACM.
"Give Me the Code": Log Analysis of First-Year CS Students’ Interactions with GPT
207