Student Classroom Behavior Image Recognition Based on YOLOv7

Siqing Chen

School of Mathematics and Statistics, Hanshan Normal University, Chaozhou, China

Keywords: Intelligent Classroom Management System, Computer Vision, Automatic Recognition, Object Detection Algorithm.

Abstract: With the rapid development of educational informatization, intelligent classroom management systems
have become a key means of enhancing teaching quality and efficiency. However, traditional classroom
management mainly depends on teachers' observation and records, which is not only inefficient but also
susceptible to subjective factors, leading to inaccuracy. To overcome these limitations, using computer
vision technology to automatically recognize students' classroom behaviors has become an important
research direction in the field of educational technology. Among the available algorithms, You Only Look
Once version 7 (YOLOv7), which is known for its detection speed and accuracy, is well suited for real-time
classroom action recognition and is a leading algorithm in this field. Using YOLOv7, educators can obtain
an objective analysis of classroom conditions and teaching effectiveness in order to optimize instructional
strategies and provide personalized learning support. Moreover, the collection and analysis of student
behavior data contribute to the optimization of school management and educational methods, promoting
the development of educational management in a more scientific and refined direction.

1 INTRODUCTION

With the rapid development of educational

informatization, intelligent classroom management

systems have become a key method for improving

teaching quality and efficiency. In the past, traditional

classroom management mainly relied on teachers'

observation and manual record-keeping. This method

was not only time-consuming and labor-intensive, but

also prone to errors due to subjective factors.

Teachers might miss important behavioral cues or

misunderstand them, thereby leading to potential

biases in the assessment.

To overcome these limitations and promote the

progress in the field of educational technology,

researchers turned their attention to computer vision

technology. This technology provides a promising

solution for more objective and accurate automatic

identification of students' classroom behaviors. By

leveraging advanced algorithms and machine

learning models, computer vision can analyze real-

time video streams or images to detect and classify

various classroom behaviors.

https://orcid.org/0009-0001-5038-6648

Among the various existing computer vision

algorithms, You Only Look Once version 7

(YOLOv7) stands out as an efficient object detection

algorithm. It combines fast detection speed and high

accuracy, making it particularly suitable for real-time

classroom behavior recognition. YOLOv7 can

process a large amount of data quickly and accurately,

enabling it to keep up with the dynamics of classroom

interactions and provide teachers with timely and

reliable information about students' behaviors.

In conclusion, integrating computer vision

technology, especially YOLOv7, into intelligent

classroom management systems marks an important

step in leveraging advanced technologies to improve

educational outcomes. Through the automatic

identification of classroom behaviors, YOLOv7

enables teachers to focus more on teaching rather than

manual observation, ultimately enhancing the overall

quality and efficiency of the educational process.

Chen, S.
Student Classroom Behavior Image Recognition Based on YOLOv7.
DOI: 10.5220/0013703500004670
Paper published under CC license (CC BY-NC-ND 4.0)
In Proceedings of the 2nd International Conference on Data Science and Engineering (ICDSE 2025), pages 649-653
ISBN: 978-989-758-765-8

2 RESEARCH OBJECTIVE

Utilizing the efficient detection capability of YOLOv7 (Zhang, 2024), real-time monitoring of
students' behavior in the classroom is conducted to promptly detect and correct negative behaviors
(Redmon & Farhadi, 2018), such as lack of concentration and playfulness. By identifying
students' behavior, objective data about classroom atmosphere and teaching effectiveness can be
provided to teachers, helping them adjust teaching strategies and improve classroom quality.
Analyzing students' behavior patterns also helps to understand their learning habits and needs,
so that personalized learning support and guidance can be provided.

Collecting and analyzing student behavior data provides data support for school management and
educational decision-making, and promotes the scientific and refined management of education.
This study also explores the application of YOLOv7 in the field of education, promotes the
development of computer vision technology in educational research, and provides new ideas and
methods for research in related fields.

Classification and Monitoring of Behaviors in Educational Environments:

Classification and monitoring of behaviors have

become a crucial aspect in the field of educational

technology, and YOLOv7 has emerged as a powerful

tool for achieving this goal. Numerous studies have

utilized the capabilities of YOLOv7 to classify and

detect students' behaviors in classroom settings (Li, 2024). Typically, these studies employ datasets

containing multiple behavior labels to conduct

comprehensive training of the models. The main

focus of these tasks lies in designing a reasonable and

comprehensive labeling system and implementing

effective data augmentation strategies to enhance the

robustness of the models.

Construction of Behavior Recognition Datasets:

A fundamental aspect of any research work in this

field is the construction of high-quality behavior

recognition datasets. These datasets serve as the

cornerstone for training and evaluating models.

Recognizing this, some researchers have embarked

on ambitious projects aimed at collecting behavior

data in diverse classroom environments. These efforts aim to provide comprehensive and
representative datasets for the development and improvement of behavior recognition models.

Real-time Application of YOLOv7 in Classrooms:

Thanks to the real-time detection capabilities of

YOLOv7, its potential for application in real

classroom environments has attracted widespread

attention. Researchers are currently exploring

methods to integrate YOLOv7 into classroom

management systems, with the ultimate goal of

enhancing teaching effectiveness and student

engagement. Additionally, the real-time feedback

provided by YOLOv7 is invaluable to teachers,

enabling them to adjust teaching strategies in real

time based on students' behavior and engagement

levels (Wang et al., 2022).

3 RESEARCH METHOD

The aim of this experiment is to use the YOLOv7 object detection algorithm (Zhang et al., 2021) to

achieve image recognition of three typical behaviors

among students in the classroom: "raising hands",

"reading", and "writing". The research content mainly

includes the following aspects:

Dataset construction: Collect and annotate

classroom images containing behaviors such as

"raising hands," "reading," and "writing," and

construct high-quality training and validation

datasets.

Model training: Use the YOLOv7 algorithm to train on the dataset, optimize model parameters,
and improve the accuracy and robustness of behavior recognition (Zhang & Li, 2025).

Performance evaluation: Evaluate the trained model on a test set and analyze its recognition
performance in classroom scenarios, including metrics such as accuracy, recall, and real-time
performance.

Application validation: Deploy the model in an actual classroom environment to verify its
feasibility and practicality.
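The performance evaluation step can be illustrated with a minimal sketch. It assumes that a predicted box counts as correct when its intersection over union (IoU) with an unmatched ground-truth box reaches a threshold of 0.5, and it reports precision and recall for a single class. The box format, the threshold, and the function names `iou` and `precision_recall` are illustrative assumptions, not the evaluation code used in this experiment.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed pixel coordinates


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(preds: List[Box], truths: List[Box],
                     iou_thr: float = 0.5) -> Tuple[float, float]:
    """Greedily match predictions to ground truth; return (precision, recall)."""
    unmatched = list(truths)
    tp = 0
    for p in preds:
        # Pick the still-unmatched ground-truth box that overlaps most with p.
        best = max(unmatched, key=lambda t: iou(p, t), default=None)
        if best is not None and iou(p, best) >= iou_thr:
            tp += 1
            unmatched.remove(best)  # each ground-truth box is matched at most once
    fp = len(preds) - tp      # predictions with no matching ground truth
    fn = len(unmatched)       # ground-truth boxes that were missed
    precision = tp / (tp + fp) if preds else 0.0
    recall = tp / (tp + fn) if truths else 0.0
    return precision, recall
```

For example, with two predictions of which one coincides with one of two ground-truth boxes, the function returns a precision and a recall of 0.5 each.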

The research objective is to develop an efficient

and accurate student classroom behavior recognition

program, providing technical support for intelligent

classroom management, helping teachers to grasp

students' learning status in real time, optimize

teaching management strategies, and improve

classroom efficiency and learning outcomes.

The dataset used in this experiment is the SCB-dataset, created by Chengdu Neusoft College. It
contains 5686 images and 45578 labels covering six behaviors: raising hands, reading, writing,
using mobile phones, lowering heads, and lying on the table. This experiment tests only three of
these behaviors: raising hands, reading, and writing. The dataset covers different scenarios from
kindergarten to university and was evaluated using the YOLOv7 algorithm, reaching an average
accuracy of 80.3%, as shown in Figure 1. The dataset aims to provide a solid foundation for
research on student behavior detection.

Original address: https://github.com/Whiffe/SCB-dataset?tab=readme-ov-file
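Restricting a six-class annotation set to the three behaviors tested here can be sketched as a simple label filter. The sketch assumes YOLO-style text labels, one object per line in the form `class x_center y_center width height`. The class ids in `KEEP` are hypothetical and may not match the actual mapping used by the dataset.

```python
# Hypothetical class ids; the real mapping in the dataset may differ.
KEEP = {0: "raising hands", 1: "reading", 2: "writing"}


def filter_label_text(text: str, keep_ids=frozenset(KEEP)) -> str:
    """Keep only the label lines whose class id is in keep_ids."""
    kept = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if int(parts[0]) in keep_ids:
            kept.append(line.strip())
    return "\n".join(kept)
```

Applied to the contents of each label file, this leaves the images unchanged and drops only the annotations of the behaviors that are not studied.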

ICDSE 2025 - The International Conference on Data Science and Engineering


Figure 1: Algorithm evaluation diagram for YOLOv7 (Picture credit: Original)

4 RESEARCH RESULT

4.1 Model Structure

The YOLOv7 model used in this experiment is an

efficient single-stage object detection algorithm,

whose core structure includes a backbone network, a

feature pyramid network, and a detection head.

YOLOv7 has undergone multiple optimizations

based on the YOLO series, resulting in higher

detection accuracy and faster inference speed. The

specific structure is as follows:

a. Backbone (Chen et al., 2021): CSPDarknet is used as the backbone network, and multi-level
features are extracted through a Cross Stage Partial (CSP) structure to enhance the model's
feature extraction capability.

b. Feature Pyramid Network (Neck) (Chen, 2024): A Path Aggregation Network (PANet) and
multi-scale feature fusion techniques combine shallow and deep feature information to enhance
the model's detection ability for both small and large targets.

c. Head (Zhou, 2024): Based on the anchor mechanism, a classification branch and a regression
branch predict the category and the bounding box position of the target, respectively. In
addition, a dynamic label assignment strategy is introduced to optimize the training process.

YOLOv7 also introduces Model Scaling technology (Tu, 2023), which achieves a balance
between model performance and computational efficiency by adjusting the depth, width, and
resolution of the network, making it more suitable for practical application scenarios. This
experiment takes advantage of these properties of YOLOv7 and fine-tunes the model for the
classroom behavior recognition task, in order to achieve accurate detection of the "raising
hands", "reading", and "writing" behaviors.

4.2 Experimental Result

Figure 2: F1-Confidence Curve (Picture credit: Original)

Figure 3: Precision-Confidence Curve (Picture credit:

Original)

Figure 4: Precision-Recall Curve (Picture credit: Original)


Figure 5: Recall-Confidence Curve (Picture credit: Original)

F1-Confidence Curve:

Figure 2 shows the F1 scores of the model at different confidence thresholds. The F1 score for
the "hand raising" behavior is the highest, indicating that the model performs best in
recognizing hand raising behavior.
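For reference, the standard definitions behind these curves are given below, where TP, FP, and FN denote the numbers of true positives, false positives, and false negatives among the detections kept at a given confidence threshold. These are the usual textbook formulas and are stated here as background, not as a description of the exact evaluation script.

```latex
\[
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2PR}{P + R}
\]
```

Because the F1 score is the harmonic mean of precision and recall, it is high only when both are high at the same threshold.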

Precision-Confidence Curve:

Figure 3 shows the precision of the model at different confidence thresholds. At a high
confidence threshold (0.902), "all classes" reached the highest precision of 1.00, indicating
that the predictions the model retains at high confidence are very reliable.

Precision-Recall Curve:

Figure 4 shows the precision of the model at different recall rates. The value for the "hand
raising" behavior is the highest (0.662), indicating that the model not only has high precision
in recognizing hand raising behavior, but also has a relatively good recall rate.
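A precision-recall curve is commonly summarized by the area under it, the average precision (AP). The following is a minimal sketch of one common way to compute that area, using a monotone precision envelope and summing over all recall steps. Evaluation tools may use a different interpolation scheme, so this should not be read as the exact procedure behind Figure 4; the function name `average_precision` is an illustrative assumption.

```python
from typing import Sequence


def average_precision(recalls: Sequence[float],
                      precisions: Sequence[float]) -> float:
    """Area under a precision-recall curve.

    recalls must be sorted in increasing order, with precisions[i]
    the precision measured at recalls[i].
    """
    # Add sentinel points so the curve spans recall 0 to 1.
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    # Replace each precision by the maximum precision at any higher recall,
    # which removes the zigzag of the raw curve.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum the rectangles between consecutive recall values.
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))
```

For a curve with precision 1.0 at recall 0.5 and precision 0.5 at recall 1.0, the sketch gives an area of 0.75.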

Recall-Confidence Curve:

Figure 5 shows the recall rate of the model at

different confidence thresholds. The recall rate of 'all

classes' is highest at a medium confidence level

(0.606), indicating that the model can recognize most

behavior instances at this confidence level.

Based on these charts, this article can draw the

following conclusions:

The model performs best in identifying the "raising hands" behavior, whether in terms of F1
score, precision, or recall.

The precision of the model is very high at high confidence, which suggests that the model
makes accurate predictions in the cases where it is very certain.

The recall rate decreases with increasing

confidence, which is expected because higher

confidence means that the model will only predict

more certain instances, which may result in missing

some correct behavior instances.
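The trade-off between confidence threshold and recall can be reproduced with a small threshold sweep. The sketch below assumes each detection has already been marked as a true or false positive, and it uses the convention that precision is 1.0 when no detection survives the threshold. The function name `sweep` and the toy detections in the usage note are hypothetical, not experimental data.

```python
from typing import List, Tuple


def sweep(dets: List[Tuple[float, bool]], n_truth: int,
          thresholds: List[float]) -> List[Tuple[float, float, float, float]]:
    """Return (threshold, precision, recall, f1) for each threshold.

    dets holds one (confidence, is_true_positive) pair per detection;
    n_truth is the number of ground-truth instances.
    """
    rows = []
    for t in thresholds:
        # Keep only detections at or above the confidence threshold.
        kept = [ok for conf, ok in dets if conf >= t]
        tp = sum(kept)
        precision = tp / len(kept) if kept else 1.0  # assumed convention
        recall = tp / n_truth if n_truth else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        rows.append((t, precision, recall, f1))
    return rows
```

With the toy detections `[(0.95, True), (0.8, True), (0.6, False), (0.4, True)]`, four ground-truth instances, and thresholds 0.3, 0.7, and 0.9, recall falls from 0.75 to 0.5 to 0.25 while precision rises from 0.75 to 1.0, which mirrors the behavior described above.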

These charts provide a comprehensive perspective on the performance of the model, helping to
understand its behavior at different confidence thresholds and providing a basis for further
optimizing the model. The performance changes during model training are shown in Figure 6.

Figure 6: Performance changes during model training (Picture credit: Original)

5 CONCLUSIONS

This experiment utilized the YOLOv7 algorithm to conduct image recognition for three types of
classroom behaviors: "raising hands", "reading", and "writing". By constructing and annotating a
high-quality dataset, we trained and optimized the model, thereby enhancing its accuracy and
robustness in recognition. The experimental results demonstrated that the model performed best
in recognizing the "raising hands" behavior and exhibited extremely high precision at high
confidence levels. As the confidence level increased, however, the recall rate decreased.

In addition, YOLOv7 can be effectively applied to

classroom behavior recognition, providing technical

support for intelligent classroom management

systems and helping teachers understand student

dynamics in real time, thereby improving teaching

efficiency. Future work will focus on further

optimizing the model to enhance its ability to

recognize more behaviors and exploring its

applicability in different teaching environments to

achieve wider application.

These findings not only confirmed the

effectiveness of the YOLOv7 algorithm in the field of

classroom behavior recognition but also provided

solid technical support for intelligent classroom

management systems. By capturing and analyzing

students' behavior data in real time, teachers can more

intuitively understand the classroom atmosphere and

student dynamics, enabling them to adjust teaching

strategies in a timely manner and improving teaching

efficiency. In addition, intelligent classroom

management systems can provide data support for

personalized learning, helping students receive more

precise learning guidance.

Looking ahead, researchers will continue to optimize the YOLOv7 model to enhance its ability
to recognize more behaviors.

At the same time, researchers will explore the

potential of this algorithm in different environments


to achieve wider application. Researchers believe that

with the continuous progress and improvement of

technology, intelligent classroom management

systems will play an increasingly important role in the

education field, creating a more efficient and

convenient learning environment for teachers and

students. Therefore, conducting in-depth research and

exploration on the YOLOv7 algorithm and its

application in classroom behavior recognition has

important practical significance and broad

application prospects.

REFERENCES

Chen, L., Yang, J., Cao, T., Zheng, Y., Wang, Y., Zhang,

B., Lin, Z., & Li, W. 2025. A Self-distillation Object

Segmentation Method Based on Transformer Feature

Pyramid. Journal of Electronics & Information

Technology.

Chen, Y., Li, J., & Liang, J. 2024. Multi-scale feature cross

fusion for automatic grading network of sketch images.

School of Software Engineering, South China Normal

University.

Li, K. 2024. Research and application of Classroom Abnormal behavior Detection Algorithm based on
YOLOv7 (Master's thesis, Xi'an University of Architecture and Technology).
https://link.cnki.net/doi/10.27393/d.cnki.gxazu.2024.000163.
doi: 10.27393/d.cnki.gxazu.2024.000163.

Redmon, J., & Farhadi, A. 2018. YOLOv3: An incremental

improvement. arXiv preprint arXiv:1804.02767.

Tu, C. 2023. Research on Lightweight Technology and Application based on YOLOv5 Target Detection
Network (Master's thesis, Southwest Jiaotong University).
https://link.cnki.net/doi/10.27414/d.cnki.gxnju.2023.000866.

Wang, C., et al. 2022. YOLOv7: A high-performance

detector for real-time object detection. arXiv preprint

arXiv:2207.08430.

Zhang, X. 2024. Research on Multi-rotor UAV Recognition Algorithm and System based on YOLOv7
Series (Master's thesis, Jiangxi University of Science and Technology).
https://link.cnki.net/doi/10.27176/d.cnki.gnfyc.2024.000293.
doi: 10.27176/d.cnki.gnfyc.2024.000293.

Zhang, Y., & Li, J. 2025. Scale-free network Robust optimization algorithm against malicious
attacks based on multi-granularity integration. Journal of Kunming University of Science and
Technology (Natural Science Edition), 1-20. doi: 10.16112/j.cnki.53-1223/n.2025.01.231.

Zhang, Y., et al. 2021. Classroom behavior recognition

based on deep learning technology. Journal of

Educational Technology.

Zhou, X., Wang, K., Zhou, X., & Han, J. 2024. An electric

vehicle helmet wearing detection algorithm based on

YOLOv10n. Electronic Measurement Technology.
