
study. Two factors support this choice. First, as a
small city, Alvorada lacks standardized crime data,
which makes it a suitable candidate for testing
alternative data collection methods. Second, its
history provides a rich background for analysis: it
was once ranked as Brazil’s sixth most violent city
(Mendes, 2024) and has since reached its lowest
homicide rates in recent history. Although our study
centers on Alvorada, the framework we have developed
can be applied to any city lacking standardized crime
data. Our approach encompasses the following steps:
1. Data Collection: Web scraping of crime-related
news from two local news platforms (G1¹ and
O Alvoradense²) to create a comprehensive crime
incident dataset.
2. AI-Powered Data Analysis: Implementation of
advanced AI techniques to extract and process rel-
evant information from the collected news arti-
cles, converting unstructured text data into a struc-
tured dataset with standardized crime categories
and details.
3. Geographical Mapping and Analysis: Generation
of crime heat maps and spatial analysis tools to
identify potential hotspots and patterns across the
city, providing visual insights into crime distribu-
tion.
4. Predictive Modeling: Development and applica-
tion of machine learning models to analyze crime
patterns and generate probability-based predic-
tions, offering insights into potential future crime
trends and risk areas.
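The four steps above can be pictured as a small data pipeline. The Python sketch below is illustrative only and is not the implementation used in this study: the `CrimeRecord` fields, the keyword table, the 0.01-degree grid, and the sample articles with their coordinates are all assumptions made for the example. A plain keyword matcher stands in for the AI-based extraction of step 2, and per-cell incident shares stand in for the machine learning models of step 4.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Hypothetical keyword table: a stand-in for the AI-based extraction (step 2).
CATEGORY_KEYWORDS = {
    "homicide": ("homicide", "murder"),
    "robbery": ("robbery", "robbed"),
    "drug trafficking": ("trafficking",),
}


@dataclass(frozen=True)
class CrimeRecord:
    """One structured incident derived from an unstructured news text."""
    category: str
    lat: float
    lon: float


def categorize(text: str) -> Optional[str]:
    """Return the first category whose keyword occurs in the text, else None."""
    lowered = text.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return None


def grid_cell(lat: float, lon: float, cell_deg: float = 0.01) -> tuple:
    """Map a coordinate to an integer grid cell (assumed cell size in degrees)."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def hotspot_counts(records, cell_deg: float = 0.01) -> Counter:
    """Count incidents per grid cell; the raw input to a heat map (step 3)."""
    return Counter(grid_cell(r.lat, r.lon, cell_deg) for r in records)


def cell_shares(counts: Counter) -> dict:
    """Share of all incidents per cell; a naive baseline, not a trained model."""
    total = sum(counts.values())
    return {cell: n / total for cell, n in counts.items()} if total else {}


# Made-up articles with made-up coordinates, for illustration only (step 1).
ARTICLES = [
    ("Man robbed at bus stop", -29.9912, -51.0834),
    ("Police investigate murder near the square", -29.9935, -51.0851),
    ("Armed robbery at a pharmacy", -30.0051, -51.0712),
    ("City council approves new budget", -29.9900, -51.0800),
]

records = []
for text, lat, lon in ARTICLES:
    category = categorize(text)
    if category is not None:  # texts without a crime keyword are dropped
        records.append(CrimeRecord(category, lat, lon))

counts = hotspot_counts(records)
shares = cell_shares(counts)
```

Keeping each step as a separate function means any one of them (for example, the keyword matcher) could be swapped for a more capable component without touching the others.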
Through the application of this framework, we ad-
dress two key challenges: the lack of standardized
crime data at the municipal level and the potential in-
tegration of web scraping and machine learning tools
into smart city safety planning. While our approach
provides innovative solutions for data collection and
analysis, it is designed to augment rather than re-
place human analysts, serving as a complementary
tool in their decision-making process. Our method-
ology focuses specifically on crime pattern analysis
and prediction, acknowledging that the broader so-
cioeconomic causes of criminal behavior lie beyond
the scope of this research.
This paper is organized as follows: Section 2
presents related work in crime prediction and pattern
analysis; Section 3 discusses the state of the art in
web scraping and AI applications for crime analysis;
Section 4 details our methodology and experimental
setup, including the implementation of web scraping
and machine learning components; Section 5 presents
our results and analysis of the crime dataset; and
finally, Section 6 concludes the paper and discusses
future work.
¹ https://g1.globo.com/
² https://oalvoradense.com.br/
2 RELATED WORK
A comprehensive scientometric analysis of crime pre-
diction and pattern analysis (CPPA) research over the
past decade reveals the growing integration of artifi-
cial intelligence in this field. According to the anal-
ysis, researchers have developed five distinct clus-
ters of AI methodologies applied to crime prediction,
demonstrating the field’s rapid evolution and diversi-
fication. The study highlights how the combination
of AI with traditional criminological approaches has
enabled more sophisticated and accurate crime pre-
diction models (Kaur and Saini, 2024).
The analysis particularly emphasizes the emer-
gence of several key research trends: the increasing
use of machine learning algorithms for pattern recog-
nition in criminal behavior, the development of pre-
dictive modeling techniques, and the growing impor-
tance of data preprocessing and feature selection in
crime analysis. These findings align with our ap-
proach of combining web scraping with AI analysis,
particularly in addressing the challenges of data col-
lection and standardization in small cities.
What makes this analysis particularly relevant to
our work is its identification of the growing trend to-
ward integrating diverse data sources and AI method-
ologies in crime prediction. While many studies focus
on large urban centers with established data collection
systems, our work extends these principles to small
cities where traditional data sources may be limited
or unavailable. This adaptation of advanced AI tech-
niques to local contexts represents an important evo-
lution in the field of crime prediction and analysis.
3 STATE OF THE ART
Web scraping typically involves three basic steps -
fetching, extracting, and transforming data - but it is
often treated as a peripheral tool rather than a core
methodology in research (NR et al., 2023). Our
study takes a more comprehensive approach, partic-
ularly given the sensitive nature of crime data col-
lection and analysis. We propose an enhanced web
scraping framework (illustrated in Figure 1) that em-
phasizes careful supervision and methodological rigor
throughout the data collection process.
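The three basic steps named above (fetching, extracting, and transforming) can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the enhanced framework proposed in this paper: the `h2` tag with class `headline` and the sample HTML are hypothetical and do not reflect the real markup of any news site. The `fetch` function is shown for completeness, but the demonstration runs on an inline HTML string so that it needs no network access.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


def fetch(url: str, timeout: float = 10.0) -> str:
    """Fetch step: download raw HTML (not called in the demo below)."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class HeadlineParser(HTMLParser):
    """Extract step: collect the text of <h2 class="headline"> elements.

    The tag and class name are assumptions made for this example.
    """

    def __init__(self):
        super().__init__()
        self._in_headline = False
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2" and ("class", "headline") in attrs:
            self._in_headline = True
            self.headlines.append("")

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_headline = False

    def handle_data(self, data):
        if self._in_headline:
            self.headlines[-1] += data


def extract(html: str) -> list:
    """Run the parser over one page and return the raw headline strings."""
    parser = HeadlineParser()
    parser.feed(html)
    parser.close()
    return parser.headlines


def transform(headlines: list) -> list:
    """Transform step: collapse whitespace and emit one record per headline."""
    return [{"title": " ".join(h.split())} for h in headlines if h.strip()]


# Hypothetical page content used in place of a live request.
SAMPLE_HTML = """
<html><body>
  <h2 class="headline">  Armed robbery reported
     downtown </h2>
  <h2 class="other">Weather update</h2>
  <h2 class="headline">Police seize drugs</h2>
</body></html>
"""

scraped = transform(extract(SAMPLE_HTML))
```

In practice, a real scraper would also need to respect each site's terms of use and rate limits, which fits the emphasis on careful supervision during data collection.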
Mapping and Predicting Crimes in Small Cities Using Web Scraping and Machine Learning