Design and Implementation of an Intelligent Data Analysis Platform
Combining Large Models and Cloud Computing
Dayi Wang and Pengpeng Liu
Naval Research Institute, Beijing 100161, China
Keywords: Large Model, Cloud Computing, Intelligent Data Analysis Platform, Design & Implementation.
Abstract: The design and implementation of an intelligent data analysis platform combining large models and cloud
computing will effectively solve the problem of large-scale heterogeneous data processing. In this paper, the
platform is designed and implemented, which will improve the efficiency of data processing based on the
distributed architecture of cloud computing, and improve the accuracy of data analysis by combining large
models. In the process of research, this paper gradually builds a complete platform from data collection and
processing, storage, analysis and prediction. The final experimental data shows that the platform has low
latency when processing massive data, and at the same time, the prediction accuracy is 96%, which has high
reliability and scalability. Research has shown that the platform not only addresses the challenges of today's
data growth and complexity, but also provides extremely powerful technical support for tomorrow's intelligent
data analysis.
1 INTRODUCTION
Because of the rapid development of big data, how to
effectively process and analyze massive
heterogeneous data has become a hot research topic.
Some researchers have proposed that these problems
can be solved based on enhanced computing power
and traditional data processing tools, but due to the
large amount of data, traditional methods cannot
adapt to complex application scenarios. Some
researchers have also proposed that data can be
decomposed based on distributed intelligent data for
parallel computing, but this method is not efficient
enough to deal with real-time data and cannot
effectively handle the problem of sudden data surge.
This paper uses the combination of intelligent
algorithms and cloud computing to solve these
problems based on the scalability of cloud computing
and the deep learning ability of large models. This
method can achieve efficient data processing and
accurate analysis, and can adapt to complex data
scenarios to provide truly effective and intelligent
support for urban management and business analysis.
2 RELATED WORKS
2.1 Large Model Theory
Large models are a new technology that has emerged
in the field of data analysis and artificial intelligence
in recent years, which is mainly reflected in its
excellent performance in processing large-scale data
and complex pattern recognition (Ata, Gökce, et al.
2024). Based on a large number of parameters and
hierarchical structures, large models can capture deep
features in the data, so as to play a significant role in
tasks such as image recognition and natural language
processing. The training of large models generally
requires a large dataset and powerful computing
resources (Fei, Jiang, et al. 2024), which is also an
advantage that cloud computing can perform. In the
intelligent data analysis platform, large models can
improve the accuracy of data processing, and based
on deep learning algorithms, potential correlations in
data can be mined to achieve more complex analysis
tasks (Feng and Ji, 2024). For example, in traffic
forecasting, large models can accurately predict
traffic flow changes based on the combination of
historical data and real-time data, and provide
decision support for traffic management.
Wang, D. and Liu, P.
Design and Implementation of an Intelligent Data Analysis Platform Combining Large Models and Cloud Computing.
DOI: 10.5220/0013534300004664
Paper published under CC license (CC BY-NC-ND 4.0)
In Proceedings of the 3rd International Conference on Futuristic Technology (INCOFT 2025) - Volume 1, pages 5-10
ISBN: 978-989-758-763-4
Proceedings Copyright © 2025 by SCITEPRESS Science and Technology Publications, Lda.
5
2.2 Cloud Computing Theory
Cloud computing is a key component of modern
information technology, which emphasizes the
virtualization of computing and storage resources
based on the network and the provision of services to
users (Gao, Qiu, et al. 2024). Cloud computing
features high scalability, flexibility, and high
availability, making it a good choice for large-scale
data analysis. In an intelligent data analysis platform,
cloud computing will provide powerful computing
power, distributed computing, and large-scale data
storage (Guo, Mu, et al. 2024). Based on cloud
computing, intelligent data can dynamically allocate
resources to effectively respond to changes in data
volume and fluctuations in computing tasks. In
addition, it supports parallel computing and multi-
node collaboration, which is very conducive to the
training and inference of large models (Li, Wu, 2024).
For example, in a real-time data analysis of intelligent
data, cloud computing can quickly process real-time
streaming data from multiple large models, and
conduct real-time analysis and response, thereby
improving the response speed and processing
efficiency of intelligent data.
3 METHODS
3.1 Description of the Intelligent Data
Analysis Process
The intelligent data analysis platform has as many as
6 main modules, each of which is responsible for a
specific task, so that the data can be processed and
analyzed efficiently. The data acquisition module is
responsible for obtaining raw data from a variety of
sources. It supports real-time and batch data streams,
based on API interfaces, cloud computing servers,
and database connections to obtain data (Li, Ma, et al.
2024). In addition, it can also realize data cleaning
and format conversion to ensure that the input data
can meet the analysis needs. The data storage module
is responsible for managing and storing all the data on
the platform. It can be beneficial for both structured
and unstructured data storage, such as databases,
distributed file intelligence data, etc. In addition, it
also has a data backup and recovery mechanism,
which can effectively ensure the security and
durability of data. The data processing module is
responsible for preprocessing and formatting the
collected data. It provides parallel computing and
distributed processing capabilities to address the
analysis needs of large-scale datasets. The model
training module is responsible for using the data from
the platform (Su and Yang, 2024) to complete the
training of the machine learning model. It enables
distributed training, automatic hyperparameter
tuning, and flexible switching of model types based
on task requirements. In addition, the module
provides the evaluation function of training results,
which makes the model highly stable, valid, and
correct. The model prediction module is responsible
for making real-time predictions on new input data. It
can use the trained model in a production
environment and complete the steps to output the
predictions. The module supports high-volume
concurrent forecasting and automatically adjusts
compute resources to respond to potentially
fluctuating forecasted demand (Yang, 2024). The
Visualization & Reporting module is responsible for
presenting the results of the analysis and generating
reports. The module provides a variety of charts and
dashboards to visually display the operation of the
model and its prediction results. The module also
supports automatic report generation and export
functions for decision support.
3.2 The Process of Cloud Computing
In this paper, we apply large model and cloud
computing to process complex datasets in intelligent
data analysis. The model can represent the
intelligence of data as a graph structure and cluster
based on the spectral decomposition of the graph,
which is suitable for various types of intelligent
analysis tasks, such as classification,
recommendation, and cluster analysis. In intelligent
data analysis, a similarity matrix should be
constructed based on the similarity between the
intelligences of the data. Whether the intelligence of
the data refers to user behavior and device cloud
server data, or different categories of documents, their
similarity can reflect the correlation and connection
between the data and the data. For the calculation of
similarity, see Eq. (1).
𝑊

=𝑒𝑥𝑝
|𝑥
−𝑥
|
2
2𝜎
2
(1
)
In this formula,𝑊

the elements representing the
first row and𝑖 column in the matrix 𝑗refer to the
intelligence of the two data, that is,𝑥
the similarity of
the and𝑥
. High similarity means that the intelligence
of the two data has similar characteristics, and low
similarity means that the intelligence of the two data
INCOFT 2025 - International Conference on Futuristic Technology
6
is very different. For example, in data analysis, if the
similarity between the intelligence of two data is high,
it means that they have the same trend and
performance in some specific characteristics. 𝑥
𝑥
Represents the intelligence of the data𝑥
and 𝑥
the
distance between them, commonly used is the
Euclidean distance. The larger the distance,
indicating that they are significantly different; The
smaller distances indicate that they have similar
characteristics. 𝜎A parameter representing the
Gaussian kernel that controls the decay rate of the
similarity value. If it 𝜎is relatively large, it means that
even if the distance is large, its similarity will not
decay too quickly; If the 𝜎comparison is small, the
similarity will decay rapidly. For example, if you
analyze different categories of documents, the larger
ones 𝜎will make the smart data more tolerant of small
differences and identify larger groups. The
construction of Laplace matrix based on similarity
matrix can reveal the graph structure relationship
between data intelligence and data intelligence, and is
conducive to subsequent clustering operations. The
formula for calculating the Laplace matrix is given in
Eq. (2).
𝐿 = 𝐷 𝑊
(2
)
In this equation, 𝐿the Laplace matrix is
depresented, which can be calculated based on the
degree matrix𝐷 and the similarity matrix𝑊. This
value mainly reflects the structural characteristics of
the graph, and is used to describe the relationship
between the intelligence of the data, so as to provide
a solid basis for the decomposition of the eigenvalues
in the future. For example, in a certain intelligent
monitoring intelligence data, the Laplace matrix can
help researchers identify which data intelligence,
such as which device signals, belong to similar
activity patterns. Based on the spectral decomposition
of the Laplace matrix, the𝑘 eigenvectors
corresponding to the minimum eigenvalues will be
obtained to form the eigenmatrix
(
𝑈
)
. Subsequently,
the rows in the feature matrix can be clustered k-
means. See Eq. (3) for this.
Cluster(𝑋)=𝑘-means(𝑈)
(3
)
In this formula, 𝑘the number of clusters is
described, that is, the number of clusters to be divided
into different clusters for the intelligence of the data.
This value can be automatically selected based on the
nature of the data. For example, in the intelligent data
analysis platform, it may be necessary to divide the
cloud computing server data into several categories,
based on which different working states and anomaly
detection can be represented.
(
𝑈
)
The representative
feature matrix is obtained from spectral
decomposition, containing the previous 𝑘feature
vector, which is mainly used to map high-
dimensional data to a new, low-dimensional space to
complete clustering. For example, in a financial risk
analysis intelligence, a feature matrix identifies the
behavior patterns of different users and divides them
into groups based on risk.
3.3 Model Training and Optimization
In the cloud computing environment, the training
process of the model is mainly to construct the phase
degree matrix, calculate the Laplace matrix, and
perform spectral decomposition. Based on parallel
computing, these calculations can be greatly
accelerated, allowing intelligent data to process
massive amounts of data. For example, in a
monitoring intelligent data, the intelligent data can
process the signals of multiple devices in parallel to
quickly identify the working status of the equipment.
During the optimization of the model, the number
of clusters needs to be determined automatically𝑘,
which requires the application of the maximum
spectral gap method. This method is based on
observing the eigenvalues of the Laplace matrix to
find the natural cluster division points, without
manual setting. Eigenvalue gaps represent natural
groupings in the data based on the differences
between the observed eigenvalues. For example,
when analyzing the data in a logistics intelligent data,
the number of clusters is automatically selected, so
that the intelligent data can divide the freight volume
into different intervals, which is convenient for
optimizing scheduling.
In the clustering process, the final impact of noise
can be suppressed based on regularization to avoid
the interference of clustering results. At the same
time, the robustness of the model can be improved.
See Eq. (4) for this.
𝑚𝑖𝑛
T
r
(𝐻
𝐿𝐻) + 𝛼|𝐻|

2
(4
)
In this formula, 𝛼it can be used to control the
strength of the regularization term, which determines
the sensitivity of the model to noise. If the value is
large, then the noise can be suppressed, and if the
value is small, then the model is allowed to cluster
more closely.
Design and Implementation of an Intelligent Data Analysis Platform Combining Large Models and Cloud Computing
7
4 RESULTS AND DISCUSSION
4.1 Improvement Cases of Cloud
Computing and Large Model
Integration
The large-scale smart city project in the eastern
province introduces the intelligent data analysis
platform that combines the large model and cloud
computing designed this time, aiming to solve the
increasingly severe traffic congestion problem and
improve the operational efficiency of urban traffic.
The processed data are unstructured data, qualitative
data and quantitative data, and there is no correlation
between the data, and the data are all mapped and
processed, and the data correlation is strong. In order
to simplify the complexity of data processing,
improve the feasibility of data, take the traffic flow
and the travel demand of citizens as the research field,
because there is also a sharp increase in the problem
in this field, the city manager now needs to use the
intelligent data, based on the data analysis platform to
do accurate traffic flow prediction, traffic accident
warning, road condition monitoring and signal
optimization, as shown in Table 1.
Table 1: Statistics of core data collected on a daily basis
data t
yp
e Data volume
(
TB
)
Traffic monitorin
g
180
Road condition sensing 130
Meteorological data 90
Vehicle dynamics 100
Citizen feedbac
k
40
Emer
g
encies 60
environmental monitoring 50
Table I shows the amount of data collected by
Smart Data from multiple domains on a daily basis,
reflecting that traffic monitoring and road condition
sensing are the main data sources. The intelligent data
continuously obtains data from tens of thousands of
cloud computing servers, surveillance cameras, and
citizen feedback channels every day, covering real-
time traffic flow, traffic accidents, and emergencies.
With more than 500 terabytes of data streaming per
day, this intelligent data needs to be efficiently stored,
distributed, and analyzed in real time using cloud
computing technology. The platform processes more
than 500 terabytes of data per day, covering a number
of key areas, mainly traffic monitoring, road
condition sensing, weather data, and vehicle
dynamics, as shown in Figure 1.
As can be seen from Figure 1, intelligent data is
the comprehensive result of large models and cloud
computing, and is at the top of data processing.
Among them, traffic monitoring data accounts for the
largest proportion, reaching 180 terabytes per day,
which indicates that traffic flow is the core content of
the platform's key monitoring. Based on the real-time
processing of this data, intelligent data can accurately
predict future traffic conditions and propose
signalable optimization schemes to improve road
efficiency.
Underlying data
Integrated Platform
Cloud Computing
Large model
Smart Data
Figure 1: Intelligent data processing process of large model
and cloud computing
4.2 The Degree of Optimization of Key
Indicators in the Platform
The core of intelligent data correlation is to ensure
that all modules can work together efficiently with
each other. Using standardized interfaces and APIs,
there will be a seamless data flow between each
module, and then all aspects of the entire intelligent
data will be closed. Intelligent data will use
middleware to manage the communication between
modules and ensure that information can be delivered
in a timely and accurate manner for specific metric
optimization, as shown in Table 2.
Table 2: Key performance indicators after intelligent data
association
Performance metrics Numerical
optimization quantit
Avera
g
e res
p
onse time
(
ms
)
350
Data
p
rocessin
g
s
p
eed
(
TB/s
)
1.8
Prediction Accurac
y
(
%
)
96
Anomaly Detection Accuracy
(
%
)
93
Equipment failure early
warning rate (%)
90
Traffic Signal Optimization
Improvement Rate (%)
88
INCOFT 2025 - International Conference on Futuristic Technology
8
Table II illustrates the key performance of smart
data, from response time to prediction accuracy and
other key performance indicators for efficient
operation. From the data reflection, the intelligent
data has a strong data processing ability, which can
use the speed of 1.8TB/s to process large-scale data
in real time, and the prediction accuracy of the
intelligent data is 96%, indicating that its application
in traffic prediction and traffic management is very
significant. Moreover, the anomaly detection
accuracy rate of intelligent data is 93%, and the early
warning rate of equipment failure is 90%, which
means that it can effectively identify traffic
anomalies, such as accidents, congestion, and sudden
road closures. At the same time, it also shows that the
intelligent data can show high reliability in equipment
monitoring and maintenance management, and timely
detect and prevent potential equipment failures to
ensure the continuous and stable operation of the
intelligent data, as shown in Figure 2.
Figure 2: Levels of intelligent data analysis
As shown in Figure 2, the support of the cloud
computing platform will make data analysis very
intelligent and flexible to adapt to the needs of
concurrent data processing, and further enable the
intelligent data to still operate intelligently when
processing massive amounts of data. In addition,
intelligent data association also attaches great
importance to security management, such as
encrypted transmission, access control, and logging,
which can ensure the reliability and data security of
intelligent data (Zhao, Wu, et al. 2024). In addition,
with the automatic scaling function of the cloud,
intelligent data can flexibly allocate computing
resources according to changes in business needs, so
as to achieve high availability and dynamic
scalability.
4.3 Statistics on the Effect of Intelligent
Data Processing in Different
Months
In a predictive analytics intelligence data, a larger
regularization coefficient can make the intelligence
more stable when it finds anomalous data. The
parallel computing capability based on the cloud
computing platform can accelerate the process of
matrix factorization and clustering, especially the
process of similarity matrix and eigenvalue
calculation. For example, in a smart city management
intelligent data, the intelligent data can process a large
amount of cloud computing server data based on
parallelization to make real-time judgments and
optimize resource allocation, and the specific
statistical results are shown in Table 3.
Table 3: Statistics on the number of monthly processing
events of intelligent data
Month Data processing capacity
(
TB
)
Januar
y
1200
Februar
y
1350
March 1400
A
p
ril 1550
Ma
y
1600
Table 3 shows the amount of data processed by
the smart data in different months, showing that the
data processing demand for the smart data continues
to grow over time. From the perspective of monthly
data processing, it can be seen that the amount of data
processed by this intelligent data has increased over
time. The amount of data processed in January was
1,200 TB, and in May it has increased to 1,600 TB, as
shown in Figure 3.
Figure 3: Intelligent scoring and statistical ratio
As can be seen from Figure 3, with the access of
more cloud computing servers and large models, the
load of intelligent data gradually increases, and the
Design and Implementation of an Intelligent Data Analysis Platform Combining Large Models and Cloud Computing
9
platform has good scalability to adapt to the growing
demand for data processing. On the whole, the
intelligent data analysis platform has excellent
performance in various aspects such as traffic flow
prediction and anomaly detection, and intelligent data
scalability for equipment failure warning. The
platform can provide city managers with intelligent
traffic management and optimization solutions, and
has high flexibility and scalability to cope with future
data growth.
5 CONCLUSIONS
This paper designs and implements an intelligent data
analysis platform that combines large models and
cloud computing, and demonstrates its strong ability
to process large-scale, multi-source heterogeneous
data. Based on the combination of cloud computing
technology and large models, the platform has
intelligent parallel processing, distributed computing
capabilities, and can ensure fast response and stability
when dealing with massive data. In addition, the
application of large models in the platform can
improve the accuracy and prediction ability of data
analysis, and then cope with complex data
environments. In short, in this paper, the elastic
expansion capability provided by cloud computing
can ensure the continuous and dynamic expansion of
the platform, and provide a strong technical guarantee
for future intelligent data analysis. At the same time,
it lays the foundation for the future intelligent
development. Although this article has been
improved in many aspects, there will still be errors
and omissions, and I hope that the data analysis part
can be expanded in the future. There are some
limitations in this study, mainly because the
application time of large models is short, and more
practical cases are needed to support it, and related
research will be focused on in the future.
REFERENCES
Ata, Y., Gökce, M. C., & Baykal, Y. (2024). Intelligent
Reflecting Surface Aided Vehicular Optical Wireless
Communication Systems Using Higher-Order Mode in
Underwater Channel. Ieee Transactions on Vehicular
Technology, 73(8), 11196-11208.
Fei, J., Jiang, X., Yang, H. W., Fan, K., Che, Y. M., Sun,
B., et al. (2024). Research and Development of a Big
Data Application Platform for Intelligent Blast Furnace
Intensive Management and Control. Acs Omega, 9(23),
24674-24684.
Feng, L. Y., & Ji, Y. F. (2024). LEARNERS BEHAVIOUR
PREDICTION AND ANALYSIS MODEL FOR
SMART LEARNING PLATFORM USING DEEP
LEARNING APPROACH. Scalable Computing-
Practice and Experience, 25(5), 3876-3885.
Gao, H. H., Qiu, B. Y., Wang, Y., Yu, S., Xu, Y. S., &
Wang, X. H. (2024). TBDB: Token Bucket-Based
Dynamic Batching for Resource Scheduling Supporting
Neural Network Inference in Intelligent Consumer
Electronics. Ieee Transactions on Consumer
Electronics, 70(1), 1134-1144.
Guo, Q. Z., Mu, L., & Lou, S. (2024). Revolutionizing
travel experiences: An in-depth analysis of intelligent
booking systems and behavioral patterns. Intelligent
Decision Technologies-Netherlands, 18(2), 1477-1494.
Li, C. P., Wu, L. H., Shu, C., Bao, Y. M., Ma, J. C., & Song,
S. H. (2024). Data-driven public health security.
Chinese Science Bulletin-Chinese, 69(9), 1156-1163.
Li, M. G., Ma, M., Wang, L., Pei, Z., Ren, J., & Yang, B.
(2024). Multiagent Deep Reinforcement Learning
Based Incentive Mechanism for Mobile Crowdsensing
in Intelligent Transportation Systems. Ieee Systems
Journal, 18(1), 527-538.
Su, Y. J., & Yang, S. Y. (2024). A User-friendly Cloud-
based Multi-agent Information System for Smart
Energy-saving. Journal of Internet Technology, 25(2),
293-300.
Yang, C. (2024). The development and application of an
intelligent detection and evaluation system for drilling
fluid. Journal of Thermal Analysis and Calorimetry,
149(8), 3415-3425.
Zhao, X. Y., Wu, Z. Q., Liu, Y. L., Zhang, H. L., Hu, Y. R.,
Yuan, D., et al. (2024). Eyecare-cloud: an innovative
electronic medical record cloud platform for pediatric
research and clinical care. Epma Journal, 15(3), 501-
510.
INCOFT 2025 - International Conference on Futuristic Technology
10