Design and Implementation of an Intelligent Data Analysis Platform Combining Large Models and Cloud Computing

Dayi Wang and Pengpeng Liu

Naval Research Institute, Beijing 100161, China

Keywords: Large Model, Cloud Computing, Intelligent Data Analysis Platform, Design & Implementation.

Abstract: An intelligent data analysis platform that combines large models with cloud computing can address the problem of processing large-scale heterogeneous data. This paper designs and implements such a platform: the distributed architecture of cloud computing improves the efficiency of data processing, and large models improve the accuracy of data analysis. The platform is built up step by step, covering data collection, processing, storage, analysis, and prediction. Experimental data show that the platform has low latency when processing massive data and reaches a prediction accuracy of 96%, with high reliability and scalability. The results indicate that the platform addresses the challenges of today's data growth and complexity and provides strong technical support for future intelligent data analysis.

1 INTRODUCTION

With the rapid development of big data, how to effectively process and analyze massive heterogeneous data has become a hot research topic. Some researchers have proposed solving these problems with enhanced computing power and traditional data processing tools, but because of the large volume of data, traditional methods cannot adapt to complex application scenarios. Others have proposed decomposing the data across distributed systems for parallel computing, but this approach is not efficient enough for real-time data and cannot effectively handle sudden data surges. This paper combines intelligent algorithms with cloud computing to solve these problems, drawing on the scalability of cloud computing and the deep learning ability of large models. The method achieves efficient data processing and accurate analysis, and adapts to complex data scenarios to provide effective, intelligent support for urban management and business analysis.

2 RELATED WORKS

2.1 Large Model Theory

Large models are a technology that has emerged in data analysis and artificial intelligence in recent years, notable for their excellent performance in processing large-scale data and recognizing complex patterns (Ata, Gökce, et al. 2024). With a large number of parameters and a hierarchical structure, large models can capture deep features in the data and therefore play a significant role in tasks such as image recognition and natural language processing. Training a large model generally requires a large dataset and powerful computing resources (Fei, Jiang, et al. 2024), which is where cloud computing offers an advantage. In the intelligent data analysis platform, large models improve the accuracy of data processing, and deep learning algorithms mine potential correlations in the data to support more complex analysis tasks (Feng and Ji, 2024). For example, in traffic forecasting, large models can accurately predict changes in traffic flow by combining historical and real-time data, and so provide decision support for traffic management.

Wang, D. and Liu, P.

Design and Implementation of an Intelligent Data Analysis Platform Combining Large Models and Cloud Computing.

DOI: 10.5220/0013534300004664

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 3rd International Conference on Futuristic Technology (INCOFT 2025) - Volume 1, pages 5-10

ISBN: 978-989-758-763-4

2.2 Cloud Computing Theory

Cloud computing is a key component of modern information technology. It virtualizes computing and storage resources over the network and provides them to users as services (Gao, Qiu, et al. 2024). Cloud computing offers high scalability, flexibility, and high availability, making it a good choice for large-scale data analysis. In an intelligent data analysis platform, cloud computing provides powerful computing power, distributed computing, and large-scale data storage (Guo, Mu, et al. 2024). With cloud computing, the system can allocate resources dynamically to respond to changes in data volume and fluctuations in computing tasks. It also supports parallel computing and multi-node collaboration, which benefits the training and inference of large models (Li, Wu, et al. 2024). For example, in real-time data analysis, cloud computing can quickly process real-time streaming data from multiple large models and analyze and respond to it in real time, thereby improving the response speed and processing efficiency of the system.

3 METHODS

3.1 Description of the Intelligent Data Analysis Process

The intelligent data analysis platform has six main modules, each responsible for a specific task, so that data can be processed and analyzed efficiently. The data acquisition module obtains raw data from a variety of sources. It supports real-time and batch data streams and obtains data through API interfaces, cloud computing servers, and database connections (Li, Ma, et al. 2024). It also performs data cleaning and format conversion to ensure that the input data meet the analysis needs. The data storage module manages and stores all the data on the platform. It supports both structured and unstructured data storage, such as databases and distributed file systems. It also has a data backup and recovery mechanism, which ensures the security and durability of data. The data processing module preprocesses and formats the collected data. It provides parallel computing and distributed processing capabilities to meet the analysis needs of large-scale datasets. The model training module uses the data from the platform (Su and Yang, 2024) to train the machine learning model. It supports distributed training, automatic hyperparameter tuning, and flexible switching of model types based on task requirements. In addition, the module evaluates the training results to verify that the model is stable, valid, and correct. The model prediction module makes real-time predictions on new input data. It deploys the trained model in a production environment and outputs the predictions. The module supports high-volume concurrent prediction and automatically adjusts compute resources to respond to fluctuating prediction demand (Yang, 2024). The visualization and reporting module presents the results of the analysis and generates reports. It provides a variety of charts and dashboards to display the operation of the model and its prediction results, and supports automatic report generation and export for decision support.
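The six modules can be pictured as stages that hand data from one to the next. The following is a minimal sketch under stated assumptions, not the platform's implementation: the function names, the `flow` field, the in-memory `STORE` list, and the mean predictor that stands in for a trained model are all hypothetical placeholders chosen for illustration.

```python
from functools import reduce

STORE = []  # hypothetical in-memory stand-in for the data storage module


def acquire(raw):
    """Data acquisition: keep only records that carry a numeric 'flow' field."""
    return [r for r in raw if isinstance(r.get("flow"), (int, float))]


def store(records):
    """Data storage: persist the cleaned records and pass them on."""
    STORE.extend(records)
    return records


def process(records):
    """Data processing: reduce each record to the feature used downstream."""
    return [float(r["flow"]) for r in records]


def train(values):
    """Model training: a mean predictor stands in for a real model."""
    return sum(values) / len(values)


def predict(model):
    """Model prediction: the toy model simply predicts its stored mean."""
    return model


def report(prediction):
    """Visualization and reporting: format the result as a one-line report."""
    return f"predicted flow: {prediction:.1f}"


STAGES = [acquire, store, process, train, predict, report]


def run_pipeline(raw):
    """Feed the output of each stage into the next one, in order."""
    return reduce(lambda data, stage: stage(data), STAGES, raw)


raw = [{"flow": 120}, {"flow": None}, {"flow": 80}]  # illustrative records
summary = run_pipeline(raw)
print(summary)
```

Keeping each module behind a single function boundary is one way to let stages be replaced or scaled independently; the real platform may organize these responsibilities differently.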

3.2 The Process of Cloud Computing

In this paper, we apply large models and cloud computing to process complex datasets in intelligent data analysis. The model represents the data objects as a graph and clusters them based on the spectral decomposition of that graph, which suits various intelligent analysis tasks such as classification, recommendation, and cluster analysis. First, a similarity matrix is constructed from the pairwise similarity between data objects. Whether the data objects are user behavior records, device and cloud server data, or documents of different categories, their similarity reflects the correlation and connection between them. The similarity is calculated as in Eq. (1).

$$W_{ij} = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right) \qquad (1)$$

In this formula, $W_{ij}$ is the element in row $i$ and column $j$ of the similarity matrix, that is, the similarity between the two data objects $x_i$ and $x_j$. High similarity means that the two data objects have similar characteristics, and low similarity means that they are very different. For example, in data analysis, if the similarity between two data objects is high, they show the same trend and behavior in some specific characteristics. $\|x_i - x_j\|$ is the distance between the data objects $x_i$ and $x_j$, commonly the Euclidean distance. A larger distance indicates that they are significantly different; a smaller distance indicates that they have similar characteristics. $\sigma$ is the parameter of the Gaussian kernel that controls the decay rate of the similarity value. If $\sigma$ is relatively large, the similarity does not decay too quickly even when the distance is large; if $\sigma$ is relatively small, the similarity decays rapidly. For example, when analyzing documents of different categories, a larger $\sigma$ makes the system more tolerant of small differences and lets it identify larger groups. Constructing the Laplacian matrix from the similarity matrix reveals the graph structure among the data objects and supports the subsequent clustering. The Laplacian matrix is calculated as in Eq. (2).

$$L = D - W \qquad (2)$$

In this equation, $L$ is the Laplacian matrix, calculated from the diagonal degree matrix $D$, with $D_{ii} = \sum_j W_{ij}$, and the similarity matrix $W$. It reflects the structural characteristics of the graph and describes the relationships among the data objects, providing a solid basis for the subsequent eigenvalue decomposition. For example, in an intelligent monitoring system, the Laplacian matrix can help researchers identify which data objects, such as which device signals, belong to similar activity patterns. From the spectral decomposition of the Laplacian matrix, the $k$ eigenvectors corresponding to the smallest eigenvalues are obtained and form the feature matrix $U$. The rows of the feature matrix are then clustered with k-means, as in Eq. (3).

$$\mathrm{Cluster}(X) = k\text{-means}(U) \qquad (3)$$

In this formula, $k$ is the number of clusters, that is, the number of groups into which the data objects are divided. This value can be selected automatically based on the nature of the data. For example, in the intelligent data analysis platform, it may be necessary to divide the cloud computing server data into several categories that represent different working states and support anomaly detection. $U$ is the feature matrix obtained from the spectral decomposition; it contains the first $k$ eigenvectors and maps the high-dimensional data to a new, low-dimensional space in which the clustering is carried out. For example, in a financial risk analysis system, the feature matrix identifies the behavior patterns of different users and divides them into groups based on risk.
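The steps in Eqs. (1) to (3) can be sketched end to end in a few lines of NumPy. This is an illustrative sketch rather than the platform's code: the toy points, the choice `sigma=1.0`, and the small hand-written k-means with farthest-point initialization are assumptions made here so that the example is self-contained and deterministic.

```python
import numpy as np


def gaussian_similarity(X, sigma):
    """Eq. (1): W_ij = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    diff = X[:, None, :] - X[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def laplacian(W):
    """Eq. (2): L = D - W, with D the diagonal degree matrix."""
    return np.diag(W.sum(axis=1)) - W


def spectral_embedding(L, k):
    """Eigenvectors of the k smallest eigenvalues, stacked as columns of U."""
    _, eigvecs = np.linalg.eigh(L)  # eigh returns eigenvalues in ascending order
    return eigvecs[:, :k]


def kmeans(U, k, n_iter=50):
    """Plain Lloyd's k-means on the rows of U (Eq. 3), farthest-point init."""
    centers = [U[0]]
    for _ in range(1, k):
        d = np.min([np.sum((U - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = np.sum((U[:, None, :] - centers[None, :, :]) ** 2, axis=-1)
        labels = np.argmin(dists, axis=1)
        new_centers = np.array([
            U[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Two well-separated toy groups (illustrative data only).
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
W = gaussian_similarity(X, sigma=1.0)
U = spectral_embedding(laplacian(W), k=2)
labels = kmeans(U, k=2)
print(labels)
```

Because the rows of $U$ are nearly identical within a group and clearly different across groups, even a very simple k-means separates the two groups of points; on real data a library implementation with multiple restarts would usually be preferable.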

3.3 Model Training and Optimization

In the cloud computing environment, training the model mainly consists of constructing the similarity matrix, calculating the Laplacian matrix, and performing the spectral decomposition. With parallel computing, these calculations can be greatly accelerated, allowing the system to process massive amounts of data. For example, a monitoring system can process the signals of multiple devices in parallel to quickly identify the working status of the equipment.
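One way to picture the parallel step is to split the similarity matrix into row blocks and compute each block independently. The sketch below is an assumption-laden illustration: a local thread pool stands in for cloud worker nodes, the data are synthetic, and the names `similarity_block` and `parallel_similarity` are hypothetical. The paper does not specify which distributed framework the platform uses.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def similarity_block(X, rows, sigma):
    """Rows `rows` of the Gaussian similarity matrix from Eq. (1)."""
    diff = X[rows, None, :] - X[None, :, :]
    return np.exp(-np.sum(diff ** 2, axis=-1) / (2.0 * sigma ** 2))


def parallel_similarity(X, sigma, n_blocks=4, max_workers=4):
    """Assemble W from row blocks computed by a pool of workers."""
    blocks = np.array_split(np.arange(len(X)), n_blocks)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves block order, so the stacked rows line up with X.
        parts = list(pool.map(lambda rows: similarity_block(X, rows, sigma), blocks))
    return np.vstack(parts)


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))       # synthetic stand-in for device signals
W = parallel_similarity(X, sigma=2.0)
L = np.diag(W.sum(axis=1)) - W      # Eq. (2), assembled after the blocks are joined
print(W.shape)
```

Each block depends only on the input data and its own row indices, which is what makes this step easy to distribute; the spectral decomposition that follows is harder to parallelize and typically relies on specialized solvers.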

During the optimization of the model, the number of clusters $k$ needs to be determined automatically, which is done with the maximum spectral gap method. This method examines the eigenvalues of the Laplacian matrix to find the natural cluster division points, without manual setting: a large gap between consecutive eigenvalues marks a natural grouping in the data. For example, when analyzing the data in a logistics system, the number of clusters is selected automatically, so that the system can divide the freight volume into different intervals, which is convenient for optimizing scheduling.
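The maximum spectral gap rule can be sketched as follows. The function name `choose_k_by_eigengap`, the search limit `k_max`, and the toy weights are assumptions for illustration; the idea is simply to sort the eigenvalues of the Laplace matrix in ascending order and pick the $k$ after which the jump to the next eigenvalue is largest.

```python
import numpy as np


def choose_k_by_eigengap(L, k_max):
    """Return the k (1 <= k <= k_max) with the largest gap lambda_{k+1} - lambda_k."""
    eigvals = np.linalg.eigvalsh(L)[:k_max + 1]  # ascending order
    gaps = np.diff(eigvals)                      # gaps[i] = eigvals[i + 1] - eigvals[i]
    return int(np.argmax(gaps)) + 1


# Three tight groups with weak links between them (illustrative weights).
W = np.full((9, 9), 0.01)
for start in (0, 3, 6):
    W[start:start + 3, start:start + 3] = 1.0
L = np.diag(W.sum(axis=1)) - W

k = choose_k_by_eigengap(L, k_max=6)
print(k)  # 3
```

For this matrix the three smallest eigenvalues are 0, 0.09, and 0.09, and the next one is 3.06, so the largest gap follows the third eigenvalue and the rule selects three clusters.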

In the clustering process, the impact of noise can be suppressed through regularization so that it does not interfere with the clustering results, which also improves the robustness of the model. See Eq. (4).

$$\min_{H} \; \operatorname{Tr}\left(H^{\top} L H\right) + \alpha \|H\|^{2} \qquad (4)$$

In this formula, $\alpha$ controls the strength of the regularization term, which determines the sensitivity of the model to noise. If the value is large, noise is suppressed more strongly; if the value is small, the model is allowed to cluster the data more closely.


4 RESULTS AND DISCUSSION

4.1 Improvement Cases of Cloud Computing and Large Model Integration

A large-scale smart city project in an eastern province introduced the intelligent data analysis platform designed in this paper, which combines large models and cloud computing, with the aim of easing increasingly severe traffic congestion and improving the operational efficiency of urban traffic. The data to be processed comprise unstructured, qualitative, and quantitative data. The raw data are not correlated with one another; after all of the data are mapped and processed, the correlation among them is strong. To reduce the complexity of data processing and improve the usability of the data, traffic flow and the travel demand of citizens are taken as the research field, since problems in this field have also increased sharply. City managers need the system, built on the data analysis platform, to perform accurate traffic flow prediction, traffic accident warning, road condition monitoring, and signal optimization. The core data collected each day are summarized in Table 1.

Table 1: Statistics of core data collected on a daily basis

Data type                   Data volume (TB)
Traffic monitoring          180
Road condition sensing      130
Meteorological data         90
Vehicle dynamics            100
Citizen feedback
Emergencies                 60
Environmental monitoring    50

Table 1 shows the amount of data the system collects from multiple domains each day; traffic monitoring and road condition sensing are the main data sources. The system continuously obtains data from tens of thousands of cloud computing servers, surveillance cameras, and citizen feedback channels every day, covering real-time traffic flow, traffic accidents, and emergencies. With more than 500 terabytes of data streaming in per day, the data must be efficiently stored, distributed, and analyzed in real time using cloud computing technology. The data cover a number of key areas, mainly traffic monitoring, road condition sensing, weather data, and vehicle dynamics, as shown in Figure 1.

As can be seen from Figure 1, the Smart Data layer is the combined result of large models and cloud computing and sits at the top of the data processing flow. Among the data sources, traffic monitoring data accounts for the largest share, reaching 180 terabytes per day, which indicates that traffic flow is the core content monitored by the platform. Based on real-time processing of this data, the system can accurately predict future traffic conditions and propose signal optimization schemes to improve road efficiency.

Figure 1: Intelligent data processing process of large model and cloud computing (diagram labels: Underlying data, Integrated Platform, Cloud Computing, Large model, Smart Data)

4.2 The Degree of Optimization of Key Indicators in the Platform

The core of system integration is to ensure that all modules work together efficiently. With standardized interfaces and APIs, data flow seamlessly between modules, closing the loop across all parts of the system. The system uses middleware to manage communication between modules and ensure that information is delivered in a timely and accurate manner. The resulting performance metrics are shown in Table 2.

Table 2: Key performance indicators after system integration

Performance metric                                  Optimized value
Average response time ( )                           350
Data processing speed (TB/s)                        1.8
Prediction accuracy (%)                             96
Anomaly detection accuracy (%)                      93
Equipment failure early warning rate (%)            90
Traffic signal optimization improvement rate (%)


Table 2 lists the key performance indicators of the system, from response time to prediction accuracy. The data show that the system has strong data processing capability: it processes large-scale data in real time at 1.8 TB/s, and its prediction accuracy is 96%, which makes it well suited to traffic prediction and traffic management. Moreover, the anomaly detection accuracy is 93%, which means the system can effectively identify traffic anomalies such as accidents, congestion, and sudden road closures. The early warning rate for equipment failure is 90%, which shows that the system is highly reliable in equipment monitoring and maintenance management and can detect and prevent potential equipment failures in time to keep the system running continuously and stably, as shown in Figure 2.

Figure 2: Levels of intelligent data analysis

As shown in Figure 2, the support of the cloud computing platform makes data analysis intelligent and flexible enough to meet the needs of concurrent data processing, so that the system continues to operate well when processing massive amounts of data. In addition, system integration attaches great importance to security management, including encrypted transmission, access control, and logging, which ensures the reliability and data security of the system (Zhao, Wu, et al. 2024). Furthermore, with the automatic scaling function of the cloud, the system can flexibly allocate computing resources as business needs change, achieving high availability and dynamic scalability.

4.3 Statistics on the Effect of Intelligent Data Processing in Different Months

In a predictive analytics system, a larger regularization coefficient makes the system more stable when it encounters anomalous data. The parallel computing capability of the cloud computing platform accelerates matrix factorization and clustering, especially the computation of the similarity matrix and the eigenvalues. For example, a smart city management system can process a large amount of cloud computing server data in parallel to make real-time judgments and optimize resource allocation. The monthly statistics are shown in Table 3.

Table 3: Monthly data processing statistics of the system

Month       Data processing capacity (TB)
January     1200
February    1350
March       1400
April       1550
May         1600

Table 3 shows the amount of data processed by the system in different months: the data processing demand continues to grow over time. The amount of data processed was 1,200 TB in January and had increased to 1,600 TB by May, as shown in Figure 3.

Figure 3: Intelligent scoring and statistical ratio
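The growth reported in Table 3 can be quantified with a little arithmetic. The snippet below is a small illustrative calculation using the monthly volumes as read from the table; the variable names are chosen here for convenience.

```python
# Monthly processed volume in TB, as read from Table 3.
volumes_tb = {"January": 1200, "February": 1350, "March": 1400,
              "April": 1550, "May": 1600}
months = list(volumes_tb)

# Month-over-month relative growth, keyed by the later month.
growth = {
    months[i]: (volumes_tb[months[i]] - volumes_tb[months[i - 1]])
    / volumes_tb[months[i - 1]]
    for i in range(1, len(months))
}

# Overall growth from January to May: 400 / 1200, about 0.333.
total_growth = (volumes_tb["May"] - volumes_tb["January"]) / volumes_tb["January"]

for month, rate in growth.items():
    print(f"{month}: {rate:.1%}")
print(f"January to May: {total_growth:.1%}")
```

Over the five months the processed volume rises by about one third, with the largest single step (12.5%) between January and February.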

As can be seen from Figure 3, as more cloud computing servers and large models are connected, the load on the system gradually increases, and the platform scales well to meet the growing demand for data processing. On the whole, the intelligent data analysis platform performs very well in traffic flow prediction, anomaly detection, and equipment failure warning. The platform can provide city managers with intelligent traffic management and optimization solutions, and has high flexibility and scalability to cope with future data growth.

5 CONCLUSIONS

This paper designs and implements an intelligent data analysis platform that combines large models and cloud computing, and demonstrates its strong ability to process large-scale, multi-source heterogeneous data. By combining cloud computing technology with large models, the platform provides parallel processing and distributed computing capabilities and ensures fast response and stability when dealing with massive data. In addition, the large models in the platform improve the accuracy and predictive ability of data analysis, allowing it to cope with complex data environments. The elastic scaling provided by cloud computing ensures that the platform can expand continuously and dynamically, provides a strong technical guarantee for future intelligent data analysis, and lays the foundation for future intelligent development. Although this work improves on several aspects, errors and omissions remain, and the data analysis part should be expanded in future work. The study also has limitations, mainly because large models have been in application for only a short time and more practical cases are needed to support the findings; future research will focus on these issues.

REFERENCES

Ata, Y., Gökce, M. C., & Baykal, Y. (2024). Intelligent Reflecting Surface Aided Vehicular Optical Wireless Communication Systems Using Higher-Order Mode in Underwater Channel. IEEE Transactions on Vehicular Technology, 73(8), 11196-11208.

Fei, J., Jiang, X., Yang, H. W., Fan, K., Che, Y. M., Sun, B., et al. (2024). Research and Development of a Big Data Application Platform for Intelligent Blast Furnace Intensive Management and Control. ACS Omega, 9(23), 24674-24684.

Feng, L. Y., & Ji, Y. F. (2024). Learners Behaviour Prediction and Analysis Model for Smart Learning Platform Using Deep Learning Approach. Scalable Computing-Practice and Experience, 25(5), 3876-3885.

Gao, H. H., Qiu, B. Y., Wang, Y., Yu, S., Xu, Y. S., & Wang, X. H. (2024). TBDB: Token Bucket-Based Dynamic Batching for Resource Scheduling Supporting Neural Network Inference in Intelligent Consumer Electronics. IEEE Transactions on Consumer Electronics, 70(1), 1134-1144.

Guo, Q. Z., Mu, L., & Lou, S. (2024). Revolutionizing travel experiences: An in-depth analysis of intelligent booking systems and behavioral patterns. Intelligent Decision Technologies-Netherlands, 18(2), 1477-1494.

Li, C. P., Wu, L. H., Shu, C., Bao, Y. M., Ma, J. C., & Song, S. H. (2024). Data-driven public health security. Chinese Science Bulletin-Chinese, 69(9), 1156-1163.

Li, M. G., Ma, M., Wang, L., Pei, Z., Ren, J., & Yang, B. (2024). Multiagent Deep Reinforcement Learning Based Incentive Mechanism for Mobile Crowdsensing in Intelligent Transportation Systems. IEEE Systems Journal, 18(1), 527-538.

Su, Y. J., & Yang, S. Y. (2024). A User-friendly Cloud-based Multi-agent Information System for Smart Energy-saving. Journal of Internet Technology, 25(2), 293-300.

Yang, C. (2024). The development and application of an intelligent detection and evaluation system for drilling fluid. Journal of Thermal Analysis and Calorimetry, 149(8), 3415-3425.

Zhao, X. Y., Wu, Z. Q., Liu, Y. L., Zhang, H. L., Hu, Y. R., Yuan, D., et al. (2024). Eyecare-cloud: an innovative electronic medical record cloud platform for pediatric research and clinical care. EPMA Journal, 15(3), 501-510.
