Research and Application of Prosperity Index of Tomato Industry

Chain Based on Power Big Data Analysis

Jun Ma

, Yaping Zhao and Jing Pan

State Grid Xinjiang Electric Power Co., LTD. Changji Power Supply Company, Changji, Xinjiang, 831100, China

Keywords: Big Data, Cloud Computing, Power Consumption Information, Data Collection, Data Grip Analysis.

Abstract: In order to solve the serious performance and scalability bottlenecks faced by power data analysis system in

the era of big data, and better meet the needs of production, marketing and other systems, the advantages of

cloud computing technology are analyzed, and the architecture and key technologies of power big data anal-

ysis system based on cloud computing are proposed. Based on Hadoop and hive, a multi-dimensional index

and SQL are designed for the characteristics of power big data Automatic translation tools and hybrid storage

model supporting data update are three performance improvement technologies to realize the upgrading and

optimization of traditional power data analysis system. The actual deployment experience of Zhejiang electric

power information acquisition system shows that compared with the traditional electric power data analysis

system, the system achieves an average performance advantage of 5 times at 1/8 of the hardware cost. It

proves that cloud computing technology can significantly improve the performance of power big data query

and analysis and effectively reduce costs.

1 INTRODUCTION

Smart grid technology is the deep integration of tra-

ditional power technology and information, control,

automation and other technologies. Through collect-

ing a large amount of data in power generation, trans-

mission, distribution, power consumption, dispatch-

ing, marketing and other links, in-depth analysis and

mining are used to guide the decision-making and op-

timization of each link, so as to improve the produc-

tion efficiency of power enterprises, enhance the sta-

bility of power grid operation, and better meet the

needs of power customers (Cai, Cao, Xu, et al., 2021).

With the continuous promotion of the construc-

tion of smart grid, the scale of smart grid is expanding

day by day. Various heterogeneous distributed data

sources such as smart meters, sensors and information

systems continue to generate massive data, which is

called power big data. Power big data is regarded as

an important support for power grid intelligence. For

example, installing sensors on power generation

equipment or transmission equipment and collecting

equipment operation status, external temperature,

wind and other information can dynamically adjust

the equipment operation status and give full play to

the maximum efficiency of the equipment. Due to the

fundamental role of power big data, the collection,

transmission, storage, processing and mining of

power big data has become an important content of

smart grid research and construction (Zhang, Zhao,

Li, 2020).

At present, most power data analysis systems are

based on relational databases, which have the disad-

vantages of slow analysis speed and poor scalability.

It is difficult to meet the data storage and analysis re-

quirements of power big data in the era of smart grid,

and has become the bottleneck of smart grid construc-

tion. Aiming at the common requirements of data

analysis and application in smart grid and the typical

characteristics of power big data, this paper proposes

a set of power big data analysis platform based on

cloud computing. The platform has been actually de-

ployed and put into operation in State Grid Zhejiang

electric power company, and achieved good results.

2 POWER BIG DATA AND ITS

CHARACTERISTICS

Power big data refers to a large amount of data gen-

erated by smart grid in power generation, distribution,

transmission, marketing and management. It is gener-

ated by a large number of sensors deployed on various

devices, smart meters installed in the homes of power

Ma, J., Zhao, Y. and Pan, J.

Research and Application of Prosperity Index of Tomato Industry Chain Based on Power Big Data Analysis.

DOI: 10.5220/0011751900003607

In Proceedings of the 1st International Conference on Public Management, Digital Economy and Internet Technology (ICPDI 2022), pages 555-559

ISBN: 978-989-758-620-0

 2023 by SCITEPRESS – Science and Technology Publications, Lda. Under CC license (CC BY-NC-ND 4.0)

555

users, customer feedback collected by the marketing

system and other data sources and converged to the

centralized data center for unified storage manage-

ment (Tang, Ding, 2021).

Power big data is one of the supporting elements

for building a stable, reliable, efficient and energy-

saving smart grid. By analyzing power big data, we

can improve the lean management level of smart grid,

formulate a more scientific production plan, optimize

energy transmission scheduling, and establish a more

accurate user behavior model.

Power big data has the 3V characteristics of large

volume, many types and fast speed, which are specif-

ically reflected in:

(1) Large volume. With the in-depth promotion of

the construction of smart grid, terminal data collec-

tion equipment such as device sensors and smart me-

ters have been intensively deployed, and the scale of

collected data will increase exponentially, reaching

the TB or even Pb level. Taking Zhejiang Province as

an example, there are 22million power users in the

province. If all smart meters are installed, according

to the requirements of State Grid Corporation of

China to collect one piece of power consumption in-

formation every 15 minutes, 2.1 billion pieces of

power consumption records will be added every day

(Wang, Bai, Dong, et al., 2021).

(2) There are many types. In addition to the tradi-

tional structured data, production management, mar-

keting and other systems produce a large number of

semi-structured and unstructured data such as audio

and video materials. The diversity of data types re-

quires the diversity of storage and processing technol-

ogies. This paper focuses on the data acquisition and

processing system of power consumption infor-

mation, which still focuses on structured data, and

does not discuss the processing of semi-structured

and unstructured data.

(3) Speed. The collection and processing of power

big data have extremely fast speed. The surge in the

number of terminals requires the storage system to

meet the requirements of high-throughput data access

hundreds of thousands of times per second.

In addition, power big data also has some unique

characteristics. According to the "white paper on the

development of power big data in China (2013)",

power big data also has the characteristics of 3E;

(1) Data is energy. Power big data contains ex-

tremely important information such as users' power

consumption rules and optimal transmission schedul-

ing strategies. This information plays a unique and

huge role in reasonably arranging production and re-

ducing energy consumption losses, and promotes the

reduction of energy consumption and sustainable de-

velopment of the power grid, thus embodying the

characteristics of data as energy.

(2) Data is exchange. Through the interaction and

aggregation with big data of other industries, and in-

depth mining and analysis, the information contained

in power big data has extremely important reference

value for the country's high-level decision-making

and economic situation judgement (Chen, Li, Cui et

al., 2019).

(3) Data is empathy. Power big data provides a

new way for State Grid Corporation of China to

timely and accurately discover and meet the needs of

users. Empathy is empathy. Both production and mar-

keting rely on power big data to provide more high-

quality, safe and reliable power services to the major-

ity of power users, so as to achieve the goal of com-

mon development.

3 CLOUD COMPUTING AND ITS

ADVANTAGES

Cloud computing is a new large-scale distributed

computing model, which originates from the demand

of Internet companies for a large number of compu-

ting and storage resources and the pursuit of scalabil-

ity, high performance, high availability and other

characteristics. Cloud computing aggregates a large

number of distributed and heterogeneous resources,

providing users with powerful massive data storage

and computing capabilities. Cloud computing pro-

vides users with on-demand services through virtual-

ization, dynamic resource allocation and other tech-

nologies, avoids resource waste and competition, and

improves resource utilization and application perfor-

mance. Cloud computing provides horizontal scaling

and dynamic load balancing capabilities, that is,

Cloud Computing supports adding new nodes to the

data center at runtime, and the system will automati-

cally migrate part of the load to the new nodes, and

maintain the load balance between nodes, thereby en-

hancing the business carrying capacity of the whole

system. Resources in the cloud computing environ-

ment are organized in the form of data centers. A data

center contains thousands or even tens of thousands

of nodes. Nodes are interconnected through high-

speed networks to jointly provide users with compu-

ting and storage resources. Cloud computing has de-

veloped very rapidly. At present, it has gone out of

the laboratory, and a series of mature products and

technologies have emerged. In addition to Internet

ICPDI 2022 - International Conference on Public Management, Digital Economy and Internet Technology

556

companies, it has been widely used in many tradi-

tional industries, such as telecommunications, retail,

finance and so on.

4 POWER BIG DATA ANALYSIS

SYSTEM BASED ON CLOUD

COMPUTING

The big data analysis technology based on cloud com-

puting has a relatively complete reference architec-

ture and software implementation, and has been ap-

plied in some industries. However, most of these sys-

tems come from the Internet industry, and their de-

sign, implementation and functional characteristics

fully reflect the needs and characteristics of Internet

big data and its business. When applied to industries

other than the Internet, the general practice is to cus-

tomize, develop and optimize the existing system

based on the full and detailed analysis of the big data

characteristics and business needs of the industry.

There are three main differences between power

big data and Internet big data.

(1) In the Internet scenario, typical big data appli-

cations need to scan the entire data set sequentially.

Therefore, the distributed parallel big data analysis

system hive or impala does not provide good support

for indexing; In power big data analysis, multi-di-

mensional region query is very common. Because

there is no index, it will lead to access a large number

of unnecessary data and significantly reduce the per-

formance of query execution. It is necessary to design

an appropriate index structure and corresponding data

retrieval mechanism according to the characteristics

of multi-dimensional region query.

(2) The typical feature of Internet big data is

"write once and read many times". For this data fea-

ture, both distributed file system (HDFS) and hive do

not provide data rewriting (update or delete) mecha-

nism, and the purpose of rewriting data can only be

achieved indirectly by covering all existing data. In

the power big data business scenario, there are a large

number of data rewriting statements. Executing these

queries in a way that covers the existing data will lead

to the problem of low execution efficiency. There-

fore, there is an urgent need to provide a data rewrit-

ing mechanism in the existing system.

(3) Big data query language designed by Internet

companies according to their own business needs. For

example, HQL is only a subset of SQL, while power

data analysis systems are mostly written in standard

SQL language, which requires a lot of manpower and

time to translate tens of thousands of existing SQL

statements into equivalent HQL statements. There-

fore, it is necessary to design a tool to automatically

translate SQL language into HQL language, so as to

improve the migration speed of legacy applications

and realize the seamless and smooth migration of

power data analysis business.

Based on the above analysis, in order to meet the

needs of smart grid for in-depth analysis and mining

of power big data, aiming at the typical characteristics

of power big data and its business logic, combined

with the latest progress of cloud computing technol-

ogy and actual industry deployment experience, a

power big data analysis system based on Cloud Com-

puting is developed.

The system is based on the distributed parallel

computing framework (Hadoop), uses hive as the data

analysis software, and develops key technologies

such as multidimensional index based on grid file,

SQL to HQL automatic translation tool based on

query rewriting, and hybrid storage model supporting

data update according to the characteristics of power

big data, which comprehensively enhances the perfor-

mance and ease of use of hive. This system has been

successfully applied in Zhejiang electric power con-

sumption information acquisition system. Compared

with the original system based on relational database,

it greatly improves the system performance and re-

duces the system cost.

5 APPLICATION EXAMPLES

This paper uses Changji tomato enterprise as a case

for implementation analysis. Collect the quarterly

data of the primary indicators from 2019 to 2021, se-

lect the industry sales revenue as the benchmark indi-

cator, and obtain the leading and consistent indicators

and the weight of each indicator through the analysis

of the process mentioned above.

From the perspective of enterprise prosperity

analysis, COFCO Tunhe Manas tomato products Co.,

Ltd. had the largest growth rate in the first half of

2021, and Xinjiang Yanyang tiantianhu tomato prod-

ucts Co., Ltd. had the largest growth rate in the second

half of 2020; However, the prosperity index of Xin-

jiang Zhongji Red Tomato Industry Co., Ltd. fang-

caohu branch and Xinjiang Tometo tomato science

and Technology Development Co., Ltd. in the first

half of 2021 and the second half of 2020 showed a

significant negative growth, which should be brought

to the attention of relevant government departments.

In 2019, the average prosperity index of Changji

Research and Application of Prosperity Index of Tomato Industry Chain Based on Power Big Data Analysis

557

tomato enterprises was 56.252, and in 2020, the aver-

age prosperity index of Changji tomato enterprises

was 58.33, 2.08 (3.57%) higher than that in 2019;

From January to June 2019, the average prosperity in-

dex of tomato enterprises was 7.08. From January to

June 2020, the average prosperity index of tomato en-

terprises was 5.16, a year-on-year decrease of 1.92 (-

37.21%); From January to June 2021, the average

prosperity index of tomato enterprises was 6.09, an

increase of 0.93 (15.27%) year-on-year in 2020. In the

first half of 2020, the prosperity index of Changji to-

mato industry fell significantly, as shown in Figure 1.

（Photo credit: Original)

Figure 1: Prosperity index of tomato enterprises in Changji

Prefecture in recent three years.

In combination with the changes in the prosperity

index of Changji tomato enterprises in 2019, 2020

and January June 2021, the changes in the prosperity

index of Changji tomato enterprises are mainly di-

vided into two periods, namely, the off-season and

peak season, as shown in the following figure.

Off season (January July, December): at this time,

tomatoes have not yet reached the stage of large-scale

mature harvest, there are a small number of green-

house planting and harvest, and the power consump-

tion scale of tomato enterprises is low. Judging from

the changes of the prosperity index in 2019 and 2020,

with the rapid spread of the domestic epidemic, the

production and operation of tomato enterprises in

Changji Prefecture have been affected to a certain ex-

tent. Some enterprises have encountered problems

such as insufficient orders, tight sales, poor logistics

and financial difficulties, resulting in the prosperity

index in the first half of 2020 being significantly

lower than that of the previous year. The average

prosperity index in the off-season in 2020 (5.19) de-

creased by 2.11 compared with that in 2019 (7.3).

Judging from the changes of the prosperity index in

2020 and 2021, with the effective control of the do-

mestic epidemic, the average prosperity index (6.09)

from January to June 2021 increased by 0.93 year-on-

year (5.16) in 2020, and it is expected that the pros-

perity index will continue to increase in the second

half of the year.

Peak season (August November): with a large

number of mature tomatoes harvested, the power con-

sumption scale of tomato enterprises in Changji Pre-

fecture has increased, the production and operation

situation has gradually improved, and the enterprise

prosperity index has increased rapidly. Judging from

the changes of the prosperity index in 2019 and 2020,

with the rapid spread of the domestic epidemic, the

production and operation of tomato enterprises in

Changji Prefecture have been affected to a certain ex-

tent, and the upward trend of the enterprise prosperity

index has slowed down. From the end of July to the

beginning of September 2020, due to the closure of

the epidemic in Xinjiang, the prosperity index in Au-

gust 2020 was significantly lower than that in August

2019, and the highest prosperity index in the peak

season of 2020 (310.19) was 25.61 lower than that in

2019 (335.8). As shown in Figure 2.

（Photo credit: Original)

Figure 2: Monthly prosperity index in recent three years.

6 CONCLUSION

On the whole, in 2019, 2020 and the first half of 2021,

with the easing of the epidemic and the continuous

drive of the Xinjiang government's various measures

to increase production and efficiency, and phased pol-

icies to reduce the cost of electricity for enterprises,

the prosperity index of tomato enterprises in Changji

Prefecture gradually rebounded, and the production

and operation of tomato enterprises showed an up-

ward trend.

ICPDI 2022 - International Conference on Public Management, Digital Economy and Internet Technology

558

REFERENCES

Cai Y, Cao S, Xu J, et al. Study on Data Management of

Electric Power Distribution Dispatch and Cutting-

through Based on Big Data[J]. Journal of Physics: Con-

ference Series, 2021.

Chen Z, Li D, Cui C, et al. A Multi-database Hybrid Storage

Method for Big Data of Power Dispatching and Con-

trol[C]// 2019 IEEE SmartWorld, Ubiquitous Intelli-

gence & Computing, Advanced & Trusted Computing,

Scalable Computing & Communications, Cloud & Big

Data Computing, Internet of People and Smart City In-

novation (SmartWorld/SCALCOM/UIC/ATC/CB

DCom/IOP/SCI). IEEE, 2020.

Liang Chen, Weilian Su, Qingsheng Yang, et al. Analysis

Method of Sensing Information in Electric Power Dis-

tribution Based on Edge Computing[M]. 2020.

Tang X, Ding C. Information Security Terminal Architec-

ture of Power Transportation Mobile Internet of Things

Based on Big Data Analysis[J]. Wireless Communica-

tions and Mobile Computing, 2021.

Wang Z Q, Bai W J, Dong A Q, et al. A Global Data Model

for Electric Power Data Centers[J]. Journal of Signal

Processing Systems, 2021, 93(2).

Zhang S, Dai H, Shi Z, et al. The Construction of Electric

Power Big Data Analysis Platform Prospects in Smart

Grid Application[C]// 2020 IEEE/IAS Industrial and

Commercial Power System Asia (I&CPS Asia). IEEE,

2020.

Zhang Z, Zhao J, Li Z. Preliminary Analysis of The Appli-

cation and Development of Big Data, Cloud Compu-

ting, IoT, Mobile Internet, AI, and Blockchain In The

Power Scene[C]// ICITEE2020: The 3rd International

Conference on Information Technologies and Electrical

Engineering. 2020.

Research and Application of Prosperity Index of Tomato Industry Chain Based on Power Big Data Analysis

559