Design of Heterogeneous Data Warehouse Architecture for Supply

Chain Management System

Ruiqin Lin

, Wenan Tan

, Pan Liu

and Lu Zhang

College of Resources and environmental engineering, Shanghai Polytechnic University, Jinhai Road, Shanghai, China

College of Computer and Information Engineering, Shanghai Polytechnic University, Jinhai Road, Shanghai, China

Information Technology Center, Shenzhen Easttop Supply Chain Management CO.,LTD., Zhouhai Road, Shanghai, China

Keywords: Data Warehouse, Data Quality, Data Integration, Big Data Development.

Abstract: Multiple systems are covered by the supply chain management system portal website, which includes tasks

such as basic data entry, bill of lading generation, customs declaration, transportation, expenditure settlement,

statistical reports, and more. Each system is spread among several departments, and data is kept by numerous

departments at various phases. The storage format and semantics are vastly different. Data is complicated and

varied, and data quality is challenging to ensure during the data integration process. Data is frequently lost,

and storage types are incompatible with one another. As a result, appropriate technical solutions are required

to ensure the supply chain management system's data quality after data integration. This article focuses on the

current state of the supply chain management system data and the issues that it faces. The business

requirements for the unified and standardized storage of supply chain management system data are derived

from the description of the challenges. Finally, it situates the primary issues discussed in this article within

the framework of this company.

1 INTRODUCTION

1.1 Background in the Industry

The supply chain industry's informatization has been

strengthened internally as a result of the rapid

development of Internet technology, and industry

data has shown explosive growth (

HUANG, 2021).

Massive data contains enormous value, and how to

mine these values more effectively and quickly has

steadily become the focus of data owners' attention.

The information system's basic data is a valuable

resource that has a significant impact on the

enterprise's economic development and management,

and serves as the foundation for scientific

management and decision-making. Currently, most

supply chain management systems spend a

significant amount of money and time developing

online transaction processing OLTP business systems

and office automation systems to record various

transaction processing related data. These data have a

lot of commercial worth. Enterprises did not make the

best use of their existing data resources, wasting more

time and money while also missing out on the best

opportunity to make critical business decisions. Most

traditional data warehouses are still in use in the

business, and the majority of existing supply chain

management systems are built using old methods,

such as acquiring pricey large-scale servers. Database

fragmentation divides the data on this basis. The data

is stored on a disk array, which makes system growth

and upgrading more difficult and expensive, and the

entire system is tightly coupled, making it impossible

to meet the demands of high efficiency,

dependability, and economy. As a result, figuring out

how to turn data into information and knowledge via

various technical means has become a major barrier

in improving the company's fundamental

competitiveness. ETL technology is the most

important technological tool among them.

1.2 Scenario of a Business

The data from the supply chain management system

is spread across several departments. Different

departments are in charge of developing, managing,

and maintaining various enterprises, and different

departments keep track of various basic business

data. Basic information is frequently kept and defined

Lin, R., Tan, W., Liu, P. and Zhang, L.

Design of Heterogeneous Data Warehouse Architecture for Supply Chain Management System.

DOI: 10.5220/0011183600003440

In Proceedings of the International Conference on Big Data Economy and Digital Management (BDEDM 2022), pages 441-445

ISBN: 978-989-758-593-7

441

independently. Each department's basic data is like a

data island that can't be linked to other departments'

data. This type of data island manifests itself mostly

in physical and logical features. Different business

departments understand and define data at their own

business level, and there is no linkage

communication; data is stored and maintained

independently in different departments, and there is

no linkage communication; data is stored and

maintained independently in different departments,

and there is no linkage communication; data is stored

and maintained independently in different

departments, and there is no linkage communication.

Different departments' data semantics differ,

resulting in a disparity in data semantics that subtly

increases the difficulty of data cooperation and data

association between departments. Figure 1 shows the

association approach, however these push data

packets are frequently associated and related. The

data push industry is complicated by this highly

connected, low coherent, and inaccurate data

representation. Department-based data, considerable

autonomy, and a lack of cohesive management ideas

characterize the current data push situation. There is

a danger of data inconsistency because information

on the present stage of the project cannot be obtained

centrally and must be gathered from multiple

departments of numerous firms.

Figure 1: The original way of business data association.

1.3 Scholar of Research

The creation of a new type of hierarchical data

warehouse has become a current research hotspot

with the onset of the big data era (Di Tria, et al.,

2014). To collect, process, and store unstructured

data, Wu Wentai (Wu, et al., 2018) and others

employed Hadoop technology. The challenge of

enterprise data processing is handled by combining

big data fusion technology with a data warehouse.

(Zhang, 2017) evaluated Hive technology's logistics

data warehouse and offered a specific logistics data

warehouse deployment plan. Simultaneously, as

corporate complexity has increased, big data

development has evolved from just performing

computations on data to being more standardized and

process-oriented. (ZHANG, 2017, Dongya, 2017,

Jinghua, 2017)

1.4 Solution

As a result, it is required to upgrade the current state

of data in the supply chain management system in

order to make data easier to gather, store, interpret,

and value. Data storage can be standardized, and data

representation standards can be specified consistently

for logical data islands. Physical data islands are

being phased out in favor of a unified supply chain

BDEDM 2022 - The International Conference on Big Data Economy and Digital Management

442

management system data warehouse and the

application of ETL technologies to the supply chain

management system data integration company to

complete the supply chain management system. Data

integration is used to store data information centrally

and offer initialization data for each system. The

coupling of data between various departments may be

decreased, the interdependence of multiple

departments can be reduced, and the data push

process can be simplified by establishing a data

warehouse. Figure 2 depicts the new and better

business data association condition.

Figure 2: The new and improved business data association situation.

2 MANUSCRIPT PREPARATION

Because of the characteristics of big data, such as the

rapid increase in data volume, various and complex

data types, extremely low data value density, and fast

processing speed, traditional data warehouse

architecture has become increasingly unable to meet

actual needs in the context of big data applications.

This work builds a set of data warehouse architecture

against the backdrop of big data on the basis of

traditional data warehouse architecture, as shown in

Figure 3, to address the unified storage and

administration requirements of supply chain

management system data.

In-depth research of business demands, in-depth

understanding of the company's business, and the

establishment of an effective model are among the

issues to be resolved. Data warehouse modeling

ensures the rationality of the data warehouse storage

structure, efficient data query efficiency, and storage

space savings by consulting relevant literature,

analyzing cutting-edge hotspots, and implementing

technical routes; data warehouse modeling, the

storage structure of each information system is

different, data warehouse modeling ensures the

rationality of the data warehouse storage structure,

efficient data query efficiency, and storage space

savings by consulting relevant literature, analyzing

cutting-edge hotspots, and implementing technical

routes; Reduce data loss; ETL tools are used in the

process of implementing ETL, the ETL process is

designed, and the data is loaded into the data

warehouse; ETL tools are used in the process of

implementing ETL, the ETL process is designed, and

the data is loaded into the data warehouse.

The data acquisition layer, data processing layer,

data storage layer, and data application layer are the

four tiers of the architecture. The data warehouse has

subject-oriented and integrative properties. To face a

choice, data integration is used to solve a specific

subject query and visualize the analytical results. As

a result, the foundation of a data warehouse is high-

quality data. Through layered management, the data

warehouse architecture realizes the step-by-step

completion of work, and the processing logic of each

layer becomes easy.

2.1 Data Acquisition Layer

To handle company data and offer raw data for data

warehouses, employ relational databases. This is the

step in which the source data is acquired. Other types

Design of Heterogeneous Data Warehouse Architecture for Supply Chain Management System

443

of data, such as database script data, text data, and so

on, are collected using open source tools, Sqoop, or

program codes. The Oracle data source stores all data

information; this layer is the foundation of the overall

architecture.

Figure 3: Design of a heterogeneous data warehouse architecture.

2.2 Data Processing Layer

Pre-processing, pulling distant mapping data using

scripts, and then querying and analyzing the data

according to the system's business requirements are

all required before doing data analysis on the original

data stored in the data warehouse. The code execution

time consumption will be recorded as a log log during

the full job procedure. The automatic program will

evaluate this log, and the start and end times of

various programs will be emailed to the management

and maintainer. At the same time, the query analysis

and processing result data will be exported to the

Mysql database. These are the data to which we must

pay attention in order to assess the quality of later data

integration. This procedure involves performing ETL

operations on the obtained source data, which

includes extraction, conversion, and loading. The

ETL process can be described as an ETL task script,

which is stored in the ETL task resource library in the

form of metadata, and at the same time Combine

Hadoop, Spark, Zookeeper, and other big data

processing technologies to improve task scheduling

and monitoring; the ETL process can be described as

an ETL task script, which is stored in the ETL task

resource library in the form of metadata, and at the

same time Combine Hadoop, Spark, Zookeeper, and

other big data processing technologies to improve

task scheduling and monitoring. The data processing

layer is in charge of the data warehouse architecture's

key tasks. Not only must the data quality be

guaranteed, but the efficiency of ETL task scheduling

and resource usage must also be increased as much as

feasible during this process.

2.3 Data Storage Layer

Designing dimensional models and data marts can

help to support and improve data storage. The data

storage model divides ETL task levels while

providing data storage. It can also handle distributed

data storage and relational database storage, such as

Hbase and Hive, as well as integrated data migration.

2.4 Data Application Layer

Front-end data analysis and interface display, such as

multi-dimensional analysis and report visualization,

are supported. The application layer performs all of

the activities of the Web display module, follows the

B/S structure, creates J2EE projects, selects the MVC

mode, and uses the Web terminal to display the data

in the form of charts. Before and after, the data is

transmitted using ajax asynchronous transmission,

and the data is in json format.

BDEDM 2022 - The International Conference on Big Data Economy and Digital Management

444

3 RESULTS & DISCUSSION

To satisfy the current requirements for big data

integration and migration storage, the data warehouse

integrates associated big data processing architectural

technologies. This article starts with the construction

of a supply chain management data warehouse and

divides it into two sections: data integration under the

data processing layer and ETL task scheduling and

monitoring, which includes data quality assurance

technology after data integration and the study of

ETL task script scheduling execution strategies.

4 CONCLUSIONS

This chapter primarily introduces the current specific

business background of supply chain management

data, as well as the various problems encountered,

and then constructs a supply chain management data

warehouse architecture against this backdrop, before

describing and analyzing the architecture's hierarchy.

ACKNOWLEDGEMENTS

This work was financially supported by the National

Natural Science Foundation of China (61672022,

U1904186), Shanghai Polytechnic University Key

Discipline Electronic Information Special Master

Program Project (XXKZD1604). At the same time,

I'd want to express my gratitude to my teachers,

Wenan Tan and Pan Liu, at this time. They provided

me with numerous scholarly and helpful comments

and assisted me in revising my work during the

writing process. He also gave me the opportunity to

practice teaching in addition to these things.

REFERENCES

Di Tria, F., Lefons, E., & Tangorra, F. (2014). Design

process for big data warehouses. In 2014 International

Conference on Data Science and Advanced Analytics

(DSAA) (pp. 512-518). IEEE.

HUANG B. (2021). Research on Intelligent Transformation

of Logistics Industry in the Era of Big Data. J. Journal

of Technical Economics & Management. 12, 118-121.

Wu, W., Lin, W., Hsu, C. H., & He, L. (2018). Energy-

efficient hadoop for big data analytics and computing:

A systematic review and research insights. Future

Generation Computer Systems, 86, 1351-1367.

ZHANG R. (2017). Based on the Hive of data warehouse

logistics research and design of the big data platform. J.

Electronic Design Engineering. 25(09), 31-35.

ZHANG, Q., Dongya, W. U., & Jinghua, Z. H. A. O.

(2017). Big data standards system. Big Data Research,

3(4), 11.

Design of Heterogeneous Data Warehouse Architecture for Supply Chain Management System

445