Research on Satellite Autonomous Fault Detection and Recovery Framework
Xunjia Li (1, a), Xin Ma (1, b), Tao Zhang (1, c), Xiaodong Han (2, d) and Yajie Liu (1, e, *)
1 College of System Engineering, National University of Defense Technology, Changsha 410073, China
2 Institute of Telecommunication Satellite, China Academy of Space Technology, Beijing 100081, China
e Corresponding author: liuyajie@nudt.edu.cn
Keywords: Autonomy, Fault detection, Recovery, Task re-planning, Artificial intelligence.
Abstract: During in-orbit operation, a satellite's operating state may change because of faults, and such faults can severely affect the satellite. This paper takes satellite fault detection and recovery as its research object and proposes an autonomous fault detection and recovery framework (AFRF) that covers fault recognition, fault recovery and task re-planning for autonomously operating satellites. Artificial intelligence methods are used in the framework to implement rapid on-board fault diagnosis, autonomous fault repair, and autonomous task re-planning. The experimental results show that the proposed framework handles the satellite fault problem well and has good prospects for engineering application.
1. INTRODUCTION
A satellite is a platform that operates in space and can perform a variety of imaging, communication, and navigation tasks. Normal operation of the satellite is the basic condition for successful completion of its mission. Because of factors such as the space environment and the satellite's own equipment, faults may occur during operation, and their impact on the satellite is serious. After a fault occurs, the satellite suspends the task being performed and enters the fault diagnosis and recovery mode. Once fault diagnosis and repair are complete, it resumes normal operation and can continue to perform tasks. In this paper, a fault self-detection, diagnosis and recovery framework is set up, in which artificial intelligence is applied to the satellite state data to determine whether a fault has occurred. After detecting a fault, the satellite enters the emergency mode to achieve rapid state recovery and task re-planning, so that it can start performing new tasks once the fault is repaired.
With the increasing complexity of satellite systems, various sensors produce massive amounts of telemetry parameters, and anomaly detection is performed on these telemetry data. Currently, there are four kinds of methods for anomaly detection of satellite telemetry data: manual monitoring combined with thresholds, expert system-based methods, model building methods based on expert experience, and data-driven methods. Telemetry data tend to be regular and periodic, and can fluctuate within a large range. Given these characteristics, this paper focuses on data-driven methods. Data-driven methods require neither prior knowledge nor assumptions about the data distribution, are highly scalable, and can process streaming data in real time. Among them, prediction-based methods are currently a hot topic (Yang, Y., & Hou, N., 2013). Prediction-based anomaly detection methods include ARMA (Weizheng, L. I., & Qiao, M., 2014), LSSVM (Bing, C., Gang, L., Hongzheng, F., & Li, A., 2014), RVM [4], dynamic Bayesian networks (Yairi, T., Kawahara, Y., Fujimaki, R., Sato, Y., & Machida, K., 2006), ANN (Sadeghi, B. H. M., 2000; Elman, J. L., 1990) and so on. This paper mainly uses an artificial neural network algorithm.
Satellite mission planning is also an important research topic in the satellite field. The literature (Song, Y., Huang, D., Zhou, Z., & Chen, Y., 2018) proposed an autonomous satellite re-planning method and designed three task insertion algorithms. In (He, Y., Xing, L., & Chen, Y., 2016), autonomous mission planning software for a new type of satellite is designed from the perspective of software design.
Li, X., Ma, X., Zhang, T., Han, X. and Liu, Y. Research on Satellite Autonomous Fault Detection and Recovery Framework. In Proceedings of 5th International Conference on Vehicle, Mechanical and Electrical Engineering (ICVMEE 2019), pages 293-298. DOI: 10.5220/0008385602930298. ISBN: 978-989-758-412-1. Copyright © 2020 by SCITEPRESS Science and Technology Publications, Lda. All rights reserved.
The literature (Song, Y. J., Zhou, Z. Y., Zhang, Z. S., Yao, F., & Chen, Y. W., 2019) considers several problems in satellite mission planning and proposes a generalized solution framework that includes mobile edge computing. The literature (Zheng, Z., Jian, G., & Gill, E., 2018) studied multi-satellite systems and designed a multi-satellite cooperation scheme. The literature (Song, Y. J., Zhang, Z. S., Sun, K., Yao, F., & Chen, Y. W., 2019) solved the satellite data downlink mission planning problem with a genetic algorithm. This paper considers both satellite fault detection and satellite mission planning, and designs an efficient and versatile autonomous operation framework.
The structure of this paper is as follows. Section 2 introduces the proposed satellite autonomous fault detection and recovery framework. Section 3 verifies the effect of the framework in an application scenario. Finally, Section 4 gives the conclusions and prospects of this paper.
2. AUTONOMOUS FAULT DETECTION AND RECOVERY FRAMEWORK (AFRF)
Satellite autonomous fault detection and recovery technology can diagnose faults quickly, allowing satellites to return to normal operating conditions and perform new tasks within a short period of time. The autonomous fault detection and recovery framework provides a way for the satellite to quickly detect and diagnose faults and to transition from an abnormal state to a normal working state. We first analyze the functional requirements that the framework must satisfy for autonomous operation and fault recovery. We then present the overall structure of AFRF and the specific content of each module.
2.1 Functional Requirements
Considering the characteristics of autonomous satellite operation and task execution, and the requirements for highly reliable and stable operation, three functional requirements are proposed for satellite autonomous fault detection and recovery: fast fault detection, autonomous fault recovery, and on-board task re-planning.
Fast fault detection. A fault is serious for the satellite: it can prevent the satellite from functioning properly or even cause the satellite to be scrapped. The satellite needs to establish a complete fault detection mechanism, monitor its health status according to the data of each on-board subsystem, and bring any abnormal subsystem back to a normal level. When a fault occurs, the satellite should quickly locate the subsystem in which the fault occurred based on its existing fault template library and determine the specific fault location.
Autonomous fault recovery. A satellite cannot function properly after a fault, so it must be returned to its normal state quickly. The traditional approach of repairing faults through ground commands places strict requirements on the satellite position, and the repair is not timely. Autonomous fault repair allows the satellite to start the repair process as soon as the fault occurs, so that it is restored to normal operation within a short period of time and can continue its tasks.
On-board task re-planning. After a satellite fault occurs, the original task is aborted until the fault is repaired. By then, the satellite position differs from its position before the fault, so many of the original tasks can no longer be executed. The satellite should generate a new mission execution plan after the fault is fixed and complete the subsequent tasks according to the new plan.
2.2 Overall Design
The framework performs three main functions: training and improvement of the fault detection model, fault detection and re-planning, and fault repair. Training and improvement of the fault detection model is carried out on the ground, because the ground has more computing resources than the satellite and can better support the training and improvement of the artificial intelligence model. Fault detection and re-planning, as well as fault repair, are performed on the satellite. By diagnosing and repairing faults itself, the satellite can detect, diagnose and repair faults quickly and return to normal operation in a short time.
Figure 1. AFRF overall framework. [Diagram blocks: Ground; Fault Detection Model Training and Improvement (offline training model and inject satellite, further model training); Fault Detection and Replanning (anomaly detection, fault template matching, fault detection, generate a new set of tasks, task replanning); Fault Repair (autonomous fault repair, satellite status update, fault and repair data downlink).]
The artificial intelligence model is trained on the ground and injected into the satellite before the satellite starts to operate. After satellite fault data are acquired, they are transmitted to the ground, where the model is retrained to improve its prediction accuracy. The improved model is re-injected into the satellite when a satellite communication link is available.
After the model is injected into the satellite, the operational status of each satellite subsystem is monitored according to its operational data, and the model is used to determine whether a fault may have occurred. If the model indicates a possible fault, fault detection is started automatically. After the satellite determines the cause of the fault, the template library is used for autonomous fault matching and autonomous repair is performed. After the repair process is completed, the satellite parameters are updated and the fault and repair information is transmitted to the ground via the satellite link.
After autonomous fault repair, the satellite returns to the normal state, and the corresponding data are provided to the task planning part. The task planning part updates the task status according to these data, generates a new task set, and performs task re-planning on the new task set.
Within the whole framework, the model is trained and improved on the ground, the satellite detects and repairs faults independently, and, after normal operation resumes, the tasks are re-planned according to the new mission situation. The proposed AFRF can detect and repair faults more quickly than repair by ground command and can adjust the task execution plan in time, so that after a fault the satellite continues to run normally and complete its various tasks.
2.3 Module Design
The core parts of AFRF are fault diagnosis and identification, autonomous fault repair, and task re-planning. The design of each of these three modules is described below.
2.3.1 Satellite Fault Diagnosis Module Design
The satellite fault diagnosis module includes satellite data anomaly detection and fault identification. The data anomaly detection function uses an artificial intelligence model that is trained on the ground, runs on the satellite, and is further improved with new data. Based on the abnormality found by the anomaly detection function, the fault identification function uses the satellite's existing fault template library to analyze and identify the fault, determining the affected satellite subsystem and the specific fault cause in the shortest possible time.
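The fault identification step can be illustrated with a minimal sketch. The paper does not specify how the fault template library is structured or matched, so everything here is an assumption made for illustration: the `FaultTemplate` fields, the example templates, the channel names, and the use of Jaccard similarity between the set of abnormal telemetry channels and each template.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaultTemplate:
    name: str            # hypothetical fault label
    subsystem: str       # subsystem the fault is attributed to
    channels: frozenset  # telemetry channels expected to be abnormal


def match_fault(abnormal_channels, library, min_score=0.5):
    """Return the best-matching template, or None if nothing is close enough.

    Similarity is the Jaccard index between the observed abnormal channels
    and each template's expected channels (an illustrative choice, not the
    paper's method).
    """
    observed = frozenset(abnormal_channels)
    best, best_score = None, 0.0
    for template in library:
        union = observed | template.channels
        if not union:
            continue
        score = len(observed & template.channels) / len(union)
        if score > best_score:
            best, best_score = template, score
    return best if best_score >= min_score else None


# Hypothetical template library used only for this example.
LIBRARY = [
    FaultTemplate("battery_undervoltage", "power",
                  frozenset({"bus_voltage", "battery_current"})),
    FaultTemplate("wheel_overspeed", "attitude_control",
                  frozenset({"wheel_speed", "wheel_temp"})),
]
```

A call such as `match_fault(["bus_voltage", "battery_current"], LIBRARY)` returns the template whose subsystem is `"power"`, while an unknown pattern returns `None` and could be left for ground analysis.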
The module adopts a neural network-based prediction method. The algorithm builds on traditional time series prediction techniques and assumes that the telemetry data are temporally correlated. By establishing a time window model, new data can be predicted recursively from historical data. The expected range of the new data is given by the established model, which predicts the mean and variance. When the real data arrive, if they lie within the prediction interval, no abnormality is reported; otherwise, an abnormality is considered to have occurred.
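The time window model and the interval test described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the interval half-width of `k` standard deviations is an assumption, since the paper does not state how the interval is derived from the predicted mean and variance.

```python
import numpy as np


def make_windows(series, window):
    """Turn a 1-D series into (history window, next value) training pairs.

    Requires len(series) > window.
    """
    series = np.asarray(series, dtype=float)
    X = np.stack([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X, y


def detect_anomalies(actual, pred_mean, pred_var, k=3.0):
    """Flag samples outside the interval mean +/- k * std.

    k is an assumed tuning parameter, not a value taken from the paper.
    """
    actual = np.asarray(actual, dtype=float)
    mean = np.asarray(pred_mean, dtype=float)
    std = np.sqrt(np.asarray(pred_var, dtype=float))
    lower = mean - k * std
    upper = mean + k * std
    is_anomaly = (actual < lower) | (actual > upper)
    return is_anomaly, lower, upper
```

Any predictor that outputs a mean and a variance per time step can supply `pred_mean` and `pred_var`. A larger `k` widens the interval and reduces false alarms at the cost of missing smaller deviations.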
The BP neural network (Sadeghi, B. H. M., 2000) has the disadvantages of slow learning speed, limited generalization ability, and an inability to effectively process information from different historical times in the data. The Elman neural network (Elman, J. L., 1990) overcomes these shortcomings. It feeds the hidden-layer output of the previous time step, stored in context units, back as part of the input at the current time step. The units in the network therefore have a memory function and can be continuously updated as the data change, which allows the network to better learn the information and regularities in different time series. It can model static systems and map dynamic systems, and its computing power and network stability have clear advantages over BP neural networks.
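The context-unit mechanism of the Elman network can be sketched with a forward pass in NumPy. This is only an illustration of the recurrence under assumed settings: the class name, layer sizes, tanh activation and random weight initialization are not taken from the paper, and training is omitted.

```python
import numpy as np


class ElmanCell:
    """Forward pass of a simple Elman network (no training shown)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.normal(0.0, 0.1, (n_hidden, n_in))       # input -> hidden
        self.W_ctx = rng.normal(0.0, 0.1, (n_hidden, n_hidden))  # context -> hidden
        self.W_out = rng.normal(0.0, 0.1, (n_out, n_hidden))     # hidden -> output
        self.b_h = np.zeros(n_hidden)
        self.b_o = np.zeros(n_out)
        self.context = np.zeros(n_hidden)  # copy of the previous hidden state

    def reset(self):
        """Clear the memory held in the context units."""
        self.context = np.zeros_like(self.context)

    def step(self, x):
        """Process one time step; the context units supply the memory."""
        hidden = np.tanh(self.W_in @ x + self.W_ctx @ self.context + self.b_h)
        self.context = hidden.copy()  # stored for the next time step
        return self.W_out @ hidden + self.b_o
```

Feeding the same input twice generally gives two different outputs, because the second call also sees the hidden state stored by the first. A BP network without context units would return the same output for the same input.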
2.3.2 Satellite Autonomous Fault Repair Module Design
After the satellite fault diagnosis module detects a fault and determines its cause, the satellite performs emergency handling and fault repair according to the fault plan. The specific repair process is determined by the severity of the fault. After the repair is completed, the satellite summarizes the fault information and the repair result and transmits them to the ground, so that the ground control center can decide whether to repair the faulty satellite further or take other measures to ensure its normal operation in orbit.
2.3.3 Satellite Independent Task Re-planning Module Design
After the fault is repaired, the satellite returns to normal operation and needs to perform new tasks. The autonomous task re-planning module adds the tasks that were not successfully executed and the tasks scheduled for the future to the to-be-planned task set, uses the on-orbit dynamic programming algorithm to solve this task set, and generates a new task execution plan. In the next phase, the satellite performs the corresponding tasks based on the re-planning results.
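Building the to-be-planned task set can be sketched as below. The `Task` fields and the feasibility rule are assumptions introduced for illustration: a task is kept if it has not been completed and can still finish inside its time window when started no earlier than the recovery time. The paper does not define the task model at this level of detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    task_id: str
    start: float       # earliest allowed start (hypothetical time window)
    end: float         # latest allowed finish
    duration: float
    done: bool = False


def build_replan_set(tasks, recovery_time):
    """Collect unfinished tasks that are still feasible after recovery."""
    pending = []
    for task in tasks:
        if task.done:
            continue  # already executed before the fault
        earliest_start = max(task.start, recovery_time)
        if earliest_start + task.duration <= task.end:
            pending.append(task)  # still fits inside its window
    return sorted(pending, key=lambda t: t.start)
```

Tasks whose windows closed while the satellite was being repaired are dropped here, which mirrors the observation in Section 2.1 that many original tasks can no longer be executed after the fault.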
3. APPLICATION EXAMPLES
To verify the validity of the proposed AFRF, we conducted a complete application example. During satellite operation, faults may occur. Autonomous fault detection is performed on the telemetry data with an Elman neural network, and the detection result is shown in Figure 2. The blue line is the real data, the red line is the predicted data, and the two green lines are the upper and lower bounds of the prediction interval. Faults occur at about time 1400 and 1700. The figure shows that when there is no fault, the real data lie essentially within the prediction interval, whereas when a fault occurs, the real data fall outside the prediction interval and the abnormality is detected, which demonstrates that the method detects these faults.
After fault detection, the satellite uses AFRF for autonomous fault repair, and tasks can be executed again once the satellite is restored to the normal state. We use three commonly used heuristic rules for autonomous re-planning. The planning objective is the task completion rate. The results are shown in Table 1.
Table 1. Task re-planning result.
Instance    Heuristic1    Heuristic2    Heuristic3
25          0.92          0.92          0.96
50          0.92          0.92          0.92
75          0.88          0.88          0.89
100         0.90          0.88          0.92
125         0.83          0.83          0.85
150         0.74          0.80          0.81
Figure 2. Fault detection result.
As can be seen from Table 1, these three heuristic rules can quickly achieve satellite mission re-planning, with completion rates between 0.74 and 0.96 across the tested instances. After re-planning, the satellite executes the new plan, so that it continues to deliver value. The above experiments show that the proposed AFRF is effective for fault detection, repair and task re-planning, and can restore the satellite to normal operation.
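The way a heuristic rule produces a completion rate such as those in Table 1 can be sketched as follows. The paper does not name its three heuristic rules, so the rules in `RULES`, the single-resource task model and the first-fit insertion strategy are all assumptions chosen for illustration.

```python
def greedy_replan(tasks, now, rule):
    """Greedy single-resource re-planning with first-fit insertion.

    tasks: list of (task_id, start, end, duration) tuples (hypothetical model).
    rule:  sort key that decides the order in which tasks are tried.
    Returns the accepted (begin, finish, task_id) entries and the completion rate.
    """
    timeline = []  # accepted, non-overlapping (begin, finish, task_id)
    for task_id, start, end, duration in sorted(tasks, key=rule):
        begin = max(start, now)
        # Slide the candidate past accepted tasks, scanned in time order.
        for b, f, _ in sorted(timeline):
            if begin + duration <= b:
                break  # fits in the gap before this accepted task
            begin = max(begin, f)
        if begin + duration <= end:
            timeline.append((begin, begin + duration, task_id))
    rate = len(timeline) / len(tasks) if tasks else 0.0
    return sorted(timeline), rate


# Illustrative ordering rules; not the rules used in the paper.
RULES = {
    "earliest_start": lambda t: t[1],
    "earliest_deadline": lambda t: t[2],
    "shortest_duration": lambda t: t[3],
}
```

Running the same task set through each rule and comparing the returned rates gives one row of a comparison like Table 1. Different rules can accept different subsets of tasks when time windows conflict.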
4. CONCLUSION
Satellite fault detection and repair is of great significance for ensuring normal satellite operation and successful mission completion. This paper takes satellite fault detection and repair as the research object, analyzes the satellite fault repair and re-planning process, and proposes a satellite autonomous fault detection and recovery framework. The framework includes ground model training and improvement, on-board autonomous fault detection and re-planning, and autonomous fault repair. Artificial intelligence methods are used in the framework to detect satellite faults. After the satellite resumes normal operation, task re-planning is carried out by a heuristic algorithm, allowing the satellite to function better.
In future research, we will try to apply the autonomous fault detection and repair framework on an actual application platform. Verifying multiple artificial intelligence models under the framework and selecting the most appropriate one is also a next step.
REFERENCES
Bing, C., Gang, L., Hongzheng, F., & Li, A. (2014).
Method of satellite anomaly detection based on least
squares support vector machine. Computer
Measurement & Control.
Elman, J. L. (1990). Finding structure in time. Cognitive
Science, 14(2), 179-211.
He, Y., Xing, L., & Chen, Y. (2016). Software design of
autonomous mission planning for new imaging
satellite. International Conference on Space
Operations.
Sadeghi, B. H. M. (2000). A bp-neural network predictor
model for plastic injection molding process. Journal of
Materials Processing Technology, 103(3), 411-416.
Song, Y., Huang, D., Zhou, Z., & Chen, Y. (2018). An
emergency task autonomous planning method of agile
imaging satellite. Eurasip Journal on Image & Video
Processing, 2018(1), 29.
[Figure 2 plot: "Abnormal detection results based on Elman neural network"; x-axis: Time (0 to 2000); y-axis: Value (-6.8 to -5); legend: True data, Prediction data, Interval bound (upper and lower).]
Song, Y. J., Zhang, Z. S., Sun, K., Yao, F., & Chen, Y. W. (2019). A heuristic genetic algorithm for regional targets' small satellite image downlink scheduling problem. International Journal of Aerospace Engineering, 2019, 1-13.
Song, Y. J., Zhou, Z. Y., Zhang, Z. S., Yao, F., & Chen, Y.
W. (2019). A framework involving mec: imaging
satellites mission planning. Neural Computing and
Applications, 1-12.
Weizheng, L. I., & Qiao, M. (2014). Fault detection for in-
orbit satellites using an adaptive prediction model.
Chinese Journal of Space Science, 34(2), 201-207.
Yairi, T., Kawahara, Y., Fujimaki, R., Sato, Y., &
Machida, K. (2006). Telemetry-mining: A machine
learning approach to anomaly detection and fault
diagnosis for space systems. IEEE International
Conference on Space Mission Challenges for
Information Technology. IEEE Computer Society.
Yang, Y., & Hou, N. (2013). Data series forecasting and
anomaly detection methods based on online least
squares support vector machine. Control Conference.
IEEE.
Zheng, Z., Jian, G., & Gill, E. (2018). Onboard
autonomous mission re-planning for multi-satellite
system. Acta Astronautica, 145, 28-43.