Predictive Monitoring and Anomaly Detection in Industrial Systems
Archana Burujwale, Anuj Tadkase, Tejas Hirve, Yash Kolekar and Sumedh Bambal
Department of Computer Science and Engineering(Artificial Intelligence) Vishwakarma Institute of Technology, Pune,
411037, Maharashtra, India
Keywords: Anomaly Detection, Ensemble Learning, Isolation Forest, Local Outlier Factor, One-Class SVM, Robust Covariance, Industrial Sensors, Real-Time Monitoring, Operational Efficiency, Machine Learning, Time-Series Analysis, Oil and Gas Industry, Artificial Intelligence, Cloud Computing.
Abstract: In modern industrial operations, real-time anomaly detection is essential for safety and efficiency, particularly in the oil and gas sector. This paper introduces an anomaly detection system that processes time-series data from pumps, ball bearings and chemical sensors measuring CH4, CO2, O2, temperature, humidity and pressure. The system employs several machine learning models, namely Isolation Forest, Local Outlier Factor, Robust Covariance and One-Class SVM. The individual models detect anomalous sensor behaviour, and an ensemble model combines their predictions through majority voting. The proposed solution addresses data quality issues, provides businesses with actionable insights for better decision-making, lowers operational costs and improves safety by flagging critical anomalies in time.
1 INTRODUCTION
In today's industrial world, the oil and gas sector needs to operate efficiently and safely. Many of the materials and processes used in the chemical and oil and gas industries are highly volatile and hazardous in nature. As such, facilities are highly susceptible to leaks and gas releases, equipment malfunctions, and external events such as changes in temperature and humidity. Pipelines carry gases including methane (CH4), carbon dioxide (CO2), and oxygen (O2) at high pressure, with temperature controls and integrity checks in storage vessels. If these parameters stray too far from their ideal ranges, the consequences can be dire: reduced safety for workers and even damage to the environment. For instance, an uncontrolled rise in a pipeline's pressure can trigger an explosion, while a rise or drop in temperature can make the gas flow inefficient or even cause condensation, leading to long-term damage.
The major objective of this research is to design an automated monitoring system that detects anomalies across multiple sensor measurements such as gas pressure, temperature and humidity. Our system uses several machine learning models, namely Isolation Forest, Local Outlier Factor, Robust Covariance (Elliptic Envelope), and One-Class SVM, to identify and monitor outliers and anomalous behaviour in sensor data. With the ability to pick up on anomalies in real time, companies can quickly resolve issues and prevent unexpected downtime.
Figure 1: Process flow of Oil and Gas production
When an anomaly occurs, such as a rise in pressure or a drop in temperature, an alert can be raised, enabling the operator to take immediate action. Certain processes can also be shut down automatically to protect workers and the environment from escalation. The system helps operators comply with regulations, operate efficiently, and
improve safety for workers and the environment by addressing issues as they arise.
2 LITERATURE SURVEY
Ahmed, M., Mahmood, A. N., & Hu, J. (2016). Reviews various techniques used to detect network anomalies, including statistical methods and machine learning approaches.
Hodge, V. J., & Austin, J. (2004). The authors survey different outlier detection techniques based on machine learning methods.
Zong, B., et al. (2018). The authors discuss the combination of deep autoencoders with Gaussian mixture models for anomaly detection.
Wang, F. Y., et al. (2018). The authors develop a real-time anomaly detection system using machine learning models for IoT applications in industry.
Yu, L., et al. (2018). Investigates various machine
learning models for detecting anomalies and
providing insights.
Munir, A., & Saeed, M. (2020). The authors review predictive maintenance techniques, which include machine learning methods for anomaly detection.
Alhassan, S. A. T., et al. (2021). The authors discuss challenges faced in using machine learning for anomaly detection.
Choudhary, S., & Patel, P. (2021). The authors use ARIMA models for time-series forecasting in chemical processing.
Liu, Y., et al. (2021). Focuses on anomaly detection in industrial control systems, comparing different machine learning models on their performance metrics.
Iglewski, J. & P. B. (2019). The authors presented
a framework for real-time anomaly detection in
manufacturing systems using machine learning
techniques.
Ghafoor, K. et al. (2020). This paper investigates
the effectiveness of deep learning methods for
detecting anomalies in industrial time series data.
Zhang, Y., & Jiang, H. (2019). The authors review various methods for anomaly detection in industrial applications based on the integration of machine learning and domain knowledge.
Li, Y., et al. (2021). This study develops an anomaly detection model for predictive maintenance in industrial settings by utilizing data from different sensors and machine learning models.
Marjanovic, O., et al. (2020). This paper uses a
hybrid approach for anomaly detection in smart
manufacturing environments combining rule-based
and machine learning techniques.
Roy, S., & Chowdhury, P. (2019). The authors
propose a real-time anomaly detection system for
industrial IoT applications.
3 TOOLS USED
3.1 Apache Kafka
Kafka is utilized for data streaming with efficient
collection, processing and dissemination of large
volumes of operational data from various sources.
3.2 Amazon S3
S3 is used to store all factory data, including sensor
readings, anomaly detection results, and historical
data.
3.3 Python
Python is the primary programming language used in
this research because of its extensive libraries and
frameworks supporting data analysis, machine
learning, and data visualization.
3.4 Scikit-learn
Scikit-learn is used to implement the core anomaly detection models, including Isolation Forest, Local Outlier Factor, One-Class SVM and Robust Covariance, which detect anomalies through unsupervised learning techniques.
3.5 TensorFlow/Keras
TensorFlow, along with its high-level API Keras, is
utilized for developing and training deep learning
models, especially for Autoencoder architectures.
3.6 Tableau
Tableau is used for visualizing the results of anomaly
detection, providing insights into operational trends,
anomalies, and key performance indicators, thereby
enabling better decision-making based on data.
3.7 Colab Notebook
Google Colab notebooks are used in this research for developing and documenting the overall data analysis and machine learning workflow.
3.8 Amazon Athena
A serverless interactive query service that allows
users to analyse data stored in Amazon S3 using
standard SQL. It simplifies querying large datasets
and integrates seamlessly with the AWS ecosystem
for data storage and analysis.
4 METHODOLOGY
Figure 2: Methodology Diagram
4.1 Collection of Diverse Industrial Records for the Database
The foundation of our research is built upon two key datasets, each serving a critical role in monitoring industrial operations and detecting anomalies:
4.1.1 Dataset
This dataset comprises sensor readings from various
chemical and environmental parameters, including
methane (CH4), carbon dioxide (CO2), oxygen (O2),
temperature, humidity, pump pressure, gauge
pressure, and barometric pressure. These
measurements are essential for tracking gas
emissions, environmental conditions, and ensuring
regulatory compliance. Anomalies in this data can
indicate potential leaks, hazardous gas levels, or
equipment malfunction, which could impact both
safety and operational efficiency.
Figure 3: Dataset
4.2 Data Streaming Using Apache
Kafka
To facilitate real-time data processing and ensure
timely anomaly detection, we utilize Apache Kafka
as a distributed streaming platform. The architecture
consists of:
4.2.1 Producers
Sensors and equipment (pumps, bearings) act as
producers, sending real-time data to Kafka topics.
4.2.2 Consumers
Data processing applications subscribe to these topics
to consume and analyze the incoming data in real-
time.
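As an illustration of this producer-consumer flow, the minimal sketch below uses the kafka-python client; the broker address, topic name and message fields are hypothetical and not taken from our deployment.

# Minimal Kafka sketch using the kafka-python client (assumed available);
# the broker address, topic name and payload fields are illustrative only.
import json
from kafka import KafkaProducer, KafkaConsumer

# Producer side: a sensor publishes one reading to the "sensor-readings" topic.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("sensor-readings", {"sensor": "CH4", "value": 1.82, "ts": "2024-09-01T10:00:00"})
producer.flush()

# Consumer side: the anomaly detection application subscribes to the same topic.
consumer = KafkaConsumer(
    "sensor-readings",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
    auto_offset_reset="earliest",
)
for message in consumer:
    reading = message.value  # dict with sensor name, value and timestamp
    print(reading)
    break  # process a single message in this sketch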
4.3 Data Preprocessing
4.3.1 Data Cleaning
4.3.2 Removing Duplicates
Scan the datasets to identify and remove duplicate
entries.
4.3.3 Handling Missing Values
Employ techniques such as interpolation for time-
series data or imputation based on domain knowledge
to handle missing data.
4.3.4 Correcting Erroneous Entries
Flag and correct unrealistic sensor readings (e.g.,
negative pressure values) based on expected
operational ranges.
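The sketch below illustrates these cleaning steps with pandas, assuming the readings are loaded into a time-indexed DataFrame; the column names and validity checks are illustrative.

import pandas as pd

# df is assumed to be a time-indexed DataFrame of sensor readings;
# the column names and valid ranges below are illustrative only.
def clean_sensor_data(df: pd.DataFrame) -> pd.DataFrame:
    # Remove duplicate rows (e.g. repeated transmissions of the same reading).
    df = df.drop_duplicates()

    # Handle missing values with time-based interpolation for the time series.
    df = df.interpolate(method="time")

    # Flag unrealistic entries: negative pressures are treated as missing
    # and re-interpolated instead of being used as-is.
    for col in ["pump_pressure", "gauge_pressure", "barometric_pressure"]:
        if col in df.columns:
            df.loc[df[col] < 0, col] = float("nan")
    df = df.interpolate(method="time")
    return df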
4.4 Normalization
4.4.1 Min-Max Scaling
This technique transforms the feature values to a fixed range, usually [0, 1]. Each feature value x_i is transformed using the formula:
x'_i = (x_i − min(X)) / (max(X) − min(X))
Standard Scaler is applied to normalize sensor data, ensuring all features have a mean of 0 and a standard deviation of 1. This helps models such as
Isolation Forest and One-Class SVM to perform
efficiently without being affected by the scale of the
data.
𝑧𝑥𝑖𝜇/𝜎
Figure 4: Z-score
Moving averages and moving standard deviations
are computed over a sliding window for key
variables. For instance, the rolling mean and standard
deviation of methane (CH4) and temperature are
calculated to smooth out short-term fluctuations and
highlight longer-term trends or shifts.
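A brief sketch of the scaling and rolling-statistics steps using scikit-learn and pandas is shown below; the feature names and the 12-sample window are illustrative assumptions.

import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# df is assumed to be the cleaned, time-indexed sensor DataFrame; the feature
# names and the 12-sample rolling window are illustrative choices.
features = ["CH4", "CO2", "O2", "temperature", "humidity", "gauge_pressure"]

# Min-max scaling to [0, 1] (as in the formula above).
minmax_scaled = MinMaxScaler().fit_transform(df[features])

# Z-score standardization (mean 0, standard deviation 1), used for the models.
zscore_scaled = StandardScaler().fit_transform(df[features])
df_scaled = pd.DataFrame(zscore_scaled, index=df.index, columns=features)

# Rolling mean and standard deviation over a sliding window of 12 samples.
df_scaled["CH4_roll_mean"] = df_scaled["CH4"].rolling(window=12).mean()
df_scaled["CH4_roll_std"] = df_scaled["CH4"].rolling(window=12).std()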
4.5 Timestamp Alignment
In industrial settings, data from multiple sensors and
sources is often recorded at different times. Aligning
these timestamps is critical for analysis, especially
when dealing with time-series data. The process
involves:
4.5.1 Synchronizing Data
All datasets are synchronized to a common
timestamp, ensuring that data points from different
sensors corresponding to the same time are grouped
together.
4.5.2 Resampling
Where data is recorded at inconsistent intervals, we employ resampling techniques to create a uniform time-series dataset, using up-sampling or down-sampling to achieve the desired frequency.
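As an example, the sketch below aligns several sources to a common one-minute grid with pandas; the frequency and aggregation choices are assumptions made for illustration.

import pandas as pd

# Each raw source is assumed to be a DataFrame indexed by its own timestamps.
# Resample everything to a common 1-minute grid (an illustrative frequency):
# down-sampling averages finer readings, up-sampling interpolates coarser ones.
def align_to_minutes(sources: dict[str, pd.DataFrame]) -> pd.DataFrame:
    aligned = []
    for name, frame in sources.items():
        resampled = frame.resample("1min").mean()          # down-sample
        resampled = resampled.interpolate(method="time")   # fill up-sampled gaps
        aligned.append(resampled.add_prefix(f"{name}_"))
    # Join on the shared minute-level index so rows line up across sensors.
    return pd.concat(aligned, axis=1)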
Figure 5: Time Series vs CH4
Figure 6: Time series vs Multiple sensors
4.6 Storage of the Data
All the preprocessed data is stored securely in an Amazon S3 bucket, which provides durability and scalability and enables efficient access to the data for model training and analysis.
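A minimal boto3 sketch of this upload step is shown below; the local file name, bucket and object key are placeholders.

import boto3

# Upload a locally saved, preprocessed CSV export of the sensor data to S3;
# the file path, bucket name and object key are placeholders.
s3 = boto3.client("s3")
s3.upload_file(
    Filename="preprocessed_sensors.csv",
    Bucket="industrial-anomaly-data",
    Key="preprocessed/preprocessed_sensors.csv",
)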
4.7 Feature Engineering
Feature engineering is an essential step in developing the machine learning models, as it enhances their efficiency and accuracy. This process involves:
4.8 Moving Average
We generated moving average features for sensor
readings (e.g., temperature, humidity, and pressures)
to capture trends and smooth out short-term
fluctuations. These features help reveal inconsistencies and underlying issues in long-term trends.
4.9 Lagged Value Features
Lagged features capture past sensor readings, enabling our models to learn from historical data and identify temporal patterns associated with anomalies.
4.10 Seasonality Indicator
Features that indicate seasonal patterns (e.g., time of day, day of week) were created to capture periodic behaviour in the sensor readings.
4.11 Trend Features
Features that capture the overall trend of the time-series data help our models recognize the
long-term increase or decrease in the sensor
measurements.
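The sketch below condenses these engineered features into a single pandas helper; the lag counts, window sizes and column names are illustrative choices rather than the exact configuration used.

import pandas as pd

# df is assumed to be the aligned, time-indexed sensor DataFrame;
# the lag counts, window sizes and column names are illustrative.
def add_features(df: pd.DataFrame, col: str = "CH4") -> pd.DataFrame:
    out = df.copy()
    # Moving average to smooth short-term fluctuations.
    out[f"{col}_ma_12"] = out[col].rolling(window=12).mean()
    # Lagged values so the models can see recent history.
    for lag in (1, 2, 3):
        out[f"{col}_lag_{lag}"] = out[col].shift(lag)
    # Seasonality indicators: time of day and day of week (requires DatetimeIndex).
    out["hour"] = out.index.hour
    out["day_of_week"] = out.index.dayofweek
    # Simple trend feature: difference over a longer window.
    out[f"{col}_trend_24"] = out[col] - out[col].shift(24)
    return out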
Figure 7: Correlation matrix
4.12 Distance Metrics
4.12.1 Euclidean Distance
The Euclidean distance between two points x and y in n-dimensional space, used for the pump anomaly detection, is calculated as:
d(x, y) = √( Σ_{i=1}^{n} (x_i − y_i)² )
This metric is used in clustering to group records with similar operational parameters, such as barometric pressure, gauge pressure and temperature readings from the chemical sensors and pumps, and to identify outliers in our data while considering the correlations among the various features. For example, while monitoring gas emissions (CH4, CO2, O2) alongside temperature and humidity, the Euclidean metric helps detect anomalies by examining how far a point lies from the expected distribution of normal data.
4.12.2 Manhattan Distance
The Manhattan distance computes the distance between two points by summing the absolute differences of their coordinates. It is particularly useful for grid-like structures and high-dimensional spaces:
d(x, y) = Σ_{i=1}^{n} |x_i − y_i|
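Both metrics can be computed directly with NumPy, as in the short sketch below; the sample vectors are illustrative.

import numpy as np

# Two illustrative sensor vectors (e.g. scaled CH4, CO2, O2, temperature readings).
x = np.array([0.42, 0.10, 0.95, 0.33])
y = np.array([0.40, 0.12, 0.91, 0.35])

euclidean = np.sqrt(np.sum((x - y) ** 2))   # d(x, y) = sqrt(sum_i (x_i - y_i)^2)
manhattan = np.sum(np.abs(x - y))           # d(x, y) = sum_i |x_i - y_i|
print(euclidean, manhattan)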
4.13 Machine Learning Models
We now explore the machine learning models used in our project for anomaly detection, focusing on the following four:
4.13.1 Isolation Forest
Isolation Forest is an ensemble learning method designed specifically for anomaly detection in operational data. The algorithm operates by constructing a large number of random decision trees. Unlike traditional methods that profile normal data points, Isolation Forest isolates anomalies directly, based on the observation that anomalies are few and different. The process involves randomly selecting a feature and then selecting a split value between the maximum and minimum values of that feature. Because anomalies are rare and distinct, they tend to have shorter paths in the tree structure, which makes them easier to isolate. The model is highly efficient on high-dimensional datasets, which makes it suitable for complex industrial applications.
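A minimal scikit-learn sketch of fitting Isolation Forest on the scaled feature matrix is shown below; the matrix X, the number of trees and the contamination value are illustrative assumptions.

from sklearn.ensemble import IsolationForest

# X is assumed to be the scaled feature matrix (rows = timestamps, columns = features);
# the contamination value is an illustrative estimate of the anomaly fraction.
iso_forest = IsolationForest(n_estimators=100, contamination=0.01, random_state=42)
iso_labels = iso_forest.fit_predict(X)   # +1 = normal, -1 = anomaly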
4.13.2 Local Outlier Factor (LOF)
This is a density-based anomaly detection algorithm that identifies anomalies by comparing the local density of a point to that of its neighbours. It measures the local density deviation of a given data point with respect to its neighbours; if a point has a significantly lower density than its neighbours, it is considered an outlier. The Local Outlier Factor is particularly effective at identifying local outliers in datasets where density varies significantly across regions. This property is beneficial for detecting anomalies in industrial data, where normal behaviour can vary with environmental factors.
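A corresponding scikit-learn sketch for LOF is shown below; the neighbourhood size and contamination value are illustrative.

from sklearn.neighbors import LocalOutlierFactor

# X is the same assumed scaled feature matrix; n_neighbors is an illustrative choice.
lof = LocalOutlierFactor(n_neighbors=20, contamination=0.01)
lof_labels = lof.fit_predict(X)          # +1 = normal, -1 = anomaly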
4.13.3 One-Class Support Vector Machine (SVM)
The One-Class SVM is a variation of the traditional Support Vector Machine algorithm adapted for anomaly detection. It learns the boundary of the normal data by constructing a decision function that encloses the majority of the data points; points that fall outside this boundary are classified as anomalies. One-Class SVM is especially useful when labelled anomalies are infrequent, as it relies primarily on normal data for training. The model is effective
on high-dimensional data, which makes it suitable for industrial applications with sensor data.
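A minimal scikit-learn sketch for One-Class SVM is given below; the kernel and nu settings are illustrative.

from sklearn.svm import OneClassSVM

# X is the assumed scaled feature matrix; kernel and nu are illustrative settings,
# where nu bounds the fraction of points treated as anomalies.
oc_svm = OneClassSVM(kernel="rbf", nu=0.01, gamma="scale")
svm_labels = oc_svm.fit_predict(X)       # +1 = normal, -1 = anomaly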
4.13.4 Robust Covariance
The Robust Covariance method estimates the covariance of the data in a manner that is less sensitive to outliers. Unlike standard covariance estimation, which can be strongly influenced by anomalous points, robust covariance uses techniques such as the Minimum Covariance Determinant to provide a more stable estimate of the data's variance and correlation structure. This method is very useful in industrial settings where sensor data may contain many outliers, allowing more accurate anomaly detection.
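In scikit-learn this estimator is exposed as EllipticEnvelope; the sketch below shows a minimal fit, with the contamination value as an illustrative assumption.

from sklearn.covariance import EllipticEnvelope

# X is the assumed scaled feature matrix; contamination is an illustrative estimate.
robust_cov = EllipticEnvelope(contamination=0.01, random_state=42)
cov_labels = robust_cov.fit_predict(X)   # +1 = normal, -1 = anomaly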
After training all the models and evaluating their individual performance, we analyse the results and visualize them using Tableau. Visualizing the model outputs alongside the raw data can reveal critical findings about operations and guide maintenance strategies.
5 RESULTS AND DISCUSSIONS
The number of anomalies detected is similar across many of the sensors and algorithms, which reflects how interconnected the sensor values are; we therefore focus the discussion on a single sensor (CH4) to keep the analysis clear and avoid redundancy. The dataset is highly correlated, indicating that the subsystems are interdependent and produce similar anomalies.
5.1 Isolation Forest Algorithm
5.1.1 Effectiveness of the Algorithm
This algorithm is an unsupervised learning method that isolates anomalies by randomly selecting features and splitting the data; the points that are easiest to isolate are flagged as anomalies. It works well on high-dimensional data where conventional methods struggle. The results show that unusual patterns can be detected in all the sensor readings.
5.1.2 Sensitivity Factor
The sensitivity of the model depends on parameters such as the contamination factor, which encodes the expected proportion of anomalies in the dataset. Adjusting it increases or decreases the number of detected unusual patterns, which in turn affects the recall and precision of the model. The results achieve a balance between detecting actual anomalies and minimizing false positives.
Figure 8: Anomalies in CH4 over time
The plot shows anomalies detected in groups and clusters, especially around steep drops or rises in the sensor values. Isolation Forest is very effective at detecting these steep changes, particularly in early September and near October, as shown in the figure. It delivers strong and balanced anomaly detection across all the parameters.
5.2 LOF
The Local Outlier Factor is well suited to detecting anomalies where the data has a varying spread of points, local clusters or dense regions. LOF measures local point density to identify data that does not fit its neighbourhood. It is an effective algorithm for higher-dimensional datasets, which are common in industrial scenarios.
Figure 9: Anomalies in CH4 using Local Outlier Factor
It also detects anomalies in areas with sudden changes in trend, but mostly picks up anomalies in higher-density regions where the sensor value changes drastically, as shown in the figure above. This makes it good at catching anomalies that Isolation Forest misses where the changes in value are less pronounced, which in turn suggests that it is sensitive to local density variations.
5.3 Robust Covariance
This is a statistics-based approach used to estimate the covariance of a dataset while minimizing the effect of anomalies. The technique is useful for outlier detection where the dataset's structure could otherwise be distorted by extreme values. It assumes an underlying Gaussian distribution to detect anomalies effectively.
Figure 10: Anomalies in CH4 using Robust Covariance
It usually detects anomalies in a similar or smaller proportion than the other algorithms, flagging only points with a larger deviation. The method is largely conservative, as it does not flag anomalies in gradual changes the way LOF does. The data yields similar anomalies for this algorithm, which suggests that LOF and Robust Covariance are comparably effective on this dataset, with a slight edge to Robust Covariance.
5.4 One-Class SVM
One-Class SVM's results are influenced by parameters such as the kernel type and the regularization strength. By changing these parameters we can adjust how sensitive the model is; such adjustments can also lead it to over-detect anomalies, which can be seen in the gauge pressure chart in the comparative analysis of the algorithms below. It excels at flagging anomalies in non-linear trends, making it one of the better options for this dataset compared to many other algorithms, but its higher sensitivity makes it prone to over-detecting anomalies.
Figure 11: Anomalies in CH4 using One-Class SVM
5.5 Comparative Analysis of
Algorithms
Figure 12: Comparison of anomaly detection methods
The overall performance of Robust Covariance and Isolation Forest was quite similar and balanced in terms of anomaly detection. Robust Covariance is preferred whenever a more conservative approach is needed.
Local Outlier Factor excels for sensors such as temperature and humidity, where local density was a major factor influencing the model. It was less effective for gauge and barometric pressures, as it may have missed anomalies in lower-density regions. It shows apparently high performance for gauge pressure, but this is partly a result of its sensitivity, which may have caused the model to flag non-anomalous data as well. For the majority of the sensors, however, it performed worse than the other algorithms, especially for the main CH4 sensor.
5.6 Ensemble Learning
The predictions of all the models described above are combined: each model votes on whether a data point is normal or anomalous, and the point is classified as an anomaly based on the majority decision.
If at least three of the four models classify a point as deviating from the usual trend, it is flagged as an anomaly. This approach is effective at removing false positives, i.e. data falsely labelled as anomalous.
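A short sketch of this majority-vote combination is shown below, assuming each model has produced the +1/−1 labels from the earlier sketches.

import numpy as np

# Each array is assumed to hold the +1 (normal) / -1 (anomaly) labels produced
# by the four individual models on the same data points.
all_labels = np.vstack([iso_labels, lof_labels, svm_labels, cov_labels])

# Count anomaly votes per point and flag a point when at least 3 of 4 models agree.
anomaly_votes = (all_labels == -1).sum(axis=0)
ensemble_anomaly = anomaly_votes >= 3     # boolean mask of ensemble anomalies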
Figure 13: Ensemble anomalies in CH4
This makes ensemble learning a reliable way to detect anomalies, as it works on the basis of a majority vote and avoids falsely flagging non-anomalous points.
Figure 14: Comparison between different anomaly
detection methods
6 CONCLUSIONS
This research presents a real-time anomaly detection system tailored to the oil and gas industry, with real-time data streaming and real-time notifications sent to the officials in charge through an SMTP client.
By using machine learning approaches such as Isolation Forest, Local Outlier Factor, One-Class SVM and Robust Covariance, as well as an ensemble of all the models, we sought the most effective way to detect anomalies.
Individually, Isolation Forest and Robust Covariance gave the most accurate results and detected most of the deviations in the data, which is very important in industrial systems. Models such as Local Outlier Factor and One-Class SVM excelled at detecting anomalies in high-density sensor data. The ensemble of these models made the system more reliable and trustworthy, ensuring that detected anomalies were confirmed by a majority vote and thereby excluding false positives.
Implementing these systems in real time would improve outcomes and reduce the manual effort required to detect anomalies. Such systems would also reduce manual errors, enhance safety, and boost production by avoiding maintenance-related delays.
ACKNOWLEDGEMENTS
We are extremely thankful to Mrs. Shailaja Uke, Mrs.
Sangita Jaybhaye and Mrs. Archana Burujwale for
their guidance and support in this research. Their
consistent feedback and expertise in this domain have greatly helped to give a proper direction to our work. Additionally, we are thankful to our institution for
providing us with the support we needed throughout
our research.
REFERENCES
Ahmed, M., Mahmood, A. N., & Hu, J. (2016). Techniques
for detecting network anomalies: A review of statistical
methods and machine learning approaches in industrial
settings. Journal of Network and Computer
Applications, vol. 84, pp. 23-45. DOI:
10.1016/j.jnca.2017.07.001.
Hodge, V. J., & Austin, J. (2004). Outlier detection
techniques: A survey of machine learning methods and
their applications in industrial environments. Artificial
Intelligence Review, vol. 22, no. 2, pp. 85-126. DOI:
10.1023/B0000045506.53593.b3.
Zong, B., et al. (2018). A novel model combining deep
autoencoders with Gaussian mixture models for
anomaly detection in industrial datasets.
Neurocomputing, vol. 268, pp. 92-99. DOI:
10.1016/j.neucom.2017.07.019.
Wang, F. Y., et al. (2018). Development of a real-time
anomaly detection system using machine learning
techniques for IoT applications in industrial settings.
IEEE Transactions on Industrial Informatics, vol. 14,
no. 4, pp. 1786-1795. DOI: 10.1109/TII.2017.2783939.
Yu, L., et al. (2018). Comparative effectiveness of machine
learning methods for detecting anomalies in industrial
processes. Expert Systems with Applications, vol. 95,
pp. 116-125. DOI: 10.1016/j.eswa.2017.11.003.
Munir, A., & Saeed, M. (2020). Predictive maintenance
techniques: A review of machine learning methods for
anomaly detection in industrial settings. Computers in
Industry, vol. 121, pp. 103241. DOI:
10.1016/j.compind.2020.103241.
Alhassan, S. A. T., et al. (2021). Challenges in applying
machine learning for anomaly detection in IoT
environments and future research directions. IEEE
Internet of Things Journal, vol. 8, no. 12, pp. 9860-
9872. DOI: 10.1109/JIOT.2021.3055323.
Choudhary, S., & Patel, P. (2021). Effectiveness of ARIMA
models for time series forecasting in chemical
processing and operational deviation detection. Journal
of Loss Prevention in the Process Industries, vol. 68,
pp. 104306. DOI: 10.1016/j.jlp.2020.104306.
Liu, Y., et al. (2021). Anomaly detection in industrial
control systems: Exploring different machine learning
approaches and their performance metrics. IEEE
Access, vol. 9, pp. 32057-32066. DOI:
10.1109/ACCESS.2021.3054936.
Iglewski, J., & P. B. (2019). Framework for real-time
anomaly detection in manufacturing systems using
machine learning techniques and data integration.
Computers in Industry, vol. 106, pp. 1-13. DOI:
10.1016/j.compind.2018.11.007.
Ghafoor, K., et al. (2020). Effectiveness of deep learning
methods for detecting anomalies in industrial time
series data: A comparative analysis. Applied Sciences,
vol. 10, no. 9, pp. 3222. DOI: 10.3390/app10093222.
Zhang, Y., & Jiang, H. (2019). Methods for anomaly
detection in industrial applications: Integration of
machine learning and domain knowledge. Journal of
Process Control, vol. 78, pp. 29-39. DOI:
10.1016/j.jprocont.2019.02.005.
Li, Y., et al. (2021). Anomaly detection framework for
predictive maintenance in industrial settings utilizing
sensor data and machine learning models. Journal of
Manufacturing Systems, vol. 58, pp. 239-254. DOI:
10.1016/j.jmsy.2020.10.008.
Marjanovic, O., et al. (2020). Hybrid approach for anomaly
detection in smart manufacturing environments:
Combining rule-based and machine learning
techniques. Computers in Industry, vol. 118, pp.
103219. DOI: 10.1016/j.compind.2020.103219.
Roy, S., & Chowdhury, P. (2019). Real-time anomaly
detection system for industrial IoT applications:
Addressing data variability and noise challenges. IEEE
Transactions on Industrial Informatics, vol. 15, no. 4,
pp. 2248-2257. DOI: 10.1109/TII.2019.2934487.