Cost-aware Machine Learning and Deep Learning for Extremely Imbalanced Data

Author : Jishan Ahmed
Release : 2023
OCLC : 1395472641
Categories : Deep learning (Machine learning)

Book Synopsis: Cost-aware Machine Learning and Deep Learning for Extremely Imbalanced Data by Jishan Ahmed

Cost-aware Machine Learning and Deep Learning for Extremely Imbalanced Data was written by Jishan Ahmed and released in 2023. Book excerpt: Many real-world datasets, such as those used for failure and anomaly detection, are severely imbalanced, with a relatively small number of failed instances compared to the number of normal instances. This imbalance often biases learning towards the majority class, and mitigating it is a serious challenge. To address these issues, this dissertation leverages the Backblaze HDD data and makes several contributions to hard drive failure prediction. It begins with an evaluation of current state-of-the-art techniques and identifies any existing shortcomings. Multiple facets of machine learning (ML) and deep learning (DL) approaches to these challenges are explored. The synthetic minority over-sampling technique (SMOTE) is investigated by evaluating its performance with different distance metrics and nearest neighbor search algorithms, and a novel approach that integrates SMOTE with Gaussian mixture models (GMM), called GMM SMOTE, is proposed to address various issues. Subsequently, a comprehensive analysis of different cost-aware ML techniques applied to disk failure prediction is provided, emphasizing the challenges in current implementations. The research also explores a variety of cost-aware DL models, from 1D convolutional neural networks (CNN) and long short-term memory (LSTM) models to a hybrid model combining 1D CNN and bidirectional LSTM (BLSTM), to exploit the sequential nature of hard drive sensor data. A modified focal loss function is introduced to address the class imbalance prevalent in the hard drive dataset.
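The excerpt names SMOTE and a focal loss without giving their formulas. As a rough illustration only, the sketch below implements classic SMOTE interpolation (Euclidean distance, brute-force neighbour search) and the standard binary focal loss of Lin et al. in NumPy. It is not the GMM SMOTE method or the modified focal loss proposed in the dissertation, whose details are not given here; the function names `smote_sample` and `binary_focal_loss` and the defaults (`k=5`, `gamma=2.0`, `alpha=0.25`) are assumptions chosen for illustration.

```python
import numpy as np


def smote_sample(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic minority samples with classic SMOTE (sketch).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours, using
    Euclidean distance and a brute-force neighbour search.
    """
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, n - 1)
    # Pairwise Euclidean distances among minority samples only.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)  # seed sample for each new point
    nbr = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nbr] - X_min[base])


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean standard binary focal loss (Lin et al. form), not a modified variant.

    p: predicted probability of the positive (failure) class.
    y: true labels in {0, 1}.
    """
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    y = np.asarray(y)
    p_t = np.where(y == 1, p, 1 - p)  # probability assigned to the true class
    alpha_t = np.where(y == 1, alpha, 1 - alpha)  # class-balancing weight
    # (1 - p_t) ** gamma down-weights easy, well-classified examples.
    return float(np.mean(-alpha_t * (1 - p_t) ** gamma * np.log(p_t)))


if __name__ == "__main__":
    X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    print(smote_sample(X_min, n_new=6, k=2, rng=0).shape)  # (6, 2)
    print(binary_focal_loss([0.9, 0.2], [1, 0]))
```

Setting `gamma=0` and `alpha=0.5` reduces the focal loss to half the ordinary binary cross-entropy, which is a convenient sanity check; larger `gamma` shifts the training signal towards hard, misclassified examples such as rare failures.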
The DL models are compared with traditional ML algorithms such as random forest (RF) and logistic regression (LR) and achieve superior results, suggesting the potential effectiveness of the proposed focal loss function. In addition, this dissertation aims to provide a comprehensive understanding, through survival analysis, of hard drive longevity and the critical factors that contribute to drive failure. Survival analysis is also employed to enhance sampling effectiveness by preferentially including observations associated with higher hazards. Techniques such as permutation feature importance, Shapley values, and Cox regression are used to identify the key factors influencing drive failure. This work also lays the groundwork for future research on efficient strategies for handling imbalanced data and predictive maintenance in big data frameworks.
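Permutation feature importance, one of the techniques the excerpt lists, measures how much a model's score drops when a single feature column is shuffled. A minimal model-agnostic sketch follows, assuming a fitted model exposed as a `predict` callable and recall on the failure class as the score; `permutation_importance` and `recall` are hypothetical helper names written for this illustration, not the dissertation's code (scikit-learn provides its own `sklearn.inspection.permutation_importance`).

```python
import numpy as np


def permutation_importance(predict, X, y, metric, n_repeats=5, rng=None):
    """Mean drop in `metric` when each feature column is shuffled (sketch).

    predict: callable mapping an (n, d) array to predicted labels.
    metric: callable(y_true, y_pred) -> score, higher is better.
    Returns an array of shape (d,) with the mean score drop per feature.
    """
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    baseline = metric(y, predict(X))
    drops = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Break the link between feature j and the label, keep the rest.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops[j] += baseline - metric(y, predict(X_perm))
    return drops / n_repeats


def recall(y_true, y_pred):
    """Share of true failures (label 1) that are flagged; 0.0 if none exist."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    positives = y_true == 1
    return float(np.mean(y_pred[positives] == 1)) if positives.any() else 0.0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    y = (X[:, 0] > 1.0).astype(int)  # rare positive class driven by feature 0
    model = lambda A: (A[:, 0] > 1.0).astype(int)  # toy stand-in for a fitted model
    # Features 1 and 2 score 0.0 because the toy model ignores them.
    print(permutation_importance(model, X, y, recall, n_repeats=10, rng=1))
```

Recall is used here rather than accuracy because, on extremely imbalanced data, accuracy barely moves when the rare failures are misclassified, which would hide the importance of failure-related features.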


Cost-aware Machine Learning and Deep Learning for Extremely Imbalanced Data Related Books

Data Preprocessing, Active Learning, and Cost Perceptive Approaches for Resolving Data Imbalance
Language: en
Pages: 309
Authors: Rana, Dipti P.
Categories: Computers
Type: BOOK - Published: 2021-06-04 - Publisher: IGI Global

Over the last two decades, researchers have been looking at imbalanced data learning as a prominent research area. Many critical real-world application areas like fin…

Learning from Imbalanced Data Sets
Language: en
Pages: 377
Authors: Alberto Fernández
Categories: Computers
Type: BOOK - Published: 2018-10-22 - Publisher: Springer

This book provides a general and comprehensible overview of imbalanced learning. It contains a formal description of a problem, and focuses on its main features…

Imbalanced Learning
Language: en
Pages: 222
Authors: Haibo He
Categories: Technology & Engineering
Type: BOOK - Published: 2013-06-07 - Publisher: John Wiley & Sons

The first book of its kind to review the current status and future direction of the exciting new branch of machine learning/data mining called imbalanced learni…

Imbalanced Classification with Python
Language: en
Pages: 463
Authors: Jason Brownlee
Categories: Computers
Type: BOOK - Published: 2020-01-14 - Publisher: Machine Learning Mastery

Imbalanced classification tasks are those where the distribution of examples across the classes is not equal. Cut through the equations, Greek le…