Regularized Stacked Autoencoder with Dropout-Layer to Overcome Overfitting in Numerical High-Dimensional Sparse Data

Authors

  • Abdussamad, Fundamental and Applied Science Department Universiti Teknologi PETRONAS, 32610 Seri Iskandar, Perak, Malaysia
  • Hanita Daud, Fundamental and Applied Science Department Universiti Teknologi PETRONAS, 32610 Seri Iskandar, Perak, Malaysia
  • Rajalingam Sokkalingam, Fundamental and Applied Science Department Universiti Teknologi PETRONAS, 32610 Seri Iskandar, Perak, Malaysia
  • Iliyas Karim Khan, Fundamental and Applied Science Department Universiti Teknologi PETRONAS, 32610 Seri Iskandar, Perak, Malaysia
  • Abdus Samad Azad, Fundamental and Applied Science Department Universiti Teknologi PETRONAS, 32610 Seri Iskandar, Perak, Malaysia
  • Muhammad Zubair, Department of Computer Sciences, Universiti Teknologi PETRONAS, 32610 Seri Iskandar, Perak, Malaysia
  • Farrukh Hassan, Department of Computing and Information System, School of Engineering and Technology, Sunway University, 47500 Petaling Jaya, Selangor, Malaysia

DOI:

https://doi.org/10.37934/ard.129.1.6074

Keywords:

sparse data, deep learning, regularized stacked autoencoder

Abstract

High-dimensional sparse numerical data are commonly encountered in machine learning, recommender systems, finance and medical imaging. Such data have many features and most of their values are zero, a combination that makes models prone to overfitting. Such data can be visualized using a neural network architecture called a stacked autoencoder. These multilayer autoencoders are trained to reconstruct their input, but overfitting is a major problem. To address it, a novel L1 regularization-dropout technique is introduced to reduce overfitting and improve stacked autoencoder performance. L1 regularization penalizes the absolute magnitude of the weights, which simplifies the learned representations, whereas dropout randomly turns off neurons during training so that each training step relies only on the neurons left active. The model also employs batch normalization to improve the performance of the autoencoder. The approach was applied to a high-dimensional sparse numerical dataset from the cybersecurity domain, with the reconstruction loss measured by Mean Squared Error (MSE) and Mean Absolute Error (MAE), and the results were compared with those of a conventional stacked autoencoder. The study found that the proposed method effectively mitigated overfitting. The findings suggest that stacked autoencoders combined with L1 regularization and dropout are well suited to handling high-dimensional sparse numerical data across a diverse range of applications.
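
The abstract names the ingredients but not their exact formulation, so the following is only a minimal sketch of how they might fit together, not the authors' implementation. It uses plain Python with no external libraries. The function names, the penalty weight `lam`, the dropout `rate` and the toy vectors are all assumptions made for illustration, and the "inverted" dropout variant (rescaling the surviving units) is likewise an assumed choice. Batch normalization and the actual training loop are left out.

```python
import random


def mse(x, x_hat):
    """Mean squared error between an input vector and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def mae(x, x_hat):
    """Mean absolute error between an input vector and its reconstruction."""
    return sum(abs(a - b) for a, b in zip(x, x_hat)) / len(x)


def l1_penalty(weights, lam):
    """lam times the sum of |w| over every entry of every weight matrix.

    `weights` is a list of matrices, each a list of rows (hypothetical layout).
    """
    return lam * sum(abs(w) for matrix in weights for row in matrix for w in row)


def dropout(activations, rate, rng, training=True):
    """Inverted dropout (assumed variant).

    During training each unit is zeroed with probability `rate` and the
    survivors are scaled by 1 / (1 - rate); at inference it is the identity.
    """
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


def regularized_loss(x, x_hat, weights, lam):
    """Reconstruction error (MSE here) plus the L1 weight penalty."""
    return mse(x, x_hat) + l1_penalty(weights, lam)


# Toy values, invented purely for illustration.
x = [0.0, 0.0, 1.0, 0.0]          # a sparse input vector
x_hat = [0.0, 0.5, 0.5, 0.0]      # a hypothetical reconstruction
weights = [[[0.5, -0.5], [0.0, 1.0]]]

rng = random.Random(0)
hidden = dropout([1.0, 1.0, 1.0, 1.0], rate=0.5, rng=rng)

print("MSE:", mse(x, x_hat))
print("MAE:", mae(x, x_hat))
print("MSE + L1:", regularized_loss(x, x_hat, weights, lam=0.01))
print("hidden units after dropout:", hidden)
```

In this sketch the L1 term grows with the total absolute weight, so minimizing the combined loss pushes individual weights toward zero, while the dropout mask changes on every call during training and is disabled at inference.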

Author Biographies

Abdussamad, Fundamental and Applied Science Department Universiti Teknologi PETRONAS, 32610 Seri Iskandar, Perak, Malaysia

abdussamad_22009779@utp.edu.my

Hanita Daud, Fundamental and Applied Science Department Universiti Teknologi PETRONAS, 32610 Seri Iskandar, Perak, Malaysia

hanita_daud@utp.edu.my

Rajalingam Sokkalingam, Fundamental and Applied Science Department Universiti Teknologi PETRONAS, 32610 Seri Iskandar, Perak, Malaysia

raja.sokkalingam@utp.edu.my

Iliyas Karim Khan, Fundamental and Applied Science Department Universiti Teknologi PETRONAS, 32610 Seri Iskandar, Perak, Malaysia

iliyas_22008363@utp.edu.my

Abdus Samad Azad, Fundamental and Applied Science Department Universiti Teknologi PETRONAS, 32610 Seri Iskandar, Perak, Malaysia

Abdussamad.azad@cqumail.com

Muhammad Zubair, Department of Computer Sciences, Universiti Teknologi PETRONAS, 32610 Seri Iskandar, Perak, Malaysia

muhammad_22000228@utp.edu.my

Farrukh Hassan, Department of Computing and Information System, School of Engineering and Technology, Sunway University, 47500 Petaling Jaya, Selangor, Malaysia

farrukhh@sunway.edu.my

Published

2025-05-02

How to Cite

Abdussamad, A., Daud, H., Sokkalingam, R., Khan, I. K., Azad, A. S., Zubair, M., & Hassan, F. (2025). Regularized Stacked Autoencoder with Dropout-Layer to Overcome Overfitting in Numerical High-Dimensional Sparse Data. Journal of Advanced Research Design, 129(1), 60–74. https://doi.org/10.37934/ard.129.1.6074

Issue

Section

Articles