
PAACDA: COMPREHENSIVE DATA CORRUPTION DETECTION ALGORITHM

Authors: M. Shrijani, BP Gitika Krishna, Dr. S. Jayanth

With technological advancements, data and its analysis have evolved beyond simple values and attributes scattered across spreadsheets and have become a powerful catalyst for transformation across many fields. However, data corruption, often stemming from unethical or illegal activities, has emerged as a serious challenge, creating an urgent need for effective methods that detect and clearly identify corrupted data within datasets. Identifying and recovering corrupted data is a complex task that demands careful attention: if corruption is overlooked at an early stage, it can cause major complications in later machine learning or deep learning processes. In this work, we introduce PAACDA (Proximity-based Adamic Adar Corruption Detection Algorithm) and present consolidated results with a specific emphasis on detecting corrupted data rather than merely identifying outliers. Existing state-of-the-art models such as Isolation Forest and DBSCAN (Density-Based Spatial Clustering of Applications with Noise) depend heavily on meticulous parameter tuning to achieve high accuracy and recall, yet they still show considerable uncertainty when handling corrupted data. The present study addresses specific performance limitations of several unsupervised learning algorithms when they are applied to both linear and clustered corrupted datasets.
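The abstract does not describe how PAACDA combines proximity with the Adamic Adar index, so the following is only a minimal sketch of the index named in the algorithm's title, assuming a radius-based proximity graph over Euclidean points. The Adamic Adar index of two nodes x and y sums 1 / ln|N(z)| over their common neighbours z, so shared neighbours that have few connections of their own count more heavily. The functions `proximity_graph`, `adamic_adar` and `mean_neighbour_score`, the `radius` parameter, the per-point scoring rule and the toy data are all hypothetical illustrations, not the authors' implementation.

```python
import math


def proximity_graph(points, radius):
    """Hypothetical proximity graph: connect every pair of points whose
    Euclidean distance is at most `radius` (undirected adjacency sets)."""
    neighbours = {i: set() for i in range(len(points))}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= radius:
                neighbours[i].add(j)
                neighbours[j].add(i)
    return neighbours


def adamic_adar(neighbours, x, y):
    """Adamic Adar index: sum of 1 / ln(degree(z)) over common neighbours z."""
    score = 0.0
    for z in neighbours[x] & neighbours[y]:
        degree = len(neighbours[z])
        if degree > 1:  # guard: ln(1) == 0 would cause a division by zero
            score += 1.0 / math.log(degree)
    return score


def mean_neighbour_score(neighbours):
    """Hypothetical per-point score (not PAACDA's rule): the mean Adamic Adar
    index between a point and each of its graph neighbours.
    A point with no neighbours inside the radius scores 0.0."""
    scores = {}
    for x, nbrs in neighbours.items():
        if nbrs:
            scores[x] = sum(adamic_adar(neighbours, x, y) for y in nbrs) / len(nbrs)
        else:
            scores[x] = 0.0
    return scores


# Toy data: a small square cluster with a centre point, plus one distant point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
graph = proximity_graph(points, radius=1.2)
scores = mean_neighbour_score(graph)

# Print points from lowest to highest score.
for i, s in sorted(scores.items(), key=lambda kv: kv[1]):
    print(i, points[i], round(s, 3))
```

With `radius=1.2`, the five clustered points share neighbours and receive positive scores, while the isolated point at (10, 10) has no neighbours and scores 0.0. A radius graph is used here instead of a k-nearest-neighbour graph because a k-nearest-neighbour graph always gives every point k neighbours, however distant; whether PAACDA uses either construction is not stated in the abstract.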

