Implementing Data Quality Assurance Frameworks in Distributed Data Engineering Workflows

Authors:
Reddy Srikanth Madhuranthakam

Addresses:
Department of AI DevSecOps-FAMC, Citizens Bank, Texas, United States of America.

Abstract:

Modern data-driven organizations rely on distributed data engineering workflows to integrate and process large data sets seamlessly across different platforms. Nonetheless, maintaining data quality in such complex environments remains a major challenge. This paper presents a complete framework for data quality assurance (DQA) designed specifically for distributed data engineering workflows. The framework includes automated validation, consistency checks, anomaly detection, and metadata management, and is intended to reduce data quality problems at every stage of the workflow, including ingestion, transformation, and storage. By applying this approach, organizations can improve the precision of their decision-making, reduce operational risks, and increase the reliability of their downstream analytics. Our research shows that incorporating DQA principles into distributed workflows significantly improves data quality metrics, providing a robust and scalable solution to modern data challenges. Finally, as GDPR, HIPAA, and broader data governance requirements become increasingly important, research into how DQA frameworks align with these regulations will further increase their relevance and adoption. These advancements will make DQA frameworks resilient, scalable, and responsive to the growing complexity of data-driven organizations.
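
To make the framework components concrete, the following minimal sketch shows how an ingestion-stage quality gate might combine a completeness (validation) check with simple z-score anomaly detection. It is illustrative only: the field names, rule thresholds, and function names are assumptions for this example and do not reproduce the paper's actual implementation.

```python
# Illustrative sketch of an ingestion-stage quality gate.
# Field names ("customer_id", "amount") and thresholds are hypothetical.
from statistics import mean, stdev

def null_rate(records, field):
    """Fraction of records missing a value for `field` (completeness check)."""
    if not records:
        return 0.0
    missing = sum(1 for r in records if r.get(field) is None)
    return missing / len(records)

def zscore_outliers(values, threshold=3.0):
    """Indices of values more than `threshold` standard deviations from the mean."""
    if len(values) < 2:
        return []
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

def ingestion_quality_gate(records):
    """Return a list of human-readable quality issues found in a batch."""
    issues = []
    # Completeness rule: key identifier must be present in at least 99% of records.
    if null_rate(records, "customer_id") > 0.01:
        issues.append("customer_id missing in more than 1% of records")
    # Simple statistical anomaly detection on a numeric field.
    amounts = [r["amount"] for r in records
               if isinstance(r.get("amount"), (int, float))]
    if zscore_outliers(amounts):
        issues.append("anomalous values detected in amount")
    return issues

# Example usage with a small synthetic batch.
batch = [{"customer_id": "c1", "amount": 10.0},
         {"customer_id": None, "amount": 12.5},
         {"customer_id": "c3", "amount": 9.8}]
print(ingestion_quality_gate(batch))
```

In a production setting such checks would typically run as automated steps inside the workflow orchestrator so that failing batches are quarantined before transformation and storage, consistent with the staged approach described above.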

Keywords: Data Quality Assurance; Distributed Workflows; Metadata Management; Anomaly Detection; Data Validation; Transformation and Storage; Reduce Operational Risks; Robust Solution; Significant Challenge.

Received on: 24/06/2024, Revised on: 07/09/2024, Accepted on: 03/11/2024, Published on: 14/12/2024

AVE Trends in Intelligent Computing Systems, 2024 Vol. 1 No. 4, Pages: 241-251
