Automation of Research Master Data Management for Dataset Consistency

Authors

  • Ratna Jyothi Kommaraju

Keywords:

master-data management, FAIR principles, dataset consistency, probabilistic record linkage, active learning, research reproducibility, cloud architectures

Abstract

This article addresses the automation of master-data management (MDM) in research organizations as a key prerequisite for dataset consistency and reproducible results. The problem is pressing because of growing volumes of heterogeneous data, a reproducibility crisis acknowledged by most biomedical researchers, and the considerable economic losses associated with manual cleansing and duplicated experiments. The study aims to justify and experimentally confirm the effectiveness of integrating FAIR principles with a multi-layer architecture for data intake, normalization, and golden-record creation. The novelty lies in a holistic method that combines cloud reference architectures (Data Mesh and Landing Zone), probabilistic record linkage, graph embeddings, and active learning for dynamically adjusting confidence thresholds, which reduces the burden on experts while delivering continuous quality metrics. Automated MDM removes 37% of data redundancy, reduces researchers' time spent on cleansing to 26%, and accelerates integration into machine-learning pipelines by close to one third; it also shows a tangible economic effect, with an estimated annual cost reduction of at least EUR 10.2 billion in the EU. Known limitations, namely the risk of incorrect joins, outdated records, and user resistance to automation, will guide further research into adaptive thresholds, remediation of legacy data issues, and improved human-machine interaction. The paper is intended for data-management professionals, bioinformaticians, research project managers, and information-system architects.
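
To make the role of probabilistic record linkage and confidence thresholds more concrete, the following is a minimal sketch of a Fellegi-Sunter style match score with two decision thresholds. It is an illustration only and not the article's implementation: the field names, the m/u probabilities, the similarity cutoff, and the threshold values are all hypothetical assumptions chosen for the example.

```python
import math
from difflib import SequenceMatcher

# Hypothetical per-field parameters (assumed values, not taken from the article):
#   m = P(field agrees | both records describe the same entity)
#   u = P(field agrees | records describe different entities)
FIELD_PARAMS = {
    "name": (0.95, 0.05),
    "institution": (0.90, 0.20),
    "orcid": (0.99, 0.001),
}


def agrees(a: str, b: str, min_ratio: float = 0.85) -> bool:
    """Treat two values as agreeing when their normalized strings are similar enough."""
    a, b = a.strip().lower(), b.strip().lower()
    return SequenceMatcher(None, a, b).ratio() >= min_ratio


def match_weight(rec_a: dict, rec_b: dict) -> float:
    """Sum the log2 likelihood ratios over all compared fields."""
    total = 0.0
    for field, (m, u) in FIELD_PARAMS.items():
        if agrees(rec_a[field], rec_b[field]):
            total += math.log2(m / u)              # agreement raises the score
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return total


def classify(weight: float, lower: float = -2.0, upper: float = 8.0) -> str:
    """Two thresholds split pairs into match, non-match, and a band for expert review."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "review"
```

In a setup like this, pairs that land in the "review" band are the ones an expert would label, and those labels could in turn be used to move `lower` and `upper`, which is one plausible reading of the threshold adjustment through active learning that the abstract describes.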

Author Biography

  • Ratna Jyothi Kommaraju

    Data Manager, R&D Data Strategy and Governance, Sanofi (Contractor via Vivid Soft Global Inc), Old Tappan, New Jersey, USA

Published

2025-09-26

Section

Articles

How to Cite

Ratna Jyothi Kommaraju. (2025). Automation of Research Master Data Management for Dataset Consistency. American Scientific Research Journal for Engineering, Technology, and Sciences, 103(1), 115-124. https://asrjetsjournal.org/American_Scientific_Journal/article/view/12036