Electronic health records (EHRs) enable machine learning for diagnosis, prognosis, and clinical decision support. However, EHR standards vary by country and hospital, so records from different sources are often incompatible, which limits large-scale and cross-clinical machine learning. A metadata repository that catalogues the available data elements, their value domains, and their compatibility is an essential tool for addressing this heterogeneity. Such a catalogue allows researchers to find and leverage relevant data for tasks such as identifying undiagnosed rare disease patients.
Within the Screen4Care project, we developed S4CMDR, an open-source metadata repository that follows the ISO 11179-3 standard and is based on a middle-out metadata standardisation approach. It automates cataloguing to reduce errors and enables the discovery of compatible feature sets across data registries. S4CMDR supports on-premise Linux deployment and cloud hosting, with state-of-the-art user authentication and an accessible interface. We invite clinical data holders to populate S4CMDR with their metadata in order to validate it and support further development.
For documentation and source code, visit: https://gitlab.sdu.dk/screen4care/Metadata_repository
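
To make the notion of discovering compatible feature sets across data registries concrete, the following minimal Python sketch may help. It is not taken from the S4CMDR code base and does not reflect its API. All class names, field names, registries, and sample entries are hypothetical, and the compatibility rule (same data element concept and identical value domain) is a deliberately simplified assumption loosely inspired by the ISO 11179 separation of data element concepts from value domains.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class ValueDomain:
    """Hypothetical value domain: datatype plus optional unit or code list."""
    datatype: str
    unit: Optional[str] = None
    permitted_values: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class DataElement:
    """Hypothetical catalogue entry for one field of one registry."""
    registry: str
    local_name: str
    concept: str  # shared data element concept, e.g. "patient age"
    value_domain: ValueDomain


def compatible_features(elements: Iterable[DataElement],
                        registries: Iterable[str]) -> List[str]:
    """Return concepts that every listed registry records with an
    identical value domain (simplified compatibility assumption)."""
    wanted = set(registries)
    domains = {}  # concept -> registry -> set of value domains
    for element in elements:
        if element.registry in wanted:
            per_registry = domains.setdefault(element.concept, {})
            per_registry.setdefault(element.registry, set()).add(
                element.value_domain)
    shared = []
    for concept, per_registry in domains.items():
        if set(per_registry) != wanted:
            continue  # at least one registry lacks this concept
        if set.intersection(*per_registry.values()):
            shared.append(concept)
    return sorted(shared)


# Invented sample metadata for two fictional registries.
years = ValueDomain("integer", unit="years")
catalogue = [
    DataElement("registry_a", "age_years", "patient age", years),
    DataElement("registry_b", "alter", "patient age", years),
    DataElement("registry_a", "sex", "patient sex",
                ValueDomain("code", permitted_values=frozenset({"F", "M"}))),
    DataElement("registry_b", "geschlecht", "patient sex",
                ValueDomain("code", permitted_values=frozenset({"1", "2"}))),
]

shared = compatible_features(catalogue, ["registry_a", "registry_b"])
print(shared)
```

In this invented example only the age field is reported as shared, because the two registries encode sex with different code lists. A real repository would typically also record mappings between such value domains so that convertible fields are not discarded.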