SafeICU is a freely available database comprising de-identified health-related data from over 2,500 pediatric patients admitted to the pediatric intensive care unit (PICU) of the All India Institute of Medical Sciences (AIIMS), New Delhi, between 2015 and 2025. AIIMS hosts one of India’s leading PICUs, an independent 8-bed unit that provides advanced critical care to children with life-threatening illnesses.
The SafeICU database includes information such as demographics, bedside vital signs (recorded every 15 seconds), laboratory test results, medications, caregiver notes, microbiology, and mortality. The unit cares for a broad clinical spectrum, with the most frequent reasons for admission being severe respiratory illnesses (such as pneumonia and respiratory failure), sepsis and septic shock, hypertensive crises, congenital heart disease, and acute kidney injury.
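Because vital signs are recorded every 15 seconds, each signal yields 24 × 60 × 60 / 15 = 5,760 readings per patient-day, so analyses often aggregate them into coarser windows. The following is a minimal Python sketch of such downsampling. It assumes the readings for one signal are available as (timestamp, value) pairs; the function name `downsample`, the one-minute default window, and the sample values are illustrative choices, not part of SafeICU.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

# Sampling period stated for SafeICU bedside vitals.
SAMPLE_INTERVAL = timedelta(seconds=15)


def downsample(samples, window=timedelta(minutes=1)):
    """Average (timestamp, value) readings into fixed, non-overlapping windows.

    Each reading is assigned to the window whose start is its timestamp floored
    to a whole multiple of `window` since midnight (assumes `window` divides a day).
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
        start = midnight + ((ts - midnight) // window) * window
        buckets[start].append(value)
    return {start: mean(values) for start, values in sorted(buckets.items())}


# Hypothetical heart-rate readings spanning two minutes (8 readings x 15 s).
t0 = datetime(2119, 3, 1, 8, 0, 0)
heart_rate = [120, 122, 124, 126, 130, 130, 134, 134]
samples = [(t0 + i * SAMPLE_INTERVAL, hr) for i, hr in enumerate(heart_rate)]

per_minute = downsample(samples)
print(timedelta(days=1) // SAMPLE_INTERVAL)  # readings per signal per day: 5760
for start, avg in per_minute.items():
    print(start.strftime("%H:%M"), avg)  # 08:00 123, then 08:01 132
```

The window length is a parameter because the appropriate resolution depends on the study; an hourly window, for example, would reduce the 5,760 daily readings per signal to 24 means.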
The SafeICU database supports a diverse range of analytical studies, spanning epidemiology, optimization of clinical decision support, and development of electronic tools. It is notable for three factors: it is freely available to researchers who sign a data use agreement; it covers a diverse population of pediatric ICU patients; and it contains high-temporal-resolution data, including vital signs, laboratory results, and medications.
In recent years, hospitals have made significant progress in adopting electronic health record (EHR) systems. Between 2008 and 2014, the proportion of U.S. non-federal acute care hospitals with a basic EHR system rose from 9.4% to 75.5% [1][2]. By 2021, 96% of hospitals had adopted EHR systems certified by the Office of the National Coordinator for Health Information Technology (ONC), up from just 28% in 2011 [3].
Despite this widespread adoption, interoperability among these systems remains a persistent challenge that hinders data integration. Consequently, the potential of hospital data to inform and improve patient care has yet to be fully realized. At the same time, concerns about the reproducibility of scientific research have drawn increasing scrutiny from the academic community [4].
The SafeICU database integrates comprehensive, de-identified clinical data from pediatric patients admitted to AIIMS, New Delhi, and makes it widely accessible to researchers under a data use agreement (DUA). Because the data are openly available, clinical studies can be reproduced and improved in ways that would not otherwise be possible.
The SafeICU database was populated with data collected during routine hospital care, without any changes to clinical workflows. This ensured that caregivers were not burdened, and patient care proceeded uninterrupted.
Before data were incorporated into the SafeICU database, they were deidentified in accordance with the Health Insurance Portability and Accountability Act (HIPAA) using structured data cleansing and date shifting. Deidentification of structured data required removing all identifying data elements listed in HIPAA, including fields such as patient name, doctor name, address, and dates. Rather than being deleted outright, dates were shifted into the future by a random offset drawn separately for each patient and applied consistently to all of that patient's records, so that the intervals between events are preserved.
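A minimal Python sketch of consistent per-patient date shifting, assuming timestamps are held as (SUBJECT_ID, datetime) pairs. The offset bounds and the helper names `make_offsets` and `shift_dates` are hypothetical; the range of offsets actually used for SafeICU is not stated here. Because a single offset is applied to all of a patient's dates, intervals such as length of stay survive deidentification.

```python
import random
from datetime import datetime, timedelta

# Hypothetical offset bounds; the range actually used for SafeICU is not stated.
MIN_SHIFT_DAYS = 365 * 50
MAX_SHIFT_DAYS = 365 * 100


def make_offsets(subject_ids, seed=None):
    """Draw one random forward offset per patient (hypothetical helper)."""
    rng = random.Random(seed)
    return {
        sid: timedelta(days=rng.randint(MIN_SHIFT_DAYS, MAX_SHIFT_DAYS))
        for sid in sorted(subject_ids)
    }


def shift_dates(records, offsets):
    """Move every (subject_id, datetime) pair by that patient's single offset."""
    return [(sid, ts + offsets[sid]) for sid, ts in records]


records = [
    (101, datetime(2019, 3, 1, 8, 0)),     # patient 101: admission
    (101, datetime(2019, 3, 4, 14, 30)),   # patient 101: discharge
    (102, datetime(2020, 7, 10, 22, 15)),  # patient 102: admission
]
offsets = make_offsets({sid for sid, _ in records}, seed=42)
shifted = shift_dates(records, offsets)

# Every date moves into the future, but patient 101's length of stay is unchanged.
los_before = records[1][1] - records[0][1]
los_after = shifted[1][1] - shifted[0][1]
print(los_before == los_after)  # True
```

Drawing offsets per patient, rather than using one global offset, means that knowing a single real date does not reveal the true dates of every other patient.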
Protected health information (PHI) was removed from free-text fields, such as diagnostic reports and physician notes, using a rigorously evaluated deidentification system based on extensive dictionary look-ups and pattern matching with regular expressions. The components of this deidentification system are continually expanded as new data are acquired.
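The sketch below illustrates the general technique of combining dictionary look-ups with regular-expression matching to scrub free text. It is not the SafeICU deidentification system: the name dictionary, the patterns, and the placeholder tags are all hypothetical examples, and a production system would need far broader coverage and formal evaluation.

```python
import re

# Hypothetical dictionary of person names (in practice this might be built
# from patient and staff registries); matching here is case-sensitive.
KNOWN_NAMES = {"Aarav", "Priya", "Sharma"}

# Hypothetical regular expressions for a few common PHI formats.
PHI_PATTERNS = [
    (re.compile(r"\b\d{2}[/-]\d{2}[/-]\d{4}\b"), "[DATE]"),  # e.g. 12/03/2019
    (re.compile(r"\b[6-9]\d{9}\b"), "[PHONE]"),              # 10-digit mobile number
    (re.compile(r"\bMRN[:\s]*\d+\b"), "[MRN]"),              # e.g. MRN: 123456
]

# Dictionary look-up expressed as one alternation over whole words.
NAME_RE = re.compile(r"\b(?:" + "|".join(map(re.escape, sorted(KNOWN_NAMES))) + r")\b")


def scrub(text):
    """Replace pattern matches and dictionary hits with placeholder tags."""
    for pattern, tag in PHI_PATTERNS:
        text = pattern.sub(tag, text)
    return NAME_RE.sub("[NAME]", text)


note = "Aarav Sharma, MRN: 123456, seen on 12/03/2019; call 9876543210."
print(scrub(note))
# [NAME] [NAME], [MRN], seen on [DATE]; call [PHONE].
```

Replacing matches with typed tags rather than deleting them keeps the note readable and records what kind of information was removed.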
Ethical approval for the study was obtained from the Institute Ethics Committee under the reference number IEC-787/07.10.2022, RP-14/2022. The requirement for individual patient consent was waived because the project did not impact clinical care, and all protected health information was deidentified.
Data within the SafeICU database were recorded during routine clinical care, not explicitly for retrospective analysis. The SafeICU database is a structured, relational dataset of six primary tables that capture detailed, de-identified information from the PICU. The tables are linked by a shared SUBJECT_ID, which enables reconstruction of complete patient trajectories across tables.
Database Structure
Broadly, the dataset comprises the following tables:
PATIENT. Contains patient-level demographic information, including a SUBJECT_ID, gender, age at admission, and outcome at the end of hospital stay (expiration flag).
ADMISSION. Stores encounter-level information, including admission and discharge dates, and discharge outcomes.
LAB EVENTS. Includes results of laboratory investigations such as hematology, biochemistry, and infection markers, along with measurement units and collection timestamps.
MICROBIOLOGY EVENTS. Captures results of microbiological tests, including culture findings, organism identification, and antimicrobial susceptibility.
NOTE EVENTS. Contains clinical notes written by caregivers, documenting the patient’s diagnosis and the medications administered during the PICU stay.
VITAL SIGNS. Contains vital-sign measurements, including heart rate, respiratory rate, blood pressure, oxygen saturation, and temperature.
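Because every table carries SUBJECT_ID, a patient's trajectory can be reassembled by grouping rows from the event tables under the matching PATIENT row. The Python sketch below uses only the standard library and assumes each table is available as a CSV file with a SUBJECT_ID column; the other column names (GENDER, AGE, EXPIRE_FLAG, CHARTTIME, LABEL, VALUE, UNIT), the helper names, and the in-memory sample rows are hypothetical stand-ins for the actual files.

```python
import csv
import io
from collections import defaultdict


def load_table(fileobj):
    """Read one CSV table into a list of dicts keyed by column name."""
    return list(csv.DictReader(fileobj))


def build_trajectories(patients, *event_tables):
    """Group rows from each (name, rows) event table under the patient's SUBJECT_ID."""
    trajectories = {
        row["SUBJECT_ID"]: {"patient": row, "events": defaultdict(list)}
        for row in patients
    }
    for name, rows in event_tables:
        for row in rows:
            entry = trajectories.get(row["SUBJECT_ID"])
            if entry is not None:  # ignore rows with no matching PATIENT record
                entry["events"][name].append(row)
    return trajectories


# Hypothetical in-memory stand-ins for the PATIENT and LAB EVENTS CSV files;
# with real files, use open(path, newline="") instead of io.StringIO.
PATIENT_CSV = "SUBJECT_ID,GENDER,AGE,EXPIRE_FLAG\n1,M,4,0\n2,F,11,1\n"
LAB_CSV = (
    "SUBJECT_ID,CHARTTIME,LABEL,VALUE,UNIT\n"
    "1,2119-03-01 09:00,Hemoglobin,10.2,g/dL\n"
    "2,2120-07-10 23:00,CRP,48,mg/L\n"
    "1,2119-03-02 09:00,Hemoglobin,10.8,g/dL\n"
)

patients = load_table(io.StringIO(PATIENT_CSV))
labs = load_table(io.StringIO(LAB_CSV))
trajectories = build_trajectories(patients, ("LAB EVENTS", labs))

for row in trajectories["1"]["events"]["LAB EVENTS"]:
    print(row["CHARTTIME"], row["LABEL"], row["VALUE"], row["UNIT"])
```

Note that `csv.DictReader` returns every field as a string, so numeric columns such as VALUE need explicit conversion before analysis.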
SafeICU is provided as a collection of CSV files. Because the database contains detailed information about patients' clinical care, it must be handled with appropriate care and respect. Researchers must formally request access through a process documented on the SafeICU website. Three key steps must be completed before access is granted:
- The researcher must become a credentialed user by completing the SafeICU credentialing application in the portal.
- The researcher must complete a recognized course in protecting human research participants that covers HIPAA requirements (e.g., the CITI Program).
- The researcher must sign a data use agreement that outlines appropriate data use and security standards and prohibits efforts to identify individual patients.
Approval typically takes at least one week. Once an application has been approved, the researcher will receive emails with instructions for downloading the database from the SafeICU website.
This page lists the current version of SafeICU. Updated versions will be listed here in reverse chronological order. Each version addresses a finite set of updates, each tracked by a unique issue number, usually of the form #100, #101, etc.
This research and development was supported by the SAFE-ICU Initiative, funded by the Indian Council of Medical Research (ICMR) under grant award number A1-Ad-hoc/18/2022-A1-Cell; the Centre of Excellence in Healthcare (COEH); the Wellcome Trust/DBT India Alliance Fellowship (IA/CPHE/14/1/501504); and IIIT-Delhi, for supporting the operationalization of this work.
We sincerely thank the D5-ICU/MCB-PICU team at AIIMS, New Delhi, for their continuous support and for fostering a conducive environment for database development. We further thank the Computer Facility and IT teams at AIIMS, New Delhi, and IIIT-Delhi for their technical support throughout this work.
The authors declare no competing financial interests.
[1] Charles, D., King, J., Patel, V. & Furukawa, M. Adoption of Electronic Health Record Systems among U.S. Non-federal Acute Care Hospitals. ONC Data Brief No. 9, 1–9 (2013).
[2] Johnson, A., Pollard, T. & Mark, R. MIMIC-III Clinical Database (version 1.4). PhysioNet. RRID:SCR_007345. https://doi.org/10.13026/C2XW26 (2016).
[3] DistilINFO Hospital IT. ONC Outlines Health IT Interoperability Progress in Report to Congress (2023).
[4] Collins, F. S. & Tabak, L. A. NIH plans to enhance reproducibility. Nature 505, 612–613 (2014).