N8 CIR Internships: Exploring the Potential of Data Science in a Maternal Mental Health Context

This summer, we were fortunate to welcome two interns into our lab to explore how data science and machine learning might support better clinical decision-making in maternal mental health. Working with synthetic data, they tackled a challenging and sensitive topic: they explored risk factors for postnatal depression and safeguarding concerns, and developed tools that could help inform health services in the North East and North Cumbria (NENC).

We spoke to them about their work, their learning, and the wider potential of responsible health data science.

Zain had just completed a Computer Science degree at Newcastle University when he joined the internship. His final year dissertation focused on explainability in machine learning for clinical treatment recommendations, so this opportunity felt like a natural next step.

“I’m really interested in the intersection between data science and social impact,” he said. “I wanted to be part of something where the research connected with people and had the potential to support real-world decision-making.”

Daniela had just completed her second year at Newcastle University when she joined the internship. Over the course of her degree, she has developed a growing interest in the collaborative and analytical side of computer science, particularly in using computing methods and machine learning to extract and interpret data. The opportunity to apply these skills to a socially important issue like maternal mental health immediately appealed to her. “Maternal mental health isn’t something I’d had the chance to explore before,” she explained, “but it’s a vital area of care, and it affects more than just one person. It has consequences for the whole family.”

The Project

The internship focused on understanding the factors that might contribute to poor maternal mental health outcomes during and after pregnancy. To do this, the team identified groups of individuals whose symptoms progressed in similar ways over time and mapped their geographic distribution. They also explored potential interventions and pathways to recovery, and assessed which kinds of machine learning models might help identify risk earlier.
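The post doesn't say which method was used to group people with similar symptom progressions. As a minimal sketch, assuming k-means clustering on each person's scores at a fixed set of time points, the grouping step might look like the code below. The time points, the EPDS-style scores, the farthest-point initialisation and all function names are illustrative assumptions, and the data is made up.

```python
# Sketch: grouping symptom trajectories with k-means (an assumed method).
# All scores are made-up toy values, not project data.


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length trajectories."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_trajectory(group):
    """Point-wise mean of a non-empty list of trajectories."""
    return [sum(values) / len(group) for values in zip(*group)]


def farthest_point_init(data, k):
    """Deterministic start: the first trajectory, then repeatedly the one
    farthest from every centroid chosen so far."""
    centroids = [list(data[0])]
    while len(centroids) < k:
        farthest = max(
            data, key=lambda t: min(squared_distance(t, c) for c in centroids)
        )
        centroids.append(list(farthest))
    return centroids


def kmeans(data, k, max_iterations=100):
    """Lloyd's algorithm, treating each trajectory as one vector."""
    centroids = farthest_point_init(data, k)
    labels = [0] * len(data)
    for _ in range(max_iterations):
        # Assignment step: each trajectory joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(t, centroids[c]))
            for t in data
        ]
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = []
        for c in range(k):
            members = [t for t, label in zip(data, labels) if label == c]
            new_centroids.append(mean_trajectory(members) if members else centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical EPDS-style scores at four time points per person.
trajectories = [
    [4, 5, 4, 3], [5, 4, 5, 4],        # consistently low
    [6, 14, 16, 15], [7, 13, 15, 17],  # rising after birth
    [15, 12, 8, 5], [16, 13, 9, 6],    # high, then recovering
]

labels, centroids = kmeans(trajectories, k=3)
print(labels)  # [0, 0, 1, 1, 2, 2]
```

Each centroid is an average trajectory (for example "low throughout" or "rising after birth"), and the cluster labels could then be joined to location data for mapping. Real records would also need handling for missed appointments and uneven timing, which this sketch ignores.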

The team worked with a synthetic dataset that simulated the population of the NENC region, with a variety of demographic, health and service-related features. These covered a wide range of categories, including demographics, psychological scores and biological factors, medical history, maternity data, inflammatory and immune markers, vital signs, sleep and lifestyle information, healthcare provider details, interventions, and infant-related factors.

The project had three main strands:

Predicting postnatal depression using a binary outcome derived from the Edinburgh Postnatal Depression Scale (EPDS), applying classification models such as neural networks and ensemble methods.
Exploring safeguarding risks to identify situations where additional support might be needed during the perinatal period. The interns developed a set of rules that combine different factors from the data and estimate how these might relate to the likelihood of extra care being required.
Investigating social, geographic and economic factors, including the role of rurality, coastal location, deprivation, and voluntary sector support in relation to mental health outcomes.

To bring this together, the interns created a prototype “patient journey” web application. This interactive dashboard shows a summary of each case, key health indicators over time, predicted outcomes, and possible next steps.
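The safeguarding strand combined several factors into rules. The interns' actual rules aren't described here, so what follows is a hypothetical sketch: the field names, the EPDS cutoff of 13, the weights and the tier names are all invented for illustration. It shows one simple way to combine rules into a weighted score while keeping a record of which rules fired, so that a professional can see why a case was flagged rather than acting on a bare number.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]


@dataclass(frozen=True)
class Rule:
    """One hypothetical safeguarding rule: a named condition with a weight."""
    name: str
    condition: Callable[[Record], bool]
    weight: int


# Illustrative rules, field names and weights; not the interns' actual rules.
RULES: List[Rule] = [
    Rule("EPDS at or above assumed cutoff", lambda r: r["epds"] >= 13, 2),
    Rule("housing instability", lambda r: r["housing_unstable"], 1),
    Rule("limited social support", lambda r: not r["has_support"], 1),
    Rule("previous safeguarding referral", lambda r: r["prior_referral"], 3),
]


def assess(record: Record) -> Tuple[int, str, List[str]]:
    """Sum the weights of triggered rules and map the total to a tier."""
    triggered = [rule for rule in RULES if rule.condition(record)]
    score = sum(rule.weight for rule in triggered)
    if score >= 4:
        tier = "review"   # hypothetical: prompt a conversation about extra support
    elif score >= 2:
        tier = "monitor"
    else:
        tier = "routine"
    return score, tier, [rule.name for rule in triggered]


example = {"epds": 15, "housing_unstable": True,
           "has_support": True, "prior_referral": False}
print(assess(example))
# (3, 'monitor', ['EPDS at or above assumed cutoff', 'housing instability'])
```

Returning the names of the triggered rules alongside the score is a deliberate choice: it keeps the output explainable, which suits a dashboard meant to support conversations rather than replace judgement.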

Data Science Approaches

Zain focused on neural network modelling and statistical evaluation. He also developed the backend systems for handling the data and carried out a focused coastal analysis to explore possible location-based differences.

“I used four different neural networks with increasing complexity, comparing their performance using cross-validation and positive predictive value, which is something clinicians tend to focus on when thinking about risk,” he explained. He also applied SHAP (SHapley Additive exPlanations) analysis to understand which features most influenced the models' predictions. Sleep deprivation, PHQ-9 (Patient Health Questionnaire) scores, and some social factors ranked highly.
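To make the evaluation idea concrete, here is a minimal sketch in plain Python. It assumes a binary outcome of EPDS at or above 13 (a commonly used cutoff, but an assumption here) and uses a deliberately simple one-feature threshold model as a stand-in for the neural networks. Positive predictive value is the share of flagged cases that are true positives, TP / (TP + FP), computed on each held-out fold. The data and all function names are hypothetical.

```python
from statistics import mean

EPDS_CUTOFF = 13  # assumed threshold for the binary outcome


def binarise_epds(scores, cutoff=EPDS_CUTOFF):
    """1 if the EPDS score is at or above the cutoff, else 0."""
    return [1 if s >= cutoff else 0 for s in scores]


def positive_predictive_value(y_true, y_pred):
    """TP / (TP + FP); None when the model predicts no positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if tp + fp else None


def fit_threshold(x, y):
    """Toy stand-in model: flag anyone at or above the midpoint of the class means."""
    pos = [xi for xi, yi in zip(x, y) if yi == 1]
    neg = [xi for xi, yi in zip(x, y) if yi == 0]
    if not pos or not neg:
        raise ValueError("training fold needs both classes")
    threshold = (mean(pos) + mean(neg)) / 2
    return lambda values: [1 if v >= threshold else 0 for v in values]


def cross_validated_ppv(x, y, fit, k):
    """k-fold cross-validation; folds are interleaved (index % k) for simplicity."""
    scores = []
    for fold in range(k):
        test = [i for i in range(len(x)) if i % k == fold]
        train = [i for i in range(len(x)) if i % k != fold]
        predict = fit([x[i] for i in train], [y[i] for i in train])
        y_pred = predict([x[i] for i in test])
        scores.append(positive_predictive_value([y[i] for i in test], y_pred))
    return scores


# Made-up toy data: one predictor (a PHQ-9-style score) and raw EPDS scores.
phq9 = [2, 3, 4, 5, 6, 8, 10, 12, 15, 18]
epds = [3, 5, 12, 6, 14, 9, 15, 11, 20, 22]
labels = binarise_epds(epds)

fold_ppv = cross_validated_ppv(phq9, labels, fit_threshold, k=2)
print(fold_ppv)  # [1.0, 0.3333333333333333]
```

In practice the folds would usually be shuffled and stratified so that each contains a similar share of positive cases, and `fit` would train one of the candidate networks; the shape of the loop stays the same, which is what makes it easy to compare models of increasing complexity on equal terms.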

Daniela focused on supervised learning models, developing the logic behind them, integrating different components of the system, and working on the front-end deployment. She used random forest classifiers to identify the most important features from a larger set and tested other models, including XGBoost and support vector machines. She also analysed how different socioeconomic profiles (including housing instability, financial stress, education level and employment status) might relate to increased risk of poor mental health outcomes.
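Random forest feature importance is commonly based on mean decrease in impurity (this is what scikit-learn's `RandomForestClassifier.feature_importances_` reports): each split's reduction in Gini impurity is credited to the feature it used, summed over the tree and averaged across trees. The sketch below shows only that building block, the best single-split Gini decrease for each feature, on made-up data with hypothetical feature names. It is not a full random forest.

```python
def gini(labels):
    """Gini impurity of binary labels: 2p(1 - p), where p is the share of 1s."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_impurity_decrease(x, y):
    """Largest weighted Gini decrease from one split of the form x <= threshold."""
    parent = gini(y)
    n = len(y)
    best = 0.0
    for threshold in sorted(set(x)):
        left = [yi for xi, yi in zip(x, y) if xi <= threshold]
        right = [yi for xi, yi in zip(x, y) if xi > threshold]
        # Children are weighted by the share of samples they receive.
        child = len(left) / n * gini(left) + len(right) / n * gini(right)
        best = max(best, parent - child)
    return best


# Made-up toy data: 1 = poor mental health outcome.
outcome = [0, 0, 0, 1, 1, 1]
features = {
    "financial_stress": [1, 2, 2, 4, 5, 5],
    "education_years": [12, 16, 14, 12, 16, 14],
    "housing_instability": [0, 0, 1, 0, 1, 1],
}

importance = {name: best_impurity_decrease(x, outcome) for name, x in features.items()}
ranking = sorted(importance, key=importance.get, reverse=True)
print(ranking)  # ['financial_stress', 'housing_instability', 'education_years']
```

A forest repeats this over many splits, bootstrap samples and random feature subsets, which smooths out the instability of any single split; impurity-based scores can still favour features with many distinct values, so they are best read alongside other evidence.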

Mapping the data revealed some patterns, although these were limited by the synthetic nature of the dataset.

“I also looked at whether voluntary sector organisations had any impact,” she said. “There wasn’t a strong correlation in this data, but I’d be really interested to explore that further with real-world data, including qualitative data. We know that local support does make a difference, but it’s hard to simulate that convincingly.”
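For the coastal and other location-based comparisons, one simple starting point is to compare the proportion of people above an EPDS cutoff in two groups of areas. The sketch below uses a pooled two-proportion z-test; the choice of test, the cutoff and the counts are assumptions for illustration, not results from the project.

```python
import math


def two_proportion_z_test(events_a, n_a, events_b, n_b):
    """Pooled two-proportion z-test; returns (difference, z, two-sided p)."""
    p_a, p_b = events_a / n_a, events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)), the two-sided p-value.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_a - p_b, z, p_value


# Made-up counts: people above an assumed EPDS cutoff, coastal vs inland.
diff, z, p = two_proportion_z_test(30, 200, 45, 400)
print(f"difference={diff:.4f}, z={z:.2f}, p={p:.2f}")
# difference=0.0375, z=1.31, p=0.19
```

As the interns found, any pattern in synthetic data needs confirming with real-world data before it can be read as a genuine difference between communities, and comparing many areas at once would also call for a correction for multiple testing.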

Challenges

One of the biggest challenges the team faced was working with synthetic data. While it enabled them to work without the restrictions of real-world patient confidentiality, it also made it harder to find meaningful patterns.

“There were times where the results didn’t reflect what you’d expect based on the literature or regional insight,” said Zain. “It reminded me that data can only tell you so much, especially when it’s been generated to meet a specification, rather than drawn from real lived experiences.”

While the project was primarily a methodological exploration (using synthetic data to test how machine learning could support risk prediction in healthcare), both interns remained aware of the ethical considerations involved in working with sensitive topics. They were clear that their work was exploratory and that clinical decisions should never be based on model outputs alone.

“Machine learning can be a helpful tool,” said Daniela, “but it’s not a replacement for professional judgment. It’s about supporting conversations and making patterns easier to see, not making decisions for people.”

What next?

Both interns are continuing their studies: Zain is starting an MSc in AI at the University of Edinburgh, and Daniela is completing her final year here at Newcastle University. When asked what advice they’d give to others interested in this kind of work, both emphasised the value of interdisciplinary learning.

“I learned a lot about how to work in a professional, collaborative environment. Everyone was generous with their time and open to our ideas, which made a big difference.” Zain

Daniela highlighted the team's range of expertise: “We had people bringing in different perspectives, from data science and software development to public health and ethics. That mix of expertise helped us make better decisions and kept the work grounded in both the technical and human sides of the problem. It can be easy to forget that every row of data represents a person and their health journey if you don’t take a step back now and then.”

Zain added: “If you’re curious about lots of things and want your work to have impact, this kind of interdisciplinary, human-centred approach is perfect. It’s not always easy, but it’s incredibly rewarding.”

This internship has been a valuable opportunity to explore how data science can be applied responsibly to complex and sensitive health challenges and to support early-career researchers in developing the skills, confidence and insight needed to do that well. We’re hugely grateful to Zain and Daniela for their thoughtful, dedicated contributions over the summer, and to our resident Data Scientist, Oladayo Owoeye, for his excellent supervision, technical guidance, and encouragement throughout.

“Having Oladayo as a mentor pushed me to do my best, and I’m grateful he believed in my ability to take on challenging work. I’m very thankful for the chance to collaborate with the team and to have had such a positive, supportive experience.” Daniela

This placement was supported by the N8 Centre of Excellence in Computationally Intensive Research (N8 CIR), as part of their wider internship scheme designed to give students across the N8 universities hands-on experience working on real-world, high-impact research projects. We’re thankful to N8 CIR for enabling this important work and helping to build the next generation of data science talent in the North of England.