Crowdsourcing human-centric digital health breakthroughs

The Internet is just amazing. In addition to funny cat videos and an infinite supply of reaction GIFs, for us curious scientists it offers a magical opportunity to reach people – a lot of them – at a scale that is difficult to comprehend. And scale is good when you are building new digital health applications.
Crowd crossing the street.
Photo: Brian Merrill, Pixabay

The Crowd Computing research group in the Faculty of Information Technology and Electrical Engineering, University of Oulu, specialises in human-centric digital health: data-driven science at the intersection of computers and the end user, in the field of Human-Computer Interaction. We operate in three areas:

  1. personal data management,
  2. creating new data capture tools, and
  3. contributing new data-driven yet user-friendly applications in healthcare.

In 2024, humans generate more data than ever. The devices we carry around with us know every move we make, and there is a digital footprint for every action we take. When it comes to building new services based on healthcare data, or conducting related research, data storage and capture become critically important challenges. Consensual access to large crowds of humans and their valuable data is not always easy to obtain. To this end, over time our group has become expert in leveraging the many existing crowdsourcing marketplaces online. These networks of human labour were born to feed the hungry mouths of the AI and Machine Learning industry. After all, someone has to label the training data through millions and millions of parallelised microtasks to fuel all the headline-friendly and shiny AI applications out there (including the chatty one).

In an ironic twist, now that AI and different machine learning approaches are becoming easily available, these innovations are making the labour of the humans who helped create them unnecessary. Several online crowdsourcing platforms have been quick to deploy machine learning and large language models to cater for their task-hungry clients and, in doing so, to decrease the need for human annotators. Further, many of the workers themselves are using tools such as ChatGPT when answering legitimate questions from requesters, leading to lower-quality or invalid data being funnelled through the markets.

For us, it makes sense to always look at the bright side of new developments. We see this as a prime opportunity for digital health research: these existing networks, sometimes of millions of people, are conveniently accessible through a single programmable interface. Using these platforms, we can deploy our digital health data harvesters in new seas of valid and valuable data for new healthcare applications. And considering the challenges faced by many underpaid and undervalued workers on these platforms, we also see a chance to offer better, well-compensated work. Rather than replacing human roles with AI, our goal is to provide more meaningful opportunities for people.

Keeping humanity at the core

So, what can we do with these platforms?

Convenient access to large pools of research participants enables rapid experimentation with various types of research prototypes in the field of digital health, with authentic end-users. In the past, and precisely from the end-users’ perspective, we have for example looked into the ethics and monetary value of personal healthcare data. In this line of work, we used the online platforms to recruit participants to donate and discuss their own health data through lenses such as ethics and values. This culminated in a series of peer-reviewed publications and a PhD thesis, “User perceptions of personal data in healthcare: ethics, reuse, and valuation”, in 2023.

Yet, there are cases where the existing networks simply do not cut it.

One case where we got to exercise a good deal of imagination was our recently published international collaboration on what is one of the planet’s largest crowdsourced studies on Parkinson’s Disease (PD). In the project, we extended our existing tools with end-user-friendly ways of harvesting health-related data, but quickly discovered it was not possible to just tap into existing networks. So, we partnered with many public sector and non-profit PD organisations that serve both patients and caregivers.

Crowdsourcing does not mean you have to limit your options to existing marketplaces: you can build your own solutions and find the crowds elsewhere! The “elsewhere” can be literally anywhere. Partnering up is always the wise thing to do in the end, but it does come with its own challenges. In our case, communicating with PD foundations across different continents brought many kinds of human communication challenges. In the end, perhaps the greatest and most humbling lesson was that sometimes there is no dodging the hard work, and that we need to be patient when dealing with domain experts. We welcome the reader to learn more in the recently published article.

Navigating the next chapter in digital health

Our number one priority is to work on better ways to capture the full lived experience of people who have first-hand experience of a given health condition.

To that end, developing new ways to capture objective and quantitative data, with tools such as the onboard sensors of smartphones, is critical to understanding humans. But it is also increasingly fascinating to make sense of big subjective data: qualitative data collected through questionnaires, audio transcripts, or existing online sources of natural discussion.

New data means new challenges. When we have large datasets of messy spoken or written data from humans, full of subtleties and sometimes irrelevant information as well, the analysis is as difficult as it is rewarding. To this end, we see potential in using Large Language Models as assistive methodological tools in the scientific process itself. It is not that difficult to imagine bootstrapping a small local language model that specialises in understanding everything about one specific health condition and is able to act as an AI agent that can help both researchers and patients with their challenges. These are the future tools that can also help with labour-intensive tasks such as qualitative data analysis. But if so, to what degree should we outsource our work to the agents? Because when we stop doing the hard work, we will inevitably also learn less.

Ultimately, the potential of these new AI approaches in human-centric digital health is tremendous. Imagine a chatbot that can, in real time, talk with you using the personality you need at that point, and according to your own current mood. Or, from a more scientific standpoint, imagine a process that can make sense of thousands of pages of open-ended interview data to find the most relevant answers to any arbitrary question you have in mind. We are at a turning point in time, and many of our old ways of working and of reporting scientific findings are changing fast.

Time will tell, but there is no telling the future. Some years ago, I used to think that our age was boring. Seriously, I thought there would be nothing interesting happening any time soon. But now I think this is all happening almost too fast, and it is hard to keep up. In many ways, I miss the boring days, as even the pace of change keeps accelerating!

Maybe the best bet is just to lean boldly forward and help build the future instead of just witnessing it.


Simo Hosio
Associate Professor
Ubiquitous Computing
University of Oulu

Simo Hosio leads the Crowd Computing research group in the Center of Ubiquitous Computing, Faculty of Information Technology and Electrical Engineering. His work focuses on Human-Computer Interaction and digital health. Currently, Dr. Hosio is a visitor at the University of Tokyo in Japan. He was recently granted the Research Council of Finland Award for exceptional scientific audacity.