Keywords: family communication, coping, genetic cancer risk, tailored messages, natural language processing (NLP), hereditary breast and ovarian cancer (HBOC)
Project coordinator: Maria C. Katapodi
CeDA collaborators: Gerhard Lauer, Rodrigo C. G. Pena

Project objectives

  • To perform sentiment analysis on narrative data obtained from individuals with a predisposition to HBOC using NLP tools. The goal of the sentiment analysis is to identify coping and family communication patterns related to genetic cancer risk.
  • To explore whether sentiment analysis can inform the development of tailored messages related to coping and to family communication about genetic cancer risk. The tailored messages will be integrated in a web-based intervention targeting families concerned with HBOC.
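
To make the first objective concrete, the following is a minimal sketch of lexicon-based sentiment scoring, one of the simplest forms of sentiment analysis. It is an illustration under stated assumptions, not the project's method: the word list `TOY_LEXICON`, its weights, and the function name `sentiment_score` are hypothetical, and the example sentences are invented rather than taken from study narratives.

```python
import re

# Hypothetical toy lexicon (assumption, not from the project); a real study
# would use a validated, language-specific resource or a trained model.
TOY_LEXICON = {
    "hope": 1.0, "support": 1.0, "relieved": 1.0, "open": 0.5,
    "afraid": -1.0, "worried": -1.0, "guilty": -1.0, "avoid": -0.5,
}

def sentiment_score(text, lexicon=TOY_LEXICON):
    """Return the mean lexicon weight of matched tokens, or 0.0 if none match."""
    # Lowercase the text and split it into simple word tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    # Keep only the weights of tokens that appear in the lexicon.
    weights = [lexicon[t] for t in tokens if t in lexicon]
    return sum(weights) / len(weights) if weights else 0.0

# Invented example sentences (not study data).
positive = sentiment_score("I felt relieved and had support from my sister")
negative = sentiment_score("I was afraid and worried about telling them")
```

A score near 1 suggests predominantly positive wording and a score near -1 predominantly negative wording. A word list like this ignores negation, context, and language differences, all of which matter for multilingual interview narratives.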


  • This interdisciplinary project brings together investigators from epidemiology, genetics, medicine, nursing, psychology, and sociology. Investigators in computer science, digital humanities, and advanced data analytics provide additional expertise.
  • In the context of the CASCADE and the DIALOGUE studies, we collect narratives in five languages (Swiss German, French, Italian, English, and Korean) using the same interview guide. The interview guide explores coping responses to genetic cancer risk and the process of communicating this risk among different family members. Narratives are available in digital form and in different formats: video, audio, and transcribed text. In addition to narratives, we also collect quantitative data using established instruments designed to explore psychological and family responses to genetic cancer risk.
  • Using NLP, we aim to identify meaningful language patterns that indicate active coping and ease of communication among mutation carriers who have to inform relatives about the genetic cancer risk in the family. We will integrate NLP findings (i.e., sentiment analysis results) with quantitative data to identify participants who demonstrate distinct patterns of coping and family communication.
  • We will explore whether these linguistic patterns can be analysed to generate tailored messages for integration in a web-based family platform.
  • We will use unsupervised learning methods to discover clusters of participants based on patterns of coping and patterns of communication.
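
The idea of grouping participants by combined NLP and quantitative measures can be sketched as follows. This is a minimal illustration, not the project's pipeline: the text above does not name an algorithm, so k-means (a common clustering algorithm) is used here only as a familiar stand-in. The function `kmeans`, the two-feature layout, and all numbers are hypothetical and invented for the example.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means on equal-length numeric tuples; returns (labels, centroids)."""
    # Deterministic start: use the first k points as initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return labels, centroids

# Invented feature vectors (not study data):
# (sentiment score from narratives, standardized questionnaire score).
participants = [
    (0.8, 0.9), (-0.7, -0.8), (0.6, 1.1),
    (-0.9, -0.5), (0.7, 0.7), (-0.6, -1.0),
]
labels, centroids = kmeans(participants, k=2)
```

In this toy data, participants with positive wording and high questionnaire scores fall into one group and the remaining participants into the other. With real data, the number of clusters and the choice of features would need to be justified and validated rather than fixed in advance.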