Naja Hulvej Rod (University of Copenhagen, Denmark): From associations to effects: Do Big Data hide causal structures?
Disease processes are multifactorial and highly complex, and there is a need to continue to push the research boundaries beyond single-factor data analysis and to extend the research methodology framework to incorporate real-world complexities. Thus, it is important to unravel the interdependency between the multiple factors in complex systems so that a single or a few modifiable factors can be identified as targets of feasible interventions that will eventually alter the risk profile. Artificial intelligence methods combined with large data sets from registers, biobanks, population studies, wearables and web sources provide us with a golden opportunity to rethink traditional methods for epidemiological research.
Electronic health records and information from nationwide population registers allow us to address patterns of lifetime exposures and disease trajectories in an unprecedented way. These data also carry hopes for improved disease prediction that will allow for early intervention or even prevention of disease onset. Artificial intelligence, including various machine-learning approaches, is increasingly being used in health science for such disease prediction. These methods allow for far more flexibility in modelling the interplay between exposures than classical approaches, but the key challenge is that they provide little information about the nature of the underlying causal structure, and some even refer to them as ‘black boxes’. This is particularly problematic when our ultimate goal is to identify targets for prevention or treatment in the disease process.
Thus, if we want to go beyond pure description and prediction, there is a need to bridge machine learning and programming with well-established epidemiological and statistical methods for drawing causal inference. Combining these skills will create a synergy between the flexibility and insights obtained from artificial intelligence and the rigor and theoretical foundation of causal inference. Ultimately, this synergy will help us identify points of intervention and develop more effective and targeted treatments and health interventions.