Graduate Research Degree Awarded - Jiaxin Zhang

Congratulations to Jiaxin Zhang - Graduate Research Degree Awarded 1st November 2024

Degree: PhD
Supervisors: Prof Margarita Moreno-Betancur, Dr S Ghazaleh Dashti, Prof John B. Carlin, Prof Katherine J. Lee
Advisory Committee: Prof Sarath Ranganathan, A/Prof Emily Karahalios, Dr Rhys Bowden

Thesis Title: Development and evaluation of methods for handling missing data when conducting causal inference in epidemiology studies

Thesis Summary: 

Causal inference in observational studies is a central endeavour in modern epidemiological research. In observational studies, missing data is a common issue that poses risks for the analysis, because valid inference requires both unverifiable assumptions about the missingness mechanisms and the use of appropriate missing data methods. When conducting causal inference in the presence of missing data, three key questions arise. First, under which missingness mechanisms is the target causal estimand identifiable, or recoverable, from the observable data? Second, which missing data methods enable unbiased causal effect estimation across different scenarios? Third, how should one conduct sensitivity analyses to assess the robustness of results to assumptions about the missingness mechanisms?

In this thesis, I aimed to address these three questions, focusing on settings with a single time-point exposure and outcome, with the average causal effect (ACE) as the target causal estimand. The research involved simulation studies and theoretical work. Throughout, and particularly when designing simulation studies, I used as a motivating example a case study from the Victorian Adolescent Health Cohort Study (VAHCS), which investigated the ACE of adolescent cannabis use on mental health in adulthood.
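For readers less familiar with the estimand, a standard formulation is given below. It assumes, for illustration only, a binary exposure A, an outcome Y and a set of confounders C (the notation is not taken from the thesis). The first line defines the ACE as a contrast of potential-outcome means; the second line, the g-formula, expresses it in terms of the observable data distribution and holds under the usual assumptions of consistency, conditional exchangeability given C, and positivity.

```latex
\begin{aligned}
\mathrm{ACE} &= E\!\left[Y^{a=1}\right] - E\!\left[Y^{a=0}\right] \\
             &= E_{C}\!\left[\, E(Y \mid A=1, C) - E(Y \mid A=0, C) \,\right]
\end{aligned}
```

Missing data complicate the second line, because the conditional expectations and the distribution of C must then be recovered from incompletely observed data.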

Previous literature had investigated the recoverability of the ACE in simple settings, such as those with univariable missingness or without exposure-confounder interactions, which have limited applicability in practice. As the first aim of this thesis, I investigated recoverability conditions for the ACE under more general scenarios, in particular across a range of so-called canonical missingness directed acyclic graphs (m-DAGs) depicting typical multivariable missingness mechanisms encountered in epidemiological studies. These recoverability results extend the theoretical basis for framing missing data issues in the causal context.

Once recoverability has been assessed, missing data should be handled appropriately in the analysis to avoid bias in ACE estimation. Multiple imputation (MI) is a frequently used method for handling missing data and can be unbiased under the "missing at random" (MAR) assumption. However, it has been shown that, with multivariable missingness, this assumption is very restrictive and, moreover, is not necessary for unbiased estimation. Key open questions therefore included under which multivariable missingness mechanisms, as depicted by m-DAGs, MI can provide unbiased estimation, and in particular how MI should be implemented when the target estimand is the ACE. This thesis addressed these questions by evaluating several implementations of MI in simulation studies across a range of multivariable missingness mechanisms and estimation methods for the ACE. The results showed that MI approaches can be approximately unbiased for ACE estimation across a range of missingness mechanisms if the imputation model is compatible, or approximately compatible, with the substantive outcome model, for example by correctly incorporating any exposure-confounder interactions.
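Why compatibility matters can be illustrated with a toy simulation. The sketch below is not code from the thesis; it uses only the Python standard library and rests on several assumptions of our own. There is a binary confounder C, a binary exposure A, and a continuous outcome Y whose exposure effect differs by C (an exposure-confounder interaction), and Y is missing at random given C. Missing outcomes are imputed with a simple bootstrap-based MI, chosen here as one convenient way to make MI approximately proper, using either a compatible imputation model (with the A*C term) or an incompatible one (without it). The ACE is then estimated in each completed dataset by g-computation and averaged across imputations, which is the Rubin's rules point estimate. All function names (`simulate`, `mi_ace`, `g_computation`, etc.) and parameter values are hypothetical.

```python
import math
import random


def design(a, c, interaction):
    """Design-matrix row: intercept, exposure A, confounder C and, optionally, A*C."""
    return [1.0, a, c, a * c] if interaction else [1.0, a, c]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def ols(X, y):
    """Ordinary least squares via the normal equations, solved by Gauss-Jordan elimination."""
    p = len(X[0])
    m = [
        [sum(r[i] * r[j] for r in X) for j in range(p)] + [sum(r[i] * v for r, v in zip(X, y))]
        for i in range(p)
    ]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(p):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * z for x, z in zip(m[r], m[col])]
    return [m[i][p] / m[i][i] for i in range(p)]


def simulate(n, rng):
    """Hypothetical data: the effect of A on Y is 0 when C=0 and 2 when C=1, so the true ACE is 1."""
    rows = []
    for _ in range(n):
        c = 1 if rng.random() < 0.5 else 0
        a = 1 if rng.random() < (0.7 if c else 0.3) else 0  # C affects both A and Y
        y = c + 2 * a * c + rng.gauss(0, 1)
        missing = rng.random() < (0.6 if c else 0.1)  # Y missing at random given C
        rows.append((a, c, None if missing else y))
    return rows


def g_computation(data):
    """ACE by g-computation from the saturated outcome model Y ~ A + C + A*C."""
    beta = ols([design(a, c, True) for a, c, _ in data], [y for _, _, y in data])
    diffs = [dot(beta, design(1, c, True)) - dot(beta, design(0, c, True)) for _, c, _ in data]
    return sum(diffs) / len(diffs)


def mi_ace(rows, interaction, m=10, seed=0):
    """Impute missing Y m times, estimate the ACE in each completed dataset, and average."""
    rng = random.Random(seed)
    complete = [r for r in rows if r[2] is not None]
    estimates = []
    for _ in range(m):
        # Refit the imputation model on a bootstrap sample of complete cases so that
        # parameter uncertainty is propagated into the imputed values.
        boot = [rng.choice(complete) for _ in complete]
        X = [design(a, c, interaction) for a, c, _ in boot]
        y = [v for _, _, v in boot]
        beta = ols(X, y)
        sigma = math.sqrt(sum((v - dot(beta, x)) ** 2 for x, v in zip(X, y)) / (len(y) - len(beta)))
        completed = [
            (a, c, v if v is not None else dot(beta, design(a, c, interaction)) + rng.gauss(0, sigma))
            for a, c, v in rows
        ]
        estimates.append(g_computation(completed))
    return sum(estimates) / m


rows = simulate(4000, random.Random(2024))
print("compatible imputation (with A*C):", round(mi_ace(rows, interaction=True), 3))
print("incompatible imputation (no A*C):", round(mi_ace(rows, interaction=False), 3))
```

In this toy setting the compatible imputation should recover an ACE close to the true value of 1, whereas dropping the A*C term forces a single exposure effect onto the imputed outcomes and pulls the ACE estimate noticeably below 1. The exact numbers depend on the seed and are not results from the thesis.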

In practice, the true missingness mechanism is usually unknown. In such circumstances, it is important to assess the sensitivity of ACE estimates to different missingness assumptions. For instance, the aforementioned simulation results showed that large bias may arise when the outcome is a cause of its own missingness, so when this is plausible in an epidemiological study, a sensitivity analysis is important. One appealing strategy for conducting sensitivity analysis is delta-adjustment, which previous literature has implemented within the MI framework for multivariable missingness mechanisms. However, with the default implementation of this approach, the imputation model is at risk of being incompatible with the substantive outcome model if there are exposure-confounder interactions. This thesis therefore proposed two novel approaches for implementing delta-adjustment in MI that achieve compatibility. The results indicated that, compared with the default approach, both proposed approaches reduced the bias in ACE estimates that arises from ignoring compatibility when conducting sensitivity analysis.
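A generic form of delta-adjustment in MI can be sketched as follows. This is an illustration under our own assumptions, not either of the approaches proposed in the thesis. Outcomes are first imputed under MAR with an imputation model that includes the exposure-confounder interaction, so that it is compatible with the g-computation outcome model. Each imputed value is then shifted by a sensitivity parameter `delta`, representing how much higher (or lower) the missing outcomes are assumed to be, on average, than MAR imputation would predict. In this hypothetical example missingness depends on the exposure as well as the confounder, so the ACE estimate moves as `delta` varies. All names and parameter values are hypothetical, and only the Python standard library is used.

```python
import math
import random


def design(a, c):
    """Model row with the A*C interaction, so imputation and outcome models are compatible."""
    return [1.0, a, c, a * c]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def ols(X, y):
    """Ordinary least squares via the normal equations, solved by Gauss-Jordan elimination."""
    p = len(X[0])
    m = [
        [sum(r[i] * r[j] for r in X) for j in range(p)] + [sum(r[i] * v for r, v in zip(X, y))]
        for i in range(p)
    ]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(p):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * z for x, z in zip(m[r], m[col])]
    return [m[i][p] / m[i][i] for i in range(p)]


def simulate(n, rng):
    """Hypothetical data with true ACE 1; Y is missing more often when A=1 and when C=1."""
    rows = []
    for _ in range(n):
        c = 1 if rng.random() < 0.5 else 0
        a = 1 if rng.random() < (0.7 if c else 0.3) else 0
        y = c + 2 * a * c + rng.gauss(0, 1)
        missing = rng.random() < 0.1 + 0.3 * a + 0.2 * c  # MAR given A and C
        rows.append((a, c, None if missing else y))
    return rows


def g_computation(data):
    """ACE by g-computation from the saturated outcome model Y ~ A + C + A*C."""
    beta = ols([design(a, c) for a, c, _ in data], [y for _, _, y in data])
    diffs = [dot(beta, design(1, c)) - dot(beta, design(0, c)) for _, c, _ in data]
    return sum(diffs) / len(diffs)


def delta_adjusted_mi_ace(rows, delta, m=10, seed=0):
    """Bootstrap-based MI under MAR, then add delta to every imputed outcome before estimating the ACE."""
    rng = random.Random(seed)
    complete = [r for r in rows if r[2] is not None]
    estimates = []
    for _ in range(m):
        boot = [rng.choice(complete) for _ in complete]
        X = [design(a, c) for a, c, _ in boot]
        y = [v for _, _, v in boot]
        beta = ols(X, y)
        sigma = math.sqrt(sum((v - dot(beta, x)) ** 2 for x, v in zip(X, y)) / (len(y) - len(beta)))
        completed = [
            (a, c, v if v is not None else dot(beta, design(a, c)) + rng.gauss(0, sigma) + delta)
            for a, c, v in rows
        ]
        estimates.append(g_computation(completed))
    return sum(estimates) / m


rows = simulate(4000, random.Random(7))
for delta in (-1.0, -0.5, 0.0, 0.5, 1.0):
    print(f"delta = {delta:+.1f}: ACE estimate = {delta_adjusted_mi_ace(rows, delta):.3f}")
```

With `delta = 0` this reduces to ordinary MAR-based MI. Because exposed participants are more often missing in this toy example, positive values of `delta` raise the estimated ACE and negative values lower it; reporting estimates over a grid of `delta` values is one common way to present such an analysis. A single `delta` is applied to all imputed outcomes here; the thesis's proposed compatible approaches are not reproduced.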

The findings of this thesis provide practical guidance for handling missing data when conducting causal inference in epidemiological studies.

Date awarded: 1st November 2024
