Much has been written about why electronic health (eHealth) initiatives fail. Less attention has been paid to why evaluations of such initiatives fail to deliver the insights expected of them. PLoS Medicine has published three papers offering a “robust” and “scientific” approach to eHealth evaluation.
One recommended systematically addressing each part of a “chain of reasoning”, at the centre of which were the program’s goals. Another proposed a quasi-experimental step-wedge design, in which late adopters of eHealth innovations serve as controls for early adopters.
Interestingly, the authors of the empirical study flagged by these authors as an exemplary illustration of the step-wedge design subsequently abandoned it in favour of a largely qualitative case study because they found it impossible to establish anything approaching a controlled experiment in the study’s complex, dynamic, and heavily politicised context.
The approach to evaluation presented in the previous PLoS Medicine series rests on a set of assumptions that philosophers of science call “positivist”: that there is an external reality that can be objectively measured; that phenomena such as “project goals”, “outcomes”, and “formative feedback” can be precisely and unambiguously defined; that facts and values are clearly distinguishable; and that generalisable statements about the relationship between input and output variables are possible.
Alternative approaches to eHealth evaluation are based on very different philosophical assumptions. For example,
- “interpretivist” approaches assume a socially constructed reality (i.e., people perceive issues in different ways and assign different values and significance to facts)—hence, reality is never objectively or unproblematically knowable—and that the identity and values of the researcher are inevitably implicated in the research process.
- “critical” approaches assume that critical questioning can generate insights about power relationships and interests and that one purpose of evaluation is to ask such questions on behalf of less powerful and potentially vulnerable groups (such as patients).
10 Alternative Guiding Principles for eHealth Evaluation
Lilford et al. identify four “tricky questions” in eHealth evaluation (qualitative or quantitative?; patient or system?; formative or summative?; internal or external?) and resolve these by recommending mixed-method, patient-and-system studies in which internal evaluations (undertaken by practitioners and policymakers) are formative and external ones (undertaken by “impartial” researchers) are summative.
In our view, the tricky questions are more philosophical and political than methodological and procedural.
We offer below an alternative (and at this stage, provisional) set of principles, initially developed to guide our evaluation of the Summary Care Record (SCR) program, which we invite others to critique, test, and refine.
These principles are deliberately presented in a somewhat abstracted and generalised way, since they will need to be applied flexibly with attention to the particularities and contingencies of different contexts and settings. Each principle will be more or less relevant to a particular project, and their relative importance will differ in different evaluations.
- Think about your own role in the evaluation. Try to strike a balance between critical distance on the one hand and immersion and engagement on the other. Ask questions such as What am I investigating—and on whose behalf? How do I balance my obligations to the various institutions and individuals involved? Who owns the data I collect?
- Put in place a governance process (including a broad-based advisory group with an independent chair) that formally recognises that there are multiple stakeholders and that power is unevenly distributed between them. Map out everyone’s expectations of the program and the evaluation. Be clear that, simply because a sponsor pays for an evaluation, it does not have a special claim on its services or exemption from its focus.
- Provide the interpersonal and analytic space for effective dialogue (e.g., by offering to feed back anonymised data from one group of stakeholders to another). Conversation and debate are not simply a means to an end; they can be ends in themselves. Learning happens more through the processes of evaluation than from the final product of an evaluation report.
- Take an emergent approach. An evaluation cannot be designed at the outset and pursued relentlessly to its conclusions; it must grow and adapt in response to findings and practical issues which arise in fieldwork. Build theory from emerging data, not the other way round (for example, instead of seeking to test a predefined “causal chain of reasoning”, explore such links by observing social practices).
- Consider the dynamic macro-level context (economic, political, demographic, technological) in which the eHealth innovation is being introduced. Your stakeholder map and the challenges of putting together your advisory group should form part of this dataset.
- Consider the different meso-level contexts (e.g., organisations, professional groups, networks), how action plays out in these settings (e.g., in terms of culture, strategic decisions, expectations of staff, incentives, rewards) and how this changes over time. Include reflections on the research process (e.g., gaining access) in this dataset.
- Consider the individuals (e.g., clinicians, managers, service users) through whom the eHealth innovation(s) will be adopted, deployed, and used. Explore their backgrounds, identities and capabilities; what the technology means to them and what they think will happen if and when they use it.
- Consider the eHealth technologies, the expectations and constraints inscribed in them (e.g., access controls, decision models) and how they “work” or not in particular conditions of use. Expose conflicts and ambiguities (e.g., between professional codes of practice and the behaviours expected by technologies).
- Use narrative as an analytic tool and to synthesise findings. Analyse a sample of small-scale incidents in detail to unpack the complex ways in which macro- and meso-level influences impact on technology use at the front line. When writing up the case study, the story form will allow you to engage with the messiness and unpredictability of the program; make sense of complex interlocking events; treat conflicting findings (e.g., between the accounts of top management and staff) as higher-order data; and open up space for further interpretation and deliberation.
- Consider critical events in relation to the evaluation itself. Document systematically stakeholders’ efforts to re-draw the boundaries of the evaluation, influence the methods, contest the findings, amend the language, modify the conclusions, and delay or suppress publication.
Adapted from Why Do Evaluations of eHealth Programs Fail? An Alternative Set of Guiding Principles by Trisha Greenhalgh and Jill Russell
Let me state at the outset that there are many good reasons to reject positivism in the evaluation of a particular program. The evaluation team may determine early on that there is insufficient consensus about the “project goals”; or, even if the goals are known, nobody can agree on a set of observable [i.e., measurable] “outcomes” that would allow one to assess whether the project achieved those goals. And even when there is consensus on the first two issues, the inability to experimentally manipulate exposure to the program may introduce so much uncertainty about the “chain of causal reasoning” that it becomes impossible to isolate the causal effect of the program from the background variation in the outcomes. However, if you are going to abandon positivism in your evaluation of a program, I have a few ground rules.
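Before getting to the rules, here is a toy simulation of that last problem. It is entirely hypothetical (the numbers and the “resources” confounder are made up for illustration, not taken from any real evaluation): when sites self-select into adoption, a naive comparison of adopters and non-adopters mixes the program effect with the background differences that drove adoption in the first place.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000
true_effect = 0.2

# Hypothetical setup: sites with better baseline resources are both more likely
# to adopt the eHealth program and more likely to have good outcomes anyway.
resources = rng.normal(0, 1, n)                               # background variation
adopted = rng.binomial(1, 1 / (1 + np.exp(-2 * resources)))   # self-selected exposure
outcome = true_effect * adopted + 1.0 * resources + rng.normal(0, 1, n)

# Naive "evaluation": compare mean outcomes of adopters vs. non-adopters.
naive = outcome[adopted == 1].mean() - outcome[adopted == 0].mean()

# What a randomized rollout of the same program would have seen.
randomized = rng.binomial(1, 0.5, n)
outcome_r = true_effect * randomized + 1.0 * resources + rng.normal(0, 1, n)
rand_est = outcome_r[randomized == 1].mean() - outcome_r[randomized == 0].mean()

print(f"true effect:            {true_effect:.2f}")
print(f"naive adopter contrast: {naive:.2f}")      # several times the true effect
print(f"randomized contrast:    {rand_est:.2f}")   # close to the true effect
```

The naive contrast comes out several times larger than the true effect, and nothing in the observed data tells you how much of it is the program and how much is the background variation. That is the kind of uncertainty about the causal chain I have in mind.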
1) Whatever the challenges, a positivist framework should be the default approach in program evaluation, and every effort should be made to achieve consensus on the three issues discussed above before deciding to abandon it. Reaching that consensus can be very time-consuming, and it almost always requires input from many different areas of expertise.
2) If you decide to abandon positivism, this decision needs to be acknowledged prominently in the report: “We tried to evaluate the program, but we failed to achieve consensus on X, so we are going to try to learn more about X by launching another sort of (e.g., exploratory) evaluation.”
3) You are not allowed to return to a positivist approach once the decision has been made to abandon it. A consensus about the chain of causal reasoning or the measurement of outcomes can NOT be influenced by the data (qualitative or quantitative) collected for the purposes of evaluating the program. It is fine to export the knowledge gained to a future evaluation of the same program, but once you have left the Positivist Ranch, there is no going back.
4) If the evaluation begins under a positivist framework, but the consensus around the project goals, outcomes, and chain of causal reasoning evaporates during the course of the study, that is too bad. The results of the initially planned investigation must be published. The field needs to know how fragile the consensus actually was, and to be fairly warned the next time a similar program is evaluated.
5) Finally, p-values can only be reported when an evaluation is “designed at the outset” and “pursued relentlessly to its conclusions”. Under an “interpretivist” or “exploratory” analytical framework, p-values do not provide good summaries of the risk of mistaken inference, as the toy simulation below illustrates.
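To make rule 5 concrete, here is another toy simulation (again my own made-up numbers: ten interchangeable outcome measures, no real data). If the program truly does nothing and the analyst is free to report whichever of the ten candidate outcomes looks most impressive after seeing the data, the nominal “p < 0.05” label badly understates the actual risk of a false positive.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims, n_per_group, n_outcomes = 5_000, 50, 10
alpha = 0.05

prespecified_fp = 0   # analysis fixed at the outset: test outcome 0 and nothing else
exploratory_fp = 0    # "exploratory": report the smallest of the ten p-values

for _ in range(n_sims):
    # The program has NO effect on any outcome: both groups come from the same distribution.
    control = rng.normal(0.0, 1.0, (n_outcomes, n_per_group))
    treated = rng.normal(0.0, 1.0, (n_outcomes, n_per_group))
    pvals = stats.ttest_ind(treated, control, axis=1).pvalue

    prespecified_fp += pvals[0] < alpha
    exploratory_fp += pvals.min() < alpha

print(f"false-positive rate, pre-specified outcome: {prespecified_fp / n_sims:.3f}")  # ~0.05
print(f"false-positive rate, best-looking outcome:  {exploratory_fp / n_sims:.3f}")   # ~0.40
```

Under the pre-specified design the nominal 5% error rate holds. Once the outcome is chosen after looking at the data, the same “.05” threshold is attached to a procedure that is wrong roughly 40% of the time, and the reported p-value no longer means what readers will assume it means.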
Thanks!
Sam