Our Charity Evaluation Framework

Why we moved to in-house charity evaluation

The Life You Can Save previously identified its recommended charities by reviewing and aggregating research from other charity evaluators under the guidance of a volunteer panel of experts. Differences in evaluators’ methods and criteria occasionally led to inconsistencies and confusion around why a given charity was included on our list while equally well-regarded charities were left off. Relying on volunteer expertise also limited our ability to consistently add more great charities to our list and provide tailored recommendations to donors.

This document presents an adaptive charity evaluation framework that we will apply to charities working on a wide range of causes and taking a variety of approaches to creating impact. The framework will help us identify more great charities that are working to reduce the burden of poverty for hundreds of millions of people, and offer our supporters greater choice around how and where to give. By expanding the set of choices we offer, we intend to bring in new donors who are passionate about particular causes so they can give better within that cause area. By introducing new donors to the logic of effective giving, we also aim to spread the message of effective altruism to a wider audience.

The framework is based on three core principles of The Life You Can Save: 

(i) We focus on problems in the space of global poverty that affect people at scale, that can be solved, and that receive insufficient funding; 

(ii) We rely on scientific evidence, using methods and metrics appropriate to each problem, to identify the most impactful solutions;

(iii) We aim to direct the most money possible to high-quality, cost-effective organizations working on these solutions.

 

1. How we find the most important causes

To maximize the impact of each dollar, we need to identify cost-effective solutions, and to move money to these solutions. We focus on the poorest countries because it is typically much cheaper to address basic deprivations there (e.g. lack of access to preventative healthcare), which means that the most cost-effective solutions tend to exist in the poorest countries—with some notable exceptions such as climate change, where changing rich country policies has the biggest impact.1

Myriad problems create and sustain the conditions that give rise to global poverty, ranging from lack of access to basic healthcare and economic opportunity, to large-scale deprivations caused by conflict, natural disaster, and climate change. To select among these problems, or cause areas, we apply three criteria: scale, solvability, and neglectedness.2 

  1. Scale means that solving the problem would be highly beneficial (i.e., the amount of good done by mitigating a given percent of the problem). This ensures that we consider the most important problems, including those that affect a vast number of people (e.g. infectious disease or climate disaster) as well as those that cause extreme suffering (e.g. obstetric fistula).
  2. Solvability means that additional resources would go a long way towards solving the problem (i.e., the percent of the problem that a given amount of resources can mitigate). This keeps our focus on problems that can in principle be solved using existing interventions that are backed by rigorous evidence.
  3. Neglectedness means that our contribution will add substantially to the current resources available to solve the problem (i.e., the percent increase in resources that a given number of dollars or person-hours provides). This keeps our focus on problems that get insufficient attention, so that our contribution has a high impact and doesn’t crowd out other efforts. (The sketch below shows how these three components combine into impact per dollar.)
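
To make the arithmetic concrete, here is a minimal sketch of the ITN product described in endnote 2. All values and names are hypothetical, chosen only to show how the units cancel to leave utility gained per dollar; this is an illustration, not our actual scoring tool.

```python
# A minimal sketch of the ITN arithmetic from endnote 2 (all values hypothetical).

def utility_per_dollar(importance: float, tractability: float, neglectedness: float) -> float:
    """I x T x N = utility gained per dollar.

    importance:    utility gained per percent of the problem solved
    tractability:  percent of the problem solved per percent increase in resources
    neglectedness: percent increase in resources per dollar
    """
    return importance * tractability * neglectedness

# A neglected cause: each marginal dollar is a larger share of current funding.
neglected_cause = utility_per_dollar(importance=100.0, tractability=0.5, neglectedness=0.002)
# An equally important and tractable, but already well-funded, cause.
crowded_cause = utility_per_dollar(importance=100.0, tractability=0.5, neglectedness=0.0001)

print(neglected_cause, crowded_cause)  # ~0.1 vs ~0.005: the neglected cause wins per dollar
```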

In addition, we value two other criteria that come from our core principles.

  • Unlocking money. A lot of philanthropic money is not cause-neutral—for example, many donors have logistical constraints or strong affiliations to particular causes, geographies, or target populations.3 Depending on the spread of returns to investment within a given cause or geography, moving money from a low-return to a high-return intervention could be very cost-effective in terms of impact per dollar.4 We also want to introduce new donors to the logic of effective giving, since that can lead them and others in their network to move toward better giving strategies over time.
  • Focusing on recipients. We want to keep sight of the expressed needs, preferences, and values of the ultimate recipients of our philanthropic efforts, i.e. the global poor. This means prioritizing causes that the poor identify as important for them, such as schooling, financial education, and jobs.5

These criteria lead us to consider two broad kinds of causes:

  • Causes where the best evidence exists for impactful and cost-effective interventions. This leads us mainly to the global health space, where exhaustive research by GiveWell and others has identified many cheap health and nutrition interventions (malaria control, vitamin A supplementation, vaccine incentives, etc.) with measurable and demonstrably large benefits.
  • Causes that profoundly affect the lives of the poor, but get less attention from the effective giving community. Often this is because impact is hard to measure or attribute, for example in climate change, where there is a lot of uncertainty about the potential impact of any individual policy change, or in education, where the main benefits are realized years, often decades, after treatment.

In the second type of cause, we look for general consensus (i) among experts that a given problem will drastically affect the extreme poor (e.g. climate change), or (ii) among aid recipients that a given cause is essential to them (e.g. education). Because of the measurability issues, we may not have the same level of confidence about the ‘solvability’ of these problems, but they fulfill the other two criteria: they feature prominently in a holistic response to poverty, have a deep bench of committed donors, and receive less attention than they should.

 

2. How we group causes and measure impact

To find the most impactful solutions we first need a common metric on which to compare impact. However, metrics do not easily translate across cause areas: for example, the impact of preventing an infant’s death is hard to compare to the impact of restoring an elderly person’s vision, or increasing a household’s income, or reducing greenhouse gas emissions. Acknowledging this limitation, we consider a few core outcomes that loosely correspond to different cause areas: lives saved, life-years added, income gained, and carbon removed.

  1. Lives saved: the decrease in preventable deaths, for example due to lower infectious disease incidence or safer reproductive practices. Most charities that work on maternal and child health fall into this category, as do charities that work in parts of the humanitarian sector.
  2. Life-years added: the additional quality years of life added, as measured by disability-adjusted life years (DALYs) averted. The DALY is a composite unit of health that combines the duration and quality of life, where quality is measured in terms of the loss of functioning, or disability, caused by a given health condition.7 Averting a DALY can be thought of as adding a fully functional year to someone’s life, and is most relevant to charities that actively work to reduce different types of disability, for example through improved nutrition or corrective surgery (a worked example follows this list).
  3. Income gained: the increase in money available to an individual over the course of their lifetime. This is particularly relevant to charities seeking to improve economic opportunity, or to health interventions like deworming that let children attend more school, leading to better education, more jobs, and higher income.
  4. Carbon removed: the reduction in emissions, in tons of carbon dioxide-equivalent (CO2e) greenhouse gases removed from the atmosphere. This is the standard way to measure decarbonization, which is a core concern of climate change charities.
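
To make the DALY bookkeeping concrete, here is a minimal worked example of how DALYs averted might be tallied for a corrective intervention. The disability weight and duration are invented placeholders, not actual Global Burden of Disease values.

```python
# A toy DALY calculation (the 0.19 disability weight and 15-year duration are
# invented placeholders, not real Global Burden of Disease figures).

def dalys_averted_by_cure(disability_weight: float, years_with_condition: float) -> float:
    """DALYs averted by fully curing a non-fatal condition.

    In Global Burden of Disease accounting, DALY = YLL (years of life lost to
    premature death) + YLD (years lived with disability, i.e. disability
    weight x duration). Curing a non-fatal condition averts the YLD component.
    """
    return disability_weight * years_with_condition

# Corrective surgery for a condition with disability weight 0.19, for a person
# who would otherwise live with it for another 15 years:
print(dalys_averted_by_cure(0.19, 15.0))  # ~2.85 fully functional years restored
```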

While it is possible to try to convert across these outcomes (e.g. we could think of a life saved as equal to a lifetime’s worth of DALYs averted), we maintain them separately for a few reasons. 

  • First, while nearly all charities want to save and improve lives, what this means varies substantially across cause areas and interventions. Not all benefits resolve easily into one or two outcomes (e.g. lives saved and income gained)—for example, ending disability can dramatically improve mental health and subjective wellbeing.8 Maintaining additional outcomes allows us to value some of these benefits explicitly.9
  • Second, for the most part there is little data and no consensus on how to value each of these outcomes in terms of the others. Even if there were an effective exchange rate in terms of the monetary value of the average life saved, the uncertainty around this average compounds with each conversion, leading to increasing amounts of guesswork, as the sketch below illustrates.
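
The following toy Monte Carlo, with entirely hypothetical distributions, illustrates the point: each uncertain conversion factor multiplies into the estimate, so the range of plausible final answers widens with every conversion step.

```python
# A toy Monte Carlo of conversion guesswork (all distributions hypothetical).
import random

random.seed(0)

def chained_estimate(n_conversions: int) -> float:
    """Multiply n uncertain conversion factors, each lognormal around 1.0."""
    value = 1.0
    for _ in range(n_conversions):
        value *= random.lognormvariate(mu=0.0, sigma=0.5)
    return value

for n in (1, 2, 3):
    draws = sorted(chained_estimate(n) for _ in range(10_000))
    low, high = draws[250], draws[-251]  # central 95% of simulated estimates
    print(f"{n} conversion step(s): 95% of estimates fall roughly in [{low:.2f}, {high:.2f}]")
```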

 

3. How we use evidence to find the most impactful solutions

When looking for the best solutions we turn to the scientific literature for evidence on the impact of different interventions. The key to understanding the impact of an intervention is to understand the counterfactual, i.e. what would have happened in its absence. The ‘gold standard’ here is the randomized controlled trial (RCT), which randomly assigns a representative sample of potential recipients—i.e., individuals or communities that would qualify to receive the treatment—to ‘treatment’ and ‘control’ groups, and then offers the intervention only to the treatment group. If there is a large number of recipients in each group, comparing outcomes across the two groups gives a precise, attributable measure of the impact of the intervention. If an RCT is not possible, there are other quasi-experimental approaches that can come close to providing this kind of measure.
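
The following bare-bones simulation illustrates this counterfactual logic under invented assumptions (a true effect of +5 on an arbitrary outcome scale): because assignment is random, the control group’s average stands in for what would have happened to the treatment group.

```python
# A bare-bones RCT simulation (the +5 "true effect" and outcome scale are
# invented for illustration, not drawn from any real study).
import random

random.seed(1)

TRUE_EFFECT = 5.0
SAMPLE_SIZE = 10_000  # recipients per arm

def observed_outcome(treated: bool) -> float:
    baseline = random.gauss(50.0, 10.0)        # natural variation across people
    return baseline + (TRUE_EFFECT if treated else 0.0)

treatment_mean = sum(observed_outcome(True) for _ in range(SAMPLE_SIZE)) / SAMPLE_SIZE
control_mean = sum(observed_outcome(False) for _ in range(SAMPLE_SIZE)) / SAMPLE_SIZE

# With large groups, the difference in means recovers the true effect.
print(f"Estimated impact: {treatment_mean - control_mean:.2f} (true effect: {TRUE_EFFECT})")
```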

But the quality and type of evidence available varies substantially depending on the cause area, the outcome that needs to change, and the nature of the intervention itself—for example, many of our recommended charities work closely with the government and the private sector to create, bring to market, or scale promising interventions in the countries where they operate, all of which are harder to track than simple service delivery. Some methodologies are more suited to particular cause areas and strategies, some outcomes are harder to measure than others, and some interventions are much more heavily studied than others. This leads to a few issues:

First, while in some cases these interventions will have been created and tested by an organization under review (e.g. Evidence Action’s safe water dispensers), in most cases we will have to rely on a body of research that may or may not involve the charities we consider. We then have to consider:

  • External validity: can findings from one context (geography, time period, implementer) be applied to another? For example, the impact of cash transfers in Kenya in 2015 might differ substantially from their impact in Liberia in 2022. External validity is highest if the charity under review has implemented the intervention itself, if the intervention was implemented in a similar context, and if there are a large number of studies of the intervention, as summarized in systematic reviews and meta-analyses.
  • Internal validity: can the impact estimates be causally attributed to the charity? In the absence of RCTs, this is harder to resolve. For example, it is easy to measure the impact of fistula surgery in a population where no one has access to fistula surgery—effectively, the entire population has a baseline of zero surgeries, and the intervention’s impact is equivalent to the number of surgeries provided. However, if there are multiple providers, a charity could crowd out (or crowd in) other providers. Internal validity can be improved by high-quality external evaluations that include clear metrics of implementation and impact, a rigorous attempt at identifying causality, and some assessment of the internal and external validity of the findings.

Second, this kind of evaluation research excels at precisely measuring the causal impact of clearly defined interventions, but is ill-suited to large and complex interventions where treatment and control groups of sufficient size are hard to define. It is also ill-suited to measuring long-term impacts, or to teasing apart the mechanisms through which impact does or does not occur. As noted earlier, relying only on this type of evidence can lead us to leave out highly impactful and cost-effective, but harder-to-measure, interventions—for example, policy advocacy for change at the level of the country or the world. In such cases, for example changes in national nutrition and climate policies, more appropriate methods are case studies and legislative reviews. These qualitative approaches, though, are less able to tell us whether the impact was caused by a particular charity’s intervention (for example, would the government have changed its policy anyway? Would other organizations have stepped in?), i.e. whether the findings have internal validity.

Rather than only considering one kind of evidence, we seek the best quality evidence possible given the cause area and strategy—e.g. RCTs for direct delivery of health services, carbon market analysis and policy review for climate change advocacy interventions, etc.10 This has several consequences:

  • By considering different types of evidence based in part on what solutions are most impactful in a given cause area (e.g. policy advocacy to change energy policy), and in part on the strategy and vision of each charity, we are able to welcome diversity in programming and strategy. Among the common strategies we consider are: (i) providing essential information, goods and/or services directly to people, (ii) helping the government create and/or scale effective interventions, (iii) helping the private sector create and/or bring effective interventions to market, and (iv) pushing for large-scale policy or systems change through advocacy. 
  • Because some methodologies come with more uncertainty, charities pursuing strategies more suited to these methods will de facto get less rigorous evaluation. Though we do compare charities to others that work in the same cause areas or measure success by the same outcomes, these do not necessarily correspond to the charity’s strategy (e.g. changing national salt iodization policy and providing corrective surgery to individuals both avert DALYs). We think that this is worth it, because there tends to be a tradeoff between certainty and scale of impact: for example, providing direct services bears relatively little risk, but its impact is of the same order as the intervention itself. In contrast, working with the government or private sector to scale effective interventions, or advocating for policy or systems change, is riskier and harder to measure, but could have vastly larger-scale impacts.
  • This approach also enables us to mobilize non-neutral funds from donors who are committed to particular strategies or who have different risk profiles. When these donors make a giving decision, they can choose charities where the scale of potential impact is commensurate with the uncertainty involved.

 

4. How we find great charities focused on these solutions

We try to find outstanding charities that are squarely focused on our priority cause areas and are working on the most impactful solutions in the most cost-effective ways.6 We consider three main criteria when evaluating a charity.

  • It must be explicitly focused on the most impactful solutions to a given problem, as judged by the best possible evidence; 
  • It must be extraordinarily cost-effective at implementing these solutions, in terms of its impact on the problem per dollar spent; and
  • It must meet the highest standards of integrity and transparency.

We build a pool of charities using different sources, and then vet them in-house before considering whether or not to recommend them.11 Our data come from the charity’s financial statements and program documents, reviews of studies by independent organizations that have assessed the charities, conversations with sector experts and philanthropists, and field visits by us or our partners where necessary and feasible. We do not seek to duplicate the efforts of others, so we continue to build on the work done by GiveWell, Giving Green, Founders Pledge and other charity evaluators. We also look to recipients of highly regarded awards that rigorously evaluate their grant recipients (e.g. the Elevate Prize) and to other funders who have a reputation for evidence-based giving (e.g. Focusing Philanthropy and Mulago Foundation) for potential candidates for review.

We come across three types of organizations:

  • Those that implement a single, clearly defined intervention to address a single, clearly defined problem (e.g. bednets to tackle malaria). 
  • Those that work on multiple related interventions to address a single, clearly defined problem (e.g. bednets and seasonal chemoprevention to tackle malaria). The issue here is that money cannot be clearly assigned to a specific intervention, and may end up supporting an intervention that is less effective or whose impacts are harder to measure.12
  • Those that work on multiple interventions to tackle multiple problems. The issue here is that money cannot be clearly assigned to a specific problem. 

We are open to considering all three kinds of organizations, based on the broad recognition that those with a track record of doing one thing extraordinarily well are reasonably likely to build strong, competent teams that can do other things reasonably well, innovate and learn from their mistakes, and generally seek to provide excellent services. 

When funding such organizations, we want to ensure wherever possible that they have the fiscal space to experiment within the broad program or “vertical” in which they have demonstrated outstanding impact. For example, if an organization is implementing multiple related interventions to address a single, clearly defined problem, and one of these interventions is highly cost-effective, then we want to encourage innovation within that set of interventions. In all cases, our funding is contingent on their continuing to implement the most effective interventions within that program and continuing to demonstrate impact—i.e., if an organization ceases to implement the most cost-effective intervention, or radically changes the approach through which it is delivered, we will reconsider recommending it.

 

5. How we estimate cost-effectiveness

Before diving into our approach, it is important to remember that all cost-effectiveness calculations are built on subjective valuations, such as the value of life at different ages, the extent of suffering alleviated by a given intervention, and our best guesses about recipient utility and preferences. Small changes in many of these underlying assumptions can lead to widely differing answers. We therefore use this kind of analysis to make sure that the charity is cost-effective within some reasonable range (for example, compared to other charities seeking to change the same outcome13), and we do not try to claim that an intervention, or the organization that implements it, is the most cost-effective. Rather, we claim that giving money to that organization is a great bet and is highly likely to do good.

First, we assign a core outcome to each charity against which its cost-effectiveness is measured. For example, charities that seek to increase child survival are assigned to “lives saved”, those that tackle blindness to “DALYs averted”, and those that give out cash grants to “income gained”. 

The calculation needs to take into account at least two broad issues: the extent to which the intervention can, in an ideal setting, meaningfully impact a given problem, and the extent to which the charity in question can implement the intervention successfully. 

  • We think of the first issue as impact risk, i.e. the probability, conditional on implementation, that the intervention has an impact on the outcomes we care about. One way to think of impact risk is to consider all the things that need to happen after the intervention is implemented for the core outcome to be affected, and think about how likely it is that each of these happens. In this vein, each cost-effectiveness calculation lists out all the main steps from intervention to impact, and uses the strength of the academic evidence to assign an appropriate discount based on the probability that the assumptions underlying each step fail. 
  • The second issue is implementation risk, i.e. the probability that giving the charity money leads to the intervention being successfully implemented. We can think of implementation risk as all the steps the charity needs to take from the point when it receives money to when the intervention has been implemented, and all the things that need to happen for each step to succeed. We list out these steps based on the charity’s strategy and project documents. When organizations use more than one approach, we consider their resource allocation and strategic focus as ways of determining what they actually do. (The sketch after this list shows how such step probabilities might combine into discounts.)
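
Here is a hedged sketch of how such step-wise risks might be combined into discounts. The step names and probabilities are invented for illustration, not taken from any actual charity review; the key assumption is that the steps are roughly independent, so the chance that the whole chain holds is the product of the per-step probabilities.

```python
# Combining step-wise risks into discounts (steps and probabilities invented).

implementation_steps = {
    "funds reach the country program": 0.98,
    "commodities are procured and delivered": 0.95,
    "intervention is implemented as designed": 0.90,
}
impact_steps = {
    "recipients use the intervention as intended": 0.85,
    "evidence generalizes to this context": 0.80,
}

def chain_success_probability(steps: dict) -> float:
    """Probability that every step in the chain succeeds, assuming independence."""
    probability = 1.0
    for step_probability in steps.values():
        probability *= step_probability
    return probability

implementation_discount = chain_success_probability(implementation_steps)  # ~0.84
impact_discount = chain_success_probability(impact_steps)                  # 0.68
print(f"implementation: {implementation_discount:.2f}, impact: {impact_discount:.2f}")
```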

We then calculate two basic parameters: the charity’s cost of delivering an output (dollars per output), and the number of outputs needed to change a core outcome (outputs per outcome). 

  • The first parameter, dollars per output, is simply the full economic cost of the charity’s main product or activity—for example, delivering a bednet or providing cataract surgery.14 This number is based on the charity’s financial and administrative data, such as its expenditure on the intervention and the number of outputs it produced or delivered. The number is adjusted for a range of factors, such as the likelihood that outputs were successfully delivered (e.g. the probability that a bednet was received and used as intended by the target recipient), the likelihood of spillovers (e.g. the recipient’s neighbors saw the bednet and purchased one for themselves), and so on.
  • The second parameter, outputs per outcome, is the number of outputs needed to change an outcome that we care about, according to the best available evidence (e.g. how many bednets are needed to prevent one death from malaria). This number is based on the body of scientific evidence around the intervention, ideally from meta-analyses spanning multiple contexts. The number is adjusted to take into account the nature and quality of the evidence—for example, how rigorous the methodology is, how similar the contexts are, and so on.15 Ideally, some of this evidence would be generated by the charity itself, through externally commissioned studies, since this would require no adjustment for context.

These parameters give us two measures of cost-effectiveness: 

  • dollars per output (the cost of buying and delivering a bednet), and 
  • dollars per outcome (the cost of preventing a death from malaria).16

The first measure allows us to compare charities producing identical outputs—for example, bednets—to find the charities that are the best at delivering this intervention. The second allows us to compare charities trying to influence common outcomes using different kinds of outputs, from road safety measures to seasonal chemoprevention. When we consider adding new charities to a cause area, we use current charities as the benchmark for cost-effectiveness.
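
Putting the pieces together, here is an illustrative end-to-end calculation in the spirit of endnote 16 (dollars per outcome = dollars per output × outputs per outcome). Every figure is hypothetical, invented purely to show how the parameters and adjustments combine; real reviews would draw these numbers from charity financials and the evidence base.

```python
# An illustrative cost-effectiveness calculation (every figure hypothetical).

spending = 1_000_000.0         # expenditure attributable to the intervention
outputs_delivered = 180_000.0  # e.g. bednets distributed
delivery_success = 0.85        # adjustment: probability a net is received and used

dollars_per_output = spending / (outputs_delivered * delivery_success)

# From the evidence base (adjusted for rigor and context): how many successfully
# used outputs are needed to avert one death.
outputs_per_outcome = 400.0

# Endnote 16: dollars per outcome = dollars per output x outputs per outcome.
dollars_per_outcome = dollars_per_output * outputs_per_outcome
print(f"${dollars_per_output:.2f} per output delivered")  # ~$6.54
print(f"${dollars_per_outcome:,.0f} per death averted")   # ~$2,614
```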

 

6. How we assess organizational quality

We think of a charity’s integrity in terms of its governance and financial management.17 We ensure that charities conform to widely accepted governance standards18 around how a charity is run (including its processes, activities and relationships). Among other things, these standards require a charity to remain charitable, operate lawfully, be run in an accountable and responsible way, and treat all employees, clients, and others who come into contact with the organization with respect, irrespective of such criteria as race, gender, sexual orientation, ethnicity, or religion.

We think of transparency as how a charity shares and responds to information. The organization should have robust monitoring and evaluation systems and learn from these systems; it should be proactive about publicly sharing data and internal research; it should publicly admit mistakes and show that it is learning from them; it should regularly experiment with ways to increase its impact; it should produce knowledge to inform policymakers and other implementers; it should invite external review, research, and criticism; and so forth. Data for these metrics will come from our interactions with each charity (requests for documentation and data, etc.), document review, and consultations with experts, donors, and other implementers.

The above criteria are key to identifying and selecting recommended charities. Over and above these, we consider additional features in the course of our biannual due diligence process, including time-bound opportunities and temporary constraints that particular charities are facing. These include:

  • Organizational stability. While each organization we recommend conforms to the highest ethical and legal standards, even good organizations have bad days. We keep track of several indicators of organizational instability, such as ongoing ethical, legal or administrative issues, or pauses in operations for any reason. 
  • Room for funding. It is essential that recommended charities be able to use the money we disburse. We assume by default that charities are always able to absorb more money, but periodically review year-on-year spending relative to revenue (i.e. track unspent funds). If an organization is underspending, we will attempt to understand why, and potentially hold back funding for a given period. 
  • Funding opportunities. We may disburse additional money if the charity has a particular opportunity or need, for example to offset a funding loss or expand into a new country. The charity can request such funding through a competitive application process.
  • Evidence opportunities. We may also disburse additional money if the charity has an opportunity to collect evidence that would help us better measure the intervention’s or charity’s impact or cost-effectiveness.

These features are temporary and would not, by and large, lead to charities being dropped from our list. But as we move to a more hands-on charity recommendation model, they allow us to ensure that our quarterly giving decisions are informed and strategically aimed at making sure our donors’ money is having the greatest impact at any given moment, and that our recommended charities are able to consistently improve, innovate, and build better evidence.


Endnotes

  1. Other examples where changing rich country policies could be far more impactful include trade, migration, conflict, pandemic prevention, and animal welfare, along with many long-termist causes.
  2. This is the Importance–Tractability–Neglectedness (ITN) framework used by most EA organizations to identify priority cause areas. Its three components can be expressed as: [I]mportance (or scale) = utility gained / % of problem solved; [T]ractability (or solvability) = % of problem solved / % increase in resources; and [N]eglectedness = % increase in resources / dollar. Multiplying the three, I × T × N = utility gained / dollar.
  3. These could be tax deductibility, political barriers, risk preferences, time-discounting, or different subjective valuations on, e.g., infant versus adult survival.
  4. For example, this back-of-the-envelope calculation by Matt Lerner of Founders Pledge suggests that “(a) if you can exert inexpensive enough leverage over the funding flows within some cause Y and/or (b) if funding opportunities within cause Y are sufficiently variable, cost-effectiveness is at least theoretically possible. …if you can get roughly 4000:1 leverage when it comes to spending money to move money, it can be cost-effective to influence funding patterns within this low-impact cause area.”
  5. A major limitation here is lack of data, since most preference elicitation and value of statistical life (VSL) studies are based on rich country samples. But for example, an IDinsight study asked nearly 2,000 poor people in Ghana and Kenya how much they were willing to pay to reduce the risk of death for themselves and their children, and how their community leaders would choose between life-saving and income-providing interventions. Similarly, a recent study in Kenya (Shapiro, WBER 2019) asked recipients to rank 14 different interventions spanning agriculture, education, energy, health and water. Notably, most people ranked school inputs, financial literacy, and agricultural extension services significantly higher than bednet distribution, despite bednets being among GiveWell’s most cost-effective interventions. More representative ways to get at recipient preferences could be to analyze global surveys that collect opinions around public spending, for example via Afrobarometer.
  6. We focus on nonprofit organizations, or charities, rather than, e.g., for-profit social enterprises or multilateral organizations, in part because there is no obvious funding mechanism for private philanthropists to contribute to such organizations. In the future we may consider recommending some of these types of recipients.
  7. Loss of functioning from a health condition is captured as a ‘disability weight’. DALYs are calculated periodically by the Institute for Health Metrics and Evaluation (IHME) as part of its Global Burden of Disease project.
  8. For example, fistula repair doesn’t save lives, but it does help women reintegrate into society, resume sexual function, and improve their mental health. Cataract surgery substantially improves the functioning and wellbeing of older people without necessarily leading to greater income. 
  9. Implicit here are the limits of our underlying moral framework: since we want to end global poverty, our ‘natural’ point of reference is material outcomes. Considering multiple kinds of material benefits forestalls the issue and leaves open the possibility of expanding the outcome set. It also might in the future mean switching over to subjective wellbeing measures such as happiness or life satisfaction, if that kind of data begins to be collected more frequently and comprehensively.
  10. When multiple methods are available to evaluate a given intervention, we follow the hierarchy below to select the method to use: (i) meta-analyses of RCTs; (ii) meta-analyses of other quasi-experimental approaches, e.g. difference-in-differences, regression discontinuity designs, instrumental variables and matching; (iii) observational studies, e.g. panel data or cohort analysis, case-control studies, cross-sectional surveys; (iv) qualitative studies, e.g. case reports, case studies, process evaluations; (v) expert opinions.
  11. The main differences from our earlier approach to evaluation are that (i) we now have in-house capacity and expertise to review all the evidence collected by others in detail before deciding to recommend a given charity, (ii) we work with specialized evaluators whose approach and strategy we understand ex ante, and (iii) we redo cost-effectiveness analyses to bring them in line with the strategy outlined in this document.
  12. A standard approach to dealing with the second and third types is to tie funds to specific interventions—this is the approach taken by GiveWell, for example, and by most traditional donors who want to offset risk. This approach only partly addresses the issue, because money can be fungible across parts of the organization. The downside is that it can also tie the hands of innovative organizations that want to try new things. 
  13. A basic reference point for cost-effectiveness is: is giving money to the charity better than giving the equivalent amount of cash to the recipient? In other words, how does the impact compare to that of a GiveDirectly cash transfer of the same amount? 
  14. For policy advocacy organizations, for example, such outputs could be: getting specific pieces of legislation passed, influencing design of legislation, ensuring election of pro-reform lawmakers, etc.
  15. For example, by using the Lives Saved Tool.
  16. Dollars per outcome = dollars per output × outputs per outcome.
  17. See also GiveWell’s qualitative assessment guidelines.
  18. For example those set by government agencies in donor and recipient countries, e.g. the ACNC in Australia.