a Universidad Nacional Autónoma de México, Programa Universitario de Estudios del Desarrollo, Mexico.
Email addresses: chuffman@unam.mx and hector.najera@comunidad.unam.mx, respectively.
It is hardly contentious to assert that measuring concepts is an important aspect of scientific and practical research. However, there currently seems to be some confusion about what is meant by measurement in poverty research. Recent exchanges in the Journal of Development Studies make it evident that participants in the debate hold different notions of its central terms (reliability, validity, measurement error, measurement model), to the detriment of mutual understanding. To move the literature forward, this paper draws on the epistemology of measurement to bridge the apparent conceptual gap in the debate. It invites further discussion and a constructive exchange of views regarding the meaning of measurement in poverty research and how to assess the relative success of different efforts.
There have been a number of debates throughout the history and development of poverty measurement: relative v absolute poverty, income-based v deprivation-based measurement, differential v equal weighting, unidimensional v multidimensional measurement, among others. Several of these discussions share an underlying concern: producing a trustworthy poverty measure that leads to credible and accurate conclusions about the extent, evolution and distribution of poverty.
If sound measurement has been an overarching objective in poverty research, one would reasonably expect a clear understanding and characterisation of that goal across the poverty measurement debates. Recent scholarly exchanges, however, point to a rather different conclusion: the lack of a common understanding of the meaning of measurement not only creates confusion among researchers but also seems, perilously, to lead to interminable and fragmented discussions in the field (Nájera Catalán and Gordon, 2020; Santos and Villatoro, 2020; Gordon and Nájera Catalán, 2020).
A way forward in poverty research requires developing a common understanding of the meaning of measurement itself and of the governing principles that characterise good measurement. This paper draws upon areas of recent agreement in the epistemology of measurement to shed light on the consistency and coherence of poverty measurement, and on potential ways forward.
The paper is organised as follows. Section two draws upon the epistemology of measurement to frame the meaning of scientific measurement. Section three applies these ideas to review key statements made in the recent exchanges in the Journal of Development Studies. Section four discusses the implications for poverty research of ignoring some of the chief principles of current views on measurement. Section five concludes.
It is remarkable that, after all the knowledge accumulated in poverty research, there are still heated debates about what is meant by poverty measurement. One key aspect of these debates is the lack of common ground about the meaning of measurement itself. The notion of measurement has a long history, and this paper will not attempt to summarise what has already been well documented elsewhere (Tal, 2020). There are, however, some practice-oriented characterizations that, we believe, shed light on how measurement differs from other knowledge-producing activities such as monitoring or (generic) evaluation (Mari, 2003).
Despite the difficulties of defining measurement, there is wide consensus among philosophers that measurement “is an activity that involves interaction with a concrete system with the aim of representing aspects of that system in abstract terms” (Tal, 2020, p. 1).
While this characterization is too broad to count as a proper definition (many other activities not usually considered measurements fit the bill), it clearly calls for distinguishing between, on the one hand, the design, execution and observation of a concrete physical process and, on the other hand, the formal (i.e., abstract or symbolic) structure used to represent features of the “system under measurement”.1 This seemingly obvious distinction, which underpins the widely shared notion of measurement as a representational activity, is a sobering reminder that not every assignment of numerical values or scores actually measures what it purports to measure.
Contemporary scholarship has come to acknowledge the richness of the representational means involved in measurement, particularly the pervasive use of theoretical assumptions in designing measuring apparatuses and interpreting their indications, as forcefully argued by Pierre Duhem since the end of the nineteenth century (Duhem, 1991[1906]) and eloquently put by Norwood Russell Hanson at the beginning of the second half of the twentieth (Hanson, 1965). Indeed, the assumptions underlying such representations influence which measurement outcomes are obtained, how errors are detected and corrected, and how the representational adequacy of measurement outcomes is evaluated.
A key insight of a recent body of scholarship, sometimes called “the epistemology of measurement”, is that the theoretical background that makes measurement possible consists precisely in modelling the measurement process itself: constructing abstract and local representations, by means of simplifying assumptions, of the workings of the measuring instrument and of its principled relations with the abstract quantity we aim to measure. Such a representation is called a measurement or metrological model (Tal, 2020).
Current epistemological accounts characterize measurement as a model-based, information-gathering activity. Measurement models are crucial both for evaluating measurement error and uncertainty and for supporting inferences from the information gathered by measuring instruments in the form of “readings” (instrument indications, in metrological jargon: encodings that are nowadays usually numeric and stored on computers) to knowledge claims about the system under measurement (“results”, the outcome of a measurement) formulated in terms of abstract and universal concepts (Tal, 2017b).2
It is clear that evaluating the uncertainties associated with information delivered by a putative source requires at least some theorizing about the physical system that transmits that information.
In constructing a measure, it is common practice to base all data generation on transparent principles that are applied consistently. But while the data gathered are expected to provide information about (to reflect features of) the system under measurement (otherwise we would not be using them in the first place), they are also expected to reflect the influence of a host of other things that have nothing to do with what researchers purport to measure (i.e., noise, arising from anything from instrument design to execution and coding).
That is why great effort goes into experimental design and field data collection. If, despite all the effort poured into producing good quality data, the data end up reflecting too strongly the influence of things other than the feature researchers mean to learn about, there is little hope of using them to produce adequate numerical representations of that feature.
Regardless of what “too much” means in different contexts, the important point is that, without explicitly theorizing about the information transmission system (i.e., the measurement model), there is no way to think systematically about the all too likely possibility that the scores, however computed from the data at hand, do not measure the quantity they were meant to. How else, if not through explicit theoretical and statistical assumptions about the relations between the data (as final states, or indications, of the measurement process) and the feature being measured, are researchers to tell signal apart from noise? (Tal, 2017a).
In other words, the problem with undertheorizing the measurement model is the lack of a framework against which researchers can weigh the different pieces of evidence in favor of interpreting a particular set of scores as intended (i.e., as representing the feature being measured). Without such a framework, it is hardly possible to argue that a given numerical assignment adequately represents the intended feature. Only through well-reasoned hypotheses about the expected relationships between the data (“indications”) and the features of the objects being measured can the conformity of the data be assessed. Without them, we are left in the dark, unable to justify our beliefs about the actual meaning of our scores.
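To make the point concrete, consider a minimal worked illustration. It assumes, purely for illustration, a single-factor linear model of the reflective kind that Nájera Catalán and Gordon assume, in which indicator $j$ for person $i$, written $x_{ij}$, depends on a latent quantity $\eta_i$ through a loading $\lambda_j$ plus an error term. The notation is introduced here for illustration only. Under such stated assumptions, “signal” and “noise” acquire a definite meaning and testable consequences:

```latex
\begin{aligned}
x_{ij} &= \lambda_j \eta_i + \varepsilon_{ij}, \qquad
  \operatorname{Var}(\eta_i) = 1,\ \operatorname{Var}(\varepsilon_{ij}) = \theta_j,\
  \operatorname{Cov}(\eta_i, \varepsilon_{ij}) = 0,\\
\operatorname{Var}(x_{ij}) &= \underbrace{\lambda_j^{2}}_{\text{signal}}
  + \underbrace{\theta_j}_{\text{noise}}, \qquad
  \text{share of signal} = \frac{\lambda_j^{2}}{\lambda_j^{2} + \theta_j},\\
\operatorname{Cov}(x_{ij}, x_{il}) &= \lambda_j \lambda_l
  \qquad (j \neq l,\ \operatorname{Cov}(\varepsilon_{ij}, \varepsilon_{il}) = 0).
\end{aligned}
```

The last line is what makes the model testable: it restricts the pattern of covariances the data may show. Without some model of this sort, there is no principled way to split an indicator's variance into signal and noise.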
We can trace the recent debates and misunderstandings in poverty measurement back to the absence of such a framework, which rules out assessing what is encoded in the data (and the scores), and thus whether we are justified in our beliefs about poverty.
Santos and Villatoro (2018) draw upon the Alkire-Foster (AF) approach to put forward a poverty index for Latin America. The AF approach typically has two stages: a series of steps to select poverty indicators (Alkire et al., 2015), and the aggregation of those indicators using the AF formulation (Alkire and Foster, 2011). After a series of empirical analyses, Santos and Villatoro (2018) concluded that the AF approach leads to robust measurement of poverty for the region.
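The aggregation stage of the AF approach can be sketched as follows. This is a minimal illustration, not the MPI-LA's actual specification: the four indicators, their equal weights and the poverty cutoff k = 0.5 are hypothetical. Following the dual-cutoff logic of Alkire and Foster (2011), a person is identified as poor when their weighted deprivation score reaches k, the scores of the non-poor are censored to zero, and M0 is the mean censored score, which equals the headcount ratio H times the average intensity A among the poor.

```python
def adjusted_headcount(deprivations, weights, k):
    """Return (H, A, M0) for a 0/1 deprivation matrix.

    deprivations: one row per person, one 0/1 flag per indicator
    weights: indicator weights, assumed to sum to 1
    k: poverty cutoff applied to the weighted deprivation score
    """
    n = len(deprivations)
    # Weighted deprivation score c_i of each person
    scores = [sum(w * d for w, d in zip(weights, row)) for row in deprivations]
    # Identification: poor if c_i >= k; censoring: non-poor scores set to 0
    censored = [c if c >= k else 0.0 for c in scores]
    q = sum(1 for c in scores if c >= k)      # number of people identified as poor
    H = q / n                                 # headcount ratio
    A = sum(censored) / q if q else 0.0       # average intensity among the poor
    M0 = sum(censored) / n                    # adjusted headcount ratio (= H * A)
    return H, A, M0


# Hypothetical data: 4 people, 4 equally weighted indicators
example = [
    [1, 1, 0, 0],  # score 0.50 -> poor
    [1, 0, 0, 0],  # score 0.25 -> not poor (censored)
    [1, 1, 1, 0],  # score 0.75 -> poor
    [0, 0, 0, 0],  # score 0.00 -> not poor
]
H, A, M0 = adjusted_headcount(example, [0.25] * 4, k=0.5)
print(H, A, M0)  # 0.5 0.625 0.3125
```

Nothing in this computation relates the indicators to poverty itself: the arithmetic is the same whatever the indicators happen to reflect, which is why the question of a measurement model arises.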
In a recent exchange in the Journal of Development Studies, Santos and Villatoro (2020) defended this broadly used approach, developed by Sabina Alkire and James Foster, against several criticisms leveled by scholars who oppose its use (Nájera Catalán and Gordon, 2020). Nájera Catalán and Gordon (2020) had failed to find credible evidence that the Multidimensional Poverty Index for Latin America (MPI-LA) measured what it purports to measure: poverty. They thus called for a moratorium on the MPI-LA, drawing attention to the likely misleading results of relying on it for policy making and research purposes.
It is obvious from this exchange, and from recent related literature (Nájera Catalán, 2019; Dutta et al., 2021; Vollmer and Alkire, 2022), that there is little agreement between the two sides of the debate as to what we talk about when we talk about measurement, and thus as to what constitutes credible evidence of an adequate representation of poverty (or “robust measurement”, as Alkire et al. (2015) refer to it).
The key disagreements in the exchanges revolve around three aspects: whether theory connecting the concept of poverty with the data is a necessary condition for measurement; the existence of, and need for, an implicit measurement model; and the character of observation and inference in poverty research.
Model-based approaches to measurement hold that claims about the effective representation of poverty require a clear connection between poverty and data. Offering such evidence without explicitly theorizing the relationship between the data sets (mostly collected through household surveys, in this case) and poverty is, however, quite challenging, since several questions that must be answered for the claim of measurement to gain credibility require such a framework.3
For example, how are researchers going to assess whether the variables (indicators) in the data set from which the MPI-LA is computed reflect non-negligible influences of things other than poverty? That this is the case for some subset of the 13 indicators, grouped in 5 dimensions (housing, basic services, living standard, employment and social protection), across 32 different data sets (17 countries at 2 points in time) (Santos and Villatoro, 2020), is certainly a real possibility. Researchers simply cannot ignore or assume it away if they are to make a reasonable case for the MPI-LA (or any numerical assignment, for that matter) to be considered poverty measurement.
Consider, for instance, the possibility that one of the indicators included in the MPI-LA, say the one encoding the answers to a question related to unemployment, reflects in non-negligible part the fact that some individuals can afford not to have a job precisely because they are not poor.
This would obviously contribute to a misalignment between the scores produced and poverty (according to the proposed aggregation method, the index would go up whereas poverty would go down). But how are researchers going to assess the magnitude of this misalignment and decide whether it is negligible for an index purporting to measure poverty for a given purpose?4
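A small simulation can illustrate the kind of misalignment just described, under openly hypothetical assumptions: a latent poverty variable (which no real survey observes) drives four binary deprivation indicators, while an “unemployment” flag is made more likely among the non-poor. Because the simulation knows the latent variable, it can report how strongly each equally weighted index tracks it. The loadings, cutoffs and sample size are invented, and the output says nothing about the MPI-LA's actual indicators.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate(n=20000, seed=1):
    rng = random.Random(seed)
    poverty, aligned_index, full_index, unemployed = [], [], [], []
    for _ in range(n):
        eta = rng.gauss(0, 1)  # hypothetical latent poverty (higher = poorer)
        # Four deprivations that occur *because of* poverty
        aligned = [int(eta + rng.gauss(0, 1) > 0.5) for _ in range(4)]
        # Hypothetical "unemployment" flag, more likely among the non-poor
        u = int(-0.8 * eta + rng.gauss(0, 1) > 0.5)
        poverty.append(eta)
        unemployed.append(u)
        aligned_index.append(sum(aligned))
        full_index.append(sum(aligned) + u)
    return {
        "unemployment_vs_poverty": pearson(unemployed, poverty),
        "index_without_u": pearson(aligned_index, poverty),
        "index_with_u": pearson(full_index, poverty),
    }


results = simulate()
for name, r in results.items():
    print(f"{name}: {r:.2f}")
```

In real data the latent variable is not available, so these correlations cannot be computed directly; judging the size of such a misalignment requires deriving observable implications from an explicit measurement model.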
Of course, this is not the only way a misalignment between a multidimensional index and poverty can arise. Everything from the phrasing of questions in household surveys to the selection of cut-off points for binary transformations can have the undesired effect that the assignment of numbers (or scores) does not adequately represent poverty.
Needless to say, providing evidence in favor of the intended meaning of the scores (i.e., that they match the theoretical definition adopted by the researcher) is no easy feat. But without a framework theorizing the expected relations between the data and the features of the objects being measured, it is virtually impossible.
Seen in this light, Nájera and Gordon’s critiques of the MPI-LA are all predicated on what they assume to be the “implicit measurement model” underlying the MPI-LA, since no relationship between poverty and the indicators is explicitly represented by Santos and Villatoro, nor by Alkire and Foster for that matter. On the one hand, they assume that poverty, as a scientific object, exists at a deeper conceptual (abstract or symbolic) level than the indicators (variables) (Gordon and Nájera Catalán, 2020), no different in this respect from other abstract terms used to describe features of empirical systems, such as temperature or (perhaps counterintuitively) length.
On the other hand, they also assume that deprivation is the consequence (effect) of poverty. In other words, according to their measurement model, it makes sense to imagine a change in a person’s poverty, net of other deprivations, leading to a change in the indicators in their data sets. This measurement model also has implications for what we should expect of poverty when a specific deprivation is repaired: because the indicators are effects of poverty rather than its causes, repairing one of them does not by itself alter the underlying poverty.
It is on the basis of this measurement model that certain patterns in the data (correlations) are expected to be observed if poverty is present; their absence gives us reason to doubt the adequacy of the indicators and, consequently, the reliability of the scores produced with them.
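One observable pattern a reflective model implies is that indicators sharing a common cause should covary. As a hedged sketch, the code below computes Cronbach's alpha, a simple internal-consistency coefficient used here only as an illustrative stand-in for the reliability statistics discussed in this debate, for simulated binary indicators that either do or do not share a hypothetical latent variable. The data-generating choices (four items, a cutoff of 0.5, 5,000 cases, the seed) are assumptions made for illustration.

```python
import random
from statistics import pvariance


def cronbach_alpha(items):
    """Cronbach's alpha; items is a list of k columns, each a list of n scores."""
    k = len(items)
    totals = [sum(row) for row in zip(*items)]      # summed score per case
    item_var = sum(pvariance(col) for col in items)  # sum of item variances
    return k / (k - 1) * (1 - item_var / pvariance(totals))


def simulate_items(n, k, shared, seed):
    """Binary indicators; if shared is True, all depend on one latent variable."""
    rng = random.Random(seed)
    cols = [[] for _ in range(k)]
    for _ in range(n):
        eta = rng.gauss(0, 1) if shared else 0.0  # common cause, or none at all
        for col in cols:
            col.append(int(eta + rng.gauss(0, 1) > 0.5))
    return cols


print(round(cronbach_alpha(simulate_items(5000, 4, shared=True, seed=7)), 2))
print(round(cronbach_alpha(simulate_items(5000, 4, shared=False, seed=7)), 2))
```

A low coefficient in the second case is not a verdict by itself; it becomes evidence against a set of indicators only once one accepts a model under which they should covary.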
Only by accepting this assumed measurement model can one accept the different pieces of evidence provided by Nájera Catalán and Gordon (2020) and reasonably question whether the MPI-LA actually measures poverty. If researchers, like Santos and Villatoro (2020), do not find the measurement model assumed by Nájera Catalán and Gordon compelling, they will hardly find their results persuasive.
Although an explicit discussion of what they mean by measurement is absent from Santos and Villatoro’s response to Nájera and Gordon’s critiques (Santos and Villatoro, 2020), they often come across as placing poverty (that which they aim to measure) at the same conceptual level as qualitatively observed deprivations (as encoded in their databases). Claims about poverty being “observable” and the MPI-LA being “an implementation of direct poverty measurement” [emphasis in the original] (Santos and Villatoro, 2020, p. 1785) suggest that they believe poverty measurement to be a kind of observation in itself, in no need of modeling (as characterized above). This characterization of poverty measurement may even feel intuitive when dealing with normatively loaded concepts like poverty; after all, who can deny, as Sen (1981, p. vii) put it, that “[t]here is indeed much that is transparent about poverty and misery”? However, conflating poverty measures (measurement outcomes) with observational reports is riddled with philosophical difficulties (Tal, 2016).
The problem with classifying measurement as observation (instead of as an activity typically involving inference, theory, statistics, abstraction and idealization) is that the empirical content of poverty boils down to some set of observations of deprivations (variables in the data sets). This makes it very difficult to explain what scientists mean by key terms in measurement practice such as accuracy, precision and measurement error in general (not to be confused with sampling error, which, unlike measurement error, disappears when the entire population is observed).
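The distinction between sampling error and measurement error can be illustrated with a simulated population. This sketch rests entirely on assumed numbers (a true poverty rate of 30 per cent, 20 per cent of poor people missed and 5 per cent of non-poor people wrongly flagged). Sampling error shrinks to zero as the sample approaches the whole population, while the gap between the observed and the true rate, a form of systematic measurement error, does not. The “true” status is available only because the simulation creates it; estimating such a gap in practice requires a measurement model.

```python
import random


def build_population(size, seed=3):
    """Hypothetical population of (truly_poor, flagged_as_poor) pairs."""
    rng = random.Random(seed)
    people = []
    for _ in range(size):
        poor = rng.random() < 0.30  # assumed true poverty rate: 30%
        # Assumed error process: 20% of poor people are missed and
        # 5% of non-poor people are wrongly flagged as deprived.
        flagged = (rng.random() >= 0.20) if poor else (rng.random() < 0.05)
        people.append((poor, flagged))
    return people


def rate(rows, idx):
    """Share of rows whose element idx is True."""
    return sum(row[idx] for row in rows) / len(rows)


population = build_population(100_000)
true_rate = rate(population, 0)
census_rate = rate(population, 1)  # observed rate with no sampling at all

rng = random.Random(11)
for n in (1_000, 10_000, 100_000):
    sample = rng.sample(population, n)
    sampling_error = rate(sample, 1) - census_rate
    measurement_bias = census_rate - true_rate
    print(f"n={n:>7}: sampling error {sampling_error:+.4f}, "
          f"measurement bias {measurement_bias:+.4f}")
```

With the whole population the sampling error is exactly zero, yet the measurement bias stays close to the value implied by the assumed error rates (about -0.025); no increase in sample size removes it.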
The claim that poverty measures are a kind of quantitative observation (mere reports coded in numbers) automatically attributes the absence of error to the MPI-LA (beyond survey coding errors). Error-free measurement then stops being an idealization of the measurement process: the procedure cannot produce inconsistent measurement outcomes, which makes it empirically irrefutable. It is precisely this assumption of error-free measurement that Nájera and Gordon find at odds with measurement practice, leading them to characterize the MPI-LA as unscientific (Gordon and Nájera Catalán, 2020).
An additional difficulty with grounding poverty measurement in nothing but qualitative observation concerns the size of the set of observations (the quantity of data) required to give empirical content to poverty measures: it would have to include every instance in which poverty is observed (a census of deprivations) for poverty to be ruled out (i.e., for someone to be declared non-poor). It stands to reason that this is why there are indicators that Santos and Villatoro (2018, p. 63) “would have liked to include and could not due to data limitations, such as... [i]ndicators on fundamental cognitive skills, employment formality and quality”.5 As the MPI-LA claims to make the best possible use of existing data, poverty observation will almost certainly be constrained by the data (and thus by what can be measured) and biased downward.
If assuming away any difference between the abstract concept of poverty and the data at hand would render the measurement endeavor moot (and measurement error intractable), as implied by the model-based characterization of measurement given above, the question remains as to the nature of the measurement model underlying the MPI-LA. If we are to move forward in the poverty measurement debate, a good first step is to further theorize the relationships assumed between poverty and its dimensions/indicators, as Gordon and Nandy (2012) argued over a decade ago.
In all fairness, the approach developed by Alkire and Foster, and applied by Santos and Villatoro in their MPI-LA, was never meant as part of a measurement model. As a continuation of the Unsatisfied Basic Needs (UBN) approach in the development studies literature of the 1970s, the AF method was meant to offer “a framework with respect to which various research and policy questions about multidimensional poverty can be analyzed, and the multiple deprivations which so many suffer can be reduced” (Alkire, 2013).6 Rather than embarking on what Alkire (following Sen) has labelled a “quixotic search for the perfect measure”7 or the “Scylla of empirical over ambitiousness” (Alkire, 2013; Alkire and Kanagaratnam, 2021), the MPI aimed to offer a valuable tool “sufficient to guide multidimensional poverty reduction efforts to critical objectives” (Alkire, 2013, p. 92 [emphasis in original]). And, as a goal-monitoring tool, an argument can be made that it delivered as promised.
While hardly sufficient (or necessary) in a strict sense, keeping tabs on intended outcomes in a multidimensional/multidomain dashboard fashion does help guide poverty reduction efforts; it does not, however, get us any closer to reasonably justifying a particular assignment of numbers as measurement, “perfect” or otherwise. The problem is not merely terminological: calling the MPI a measurement procedure implies suitability for producing scientific evidence, a status that evaluation in general does not share.
One of the distinctive outcomes of the AF aggregation method is the Adjusted Headcount Ratio, or M0, which is often used as a metric to derive conclusions about the extent, evolution and distribution of poverty. Unjustifiably tapping into the evidential status of measurement can seriously compromise the scientific generalization (i.e., the objectivity) needed for developing knowledge about poverty and how to fight it, independently of the particular instruments and procedures used for its measurement (Tal, 2017b). Since any quantitative comparison based on the MPI, whether across places or over time, is likely to be confounded when interpreted as a difference in poverty (if only because measurement is seldom achieved without explicit intent), using the MPI can easily lead to incorrect conclusions: these comparisons do not yield results that are meaningful in terms of poverty in the way researchers (and policy officials) would expect from a poverty measure. One may just as easily discover “spurious” group differences that are in fact not there as miss true group differences that have been masked. Neither rigorous research design, nor advanced statistics, nor large samples can correct inferences made on this basis. All of this makes it very hard to relate findings from different investigations and to deepen our understanding of poverty and its drivers.
Another regrettable consequence of overlooking the inferential nature of measurement is that the distinction between poverty and the means used to explore it (the indicators) gets diluted: the data variables used to compute the scores become, for all intents and purposes, indistinguishable from poverty itself, leading public officials to wrongly interpret (and advertise) any and every specific deprivation repair as poverty alleviation.8 This state of affairs also leads to an undesirable multiplicity of the scientific concept, to the detriment of comparability, as the definition of poverty becomes dependent not only on the chosen dimensions (the particular data variables) that go into the algorithm, but also on the particular data set used (a sample collected at a particular time and place).
Researchers may rightly wonder whether the vagueness that sometimes surrounds the definition of poverty makes it a concept too multifaceted to be measured without loss of meaning (Cartwright and Runhardt, 2014). Indeed, definitional uncertainty (Giordani and Mari, 2014; Gregis, 2015) can simply overwhelm measurement, and this may well be the case with the current understanding of poverty as capability deprivation.
Many disciplines have benefited from standard measurement practices that put the conceptualisation and estimation of uncertainty (random and systematic errors) at the very centre of measurement endeavours. In poverty research, the work of the Bristol School has yielded fruitful results in theorizing the concept and measurement of poverty (Gordon, 2000 and 2006; Townsend, 1979). The lessons learned from psychological and educational assessment have also proven fruitful in poverty measurement, as Structural Equation Modelling has served as a statistical framework for testing, with reasonable success, the empirical assumptions underlying causal (reflective) measurement models.
Santos and Villatoro could be right in pointing out that these statistical methods (the same ones used by Nájera and Gordon) may not be appropriate for assessing the adequacy of the MPI-LA, particularly if “[the MPI-LA does not] propose a hypothesis of the correlations between dimensions and indicators” (Santos and Villatoro, 2020, p. 1786). However, this does not change the fact that the burden of proof regarding the adequacy of the MPI-LA in representing poverty remains with them, and that the evidence they offer requires an explicit measurement model for its proper interpretation.
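To illustrate the kind of empirical restriction that Structural Equation Modelling can test, note that a single-factor reflective model with uncorrelated errors implies so-called vanishing tetrads: for four indicators, differences of covariance products such as σ12σ34 − σ13σ24 should be zero. The sketch below uses hypothetical loadings and simulated data to compute two tetrad differences, with and without an extra nuisance factor shared by the first two indicators. It illustrates one testable implication of such a model, not the full estimation procedure used in the literature.

```python
import random


def cov(x, y):
    """Population covariance of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / n


def simulate(n, extra_loading, seed=5):
    """Four continuous indicators of one latent variable (hypothetical loadings).

    If extra_loading > 0, indicators 1 and 2 also share a nuisance factor,
    which violates the single-factor model.
    """
    rng = random.Random(seed)
    loadings = (0.9, 0.8, 0.7, 0.6)
    items = [[] for _ in loadings]
    for _ in range(n):
        eta = rng.gauss(0, 1)       # latent variable
        nuisance = rng.gauss(0, 1)  # factor unrelated to eta
        for j, lam in enumerate(loadings):
            x = lam * eta + rng.gauss(0, 0.5)  # signal plus unique error
            if j < 2:
                x += extra_loading * nuisance
            items[j].append(x)
    return items


def tetrads(items):
    """Two tetrad differences; both are zero under a single-factor model."""
    s = {(i, j): cov(items[i], items[j]) for i in range(4) for j in range(i + 1, 4)}
    t1 = s[0, 1] * s[2, 3] - s[0, 2] * s[1, 3]
    t2 = s[0, 2] * s[1, 3] - s[0, 3] * s[1, 2]
    return t1, t2


one_factor = tetrads(simulate(20_000, extra_loading=0.0))
two_factor = tetrads(simulate(20_000, extra_loading=0.7))
print("one factor:", [round(t, 3) for t in one_factor])
print("extra factor on items 1-2:", [round(t, 3) for t in two_factor])
```

A clearly non-zero tetrad signals that the data do not conform to the assumed model, which is the sense in which an explicit measurement model makes claims about indicators open to refutation.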
Alkire, S. (2007). The missing dimensions of poverty data: introduction to the special issue, Oxford Development Studies, 35(4). https://doi.org/10.1080/13600810701701863
______ (2013). Choosing dimensions: The capability approach and multidimensional poverty. In N. Kakwani and J. Silber (eds.). The many dimensions of poverty (pp. 89-119). Palgrave Macmillan. https://doi.org/10.1057/9780230592407_6
Alkire, S. and Foster, J. (2011). Counting and multidimensional poverty measurement. Journal of Public Economics, 95(7). https://doi.org/10.1016/j.jpubeco.2010.11.006
______ and Kanagaratnam, U. (2021). Revisions of the global multidimensional poverty index: indicator options and their empirical assessment. Oxford Development Studies, 49(2). https://doi.org/10.1080/13600818.2020.1854209
______, Roche, J. M., Ballon, P., Foster, J., Santos, M. E. and Seth, S. (2015). Multidimensional poverty measurement and analysis. Oxford University Press.
Cartwright, N. L. and Runhardt, R. (2014). Measurement. In N. Cartwright and E. Montuschi (eds.). Philosophy of social science: A new introduction. Oxford University Press (chap. 14, pp. 265-287).
Duhem, P. M. M. (1991[1906]). The aim and structure of physical theory, vol. 13. Princeton University Press.
Dutta, I., Nogales, R. and Yalonetzky, G. (2021). Endogenous weights and multidimensional poverty: A cautionary tale. Journal of Development Economics, 151. https://doi.org/10.1016/j.jdeveco.2021.102649
Giordani, A. and Mari, L. (2014). Modeling measurement: Error and uncertainty. In M. Boumans, G. Hon and A. Petersen (eds.). Error and uncertainty in scientific practice. Pickering and Chatto.
Gordon, D. (2000). The scientific measurement of poverty: Recent theoretical advances. In J. Bradshaw (ed.). Researching poverty. Ashgate.
______ (2006). The concept and measurement of poverty. Poverty and social exclusion in Britain. Policy Press.
Gordon, D. and Nandy, S. (2012). Measuring child poverty and deprivation. In A. Minujin and S. Nandy (eds.). Global child poverty and well-being. Policy Press.
______ and Nájera Catalán, H. E. (2020). Reply to Santos and colleagues “the importance of reliability in the Multidimensional Poverty Index for Latin America (MPI-LA)”. The Journal of Development Studies, 56(9). https://doi.org/10.1080/00220388.2019.1663178
Gregis, F. (2015). Can we dispense with the notion of “true value” in metrology? Standardization in Measurement. Routledge. https://doi.org/10.4324/9781315653648
Hanson, N. R. (1965). Patterns of discovery: An inquiry into the conceptual foundations of science. CUP Archive.
Mari, L. (2003). Epistemology of measurement. Measurement, 34(1). https://doi.org/10.1016/S0263-2241(03)00016-2
Nájera Catalán, H. (2019). Reliability, population classification and weighting in multidimensional poverty measurement: A Monte Carlo study. Social Indicators Research, 142(3). https://doi.org/10.1007/s11205-018-1950-z
Nájera Catalán, H. E. and Gordon, D. (2020). The importance of reliability and construct validity in multidimensional poverty measurement: An illustration using the multidimensional poverty index for Latin America (MPI-LA). The Journal of Development Studies, 56(9). https://doi.org/10.1080/00220388.2019.1663176
Santos, M. E. and Villatoro, P. (2018). A multidimensional poverty index for Latin America. Review of Income and Wealth, 64(1). https://doi.org/10.1111/roiw.12275
______ and Villatoro, P. (2020). The importance of reliability in the multidimensional poverty index for Latin America (MPI-LA). The Journal of Development Studies, 56(9). https://doi.org/10.1080/00220388.2019.1663177
Sen, A. (1981). Poverty and famines. An essay on entitlement and deprivation. Oxford University Press.
Tal, E. (2016). How does measuring generate evidence? The problem of observational grounding. Journal of Physics: Conference Series, IOP Publishing, vol. 772. https://doi.org/10.1088/1742-6596/772/1/012001
______ (2017a). Calibration: Modelling the measurement process. Studies in History and Philosophy of Science Part A, 65. https://doi.org/10.1016/j.shpsa.2017.09.001
______ (2017b). A model-based epistemology of measurement. In N. Mößner and A. Nordmann (eds.). Reasoning in measurement. Routledge.
______ (2020). Measurement in science. In E. N. Zalta (ed.). The Stanford Encyclopedia of Philosophy. Metaphysics Research Lab, Stanford University.
Townsend, P. (1979). Poverty in the United Kingdom: A survey of household resources and standards of living. University of California Press.
Vollmer, F. and Alkire, S. (2022). Consolidating and improving the assets indicator in the global Multidimensional Poverty Index. World Development, 158, 105997. https://doi.org/10.1016/j.worlddev.2022.105997
Woodward, J. F. (2011). Data and phenomena: A restatement and defense. Synthese, 182(1). https://doi.org/10.1007/s11229-009-9618-5
1 This discussion dovetails very nicely with James Woodward and James Bogen’s work on the distinction between data and phenomena (Woodward, 2011).
2 While some forms of evaluation (e.g., impact evaluation) also depend on causal models to make inferences about the likely effect of an intervention, these models are not necessarily measurement models in themselves.
3 Some of these questions were posed by Gordon and Nandy (2012) over a decade ago.
4 Note that in this context the term “indicator” does not presuppose the success in indicating anything, but only a working hypothesis of it carrying some relevant information.
5 In a similar line of thought, Alkire wondered about “the missing dimensions of poverty data” (2007, p. 347) and even “[w]hat dimensions comprise poverty itself?” (2013, p. 95).
6 In the same vein, Vollmer and Alkire (2022) recently advised against the use of single metrics in poverty research, as they obscure information that is potentially relevant to policy. For example, if a health subindex is created such that a child is deprived in health if they either lack immunisation or did not have an assisted birth, and the subindex rates each child as deprived or non-deprived, policy actors who wish to address the health deprivation do not know whether to focus on immunisation or maternal health.
7 According to Alkire and Kanagaratnam (2021, p. 92) “...global poverty measures, like Don Quixote, harbour an impossible dream. They must be sufficiently accurate measures of poverty for households of multiple sizes, compositions, occupations, locations, ages, and cultures. They must use existing data. They must retain a large sample in order to reduce sampling errors and permit disaggregation. They must reflect the meanings of poverty that different people and groups hold, and effectively monitor widespread policy priorities such as the Sustainable Development Goals (SDGs). In addition, they must be relatively robust to alternative specifications of controversial parameters. As in the case of the Man of La Mancha, the quest for a perfect global poverty measure is clearly doomed”.
8 As an example of the perils of this confusion, consider the fight against climate change: the analogous mistake would be to regard air conditioning as a means of tackling global warming itself.