Kathleen A. Creel

Research Interests

My work broadly concerns philosophy of machine learning, ethics of AI, and general philosophy of science. I am interested in how humans can best use computation to understand themselves and their world. How can we gain scientific understanding with opaque, black-box computational methods? What ways of explaining machine learning best serve scientific and public life? How should supposedly autonomously generated concepts inspire us to revise our own? In answering these sorts of questions, I connect traditional topics in philosophy of science such as explanation, reference, and natural kinds with a practice-based approach to the study of methods in contemporary machine learning. By examining these epistemic and normative questions, I outline more fruitful uses of machine learning for human flourishing.

Philosophy of
Machine Learning

Deep neural networks are often thought to be opaque black boxes. However, the sense in which they are opaque is philosophically interesting since every component of the system can be individually surveyed. What would it mean for such a sytem to be transparent? Is transparency required for trust? I explore explanatory strategies for opaque machine learning both as it is used for science and as it is used in public life. For an example of this work see my paper on transparency below.
I am also interested in how machine learning influences scientific and social categories. When new features crosscut our existing scientific or social categories, how do we or should we revise our understanding of the world and its contents?

Ethics of
Artificial Intelligence

Answers to epistemic questions about transparency and explanation are relevant to the use of algorithmic decision making in public life. I consider case studies of non-state use of automated decision-making, such as automated hiring systems and loan approval algorithms. What implications do adversarial examples and other known features of deep learning systems have for transparency and fairness? A paper from this project has been accepted at ACM FAccT 2021.
Whether or not these systems are black boxes, if they are treated as such we may come to trust them on a testimonial basis. I explore questions of machine testimony and of appropriate trust in automated decisionmaking systems.

General Philosophy of Science

Machine learning is only one species of a genus of scientific methods for finding patterns in data. This pattern-finding capacity is often thought to support the discovery of scientific phenomena, or the recognition of patterns that reflect activity and causal processes in the world rather than noise or instrument-caused artifacts of the data. In my work in general philosophy of science, I investigate the distinguishment of signal from noise and phenomena from artifact. I am also interested in the normativity of scientific beliefs. Arguments for popular forms of scientific explanation such as mechanistic explanation implicitly rely on normative theories of epistemic reason-giving. I am interested in borrowing tools from metaethics to examine the nature/normativity of scientific belief formation.

History of Philosophy

Google’s Ali Rahimi has called machine learning a "new alchemy": a pre-paradigmatic science whose notable successes outstrip the scientific theory meant to explain them. Early modern "natural philosophers" like Bacon, Boyle, and du Châtelet faced a similar gap between their practical ability to predict or control and their capacity to explain those successes with existing scientific theories. In this gap flowered an integrated pursuit of observation, experimentation, epistemology, and metaphysics. Lessons from this period, especially the methodological pursuits of Scottish enlightenment scientists such as Joseph Black and James Hutton, inform my work.
Likewise, machine learning holds out the promise, or perhaps illusion, that our technology-enhanced capacities can outstrip the human -- that we can get outside ourselves. My research considers how to make automated decision-making systems more fair and just while grounding them in a naturalistic understanding of human sympathy and social relationships. This work focuses on early modern sentimentalists such as David Hume, Adam Smith, and Sophie de Grouchy. For example, Hume's theory of justice and the caprice of power underlies my most recent manuscript: the Algorithmic Leviathan, concerning arbitrariness by automated decision-making systems.


Ecosystem-level Analysis of Deployed Machine Learning Reveals Homogeneous Outcomes, with Connor Toups, Rishi Bommasani, Sarah Bana, Dan Jurafsky, and Percy Liang.
NeurIPS (2023)

Abstract: Machine learning is traditionally studied at the model level: researchers measure and improve the accuracy, robustness, bias, efficiency, and other dimensions of specific models. In practice, the societal impact of machine learning is determined by the surrounding context of machine learning deployments. To capture this, we introduce ecosystem-level analysis: rather than analyzing a single model, we consider the collection of models that are deployed in a given context. For example, ecosystem-level analysis in hiring recognizes that a job candidate's outcomes are not only determined by a single hiring algorithm or firm but instead by the collective decisions of all the firms they applied to. Across three modalities (text, images, speech) and 11 datasets, we establish a clear trend: deployed machine learning is prone to systemic failure, meaning some users are exclusively misclassified by all models available. Even when individual models improve at the population level over time, we find these improvements rarely reduce the prevalence of systemic failure. Instead, the benefits of these improvements predominantly accrue to individuals who are already correctly classified by other models. In light of these trends, we consider medical imaging for dermatology where the costs of systemic failure are especially high. While traditional analyses reveal racial performance disparities for both models and humans, ecosystem-level analysis reveals new forms of racial disparity in model predictions that do not present in human predictions. These examples demonstrate ecosystem-level analysis has unique strengths for characterizing the societal impact of machine learning.

Clinical decisions using AI must consider patient values, with Jonathan Birch, Abhinav Jha, and Anya Plutynski
Nature Medicine (2022)

Abstract: Built-in decision thresholds for AI diagnostics are ethically problematic, as patients may differ in their attitudes about the risk of false-positive and false-negative results, which will require that clinicians assess patient values.

Picking on the Same Person: Does Algorithmic Monoculture lead to Outcome Homogenization? with Rishi Bommasani, Ananya Kumar, Dan Jurafsky, and Percy Liang.
NeurIPS (2022)

Abstract: As the scope of machine learning broadens, we observe a recurring theme of algorithmic monoculture: the same systems, or systems that share components (e.g. training data), are deployed by multiple decision-makers. While sharing offers clear advantages (e.g. amortizing costs), does it bear risks? We introduce and formalize one such risk, outcome homogenization: the extent to which particular individuals or groups experience negative outcomes from all decision-makers. If the same individuals or groups exclusively experience undesirable outcomes, this may institutionalize systemic exclusion and reinscribe social hierarchy. To relate algorithmic monoculture and outcome homogenization, we propose the component-sharing hypothesis: if decision-makers share components like training data or specific models, then they will produce more homogeneous outcomes. We test this hypothesis on algorithmic fairness benchmarks, demonstrating that sharing training data reliably exacerbates homogenization, with individual-level effects generally exceeding group-level effects. Further, given the dominant paradigm in AI of foundation models, i.e. models that can be adapted for myriad downstream tasks, we test whether model sharing homogenizes outcomes across tasks. We observe mixed results: we find that for both vision and language settings, the specific methods for adapting a foundation model significantly influence the degree of outcome homogenization. We conclude with philosophical analyses of and societal challenges for outcome homogenization, with an eye towards implications for deployed machine learning systems.

Artificial Knowing Otherwise, with Os Keyes.
Feminist Philosophy Quarterly (2022)

Abstract: While feminist critiques of AI are increasingly common in the scholarly literature, they are by no means new. Alison Adam’s Artificial Knowing (1998) brought a feminist social and epistemological stance to the analysis of AI, critiquing the symbolic AI systems of her day and proposing constructive alternatives. In this paper, we seek to revisit and renew Adam’s arguments and methodology, exploring their resonances with current feminist concerns and their relevance to contemporary machine learning. Like Adam, we ask how new AI methods could be adapted for feminist purposes and what role new technologies might play in addressing concerns raised by feminist epistemologists and theorists about algorithmic systems. In particular, we highlight distributed and federated learning as providing partial solutions to the power-oriented concerns that have stymied efforts to make machine learning systems more representative and pluralist.

The Algorithmic Leviathan: Arbitrariness, Fairness, and Opportunity in Algorithmic Decision Making, with Deborah Hellman
ACM FAccT 2021, Canadian Journal of Philosophy, 2022

Abstract: Automated decision-making systems implemented in public life are typically standardized. One algorithmic decision-making system can replace thousands of human deciders. Each of the humans so replaced had her own decision-making criteria: some good, some bad, and some arbitrary. Is such arbitrariness of moral concern? We argue that an isolated arbitrary decision need not morally wrong the individual whom it misclassifies. However, if the same algorithms are applied across a public sphere, such as hiring or lending, a person could be excluded from a large number of opportunities. This harm persists even when the automated decision-making systems are "fair" on standard metrics of fairness. We argue that such arbitrariness at scale is morally problematic and propose technically informed solutions that can lessen the impact of algorithms at scale and so mitigate or avoid the moral harms we identify.

Privacy and Paternalism: The Ethics of Student Data Collection, with Tara Dixit
MIT Case Studies in Social and Ethical Responsibilities of Computing (2022)


On the Opportunities and Risks of Foundation Models, with Rishi Bommasani et. al., see full author list at link. at arXiv (2021)

Abstract: AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles (e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications (e.g., law, healthcare, education) and societal impact (e.g., inequity, misuse, economic and environmental impact, legal and ethical considerations). Though foundation models are based on standard deep learning and transfer learning, their scale results in new emergent capabilities,and their effectiveness across so many tasks incentivizes homogenization. Homogenization provides powerful leverage but demands caution, as the defects of the foundation model are inherited by all the adapted models downstream. Despite the impending widespread deployment of foundation models, we currently lack a clear understanding of how they work, when they fail, and what they are even capable of due to their emergent properties. To tackle these questions, we believe much of the critical research on foundation models will require deep interdisciplinary collaboration commensurate with their fundamentally sociotechnical nature.

Transparency in Complex Computational Systems
Philosophy of Science (October, 2020, Volume 87 Issue 4)
Winner of the Ernest Nagel Early-Career Scholar Essay Award

Abstract: Scientists depend on complex computational systems that are often ineliminably opaque, to the detriment of our ability to give scientific explanations and detect artifacts. Some philosophers have suggested treating opaque systems instrumentally, but computer scientists developing strategies for increasing transparency are correct in finding this unsatisfying. Instead, I propose an analysis of transparency as having three forms: transparency of the algorithm, the realization of the algorithm in code, and the way that code is run on particular hardware and data. This targets the transparency most useful for a task, avoiding instrumentalism by providing partial transparency when full transparency is impossible.