Raphaël Millière

Research

Research Topics

Capacities and Limitations of AI Systems

Debates about the capacities of current AI systems, such as language models, are starkly polarized: some dismiss them as mere statistical pattern-matchers while others herald them as genuinely intelligent. This polarization reveals a methodological gap in how we evaluate these systems. In my research, I tackle both first-order questions about whether AI systems can be ascribed specific capacities (like syntactic competence or analogical reasoning) and second-order questions about how we should assess these capacities in the first place. Standard benchmarks in the AI industry often lack construct validity and are easy to game. My work proposes adapting best practices from cognitive science to design rigorous behavioral experiments with proper controls, as well as interventional experiments that provide insight into the causal mechanisms responsible for behavior.

Research Questions

  • How can we design evaluations that reliably distinguish between superficial heuristics and genuine cognitive capacities in AI systems?
  • Is there a double dissociation between performance and competence in AI systems analogous to that observed in human cognition?
  • To what extent are current AI models, particularly large language models, capable of genuine reasoning rather than sophisticated pattern matching based on statistical correlations?
  • To what extent can neural network architectures implement forms of systematic cognition previously thought to require symbolic processing, and how can we empirically test these capabilities?

Selected Works

  • LLMs as Models for Analogical Reasoning

    Finds that while advanced language models can match human performance on novel analogical reasoning tasks requiring flexible re-representation of semantic information, they exhibit different patterns of behavior in response to task variations and semantic distractors, suggesting they may use different underlying mechanisms than humans.

  • Anthropocentric Bias in Language Model Evaluation

    Identifies two types of anthropocentric bias in evaluating large language models' cognitive capacities – overlooking auxiliary factors impeding performance despite competence (Type-I) and dismissing non-human-like competent strategies (Type-II) – and proposes mitigating these biases through an empirically-driven, iterative approach combining behavioral experiments with mechanistic studies.

  • Language Models as Models of Language

    Critically examines the potential contributions of modern language models to theoretical linguistics and debates about linguistic competence and acquisition, particularly by challenging learnability claims about syntax and providing evidence that hierarchical syntactic knowledge can emerge from exposure to linguistic data without built-in syntactic constraints.

  • Beyond the Imitation Game: Quantifying and Extrapolating the Capabilities of Language Models

    Introduces BIG-bench, a diverse and challenging benchmark of over 200 tasks for evaluating large language models, finding that model performance improves with scale but remains far below human-level performance. Note: I co-designed the 'conceptual combination' task, which tests language models' ability to grasp novel combinations of concepts, including made-up words.

Foundations of Interpretable AI

Artificial neural networks are often described as inscrutable black boxes. The emerging field of mechanistic interpretability aims to reverse-engineer these networks by uncovering the internal causal structures that generate their behavior. This approach seeks to identify both the features encoded in activation patterns and the algorithms implemented by specific circuits within the networks. Despite recent progress in mechanistic interpretability, the field still lacks robust conceptual foundations and methodological consensus. My AI2050 fellowship project, funded by Schmidt Sciences, aims to bridge this gap by drawing from the philosophy of science and causation. In particular, it addresses the risk of interpretability illusions – compelling but misleading explanations for the inner workings of neural networks.

Research Questions

  • What does it mean for neural networks to be 'interpretable,' and what are the criteria for adequate explanations of their behavior?
  • How can causal intervention techniques, as opposed to purely behavioral methods, provide deeper insights into the information processing mechanisms of deep neural networks?
  • Can interpretability methods yield illusory explanations of how neural networks process information?
  • What kind of functional primitives can bridge the explanatory gap between low-level neural mechanisms and high-level capabilities?

AI Safety and Alignment

As AI systems get more capable, we need to ensure they are safe, reliable, and aligned with human values. The main method to align the behavior of language models with desirable norms such as helpfulness, harmlessness, and honesty involves fine-tuning them based on human preferences. I argue that this approach is fundamentally shallow and vulnerable to adversarial manipulation that exploits conflicts between the norms of alignment – for example, where being helpful conflicts with avoiding harm. While humans can navigate such conflicts through explicit deliberation that weighs the contextual relevance of competing norms, language models currently lack a robust capacity for normative reasoning. By bridging technical research on alignment methods with insights from moral philosophy and psychology, I aim to understand why AI systems remain vulnerable to blatant adversarial attacks, and how we can develop less superficial alignment strategies.

Research Questions

  • How do conflicts between the norms of alignment create exploitable vulnerabilities in language models fine-tuned to respect these norms?
  • How can we systematically evaluate an AI system's robustness against different types of normative conflicts?
  • What would genuine normative deliberation look like in AI systems, and how could it be implemented in practice?
  • Can insights from moral philosophy and value pluralism help create AI systems capable of contextual ethical reasoning resilient to adversarial manipulation?

Selected Works

  • Normative Conflicts and Shallow AI Alignment

    Argues that current alignment strategies for language models are fundamentally inadequate because they reinforce shallow behavioral dispositions, leaving models vulnerable to the exploitation of conflicts between norms like helpfulness, honesty, and harmlessness.

  • The Alignment Problem in Context

    Reviews current strategies to align the behavior of language models with desirable norms, and investigates why they remain vulnerable to adversarial attacks that elicit potentially harmful outputs.

  • Adversarial Attacks on Image Generation With Made-Up Words

    Introduces two novel adversarial attacks on text-guided image generation models using made-up words, which can be used to bypass content filters and generate problematic images.

Consciousness and Self-Consciousness

In previous work, I investigated the nature and scope of conscious self-representation in ordinary experience as well as in specific conditions. I developed a pluralist account that distinguishes between several modes of self-representation across conscious thoughts, bodily experiences, and perceptual states – each of which can be disrupted either separately or jointly in anomalous cases, including psychopathologies and drug-induced states. I also argued against the long-standing claim that self-consciousness is constitutive of consciousness, which is either trivially true on a deflationary interpretation or unsupported on an inflationary interpretation. One upshot of my research is that it is both conceptually and nomologically possible to be conscious without being conscious of oneself in any way.

Research Questions

  • Is self-consciousness a necessary component of all conscious experience, or are 'selfless' states of consciousness genuinely possible?
  • What are the different varieties or dimensions of self-consciousness, and how can they be independently disrupted or modulated?
  • How can the study of altered states of consciousness (e.g., in psychopathologies and drug-induced states) inform our understanding of the nature of self-representation and ordinary experience?
  • What is the relationship between memory, self-representation, and the first-person perspective in reporting past conscious states?

Selected Works

  • Constitutive Self-Consciousness

    Argues that the claim that consciousness constitutively involves self-consciousness is either trivial on a deflationary interpretation or insufficiently supported on an inflationary interpretation.

  • Selfless Memories

    Argues that subjective reports of conscious experiences lacking self-consciousness can be credible under certain conditions and do not necessarily conflict with subjects' abilities to recall and report such experiences as their own.

  • The Varieties of Selflessness

    Distinguishes several forms of self-consciousness, showing through empirical evidence that each of them can be independently absent in certain conscious states, and further argues that there exist 'totally selfless' states of consciousness in which all of them are concurrently missing.

Research Outputs

2026

Anthropocentric Bias in Language Model Evaluation

Raphaël Millière & Charles Rathkopf·Computational Linguistics·Journal Paper

Language Models as Models of Language

Raphaël Millière·The Oxford Handbook of the Philosophy of Linguistics·Book Chapter

2025

Associationist Theories of Thought

Eric Mandelbaum & Raphaël Millière·The Stanford Encyclopedia of Philosophy·Encyclopedia Entry

Constitutive Self-Consciousness

Raphaël Millière·Australasian Journal of Philosophy·Journal Paper

Interventionist Methods for Interpreting Deep Neural Networks

Raphaël Millière & Cameron Buckner·Neurocognitive Foundations of Mind·Book Chapter

Normative Conflicts and Shallow AI Alignment

Raphaël Millière·Philosophical Studies·Journal Paper

Transformers

Raphaël Millière·Open Encyclopedia of Cognitive Science·Encyclopedia Entry

The Vector Grounding Problem

Dimitri Coelho Mollo & Raphaël Millière·arXiv·Preprint/Other

LLMs as Models for Analogical Reasoning

Sam Musker, Alex Duchnowski, Raphaël Millière & Ellie Pavlick·Journal of Memory and Language·Journal Paper

How Do Transformers Learn Variable Binding in Symbolic Programs?

Yiwei Wu, Atticus Geiger & Raphaël Millière·Forty-Second International Conference on Machine Learning·Conference Paper

2024

Drug-Induced Body Disownership

Raphaël Millière·Philosophical Perspectives on Psychedelic Psychiatry·Book Chapter

A Philosophical Introduction to Language Models – Part II: The Way Forward

Raphaël Millière & Cameron Buckner·arXiv·Preprint/Other

A Philosophical Introduction to Language Models – Part I: Continuity With Classic Debates

Raphaël Millière & Cameron Buckner·arXiv·Preprint/Other

Philosophy of Cognitive Science in the Age of Deep Learning

Raphaël Millière·WIREs Cognitive Science·Journal Paper

Decoding In-Context Learning: Neuroscience-inspired Analysis of Representations in Large Language Models

Safoora Yousefi, Leo Betthauser, Hosein Hasanbeig, Raphaël Millière & Ida Momennejad·arXiv·Preprint/Other

2023

The Alignment Problem in Context

Raphaël Millière·arXiv·Preprint/Other

Beyond the Imitation Game: Quantifying and Extrapolating the Capabilities of Language Models

Aarohi Srivastava, Abhinav Rastogi & Abhishek Rao et al.·Transactions on Machine Learning Research·Journal Paper

2022

Adversarial Attacks on Image Generation With Made-Up Words

Raphaël Millière·arXiv·Preprint/Other

Deep Learning and Synthetic Media

Raphaël Millière·Synthese·Journal Paper

Drug-Induced Alterations of Bodily Awareness

Raphaël Millière·The Routledge Handbook of Bodily Awareness·Book Chapter

Selfless Memories

Raphaël Millière & Albert Newen·Erkenntnis·Journal Paper

2020

The Multi-Dimensional Approach to Drug-Induced States: A Commentary on Bayne and Carter's "Dimensions of Consciousness and the Psychedelic State"

Martin Fortier-Davy & Raphaël Millière·Neuroscience of Consciousness·Journal Paper

The Varieties of Selflessness

Raphaël Millière·Philosophy and the Mind Sciences·Journal Paper

Radical Disruptions of Self-Consciousness

Raphaël Millière & Thomas Metzinger·Philosophy and the Mind Sciences·Journal Paper

Self in Mind: A Pluralist Account of Self-Consciousness

Raphaël Millière·Thesis·Thesis

2019

Are There Degrees of Self-Consciousness?

Raphaël Millière·Journal of Consciousness Studies·Journal Paper

Neural Correlates of the DMT Experience Assessed with Multivariate EEG

Christopher Timmermann, Leor Roseman & Michael Schartner et al.·Scientific Reports·Journal Paper

2018

Psychedelics, Meditation and Self-Consciousness

Raphaël Millière, Robin L. Carhart-Harris, Leor Roseman, Fynn-Mathis Trautwein & Aviva Berkovich-Ohana·Frontiers in Psychology·Journal Paper

2017

Looking For The Self: Phenomenology, Neurophysiology and Philosophical Significance of Drug-induced Ego Dissolution

Raphaël Millière·Frontiers in Human Neuroscience·Journal Paper

2016

Ingarden's Combinatorial Analysis of The Realism-Idealism Controversy

Raphaël Millière·Form(s) and Modes of Being: The Ontology of Roman Ingarden·Book Chapter