I'm a 4th-year PhD student in AI/NLP at Mila & McGill University in Montreal, advised by Siva Reddy and supported by the Vanier Scholarship. In 2024 I interned at Meta FAIR on the JEPA team under Mido Assran.
Pronouns: he/him

I am interested in the interplay of language and perception in AI (with a smaller focus on actions and world models), both from a technical and a philosophical/cognitive angle. Recently, I have been adopting methods from interpretability to study these phenomena.

Concrete questions I am interested in:
  • What does each modality add to a conceptual modality-agnostic understanding of the world?
    E.g. What can a model learn from videos that it can't from images, or from images that it can't from text, ...?
  • Settings where language and vision need to be tightly integrated, with reasoning flowing back and forth between modalities (e.g. visual search or compositionality)
  • Do text-only LLMs understand the physical world to some extent?
  • Do representations from different modalities converge?
    Do they become increasingly easy to interface as models train on more data? (keyword: *Platonic Representation Hypothesis*)
  • The relation between generation and understanding (e.g. discrimination, classification)
  • Approaching these questions from both a rigorous-evaluation and an interpretability/analysis perspective:
    • How do models internally combine and represent vision and language? (via analysis and interpretability)
    • How can we create tasks that are shortcut-robust, precise, and diagnostic in what they claim to measure? Often via minimally different/counterfactual pairs

What got me initially interested in pursuing a PhD at the end of my undergrad? Language grounding (see: the original Symbol Grounding Problem) fascinated me; the position paper Experience Grounds Language in particular had a strong influence on me.
Before coming to Mila, I graduated from LMU Munich in computational linguistics, where I did research on symbolic reasoning and machine translation.

Aside from research, I find science communication and organization both fun and important: I currently organize the Mila Tea Talks (institution-wide academic talks), organized the NewInML workshop at NeurIPS 2023, and ran the reading group on language grounding at Mila for three years. I gave a 3-minute summary of my research for a broader audience at Mila's speed science competition (YouTube). Together with my friend Tomas Vergara Browne, I talk to fellow scientists on our podcast Behind the Research of AI. In my undergrad, I also founded a philosophy society, which is still thriving to this day.
I strongly believe these kinds of activities, alongside publishing papers, are crucial to a healthy and vibrant research community.

In my free time, I do lots of sports. I play Ultimate Frisbee competitively and have coached juniors and a mixed-gender team. During the PhD I no longer play at the highest level, but I look back on great memories from U24 Worlds with Team Germany and several European Club Championships.

You can email me to chat about research or life in general: benno.krojer@gmail.com

Publications

Learning Action and Reasoning-Centric Image Editing from Videos and Simulations
Benno Krojer, Dheeraj Vattikonda, Luis Lara, Varun Jampani, Eva Portelance, Christopher Pal, Siva Reddy
NeurIPS'24 Spotlight (Datasets & Benchmarks) | Conference on Neural Information Processing Systems

Are Diffusion Models Vision-And-Language Reasoners?
Benno Krojer, Elinor Poole-Dayan, Vikram Voleti, Christopher Pal, Siva Reddy
NeurIPS'23 | Conference on Neural Information Processing Systems

Image Retrieval from Contextual Descriptions
Benno Krojer, Vaibhav Adlakha, Vibhav Vineet, Yash Goyal, Edoardo Ponti, Siva Reddy
ACL'22 | Association for Computational Linguistics

Improving Automatic VQA Evaluation Using Large Language Models
Oscar Mañas, Benno Krojer, Aishwarya Agrawal
AAAI'24 | AAAI Conference on Artificial Intelligence

Pragmatic Inference with a CLIP Listener for Contrastive Captioning
Jiefu Ou, Benno Krojer, Daniel Fried
ACL Findings'23 | Findings of the Association for Computational Linguistics

Are Pretrained Language Models Symbolic Reasoners Over Knowledge?
Nora Kassner*, Benno Krojer*, Hinrich Schütze
CoNLL'20 | Conference on Computational Natural Language Learning
(* = Equal Contribution)

ContraCAT: Contrastive Coreference Analytical Templates for Machine Translation
Dario Stojanovski*, Benno Krojer*, Denis Peskov*, Alexander Fraser
COLING'20 | Conference on Computational Linguistics
(* = Equal Contribution)

Blogposts

Things I like
Jan 2nd, 2024

Elevator Pitch & ELI5 of my first PhD project: Image Retrieval from Contextual Descriptions
May 30th, 2022
Explaining my research to someone outside the field.

What made me do a PhD far away?
April 28th, 2021
Sometimes people ask me why I left Europe when there are good opportunities there, too.

[DRAFT]: Defining the most basic concepts in Language Grounding
Jan 22nd, 2021
We rarely think about what the most basic words mean.

Explain like I am 5: Language Grounding
Nov 21st, 2020
How I would explain the research field of language grounding to a novice.

Attending my first conference! ACL 2020 from an undergrad's perspective
Jul 2nd, 2020
Networking, learning about language grounding and mentally prepping for the PhD.

Why the current Corona situation makes memories blurry
May 8th, 2020
Applying knowledge from reading about neuroscience.

Moral Pluralism through the lens of optimization
Apr 29th, 2020
Some reflections after a philosophy society meeting.

Thank you to Sebastian Santy for the awesome website template!