Michał Wiliński

Incoming PhD student at Carnegie Mellon University

Research in imitation learning and reinforcement learning, with applications to modern ML systems, particularly language models.

About

I am an incoming PhD student at the Robotics Institute at Carnegie Mellon University, starting in Fall 2026. I am interested in building machine learning systems that are more capable, steerable, and useful in practice. My current work centers on imitation learning and reinforcement learning, with applications to modern ML systems, especially language models.

I completed my B.Sc. in Artificial Intelligence at Poznan University of Technology. My undergrad thesis, DetoxAI, explored debiasing deep learning models in computer vision. Previously, I worked on pretrained time series models and interpretability with the Auton Lab. Earlier projects also covered SLAM for space and industrial environments, neurosymbolic NLP, and LLM assistants.

Research Interests

I am broadly interested in machine learning systems that learn efficiently from human feedback, demonstrations, and interaction.

Lately I have been especially interested in how demonstrations, feedback, and interaction can be used to train more capable and reliable language-model-based systems.

Education

Carnegie Mellon University
PhD, Robotics Institute, School of Computer Science
Deferred admit, starting Fall 2026

Poznan University of Technology
B.Sc. in Artificial Intelligence, 2021-2025

Selected Publications

For a complete list, see my Google Scholar profile.

Exploring Representations and Interventions in Time Series Foundation Models

M. Wiliński, M. Goswami, W. Potosnak, N. Żukowska, and A. Dubrawski

ICML 2025, Vancouver

TimeSeriesGym: A Scalable Benchmark for Time Series ML Engineering Agents

Y. Cai, X. Li, M. Goswami, M. Wiliński, G. Welter, and A. Dubrawski

arXiv:2505.13291

DetoxAI: A Python Toolkit for Debiasing Deep Learning Models in Computer Vision

I. Stępka, Ł. Sztukiewicz, M. Wiliński, and J. Stefanowski

ECML-PKDD 2025, Porto

Interactive Multiarmed Bandit

A multiarmed bandit is a simple sequential decision problem, often explained through the image of a row of slot machines. You repeatedly choose between a few actions, each with an unknown probability of reward, and try to maximize total reward over time.
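As a rough illustration of the setup described above, here is a minimal sketch of a bandit whose arms pay out 1 or 0. The class name `BernoulliBandit`, the 0/1 reward model, and the payout values are assumptions made for this example, not a description of the demo's actual code.

```python
import random


class BernoulliBandit:
    """A row of arms; each pays 1 with a hidden probability, else 0."""

    def __init__(self, probs, seed=None):
        # Hidden payout probabilities (illustrative values, not from the demo).
        self.probs = list(probs)
        self.rng = random.Random(seed)

    @property
    def n_arms(self):
        return len(self.probs)

    def pull(self, arm):
        """Return 1 with probability probs[arm], otherwise 0."""
        return 1 if self.rng.random() < self.probs[arm] else 0


bandit = BernoulliBandit([0.2, 0.5, 0.7], seed=0)
rewards = [bandit.pull(2) for _ in range(1000)]
mean_reward = sum(rewards) / len(rewards)
print(mean_reward)  # should land near 0.7 over a long run
```

The learner only ever sees the rewards returned by `pull`, never `probs`, which is what makes the choice of arm a learning problem.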

That makes it a clean toy model of a broader question: when should a learner try something uncertain, and when should it exploit what currently looks best?
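One common way to trade off the two is epsilon-greedy: explore a random arm with a small probability, otherwise pick the arm with the best observed average. The sketch below is an assumed example of such a rule; the function name and arguments are hypothetical, and it is not necessarily one of the baselines used in the demo.

```python
import random


def epsilon_greedy(counts, totals, epsilon, rng):
    """Pick an arm index given per-arm pull counts and reward totals."""
    n_arms = len(counts)
    # Pull every arm once first so each average is defined.
    for arm in range(n_arms):
        if counts[arm] == 0:
            return arm
    # Explore: with probability epsilon, choose an arm uniformly at random.
    if rng.random() < epsilon:
        return rng.randrange(n_arms)
    # Exploit: choose the arm with the highest observed mean reward.
    means = [totals[a] / counts[a] for a in range(n_arms)]
    return max(range(n_arms), key=lambda a: means[a])


rng = random.Random(0)
arm = epsilon_greedy(counts=[3, 3, 3], totals=[1, 2, 3], epsilon=0.0, rng=rng)
print(arm)  # 2, since epsilon=0 always exploits the highest estimate
```

Raising `epsilon` spends more pulls on uncertain arms; lowering it commits sooner to whatever currently looks best.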

Here each arm has a hidden payout probability. You can play a few rounds as a human policy, then hand control to a simple baseline and compare cumulative reward and regret in the same environment.

Try five to ten human-policy pulls first. Then run one of the baselines and compare how your exploration strategy differs from its.
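For readers who want to see how the comparison could be scored, here is a small sketch. It assumes regret is measured as expected regret, the sum over pulls of the gap between the best arm's payout probability and the chosen arm's; the demo may define it differently. The helper name `summarize` and the sample values are hypothetical.

```python
def summarize(probs, arms_pulled, rewards):
    """Return (cumulative reward, oracle return, expected regret)."""
    best = max(probs)
    # Oracle: expected return of always playing the best hidden arm.
    oracle = best * len(arms_pulled)
    cumulative_reward = sum(rewards)
    # Expected regret: per-pull gap between the best arm and the chosen arm.
    regret = sum(best - probs[a] for a in arms_pulled)
    return cumulative_reward, oracle, regret


probs = [0.2, 0.5, 0.7]      # illustrative hidden payout probabilities
arms = [0, 1, 2, 2]          # arms chosen over four rounds
rewards = [0, 1, 1, 0]       # rewards observed on those rounds
reward, oracle, regret = summarize(probs, arms, rewards)
print(reward, round(oracle, 2), round(regret, 2))
```

Under this definition, regret depends only on which arms were chosen, not on the luck of individual payouts, so two policies can be compared fairly in the same environment.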

Manual play versus policy rollout

[Interactive demo: counters track pulls, reward, and regret for you and for the policy, and a plot compares cumulative reward for you, the policy, and the oracle. The oracle is the expected return of the best hidden arm.]