Michał Wiliński
Incoming PhD student at Carnegie Mellon University
Research in imitation learning and reinforcement learning, with applications in modern ML systems, in particular language models.
About
I am an incoming PhD student at the Robotics Institute at Carnegie Mellon University, starting in Fall 2026. I am interested in building machine learning systems that are more capable, steerable, and useful in practice. My current work centers on imitation learning and reinforcement learning, with applications to modern ML systems, especially language models.
I completed my B.Sc. in Artificial Intelligence at Poznan University of Technology. My undergrad thesis, DetoxAI, explored debiasing deep learning models in computer vision. Previously, I worked on pretrained time series models and interpretability with the Auton Lab. Earlier projects also covered SLAM for space and industrial environments, neurosymbolic NLP, and LLM assistants.
Selected Publications
For a complete list, see my Google Scholar profile.
TimeSeriesGym: A Scalable Benchmark for Time Series ML Engineering Agents
DetoxAI: A Python Toolkit for Debiasing Deep Learning Models in Computer Vision
Interactive Multiarmed Bandit
A multiarmed bandit is a simple sequential decision problem, often explained through the image of a row of slot machines. You repeatedly choose between a few actions, each with an unknown probability of reward, and try to maximize total reward over time.
That makes it a clean toy model of a broader question: when should a learner try something uncertain, and when should it exploit what currently looks best?
Here each arm has a hidden payout probability. You can play a few rounds yourself as the human policy, then hand control to a simple baseline policy and compare cumulative reward and regret in the same environment.
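The page does not define regret, so one standard definition is assumed here: pseudo-regret. Take K arms with hidden payout probabilities p_1, ..., p_K, let p* be the largest of them, and let a_t be the arm pulled in round t. After T rounds, regret is the expected reward of always pulling the best arm minus the expected reward of the arms actually pulled. The widget may instead report realized regret, which uses the observed rewards r_t in place of p_{a_t}.

```latex
R_T = T\,p^{*} - \sum_{t=1}^{T} p_{a_t}, \qquad p^{*} = \max_{1 \le k \le K} p_k
```

Each round in which a suboptimal arm is pulled adds p* - p_{a_t} > 0, so regret never decreases. A good policy keeps its growth slow by spending fewer and fewer pulls on arms it already believes are worse.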
Try five to ten human-policy pulls first. Then run one of the baselines and compare how its exploration strategy differs from yours.
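The page does not say which baselines the widget runs. As an assumption, the following minimal sketch shows epsilon-greedy, one of the simplest baselines. With probability epsilon it pulls a uniformly random arm (explore). Otherwise it pulls the arm with the highest average reward observed so far (exploit). The function run_epsilon_greedy and its parameters are hypothetical illustrations, not the widget's code. It tracks cumulative reward and the pseudo-regret, which sums p* minus the chosen arm's payout probability over all rounds.

```python
import random


def run_epsilon_greedy(probs, steps, epsilon=0.1, seed=0):
    """Hypothetical epsilon-greedy baseline on a Bernoulli bandit (not the widget's code).

    probs   -- hidden payout probability of each arm (only the simulator sees these)
    steps   -- number of pulls
    epsilon -- probability of exploring with a uniformly random arm
    Returns (cumulative reward, pseudo-regret).
    """
    rng = random.Random(seed)
    n_arms = len(probs)
    counts = [0] * n_arms    # number of pulls per arm
    means = [0.0] * n_arms   # running average reward per arm
    best_p = max(probs)
    total_reward = 0
    regret = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick any arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the best estimate, breaking ties randomly.
            best_mean = max(means)
            arm = rng.choice([a for a in range(n_arms) if means[a] == best_mean])

        # Bernoulli payout: reward 1 with probability probs[arm], else 0.
        reward = 1 if rng.random() < probs[arm] else 0

        # Incremental update of the chosen arm's average reward.
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]

        total_reward += reward
        # Expected reward lost versus always pulling the best arm.
        regret += best_p - probs[arm]

    return total_reward, regret


reward, regret = run_epsilon_greedy([0.2, 0.5, 0.7], steps=1000)
print(f"cumulative reward: {reward}, regret: {regret:.1f}")
```

Raising epsilon makes the policy explore more, which typically adds regret once the best arm has been identified. Setting epsilon=0 removes exploration entirely. That policy can lock onto whichever arm pays out first, even when that arm is not the best one.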
Manual play versus policy rollout