王凯文 Kaiwen Wang

Hello! I'm a Computer Science Ph.D. student at Cornell, and I work on reinforcement learning (RL) and causal inference. I'm co-advised by Professors Nathan Kallus and Wen Sun.

Prior to Cornell, I worked on RL-for-ranking at Meta, where I helped develop the ReAgent platform and train ranking models for Instagram and Facebook Watch. Before that, I received my bachelor's degree in Computer Science from Carnegie Mellon University.

If you're an undergrad or Master's student at Cornell and are interested in collaborating, please reach out! My email is kw437 at cornell dot edu.

Bio  /  Github  /  Google Scholar  /  Twitter

Preprints
More Benefits of Being Distributional: Second-Order Bounds for Reinforcement Learning
Kaiwen Wang, Owen Oertell, Alekh Agarwal, Nathan Kallus, Wen Sun
arXiv

Second-order (variance-dependent) bounds for online and offline RL, obtained via distributional RL.
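For intuition, a second-order bound has roughly the following schematic form (the notation here is illustrative, not the paper's exact statement): over $K$ episodes, with $\sigma_k^2$ the variance of the return in episode $k$,

$\mathrm{Regret}(K) \lesssim \tilde{O}\Big( \sqrt{ \textstyle\sum_{k=1}^{K} \sigma_k^2 } \Big) + \text{lower-order terms},$

which recovers the usual worst-case $\sqrt{K}$ rate but becomes much sharper in low-variance (near-deterministic) environments.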

Efficient and Sharp Off-Policy Evaluation in Robust Markov Decision Processes
Andrew Bennett, Nathan Kallus, Miruna Oprescu, Wen Sun, Kaiwen Wang (alphabetical order)
arXiv

Useful for sensitivity analysis in MDPs with per-step confounding.
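For background, the robust value targeted by such off-policy evaluation is a worst case over an uncertainty set $\mathcal{U}$ of transition models; a standard robust-MDP formulation (generic notation, not taken from the paper) is

$V^{\pi}_{\mathcal{U}} = \inf_{P \in \mathcal{U}} \, \mathbb{E}^{\pi}_{P}\Big[ \textstyle\sum_{t=0}^{\infty} \gamma^{t} r_t \Big].$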

Risk-Sensitive RL with Optimized Certainty Equivalents via Reduction to Standard RL
Kaiwen Wang, Dawen Liang, Nathan Kallus, Wen Sun
arXiv

Algorithms for risk-sensitive RL with Optimized Certainty Equivalents (OCEs), which generalize Conditional Value-at-Risk (CVaR), entropic risk, and Markowitz's mean-variance.
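For reference, the OCE of a return $X$ under a concave, nondecreasing utility $u$ with $u(0) = 0$ is the standard Ben-Tal and Teboulle form:

$\mathrm{OCE}_u(X) = \sup_{\lambda \in \mathbb{R}} \big\{ \lambda + \mathbb{E}[\, u(X - \lambda) \,] \big\},$

and taking $u(t) = -\tfrac{1}{\alpha}(-t)_+$ recovers $\mathrm{CVaR}_\alpha$.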

Switching the Loss Reduces the Cost in Batch Reinforcement Learning
Alex Ayoub, Kaiwen Wang, Vincent Liu, Samuel Robertson, James McInerney, Dawen Liang, Nathan Kallus, Csaba Szepesvári
arXiv

Consider trying log loss (binary cross-entropy) instead of squared loss in offline RL algorithms; we establish provable benefits of the switch!
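A minimal sketch of the switch in PyTorch (the network, batch shapes, and the rescaling of targets to [0, 1] are illustrative assumptions, not the paper's setup):

import torch
import torch.nn.functional as F

# Tiny Q-network regressing Bellman targets that are rescaled to [0, 1].
q_net = torch.nn.Sequential(
    torch.nn.Linear(8, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1)
)
opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)

states = torch.randn(32, 8)  # hypothetical batch of state features
targets = torch.rand(32, 1)  # Bellman targets, assumed rescaled to [0, 1]

pred = torch.sigmoid(q_net(states))           # keep predictions in [0, 1]
# loss = F.mse_loss(pred, targets)            # the usual squared loss
loss = F.binary_cross_entropy(pred, targets)  # the switched (log) loss
opt.zero_grad(); loss.backward(); opt.step()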

JoinGym: An Efficient Query Optimization Environment for Reinforcement Learning
Kaiwen Wang*, Junxiong Wang*, Yueying Li, Nathan Kallus, Immanuel Trummer, Wen Sun (*=equal contribution)
arXiv / code

A lightweight RL environment for join order selection in database query optimization.
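A hypothetical usage sketch in the standard Gymnasium style (the environment id and action semantics below are my assumptions; see the linked code for the actual interface):

import gymnasium as gym

env = gym.make("JoinGym-v0")  # hypothetical id; check the repo for the real one
obs, info = env.reset(seed=0)
done = False
while not done:
    action = env.action_space.sample()  # e.g., which join to perform next
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated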

Publications
The Benefits of Being Distributional: Small-Loss Bounds for Reinforcement Learning
Kaiwen Wang, Kevin Zhou, Runzhe Wu, Nathan Kallus, Wen Sun
NeurIPS, 2023
arXiv / code

We provide the first rigorous mathematical explanation of why and when maximum-likelihood-estimation-based distributional RL can outperform standard RL, in contextual bandits, online RL, and offline RL.
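Schematically (illustrative notation, not the paper's exact statement), a small-loss bound over $K$ episodes with costs normalized to [0, 1] reads

$\mathrm{Regret}(K) \lesssim \tilde{O}\big( \sqrt{ V^\star K } \big) + \text{lower-order terms},$

where $V^\star$ is the optimal expected cost, so the guarantee sharpens as the best achievable cost approaches zero.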

Near-Minimax-Optimal Risk-Sensitive Reinforcement Learning with CVaR
Kaiwen Wang, Nathan Kallus, Wen Sun
ICML, 2023
arXiv

We present the first (nearly) minimax-optimal algorithms for CVaR RL.
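For reference, with risk level $\alpha \in (0, 1]$ the objective is $\max_\pi \mathrm{CVaR}_\alpha(Z^\pi)$ for the return $Z^\pi$, where CVaR has the standard Rockafellar and Uryasev dual form

$\mathrm{CVaR}_\alpha(Z) = \sup_{b \in \mathbb{R}} \big\{ b - \tfrac{1}{\alpha}\, \mathbb{E}[(b - Z)_+] \big\},$

i.e., for continuous $Z$, the mean of the worst $\alpha$-fraction of outcomes.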

Provable Benefits of Representational Transfer in Reinforcement Learning
Alekh Agarwal, Yuda Song, Wen Sun, Kaiwen Wang, Mengdi Wang, Xuezhou Zhang (alphabetical order)
COLT, 2023
An earlier version appeared at the Offline RL Workshop @ NeurIPS, 2022   (Oral)
arXiv

We show that representation learning can provably enable transfer learning in online RL in low-rank MDPs.
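Here a low-rank MDP is one whose transitions factor through a $d$-dimensional representation (standard notation in this literature):

$P(s' \mid s, a) = \langle \phi^\star(s, a), \mu^\star(s') \rangle,$

and transfer is possible because the tasks share the same $\phi^\star$, which can be estimated on source tasks and reused on the target.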

Deep Multi-Modal Structural Equations For Causal Effect Estimation With Unstructured Proxies
Shachi Deshpande, Kaiwen Wang, Dhruv Sreenivas, Zheng Li, Volodymyr Kuleshov
NeurIPS, 2022
arXiv

Deep multi-modal deconfounder with applications to genomics (genome-wide association studies).

Learning Bellman Complete Representations for Offline Policy Evaluation
Jonathan Chang*, Kaiwen Wang*, Nathan Kallus, Wen Sun (*=equal contribution)
ICML, 2022   (Oral)
arXiv / code / video

Representation learning for Offline Policy Evaluation (OPE) guided by Bellman completeness and coverage. BCRL achieves state-of-the-art evaluation accuracy on image-based continuous-control tasks from the DeepMind Control Suite.
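For context, a function class $\mathcal{F}$ is Bellman complete for evaluating a policy $\pi$ if it is closed under the Bellman operator (standard definition):

$\mathcal{T}^{\pi} f \in \mathcal{F} \;\; \forall f \in \mathcal{F}, \qquad (\mathcal{T}^{\pi} f)(s, a) = r(s, a) + \gamma\, \mathbb{E}_{s' \sim P(\cdot \mid s, a),\, a' \sim \pi(\cdot \mid s')} \big[ f(s', a') \big];$

BCRL learns a representation whose induced linear class approximately satisfies this closure while also covering the evaluation policy's state distribution.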

Doubly Robust Distributionally Robust Off-Policy Evaluation and Learning
Nathan Kallus, Xiaojie Mao, Kaiwen Wang, Zhengyuan Zhou (alphabetical order)
ICML, 2022   (Spotlight)
arXiv / code / video

Offline policy evaluation (OPE) and policy learning (OPL) that are robust to both environment shifts (distributional robustness) and missing propensities (double robustness).
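As background, the classical (non-robust) doubly robust estimator that this line of work builds on, written in its contextual-bandit form with outcome model $\hat{q}$ and propensity model $\hat{e}$:

$\hat{V}_{\mathrm{DR}}(\pi) = \frac{1}{n} \sum_{i=1}^{n} \Big[ \sum_{a} \pi(a \mid x_i)\, \hat{q}(x_i, a) + \frac{\pi(a_i \mid x_i)}{\hat{e}(a_i \mid x_i)} \big( r_i - \hat{q}(x_i, a_i) \big) \Big],$

which is consistent if either $\hat{q}$ or $\hat{e}$ is well specified; the paper develops an analogous guarantee for worst-case values over a distributional-uncertainty set.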

Scalable and Provably Accurate Algorithms for Differentially Private Distributed Decision Tree Learning
Kaiwen Wang, Travis Dick, Maria-Florina Balcan
Workshop on Privacy-Preserving Machine Learning @ AAAI, 2020   (Oral)
arXiv / code

Distributed differentially private algorithms for learning decision trees in a top-down way.
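A minimal sketch of the core private primitive (an illustration under my own assumptions, not the paper's exact algorithm): perturb the label counts behind a split score with Laplace noise, since counts have sensitivity 1:

import numpy as np

rng = np.random.default_rng(0)

def noisy_counts(labels, epsilon):
    # Label counts have sensitivity 1, so Laplace(1/epsilon) noise suffices.
    counts = np.bincount(labels, minlength=2).astype(float)
    return counts + rng.laplace(scale=1.0 / epsilon, size=counts.shape)

def split_score(left_labels, right_labels, epsilon):
    # Weighted Gini impurity computed from noisy (clipped) counts.
    score = 0.0
    for labels in (left_labels, right_labels):
        c = np.clip(noisy_counts(labels, epsilon), 0.0, None)
        p = c / max(c.sum(), 1e-9)
        score += c.sum() * (1.0 - np.sum(p ** 2))
    return score

print(split_score(np.array([0, 0, 1]), np.array([1, 1, 0]), epsilon=1.0))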

Image-derived generative modeling of pseudo-macromolecular structures --- towards the statistical assessment of Electron CryoTomography template matching
Kaiwen Wang, Xiangrui Zeng, Xiaodan Liang, Zhiguang Huo, Eric P. Xing, Min Xu
BMVC, 2018
arXiv

Hypothesis testing for template matching using 3D-GAN-generated macromolecules.

Other Projects
Reinforcement Learning Assembly (ReLA)
code

From my internship at FAIR in summer 2019: a distributed framework for rapidly training RL agents, e.g., Ape-X and R2D2.

Erdős-Rényi Random Graph
website

Enjoy!
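If you want to tinker, here is a quick $G(n, p)$ sampler (plain Python, no dependencies; the function name is mine, not from the site):

import itertools
import random

def erdos_renyi(n, p, seed=None):
    # Keep each of the C(n, 2) possible edges independently with probability p.
    rng = random.Random(seed)
    return [(u, v) for u, v in itertools.combinations(range(n), 2) if rng.random() < p]

print(erdos_renyi(10, 0.3, seed=42))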


Template from here.