Research

My work centers on modeling and exploiting uncertainty. I began in computational imaging and low-level vision, modeling mask/hardware uncertainty for more interpretable and robust reconstruction (ECCV Oral; TPAMI; NeurIPS), and image super-resolution (ICCV). I then carried this perspective into the multimodal setting, studying and exploiting uncertainty across modalities: modeling text as a stochastic “mass” rather than a point for better text–video alignment (T-MASS, CVPR Highlight); recasting retrieval as diffusion-inspired iterative alignment (NeurIPS) or as LLM chain-of-thought reasoning for explainable ranking (X-CoT, EMNLP); and improving visual instruction via self-questioning (SQ-LLaVA, ECCV). My guiding insight is that intelligent systems should explore, address, and exploit uncertainty (knowing, acting on, and benefiting from the unknown) to improve performance and trustworthiness and to reduce hallucination.

Building on this, my current work explores three seemingly separate directions (the 3 A’s). Agent: turning raw documents into controllable training environments for multimodal search/QA agents (DocArena, Adobe Research); Algorithm: post-training methods, including preference optimization and Bayesian-grounded approaches, that push environment signals into models efficiently and losslessly (Visual Self-Refinement, EMNLP; Visual Autoregressive Modeling, IROS); and Autonomy: post-training VLA models for autonomous driving (Latent-Centroid Steering, IROS; at NVIDIA). I am especially interested in how they might reinforce one another: the same post-training algorithms could push richer environment signals into both general agents and autonomous-driving models; autonomous driving, being embodied, real-time, and safety-critical, is in turn one of the most demanding environments for stress-testing and advancing agents; and the multi-turn, tool-using reasoning of agents could make driving systems more deliberate and trustworthy. I warmly welcome a chat on any of these directions, or the connections among them.