Judy Hoffman, Eric Tzeng, Taesung Park, Jun-Yan Zhu, Phillip Isola, Kate Saenko, Alexei A. Efros, and Trevor Darrell. CyCADA: Cycle-Consistent Adversarial Domain Adaptation. In International Conference on Machine Learning (ICML), 2018.

Abstract—Domain adaptation is critical for success in new, unseen environments. Adversarial adaptation models applied in feature spaces discover domain-invariant representations, but are difficult to visualize and sometimes fail to capture pixel-level and low-level domain shifts. Recent work has shown that generative adversarial networks combined with cycle-consistency constraints are surprisingly effective at mapping images between domains, even without the use of aligned image pairs. We propose a novel discriminatively-trained Cycle-Consistent Adversarial Domain Adaptation model. CyCADA adapts representations at both the pixel level and the feature level, enforces cycle-consistency while leveraging a task loss, and does not require aligned pairs. Our model can be applied in a variety of visual recognition and prediction settings. We show new state-of-the-art results across multiple adaptation tasks, including digit classification and semantic segmentation of road scenes, demonstrating transfer from synthetic to real-world domains.
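The "task loss" mentioned above can be illustrated with a semantic-consistency check: a classifier trained on the source task should predict the same labels for an image and its pixel-level translation. The sketch below is a toy numpy illustration of that idea only — the generator `G`, the fixed linear classifier `f`, and all shapes are hypothetical stand-ins, not the paper's networks.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical frozen source-task classifier: flatten an 8x8 image and
# apply a fixed random linear layer (stand-in for a pretrained network).
rng = np.random.default_rng(0)
W = rng.normal(size=(64, 10))

def f(images):
    return softmax(images.reshape(len(images), -1) @ W)

def G(x):
    # Hypothetical pixel-level "source -> target" generator.
    return np.clip(x + 0.05, 0.0, 1.0)

def semantic_consistency_loss(x):
    """Cross-entropy between f's labels on x and f's predictions on G(x):
    translating an image should not change its predicted semantics."""
    pseudo_labels = f(x).argmax(axis=-1)           # labels before translation
    probs = f(G(x))                                # predictions after translation
    return -np.mean(np.log(probs[np.arange(len(x)), pseudo_labels]))

x = rng.random((4, 8, 8))          # a small batch of "source" images
loss = semantic_consistency_loss(x)
print(loss)                        # small when G preserves semantics
```

In the paper this term is one piece of a larger objective that also includes pixel-level and feature-level adversarial losses and the cycle-consistency constraint.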


Jun-Yan Zhu*, Taesung Park*, Phillip Isola, and Alexei A. Efros. Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. In IEEE International Conference on Computer Vision (ICCV), 2017.
(* indicates equal contribution). (webpage, pdf)

Abstract—Image-to-image translation is a class of vision and graphics problems where the goal is to learn the mapping between an input image and an output image using a training set of aligned image pairs. However, for many tasks, paired training data will not be available. We present an approach for learning to translate an image from a source domain X to a target domain Y in the absence of paired examples. Our goal is to learn a mapping G : X → Y such that the distribution of images from G(X) is indistinguishable from the distribution Y using an adversarial loss. Because this mapping is highly under-constrained, we couple it with an inverse mapping F : Y → X and introduce a cycle consistency loss to enforce F(G(X)) ≈ X (and vice versa). Qualitative results are presented on several tasks where paired training data does not exist, including collection style transfer, object transfiguration, season transfer, photo enhancement, etc. Quantitative comparisons against several prior methods demonstrate the superiority of our approach.
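The cycle-consistency constraint F(G(x)) ≈ x can be made concrete with a minimal numpy sketch. Here G and F are toy invertible pixel transforms standing in for the paper's generator networks, so the reconstruction loss is near zero up to floating-point rounding; everything in this snippet is illustrative, not the paper's implementation.

```python
import numpy as np

def G(x):
    # Toy "source -> target" mapping (in the paper, a generator network).
    return 2.0 * x + 1.0

def F(y):
    # Toy "target -> source" mapping, the exact inverse of G.
    return (y - 1.0) / 2.0

def cycle_consistency_loss(x, y):
    """L_cyc = E||F(G(x)) - x||_1 + E||G(F(y)) - y||_1."""
    forward = np.mean(np.abs(F(G(x)) - x))    # x -> y -> x reconstruction
    backward = np.mean(np.abs(G(F(y)) - y))   # y -> x -> y reconstruction
    return forward + backward

rng = np.random.default_rng(0)
x = rng.random((4, 8, 8))   # a small batch of "source" images
y = rng.random((4, 8, 8))   # a small batch of "target" images

print(cycle_consistency_loss(x, y))   # ~0: F inverts G up to rounding
```

In training, this term is added to the two adversarial losses; because F exactly undoes G here, the loss vanishes, whereas real generators are only pushed toward this behavior.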

Taesung Park, Sergey Levine. Inverse Optimal Control for Humanoid Locomotion. Robotics Science and Systems (RSS) Workshop on Inverse Optimal Control & Robotic Learning from Demonstration. 2013. (pdf)

Abstract—In this paper, we present a method for learning the reward function for humanoid locomotion from motion-captured demonstrations of human running. We show how an approximate, local inverse optimal control algorithm can be used to learn the reward function for this high dimensional domain, and demonstrate how trajectory optimization can then be used to recreate dynamic, naturalistic running behaviors in new environments. Results are presented in simulation on a 29-DoF humanoid model, and include running on flat ground, rough terrain, and under strong lateral perturbation.

Taesung Park. Automatic 3D Character Animation Using Inverse Reinforcement Learning. Master’s research report, Stanford University Department of Computer Science. 2013. (pdf)

Abstract—This report presents a framework for learning 3D character animation in the Markov Decision Process (MDP) setting, with a reward function learned via Inverse Reinforcement Learning (IRL). Solving the 3D character control problem as an optimization over an MDP using reinforcement learning is attractive because it automatically generates the details of the motion and is portable across different environments. However, this approach has been infeasible due to two drawbacks: the curse of dimensionality and the subtlety of the reward function. This report overcomes the dimensionality problem by using a local iterative LQG method that makes local approximations, and uses IRL to learn the precise reward function needed to generate the desired motion. The framework was evaluated on two models, a 2-DoF snake model and a 6-DoF bipedal walker model. Both models successfully verified that optimal control in combination with IRL can be used to control characters to achieve the desired high-level goal. It was also shown that the reward function is portable to different domains by creating a walking motion under reduced gravity.
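The local LQG machinery underlying both locomotion projects can be shown in its simplest form: the finite-horizon discrete LQR backward pass, where the iterative LQG method repeatedly applies this recursion to local linear-quadratic approximations. The sketch below is a toy 1D double-integrator in numpy, not the report's character models; the system matrices and weights are illustrative choices.

```python
import numpy as np

def lqr_backward_pass(A, B, Q, R, T):
    """Finite-horizon discrete LQR: returns feedback gains K_t such that
    u_t = -K_t x_t minimizes sum_t (x_t' Q x_t + u_t' R u_t)."""
    P = Q.copy()                        # terminal cost-to-go
    gains = []
    for _ in range(T):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)   # Riccati recursion, backward in time
        gains.append(K)
    return gains[::-1]                  # reorder so gains[t] is K_t

# Toy double integrator: state = [position, velocity], input = force.
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
Q = np.eye(2)
R = np.array([[0.1]])

gains = lqr_backward_pass(A, B, Q, R, T=100)
x = np.array([[1.0], [0.0]])            # start 1 m from the goal, at rest
for K in gains:
    x = A @ x + B @ (-K @ x)            # roll out the optimal policy
print(float(np.linalg.norm(x)))         # state driven near the origin
```

Iterative LQG extends this by linearizing nonlinear dynamics and quadratizing the cost around a nominal trajectory, then re-solving until the trajectory converges; IRL supplies the cost being quadratized.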