DDP Swimmers
The swimmer is a 14-dimensional system, and all of its exhibited behaviors, including the optimal swimming gait, were learned.
A full interaction sequence with a gait learned through receding-horizon differential dynamic programming (DDP):
PG Swimmers
A comparison of the initial gait and the optimized gait learned through policy gradient (PG):
A demonstration of a PG swimmer tracing a sigmoid path: