Can't reproduce an example from a textbook no matter what I try: adaptive dynamic programming / adaptive optimal control
Hi everyone!
I am trying to reproduce an example from a textbook step by step, but I cannot get the same result no matter what I try.
The example is about value iteration / adaptive dynamic programming for a discrete-time LQR problem. The system is a 4-state linear system obtained by discretizing a continuous-time model with zero-order hold and a sampling time of T_s = 0.01 s.
The authors say they use value iteration and estimate the parameters of P online using batch least squares on every 15 data points collected from the trajectory. Then they update the controller and continue iterating. According to the book, this converges to the correct Riccati solution. (The example is on page 41 of this book :
I tried to reproduce exactly the same procedure in MATLAB, using the same type of quadratic features and solving the least-squares problem every 15 samples, but when I do this I do not have enough independent features to solve the problem correctly. The regression matrix quickly becomes rank deficient or nearly singular because the trajectory converges and the states lose excitation.
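For reference, this is the regression as I understand it, written in standard discrete-time LQR notation. The symbols A, B, Q, R, K_j, and the feature vector \bar{x} are my own assumptions and are not quoted from the book; treat this as a sketch of the usual value-iteration form, not necessarily the exact equations in the text.

```latex
% Assumed system and policy at iteration j:
%   x_{k+1} = A x_k + B u_k,   u_k = -K_j x_k
% Value update, solved for P_{j+1} by least squares over a batch of samples:
x_k^\top P_{j+1}\, x_k = x_k^\top Q\, x_k + u_k^\top R\, u_k + x_{k+1}^\top P_j\, x_{k+1}
% In regression form, with \bar{x}_k the quadratic features of x_k and
% p_{j+1} the independent entries of the symmetric matrix P_{j+1}:
\bar{x}_k^\top p_{j+1} = x_k^\top Q\, x_k + u_k^\top R\, u_k + \bar{x}_{k+1}^\top p_j
% Policy improvement:
K_{j+1} = \left(R + B^\top P_{j+1} B\right)^{-1} B^\top P_{j+1} A
```

Stacking the rows \bar{x}_k^\top over one batch gives the regression matrix that becomes rank deficient for me.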
If I artificially collect much more data, use many random resets of the initial condition, or generate many trajectories, the estimation works much better and the learned P gets close to the true solution. But the book seems to say explicitly that the method works just by updating every 15 points along a single trajectory.
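To make the counting explicit, here is a minimal Python sketch (not my MATLAB code) of the quadratic features I mean. The helper names `quad_features`, `sym_to_params`, and `quad_form`, and the numbers in `x` and `P`, are made up for illustration. For a 4-state system a symmetric P has 4*5/2 = 10 independent entries, so 15 samples are enough in number, but only if the 15 feature rows are linearly independent.

```python
def quad_features(x):
    """Quadratic basis of x: x_i * x_j for i <= j, off-diagonal terms doubled."""
    n = len(x)
    feats = []
    for i in range(n):
        for j in range(i, n):
            scale = 1.0 if i == j else 2.0  # P_ij and P_ji share one parameter
            feats.append(scale * x[i] * x[j])
    return feats


def sym_to_params(P):
    """Upper-triangular entries of a symmetric matrix, same order as quad_features."""
    n = len(P)
    return [P[i][j] for i in range(n) for j in range(i, n)]


def quad_form(x, P):
    """Direct evaluation of x^T P x for comparison."""
    n = len(x)
    return sum(x[i] * P[i][j] * x[j] for i in range(n) for j in range(n))


# Illustrative values only (hypothetical state and symmetric matrix).
x = [1.0, -2.0, 0.5, 3.0]
P = [[2.0, 0.1, 0.0, 0.3],
     [0.1, 1.0, 0.2, 0.0],
     [0.0, 0.2, 3.0, 0.4],
     [0.3, 0.0, 0.4, 1.5]]

phi = quad_features(x)
p = sym_to_params(P)
print(len(phi))  # number of unknowns per least-squares solve
```

The dot product of `phi` and `p` equals `quad_form(x, P)`, which is why each sample contributes one linear equation in the 10 unknowns.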
And this is my code: https://pastebin.com/YkpUiNYj
These are the results from the book:
And these are mine:
I tried a longer simulation time, different initial conditions, and checking the rank of the regression matrix, but nothing seems to come close to their solution.
Has anyone worked with this before? Thanks a lot for your help!