I've been reading Section 1.3, Proposition 1.3.1 in "Dynamic Programming and Optimal Control, Vol. 2" by Dimitri Bertsekas, where the author leaves the proof of a proposition as an exercise to the reader. The proposition describes properties of the optimal reward function , paraphrased here for reference:
Let . For fixed , the optimal reward function has the following properties as a function of M:
- is convex and monotonically nondecreasing.
- is constant for
- = for
The author begins the proof by considering the reward function at timestep : and using the classical DP iteration: , where at each step we choose between retiring and continuing to work on projects. We know that as . He proved the first property; below I will prove the last two.
I will prove this by induction on . Properties 2 and 3 are evident for :
for and for .
where . Essentially, we want to bound : is an infinite geometric series, so is an upper bound for .
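For reference, the geometric-series bound invoked here is the standard one (writing $\alpha$ for the discount factor and $R$ for a per-step reward bound, since the original symbols are not rendered):

```latex
\sum_{k=0}^{\infty} \alpha^k R \;=\; \frac{R}{1-\alpha},
\qquad 0 < \alpha < 1,
```

so any reward stream whose per-step reward is at most $R$ has discounted value at most $R/(1-\alpha)$.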
Assume property 2 holds for some ; in particular, for .
Then , which is constant with respect to . Moreover, using the earlier bound, we know:
Therefore, ,
where the second equality follows from and .
This proves property 2 for and completes the induction.
A similar argument applies to property 3, except in the last step, where: , so .
Q.E.D.