Cost function properties in Multi-Armed Bandits
Let \(B = max_{l}max_{x^{l}} \mid R^{l}(x^{l}) \mid \). For fixed \(x\), the optimal reward function \(J(x, M)\) has the following properties as a function of M:
- \(J(x, M)\) is convex and monotinically nondecreasing.
- \(J(x, M)\) is constant for \(M \leq -B/(1-\alpha)\)
- \(J(x, M)\) = \(M\) for \(M \geq B/(1-\alpha)\)
I will prove this by induction on \(k\). Properties 2 and 3 are evident for \(k = 0\): $$J_{0} = max(0, M) = 0$$ for \(M \lt -B/(1-\alpha)\) and $$J_{0} = M$$ for \(M \gt B/(1-\alpha)\).
Where $$B = max_{l}max_{x_{l}}\mid R(x_{l}) \mid$$. Essentially we want to bound \(L^{l}\): \(\overline L\) is an infinite geometric series: \(B + \alpha B + \alpha^{2}B + ... + \alpha^{n} B = B / (1 - \alpha) \) is an upperbound for \(L^{l}\).
Let's assume property 2 hold for some \(k\). Particularly, $$J_{k}(x, M) = const$$ for \(M \leq -B/(1-\alpha)\). Then, $$\begin{align} L^{l}(x_{k}, M, J_{k}) &= R(x_{k}) + \alpha E\{J_{k}^{l}(x_{1},..,f^{l}(x^{l}, w^{l}), x^{l+1},...,x_{n}, M)\} \\ &= R(x_{k}) + const \\ &= const \\ \end{align}$$ , which is a constant in terms of \(M\). Moreover, using an earlier bound, we know: $$\begin{align} \mid L^{l} \mid \leq B / (1 - \alpha) \\ -B/(1-\alpha) \leq L^{l} \leq B/(1-\alpha) \end{align}$$ Therefore, $$\begin{align} J_{k+1} &= max(M, max_{l}L^{l}(x, J_{k}, M)) \\ &= max_{l}L^{l}(\cdot) \\ &= const \end{align}$$, where the second equality follows from \(-B/(1-\alpha) \leq L^{l}\) and \(M \leq -B/(1-\alpha)\). Therefore, property 2 was proved for \(k+1\) and it completes induction. Similar argument applies to property 3, except the last step where: $$\begin{align} L^{l} &\leq B / (1-\alpha) & B/(1-\alpha) &\leq M \end{align}$$ , so \(J_{k+1} = M\).
Q.E.D.
Comments
Post a Comment