I think the model's results would be much better and more meaningful if the environment state were Wealth alone.
In this version, using the pair (time, wealth) as the state also works, but the model always converges to a single action, the one with the highest expected return, as I mentioned at the end of the Readme report. This is because once time is one of the state elements, and time cannot go backwards, each state in the state space can be visited at most once within an episode. Even if your wealth stays the same after you take an action, the next state differs from the previous one because the time component has changed. Under this environment setting the states are effectively independent of one another, so the algorithm converges to the same action no matter which state it arrives in.
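The revisiting argument above can be checked with a small sketch. Everything here is an assumption for illustration, not the actual environment from the repo: `simulate_episode`, the wealth range 0 to 10, and the random -1/0/+1 wealth moves are all hypothetical. It only counts how often a state recurs within one episode under the two state definitions.

```python
import random
from collections import Counter


def simulate_episode(horizon, seed):
    """Hypothetical toy environment: wealth moves by -1, 0, or +1 each step
    and is clipped to the range [0, 10]. Returns a list of (time, wealth)."""
    rng = random.Random(seed)
    wealth = 5
    trajectory = []
    for t in range(horizon):
        trajectory.append((t, wealth))
        wealth = min(10, max(0, wealth + rng.choice([-1, 0, 1])))
    return trajectory


def max_visits(states):
    """Largest number of times any single state appears in the sequence."""
    return max(Counter(states).values())


trajectory = simulate_episode(horizon=50, seed=0)

# State = (time, wealth): time strictly increases, so no state can repeat.
time_wealth_visits = max_visits(trajectory)

# State = wealth only: 50 steps over at most 11 wealth levels must repeat.
wealth_only_visits = max_visits(w for _, w in trajectory)

print("max visits with (time, wealth) states:", time_wealth_visits)
print("max visits with wealth-only states:", wealth_only_visits)
```

With (time, wealth) states the maximum visit count is always 1, while with wealth-only states the same wealth level is necessarily revisited, which is what would let the algorithm compare outcomes of different actions from the same state.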
Using (time, wealth) pairs as the states works fine for the algorithm; it just makes the problem less interesting. I think it would be better to check whether the algorithm takes different actions once Wealth is the only state element. I'm going to try this in the future.