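CartPole-v0 is currently the only environment, and it is overly simplistic: each observation has just 4 continuous state variables (cart position, cart velocity, pole angle, pole angular velocity) and there are only 2 discrete actions (push left, push right). Problem complexity should increase gradually throughout the project.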
A simple solution to the CartPole problem that reads the state directly from the environment's observations would be an educational middle step between the current Monte Carlo solution and a full RL approach. It would introduce the core RL concepts (state, action, reward, episode) more clearly than the present method does.
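
As a rough illustration of what such a middle step could look like, here is a minimal sketch of a hand-written policy that uses only the observation returned by the environment. The names `heuristic_policy` and `run_episode` are hypothetical and not part of this project, and the decision rule (push toward the side the pole is tipping, using the sum of pole angle and angular velocity) is an assumption chosen for simplicity, not a tuned controller. The rollout helper is written to tolerate both the older Gym return signatures and the newer Gymnasium-style ones.

```python
def heuristic_policy(observation):
    """Hypothetical hand-written rule: push toward the side the pole is tipping."""
    # CartPole observation layout: cart position, cart velocity,
    # pole angle, pole angular velocity.
    _cart_pos, _cart_vel, pole_angle, pole_ang_vel = observation
    # Action 0 pushes the cart left, action 1 pushes it right.
    # Equal weighting of angle and angular velocity is an untuned assumption.
    return 1 if pole_angle + pole_ang_vel > 0 else 0


def run_episode(env, policy, max_steps=200):
    """Roll out one episode and return the total reward collected."""
    result = env.reset()
    # Newer APIs return (observation, info); older Gym returns the observation alone.
    observation = result[0] if isinstance(result, tuple) else result
    total_reward = 0.0
    for _ in range(max_steps):
        step_result = env.step(policy(observation))
        observation, reward = step_result[0], step_result[1]
        # Old API: (obs, reward, done, info)
        # New API: (obs, reward, terminated, truncated, info)
        if len(step_result) == 4:
            done = step_result[2]
        else:
            done = step_result[2] or step_result[3]
        total_reward += reward
        if done:
            break
    return total_reward


# Example usage, assuming the `gym` package is installed:
#   import gym
#   env = gym.make("CartPole-v0")
#   print(run_episode(env, heuristic_policy))
```

Because the policy is a single readable function of the state, it makes the observation, action, and reward loop explicit before any learning is introduced; a learned policy could later replace `heuristic_policy` without changing `run_episode`.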