wzziqian / optimal-bidding-policy-using-policy-gradient-in-a-multi-agent-contextual-bandit-setting
This project is forked from cskrishna/optimal-bidding-policy-using-policy-gradient-in-a-multi-agent-contextual-bandit-setting
We use policy gradient to help agents learn optimal bidding policies in a competitive multi-agent contextual bandit setting.
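
To make the idea concrete, here is a minimal sketch of a policy-gradient (REINFORCE-style) update for competing bidders in a contextual bandit. It is not taken from this repository. The first-price auction rule, the discrete bid grid `BIDS`, the private values in `CONTEXTS`, the tabular softmax policy, and the names `Agent`, `play_round`, and `train` are all assumptions chosen for illustration.

```python
import math
import random

# Hypothetical setup (not from the repository): discrete bid levels and
# private item values that serve as each agent's context.
BIDS = [0.0, 0.25, 0.5, 0.75, 1.0]
CONTEXTS = [0.5, 1.0]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class Agent:
    """Tabular softmax policy: one logit vector per context."""

    def __init__(self, lr=0.1):
        self.lr = lr
        self.logits = {c: [0.0] * len(BIDS) for c in CONTEXTS}

    def policy(self, context):
        return softmax(self.logits[context])

    def act(self, context, rng):
        # Sample a bid index from the current policy.
        probs = self.policy(context)
        return rng.choices(range(len(BIDS)), weights=probs)[0]

    def update(self, context, action, reward):
        # REINFORCE for a one-step (bandit) episode:
        # d log pi(a|c) / d logit_k = 1[k == a] - pi(k|c)
        probs = self.policy(context)
        for k in range(len(BIDS)):
            grad = (1.0 if k == action else 0.0) - probs[k]
            self.logits[context][k] += self.lr * reward * grad


def play_round(agents, rng):
    """One assumed first-price auction: highest bid wins, ties broken at random."""
    contexts = [rng.choice(CONTEXTS) for _ in agents]
    actions = [ag.act(c, rng) for ag, c in zip(agents, contexts)]
    bids = [BIDS[a] for a in actions]
    top = max(bids)
    winner = rng.choice([i for i, b in enumerate(bids) if b == top])
    rewards = [0.0] * len(agents)
    rewards[winner] = contexts[winner] - bids[winner]  # value minus payment
    for ag, c, a, r in zip(agents, contexts, actions, rewards):
        ag.update(c, a, r)
    return rewards


def train(rounds=5000, seed=0):
    rng = random.Random(seed)
    agents = [Agent(), Agent()]
    for _ in range(rounds):
        play_round(agents, rng)
    return agents


if __name__ == "__main__":
    for i, agent in enumerate(train()):
        for c in CONTEXTS:
            print(i, c, [round(p, 2) for p in agent.policy(c)])
```

Each agent only sees its own context and reward, so the other agent's changing policy is part of its environment. That is what makes the setting competitive rather than a single-agent bandit. A practical version would likely add a reward baseline to reduce gradient variance and replace the lookup table with a parameterized policy.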