Comments (14)
@forrestmckee That's a good suggestion; it's not currently planned, but it feels like something we ought to have. Work on stable-baselines3 support is currently postponed until they complete their upgrade to the latest version of gym. Once that is done, the plan is to upgrade CyberBattleSim to the latest versions of gym and stable-baselines3, which will then allow for further improvements like the one you mentioned.
from cyberbattlesim.
@Screamer-Y the difference between the built-in algorithms and Stable-Baselines3 is that the built-ins have a check to ensure a valid action. SB3 doesn't, so a large portion of the time you're performing an action that is impossible given the current state of the environment.
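To illustrate what that validity check buys you, here is a hedged, generic sketch of rejection-sampling a flat action index until a validity predicate accepts it. `is_valid` is a hypothetical predicate, not CyberBattleSim's API; the built-in agents get this behavior via the environment's `sample_valid_action` instead.

```python
import random

def sample_valid(n_actions, is_valid, max_tries=10_000):
    """Rejection-sample a flat action index until the predicate accepts it."""
    for _ in range(max_tries):
        a = random.randrange(n_actions)
        if is_valid(a):
            return a
    raise RuntimeError('no valid action found within max_tries')

# toy usage: pretend only even indices are valid in the current state
action = sample_valid(10, lambda a: a % 2 == 0)
assert action % 2 == 0
```

An off-the-shelf SB3 agent has no such hook, so it spends many of its steps on actions the environment will simply reject.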
Hi everyone,
I noticed that PR #86 already has support for stable-baselines3 and gives the corresponding use cases in the notebooks folder, which is of great help. Sorry for missing that before...
Hi @Screamer-Y, actually that is an interesting topic for me to investigate. If you have any problems or find any literature using this simulator, feel free to share, and I can do the same if you are interested.
Edited: I found the source of the error; it was purely because of my own additions to the simulator.
Hi @kvas7andy, glad to know you're interested in this topic too.
Now I'm working on https://github.com/microsoft/CyberBattleSim/blob/4fd228bccfc2b088d911e27072a923251203cac8/cyberbattle/_env/flatten_wrapper.py.
My goal is to change the `action_space` from a `spaces.MultiDiscrete` to a `spaces.Discrete`, because in stable-baselines3 the value-based RL algorithms only accept a `Discrete` action space. Currently I have simply mapped all the possible actions to a single `Discrete` index, but when I try to train a DQN agent it cannot learn from the environment properly. I'm still trying to figure out what went wrong.
Here is my modification:
```python
import numpy as np
from gym import ActionWrapper, spaces

from cyberbattle._env.cyberbattle_env import Action, CyberBattleEnv


class FlattenActionWrapper(ActionWrapper):
    """
    Flatten all nested dictionaries and tuples from the
    action space of a CyberBattleSim environment `CyberBattleEnv`.
    The resulting action space is a `Discrete`.
    """
    def __init__(self, env: CyberBattleEnv):
        ActionWrapper.__init__(self, env)
        self.env = env
        # local: source_node x local_attacks_count
        # remote: source x target x remote_attacks_count
        # connect: source x target x port x credentials
        self.action_space = spaces.Discrete(
            env.bounds.maximum_node_count * env.bounds.local_attacks_count
            + env.bounds.maximum_node_count * env.bounds.maximum_node_count * env.bounds.remote_attacks_count
            + env.bounds.maximum_node_count * env.bounds.maximum_node_count * env.bounds.port_count * env.bounds.maximum_total_credentials)

    def action(self, action: np.int64) -> Action:
        n_nodes = self.env.bounds.maximum_node_count
        n_local_attacks = self.env.bounds.local_attacks_count
        n_remote_attacks = self.env.bounds.remote_attacks_count
        n_port = self.env.bounds.port_count
        n_credentials = self.env.bounds.maximum_total_credentials
        if action < n_nodes * n_local_attacks:
            source_node = action // n_local_attacks
            local_vulnerability = action % n_local_attacks
            return {'local_vulnerability': np.array([source_node, local_vulnerability])}
        action -= n_nodes * n_local_attacks
        if action < n_nodes * n_nodes * n_remote_attacks:
            source_node = action // (n_remote_attacks * n_nodes)
            target_node = (action // n_remote_attacks) % n_nodes
            remote_vulnerability = action % n_remote_attacks
            return {'remote_vulnerability': np.array([source_node, target_node, remote_vulnerability])}
        action -= n_nodes * n_nodes * n_remote_attacks
        if action < n_nodes * n_nodes * n_port * n_credentials:
            source_node = action // (n_nodes * n_port * n_credentials)
            target_node = (action // (n_port * n_credentials)) % n_nodes
            port = (action // n_credentials) % n_port
            credential = action % n_credentials
            return {'connect': np.array([source_node, target_node, port, credential])}
        raise ValueError(f'Unsupported action: {action}')

    def reverse_action(self, action):
        raise NotImplementedError
```
I'm not a good programmer, so feel free to point out any problems; I would appreciate it.
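Since the wrapper above is mostly index arithmetic, one sanity check is to verify that the Discrete-to-structured mapping is a bijection (every index decodes to a distinct action). Here is a standalone sketch of that check with small, made-up bounds, so it needs no CyberBattleSim install:

```python
# Standalone check of the Discrete -> structured index arithmetic used in
# FlattenActionWrapper, with small hypothetical bounds.
n_nodes, n_local, n_remote, n_port, n_cred = 3, 2, 4, 2, 5

def decode(action: int):
    if action < n_nodes * n_local:
        return ('local_vulnerability', (action // n_local, action % n_local))
    action -= n_nodes * n_local
    if action < n_nodes * n_nodes * n_remote:
        return ('remote_vulnerability',
                (action // (n_remote * n_nodes),
                 (action // n_remote) % n_nodes,
                 action % n_remote))
    action -= n_nodes * n_nodes * n_remote
    return ('connect',
            (action // (n_nodes * n_port * n_cred),
             (action // (n_port * n_cred)) % n_nodes,
             (action // n_cred) % n_port,
             action % n_cred))

total = n_nodes * n_local + n_nodes**2 * n_remote + n_nodes**2 * n_port * n_cred
decoded = {decode(a) for a in range(total)}
assert len(decoded) == total  # bijection: no two indices decode to the same action
```

With these bounds the space has 6 + 36 + 90 = 132 indices, and all 132 decode uniquely, so the arithmetic itself looks sound; the learning problem is more likely the huge fraction of invalid actions in the space.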
@Screamer-Y, did you get the stable-baselines example script to work? For me it runs, but the agent never learns anything using A2C or PPO.
Hi @forrestmckee,
Yes, from my side it works properly. I just ran the code in https://github.com/microsoft/CyberBattleSim/blob/main/notebooks/stable-baselines-agent.py without any modification.
@Screamer-Y are you using Linux, WSL, or Docker?
I can get the script you referenced to run, but the agent never makes it off of the foothold node regardless of the number of time steps I set. I'm also getting warnings that the agent is trying to access an invalid index.
Hi @forrestmckee,
I'm using Ubuntu Server 20.04 LTS. I ran the script again just now and it only made one successful connect action in 10000 time steps. I think the problem is due to the way `action_space` is defined in `flatten_wrapper`: it contains all attacks, even invalid ones, which is also why you keep getting warnings.
I had the same problem when turning the `action_space` into a `spaces.Discrete`. One possible solution is to reduce the dimensions of the `action_space`, as is done in [agent_wrapper](https://github.com/microsoft/CyberBattleSim/blob/main/cyberbattle/agents/baseline/agent_wrapper.py).
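To make the dimension-reduction idea concrete, here is a hedged sketch (hypothetical names, not CyberBattleSim's or agent_wrapper's actual API) of re-indexing the local-attack portion of the space over discovered nodes only, instead of over the maximum node count:

```python
# Index local attacks over *discovered* nodes only, so the Discrete space
# shrinks from maximum_node_count * n_attacks to len(discovered) * n_attacks.
def local_space_size(discovered_nodes, n_local_attacks):
    return len(discovered_nodes) * n_local_attacks

def decode_local(action, discovered_nodes, n_local_attacks):
    # map the flat index back to a concrete (node, vulnerability) pair
    node = discovered_nodes[action // n_local_attacks]
    vuln = action % n_local_attacks
    return node, vuln

nodes = ['client', 'website']            # 2 discovered out of, say, 10 max
assert local_space_size(nodes, 3) == 6   # vs 10 * 3 = 30 over the full bound
assert decode_local(4, nodes, 3) == ('website', 1)
```

The catch is that the space size now changes as nodes are discovered, which most SB3 algorithms do not support natively; that is presumably why the flattened wrapper sizes everything by the static bounds.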
@forrestmckee I came across the same issue as you. But I noticed another interesting thing: during training, although we get warnings that the agent is trying to access an invalid index, the number of nodes discovered so far keeps increasing. I think this means A2C or PPO is actually working; they do discover new nodes. What I don't understand is why, when the trained model is used for action prediction, it never discovers new nodes.
@Screamer-Y I don't quite understand why the warnings would be incorrect. I think you will also see them if you set the logging levels accordingly: since the agent has to discover new nodes, and the number discovered is less than the maximum node count, we will always get warnings while some nodes remain undiscovered. I also don't understand why we have to reduce the dimensions of the action space.
@blumu Is there a planned sample_valid_action equivalent for flattened environments / Stable-Baselines3? I believe what I and others have discovered is that the entire observation and action spaces are "fair game" for the agent to sample from at any given time. Doesn't this mean that an agent can attempt to take an action both to and from a node that it hasn't discovered yet? This seems to greatly increase the number of time steps required for an agent to learn.
@Screamer-Y were you able to reduce the dims of the action space like you mentioned?
Thanks for all the suggestions!
@Gabriel0402 I think you are right about the warnings; I didn't have a good understanding of the code at the time.
Regarding the second question: I had hoped to speed up learning by reducing the size of the action_space, but after trying it I found no significant improvement.
@forrestmckee @blumu So I still have a question: is there a significant performance difference between the A2C or PPO methods implemented in Stable-Baselines3 and the DQN method implemented in agent_dql? With the same settings of iteration_count=1500 and episode_count=20, I observed on toy-ctf that A2C only reaches an average return of no more than 40 per episode, far from the average return of about 450 in the benchmark. I would be very grateful for any approaches that improve the Stable-Baselines3 performance.
@forrestmckee Thank you so much for the speedy reply. I think I've understood what you've mentioned in this comment and the previous one :)