crowdstrike / caracara
Developer enhancements (DX) for FalconPy, the CrowdStrike Python SDK
License: MIT License
There is currently no way, in either falconpy or caracara, to stream a file down to disk. As a result, a 4GB response ends up entirely loaded in memory at some point.
We're using ugly hacks to pass stream=True to a raw requests.request and call it a day, so that we can stream a large file to disk chunk by chunk.
from requests import request
from tqdm import tqdm

def rtr_session_download_to_path(self, session_id, sha256, destination, known_size=None):
    '''
    Downloads an extracted file straight into a file (7z -pinfected) using
    chunks, so that we don't have a 4GB single http request in memory at
    some point. Or several in parallel.
    '''
    # First, prepare a HTTP request by stealing the self.falcon config for URL & token
    url = f'{self.falcon.base_url}/real-time-response/entities/extracted-file-contents/v1'
    params = {
        'session_id': session_id,
        'sha256': sha256,
    }
    self.logger.debug(f'Getting file sha256={sha256}, session_id={session_id} into {destination}')
    total_written_bytes = 0
    with request(
        'get', url,
        # Here we assume the token is fresh enough, which is usually the case since we just listed the file properties.
        headers=self.falcon.headers(),
        verify=self.falcon.ssl_verify,
        stream=True,
        params=params,
    ) as r:
        # destination is used as a pathlib.Path (parent, mkdir, open)
        if not destination.parent.exists():
            self.logger.info(f'Creating folder {destination.parent}')
            destination.parent.mkdir(parents=True, exist_ok=True)
        with destination.open('wb') as f, tqdm(
            desc=str(destination),
            total=known_size,
            unit='iB',
            unit_scale=True,
            unit_divisor=1024,
        ) as bar:
            self.logger.debug('Actual download iteration start')
            # Write the response body to disk 10 KiB at a time
            for chunk in r.iter_content(chunk_size=10*1024):
                written_bytes = f.write(chunk)
                bar.update(written_bytes)
                total_written_bytes += written_bytes
    return destination, total_written_bytes
Could this be done natively by caracara? I'm no asyncio expert, but there's some HTTP + file magic to be done here IMO.
Thanks!
The User-Agent header currently does not default to specifying Caracara, so it will fall back to crowdstrike-falconpy/version.
The default for this value should be crowdstrike-caracara/version.
When provided as a keyword, this value should be "provided value (crowdstrike-caracara/version)".
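A minimal sketch of the behaviour requested above. The helper name build_user_agent and the placeholder version string are assumptions made for illustration; neither is part of Caracara's or FalconPy's actual API.

```python
from typing import Optional

# Placeholder only: the real value would come from the installed caracara package.
CARACARA_VERSION = "0.0.0"


def build_user_agent(provided: Optional[str] = None,
                     version: str = CARACARA_VERSION) -> str:
    """Return the User-Agent string described in this issue (hypothetical helper)."""
    default = f"crowdstrike-caracara/{version}"
    if not provided:
        # No keyword supplied: identify as Caracara rather than FalconPy.
        return default
    # Keyword supplied: keep the caller's value and append the Caracara identifier.
    return f"{provided} ({default})"
```

Calling build_user_agent() gives the default form, while build_user_agent("my-tool/1.0") keeps the caller's string and appends the Caracara identifier in parentheses.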
All supported
All supported
All supported
<= 0.1.0
Describe the bug
Ability to develop a class that can be used natively to handle foreign CIDs with several API keys.
Through working on #41 and trying to add filters for IOA Rule Groups, I encountered this issue.
The CrowdStrike API consists of lots of fragmented services, which occasionally have inconsistencies. One of the inconsistencies is in the choice of filter attributes, for example:
platform (source), which should be one of three values: windows, linux or mac (case-sensitive).
platform_name (source), which the docs state should be one of Windows, Linux or Mac, but testing shows that it can also be Android (all of which are also case-sensitive).
Hiding this kind of complexity is what caracara is supposed to do, so we have to design a good abstraction.
FalconFilterAttribute, which helps by providing easy ways to validate that a filter is valid, as well as to hide the raw FQL name and use a more user-friendly name. For example (here), platform_name is validated to only be one of Windows, Mac or Linux, and the user-facing name is OS.
Client class here, providing a customised version of the FalconFilter class to the user
FalconFilter.
platform and platform_name example above).
client, creates a new filter with
filters = client.FalconFilter()
filters.create_new_filter("OS", "Windows")
For example, instead of having a unified FalconFilter, you would have separate filters, maybe HostsFalconFilter and IoaFalconFilter, so this separation is presented to the user directly. We can still provide unified names (i.e. OS becomes platform_name if used with HostsFalconFilter and becomes platform if used with IoaFalconFilter) to bring a degree of unity, but still expose to the user that filters are different depending on the data you're querying.
i.e. the user would still create a filter using filters.create_new_filter("OS", "Windows") or something similar, and if the user used it for filtering hosts, it would translate to platform_name:'Windows' in FQL, but if the user used it for filtering IOAs it would translate to platform:'windows' when used.
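The per-dataset translation described above could be sketched roughly as follows. HostsFalconFilter and IoaFalconFilter are the names floated in this issue; BaseFalconFilter, the attributes mapping and get_fql are hypothetical and only illustrate the idea, not caracara's actual implementation.

```python
from typing import Callable, Dict, List, Tuple

# Each user-facing name maps to (raw FQL attribute, value normaliser).
AttributeMap = Dict[str, Tuple[str, Callable[[str], str]]]


class BaseFalconFilter:
    """Hypothetical base class: collects conditions and renders them as FQL."""

    attributes: AttributeMap = {}

    def __init__(self) -> None:
        self._conditions: List[str] = []

    def create_new_filter(self, name: str, value: str) -> None:
        if name not in self.attributes:
            raise ValueError(f"Unknown filter name for {type(self).__name__}: {name}")
        fql_name, normalise = self.attributes[name]
        self._conditions.append(f"{fql_name}:'{normalise(value)}'")

    def get_fql(self) -> str:
        # Conditions are joined with "+", as in the FQL strings used elsewhere on this page.
        return "+".join(self._conditions)


class HostsFalconFilter(BaseFalconFilter):
    # Hosts use platform_name with capitalised values (Windows, Linux, Mac).
    attributes = {"OS": ("platform_name", str.capitalize)}


class IoaFalconFilter(BaseFalconFilter):
    # IOA rule groups use platform with lowercase values (windows, linux, mac).
    attributes = {"OS": ("platform", str.lower)}
```

With this shape, the same call create_new_filter("OS", "Windows") renders as platform_name:'Windows' on a HostsFalconFilter and as platform:'windows' on an IoaFalconFilter, so the user keeps one vocabulary while the class encodes which dataset is being queried.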
platform_name and platform are easily reconciled, but I haven't scoured the entire API to look for harder examples)
Whilst we're talking about redesigning how filters work, here are some other problems we could solve!
FalconFilter (a kind of collection of filters), and the filters within (added via create_new_filter).
FalconFilterGroup (if you can think of a better one let me know!). This also has the plus of fitting better with the variable name commonly used in caracara to refer to an instance of FalconFilter: filters (which is used in place of filter since filter is a Python builtin).
FalconFilter being the filter, but then renaming create_new_filter to be something like create_new_condition.
Feel free to comment with any ideas!
When using the examples/get_devices example, you are only returned 500 results (the DATA_BATCH_SIZE) regardless of the number of hosts available within the tenant.
examples/get_devices example.
All hosts within the tenant are returned.
All supported.
All supported.
All supported.
<= 0.1.0
Right now, we use the bullet library for our examples. This project appears to be unmaintained and, in line with our work to move from pick to prompt_toolkit in Falcon Toolkit, we should also move to Prompt Toolkit here to reduce the overall number of dependencies and the associated risk.
Queued sessions are not described by describe-queued-sessions
Queued sessions are enumerated.
Please provide your operating system type and version. Example: Red Hat Enterprise Linux 8.3
Python 3.10.5
Poetry (version 1.4.1)
$ pip freeze | grep -E '(caracara|falconpy)'
-e git+https://github.com/CrowdStrike/caracara/@0dd2bd265889e1421346f4a8ac58df73642c21c9#egg=caracara
crowdstrike-falconpy==1.2.11
So, the problem is that in rtr.py, _get_queued_session_ids uses "1" as a constant for True, which makes the whole FQL filter invalid, so it yields nothing.
$ ./test.py RTR_ListAllSessions -p '{"filter":"offline_queued: True+deleted_at: null"}' -q | jq '.body.resources|length'
8
$ ./test.py RTR_ListAllSessions -q | jq '.body.resources|length'
18
$ ./test.py RTR_ListAllSessions -p '{"filter":"offline_queued: 1+deleted_at: null"}' -q | jq '.body.resources|length'
0
$ ./test.py RTR_ListAllSessions -p '{"filter":"offline_queued: zemlkqfjsqdmlkf+deleted_at: null"}' -q | jq '.body.resources|length'
0
Patching the code does show valid output
RTRApiModule: Searching for RTR sessions based on filter string: offline_queued: True+deleted_at: null
..
( before : caracara.common.batching: Batch data retrieval for list_sessions (0 items) )
( now : caracara.common.batching: Batch data retrieval for list_sessions (8 items) )
caracara.common.batching: ThreadPoolExecutor-0_0 | Batch worker started with a list of 8 items. Function: list_sessions
Once this is patched, well, it still shows nothing, but I guess we're facing another issue.
Thanks for the API wrapper btw, sorting out these different pagination methods is not really straightforward otherwise.
(caracara-py3.10) $ git diff
diff --git a/caracara/modules/rtr/rtr.py b/caracara/modules/rtr/rtr.py
index b4f9e3e..ad88bfb 100644
--- a/caracara/modules/rtr/rtr.py
+++ b/caracara/modules/rtr/rtr.py
@@ -87,7 +87,7 @@ class RTRApiModule(FalconApiModule):
         List[str]: A list of IDs of all queued RTR sessions discovered.
         """
         self.logger.info("Searching for queued RTR sessions")
-        filter_str = "offline_queued: 1+deleted_at: null"
+        filter_str = "offline_queued: True+deleted_at: null"
         session_ids = self._search_sessions(filters=filter_str)
         return session_ids
Some caracara modules do not carry the "caracara" name in their logging scope. This causes logging.getLogger('caracara').setLevel(logging.WARNING) to only affect some parts of caracara.
2023-04-12 09:24:16,742 CustomIoaApiModule Initialising API module: CustomIoaApiModule
2023-04-12 09:24:16,742 FlightControlApiModule Initialising API module: FlightControlApiModule
2023-04-12 09:24:16,742 FlightControlApiModule Configuring the FalconPy Flight Control API
2023-04-12 09:24:16,743 HostsApiModule Initialising API module: HostsApiModule
2023-04-12 09:24:16,743 HostsApiModule Configuring the FalconPy Hosts API
2023-04-12 09:24:16,743 HostsApiModule Configuring the FalconPy Host Group API
2023-04-12 09:24:16,743 PreventionPoliciesApiModule Initialising API module: PreventionPoliciesApiModule
2023-04-12 09:24:16,745 PreventionPoliciesApiModule Configuring the FalconPy Prevention Policies API
2023-04-12 09:24:16,745 ResponsePoliciesApiModule Initialising API module: ResponsePoliciesApiModule
2023-04-12 09:24:16,745 ResponsePoliciesApiModule Configuring the FalconPy Response Policies API
2023-04-12 09:24:16,746 RTRApiModule Initialising API module: RTRApiModule
Run some code like the following, which just sets the 'caracara' logger to a specified level:
import logging

# Silence a little bit all this debug noise enabled by default
# (runs inside a method of our own class, hence self.debug)
for caracara_logging_scope in (
    'caracara',
    #'CustomIoaApiModule',
    #'FlightControlApiModule',
    #'HostsApiModule',
    #'PreventionPoliciesApiModule',
    #'RTRApiModule',
    #'ResponsePoliciesApiModule',
):
    if not self.debug:
        scope_logger = logging.getLogger(caracara_logging_scope)
        scope_logger.setLevel(logging.WARNING)
It's not much, and we have a workaround ready with the list of modules generating logs added to a hardcoded list. That being said, I have a vague feeling that some PEP might suggest prefixing logging scopes.
All code loaded by caracara receives the same log level
Prefix these logging scopes with caracara or caracara.modules. What we do in our code base is just pick the current module and call it a day. Prior to that we even added the class name, but having one class per file makes it simpler. (I won't comment on my spaghetti code base file lengths :D)
self.logger = logging.getLogger(".".join([
    self.__module__,
    # self.__class__.__name__,
]))
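To illustrate why the prefix matters, here is a small standalone sketch using only the standard logging module. The logger name caracara.modules.hosts.hosts is an assumed example of what a module-based name could look like, while HostsApiModule is the bare class-style name seen in the log output above.

```python
import logging

# A logger named after the module path is a child of the "caracara" logger...
namespaced = logging.getLogger("caracara.modules.hosts.hosts")
# ...whereas a logger named after the class alone hangs directly off the root logger.
bare = logging.getLogger("HostsApiModule")

logging.getLogger().setLevel(logging.DEBUG)              # root: everything enabled
logging.getLogger("caracara").setLevel(logging.WARNING)  # what a caller would do

# The dotted name inherits the WARNING level from "caracara".
print(namespaced.getEffectiveLevel() == logging.WARNING)
# The bare name is untouched and still inherits DEBUG from the root logger.
print(bare.getEffectiveLevel() == logging.DEBUG)
```

Because the standard library resolves levels by walking up the dotted name, any logger whose name starts with caracara. follows the level set on caracara without the caller having to list module names by hand.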
Debian bookworm
Python 3.10.5
Poetry (version 1.4.1)
$ pip freeze | grep -E '(caracara|falconpy)'
caracara==0.2.2
crowdstrike-falconpy==1.2.12
At instantiation time, caracara.client.Client configures itself, emits numerous logs and then fires a POST to https://api.eu-1.crowdstrike.com/oauth2/token to get an API token. Would it be possible to have lazy authentication? That would mean preparing settings offline and only triggering network requests when an API operation is required.
The falconpy behavior is not to request a token unless it's needed (https://github.com/CrowdStrike/falconpy/blob/main/src/falconpy/api_complete.py#L307), but I might have read this wrong.
Instantiate a caracara.client.Client class; there's a network call.
No network call unless asked to touch the network
Debian bookworm
3.10.5
1.4.1
$ pip freeze | grep -E '(caracara|falconpy)'
caracara==0.2.2
crowdstrike-falconpy==1.2.12
We have scripts that prepare handlers to request data from various locations, and one of the providers is Caracara. For cases where all the details are already cached offline (so far it's about session data), we end up instantiating a Client object just in case, then not using it since all the data we need is in our cache. Instead of having instantaneous results, we have to wait for one (1) HTTP call, caused by the understandable need of caracara.client.Client to request a token when instantiated.
Would it be possible to only call self.api_authentication.token() when needed, even if that means remaining unaware of the base_url variable for a while? (Which is fine, because you won't ever need it unless authenticated to answer a query.)
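One way to picture the lazy behaviour asked for here is the sketch below. LazyToken and fake_fetch are hypothetical names used purely for illustration; the real token request would be whatever caracara or falconpy performs internally, and token expiry is deliberately left out.

```python
from typing import Callable, List, Optional


class LazyToken:
    """Defers fetching a token until it is first needed (hypothetical helper)."""

    def __init__(self, fetch_token: Callable[[], str]) -> None:
        # fetch_token stands in for whatever performs the OAuth2 request.
        self._fetch_token = fetch_token
        self._token: Optional[str] = None

    @property
    def token(self) -> str:
        if self._token is None:
            # First access: this is the only place a network call would happen.
            self._token = self._fetch_token()
        return self._token


calls: List[str] = []


def fake_fetch() -> str:
    # Stand-in for the real token request, so the sketch runs offline.
    calls.append("POST /oauth2/token")
    return "example-token"


auth = LazyToken(fake_fetch)  # constructing the object makes no request
first = auth.token            # first use triggers the single request
second = auth.token           # later uses are served from the cached value
```

In this arrangement, a script that only reads from its own cache never touches auth.token and therefore never pays for the HTTP round trip.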
Feel free to say this problem is convoluted :D
Cheers,
As mentioned in #80, we'd like to benefit from caracara's numerous advantages compared to manually hitting the API with falconpy to enumerate IOC content.
So far, we're hitting indicator_search_v1 to get the identifiers, then iterating over pages of indicator_get_v1. Could it be possible to have a describe_iocs_raw somewhere? (pretty much like describe_rule_groups_raw)
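The two-step pattern described above (query the identifiers, then fetch the full records in pages) could be sketched generically as follows. The function name describe_raw, the injected callables, the batch size of 100 and the assumption that each record carries an "id" key are all illustrative; they are not the signatures of falconpy's indicator_search_v1 / indicator_get_v1 or of any caracara function.

```python
from typing import Callable, Dict, List


def describe_raw(
    query_ids: Callable[[], List[str]],
    get_entities: Callable[[List[str]], List[dict]],
    batch_size: int = 100,
) -> Dict[str, dict]:
    """Fetch all IDs, then resolve them to full records one batch at a time."""
    ids = query_ids()
    results: Dict[str, dict] = {}
    for start in range(0, len(ids), batch_size):
        id_batch = ids[start:start + batch_size]
        for record in get_entities(id_batch):
            # Assumption: every record exposes its identifier under "id".
            results[record["id"]] = record
    return results


# Offline stand-ins for the two API calls, so the sketch can be run as-is.
fake_ids = [f"ioc-{n}" for n in range(250)]
batch_sizes: List[int] = []


def fake_get_entities(id_batch: List[str]) -> List[dict]:
    batch_sizes.append(len(id_batch))
    return [{"id": item, "type": "example"} for item in id_batch]


iocs = describe_raw(lambda: fake_ids, fake_get_entities)
```

The result is keyed by identifier, which mirrors how the describe_rule_groups_raw output is consumed elsewhere on this page (via list(ioas.values())).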
Thanks !
#81 handles pulling information from the API, but it does not allow any changes to be written back to Falcon.
We intend to add the write support here (as a full CRUD implementation of the User Management APIs), so this Issue will track that effort.
So far caracara's excellent pagination system only exposes prevention policies and remote response policies. There are a few more. (Ignore the "global_config" one: it's a setting sent back by the API when querying a host's policies (under "device_policies"), but there are no associated APIs. I guess that's some kind of default vendor system-wide per-OS policy or something.)
$ ls ./data/crowdstrike/policies.dev.* -1
./data/crowdstrike/policies.dev.device_control.json
./data/crowdstrike/policies.dev.firewall.json
./data/crowdstrike/policies.dev.global_config.json
./data/crowdstrike/policies.dev.prevention.json
./data/crowdstrike/policies.dev.remote_response.json
./data/crowdstrike/policies.dev.sensor_update.json
Please expose a describe_policies_raw function for all the policy types below:
ptypes = [
    'prevention',
    'sensor_update',
    'device_control',
    # 'global_config',
    'remote_response',
    'firewall'
]
It's not a critical need of ours: we worked out pagination, but not really properly, and not multithreaded. I'd like to get rid of self.enumerate_paginated_api_endpoint in our own code base, and since there is some knowledge you can't really guess about the type of pagination used behind each API endpoint, here I am, asking for per-policy support.
I'm a little bit reluctant to try to hook caracara.common.pagination directly into our own code (which does mostly what caracara does: offer management functions over API endpoints), mostly because it's tightly integrated with the rest of caracara, and that would mean reimplementing a half-baked in-house caracara clone that will become obsolete as soon as these few missing features are implemented.
So far my "developer experience" was enhanced by caracara since I could drop our own pagination function for a few use cases implemented in caracara, assuming pagination is done correctly on caracara's side.
 def refresh_ioa_cache(self):
-    ioas = self.enumerate_paginated_api_endpoint('query_rule_groups_full', limit=100, sort='modified_on.desc')
+    ioas = self.caracara.custom_ioas.describe_rule_groups_raw()
+    ioas = list(ioas.values())
+    #ioas = self.enumerate_paginated_api_endpoint('query_rule_groups_full', limit=100, sort='modified_on.desc')
     self.logger.info(f'Storing {len(ioas)} IOA in {self.ioa_cache_path}')
Here's my shopping list of things I'd like to have enumerated through caracara (just for reading):
falconpy.RealTimeResponse.list_sessions not falconpy.RealTimeResponse.list_queued_sessions and they have a different schema and "pwd" as command while it's not the case, that's another issue right; RTR_ListAllSessions does list all ids but then real data for queued sessions has to be fetched from list_queued_sessions. 🙃 )
I'll go open different issues for:
I won't open an issue for the queued session thingy since I'm not comfortable with the issue diagnostic and my associated needs, so far.
Thanks for reading !
As mentioned in #80, we'd like to benefit from caracara's numerous advantages compared to manually hitting the API with falconpy to enumerate Users.
So far, we're hitting queryUserV1 to get the identifiers, then iterating over pages of retrieveUsersGETV1. Could it be possible to have a describe_users_raw somewhere?
Thanks !
When enumerating devices with caracara.hosts.describe_devices(), caracara first downloads all the host data, then proceeds to review the "online" state of each of these (btw, could we have an option to disable that? It's not needed and takes time when we just want to enumerate hosts).
Nothing goes fine, since all these /devices/entities/online-state/v1 calls ship 500 ids in the URL every time, always producing an HTTP 200 OK ERROR 400:
{
    "errors": [
        {
            "code": 400,
            "message": "request must contain between 0 and 100 ids"
        }
    ],
    "meta": {
        "powered_by": "cs.agentonline",
        "query_time": 0.001833589,
        "trace_id": "905f9d10-0d4e-405d-88bd-649b6fb849f2"
    },
    "resources": []
}
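Given the limit quoted in that error message, the ID list would need to be split into groups of at most 100 before each online-state call. A minimal sketch of such a splitter follows; chunk_ids and MAX_IDS_PER_REQUEST are hypothetical names, and this is not how caracara's batch_get_data is actually written.

```python
from itertools import islice
from typing import Iterable, Iterator, List

MAX_IDS_PER_REQUEST = 100  # limit quoted in the API error message


def chunk_ids(device_ids: Iterable[str],
              size: int = MAX_IDS_PER_REQUEST) -> Iterator[List[str]]:
    """Yield lists of at most `size` IDs, preserving order."""
    iterator = iter(device_ids)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# A 500-ID batch (the size seen in the failing requests) becomes five requests.
example_ids = [f"device-{n}" for n in range(500)]
chunks = list(chunk_ids(example_ids))
```

Each yielded list could then be passed to one online-state request, keeping every call within the documented bound.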
Bonus bug : the last request has more than 500 ids it seems:
$ grep '/dev[^:,]*' req_not_last.dat -aio|tr '&' '\n'|wc -l
500
$ grep '/dev[^:,]*' req_last.dat -aio|tr '&' '\n'|wc -l
822
$ python3 -m pip freeze|grep caracara
caracara==0.3.0
This ends up in a nice little stack trace when all calls finish:
    hosts = self.caracara.hosts.describe_devices()
  File "/usr/local/lib/python3.10/dist-packages/caracara/filters/decorators.py", line 56, in wrapper
    return func(*_args.args, **_args.kwargs)
  File "/usr/local/lib/python3.10/dist-packages/caracara/modules/hosts/hosts.py", line 153, in describe_devices
    device_state_data = self.get_online_state(device_ids)
  File "/usr/local/lib/python3.10/dist-packages/caracara/modules/hosts/_online_state.py", line 59, in get_online_state
    device_online_state_data = batch_get_data(device_ids, self.hosts_api.get_online_state)
  File "/usr/local/lib/python3.10/dist-packages/caracara/common/batching.py", line 117, in batch_get_data
    raise Exception("At least one thread returned an error: " + str(errors))
Exception: At least one thread returned an error: [{'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, 
{'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 
0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}, {'code': 400, 'message': 'request must contain between 0 and 100 ids'}]
Call caracara.hosts.describe_devices()
Devices are described without having everything explode
Debian bookworm
Python 3.10.5
1.4.1
$ python3 -m pip freeze | grep -iE '(caracara|falcon|crowdstrike)'
caracara==0.3.0
crowdstrike-falconpy==1.2.15
falcon-toolkit==3.1.2