otrf / ossem
Open Source Security Events Metadata (OSSEM)
License: MIT License
Problem
With ETW on-boarding, we now have multiple data dictionaries with the same 'event code' but different versions. The version is available in the filename, and in the tags array of the event.
Those consuming the data dictionary YML files will find it complex to distinguish between events with the same 'event code', especially when filtering for specific versions of a data dictionary. Parsing an array to retrieve the data dictionary version is cumbersome and introduces additional complexity.
Example
Because OSSEM data dictionaries are atomic file entities, the workaround to avoid file name conflicts is to append the version number to the data dictionary file name with '_v#', for example: event-4624_v1.yml.
If no version information is available, the filename only contains the event code, for example event-4624.yml
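The naming convention above can be parsed without touching the tags array. The following is a minimal sketch, assuming filenames follow exactly the two shapes shown (event-4624_v1.yml and event-4624.yml); the regular expression, the `DictionaryId` type, and the function name are illustrative and not part of OSSEM.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern for the convention described above:
# "event-<code>.yml" or "event-<code>_v<version>.yml".
FILENAME_RE = re.compile(r"^event-(?P<code>\d+)(?:_v(?P<version>\d+))?\.yml$")


class DictionaryId(NamedTuple):
    event_code: int
    version: Optional[int]  # None when the filename carries no version


def parse_dictionary_filename(filename: str) -> DictionaryId:
    """Split a data dictionary filename into event code and optional version."""
    match = FILENAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected data dictionary filename: {filename!r}")
    version = match.group("version")
    return DictionaryId(int(match.group("code")), int(version) if version else None)


print(parse_dictionary_filename("event-4624_v1.yml"))
print(parse_dictionary_filename("event-4624.yml"))
```

A consumer could then filter on `version` directly instead of scanning tags.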
Proposal
PowerShell 4104 needs to match HELK. There are a few fields that we parse there that are not in here.
Also, I believe powershell_Path should be powershell_path.
@Cyb3rWard0g probably going to need your input on this, since you made some modifications to the original parser. I just wanted to get your final word on it.
Hey, I saw that feedback was asked for regarding contributing. I'm the author of a tool, Grapl:
https://github.com/insanitybit/grapl
I've decided to adopt a schema that is heavily based on the CIM description here (it's in a branch currently), with only minor changes to support a bit more of a 'graph' feel. As two examples,
So it's mostly just a subset.
I chose this over CAR for a few reasons - I found the naming to be more general, and I liked that things such as digital signatures were attached to files, and not processes.
I thought this feedback might be of interest to you. Thanks for putting this project together.
I will say though, I hope that this stabilizes soon. If it takes a long time I will probably end up not bothering to make any breaking updates, and it would be a shame to diverge.
We're using OSSEM as a standard model for the platform we're building. Currently OSSEM does not define field names for MAC addresses. We would like to add those.
There is an extension mechanism for entities, in order not to duplicate field definitions. It would be good to have such a mechanism for data dictionaries as well. For example, all Zeek network protocol events have fields for source and destination IP and port, which are duplicated across all the data dictionaries; instead, they all could extend a generic dictionary which defines these common fields. What do you think? Is that already part of your plans?
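One way such an extension mechanism could work is sketched below. This is only an illustration of the idea in the request: the `extends` key, the `network_base` dictionary, and the field names are hypothetical and do not reflect an existing OSSEM feature.

```python
from typing import Dict, List

Field = Dict[str, str]

# Hypothetical generic dictionary holding the fields shared by several events.
BASE_DICTIONARIES: Dict[str, List[Field]] = {
    "network_base": [
        {"name": "src_ip_addr", "type": "ip"},
        {"name": "src_port", "type": "integer"},
        {"name": "dst_ip_addr", "type": "ip"},
        {"name": "dst_port", "type": "integer"},
    ],
}


def resolve_fields(dictionary: dict, bases: Dict[str, List[Field]]) -> List[Field]:
    """Return inherited fields followed by the dictionary's own fields.

    A field declared in the child replaces an inherited field of the same name.
    """
    merged: Dict[str, Field] = {}
    parent = dictionary.get("extends")  # hypothetical key
    if parent is not None:
        for field in bases[parent]:
            merged[field["name"]] = field
    for field in dictionary.get("fields", []):
        merged[field["name"]] = field
    return list(merged.values())


# Illustrative child dictionary that only declares its protocol-specific field.
zeek_dns = {
    "extends": "network_base",
    "fields": [{"name": "dns_query_name", "type": "string"}],
}

print([field["name"] for field in resolve_fields(zeek_dns, BASE_DICTIONARIES)])
```

With this shape, the common source and destination fields are defined once and each protocol dictionary lists only what is specific to it.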
Currently, there is no how-to for operating the ossem_converter.py script, which recreates the OSSEM MD pages from the YAML source.
We currently have a sub-repo for Common Data Model (CDM) and Detection Data Model (DDM). We should have one sub-repo for data dictionaries (DD). We need to validate the potential impact on different scripts within the main OSSEM repo.
Missing fields from: https://docs.microsoft.com/en-us/windows/security/threat-protection/auditing/event-5145
<Data Name="SubjectUserSid">S-1-5-21-3457937927-2839227994-823803824-1104</Data>
<Data Name="SubjectUserName">dadmin</Data>
<Data Name="SubjectDomainName">CONTOSO</Data>
<Data Name="SubjectLogonId">0x38d34</Data>
<Data Name="ObjectType">File</Data>
<Data Name="IpAddress">fe80::31ea:6c3c:f40d:1973</Data>
<Data Name="IpPort">56926</Data>
<Data Name="ShareName">\\\\\*\\Documents</Data>
<Data Name="ShareLocalPath">\\??\\C:\\Documents</Data>
<Data Name="RelativeTargetName">Bginfo.exe</Data>
<Data Name="AccessMask">0x100081</Data>
<Data Name="AccessList">%%1541 %%4416 %%4423</Data>
<Data Name="AccessReason">%%1541: %%1801 D:(A;;FA;;;WD) %%4416: %%1801 D:(A;;FA;;;WD) %%4423: %%1801 D:(A;;FA;;;WD)</Data>
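A quick way to spot which of these fields a data dictionary does not cover is to parse the event data and compare names. The sketch below uses a few of the `Data` elements above wrapped in a bare `EventData` element without namespaces; the `DOCUMENTED_FIELDS` set is a made-up stand-in for whatever the dictionary currently lists.

```python
import xml.etree.ElementTree as ET

# A subset of the sample above, wrapped in a root element for parsing.
EVENT_DATA = """
<EventData>
  <Data Name="SubjectUserName">dadmin</Data>
  <Data Name="ObjectType">File</Data>
  <Data Name="IpPort">56926</Data>
  <Data Name="RelativeTargetName">Bginfo.exe</Data>
  <Data Name="AccessMask">0x100081</Data>
</EventData>
"""

# Hypothetical: field names the data dictionary currently documents.
DOCUMENTED_FIELDS = {"SubjectUserName", "ObjectType", "IpPort"}


def event_fields(xml_text: str) -> dict:
    """Map each Data element's Name attribute to its text value."""
    root = ET.fromstring(xml_text)
    return {data.get("Name"): (data.text or "") for data in root.iter("Data")}


def missing_fields(xml_text: str, documented: set) -> list:
    """Return field names present in the event but absent from the dictionary."""
    return sorted(set(event_fields(xml_text)) - documented)


print(missing_fields(EVENT_DATA, DOCUMENTED_FIELDS))
```

Real exported events carry an XML namespace on the `Event` element, so the tag lookup would need adjusting for full event XML.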
References Links on entities fields
name: Image
type: string
description: Adding References in this field section [1](http://..com)
Hi @Cyb3rWard0g
Referencing issues raised in a PR by @lostInSpaceSomewhere on the Azure Sentinel GitHub. Since we are generating the parser from an automated script, it makes sense to update the original template so that the script picks up those changes.
PR : Azure/Azure-Sentinel#1754
Summary of changes required in original template:
| extend Hashes = extract_all(@"(?P<key>\w+)=(?P<value>[a-zA-Z0-9]+)", dynamic(["key","value"]), tostring(EventDetail.[17].["#text"]))
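To check what the key/value pattern in that `extract_all` call captures, the same regular expression can be exercised in Python, which also accepts the `(?P<name>...)` group syntax. This is only a sketch for testing the pattern offline; the input string is a made-up placeholder in the comma-separated `ALGORITHM=VALUE` shape, not real hash data.

```python
import re

# Same pattern as in the KQL expression above.
HASH_RE = re.compile(r"(?P<key>\w+)=(?P<value>[a-zA-Z0-9]+)")


def parse_hashes(raw: str) -> dict:
    """Turn 'KEY=VALUE,KEY=VALUE' text into a {key: value} dictionary."""
    return {m.group("key"): m.group("value") for m in HASH_RE.finditer(raw)}


# Placeholder values, shortened for readability.
sample = "MD5=AAAA1111,SHA256=BBBB2222"
print(parse_hashes(sample))
```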
Some files within the "source" folder are not in the sub-repos. For example, we have more entities within the "source" folder than in the Common Data Model sub-repo.
Seems like the Elastic Common Schema (ECS) is seeking to solve the same problem: establishing a naming convention with consistent field names across any data source. Just curious what your perspective is on ECS, where you see OSSEM addressing areas missed by ECS, etc. Thanks!
Hi Team,
Why are some fields missing in the yml files?
For example, consider the "destination_nat" entity. Here you can find multiple fields:
https://ossemproject.com/cdm/entities/destination_nat.html
However, in the yml file, I find only one field (i.e., original_value):
https://github.com/OTRF/OSSEM-CDM/blob/14c48b27c107abe5a76fbd1bcb16e8bf78882172/schemas/entities/destination_nat.yml
Shouldn't they match?
The current detection data model (DDM) does not take mandatory data fields into consideration. For example: I want to develop a detection analytic on "win registry key modification", and I require "registry_key_path", "registry_key_value_name" and "registry_key_value_data" to be present. If my EDR solution fails to provide one of these fields (e.g. "registry_key_value_data"), both the data dictionary (of the EDR in question) and the common information model will provide a "win registry" object that lacks a data field needed by the analytic (e.g. "registry_key_value_data").
Is this by design, something you want to keep out of the DDM?
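The kind of check being asked about could look roughly like this. It is a sketch only: the `REQUIRED_FIELDS` mapping is a hypothetical structure, not something the DDM defines today, and the field names are the ones quoted in the question.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical mapping from an analytic to the fields it cannot work without.
REQUIRED_FIELDS: Dict[str, Set[str]] = {
    "win registry key modification": {
        "registry_key_path",
        "registry_key_value_name",
        "registry_key_value_data",
    },
}


def missing_required_fields(analytic: str, available: Iterable[str]) -> List[str]:
    """Return the mandatory fields that a data source does not provide."""
    return sorted(REQUIRED_FIELDS[analytic] - set(available))


# Fields a given EDR data dictionary exposes (illustrative).
edr_fields = ["registry_key_path", "registry_key_value_name"]
print(missing_required_fields("win registry key modification", edr_fields))
```

An empty result would mean the data source satisfies the analytic; anything else names the gap explicitly.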
Hey Nate (@Spydernaz), is there any entity or concept that you are currently working on that we can use as our initial example to review the OSSEM ontology?
There should be a unique device id field, DVC_UUID or something along those lines.
For AWS servers this would be the "Instance ID", for example.
As previously discussed offline, a translation sheet needs to be made for field names - logon_impersonation_level, SIDS, and more.
Hi,
First of all, thank you for taking the time to write the Python tool ossem_converter.py to convert to and from markdown and yaml. I am currently contributing AWS data sources to the project and have created multiple markdown files for them at https://github.com/hunters-forge/OSSEM/tree/aws-datadictionary/data_dictionaries/aws.
I tried using the tool to convert them to yaml before raising a PR but was unsuccessful. It seems the code to convert from markdown to yaml is currently commented out (lines: 554-555, 560-561, 569-578). I tried uncommenting it and using it locally, but it did not work. Before I investigate further, I thought I should ask.
Syntax used after uncommenting (it does not produce any error, but also does not produce output files):
: python ossem_converter.py --from-md <aws folder path with markdowns> --to-yml <dest path>
Could you please point me to the correct instructions for converting the markdown files in the aws folder to yaml with the script, if that is supported?
Also, I have a couple of follow-up questions about conversions.
Thanks.
Several events that the OSSEM CDM project describes have a sense of direction. Usually in a network connection, this sense of direction is represented by source and destination to describe the origin of the connection and where the network packets are sent to.
This concept of direction is not only represented in a network connection, but also in other events such as the creation of a process, where an entity interacts with another entity.
Therefore, the OSSEM project is also using the concept of target instead of destination when describing an interaction between entities that are not part of a network connection.
We need to provide some documentation for these use cases.
There might be a mismatch between two log definitions related to WMI events.
For example:
Sysmon EventID: 20
wmi_consumer_type vs CONSUMER
wmi_consume_name vs ESS
We might need to extract and modify fields from the built-in. But I believe that most of the info is present in EventID: 5861. Was it done on purpose?
Hello,
In some cases, there are mismatches between the CDM and Data Dictionaries, which is normal for such a young project. When such a case arises, what should be considered correct?
As an example of such a mismatch, the full path of the executable file of a process is called process_file_path in the CDM, but process_path in most of the data dictionaries where it appears.
Cheers
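Mismatches like this one can be listed mechanically. The sketch below compares standard names used in a data dictionary against the CDM names and suggests the closest CDM name for each unknown one. The two input lists are illustrative, and `difflib` similarity is just one possible heuristic, not how OSSEM validates names.

```python
import difflib
from typing import Dict, Iterable, Optional

# Illustrative inputs; real lists would be loaded from the CDM and dictionary YAML.
CDM_NAMES = ["process_file_path", "process_id"]
DICTIONARY_NAMES = ["process_path", "process_id"]


def suggest_cdm_names(
    used: Iterable[str], cdm: Iterable[str]
) -> Dict[str, Optional[str]]:
    """Map each name missing from the CDM to its closest CDM name, if any."""
    cdm_list = list(cdm)
    suggestions: Dict[str, Optional[str]] = {}
    for name in used:
        if name in cdm_list:
            continue  # already a valid CDM name
        close = difflib.get_close_matches(name, cdm_list, n=1)
        suggestions[name] = close[0] if close else None
    return suggestions


print(suggest_cdm_names(DICTIONARY_NAMES, CDM_NAMES))
```

The output is only a hint for a human reviewer; which side is correct still has to be decided by the project.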
There are no entities defined in the CDM for scheduled tasks or services as far as I can see. While "scheduled tasks" is a Windows name, they are a generic concept, with cron as a Linux equivalent; services have a direct equivalent in Linux, and I guess in a lot of other systems as well.
Hey guys,
I see that you have defined the Common Data Model as YAML to help with readability etc., but I was wondering if it would be worthwhile to describe these models as an ontology. It might also help describe the relationships between elements. I was already looking at describing a series of ontologies that relate attacks to the sort of data you would require to detect them. Would this be something of interest?
CIM entities don't use this column, I'll remove it from the entities template.
Hello,
In some Windows Security logs concerning Object Access (e.g. 4656), the field AccessList is translated into user_privilege_list, while for others it is object_access_list. Which one is right?
PS: Is opening issues on this repo the right procedure for issues like this? Is there something you would prefer?
We can keep a list of entities that we think are useful and have community feedback
I am not sure if this is a mistake, or how it should be interpreted, but event_category_type can be found twice in the event attributes:
Name | Type | Description | Sample Value |
---|---|---|---|
event_category_type | string | A description of the event, which can help with categorization. If the vendor defines a category/grouping for its log. i.e. Zeek has a few category types for its many logs (network-protocols, network-observations, etc...). Example. sysmon event id 12 is EventType field is this. | network-protocols |
event_category_type | string | If the event contains a category, then this it. i.e For the Windows Security channel, this could be something such as Audit object access. For Zeek conn.log, this would be network-protocols. | Audit Object Access |
https://github.com/OTRF/OSSEM/blob/master/docs/cdm/entities/event.md?plain=1#L9-L10
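A duplicate like this is easy to catch automatically. Here is a minimal sketch, assuming an entity's attributes are available as a list of records with a `name` key; that structure and the function name are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, List


def duplicate_field_names(fields: List[Dict[str, str]]) -> List[str]:
    """Return the names that appear more than once in an entity's attributes."""
    counts = Counter(field["name"] for field in fields)
    return sorted(name for name, count in counts.items() if count > 1)


# Illustrative attribute list mirroring the duplicated rows above.
event_attributes = [
    {"name": "event_category_type", "type": "string"},
    {"name": "event_category_type", "type": "string"},
    {"name": "event_id", "type": "string"},
]
print(duplicate_field_names(event_attributes))
```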
Hi. Any thoughts on using yaml to represent data dictionaries and cim entries? That would help consuming the data. Thanks!
In the Data Dictionary of Windows Security Event 4741, the field UserParameters is translated into target_host_user_paremeters (with a typo), and UserAccountControl into target_host_user_account_control. For Event 4742, the corresponding fields are translated into target_host_parameters and target_host_account_control, so with one 'user' fewer. I haven't been able to find those defined in the CDM; what is the right standard field name?
Hello, as mentioned in other issues, we are working on extending OSSEM coverage for different technologies we are using. One of them is the cowrie honeypot, for which we have reached what we think is a satisfying quality. Could you have a look and let us know whether it seems to match your standards? If so, we could then open a pull request.
The changes are in the cowrie data dictionaries as well as the markdown versions.
I wasn't able to regenerate the general data dictionary markdown README: the ossem_converter script crashes with FileNotFoundError: [Errno 2] No such file or directory: '.../source/data_dictionaries/aws/readme.yml' even though said file is present.
Aren't they the same?
Found them on this page https://github.com/hunters-forge/OSSEM/blob/master/detection_data_model/object_relationships.md
Just tracking and so I don’t forget:
network payload/pcap
email entity
geo. include longitude, latitude, location, rack unit, etc
organization. name and uid
https://github.com/OTRF/OSSEM/blob/master/source/detection_data_model/tables/process-object-relationships.yml#L121
Extra whitespace: "relationship: bound _to" should be "relationship: bound_to".
file_sha1
file_md5
file_sha256
If there are hashes. Remember, sysmon event_id 1 hashes the file executed.
Sysmon data dictionaries aren't compliant with the latest version of the entities. I can start working on a compliant version right now, but @Cyb3rWard0g mentioned in other issues that you were reviewing all Windows events, so I want to make sure you don't have something ongoing on your side, to avoid duplicating work.