Foreword
This article introduces an engine (a.k.a. Gene) that we designed to match signatures in Windows events. Our motivation came from observations made during several incidents we worked on. The first observation is that the Windows OS has hundreds of different event logs, which makes it very difficult to remember the meaning of all of them. The second is that some event logs merely indicate that something is going wrong on a system, while others are clearly the signature of a compromise. These two observations make the study of Windows events very important, and sometimes decisive, for an analysis.
For instance, if the proper events are enabled, one can easily reconstruct 80% of the story of an incident. On the other hand, it can be very hard to find relevant information in such a large number of events. We also noticed that the indicator of a compromise is sometimes a very specific event that one has never heard of. This makes the information very difficult to share between the parties involved in an incident, and even more so at a broader scope (CERT, external actors …). At best, everyone ends up implementing their own tool to search for that specific event. While this approach might be suitable in the rush of an incident, it is neither scalable nor sustainable.
One conclusion we drew from these observations is that some Windows events are definitely IOCs, and we were not aware of any tool capable of checking for them in a generic and efficient way. The other was that there was no means of sharing knowledge about those IOCs, because there was no appropriate format to do so. This is where our adventure began: in this article we introduce both a rule format and an engine that anyone can use to share signatures and match them against Windows events.
Quick reminder about Windows event logs
We won’t go into the details of the Microsoft Windows EVTX file format; if you want to dig deeper into this topic, we invite you to read [1] and [2]. We will simply remind the reader what a Windows event looks like in a readable format, so that anyone can follow the rest of this article. Since events are stored in BinXML format within EVTX files, it is quite common to represent a Windows event in XML, as shown below.
<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'>
<EventData>
<Data Name='UtcTime'>2018-02-05 18:13:31.315</Data>
<Data Name='ProcessGuid'>{49F1AF32-1053-5A78-0000-00109473DD01}</Data>
<Data Name='ProcessId'>2608</Data>
<Data Name='Image'>C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe</Data>
<Data Name='ImageLoaded'>\\VBOXSVR\golang-win32\src\win32\wevtapi\test\test.test.exe</Data>
<Data Name='FileVersion'>?</Data>
<Data Name='Description'>?</Data>
<Data Name='Product'>?</Data>
<Data Name='Company'>?</Data>
<Data Name='Hashes'>SHA1=62E6250F800ADE743C98B342F4C905C8E64B4A4A,MD5=8E730B5B358DCE3F9F2E773D87BA50F0,SHA256=BA54DDEDFFE1178CA9AD367C286D753A17FD911DC52ED644F73EF0237FC55F84,IMPHASH=2C53CF70BB7ACD75FD60D941F68E3B77</Data>
<Data Name='Signed'>false</Data>
<Data Name='Signature'></Data>
<Data Name='SignatureStatus'>Unavailable</Data>
</EventData>
<System>
<Provider Name='Microsoft-Windows-Sysmon' Guid='{5770385F-C22A-43E0-BF4C-06F5698FFBD9}'/>
<EventID>7</EventID>
<Version>3</Version>
<Level>4</Level>
<Task>7</Task>
<Opcode>0</Opcode>
<Keywords>0x8000000000000000</Keywords>
<TimeCreated SystemTime='2018-02-05T18:13:31.511688000Z'/>
<EventRecordID>13185699</EventRecordID>
<Correlation/>
<Execution ProcessID='1404' ThreadID='1872'/>
<Channel>Microsoft-Windows-Sysmon/Operational</Channel>
<Computer>GenEric-PC</Computer>
<Security UserID='S-1-5-18'/>
</System>
</Event>
Under the XML root <Event>, we notice two nodes: <EventData> and <System>.
The <System> node contains global information about the event and can be seen as its metadata. For instance, under this node we find the <Channel>, which identifies the source of the event, and the <EventID>, which characterizes the type of the event. The pair formed by the <Channel> and the <EventID> uniquely identifies a type of Windows event; for instance, the event above is a Sysmon ImageLoad event (EventID 7 in the Microsoft-Windows-Sysmon/Operational channel). One can also find other useful information, such as the time at which the event was created in the Windows event logging system, which is usually slightly different from the time at which the event actually occurred (in the example above, TimeCreated is 18:13:31.511 while the Sysmon UtcTime field reads 18:13:31.315).
The <EventData> node contains information specific to the kind of event, so each type of event has its own <EventData> definition. Taking the above example as a reference, any other Sysmon ImageLoad event will have exactly the same <Data> nodes, only with different values. On the other hand, a different Windows event, such as the well-known Security successful logon event (EventID 4624), would have a completely different <EventData> definition, while the shape of the <System> section would remain the same.
While the XML format is human-readable, one may prefer JSON objects for better interoperability. Since there is no one-to-one mapping between XML and JSON, we propose the following JSON translation of the XML event shown previously.
{
"Event":{
"EventData":{
"UtcTime":"2018-02-05 18:13:31.315",
"ProcessGuid":"{49F1AF32-1053-5A78-0000-00109473DD01}",
"ProcessId":"2608",
"Image":"C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe",
"ImageLoaded":"\\\\VBOXSVR\\golang-win32\\src\\win32\\wevtapi\\test\\test.test.exe",
"FileVersion":"?",
"Description":"?",
"Product":"?",
"Company":"?",
"Hashes":"SHA1=62E6250F800ADE743C98B342F4C905C8E64B4A4A,MD5=8E730B5B358DCE3F9F2E773D87BA50F0,SHA256=BA54DDEDFFE1178CA9AD367C286D753A17FD911DC52ED644F73EF0237FC55F84,IMPHASH=2C53CF70BB7ACD75FD60D941F68E3B77",
"Signed":"false",
"Signature":"",
"SignatureStatus":"Unavailable"
},
"System":{
"Provider":{
"Name":"Microsoft-Windows-Sysmon",
"Guid":"{5770385F-C22A-43E0-BF4C-06F5698FFBD9}"
},
"EventID":"7",
"Version":"3",
"Level":"4",
"Task":"7",
"Opcode":"0",
"Keywords":"0x8000000000000000",
"TimeCreated":{
"SystemTime":"2018-02-05T18:13:31.511688000Z"
},
"EventRecordID":"13185699",
"Correlation":{},
"Execution":{
"ProcessID":"1404",
"ThreadID":"1872"},
"Channel":"Microsoft-Windows-Sysmon/Operational",
"Computer":"GenEric-PC",
"Security":{
"UserID":"S-1-5-18"
}
}
}
}
We introduced the above JSON format because we designed our engine to handle it natively.
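To illustrate the translation rule for the EventData part, here is a minimal Go sketch using only the standard encoding/xml and encoding/json packages: each Data node becomes a string entry keyed by its Name attribute. It is not Gene's converter. The names xmlEvent, xmlData and toJSON are hypothetical, and the sketch only keeps three System fields, whereas the full translation above also turns attributes (Provider, TimeCreated, Execution, …) into nested objects.

```go
package main

import (
	"encoding/json"
	"encoding/xml"
	"fmt"
)

// xmlData mirrors a Data node: its Name attribute and its text content.
type xmlData struct {
	Name  string `xml:"Name,attr"`
	Value string `xml:",chardata"`
}

// xmlEvent keeps only the parts of the event used in this sketch.
// Tags without a namespace match elements by local name, so the default
// event namespace declared on the root element is accepted.
type xmlEvent struct {
	EventData struct {
		Data []xmlData `xml:"Data"`
	} `xml:"EventData"`
	System struct {
		EventID  string `xml:"EventID"`
		Channel  string `xml:"Channel"`
		Computer string `xml:"Computer"`
	} `xml:"System"`
}

// toJSON converts an XML event into the nested Event/EventData/System layout.
func toJSON(raw []byte) ([]byte, error) {
	var ev xmlEvent
	if err := xml.Unmarshal(raw, &ev); err != nil {
		return nil, err
	}
	eventData := make(map[string]string)
	for _, d := range ev.EventData.Data {
		eventData[d.Name] = d.Value // Data with Name='X' and text v becomes "X": "v"
	}
	system := map[string]string{
		"EventID":  ev.System.EventID,
		"Channel":  ev.System.Channel,
		"Computer": ev.System.Computer,
	}
	out := map[string]any{
		"Event": map[string]any{"EventData": eventData, "System": system},
	}
	return json.Marshal(out)
}

func main() {
	raw := []byte(`<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'>
<EventData><Data Name='ProcessId'>2608</Data><Data Name='Company'>?</Data></EventData>
<System><EventID>7</EventID><Channel>Microsoft-Windows-Sysmon/Operational</Channel><Computer>GenEric-PC</Computer></System>
</Event>`)
	js, err := toJSON(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(js))
}
```

Note that json.Marshal sorts map keys, so the field order differs from the listing above; this is harmless since JSON objects are unordered.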
The rule format
Before designing the engine, we wanted to define the rule format. One of our objectives was to make rules as straightforward as possible to both write and understand. Since we were quite familiar with the Yara tool, we immediately thought of a rule format close to Yara's, but adapted to our application. However, we also wanted to avoid any additional parsing layer, so we chose to encode the rules as JSON objects. Since JSON is widely supported, this choice makes the format easy to parse in any programming language. These requirements make the format well suited to sharing, which fulfills one of our objectives. When designing the rules, we had in mind a format close to the one shown below. Taking the event described in the previous section as an example, the following rule would match it.
{
"Name": "MinimalRule",
"Matches": [
"$comp: Company = '?'",
"$im: ImageLoaded ~= '(?i:\\\\test.test.exe$)'"
],
"Condition": "$comp and $im"
}
To improve both the performance and the features of the engine, we had to add other fields to the rule definition, so a rule now looks more like the following.
{
"Name": "ExtendedRule",
"Tags": ["Test"],
"Meta": {
"EventIDs": [7],
"Channels": ["Microsoft-Windows-Sysmon/Operational"],
"Computers": [],
"Criticality": 5
},
"Matches": [
"$comp: Company = '?'",
"$im: ImageLoaded ~= '(?i:\\\\test.test.exe$)'"
],
"Condition": "$comp and $im"
}
- The Name field is the name of the rule and should be unique across all the rules loaded into the engine.
- The Tags field is an array of tags characterizing the rule. Its aim is to share tags between different rules so that rules can be run selectively according to their tags; all the rules sharing a given tag can be seen as a group.
- The Meta section of the rule contains information used to identify the events the rule should apply to. The more detailed and accurate this section is, the faster the engine will be.
  - Channels is the list of Windows channels where the events to match the rule against can be found. If empty, the rule applies to all channels.
  - EventIDs is the list of event IDs the rule applies to. If empty, the rule applies to all event IDs.
  - Computers is the list of computer names the rule should match on. If empty, the rule applies to all computers.
  - Criticality is the criticality level attributed to an event matching the rule. If an event matches several rules, their Criticality fields are summed.
- The Matches field contains the different matches that can then be used in the Condition.
A Match is of the form $VAR_NAME: OPERAND OPERATOR 'VALUE', where:
- The OPERAND is the field in the EventData section of the event that will be checked against the VALUE.
- So far, the OPERATOR only applies to strings, so the value cannot be typed.
- There are two types of OPERATOR for the Matches:
  - = for a strict match
  - ~= for a regexp match (following Go regexp syntax)
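The Match syntax above is regular enough to be parsed with a single regular expression. The following Go sketch is not Gene's parser: the pattern matchSyntax, the Match type and the ParseMatch and Eval names are our own, and treating a missing EventData field as a non-match is an assumption. Compiling the regexp once, when the rule is loaded, avoids recompiling it for every event.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchSyntax describes "$VAR_NAME: OPERAND OPERATOR 'VALUE'".
var matchSyntax = regexp.MustCompile(`^\$(\w+):\s*(\w+)\s*(~=|=)\s*'(.*)'$`)

// Match is one parsed entry of a rule's Matches list.
type Match struct {
	Name     string         // VAR_NAME, referenced as $VAR_NAME in the Condition
	Operand  string         // field name in the EventData section
	Operator string         // "=" (strict) or "~=" (regexp)
	Value    string         // the quoted VALUE
	re       *regexp.Regexp // compiled once at load time for "~="
}

// ParseMatch parses a single match string; regexps are compiled here so
// that a bad expression is reported when the rule is loaded.
func ParseMatch(s string) (*Match, error) {
	sub := matchSyntax.FindStringSubmatch(s)
	if sub == nil {
		return nil, fmt.Errorf("invalid match %q", s)
	}
	m := &Match{Name: sub[1], Operand: sub[2], Operator: sub[3], Value: sub[4]}
	if m.Operator == "~=" {
		re, err := regexp.Compile(m.Value)
		if err != nil {
			return nil, err
		}
		m.re = re
	}
	return m, nil
}

// Eval checks the Operand field of the event's EventData against Value.
// In this sketch a missing field never matches.
func (m *Match) Eval(eventData map[string]string) bool {
	v, ok := eventData[m.Operand]
	if !ok {
		return false
	}
	if m.Operator == "~=" {
		return m.re.MatchString(v)
	}
	return v == m.Value
}

func main() {
	eventData := map[string]string{
		"Company":     "?",
		"ImageLoaded": `\\VBOXSVR\golang-win32\src\win32\wevtapi\test\test.test.exe`,
	}
	// The match strings as they look once the rule's JSON has been decoded.
	for _, s := range []string{
		`$comp: Company = '?'`,
		`$im: ImageLoaded ~= '(?i:\\test.test.exe$)'`,
	} {
		m, err := ParseMatch(s)
		if err != nil {
			panic(err)
		}
		fmt.Printf("$%s => %v\n", m.Name, m.Eval(eventData)) // both print true
	}
}
```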
The Condition is the logic applied to the Matches in order to trigger the rule:
- () can be used to prioritize matches
- the logical operators 'and' and 'or' can be used to combine matches
- ! can be used to negate a Match. Examples: !$im, !($im and $comp) …
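A Condition is a small boolean expression over the match variables, so a recursive-descent evaluator is enough. This Go sketch takes the outcome of each Match as input (a map keyed by the $-prefixed variable name, our own choice) and evaluates the Condition. The operator precedence, with 'and' binding tighter than 'or', is an assumption of the sketch that the article does not state; the names EvalCondition, parser and tokenize are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// tokenize pads parentheses and "!" with spaces, then splits on whitespace,
// so "!($im and $comp)" becomes ["!" "(" "$im" "and" "$comp" ")"].
func tokenize(cond string) []string {
	r := strings.NewReplacer("(", " ( ", ")", " ) ", "!", " ! ")
	return strings.Fields(r.Replace(cond))
}

// parser is a small recursive-descent evaluator over the tokens.
type parser struct {
	toks    []string
	pos     int
	results map[string]bool // outcome of each Match, e.g. "$im" -> true
}

func (p *parser) peek() string {
	if p.pos < len(p.toks) {
		return p.toks[p.pos]
	}
	return ""
}

func (p *parser) next() string {
	t := p.peek()
	p.pos++
	return t
}

// expr := term { "or" term }
func (p *parser) expr() (bool, error) {
	v, err := p.term()
	for err == nil && p.peek() == "or" {
		p.next()
		var r bool
		r, err = p.term()
		v = v || r
	}
	return v, err
}

// term := factor { "and" factor }, so "and" binds tighter than "or"
func (p *parser) term() (bool, error) {
	v, err := p.factor()
	for err == nil && p.peek() == "and" {
		p.next()
		var r bool
		r, err = p.factor()
		v = v && r
	}
	return v, err
}

// factor := "!" factor | "(" expr ")" | $variable
func (p *parser) factor() (bool, error) {
	switch t := p.next(); {
	case t == "!":
		v, err := p.factor()
		return !v, err
	case t == "(":
		v, err := p.expr()
		if err == nil && p.next() != ")" {
			err = fmt.Errorf("missing closing parenthesis")
		}
		return v, err
	case strings.HasPrefix(t, "$"):
		v, ok := p.results[t]
		if !ok {
			return false, fmt.Errorf("unknown match %q", t)
		}
		return v, nil
	default:
		return false, fmt.Errorf("unexpected token %q", t)
	}
}

// EvalCondition evaluates cond given the outcome of each named Match.
func EvalCondition(cond string, results map[string]bool) (bool, error) {
	p := &parser{toks: tokenize(cond), results: results}
	v, err := p.expr()
	if err == nil && p.pos != len(p.toks) {
		err = fmt.Errorf("unexpected trailing tokens in %q", cond)
	}
	return v, err
}

func main() {
	results := map[string]bool{"$comp": true, "$im": false}
	for _, c := range []string{"$comp and $im", "!($im and $comp)", "!$im"} {
		v, err := EvalCondition(c, results)
		fmt.Println(c, "=>", v, err)
	}
}
```

In a real engine, the Condition would rather be parsed once into a tree when the rule is loaded and then evaluated for each event; evaluating it directly from the tokens keeps the sketch short.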
The engine (a.k.a. Gene)
The engine, as you may have guessed, is the software responsible for parsing all the rules and matching them against the events. Its job is to report any event matching any of the rules. It is important to keep in mind that every event goes through all the rules loaded in the engine, so rules have to be designed accordingly (i.e. with a well-defined Meta section and efficient regular expressions in the Matches section). Hereafter is a pseudo-algorithm of the engine's match function.
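func match(rule, jsonEvent) bool {
    // Handle EventID matching (an empty list means any EventID)
    if rule.Meta.EventIDs is not empty and jsonEvent.EventID not in rule.Meta.EventIDs {
        return false
    }
    // Handle channel matching (an empty list means any channel)
    if rule.Meta.Channels is not empty and jsonEvent.Channel not in rule.Meta.Channels {
        return false
    }
    // Handle computer matching (an empty list means any computer)
    if rule.Meta.Computers is not empty and jsonEvent.Computer not in rule.Meta.Computers {
        return false
    }
    // matchCondition is not detailed here since it is more complicated;
    // it returns true if the Condition matches jsonEvent, false otherwise
    return matchCondition(rule, jsonEvent)
}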
Since the most costly operations are the Matches checks, we use the data defined in the Meta section of the rule to filter out events that are not worth scanning further. It should now be clearer why the Meta section has to contain as much detail as possible: it can be seen as a quick filter for the events going through the engine.
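To make the pseudo-algorithm concrete, here is a minimal runnable Go sketch of the Meta pre-filter alone (Go 1.21 or later, for the standard slices package). The types Meta and EventHeader and the helpers inOrEmpty and passesMeta are hypothetical names; the Condition evaluation is left out. The sketch encodes the rule, stated in the description of the Meta section, that an empty list means "no restriction".

```go
package main

import (
	"fmt"
	"slices"
)

// Meta holds the filtering fields of a rule (names follow the rule format).
type Meta struct {
	EventIDs  []int
	Channels  []string
	Computers []string
}

// EventHeader holds the System fields the filter needs (hypothetical type).
type EventHeader struct {
	EventID  int
	Channel  string
	Computer string
}

// inOrEmpty reports whether v is in list, treating an empty list as "any".
func inOrEmpty[T comparable](list []T, v T) bool {
	return len(list) == 0 || slices.Contains(list, v)
}

// passesMeta is the cheap pre-check run before any Matches are evaluated.
func passesMeta(m Meta, h EventHeader) bool {
	return inOrEmpty(m.EventIDs, h.EventID) &&
		inOrEmpty(m.Channels, h.Channel) &&
		inOrEmpty(m.Computers, h.Computer)
}

func main() {
	meta := Meta{EventIDs: []int{7}, Channels: []string{"Microsoft-Windows-Sysmon/Operational"}}
	imageLoad := EventHeader{7, "Microsoft-Windows-Sysmon/Operational", "GenEric-PC"}
	logon := EventHeader{4624, "Security", "GenEric-PC"}
	fmt.Println(passesMeta(meta, imageLoad)) // true: Computers is empty, so any computer passes
	fmt.Println(passesMeta(meta, logon))     // false: skipped without evaluating Matches
}
```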
The engine is open source and comes with pre-compiled binaries for the most common operating systems, so if you are interested in knowing more about it, we invite you to check out the code on our GitHub.
It’s Time to Play
Now that we have written enough about the concept, it is time for you to play a bit with the tool through a simple example. If you want to learn how to use the engine on an easy use case, please download the playground files as well as the Gene engine for your operating system, and copy everything into the same directory.
The gene-playground.zip archive contains three files:
- event.json: a single Windows event in JSON format
- minimal-rule.gen: a minimal Gene rule that matches event.json
- extended-rule.gen: two rules matching event.json
By default, Gene expects one or more Windows EVTX files as input, but in this example we don’t have any, so don’t forget the -j command-line switch, which instructs the engine to handle JSON-formatted events.
Try running the following commands and look at the outputs. They are written for bash, so feel free to adapt them to your environment.
# Get familiar with the different options
gene -h
# We load only one rule
gene -j -r ./minimal-rule.gen event.json
# We load all the rules in the current directory
gene -j -r ./ event.json
# If we specify - instead of the file, we take stdin
cat event.json | gene -j -r ./ -
You may have noticed that the Windows event printed by Gene is slightly different from the one ingested: it contains an additional section named GeneInfo, holding information about the rules that matched the event, as shown below.
{
"Event": {
"EventData": {"...": "..."},
"GeneInfo": {
"Criticality": 9,
"Signature": [
"ExtendedRule",
"ComplexButUselessRule",
"MinimalRule"
]
},
"System": {"...": "..."}
}
}
The Criticality field is the sum of the Criticality fields of all the rules that matched, capped at 10. The Signature field is an array containing the names of all the rules that matched the given event.
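This aggregation can be sketched in a few lines of Go. It is an illustration, not Gene's code: the matchedRule type and the buildGeneInfo function are hypothetical, the rule names and criticality values in main are made up to show the cap at 10, and the order of names in Gene's actual output may differ from the order used here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// GeneInfo mirrors the section Gene adds to a matching event.
type GeneInfo struct {
	Criticality int      `json:"Criticality"`
	Signature   []string `json:"Signature"`
}

// matchedRule is a hypothetical (name, criticality) pair for a rule that matched.
type matchedRule struct {
	Name        string
	Criticality int
}

const maxCriticality = 10

// buildGeneInfo sums the criticalities of the matching rules, caps the
// total at maxCriticality and records the rule names in the given order.
func buildGeneInfo(matched []matchedRule) GeneInfo {
	info := GeneInfo{Signature: []string{}}
	for _, r := range matched {
		info.Criticality += r.Criticality
		info.Signature = append(info.Signature, r.Name)
	}
	if info.Criticality > maxCriticality {
		info.Criticality = maxCriticality
	}
	return info
}

func main() {
	// Hypothetical criticality values, chosen only to show the capping.
	info := buildGeneInfo([]matchedRule{{"RuleA", 5}, {"RuleB", 4}, {"RuleC", 3}})
	out, err := json.Marshal(info)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // {"Criticality":10,"Signature":["RuleA","RuleB","RuleC"]}
}
```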
Now that you have learnt the basics, you can start playing with Gene on your own. For the record, you can find some Gene rules in our GitHub repository. We have also ported the Gene engine into a Windows host IDS that can be used to report anomalies in real time.
If you like what we do, share it with others; if you do not, share your opinion with us.
We wish you happy analysis, threat hunting and sharing …
References
[1] : https://rawsec.lu/blog/posts/2017/Jun/23/carving-evtx/