Go Evtx SigNature Engine (Gene)

Foreword

This article introduces an engine (a.k.a. Gene) that we designed to match signatures in Windows events. Our motivation came from observations made during several incidents we worked on. The first observation is that Windows has hundreds of different event logs, which makes it very difficult to remember the meaning of all of them. The second is that some event logs merely indicate that something is going wrong on a system, whilst others are clearly the signature of a compromise. Together, these two observations make the study of Windows events very important and sometimes decisive for an analysis.

For instance, if the proper events are enabled, one can easily reconstruct 80% of the story of an incident. On the other hand, it can be very hard to find relevant information in such a large amount of events. We also noticed that the indicator of a compromise is sometimes a very specific event that one has never heard of. This makes the information very difficult to share between the actors of an incident, and even more so at a broader scope (CERT, external actors …). At best, everyone ends up implementing their own tool to search for that specific event. While this approach might be suitable in the rush of an incident, it is neither scalable nor sustainable.

One conclusion from our observations is that some Windows events are definitely IOCs, and we were not aware of any tool capable of checking for them in a generic and efficient way. The other conclusion was that there was no means of sharing knowledge about those IOCs, because there was no appropriate format to do so. This is where our adventure began: we are going to introduce both a rule format and an engine that anyone can use to share and match signatures against Windows events.

Quick reminder about Windows event logs

We won't go into the details of the Microsoft Windows EVTX file format; if you want to dig deeper into this topic, we invite you to read [1] and [2]. We will just quickly remind the reader what a Windows event looks like in a readable format, so that anyone can follow the rest of this article. Since events are stored in BinXML format within EVTX files, it is quite common to represent a Windows event in XML, as shown below.

<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'>
  <EventData>
    <Data Name='UtcTime'>2018-02-05 18:13:31.315</Data>
    <Data Name='ProcessGuid'>{49F1AF32-1053-5A78-0000-00109473DD01}</Data>
    <Data Name='ProcessId'>2608</Data>
    <Data Name='Image'>C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe</Data>
    <Data Name='ImageLoaded'>\\VBOXSVR\golang-win32\src\win32\wevtapi\test\test.test.exe</Data>
    <Data Name='FileVersion'>?</Data>
    <Data Name='Description'>?</Data>
    <Data Name='Product'>?</Data>
    <Data Name='Company'>?</Data>
    <Data Name='Hashes'>SHA1=62E6250F800ADE743C98B342F4C905C8E64B4A4A,MD5=8E730B5B358DCE3F9F2E773D87BA50F0,SHA256=BA54DDEDFFE1178CA9AD367C286D753A17FD911DC52ED644F73EF0237FC55F84,IMPHASH=2C53CF70BB7ACD75FD60D941F68E3B77</Data>
    <Data Name='Signed'>false</Data>
    <Data Name='Signature'></Data>
    <Data Name='SignatureStatus'>Unavailable</Data>
  </EventData>
  <System>
    <Provider Name='Microsoft-Windows-Sysmon' Guid='{5770385F-C22A-43E0-BF4C-06F5698FFBD9}'/>
    <EventID>7</EventID>
    <Version>3</Version>
    <Level>4</Level>
    <Task>7</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8000000000000000</Keywords>
    <TimeCreated SystemTime='2018-02-05T18:13:31.511688000Z'/>
    <EventRecordID>13185699</EventRecordID>
    <Correlation/>
    <Execution ProcessID='1404' ThreadID='1872'/>
    <Channel>Microsoft-Windows-Sysmon/Operational</Channel>
    <Computer>GenEric-PC</Computer>
    <Security UserID='S-1-5-18'/>
  </System>
</Event>

Under the XML root <Event>, we find two nodes: <EventData> and <System>. The <System> node contains global information about the event and can be seen as the event's metadata. For instance, under this node we find the <Channel>, identifying the source of the event, and the <EventID>, characterizing the type of the event. The pair formed by the <Channel> and the <EventID> uniquely identifies a type of Windows event; the event above, for instance, is a Sysmon ImageLoad event. One can also find other useful information, such as the time at which the event was created in the Windows event logging system, which is most of the time slightly different from the time at which the event actually occurred.

The <EventData> node contains information specific to the kind of event, so each type of event has its own <EventData> definition. Taking the above example as reference, any other Sysmon ImageLoad event will have exactly the same <Data> nodes, with of course different values. Conversely, a different Windows event such as the well-known Security Successful Logon (EventID: 4624) has a completely different <EventData> definition, while the shape of its <System> section is the same.
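To make this structure concrete, here is a minimal Go sketch (not Gene's code) that decodes an XML-rendered event with the standard encoding/xml package, reads the (Channel, EventID) pair from <System> and flattens the <Data> nodes of <EventData> into a Name to value map. The XMLEvent and Data types, the parseEvent and Fields helpers and the trimmed-down sample are assumptions made for illustration; only the fields discussed above are modelled.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Data is one <Data Name='...'>value</Data> node of <EventData>.
type Data struct {
	Name  string `xml:"Name,attr"`
	Value string `xml:",chardata"`
}

// XMLEvent models only the parts of a Windows event used in this sketch
// (hypothetical type, not taken from Gene).
type XMLEvent struct {
	EventData []Data `xml:"EventData>Data"`
	System    struct {
		EventID  int    `xml:"EventID"`
		Channel  string `xml:"Channel"`
		Computer string `xml:"Computer"`
	} `xml:"System"`
}

// parseEvent decodes an XML-rendered Windows event.
func parseEvent(raw []byte) (*XMLEvent, error) {
	var ev XMLEvent
	if err := xml.Unmarshal(raw, &ev); err != nil {
		return nil, err
	}
	return &ev, nil
}

// Fields flattens the <Data> nodes into a Name -> value map.
func (e *XMLEvent) Fields() map[string]string {
	m := make(map[string]string, len(e.EventData))
	for _, d := range e.EventData {
		m[d.Name] = d.Value
	}
	return m
}

// A trimmed-down version of the Sysmon ImageLoad event shown above.
const sample = `<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'>
  <EventData>
    <Data Name='ProcessId'>2608</Data>
    <Data Name='Company'>?</Data>
  </EventData>
  <System>
    <EventID>7</EventID>
    <Channel>Microsoft-Windows-Sysmon/Operational</Channel>
    <Computer>GenEric-PC</Computer>
  </System>
</Event>`

func main() {
	ev, err := parseEvent([]byte(sample))
	if err != nil {
		panic(err)
	}
	// The (Channel, EventID) pair identifies the kind of event.
	fmt.Println(ev.System.Channel, ev.System.EventID)
	fmt.Println(ev.Fields()["Company"])
}
```

Struct tags without a namespace match elements in any namespace, so the default xmlns declared on <Event> does not get in the way.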

While the XML format is human readable, one may prefer JSON objects for better interoperability. Since there is no one-to-one translation between XML and JSON, we propose the following JSON translation of the XML event shown previously.

{
  "Event":{
    "EventData":{
      "UtcTime":"2018-02-05 18:13:31.315",
      "ProcessGuid":"{49F1AF32-1053-5A78-0000-00109473DD01}",
      "ProcessId":"2608",
      "Image":"C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe",
      "ImageLoaded":"\\\\VBOXSVR\\golang-win32\\src\\win32\\wevtapi\\test\\test.test.exe",
      "FileVersion":"?",
      "Description":"?",
      "Product":"?",
      "Company":"?",
      "Hashes":"SHA1=62E6250F800ADE743C98B342F4C905C8E64B4A4A,MD5=8E730B5B358DCE3F9F2E773D87BA50F0,SHA256=BA54DDEDFFE1178CA9AD367C286D753A17FD911DC52ED644F73EF0237FC55F84,IMPHASH=2C53CF70BB7ACD75FD60D941F68E3B77",
      "Signed":"false",
      "Signature":"",
      "SignatureStatus":"Unavailable"
    },
    "System":{
      "Provider":{
        "Name":"Microsoft-Windows-Sysmon",
        "Guid":"{5770385F-C22A-43E0-BF4C-06F5698FFBD9}"
      },
      "EventID":"7",
      "Version":"3",
      "Level":"4",
      "Task":"7",
      "Opcode":"0",
      "Keywords":"0x8000000000000000",
      "TimeCreated":{
        "SystemTime":"2018-02-05T18:13:31.511688000Z"
      },
      "EventRecordID":"13185699",
      "Correlation":{},
      "Execution":{
        "ProcessID":"1404",
        "ThreadID":"1872"
      },
      "Channel":"Microsoft-Windows-Sysmon/Operational",
      "Computer":"GenEric-PC",
      "Security":{
        "UserID":"S-1-5-18"
      }
    }
  }
}

We introduce this JSON format because our engine has also been designed to handle it natively.

The rule format

Before designing the engine, we wanted to define the rule format. One of our objectives was to make it as straightforward as possible to both write and understand. Since we were quite familiar with the Yara tool, we immediately thought of a rule format close to the one used by Yara but adapted to our use case. However, we also wanted to avoid any additional layer of parsing, so we chose to encode the rules as JSON objects. This choice makes the format easy to parse from any programming language, since JSON is widely supported, and therefore well suited to sharing, which was one of our objectives. When we first designed the rules, we had in mind a format close to the one shown below. Taking the event described in the previous section as an example, the following rule matches it.

{
  "Name": "MinimalRule",
  "Matches": [
    "$comp: Company = '?'",
    "$im: ImageLoaded ~= '(?i:\\\\test.test.exe$)'"
  ],
  "Condition": "$comp and $im"
}

In order to improve both the performance and the features of the engine, we had to add other fields to the rule definition, and the rule format now looks more like the following.

{
  "Name": "ExtendedRule",
  "Tags": ["Test"],
  "Meta": {
    "EventIDs": [7],
    "Channels": ["Microsoft-Windows-Sysmon/Operational"],
    "Computers": [],
    "Criticality": 5
  },
  "Matches": [
    "$comp: Company = '?'",
    "$im: ImageLoaded ~= '(?i:\\\\test.test.exe$)'"
  ],
  "Condition": "$comp and $im"
}

The Name field is the name of the rule and should be unique across all the rules loaded into the engine.

The Tags field is an array of tags characterizing the rule. The aim of this field is to share tags between different rules so that we can selectively run rules according to their tags. All rules sharing a given tag can be seen as a group.
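Selecting a group of rules by tag boils down to a set-membership test. The Go sketch below is hypothetical (taggedRule and selectByTags are not Gene names); it assumes exact, case-sensitive tag comparison and treats an empty tag list as "run every rule", which may differ from the actual engine.

```go
package main

import "fmt"

// taggedRule keeps only what this example needs (hypothetical type).
type taggedRule struct {
	Name string
	Tags []string
}

// selectByTags returns the rules carrying at least one of the wanted tags.
// Assumption: an empty wanted list selects every rule.
func selectByTags(rules []taggedRule, wanted ...string) []taggedRule {
	if len(wanted) == 0 {
		return rules
	}
	want := make(map[string]bool, len(wanted))
	for _, t := range wanted {
		want[t] = true
	}
	var out []taggedRule
	for _, r := range rules {
		for _, t := range r.Tags {
			if want[t] {
				out = append(out, r)
				break // one shared tag is enough
			}
		}
	}
	return out
}

func main() {
	rules := []taggedRule{
		{Name: "ExtendedRule", Tags: []string{"Test"}},
		{Name: "MinimalRule"},
	}
	for _, r := range selectByTags(rules, "Test") {
		fmt.Println(r.Name)
	}
}
```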

The Meta section of the rule contains information used to identify the events the rule should apply to. The more detailed and accurate this section is, the faster the engine will be.

  • Channels is the list of Windows channels containing the events the rule should be matched against. If empty, the rule applies to all channels.
  • EventIDs is the list of Event IDs the rule applies to. If empty, the rule applies to all Event IDs.
  • Computers is the list of computer names the rule applies to. If empty, the rule applies to all computers.
  • Criticality is the criticality level attributed to an event matching the rule. If an event matches several rules, their Criticality values are summed, with the total capped at 10.

The Matches section contains the different matches that can be referenced in the Condition. A Match takes the form $VAR_NAME: OPERAND OPERATOR 'VALUE', where:

  • OPERAND is the field of the event's EventData section that is checked against the VALUE.
  • So far, OPERATORs only apply to strings, so the VALUE cannot be typed.
  • There are two OPERATORs: = for strict string equality and ~= for regular expression matching, as used in the rules above.
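The following Go sketch shows one way to parse a Match string into its four parts and evaluate it against an event's EventData fields, using the = and ~= operators that appear in the example rules. It is a hypothetical illustration, not Gene's parser. The parsing regex, the Match type and the choice that a field absent from the event never matches are all our assumptions. Regular expressions are compiled once at load time, so matching an event only pays for the regex execution.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchSyntax captures $VAR_NAME, OPERAND, OPERATOR (= or ~=) and 'VALUE'.
var matchSyntax = regexp.MustCompile(`^(\$\w+):\s*([^\s=~]+)\s*(~?=)\s*'(.*)'$`)

// Match is a hypothetical parsed form of one entry of the Matches section.
type Match struct {
	Var, Operand, Operator, Value string
	re                            *regexp.Regexp // compiled once when Operator is ~=
}

func parseMatch(s string) (*Match, error) {
	sub := matchSyntax.FindStringSubmatch(s)
	if sub == nil {
		return nil, fmt.Errorf("invalid match: %q", s)
	}
	m := &Match{Var: sub[1], Operand: sub[2], Operator: sub[3], Value: sub[4]}
	if m.Operator == "~=" {
		re, err := regexp.Compile(m.Value)
		if err != nil {
			return nil, err
		}
		m.re = re
	}
	return m, nil
}

// Eval checks the match against an event's EventData fields.
// Assumption: a field absent from the event never matches.
func (m *Match) Eval(data map[string]string) bool {
	v, ok := data[m.Operand]
	if !ok {
		return false
	}
	if m.re != nil {
		return m.re.MatchString(v)
	}
	return v == m.Value // "=" is a strict string comparison
}

func main() {
	data := map[string]string{
		"Company":     "?",
		"ImageLoaded": `\\VBOXSVR\golang-win32\src\win32\wevtapi\test\test.test.exe`,
	}
	for _, s := range []string{
		`$comp: Company = '?'`,
		`$im: ImageLoaded ~= '(?i:\\test.test.exe$)'`,
	} {
		m, err := parseMatch(s)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s -> %v\n", m.Var, m.Eval(data))
	}
}
```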

The Condition is the logic applied to the Matches in order to trigger the rule:

  • parentheses () can be used to group matches and set precedence
  • the logical operators and and or can be used to combine matches
  • ! can be used to negate a Match. Examples: !$im, !($im and $comp)
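Once every Match has been evaluated to true or false, the Condition is a small boolean expression. Below is a hypothetical recursive-descent evaluator in Go for the grammar described above (!, and, or, parentheses). It is not Gene's parser. In particular, giving and higher precedence than or is our assumption (the usual convention), so explicit parentheses remain the safest way to write conditions. Unknown match names and malformed expressions are reported as errors.

```go
package main

import (
	"fmt"
	"strings"
)

// tokenize pads parentheses and "!" with spaces, then splits on whitespace,
// so "!($im and $comp)" becomes [! ( $im and $comp )].
func tokenize(cond string) []string {
	pad := strings.NewReplacer("(", " ( ", ")", " ) ", "!", " ! ")
	return strings.Fields(pad.Replace(cond))
}

// condParser is a hypothetical evaluator over the tokens, given the boolean
// result of every Match (keyed by "$name").
type condParser struct {
	toks    []string
	pos     int
	results map[string]bool
}

func (p *condParser) peek() string {
	if p.pos < len(p.toks) {
		return p.toks[p.pos]
	}
	return ""
}

func (p *condParser) next() string {
	t := p.peek()
	p.pos++
	return t
}

// expr := term ("or" term)*
func (p *condParser) expr() (bool, error) {
	v, err := p.term()
	for err == nil && p.peek() == "or" {
		p.next()
		var rhs bool
		rhs, err = p.term()
		v = v || rhs
	}
	return v, err
}

// term := factor ("and" factor)*   (assumption: "and" binds tighter than "or")
func (p *condParser) term() (bool, error) {
	v, err := p.factor()
	for err == nil && p.peek() == "and" {
		p.next()
		var rhs bool
		rhs, err = p.factor()
		v = v && rhs
	}
	return v, err
}

// factor := "!" factor | "(" expr ")" | $name
func (p *condParser) factor() (bool, error) {
	switch t := p.next(); {
	case t == "!":
		v, err := p.factor()
		return !v, err
	case t == "(":
		v, err := p.expr()
		if err == nil && p.next() != ")" {
			err = fmt.Errorf("missing closing parenthesis")
		}
		return v, err
	case strings.HasPrefix(t, "$"):
		v, ok := p.results[t]
		if !ok {
			return false, fmt.Errorf("unknown match %q", t)
		}
		return v, nil
	default:
		return false, fmt.Errorf("unexpected token %q", t)
	}
}

// evalCondition evaluates a whole Condition string.
func evalCondition(cond string, results map[string]bool) (bool, error) {
	p := &condParser{toks: tokenize(cond), results: results}
	v, err := p.expr()
	if err == nil && p.pos != len(p.toks) {
		err = fmt.Errorf("unexpected trailing token %q", p.peek())
	}
	return v, err
}

func main() {
	results := map[string]bool{"$comp": true, "$im": false}
	for _, c := range []string{"$comp and $im", "$comp or $im", "!($im and $comp)"} {
		v, err := evalCondition(c, results)
		fmt.Println(c, "=>", v, err)
	}
}
```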

The engine (a.k.a Gene)

The engine, as you have guessed, is the software responsible for parsing the rules and matching them against the events. Its job is to report any event matching any of the rules. What is important to consider here is that every event goes through all the rules loaded in the engine, so the rules have to be designed accordingly (i.e. with a well-defined Meta section and efficient regular expressions in the Matches section). Below is the pseudocode of the engine's match function.

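func match(rule, jsonEvent) bool {

  // Handle EventID matching (an empty EventIDs list matches any EventID)
  if rule.EventIDs is not empty and jsonEvent.EventID not in rule.EventIDs {
    return false
  }

  // Handle channel matching (an empty Channels list matches any channel)
  if rule.Channels is not empty and jsonEvent.Channel not in rule.Channels {
    return false
  }

  // Handle computer matching (an empty Computers list matches any computer)
  if rule.Computers is not empty and jsonEvent.Computer not in rule.Computers {
    return false
  }

  // matchCondition is not detailed here since it is more complicated;
  // it returns true if the rule's Condition holds for jsonEvent, else false
  return matchCondition(rule, jsonEvent)
}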

Since the most costly operations are the Matches checks, we use the data defined in the Meta section of the rule to filter out events that do not need to be scanned further. It should now be clearer why the Meta section has to contain as much detail as possible: it acts as a quick filter for the events going through the engine.
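A runnable take on this quick filter is sketched below in Go (1.18+ for generics). It is an illustration under our own assumptions, not Gene's implementation. Each Meta list is precompiled into a set when the rule is loaded, a nil set stands for "no restriction" (the empty-list semantics of the Meta section), and each event then costs at most three map lookups before any Match is evaluated. The Event, metaFilter, toSet, allows and newMetaFilter names are hypothetical.

```go
package main

import "fmt"

// Event holds only the System fields used by the quick filter (hypothetical type).
type Event struct {
	EventID  int64
	Channel  string
	Computer string
}

// metaFilter is a hypothetical precompiled form of a rule's Meta section.
// A nil set means "no restriction", mirroring the empty-list semantics.
type metaFilter struct {
	eventIDs  map[int64]bool
	channels  map[string]bool
	computers map[string]bool
}

// toSet turns a Meta list into a lookup set; an empty list yields nil.
func toSet[T comparable](items []T) map[T]bool {
	if len(items) == 0 {
		return nil
	}
	s := make(map[T]bool, len(items))
	for _, it := range items {
		s[it] = true
	}
	return s
}

func newMetaFilter(eventIDs []int64, channels, computers []string) metaFilter {
	return metaFilter{toSet(eventIDs), toSet(channels), toSet(computers)}
}

// allows reports whether set is unrestricted (nil) or contains v.
func allows[T comparable](set map[T]bool, v T) bool {
	return set == nil || set[v]
}

// pass is the cheap pre-check: only events passing it would go on to the
// costly Matches/Condition evaluation.
func (f metaFilter) pass(e Event) bool {
	return allows(f.eventIDs, e.EventID) &&
		allows(f.channels, e.Channel) &&
		allows(f.computers, e.Computer)
}

func main() {
	f := newMetaFilter([]int64{7}, []string{"Microsoft-Windows-Sysmon/Operational"}, nil)
	fmt.Println(f.pass(Event{EventID: 7, Channel: "Microsoft-Windows-Sysmon/Operational", Computer: "GenEric-PC"}))
	fmt.Println(f.pass(Event{EventID: 4624, Channel: "Security", Computer: "GenEric-PC"}))
}
```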

The engine is open source and comes with pre-compiled binaries for the most common operating systems, so if you are interested in knowing more about it, we invite you to check out the code on our GitHub.

It’s Time to Play

Now that we have written enough about the concept, it is time for you to play a bit with the tool through a simple example. To learn how to use the engine on an easy use case, download the playground files as well as the Gene engine for your operating system, and copy everything into the same directory.

The gene-playground.zip archive contains three files:

  • event.json: a single Windows event in JSON format
  • minimal-rule.gen: a minimal Gene rule that matches event.json
  • extended-rule.gen: contains two rules matching event.json

By default, Gene expects one or more Windows EVTX files as input, but in this example we don't have any, so don't forget the -j command line switch to instruct the engine to handle JSON-formatted events.

Try running the following command lines and look at the outputs. They are written for bash, so feel free to adapt them to your environment.

# Get familiar with the different options
gene -h

# We load only one rule
gene -j -r ./minimal-rule.gen event.json

# We load all the rules in the current directory
gene -j -r ./ event.json

# If we specify - instead of the file, we take stdin
cat event.json | gene -j -r ./ -

You should notice that the Windows event printed by Gene is slightly different from the one ingested: there is an additional section named GeneInfo containing information about the rules that matched the event, as shown below.

{
  "Event": {
    "EventData": {"...": "..."},
    "GeneInfo": {
      "Criticality": 9,
      "Signature": [
        "ExtendedRule",
        "ComplexButUselessRule",
        "MinimalRule"
      ]
    },
    "System": {"...": "..."}
  }
}

The Criticality is the sum of the Criticality fields of all the rules that matched, capped at 10. The Signature field is an array containing the names of all the rules that matched the given event.

Now that you have learnt the basics, you can start playing with Gene on your own. For the record, you can find some Gene rules in our GitHub repository. We have also ported the Gene engine into a Windows Host IDS, which can be used to report anomalies in real time.

If you like what we do, share it with others and if you do not, share your opinion with us.

We wish you happy analysis, threat hunting and sharing …

References

[1] : https://rawsec.lu/blog/posts/2017/Jun/23/carving-evtx/

[2] : https://github.com/libyal/libevtx/blob/master/documentation/Windows%20XML%20Event%20Log%20(EVTX).asciidoc