*************
Writing Rules
*************

Gene rules are designed to be simple to understand and to parse.
To avoid developing a custom rule parser, rules are stored in **JSON**
documents. Every rule therefore has to follow the specific format
described on this page to work properly.


Getting the format supported by the engine
==========================================

A starting point to understand the format of a rule is to use the ``gene`` command
line utility to get the format supported by your engine::

    gene -template

This command returns a **JSON** document containing all the fields used
by your current version of ``gene``.

.. note::
  When dealing with **JSON** documents on the command line, ``jq`` is a very
  handy utility that is worth trying if you do not know it yet. If you are
  running Linux you can probably install it from your package manager,
  otherwise visit `jq <https://stedolan.github.io/jq/>`_.

Rule Structure
==============

.. code-block:: JSON

  {
    "Name": "",
    "Tags": [],
    "Meta": {
      "Events": {},
      "Computers": [],
      "ATTACK": [
        {
          "ID": "",
          "Tactic": "",
          "Reference": ""
        }
      ],
      "Criticality": 0,
      "Disable": false,
      "Filter": false,
      "Schema": "2.0.0"
    },
    "Matches": [],
    "Condition": "",
    "Actions": []
  }

.. note::
  The fields present in the template shown above are the ones used by the engine.
  **Any** additional field is ignored by the engine, which makes it possible to
  document the rule. It is good practice to add information such as **Author**,
  **Comments** and, where relevant, **Links** in the **Meta** section.
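
As an illustration of this note, the following Python sketch separates the
**Meta** fields used by the engine from the ones kept for documentation. The set
of engine fields is copied from the template above. The helper name
``documentation_fields`` is hypothetical and is not part of Gene.

.. code-block:: python

  import json

  # Meta field names copied from the template shown above.
  ENGINE_META_FIELDS = {
      "Events", "Computers", "ATTACK",
      "Criticality", "Disable", "Filter", "Schema",
  }


  def documentation_fields(rule):
      """Return the Meta keys that are not listed in the template."""
      return sorted(set(rule.get("Meta", {})) - ENGINE_META_FIELDS)


  rule = json.loads("""
  {
    "Name": "HeurDropper",
    "Meta": {
      "Criticality": 8,
      "Author": "0xrawsec",
      "Comments": "Experimental rule",
      "Schema": "2.0.0"
    }
  }
  """)

  print(documentation_fields(rule))  # ['Author', 'Comments']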


.. table:: Field Definition

  +------------+----------------------+----------------------------------------------------+
  | Field      | Type                 | Description                                        |
  +============+======================+====================================================+
  | Name       | string               | Name of the rule                                   |
  |            |                      |                                                    |
  +------------+----------------------+----------------------------------------------------+
  | Tags       | []string             |  Contains a list of tags related to the rule. It   |
  |            |                      |  can be used to group rules                        |
  |            |                      |  according to their tag(s).                        |
  +------------+----------------------+----------------------------------------------------+
  | Meta       | dict                 | Contains information related to the trigger of the |
  |            |                      | rule. This information is matched against the      |
  |            |                      | "System" section of Windows events to speed up     |
  |            |                      | matching.                                          |
  +------------+----------------------+----------------------------------------------------+
  | Events     | map[string][]int     | Maps a Windows event channel to the list of Event  |
  |            |                      | IDs the rule should match against. If a list is    |
  |            |                      | empty, the rule applies to any Event ID of that    |
  |            |                      | channel.                                           |
  +------------+----------------------+----------------------------------------------------+
  | Computers  | []string             | List of computer names the rule should apply on.   |
  |            |                      | If empty, the rule applies on all the computers.   |
  +------------+----------------------+----------------------------------------------------+
  | ATTACK     | []map[string]string  | List of ATT&CK techniques corresponding to the     |
  |            |                      | detection rule. See `MITRE ATT&CK Integration`_.   |
  +------------+----------------------+----------------------------------------------------+
  |Criticality | 0 < int <= 10        | The criticality level attributed to the events     |
  |            |                      | matching the rule. If an event matches several     |
  |            |                      | rules, their criticality levels are summed; the    |
  |            |                      | total never goes above 10.                         |
  +------------+----------------------+----------------------------------------------------+
  | Disable    | bool                 | Boolean value used to disable the rule.            |
  +------------+----------------------+----------------------------------------------------+
  | Filter     | bool                 | Boolean value used to flag this rule as being a    |
  |            |                      | **filter**. A filter rule is used to filter in     |
  |            |                      | some wanted events without assigning any           |
  |            |                      | criticality to them. It can be used to show events |
  |            |                      | bringing contextual information.                   |
  +------------+----------------------+----------------------------------------------------+
  |Schema      | string               | **Schema** version of the rule. This has been      |
  |            |                      | introduced to solve incompatibility issues between |
  |            |                      | the engine and the rules.                          |
  +------------+----------------------+----------------------------------------------------+
  | Matches    | []string             | List of **Matches**, should follow the syntax of   |
  |            |                      | `Matches Format`_                                  |
  +------------+----------------------+----------------------------------------------------+
  | Condition  | string               | String implementing the logic on the **Matches** to|
  |            |                      | trigger the rule. The syntax should be compliant   |
  |            |                      | with `Condition Format`_                           |
  +------------+----------------------+----------------------------------------------------+
  | Actions    | []string             | This field is used to encode **Actions** to be     |
  |            |                      | taken when the rule triggers. It is up to the code |
  |            |                      | making use of the Gene engine to implement         |
  |            |                      | **action handlers**. Gene command-line utility does|
  |            |                      | not implement any **action handler**.              |
  +------------+----------------------+----------------------------------------------------+
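
The way criticality levels combine can be sketched as follows. This is a minimal
Python illustration of the **Criticality** and **Filter** descriptions above, not
the engine's code. The function name ``event_criticality`` is hypothetical, and
skipping **Filter** rules is an assumption derived from the field description.

.. code-block:: python

  MAX_CRITICALITY = 10


  def event_criticality(matching_rules):
      """Sum the criticality of every matching rule, capped at 10.

      Assumption: filter rules are skipped, since they are described as
      not assigning any criticality.
      """
      total = sum(
          rule["Meta"].get("Criticality", 0)
          for rule in matching_rules
          if not rule["Meta"].get("Filter", False)
      )
      return min(total, MAX_CRITICALITY)


  rules = [
      {"Name": "EventClearing", "Meta": {"Criticality": 8}},
      {"Name": "ExplicitNetworkLogon", "Meta": {"Criticality": 5}},
      {"Name": "ContextOnly", "Meta": {"Filter": True}},
  ]
  print(event_criticality(rules))  # 10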

.. important::
  The more precise the **Events** field, the faster the rule.
  This information is used to pre-filter relevant events.
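
The pre-filtering step can be pictured with the following Python sketch. It only
illustrates the **Meta** field descriptions above. The function name
``rule_applies`` and its parameters are hypothetical, and treating an empty
**Events** map as "no restriction" is an assumption.

.. code-block:: python

  def rule_applies(meta, channel, event_id, computer):
      """Return True if an event passes the Meta pre-filter of a rule."""
      if meta.get("Disable", False):
          return False

      # An empty Computers list means the rule applies to all computers.
      computers = meta.get("Computers") or []
      if computers and computer not in computers:
          return False

      events = meta.get("Events") or {}
      if not events:
          # Assumption: an empty Events map does not restrict anything.
          return True
      if channel not in events:
          return False
      # An empty list means any Event ID of that channel.
      ids = events[channel]
      return not ids or event_id in ids


  meta = {"Events": {"Microsoft-Windows-Sysmon/Operational": [1]}}
  print(rule_applies(meta, "Microsoft-Windows-Sysmon/Operational", 1, "PC1"))  # True
  print(rule_applies(meta, "Security", 4624, "PC1"))  # False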

Matches Format
--------------

A **Match** can be seen as an atomic check performed on every Windows Event
(pre-filtered using the **Meta** section of the rule) going through the engine. Every
match can be referenced one or more times in the **Condition** to create complex
matching rules. Currently, the latest version of the engine supports two kinds of
**Matches**.

.. important::
  It is very important to remember that **Matches** only apply to the fields
  located under the ``EventData`` section of Windows Events.

Field Matches
^^^^^^^^^^^^^

.. warning::
  **Indirect Match** expressions are only available since **v1.6**

A **Field Match** is basically an **equality** or a **regex** check done on a
given **field value**. This kind of **Match** brings flexibility to the engine since
anything can be matched through regular expressions. **Field Matches** come in two
flavours, namely **Direct** and **Indirect**. A **Direct Match** is used to match
against values (regex, strings ...) known in advance when the rule is written.
An **Indirect Match** aims at matching against a value present in another field of the
event.

**Direct Match Syntax:** ``$VAR_NAME: FIELD OPERATOR 'VALUE'``

**Indirect Match Syntax:** ``$VAR_NAME: FIELD = @OTHER_FIELD``

.. table:: Field Match Symbols Definition

  +---------------------+----------------------------------------------------------------+
  | Symbols             | Description                                                    |
  +=====================+================================================================+
  | VAR_NAME            | Name of the variable used to access the result of the **Match**|
  |                     | in the **Condition**, it must be preceded by a ``$``           |
  +---------------------+----------------------------------------------------------------+
  | FIELD | OTHER_FIELD | Field to match with in ``EventData`` section of Windows Events |
  +---------------------+----------------------------------------------------------------+
  | OPERATOR            | Operator to use for the match:                                 |
  |                     |  * ``=`` : equal operator                                      |
  |                     |  * ``~=`` : regexp operator (tells to compile VALUE as a regex)|
  |                     |  * ``>`` : greater than operator (only for `int` fields)       |
  |                     |  * ``<`` : lower than operator (only for `int` fields)         |
  |                     |  * ``&=`` : test flag operator expects the field to be an `int`|
  +---------------------+----------------------------------------------------------------+
  | VALUE               | Must be surrounded by **single quotes** ``'``. This is the     |
  |                     | **value/regex** to match against to make **$VAR_NAME = true**  |
  +---------------------+----------------------------------------------------------------+

Match Workflow::

            +-------+               +---------+
            | Event |               |  Match  |
            +-------+               +---------+
                |      +----------+      |
                +----> |  Engine  | <----+
                       +----------+
                             |
               +---------------------------+
               | Extracts value from FIELD |
               +---------------------------+
                             |
               +---------------------------+
               |   Does value match VALUE  |
               |   according to OPERATOR ? |
               +---------------------------+
                             |
                             ^
                      YES  /   \  NO
                          /     \
        +------------------+    +-------------------+
        | $VAR_NAME = true |    | $VAR_NAME = false |
        +------------------+    +-------------------+
                          \     /
                           \   /
                             v
                             |
                  +--------------------+
                  | $VAR_NAME value is |
                  |  used in condition |
                  +--------------------+


.. note::
  Any regular expression must follow `Go regexp syntax <https://golang.org/pkg/regexp/syntax/>`_.
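
The workflow above can be sketched in Python as follows. This is only an
illustration under stated assumptions, not the engine's implementation:
``MATCH_RE`` and ``evaluate`` are hypothetical names, Python's ``re`` module
stands in for the Go regexp engine, a missing field is assumed never to match,
and ``&=`` is assumed to mean that every bit of ``VALUE`` is set in the field.

.. code-block:: python

  import re

  # Hypothetical parser for the Direct and Indirect Match syntaxes.
  MATCH_RE = re.compile(
      r"^\$(?P<var>\w+):\s*(?P<field>\w+)\s*"
      r"(?P<op>~=|&=|=|>|<)\s*"
      r"(?:'(?P<value>.*)'|@(?P<other>\w+))$"
  )


  def to_int(text):
      # Base 0 accepts decimal ("16") and hexadecimal ("0x10") notation.
      return int(text, 0)


  def evaluate(match, event_data):
      """Return (variable name, boolean result) for one Field Match."""
      m = MATCH_RE.match(match.strip())
      if m is None:
          raise ValueError("not a Field Match: " + match)
      var, op = m["var"], m["op"]
      actual = event_data.get(m["field"])
      if actual is None:
          return var, False  # assumption: a missing field never matches
      if m["other"] is not None:
          # Indirect Match: compare with the value of another field.
          return var, op == "=" and actual == event_data.get(m["other"])
      value = m["value"]
      if op == "=":
          result = actual == value
      elif op == "~=":
          result = re.search(value, actual) is not None
      elif op == ">":
          result = to_int(actual) > to_int(value)
      elif op == "<":
          result = to_int(actual) < to_int(value)
      else:  # "&=": assumption, all bits of VALUE must be set
          flag = to_int(value)
          result = (to_int(actual) & flag) == flag
      return var, result


  event_data = {
      "Image": r"C:\Windows\System32\wevtutil.exe",
      "GrantedAccess": "0x1410",
  }
  print(evaluate(r"$im: Image ~= '(?i:\\wevtutil\.exe$)'", event_data))
  print(evaluate("$ga: GrantedAccess &= '0x10'", event_data))

The match strings are written here as the engine would see them after JSON
decoding, which is why each backslash appears only once per regex escape.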

Example
"""""""

The following snippet shows a rule used to catch Windows Event log clearing attempts
using ``wevtutil.exe``.

.. code-block:: JSON

  {
    "Name": "EventClearing",
    "Tags": [
      "PostExploit"
    ],
    "Meta": {
      "Events": {
        "Microsoft-Windows-Sysmon/Operational": [
          1
        ]
      },
      "ATTACK": [
        {
          "ID": "T1070",
          "Tactic": "defense-evasion",
          "Reference": "https://attack.mitre.org/techniques/T1070"
        }
      ],
      "Criticality": 8,
      "Schema": "2.0.0"
    },
    "Matches": [
      "$im: Image ~= '(?i:\\\\wevtutil\\.exe$)'",
      "$cmd: CommandLine ~= '(?i: cl | clear-log )'"
    ],
    "Condition": "$im and $cmd",
    "Actions": null
  }


.. warning::
  **Windows path separator** ``\`` **escaping:**
    * When using the ``~=`` **operator**: needs to be escaped **twice** ``\\\\`` (once for the JSON parser and once for the regex parser)
    * When using the ``=`` **operator**: needs to be escaped **once** ``\\`` (for the JSON parser)
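
The two escaping levels can be checked with a few lines of Python. This is only
a sketch: Python's ``json`` and ``re`` modules stand in for the parsers used by
the engine, and the variable names are chosen for illustration.

.. code-block:: python

  import json
  import re

  image = r"C:\Windows\System32\wevtutil.exe"

  # Regex value as written in the rule file (JSON text).
  json_regex = r'"(?i:\\\\wevtutil\\.exe$)"'
  # The JSON parser consumes one escaping level ...
  regex = json.loads(json_regex)
  print(regex)  # (?i:\\wevtutil\.exe$)
  # ... and the regex parser consumes the second one.
  print(re.search(regex, image) is not None)  # True

  # Equality value: only the JSON parser consumes an escaping level.
  json_path = r'"C:\\Windows\\System32\\wevtutil.exe"'
  print(json.loads(json_path) == image)  # True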

The following additional example shows how to detect suspicious access to ``lsass.exe`` with the help
of the ``&=`` operator. We want to trigger this alert on any **ProcessAccess**
event targeting ``lsass.exe`` where **GrantedAccess** contains the process
**read access flag 0x10**.

.. code-block:: JSON

  {
    "Name": "SuspiciousLsassAccess",
    "Tags": [
      "Mimikatz",
      "Credentials",
      "Lsass"
    ],
    "Meta": {
      "Events": {
        "Microsoft-Windows-Sysmon/Operational": [
          10
        ]
      },
      "ATTACK": [
        {
          "ID": "T1003",
          "Tactic": "Credential Access",
          "Reference": "https://attack.mitre.org/techniques/T1003/"
        }
      ],
      "Criticality": 8,
      "Schema": "2.0.0"
    },
    "Matches": [
      "$ctwdef: CallTrace ~= '(?i:windows defender)'",
      "$ga: GrantedAccess &= '0x10'",
      "$lsass: TargetImage ~= '(?i:\\\\lsass\\.exe$)'",
      "$wmiprvse: SourceImage ~= '(?i:(?i:C:\\\\Windows\\\\Sys(wow64|tem32)\\\\)wbem\\\\wmiprvse\\.exe)'",
      "$taskmgr: SourceImage ~= '(?i:(?i:C:\\\\Windows\\\\Sys(wow64|tem32)\\\\)taskmgr\\.exe)'",
      "$boot: SourceImage ~= '(?i:C:\\\\Windows\\\\system32\\\\(wininit|csrss)\\.exe)'"
    ],
    "Condition": "$lsass and $ga and !($ctwdef or $wmiprvse or $taskmgr or $boot)"
  }


Container Matches
^^^^^^^^^^^^^^^^^

A **Container Match** is a little more advanced since it can be used to extract
part of a **field value** and check it against a container. For
instance, with this kind of **Match**, we are able to extract a **domain**
from **Windows DNS-Client logs** and check it against a blacklist. Although
implementing this use case would be possible with **Field Matches**, it
would be much slower because of the regex engine. In addition, the rule would need to be
updated for every new entry to check. With a **Container Match**, only the container
(a simple separate file) needs to be updated. The speed comes from the
container being implemented as a set data structure.

**Syntax:** ``$VAR_NAME: extract('REGEXP', FIELD) in CONTAINER``

.. table:: Container Match Symbols Definition

  +------------+----------------------------------------------------------------+
  | Symbols    | Description                                                    |
  +============+================================================================+
  | VAR_NAME   | Name of the variable used to access the result of the **Match**|
  |            | in the **Condition**, it must be preceded by a ``$``           |
  +------------+----------------------------------------------------------------+
  | FIELD      | Field to extract from                                          |
  +------------+----------------------------------------------------------------+
  | REGEXP     | Regular expression used to extract a value from FIELD and check|
  |            | it against a **CONTAINER**. **REGEXP** must follow **named**   |
  |            | regexp syntax ``(?P<name>re)``                                 |
  +------------+----------------------------------------------------------------+
  | CONTAINER  | Container to use to check the extracted value                  |
  +------------+----------------------------------------------------------------+

.. important::
  * If a rule makes use of an **undefined container**, the rule will be disabled
    at runtime and a warning message will be printed.
  * A given container is shared across all the rules loaded into the engine
  * Any regular expression must follow `Go regexp syntax <https://golang.org/pkg/regexp/syntax/>`_.
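
The extraction and lookup can be sketched as follows. This is a minimal Python
illustration, not the engine's code: ``container_match`` is a hypothetical
helper, the container is modelled as a plain Python ``set``, and using the
first named group as the extracted value is an assumption.

.. code-block:: python

  import re


  def container_match(regexp, field, container, event_data):
      """Extract a value from a field and look it up in a set."""
      value = event_data.get(field)
      if value is None:
          return False
      m = re.search(regexp, value)
      if m is None:
          return False
      groups = m.groupdict()
      if not groups:
          raise ValueError("REGEXP must contain a named group")
      # Assumption: the first named group holds the extracted value.
      extracted = next(iter(groups.values()))
      # Set membership is a constant-time lookup on average.
      return extracted in container


  blacklist = {"evil.com", "bad.example.org"}
  event_data = {"QueryName": "www.evil.com"}

  print(container_match(r"(?P<dom>\w+\.\w+$)", "QueryName", blacklist, event_data))
  print(container_match(r"(?P<sub>\w+\.\w+\.\w+$)", "QueryName", blacklist, event_data))

The first call extracts ``evil.com``, which is in the set. The second call
extracts ``www.evil.com``, which is not.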

Example
"""""""

This rule shows how to extract domains and sub-domains from **Windows
DNS-Client** logs and check them against a blacklist.

.. code-block:: JSON

  {
    "Name": "BlacklistedDomain",
    "Tags": [
      "DNS"
    ],
    "Meta": {
      "Events": {
        "Microsoft-Windows-DNS-Client/Operational": []
      },
      "Criticality": 10,
      "Schema": "2.0.0"
    },
    "Matches": [
      "$domainBL: extract('(?P<dom>\\w+\\.\\w+$)',QueryName) in blacklist",
      "$subdomainBL: extract('(?P<sub>\\w+\\.\\w+\\.\\w+$)',QueryName) in blacklist",
      "$subsubdomainBL: extract('(?P<subsub>\\w+\\.\\w+\\.\\w+\\.\\w+$)',QueryName) in blacklist"
    ],
    "Condition": "$domainBL or $subdomainBL or $subsubdomainBL"
  }

Condition Format
----------------

A condition applies a logic to the different **Matches** defined in the rule.
If the result of the computation of the **Condition** is **true** the event is
considered as matching the rule.

.. table:: Allowed Symbols in Condition

  +---------+----------------------------------------------------------------+
  | Symbols | Description                                                    |
  +=========+================================================================+
  | ``$var``| Variable referencing a **Match**                               |
  +---------+----------------------------------------------------------------+
  | ``()``  | Used to group / prioritize some logical expressions            |
  +---------+----------------------------------------------------------------+
  | ``!``   | Negates a **Match** or a grouped expression                    |
  +---------+----------------------------------------------------------------+
  | ``AND`` | AND logical operator                                           |
  +---------+                                                                |
  | ``and`` |                                                                |
  +---------+                                                                |
  | ``&&``  |                                                                |
  +---------+----------------------------------------------------------------+
  | ``OR``  | OR logical operator                                            |
  +---------+                                                                |
  | ``or``  |                                                                |
  +---------+                                                                |
  | ``||``  |                                                                |
  +---------+----------------------------------------------------------------+

.. important::
  **Matches** are evaluated on demand, in the order their **variables** appear in the **Condition**,
  so the order of the **variables** has an impact on **rule speed**. A good practice is to put the most selective
  **Matches** first, so that condition evaluation aborts as soon as possible and useless **Matches** are never evaluated.

Example
^^^^^^^

The following rule matches suspicious explicit network logons. It is
an example of a rule where the order of the **variables** in the condition matters.
Here we match on ``LogonType`` first: since this **Match** is mandatory for the
condition to be met, evaluation aborts after the first **Match** for every event
whose ``LogonType`` is not **3**.

.. code-block:: JSON

  {
    "Name": "ExplicitNetworkLogon",
    "Tags": [
      "Lateral",
      "Security"
    ],
    "Meta": {
      "Events": {
        "Security": [
          4624
        ]
      },
      "Criticality": 5,
      "Schema": "2.0.0"
    },
    "Matches": [
      "$logt: LogonType = '3'",
      "$user: TargetUserName = 'ANONYMOUS LOGON'",
      "$iplh1: IpAddress = '-'",
      "$iplh2: IpAddress = '127.0.0.1'",
      "$enddol: TargetUserName ~= '\\$$'"
    ],
    "Condition": "$logt and !($user or $iplh1 or $iplh2 or $enddol)"
  }
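
The effect of variable order can be observed with the following Python sketch of
the rule above. It is only an illustration: Python's short-circuiting ``and`` /
``or`` stand in for the engine's condition evaluation, and the ``lazy`` wrapper
is a hypothetical helper that records which **Matches** actually run.

.. code-block:: python

  import re

  calls = []


  def lazy(name, check):
      """Wrap a Match so that each evaluation is recorded in `calls`."""
      def run(event_data):
          calls.append(name)
          return check(event_data)
      return run


  logt = lazy("$logt", lambda e: e["LogonType"] == "3")
  user = lazy("$user", lambda e: e["TargetUserName"] == "ANONYMOUS LOGON")
  iplh1 = lazy("$iplh1", lambda e: e["IpAddress"] == "-")
  iplh2 = lazy("$iplh2", lambda e: e["IpAddress"] == "127.0.0.1")
  enddol = lazy("$enddol", lambda e: re.search(r"\$$", e["TargetUserName"]) is not None)


  def condition(e):
      # Mirrors "$logt and !($user or $iplh1 or $iplh2 or $enddol)".
      return logt(e) and not (user(e) or iplh1(e) or iplh2(e) or enddol(e))


  interactive = {"LogonType": "2", "TargetUserName": "bob", "IpAddress": "10.0.0.5"}
  print(condition(interactive), calls)  # False ['$logt']

For an event whose ``LogonType`` is not **3**, only ``$logt`` is evaluated and
the four other checks, including the regex, are skipped.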

Regular Expression Templates
----------------------------

.. warning::
  Templates use **TOML** format (c.f. https://toml.io/en/) since **v2.0.0**

Regex templates have been introduced to remove the burden of maintaining rules
sharing the same regular expressions. Take the common example of suspicious
binaries we want to create rules on. A basic matching regex would look like
``(?i:\\(certutil|rundll32)\.exe)``. If this regex is used in **several rules**,
all of them have to be updated whenever a new executable name is added
to the list. The idea behind regex templates is to centralize such shared regexes
inside **configuration file(s)** for easier maintenance.

**File Extension**: ``.toml`` (the file must be located in the rule directory in use)

**Syntax**
  * **definition:** ``TEMPLATE_NAME = 'REGULAR_EXPRESSION'``
  * **usage in Match:** ``{{TEMPLATE_NAME}}``

Example
^^^^^^^

Here is an example of a few **regex templates**:

.. code-block:: toml

  # Extensions
  script-exts = '(?i:(\.ps1|\.bat|\.cmd|\.vb|\.vbs|\.vbscript|\.vbe|\.js|\.jse|\.ws|\.wsf))'
  exec-exts = '(?i:(\.acm|\.ax|\.com|\.cpl|\.dic|\.dll|\.drv|\.ds|\.efi|\.exe|\.grm|\.iec|\.ime|\.lex|\.msstyles|\.mui|\.ocx|\.olb|\.rll|\.rs|\.scr|\.sys|\.tlb|\.tsp|\.winmd|\.node))'

  # Exe to monitor
  suspicious = '(?i:\\(certutil|rundll32|powershell|wscript|cscript|cmd|mshta|regsvr32|msbuild|installutil|regasm)\.exe)'


.. important::
  Only `Golang regexp special characters <https://golang.org/pkg/regexp/syntax/>`_ **need to be escaped**.
   * **Windows path separator** ``\`` needs to be escaped only once (i.e. ``\\``) in template definitions.

The templates defined above can then be used in a rule:

.. code-block:: JSON

  {
    "Name": "HeurDropper",
    "Tags": [
      "Heuristics",
      "CreateFile"
    ],
    "Meta": {
      "Events": {
        "Microsoft-Windows-Sysmon/Operational": [
          11
        ]
      },
      "Criticality": 8,
      "Author": "0xrawsec",
      "Comments": "Experimental rule to detect executable files dropped by common utilities",
      "Schema": "2.0.0"
    },
    "Matches": [
      "$susp: Image ~= '{{suspicious}}$'",
      "$target: TargetFilename ~= '({{exec-exts}}|{{script-exts}})$'",
      "$poltest: TargetFilename ~= '(?i:C:\\\\Users\\\\.*?\\\\AppData\\\\Local\\\\Temp\\\\__PSScriptPolicyTest_.*?\\.ps1)'"
    ],
    "Condition": "$susp and $target and !$poltest"
  }

To help debug rules using templates, the ``gene`` command line utility
provides a ``-dump`` switch, which dumps the rule as it is after
template replacement.

.. code-block:: bash

  > gene -dump HeurDropper -r ./gene-rules | jq
  {
    "Name": "HeurDropper",
    "Tags": [
      "Heuristics",
      "CreateFile"
    ],
    "Meta": {
      "Events": {
        "Microsoft-Windows-Sysmon/Operational": [
          11
        ]
      },
      "Criticality": 8,
      "Schema": "2.0.0"
    },
    "Matches": [
      "$susp: Image ~= '(?i:\\\\(certutil|rundll32|powershell|wscript|cscript|cmd|mshta|regsvr32|msbuild|installutil|regasm)\\.exe)$'",
      "$target: TargetFilename ~= '((?i:(\\.acm|\\.ax|\\.com|\\.cpl|\\.dic|\\.dll|\\.drv|\\.ds|\\.efi|\\.exe|\\.grm|\\.iec|\\.ime|\\.lex|\\.msstyles|\\.mui|\\.ocx|\\.olb|\\.rll|\\.rs|\\.scr|\\.sys|\\.tlb|\\.tsp|\\.winmd|\\.node))|(?i:(\\.ps1|\\.bat|\\.cmd|\\.vb|\\.vbs|\\.vbscript|\\.vbe|\\.js|\\.jse|\\.ws|\\.wsf)))$'",
      "$poltest: TargetFilename ~= '(?i:C:\\\\Users\\\\.*?\\\\AppData\\\\Local\\\\Temp\\\\__PSScriptPolicyTest_.*?\\.ps1)'"
    ],
    "Condition": "$susp and $target and !$poltest"
  }

.. note::
  As you can see in the dumped rule, each single ``\`` becomes ``\\``. This is due
  to the encoding of special characters in **JSON**.

.. note::
  Adding a new extension to the list is now easy, and the change
  impacts all the rules using this template.
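
Template replacement can be pictured with the following Python sketch. It is an
illustration only: the function name ``expand`` is hypothetical, the templates
are given as a plain dictionary instead of being read from a ``.toml`` file, and
raising an error on an unknown template is an assumption.

.. code-block:: python

  import json
  import re

  # Shortened template, written as it would appear in the template file.
  templates = {
      "suspicious": r"(?i:\\(certutil|rundll32)\.exe)",
  }


  def expand(match, templates):
      """Replace every {{NAME}} placeholder by its template definition."""
      def substitute(m):
          name = m.group(1)
          if name not in templates:
              # Assumption: an unknown template is reported as an error.
              raise KeyError("unknown template: " + name)
          return templates[name]
      # A function replacement is inserted verbatim (no escape processing).
      return re.sub(r"\{\{([\w-]+)\}\}", substitute, match)


  expanded = expand("$susp: Image ~= '{{suspicious}}$'", templates)
  print(expanded)
  # JSON encoding doubles every backslash, as seen in the dumped rule.
  print(json.dumps(expanded))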

MITRE ATT&CK Integration
------------------------

Gene has full support for the `MITRE ATT&CK <https://attack.mitre.org/>`_ framework through the **ATTACK** field of the
**Meta** section of the rule definition. What is documented there is purely informational
and can be displayed in the alerts reported.

Example
^^^^^^^

Consider the following rule matching suspicious ADS creation.

.. code-block:: JSON

  {
    "Name": "ExecutableADS",
    "Tags": [
      "ADS"
    ],
    "Meta": {
      "Events": {
        "Microsoft-Windows-Sysmon/Operational": [
          15
        ]
      },
      "ATTACK": [
        {
          "ID": "T1096",
          "Tactic": "defense-evasion",
          "Reference": "https://attack.mitre.org/techniques/T1096"
        }
      ],
      "Criticality": 10,
      "Schema": "2.0.0"
    },
    "Matches": [
      "$unk:  Hash = 'Unknown'",
      "$impash:  Hash ~= '(?i:(IMPHASH=00000000000000000000000000000000))'"
    ],
    "Condition": "!($impash or $unk)"
  }

The alert reported would look like the following.

.. code-block:: JSON

  {
    "Event": {
      "EventData": {
        "CreationUtcTime": "2018-02-23 13:17:31.176",
        "Hash": "SHA1=E8B4D84A28E5EA17272416EC45726964FDF25883,MD5=09F7401D56F2393C6CA534FF0241A590,SHA256=6766717B8AFAFE46B5FD66C7082CCCE6B382CBEA982C73CB651E35DC8187ACE1,IMPHASH=68E56344CAB250384904953E978B70A9",
        "Image": "C:\\Windows\\system32\\cmd.exe",
        "ProcessGuid": "{49F1AF32-12C5-5A90-0000-00100AEA0B00}",
        "ProcessId": "2100",
        "TargetFilename": "C:\\Users\\CALDUS~1\\AppData\\Local\\Temp\\test.txt:malicious.exe",
        "UtcTime": "2018-02-23 13:17:31.192"
      },
      "GeneInfo": {
        "ATTACK": [
          {
            "ID": "T1096",
            "Tactic": "defense-evasion",
            "Reference": "https://attack.mitre.org/techniques/T1096"
          }
        ],
        "Criticality": 10,
        "Signature": [
          "ExecutableADS"
        ]
      },
      "System": {
        "...": "..."
      }
    }
  }
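
Code consuming such alerts can read the ATT&CK information from the
``GeneInfo`` section. The following Python sketch assumes the alert layout shown
above. The helper name ``attack_ids`` is hypothetical and the alert is
abbreviated.

.. code-block:: python

  import json


  def attack_ids(alert):
      """Return the ATT&CK technique IDs attached to a Gene alert."""
      info = alert.get("Event", {}).get("GeneInfo", {})
      # ATTACK may be absent when the rule documents no technique.
      return [entry["ID"] for entry in info.get("ATTACK") or []]


  alert = json.loads("""
  {
    "Event": {
      "GeneInfo": {
        "ATTACK": [
          {
            "ID": "T1096",
            "Tactic": "defense-evasion",
            "Reference": "https://attack.mitre.org/techniques/T1096"
          }
        ],
        "Criticality": 10,
        "Signature": ["ExecutableADS"]
      }
    }
  }
  """)

  print(attack_ids(alert))  # ['T1096']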