Logs are not just a stream of information. Logs and events can tell a story about what happened, when, why, how, and who did it. Any company that ignores its logs therefore faces a real challenge when dealing with information security.
To help your logs tell that story, it’s best to augment them with other bits of information. Typically this is done after the fact by an analyst or investigator. The downside is that by the time the investigation starts, the supporting data has often changed; the IP address behind a domain name, for example, may no longer be the one in use when the event occurred.
Beyond that, there are already intelligence feeds that provide details on any given IP address, domain name, file hash, and other metadata.
In this post, we’ll explore bridging the Collective Intelligence Framework version 2 (CIFv2) and those logs using Logstash.
CIFv2 has grown up a bit since its predecessor. The new version stores all of its data within a project-installed Elasticsearch node, which lets us access the data directly with pre-existing Elasticsearch tools. Note, however, that the installer does little to tweak the Elasticsearch configuration, which can lead to unexpected results (such as the node joining other default-configured Elasticsearch clusters on the same network).
The first step is to get and install the CIFv2 project. You can get it from its GitHub page at https://github.com/csirtgadgets/massive-octo-spice. Installation guides and single-script installers are available on its wiki page; be aware of its minimum requirements.
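If you prefer to start from source, the first step is simply cloning the repository; the actual build and install steps (dependencies, EasyButton script, supported platforms) are documented on the wiki, so treat this as just the starting point:

git clone https://github.com/csirtgadgets/massive-octo-spice.git
cd massive-octo-spice
# from here, follow the install guide on the project wiki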
Once the CIFv2 installation is in place, the next step is to identify the data structure. Here is an example observable as it is stored in Elasticsearch:
{ "_shard": 0, "_node": "fan7MZSvSUS6LS3XcwQFfA", "_index": "cif.observables-2015.04.20", "_type": "observables", "_id": "bafb0b47000be58fb6c4f08f29af81704531965b0bd907e57c7f99c69ae194b8", "_score": 1, "fields": { "tags": [ "suspicious" ], "protocol": [ 6 ], "application": [ "http", "https" ], "provider": [ "spamhaus.org" ], "confidence": [ 95 ], "tlp": [ "green" ], "@version": [ 2 ], "lang": [ "EN" ], "firsttime": [ "2015-04-20T04:06:26Z" ], "related": [ "8a8647dfd6b80bda02878afe106735bb15c9c513cfa4b49f5d09df333080771b" ], "id": [ "bafb0b47000be58fb6c4f08f29af81704531965b0bd907e57c7f99c69ae194b8" ], "@timestamp": [ "2015-04-20T04:06:26.651Z" ], "altid": [ "http://www.spamhaus.org/query/dbl?domain=anonymz.com" ], "reporttime": [ "2015-04-20T04:06:26Z" ], "lasttime": [ "2015-04-20T04:06:26Z" ], "altid_tlp": [ "green" ], "otype": [ "fqdn" ], "group": [ "everyone" ], "observable": [ "anonymz.com" ] }, "sort": [ 1 ], "_explanation": { "value": 1, "description": "sum of:", "details": [ { "value": 1, "description": "ConstantScore(*:*), product of:", "details": [ { "value": 1, "description": "boost" }, { "value": 1, "description": "queryNorm" } ] } ] } }
Not only do we need to identify which fields we want to augment our logs with, we also need to build our query.
Looking at the dataset, our data of interest is contained in the “observable” field, which holds both IP addresses and DNS names. This makes it easy to query: a simple “observable:[IP or HOST]” query string will return all matching records, as the quick check below shows.
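As a sanity check, you can run that query string directly against the CIF Elasticsearch node with curl. This is a minimal sketch, assuming the node listens on the default port 9200 on localhost and uses the default cif.observables index naming:

# query the CIF observables indices for a specific domain or IP
curl 'http://localhost:9200/cif.observables-*/_search?q=observable:anonymz.com&pretty'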
Our next task is to combine this with Logstash. Fortunately, Logstash already ships with an elasticsearch filter that lets us query an Elasticsearch cluster and apply the results to our event. The filter defaults to sorting on @timestamp, which doesn’t exist in our dataset, so we need to specify not only the host of our Elasticsearch node, the query, and the fields we want, but also a sort. See my example below.
elasticsearch {
  hosts  => ["ElasticsearchNode"]
  query  => "observable:%{Net.IP.SRC}"
  fields => ["tags", "CIF.Tags", "tlp", "CIF.Tlp", "confidence", "CIF.Confidence"]
  sort   => "_score:asc"
}
Now you can query the CIFv2 dataset directly from within Logstash, giving you more actionable data to work with. In this case, we're searching Elasticsearch for the field observable matching the value of Net.IP.SRC (our source IP address). This adds the fields tags, tlp, and confidence to our event as CIF.Tags, CIF.Tlp, and CIF.Confidence respectively.
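To see it in action, here is a rough sketch of a test pipeline that could produce output like the event shown below. It uses the generator input to emit a single IP address in the message field, so the query references %{message} rather than the Net.IP.SRC field from the snippet above; swap in your real input and field name as appropriate:

input {
  generator {
    message => "141.101.113.108"       # single test event; replace with your real log input
    count   => 1
    type    => "generated"
  }
}

filter {
  elasticsearch {
    hosts  => ["ElasticsearchNode"]    # the CIF Elasticsearch node
    query  => "observable:%{message}"  # in this test the IP lives in 'message'
    fields => ["tags", "CIF.Tags", "tlp", "CIF.Tlp", "confidence", "CIF.Confidence"]
    sort   => "_score:asc"
  }
}

output {
  stdout { codec => rubydebug }        # print the enriched event for inspection
}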
{ "message" => "141.101.113.108", "@version" => "1", "@timestamp" => "2015-04-23T16:05:39.831Z", "type" => "generated", "host" => "homer", "sequence" => 0, "CIF.Tags" => [ [0] "whitelist", [1] "rdata" ], "CIF.Tlp" => "green", "CIF.Confidence" => 12.949 }
One caveat to note: the current elasticsearch filter only adds the first matching record, so not all known information about the observable will be reflected in the augmentation. Hopefully this will change in a future version.