The Problem with Collecting, Processing, and Analyzing More Security Data
Security teams collect a heck of a lot of data today. ESG research indicates that 38% of organizations collect, process, and analyze more than 10 terabytes of data as part of security operations each month. What types of data? The research indicates that the biggest data sources include firewall logs, log data from other types of security devices, log data from networking devices, data generated by AV tools, user activity logs, application logs, etc.
It’s also worth mentioning that the amount of security data collected continues to grow on an annual basis. In fact, 28% of organizations say they collect, process, and analyze substantially more data today than two years ago, while another 49% of organizations collect, process, and analyze somewhat more data today than two years ago.
Overall, this obsession with security data is a good thing. Somewhere within a growing haystack of data there exist needles of value. In theory then, more data equates to more needles.
Unfortunately, more data comes with a lot of baggage as well. Someone or something must sort through all the data, interpret it, make sense of it, and put it to use. There’s also a fundamental storage challenge here. Do I keep all this data or define some taxonomy of value, keep the valuable data, and throw everything else out? Do I centralize the data or distribute it? Do I store the data on my network or in the cloud? Oh, and how do I manage all this data: RDBMS? Elasticsearch? Hadoop? SIEM?
New Reality: Security Is A Big Data Application
Let’s face it, security is a big data application so it’s time that the security industry and cybersecurity professionals come together, think through security data problems, and come up with some communal solutions.
Allow me to make five suggestions along these lines:
1. We need to double down on data normalization. Yes, we have some standard formats from organizations like MITRE (e.g., STIX, TAXII, and the CVE list), but the common complaint is that these standards are complex and mostly used in the US Federal Government. We need to create simple standard data envelopes that can be used on most if not all security data. As an example, look no further than Splunk, one of the leading SIEM platforms. If you want to maximize your return on Splunk, the company recommends that you normalize all data using the Common Information Model (CIM) standard. This makes it easier to search, contextualize, and correlate data elements from disparate systems. What we need as an industry is for all security data to adhere to a model like CIM out of the box, making things easier for everyone.
2. All security data should be available through standard APIs. Aside from a common format, all analytics tools, SaaS offerings, and data repositories should provide functionality for data import/export through standard APIs. Here’s a use case of what I’m thinking of: I have SIEM and network analytics tools on my network but I outsource EDR and threat intelligence analytics to SaaS providers. When my SOC team detects a security incident, they should be able to analyze all data from all sources instantly through any tool (or multiple tools) they want to use. We need real-time data import/export through standard APIs to make it easy to ingest data as necessary in real time.
3. Enterprises need a distributed security data management service. In today’s security operations environment, the same data is collected and processed multiple times in different analytics tools. This is extremely wasteful. To bolster the efficiency and effectiveness of security data, all security telemetry should be collected, processed, normalized, and made available through a distributed data management service. To be clear, the data isn’t analyzed here. Instead, it is presented to all types of analytics tools through standard interfaces in a common format. This security data management service should also take care of base-level maintenance and security activities like backup/restore, archiving, data compression, encryption, etc. It’s likely that a distributed security data management service would store some data on-premises and then automatically age and archive other data to cheaper storage (tape, cloud, etc.). Note that a distributed security data management service is one of the layers of ESG’s security operations and analytics platform architecture (SOAPA).
4. CISOs must embrace artificial intelligence and machine learning. Given the growth of security data volume, the number of humans who know what the data is, where to get it, what it means, and how to piece it all together is exceedingly small and getting smaller. You could postulate that we’ve actually crossed the line where no human can do this effectively anymore and it would be hard to argue otherwise. It’s time that we let machines do a lot of the multi-layer data analysis, summarize the data for human consumption, and then let people make the difficult choices on what to do next. The good news is that there is a lot of innovation around AI for security and many solutions have reached a point in their evolution where they can be quite useful. The bad news is that there is way too much hype in the market (thanks to the fat cats on Sand Hill Rd.). Recommendation for CISOs: caveat emptor for sure, but put ample resources into research, RFIs/RFPs, and proof-of-concept projects.
5. Automate whatever you are comfortable with, and more. Anything that can be automated should be automated. This includes data collection, data normalization, data distribution, data analysis, and remediation. Humans should be relegated to the very back end of the security data cycle, focusing on problematic investigations and decision making.
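To make suggestions 1 and 3 a bit more concrete, here is a minimal Python sketch of what a "simple standard data envelope" might look like in practice. Everything in it is an assumption made for illustration: the Envelope fields, the two input record layouts, and the function names are hypothetical and are not taken from CIM, STIX, or any vendor's actual schema. The only point is that once firewall and AV records land in one common shape, any analytics tool can search and correlate them the same way.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class Envelope:
    """Hypothetical common envelope; field names are illustrative only."""
    timestamp: datetime   # always converted to UTC
    source_type: str      # e.g. "firewall" or "av"
    src: str              # originating host name or IP address
    action: str           # normalized verb: "allowed", "blocked", or "detected"
    raw: Dict[str, Any]   # original record, kept intact for later analysis


def from_firewall(rec: Dict[str, Any]) -> Envelope:
    # Assumed input layout: {"epoch": <seconds>, "src_ip": ..., "verdict": "DENY"/"ALLOW"}
    return Envelope(
        timestamp=datetime.fromtimestamp(rec["epoch"], tz=timezone.utc),
        source_type="firewall",
        src=rec["src_ip"],
        action="blocked" if rec["verdict"] == "DENY" else "allowed",
        raw=rec,
    )


def from_av(rec: Dict[str, Any]) -> Envelope:
    # Assumed input layout: {"detected_at": <ISO 8601 with offset>, "host": ..., "quarantined": bool}
    return Envelope(
        timestamp=datetime.fromisoformat(rec["detected_at"]).astimezone(timezone.utc),
        source_type="av",
        src=rec["host"],
        action="blocked" if rec["quarantined"] else "detected",
        raw=rec,
    )


# One normalizer per data source; adding a source means adding one entry here.
NORMALIZERS: Dict[str, Callable[[Dict[str, Any]], Envelope]] = {
    "firewall": from_firewall,
    "av": from_av,
}


def normalize(source_type: str, rec: Dict[str, Any]) -> Envelope:
    """Convert a source-specific record into the common envelope."""
    try:
        return NORMALIZERS[source_type](rec)
    except KeyError:
        raise ValueError(f"no normalizer registered for {source_type!r}") from None


if __name__ == "__main__":
    events = [
        normalize("firewall", {"epoch": 0, "src_ip": "10.0.0.5", "verdict": "DENY"}),
        normalize("av", {"detected_at": "2024-01-01T02:00:00+02:00",
                         "host": "laptop-7", "quarantined": True}),
    ]
    # With a shared shape, one query works across both sources.
    for e in events:
        if e.action == "blocked":
            print(e.timestamp.isoformat(), e.source_type, e.src)
```

In this sketch the normalization happens once, in one place, and the original record rides along in raw so nothing is lost. That is the spirit of the data management service described in suggestion 3: collect and normalize once, then let every analytics tool consume the same envelope rather than re-parsing the same logs.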
Let’s face it, well-intentioned security teams are being buried by data today. They go through heroic efforts and do what they can but there is an obvious and logical outcome here: As security data volume grows, security professionals will only be able to derive an incremental amount of value. You could even theorize that additional operational overhead from more security data could actually decrease the value of more data – I see this happening in enterprises today.
To make this data more powerful, we need to make it easier to consume, analyze, and operationalize. It will take the security industry and cybersecurity professionals working collectively to make this happen.