Pile on the data

“Here’s a link to the data. We need to have something before RSAC, which gives us about… four weeks.” That’s four weeks to process 40 GB of data, which is inconveniently a little bigger than fits in my laptop’s memory. Four weeks to establish a few interesting research questions, determine whether the data can answer them, and inevitably revise them into answerable ones. Four weeks to calculate statistics, build models, and visualize the results. Just four weeks to weave research questions, models, and visualizations into a coherent story. Four weeks to polish that story into a publication-ready document. As an academic data scientist used to timelines that were generally a bit more fluid, I found this… exciting. I wouldn’t have joined Cyentia if this weren’t the type of excitement I enjoy.

A data scientist’s journey to Cyentia

I joined Cyentia in January after leaving IBM Research. I had spent two years with Big Blue, first as a postdoc and then as a Research Staff Member. While there, I worked as a data scientist developing new security analytics with machine learning and working on securing AI systems. I jumped on the deep learning bandwagon and co-authored a couple of papers. Before IBM, I received my PhD from the University of New Mexico in 2016 under the supervision of Stephanie Forrest. My dissertation research focused on blending complex systems and computer security. This usually involved a fair bit of statistical analysis, and I wrote papers on everything from data breaches to spam to nation-state-level cyberwar responses. I even got the chance to write a paper with Cyentia founder Jay Jacobs, which set me on the path to where I am now.

What came of that 40GB of data

The 40 GB of data I mentioned above is a sample of RiskRecon data that Cyentia analyzed for the Risk Surface Report. We released a preview of this report at RSAC, and between then and now we’ve done a lot of work and made some exciting strides in understanding the risk landscape. There is a lot more to come!

