My Impact on Society as a Female Data Scientist

By Xuan Zhao

I recently wrote an article for ITSPmagazine about being a woman in a male-dominated world and, more specifically, in the all-male InfoSec industry.

Today, I’ll go further into how my work as a data scientist has a true impact on society by helping protect people from malware and cyber attacks. Our mission at Cylance is to protect every computer, user and thing under the sun, and I’m honored to be a part of the team that helps make that come true.

My current role reminds me of the time I spent working on my Ph.D. We’re allowed to work at the locations and times that make us most productive. I love having the kind of flexibility that lets me work at a café or at my house if I need to. In the future, when I have kids, a flexible and understanding work environment will mean even more to me as a woman.

When it comes to my day-to-day role, most of my time is spent working on algorithms that can detect and predict malicious code, and prevent it from executing – or as we call it, zero-day prevention. As a data scientist, I’m interested in dissecting every file to determine what makes it malicious. With the help of machine learning, this data analysis process is dramatically shortened. A machine learning algorithm can select and prioritize the samples to analyze. It can also pull out the malicious or suspicious details for the malware analyst to inspect.

During a talk on applied machine learning that my coworkers Brian Wallace, Matt Wolff and I gave at Black Hat 2016, we discussed a tool called the Interactive Clustering Tool. This tool is available as open source to the security community, as part of our efforts to have a positive impact on society. The clustering tool can group similar samples together, and the groups are made in collaboration with the malware analysts to ensure that the security researchers are satisfied with the grouping.

After all the samples are grouped, the software belonging to the same group should follow similar patterns, so the security researchers can choose to only analyze a sample of one or two files from a group, instead of analyzing every single file. Below is a screenshot of the Interactive Clustering Tool we presented at Black Hat. The different colors indicate different clusters.
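As a rough illustration of the idea of grouping similar samples and then reviewing only one or two files per group, here is a minimal Python sketch. It is not the Interactive Clustering Tool's actual algorithm: the greedy Jaccard-similarity grouping, the 0.5 threshold, the function names, and the sample data (file names mapped to imported API names) are all assumptions made up for this example.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity between two feature sets (1.0 means identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_samples(samples: Dict[str, Set[str]],
                    threshold: float = 0.5) -> List[List[str]]:
    """Greedy single-pass grouping (an assumed, simplified technique).

    Each sample joins the first cluster whose first member is at least
    `threshold` similar to it; otherwise it starts a new cluster.
    """
    clusters: List[List[str]] = []
    for name, features in samples.items():
        for cluster in clusters:
            if jaccard(features, samples[cluster[0]]) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


def pick_representatives(clusters: List[List[str]],
                         per_cluster: int = 1) -> List[List[str]]:
    """Keep only the first `per_cluster` files of each group for manual review."""
    return [cluster[:per_cluster] for cluster in clusters]


# Hypothetical samples: file name -> set of imported API names.
SAMPLES = {
    "a.exe": {"CreateRemoteThread", "WriteProcessMemory", "VirtualAllocEx"},
    "b.exe": {"CreateRemoteThread", "WriteProcessMemory", "OpenProcess"},
    "c.exe": {"InternetOpenA", "URLDownloadToFileA"},
    "d.exe": {"InternetOpenA", "URLDownloadToFileA", "WinExec"},
}

clusters = cluster_samples(SAMPLES)
representatives = pick_representatives(clusters)
print(clusters)
print(representatives)
```

In this toy data set the four files fall into two groups, so an analyst would start with two files instead of four. A real workflow would use much richer features and would let the analyst adjust the groups interactively.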

Machine learning techniques can also extract the malicious or suspicious details for the malware analysts to inspect so that they don’t have to go and find those details by themselves. I wrote a paper on detecting malicious-looking filenames, titled Evaluating Randomness in Cyber Attack Textual Artifacts, which describes a machine learning algorithm that can automatically tell whether a file path or filename was generated by a computer algorithm or chosen manually by a person.

If the name was randomly generated by a computer algorithm, the executable that dropped, copied or created the file with that name has a higher probability of being malicious. We can present this clue to the security analyst so that they don’t have to read numerous reports and dig through unnecessary details.
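To make the notion of a "random-looking" filename concrete, here is a small Python sketch that scores a filename by the Shannon entropy of its characters. This is only a simple stand-in heuristic, not the algorithm from the paper: the function names, the 3.5-bit threshold, and the example paths are assumptions chosen for illustration.

```python
import math
from collections import Counter
from pathlib import PureWindowsPath


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def filename_stem(path: str) -> str:
    """Lower-cased filename without directories or extension."""
    return PureWindowsPath(path).stem.lower()


def looks_random(path: str, threshold: float = 3.5) -> bool:
    """Flag a path whose filename entropy meets an assumed threshold."""
    return shannon_entropy(filename_stem(path)) >= threshold


# Hypothetical example paths.
human_path = r"C:\Windows\System32\svchost.exe"
random_path = r"C:\Users\a\AppData\Local\Temp\x7qk2vz9pd4mw8ra.exe"

print(looks_random(human_path))
print(looks_random(random_path))
```

Character entropy alone is a weak signal: long, descriptive names written by a person can also score high, and short random names score low. A trained model would combine several features, with a score like this as just one input.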

So in summary, machine learning can shorten the list of files that malware analysts need to analyze – and as a data scientist, I’m very thankful for that. It can also mine the data for the details and indicators that malware analysts are interested in seeing, sparing them unnecessary effort.

Some people today are nervous about the impact that machine learning will have on society, particularly when it comes to eliminating human jobs. However, I see machine learning as a beneficial addition to the work I do, rather than a negative thing. I’m working hard to protect society from malicious attacks with machine learning that helps humans make smarter decisions.

We are already seeing, with the use of our machine learning/artificial intelligence product, that this type of technology can identify malicious code much faster than human analysts can. It’s exciting to me, as a data scientist, to work on the evolution of a technology that can make the daily work of information security teams more efficient and focused and help them prevent the successful execution of malicious cyber attacks.


Be sure to read Xuan's Equal Respect article - it's equally impressive


About Xuan Zhao

Xuan Zhao is a Data Scientist at Cylance, where she explores AI-related research topics and their applications to the computer security space. Her specific research includes advanced machine learning topics, including work in the deep learning space.
