Hello Crawler 2.0! How we improved our core service and what this means for your scan results


If you follow our blog, you might have already seen an announcement introducing our updated core service with a new crawler. More reliable, more thorough, more deterministic, and with better coverage – sounds great, but what does it all mean? Find out how our engineers, who aim to build the world’s best and most thorough security scanner, have brought to life a new crawler that gives you even better results and helps you stay safe.

Why it was time to say goodbye to the old crawler


The crawler is where the fun begins: after gathering basic information about your site, we crawl it to identify the pages that will be tested. The quality of the entire scan depends on whether the crawler does its job well, so we wanted to improve crawling consistency and avoid duplicate pages. And because today’s web is far less static than it used to be, we also set out to offer better JavaScript support and smarter page filtering.

The old crawler has served us well, but it was time to move on and build a crawler that is better suited to our customers’ needs. Our software engineer Natasha Lazarova, who has been working on the crawler since April, says: “We started building the new crawler in spring, so it’s a result of more than half a year of hard work. It’s great to see that it’s performing well and that it has improved the quality of the entire service.”


Enter our shiny new crawler! What’s new?

Smart page filtering. This may very well be the biggest asset of the new crawler! How does it work? By looking at a few key signals, such as client state (cookies and other storage), DOM structure, and the JavaScript a page loads, we can reliably filter out duplicate pages. Smart page filtering has cut scan times, which is why your scan might take less time than it used to. Don’t worry, we are still carefully combing through your site; a shorter scan time simply means that smart page filtering is working its magic.

Visual representation of our crawler in action
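For the technically curious, here is a rough sketch of the idea behind duplicate filtering. This is an illustration rather than our actual implementation, and the signals and hashing scheme are simplified assumptions:

```python
import hashlib

def page_fingerprint(cookies: dict, dom_tag_skeleton: list[str], script_urls: list[str]) -> str:
    """Hash the client state, DOM shape, and loaded scripts into one key.

    Two pages with the same fingerprint are treated as duplicates, so only
    one of them needs to be crawled. All inputs here are illustrative; a
    real crawler would extract richer signals.
    """
    material = "|".join([
        ",".join(sorted(cookies)),            # which cookies are set (names only)
        ">".join(dom_tag_skeleton),           # tag structure, ignoring text content
        ",".join(sorted(set(script_urls))),   # which scripts the page pulls in
    ])
    return hashlib.sha256(material.encode()).hexdigest()

seen: set[str] = set()

def should_crawl(cookies, dom_tag_skeleton, script_urls) -> bool:
    fp = page_fingerprint(cookies, dom_tag_skeleton, script_urls)
    if fp in seen:
        return False  # a page with the same shape was already crawled
    seen.add(fp)
    return True
```

The point of hashing structure instead of raw HTML is that two product pages with different text but identical layout collapse into one fingerprint, which is exactly what cuts scan times.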

Improved JavaScript rendering support. We have drastically improved our JavaScript coverage, which means we can now render dynamically created DOM structures. Natasha explains: “Thanks to JavaScript rendering, we can now crawl corners of your website that we couldn’t crawl before.” Sorry, Crawler 1.0, but this is something you simply couldn’t handle!
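To illustrate why rendering matters, here is a minimal sketch using a headless browser. We are not describing the engine that actually powers our crawler; Playwright is simply an illustrative stand-in that shows how links created by JavaScript become visible only after the page is rendered:

```python
# Requires: pip install playwright && playwright install chromium
from playwright.sync_api import sync_playwright

def discover_links(url: str) -> set[str]:
    """Render the page in a headless browser so that links created by
    JavaScript (e.g. client-side routing) exist in the DOM before we
    collect them. A static HTML parser would miss these entirely."""
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")  # let scripts finish running
        hrefs = page.eval_on_selector_all(
            "a[href]", "els => els.map(e => e.href)"
        )
        browser.close()
    return set(hrefs)
```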

High level of configurability. The new crawler offers a bunch of new configuration options that we will be rolling out to our power users in the future. These include options related to cookies, headers, connection timeouts, JavaScript timeouts, and allowed crawling actions (buttons, forms, links, and so on).
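As a sketch of what such a configuration could look like once these options ship, here is a hypothetical example. The option names and defaults below are our illustration, not the final settings:

```python
from dataclasses import dataclass, field

@dataclass
class CrawlerConfig:
    """Hypothetical shape of a per-scan crawler configuration; the real
    option names may differ once the settings are rolled out."""
    cookies: dict[str, str] = field(default_factory=dict)   # sent with every request
    headers: dict[str, str] = field(default_factory=dict)   # e.g. custom auth headers
    connection_timeout_s: float = 30.0    # give up on unresponsive servers
    javascript_timeout_s: float = 10.0    # cap per-page script execution
    allowed_actions: set[str] = field(
        default_factory=lambda: {"links", "forms", "buttons"}
    )

# Example: follow links only, never submit forms or click buttons.
config = CrawlerConfig(
    headers={"X-Scan-Token": "example"},
    allowed_actions={"links"},
)
```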

Improved crawl consistency. Consistent crawls are crucial for high-quality scan results. “The crawling is now done sequentially, which makes it more consistent. Unless you have changed something on your website, two consecutive crawls should be the same,” Natasha says.
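Here is a toy example of why sequential crawling helps with determinism: visiting pages one at a time and sorting outgoing links before queueing them means an unchanged site always produces the same crawl order. The fetch_links function is a hypothetical stand-in for fetching a page and extracting its links:

```python
from collections import deque

def crawl(start_url: str, fetch_links) -> list[str]:
    """Sequential breadth-first crawl. Because pages are visited one at a
    time and outgoing links are sorted before being queued, two runs over
    an unchanged site visit the same pages in the same order."""
    visited: list[str] = []
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        visited.append(url)
        for link in sorted(fetch_links(url)):  # deterministic ordering
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited

# A tiny fake site: every run prints ['a', 'b', 'c'].
site = {"a": ["c", "b"], "b": ["c"], "c": []}
print(crawl("a", lambda u: site[u]))
```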

What’s next?

Now that the new crawler is up and running, Detectify’s engineers are working on optimizing other parts of the scanner: “The goal is to continuously work on making the scanner faster, producing better results, and testing for more vulnerabilities. We are also focusing on providing a higher level of transparency, such as the OWASP Top 10 view, where you can see how many tests you have passed or failed.”

Before we released the new crawler, it was available as a beta feature and tested by our awesome customers who kindly provided us with feedback. We would like to thank everyone who gave the new crawler a try while it was still in beta!


