Wednesday, February 5, 2014

WAVSEP Web Application Scanner Benchmark 2014

WAVSEP 2013/2014 Score Chart:
The Web Application Vulnerability Scanners Benchmark
Commercial, SAAS & Open Source Scanners
An Accuracy, Coverage, Versatility, Adaptability, Feature and Price Comparison of 63 Black Box Web Application Vulnerability Scanners and SAAS Services

Part I

Information Security Researcher, Analyst, Tool Author and Speaker

February 2014
Assessment Environments: WAVSEP 1.5, WIVET v3-rev148, ZAP-WAVE (WAVSEP integration), various undisclosed verification platforms

sectooladdict-{at}-gmail-{dot}-com


Table of Contents
1. Introduction
2. List of Tested Web Application Scanners
3. Benchmark Overview & Assessment Criteria
4. A Glimpse at the Results of the Benchmark
5. SURPRISE, SURPRISE!
6. How to Read and Use the Results - IMPORTANT
7. Test I - Scanner Versatility - Input Vector Support
8. Test II - WIVET - Coverage via Automated Crawling
9. Introduction to the Various Accuracy Assessments
10. Test III – The Detection Accuracy of Unvalidated Redirect (NEW!)
11. Test IV – The Detection Accuracy of Backup/Hidden Files (NEW!)
12. Test V – The Detection Accuracy of Path Traversal/LFI
13. Test VI – The Detection Accuracy of RFI (XSS via RFI)
14. Test VII – The Detection Accuracy of Reflected XSS
15. Test VIII – The Detection Accuracy of SQL Injection
16. Test IX – Attack Vector Support – Counting Audit Features
17. Test X – Scanner Adaptability - Crawling & Scan Barriers
18. Test XI – Authentication and Usability Feature Comparison
19. Test XII – The Crown Jewel - Results & Features vs. Pricing
20. Additional Comparisons, Built-in Products and Licenses
21. What Changed?
22. Initial Conclusions – Open Source vs. Commercial
23. Verifying The Benchmark Results
24. So What Now?
25. Recommended Reading List: Scanner Benchmarks
26. Acknowledgments
27. Appendix A – List of Tools Not Included In the Test

1. Introduction

Detailed Result Presentation at
Tools, Features, Results, Statistics and Price Comparison
(Delete Cache Prior to Viewing)
A Step by Step Guide for Choosing the Right Web Application Vulnerability Scanner for *You*

It is fashionably late, but the time eventually came.
Months and months of research finally came to fruition with the publication of the yearly WAVSEP benchmark, the fourth one in the series.

It's been a very exciting year for the project… with many new things happening.
  
I'd like to share some of those, as they can put in perspective how the project is progressing:

I've noticed the project was included in many continuous integration processes of various commercial vendors, and lately, even in similar processes of open source projects (for example, ZAP).
The same commercial vendors, as well as colleagues and people I met in conferences around the world, brought to my attention that various government institutes and agencies worldwide use the platform as an assessment platform for vulnerability scanners, often as the main one.
I was contacted by many organizations in the financial and technology sectors that asked me to help them do the same, and I found some time to enhance the platform for that purpose.
I also received source code contributions from multiple projects and individuals, as well as support from volunteers, feedback, and plenty of inspiration.
On multiple occasions, I even received phone calls from "angels", relevant companies and investors around the globe, who wanted to know whether or not to invest in vulnerability detection initiatives and products.

With all the support, contributions, and data collected in this research over the years, I believe that a subject that has remained obscure could soon be resolved –
A simple process for evaluating the customized ROI of each product:

The Return on Investment (ROI) from each product in the category

While we're definitely not there yet, not in this article anyway, each publication leaves fewer missing pieces, and the data collected while preparing this publication closed a significant portion of the gap.

The assessment covers 12 different aspects of the tools (or 16, if you count the non-competitive charts), including two new attack vectors that were never assessed in the past (!), and this time, each aspect was assigned a recommended priority that readers can use for evaluation.

The research also managed to finally breach the traditional level 60 cap (the best metaphor a gamer could come up with at 5AM) and add three additional products to the assessment, bringing the total to 63 different web application vulnerability scanners, including some that were never assessed in the past, with the potential to add more in the near future.

These include a total of 14 commercial products and SAAS services, as well as 49 free and/or open source projects.

Following its tradition, the research focused on the main module which is usually associated with the term "web application vulnerability scanner", and this time, it is in our interest to define this module properly, as well as the difference between it and other modules that may be associated with the same title.

Although the term "web application scanner" has meant different things over the years, I believe that dividing its various functionalities into modules can help clarify the focus of this research, as well as properly classify and evaluate the contribution of the various modules in the future.

Since I didn't find any dominant classification, I am going to use a descriptive one for the purpose of this research.

Black-Box web application scanners may contain any of the following modules:

(*) Generic Application-Level Vulnerability Detection Module: a collection of features that attempt to identify generic exposures in the application layer, without prior knowledge about the application and its structure, and while potentially overcoming barriers along the way. This module is the primary focus of this research.

(*) Known Application-Level / Web Server Vulnerability Detection Module: Commonly classified as a CGI scanner (a bit old school for my taste), or a web server scanner, but often using the same classification as the above module – the collection of features that falls under that category attempts to identify vulnerabilities that are known (and/or were published) in an off-the-shelf product. This module is NOT covered by this research.

Additional modules may include a "Generic Vulnerability Exploitation Module", a "Known Vulnerability Exploitation Module", a "web site infection" detection module, and others. These, too, are not covered by this research; although many of the tested projects/products contain a couple of these module types, they are also often implemented in separate products.

So now that we're done clarifying and classifying, as always, one last tip:
A lot of the information gathered in this research cannot be presented in graphs, so if you're seeking the more significant content, you'll have to dig past the charts and graphs. If you read 3 graphs and can already declare a winner, you're missing some good stuff along the way.

Try the sections in the main menu with all the fancy words beside them… they usually do the trick.

Update:
During the assessment of Qualys it is highly likely that an optimization mechanism affected the scan results of POST test cases (compared to WAVSEP 2012 results). Although in the case of other vendors disabling similar mechanisms solved the problem, in the case of Qualys this optimization mechanism could not be disabled via the configuration interface. We are currently trying to find solutions to the problem.

2. List of Tested Web Application Scanners

The following commercial scanners were covered in the benchmark:
build 20140113 (Acunetix)
NTOSpider v6.0 builds 773/778 (NT OBJECTives)
Netsparker v3.1.7.0 (Netsparker Ltd, p.k.a Mavituna Security)
IBM AppScan v9.0.0.999 & v8.8.0.0 build 466 (IBM)
WebInspect v10.1.177.0, SecureBase 4.11.00 (HP)
Syhunt Dynamic v5.0.0.7 RC2 (Syhunt)
Burp Suite v1.5.20 (Portswigger)
N-Stalker Enterprise Edition X, build 10.13.11.31 (N-Stalker)
WebCruiser v2.7.0 EE (Janus Security)

The following SAAS services were assessed in the benchmark:
Qualys WAS - during January 2014 (Qualys)
ScanToSecure - during January 2014 (Netsparker Ltd)

The previous results of the following commercial scanners were included in the benchmark, since these scanners have not been updated since the previous benchmark (website):
ParosPro v1.9.12 (Milescan)
JSky v3.5.1-905 (NoSec)
Ammonite v1.2 (RyscCorp)

The following commercial scanners will be updated soon:
Nessus (Tenable Network Security) - Web Scanning Features

The latest versions of the following free/open source scanners were re-tested:
Zed Attack Proxy (ZAP) v2.2.2 (OWASP)
IronWASP v 0.9.7.4 (Lavakumar Kuppan)
W3AF v1.6 revision 5460aa0377 (The W3AF team)
arachni v 0.4.6 (Tasos Laskos)
Skipfish v2.10b (Google)
WATOBO v 0.9.19 (Andreas Schmidt)
VEGA v1.1 beta build 108 (Subgraph)
Wapiti v2.3.0 (Nicolas Surribas)
XSSer v1.6-1 (OWASP)
Netsparker Community Edition v 3.1.6.0 (Netsparker Ltd)
N-Stalker 2012 Free Edition v 10.13.11.31 (N-Stalker)
Syhunt Mini v4.4.3.0 (Syhunt) (p.k.a Sandcat Mini)

New aspects of the following open source scanners were tested in the benchmark:
Andiparos v1.0.6 (Compass Security)
Paros Proxy v3.2.13 (Milescan)

The previous results of the following free scanners were included in the benchmark, since these scanners have not been updated since the previous benchmark (website):
Acunetix Free Edition v8.0-20120509 (Acunetix)
N-Stalker 2009 Free Edition v7.0.0.223 (N-Stalker)
WebSecurify v0.9 latest free edition (GNUCITIZEN)
Sandcat Free Edition v4.0.0.1 (Syhunt)
WebCruiser v2.4.2 FE
JSKY Free Edition v1.0.0
Scrawler v1.0 (HP)
Safe3WVS v10.1 FE (Safe3 Network Center)


The results of the following open source scanners were included but not re-verified:
sqlmap v1.0-Jul-5-2012 (Github) – already achieved mastery in its supported features
DSSS (Damn Simple SQLi Scanner) v0.1h – v0.2h exists and will be tested in the future
aidSQL 02062011 – a newer version was released on 2013-05-27 and will be tested in the future

The results were compared to those of unmaintained scanners tested in the past:
ProxyStrike v2.2
Grendel Scan v1.0
PowerFuzzer v1.0
Oedipus v1.8.1 (v1.8.3 is around somewhere)
Xcobra v0.2
XSSploit v0.5
UWSS (Uber Web Security Scanner) v0.0.2
Grabber v0.1
WebScarab v20100820
Mini MySqlat0r v0.5
WSTool v0.14001
crawlfish v0.92
Gamja v1.6
iScan v0.1
LoverBoy v1.0
openAcunetix v0.1
ScreamingCSS v1.02
Secubat v0.5
SQID (SQL Injection Digger) v0.3
SQLiX v1.0
VulnDetector v0.0.2
Web Injection Scanner (WIS) v0.4
XSSS v0.40
Priamos v1.0

For a full list of commercial & open source tools that were not tested in this benchmark, refer to the appendix.

3. Benchmark Overview & Assessment Criteria
The benchmark focused on testing commercial & open source tools that are able to detect (and not necessarily exploit) security vulnerabilities on a wide range of URLs, and thus, each tool tested was required to support the following features:

(*) The ability to detect Reflected XSS and/or SQL Injection and/or Path Traversal/Local File Inclusion/Remote File Inclusion vulnerabilities.
(*) The ability to scan multiple URLs at once (using either a crawler/spider feature, URL/Log file parsing feature or a built-in proxy).
(*) The ability to control and limit the scan to an internal or external host (domain/IP).
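The third requirement, restricting a scan to a given host, usually comes down to a check applied to every URL the crawler discovers. The following is a minimal sketch, assuming scope is defined as a list of host names/IPs plus an optional set of ports; the function name, the parameters and the example URLs are hypothetical and not taken from any tested product:

```python
from urllib.parse import urlsplit

def in_scope(url, allowed_hosts, allowed_ports=None):
    """Return True if the URL targets an allowed host (and port, when ports are given)."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    host = (parts.hostname or "").lower()
    if host not in {h.lower() for h in allowed_hosts}:
        return False
    if allowed_ports is not None:
        # Fall back to the scheme's default port when none is written in the URL.
        port = parts.port or (443 if parts.scheme == "https" else 80)
        if port not in allowed_ports:
            return False
    return True


# Hypothetical URLs collected by a crawler; only the in-scope ones get queued.
discovered = [
    "http://192.168.1.10:8080/app/index.jsp",
    "http://192.168.1.10:8080/app/search.jsp?q=test",
    "http://www.example.com/external-link",
]
queue = [u for u in discovered if in_scope(u, ["192.168.1.10"], {8080})]
print(queue)
```

Real scanners typically add path prefixes, exclusion patterns and similar controls on top of a host check like this one.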

The testing procedure of all the tools included the following phases:

Feature Documentation
The features of each scanner were documented and compared, according to documentation, configuration, plugins and information received from the vendor. The features were then divided into groups, which were used to compose various hierarchical charts.

Accuracy Assessment
The fact that a scanner supports a certain category of tests does not say anything about HOW WELL it is able to detect the supported issues. The purpose of the accuracy assessment is to see how effective each scanner is in detecting a variety of vulnerabilities, and to see whether or not the detection logic "settles" for simple scenarios, or covers a collection of common and advanced scenarios.

The scanners were all tested against the latest version of WAVSEP (v1.5), a benchmarking platform designed to assess the detection accuracy of web application scanners, which was released alongside the publication of this benchmark.
The purpose of WAVSEP’s test cases is to provide a scale for understanding which detection barriers each scanning tool can bypass, and which common vulnerability variations can be detected by each tool.
WAVSEP 1.5 includes test cases from ZAP-WAVE, code contributions from various volunteers and a collection of 250+ NEW test cases for two new exposures: unvalidated redirect and obsolete/hidden files.

The various scanners were tested against the following test cases (GET/POST):
  • 60 test cases that were vulnerable to Phishing via Unvalidated Redirect.
  • 184 test cases that included Hidden, Obsolete and Backup files.
  • 816 test cases that were vulnerable to Path Traversal attacks.
  • 108 test cases that were vulnerable to (XSS via) Remote File Inclusion attacks.
  • 66 test cases that were vulnerable to Reflected Cross Site Scripting attacks.
  • 80 test cases that contained Error Disclosing SQL Injection exposures.
  • 46 test cases that contained Blind SQL Injection exposures.
  • 10 test cases that were vulnerable to Time Based SQL Injection attacks.


The various scanners were also tested against a variety of false positive scenarios:
  • 9 different categories of false positive Unvalidated Redirect vulnerabilities.
  • 3 different categories of false positive Obsolete/Hidden/Backup files.
  • 8 different categories of false positive Path Traversal / LFI vulnerabilities.
  • 6 different categories of false positive Remote File Inclusion vulnerabilities.
  • 7 different categories of false positive Reflected XSS vulnerabilities.
  • 10 different categories of false positive SQL Injection vulnerabilities.


Overall, a collection of 1413 test cases (1370 vulnerable test cases and 43 false positive categories) for 6 different attack vectors, each test case simulating a different and unique scenario that may exist in an application.
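The totals can be cross-checked with a quick tally of the counts listed above. The split it produces (1370 vulnerable test cases plus 43 false positive categories, which together give 1413) is inferred from the arithmetic rather than spelled out in the benchmark:

```python
# Counts copied from the test case and false positive lists in this section.
vulnerable_cases = {
    "Unvalidated Redirect": 60,
    "Hidden/Obsolete/Backup Files": 184,
    "Path Traversal/LFI": 816,
    "RFI (XSS via RFI)": 108,
    "Reflected XSS": 66,
    "SQLi (error disclosing)": 80,
    "SQLi (blind)": 46,
    "SQLi (time based)": 10,
}
false_positive_categories = {
    "Unvalidated Redirect": 9,
    "Hidden/Obsolete/Backup Files": 3,
    "Path Traversal/LFI": 8,
    "RFI": 6,
    "Reflected XSS": 7,
    "SQL Injection": 10,
}

vulnerable_total = sum(vulnerable_cases.values())
false_positive_total = sum(false_positive_categories.values())
print(vulnerable_total, false_positive_total, vulnerable_total + false_positive_total)  # 1370 43 1413
```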


Although the testing platform included a variety of experimental test cases for similar and different vulnerabilities (DOM-XSS, information disclosure issues, etc), these were not included in the scope of the benchmark, and their results did not affect the final score.

Attack Surface Coverage Assessment
In order to assess the scanners' attack surface coverage, the assessment included tests that measure the efficiency of each scanner's automated crawling mechanism (input vector extraction), and feature comparisons meant to assess its support for various technologies and its ability to handle different scan barriers.
This section of the benchmark also included the WIVET test (Web Input Vector Extractor Teaser v3-rev148), in which scanners were executed against a dedicated application that can assess their crawling mechanism efficiency in the aspect of input vector extraction. The specific details of this assessment are provided in the relevant section.

Result Verification
In order to ensure result consistency, the directory of each exposure subcategory was individually scanned multiple times using various configurations (for the vast majority of tested products), usually using a single thread and a scan policy that only included the relevant plugins.

In order to ensure that the detection features of each scanner were truly effective, most of the scanners were tested against an additional benchmarking application that was prone to the same vulnerable test cases as the WAVSEP platform, but had a different design, slightly different behavior and different entry point format, in order to verify that no signatures were used, and that any improvement was due to the enhancement of the scanner's attack tree.

Furthermore, in order to verify that all WIVET results were reliable, the vast majority of tools were also tested against an unpublished online version of WIVET that included additional enhancements that prevent pre-adaptation to the platform URLs (http://wivet.webscantest.com/).

Finally, since the test was performed with the aid of several volunteers, some results were verified by more than one person and on multiple environments.

Making the Results Useful to Vendors
In order to help vendors understand which scenarios were "missed" by their products, the list of identified test cases was documented in detail for each class of vulnerabilities, and the list of test cases that were missed can be deduced from that list. Since WAVSEP contains detailed documentation on each and every test case, this information can help vendors identify their weaknesses and cover prominent scenarios.
Refer to the scan description section (click the version link) of each scanner in http://www.sectoolmarket.com to locate exactly which test cases were identified by each scanner.
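Deducing the missed test cases from the list of identified ones is a simple set difference, and the same sets also yield a detection rate and a false positive rate. A minimal sketch using hypothetical test case identifiers and a made-up scanner report (the real WAVSEP case names and per-scanner results are documented on the platform and on sectoolmarket):

```python
# Hypothetical test case identifiers, not the actual WAVSEP case names.
all_vulnerable = {f"XSS-Case{i}" for i in range(1, 11)}
all_false_positive = {f"XSS-FalsePositive-Case{i}" for i in range(1, 5)}

# Hypothetical scanner report: the test cases it flagged as vulnerable.
reported = {"XSS-Case1", "XSS-Case2", "XSS-Case3", "XSS-Case5",
            "XSS-Case8", "XSS-FalsePositive-Case2"}

def case_number(name):
    """Numeric suffix of a case name, used for a natural sort order."""
    return int(name.rsplit("Case", 1)[1])

detected = reported & all_vulnerable
missed = sorted(all_vulnerable - reported, key=case_number)
false_alarms = reported & all_false_positive

detection_rate = 100 * len(detected) / len(all_vulnerable)
false_positive_rate = 100 * len(false_alarms) / len(all_false_positive)

print("missed:", missed)
print(f"detection rate: {detection_rate:.0f}%, false positive rate: {false_positive_rate:.0f}%")
```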

Public tests vs. Obscure tests
In order to make the test as fair as possible, while still enabling the various vendors to show improvement, the benchmark was divided into tests that were publicly announced, and tests that were obscure to all vendors:

(*)Publicly announced tests: the various feature comparisons, the WIVET assessment and the detection accuracy assessment of the SQL Injection, Reflected Cross Site Scripting, Path Traversal/LFI and (XSS via) Remote File Inclusion were well known to all vendors, and already published as a part of WAVSEP v1.2 (which was available online for the last year and a half).

(*)Tests that were obscure to all vendors until the moment of the publication: the detection accuracy assessment of the Unvalidated Redirect and Obsolete/Hidden File Detection implemented as 256+ NEW test cases in WAVSEP 1.5 (a new version that was only published alongside this benchmark).

The results of the main test categories are presented within three graphs (commercial/SAAS graph, free & open source graph, unified graph), and the detailed information of each test is presented in a dedicated section of the benchmark presentation platform at http://www.sectoolmarket.com.

4. A Glimpse at the Results of the Benchmark
The presentation of results in this benchmark, alongside the dedicated result presentation website (http://www.sectoolmarket.com/) and a series of supporting articles and methodologies, is designed to help the reader make a decision: to choose the proper product/s or tool/s for the task at hand, within the constraints of time or budget.
A summary of the most significant results can be seen in the following links, and filtered according to the product license (commercial/opensource):
Price & Feature Comparison of Commercial Scanners
Price & Feature Comparison of a Unified List of Commercial, Free and Open Source Products



Some of the sections might not be clear to some of the readers at this phase, especially since many of them contain new conclusions and new results, which is why I advise both veterans and newcomers to read the rest of the article, prior to analyzing this summary.

5. SURPRISE, SURPRISE!
Although, generally speaking, the vast majority of products improve their results from benchmark to benchmark, and this case is no different, this benchmark also has an above-average amount of conflicting results.

More than a few tools that got high results in the previous benchmarks' categories got lower results in this one, in the same categories, although nothing in the test environment has changed.

Furthermore, some of the new tests were met with… surprising difficulty by the vast majority of the tools in the industry, leading me to believe that many products in the industry have grown to a size which may be challenging to maintain in the coming years.

The overall problem is related to product testing and maintenance: software bugs may cause a variety of crucial features to stop functioning for long periods of time, without anyone being aware of it.

The cost of the mitigating processes to the vendor (or of their absence, to the consumer!) may be very high, and combined with the fact that it's very difficult for the consumer to identify such issues, especially on a periodic basis, this can have a major effect.

It's hard to miss… all you need to do is take a look at a couple of the "new" charts, and even some of the "traditional" WAVSEP charts, to notice this issue, which I will discuss in detail in some of the sections.

This phenomenon is something which I will probably analyze in a future publication, and it should be a reason for concern, especially since, unless certain precautions are taken, it will probably become more severe with time.

6. How to Read and Use the Benchmark Results
The practical reader, the one who wants to make use of the information provided in this research to his advantage, can use the following guidelines for interpreting the results, and the following steps to get to practical decisions:

(*) Although it's tempting to look only at the tools at the top, it's important to remember that insignificant differences in results are just that – insignificant – and should be treated accordingly. The benchmark can never cover every single scenario, and a few percentage points don't always make a product better in a category (although plenty of percentage points probably do). I would therefore recommend that readers evaluating a tool figure out whether or not the tool has a good score in an assessment and in general, instead of falling into the 100% trap. That being said, a perfect score certainly isn't bad, so don't take it the other way around either.

(*) When trying to figure out which tool you should use, try the following simple methodology:

1. Input Vector and Scan Barrier Support
Figure out whether the input delivery method (Test I) used by the application or applications you are assessing is supported by the scanners you are evaluating. Do the same for the various security mechanisms, technologies and scan barriers that are used in the application (Test X). The scanner won't work at all, or will provide little value, if it doesn't support those.

[Note: pentesters should probably go for a tool that supports enough of those, as the technological barriers they may encounter vary, while other organizations may use tools that support only what they need]

2. Crawling & Input Vector Extraction
If you use scanners mainly in a point-and-shoot scenario, and prefer as much automation as possible, a high WIVET score will be the second most important feature you should follow.

[Note: for the most part, most pentesters can deal with a reasonable score as well, although a high one will certainly help, while organizations and QA/DEV departments really need a tool with a high score in this category – especially in 2014]

3. Vulnerability Detection Features and Accuracy
It's hard to say what's more important – so try and keep those in balance. The more accurate and the more feature rich – the better. Bear in mind that an accuracy difference of 1%, 5% or even 10% is NOT necessarily significant, although larger differences might be.

4. Price
No point in buying a product that can't run, isn't automated enough for you (in case you need it), isn't accurate at all (will only result in extra work for you), or doesn't have enough features to justify the price, but once all that is out of the way, price is your next criterion. Bear in mind that you can usually negotiate, and that from time to time, prices change.

5. All the rest
Some features may be special, such as platform-specific capabilities, result documentation features, complementary features that can make your life easier, configure your WAF, generate reports for your manager or get you a free trip to Mars.
Some of these features may even tip the scale at the expense of other features, but in the long run, try to stick to that order.

Also note that these are general guidelines, and that if this choice is significant, you might want to consult with an expert to help you evaluate which tools match your needs.
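The ordering above can be read as a hard filter followed by a ranking in which small gaps count as ties. The sketch below is one hypothetical way to encode it; the `Candidate` fields, the scanner names, scores and prices are made-up placeholders (not benchmark results), and the 10-point tolerance is an assumption based on the advice that accuracy differences of up to 10% are not necessarily significant:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    input_vectors: set   # input delivery methods the scanner can scan (Test I)
    barriers: set        # scan barriers it can handle (Test X)
    wivet: float         # crawling coverage score, in percent (Test II)
    accuracy: float      # averaged detection accuracy, in percent
    price: float         # license cost in any consistent currency

def shortlist(candidates, needed_vectors, needed_barriers,
              point_and_shoot=True, tolerance=10.0):
    # 1. Hard requirement: the scanner must scan the application's input
    #    vectors and handle its scan barriers, otherwise it is dropped.
    usable = [c for c in candidates
              if needed_vectors <= c.input_vectors and needed_barriers <= c.barriers]
    if not usable:
        return []
    # 2. Crawling coverage only matters for automated (point-and-shoot) use.
    if point_and_shoot:
        best = max(c.wivet for c in usable)
        usable = [c for c in usable if c.wivet >= best - tolerance]
    # 3. Accuracy: differences within `tolerance` points are treated as ties.
    best = max(c.accuracy for c in usable)
    usable = [c for c in usable if c.accuracy >= best - tolerance]
    # 4. Among the remaining, roughly equivalent tools, cheapest first.
    #    (Step 5, "all the rest", is left to human judgment.)
    return sorted(usable, key=lambda c: c.price)


# Placeholder data, not benchmark results.
tools = [
    Candidate("Scanner A", {"GET", "POST", "JSON"}, {"CSRF token"}, 90, 85, 5000),
    Candidate("Scanner B", {"GET", "POST", "JSON", "XML"}, {"CSRF token"}, 85, 80, 2000),
    Candidate("Scanner C", {"GET", "POST"}, {"CSRF token"}, 95, 95, 100),
]
result = shortlist(tools, {"GET", "POST", "JSON"}, {"CSRF token"})
print([c.name for c in result])
```

Scanner C is excluded despite having the best scores and the lowest price, because it cannot scan JSON; input vector support acts as a prerequisite rather than as one more score to average.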

7. Test I - Versatility - Input Vector Support
As I mentioned in previous posts from 2012, after investigating the field of DAST for the past five years, I consider the scanner's support for the tested application input delivery method to be the single MOST significant aspect in the selection process of any scanner.

Reasoning: the input delivery method (a.k.a the input vector) is the method used by the HTML/Flash/Applet/Silverlight application to deliver user-originating input from the client to the server.

These "formats" include common formats such as:

(*) Query String Parameters (URL?param1=value1&param2=value2)
(*) HTTP Body Parameters (param1=value1&param2=value2)

And "modern" formats such as:
(*) JSON Arrays ({"param1":"value1","param2":"value2"})
(*) XML Elements and Attributes (<element attribute="value">value</element>)

These methods may also include binary delivery methods for technology specific objects such as AMF, Java serialized objects and WCF, as well as many other input delivery methods.
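To illustrate why the delivery method matters, the sketch below takes one parameter set, replaces a single value with a test payload, and encodes the result in each of the four textual formats listed above. It is a simplified, hypothetical illustration built only on the Python standard library: the helper names, the `request` root element and the payload are assumptions, and real scanners must also preserve the application's actual structure:

```python
import json
from urllib.parse import urlencode
from xml.sax.saxutils import escape

params = {"param1": "value1", "param2": "value2"}

def inject(params, target, payload):
    """Copy the parameters, replacing the value of `target` with the payload."""
    mutated = dict(params)
    mutated[target] = payload
    return mutated

def as_query_string(p):
    return "?" + urlencode(p)       # appended to the URL

def as_body(p):
    return urlencode(p)             # application/x-www-form-urlencoded body

def as_json(p):
    return json.dumps(p)            # JSON request body

def as_xml(p, root="request"):
    # escape() only escapes &, < and >, which is enough for element content.
    inner = "".join(f"<{k}>{escape(v)}</{k}>" for k, v in p.items())
    return f"<{root}>{inner}</{root}>"

payload = "'\"<probe>"
mutated = inject(params, "param1", payload)
for encode in (as_query_string, as_body, as_json, as_xml):
    print(encode.__name__, encode(mutated))
```

A scanner that only implements the first two encoders can never deliver its payload to a parameter that the server reads from a JSON or XML body, no matter how good its detection logic is.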

Since the majority of attacks rely on malicious input being delivered through input parameters to the application, a scanner that is not able to deliver those values to most of the application server entry points WILL NOT be a good choice.
An automated tool can't detect vulnerabilities in a given parameter, if it can't scan the protocol or mimic the application's method of delivering the input.

In fact, lack of support for the dominant input vector used by the application can make the scanner NEARLY USELESS for that specific application (without detracting from how useful it may be for other types of applications).

While organizations that stick with specific development technologies only need to verify that their scanner supports the input delivery methods used by their applications, the vast collection of different input delivery methods in use in 2013/2014 makes versatility a major issue for pentesters and, to some extent, for organizations that rapidly develop applications in different technologies.

Although the positions in this section's charts don't necessarily represent the most important score, input vector support is the most important prerequisite for a scanner to meet when scanning a specific technology.

Therefore, the first assessment criterion of this benchmark is the number of input vectors each tool can scan (not just parse), which is a major component in the scanner versatility score.

Important Note
Although it may seem logical that a scanner that supports an input delivery method will do so consistently, some scanners' support for an input vector may be limited to SOME of their vulnerability detection plugins, while the rest of the plugins may only support basic input delivery methods.
I became aware of this condition after thorough research, and unfortunately, at the moment there is no sure way to verify which detection capabilities of scanners are actually supported for each input vector, at least not on a large scale, and for the vast majority of scanners.

Since WAVSEP test cases are implemented with either query string or HTTP body parameters, only the support for these vectors was actually verified, and the rest of the information in this section derives from thorough research that covered vendor-proclaimed results, source code (when possible) and feature documentation.
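The per-plugin limitation described above can be pictured as an attack-by-vector support matrix, similar in spirit to the per-attack vector selection some products expose. The sketch below uses an invented matrix (not the data of any tested product) and two hypothetical helper functions to show how a scanner can "support XML" while most of its detection plugins never inject into XML parameters:

```python
# Hypothetical attack-plugin x input-vector support matrix (invented data).
support = {
    "SQL Injection":        {"query", "body", "cookie", "json", "xml"},
    "Reflected XSS":        {"query", "body", "json"},
    "Path Traversal":       {"query", "body"},
    "RFI":                  {"query", "body"},
    "Unvalidated Redirect": {"query"},
}

def plugins_covering(vector):
    """Attack plugins that actually inject into the given input vector."""
    return sorted(p for p, vectors in support.items() if vector in vectors)

def coverage_gaps(vector):
    """Attack plugins that silently skip the given input vector."""
    return sorted(p for p, vectors in support.items() if vector not in vectors)

for vector in ("query", "json", "xml"):
    print(f"{vector}: tested by {plugins_covering(vector)}, skipped by {coverage_gaps(vector)}")
```

In this invented matrix, only one of five plugins reaches XML parameters, yet the product would still appear as "XML-capable" in a feature list, which is exactly the condition that is hard to verify from documentation alone.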

Future versions of WAVSEP may include test cases to verify the support of scanners for different input vectors.

Before viewing the charts that represent the versatility of different vulnerability scanners, it may be a good time to mention interesting features of two products which are related to this category.

This proclamation does not mean that the author takes a stand as to which product is "the best" (a conclusion that anyone who read my previous benchmarks knows very well not to expect), just that the approach these products take to classify attacks, manage scan scope and present the information to the user can be very beneficial in many situations.

The products I refer to are NTOSpider and Acunetix, and to some extent IronWASP, ZAP and Burp (and products with similar features, in case I forgot any), each taking an interesting approach to input vector support and scan scope management:

(*) NTOSpider enables the user to manage which input vectors should be tested for each attack, therefore presenting which vectors are supported for each attack, information which is very hard to obtain from documentation:




(*) Acunetix presents which attacks are performed per directory, schema, file, etc:




Other tools contained interesting features (with no attack-per-vector info) that provided control over which input vectors will be scanned:

IronWASP input delivery method scope selection in scan wizard:




OWASP ZAP input delivery method scope selection in the configuration window:




Similar features were verified in Burp Suite Pro, and may exist in other products as well.

The more vectors of input delivery that the scanner supports, the more versatile it is in scanning different technologies and applications (assuming it can handle the relevant scan barriers, supports necessary features such as authentication, or alternatively, contains features that can be used to work around the specific limitations).
The scanners' support for the various input delivery methods is compared in detail in the following section of sectoolmarket: http://www.sectoolmarket.com/input-vector-support-unified-list.html

The following charts show how versatile each scanner is in scanning different input delivery vectors (and, although not comprehensively, different technologies):

Result Update (29/03/2014): Appscan, ZAP and arachni reported support for additional input vectors AFTER the original benchmark publication (in the same tested versions). The current charts include these updates, alongside others.

The Number of Input Vectors Supported – Commercial Tools & SAAS





The Number of Input Vectors Supported – Free & Open Source Tools


The Number of Input Vectors Supported – Unified List




Versatility of Open Source Scanners vs. Commercial Scanners in 2014
The vast majority of open source tools tested in 2012 (with the exception of IronWASP) did not support vectors besides the basic GET/POST/Header/Cookie vectors, making the task of using them against "modern" applications that rely on JSON/XML/etc impractical.
However, as the graph proves, certain open source vendors invested efforts in supporting additional input delivery methods in their vulnerability scanning features, and thus, these scanners can be used effectively against applications with "modern" input vectors and technologies.

Although this scenario is rare, and by no means representative, the careful inspector will even identify input delivery methods that are only supported by certain open source projects (for example, ZAP's support for GWT), although the same goes the other way around for many vectors supported by commercial vendors.

8. Test II - WIVET - Crawling Coverage
The second assessment criterion was focused on assessing crawling coverage features, which included the various discovery methods used to increase the attack surface of the tested application: to locate additional resources and input delivery methods to attack.

Although scanners can increase the attack surface in a number of ways, from detecting hidden files to exposing device-specific interfaces (mobile, tablet, etc), this assessment focused on the automated crawling capabilities and input vector extraction coverage (as opposed to the input vector scanning support measured in the previous section) of the various scanners, and is primarily represented by the scanner's WIVET score.

This aspect of a scanner is extremely important in point-and-shoot scans, in which the user does not "train" the scanner to recognize the application structure, URLs and requests, either due to time/methodology restrictions, or because the user is not a security expert who knows how to properly use manual crawling with the scanner.

Although users who can afford to "train" the scanner to recognize the URLs and input sources in the application (by using it as a proxy, for example) don't necessarily require enhanced crawling coverage, organizations and individuals that prefer or require using the web application scanner in an automated manner (point-and-shoot) should consider crawling coverage / input vector extraction to be of the highest importance, second only to the scanner's support for testing the necessary input delivery vectors.

As mentioned earlier, in order to evaluate these aspects of scanners, I used a project called WIVET (Web Input Vector Extractor Teaser). WIVET is a benchmarking project written by Bedirhan Urgun and released under the GPLv2 license.
The project is implemented as a web application which aims to "statistically analyze web link extractors", and measures the number of input vectors extracted by each scanner scanning the WIVET website.

Plainly speaking, the project simply measures how well a scanner is able to crawl the application and how well it can locate input vectors, by presenting a collection of challenges that contain links, parameters and input delivery methods that the crawling process should locate and extract.

In order for WIVET to work, the scanner must crawl the application while consistently using the same session identifier in its crawling requests, and while avoiding the 100.php logout page (which resets the session, and thus the results).

The results can then be viewed by accessing the application's index page with the same session identifier used during the scan.

During the tests I used a variety of workarounds designed to "assist" scanners lacking proxy/cookie customization features in scanning WIVET, usually by pointing the scanner at a proxy that forwarded the traffic to WIVET while adding a consistent session identifier and blocking access to the logout page.
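The proxy workaround described above boils down to a per-request rewrite rule. The following Python sketch shows only that rule, under two stated assumptions: that WIVET uses PHP's default `PHPSESSID` session cookie, and that the logout page can be recognized by its file name alone. `rewrite_request` is a hypothetical helper, not part of WIVET or of any scanner.

```python
from http.cookies import SimpleCookie
from typing import Dict, Optional
from urllib.parse import urlsplit

# Assumptions: PHP's default session cookie name, and a logout page identified
# by its file name alone. Adjust both for your WIVET instance.
SESSION_COOKIE = "PHPSESSID"
LOGOUT_PAGE = "100.php"


def rewrite_request(url: str, headers: Dict[str, str], session_id: str) -> Optional[Dict[str, str]]:
    """Return the headers to forward to WIVET, or None if the request must be dropped."""
    if urlsplit(url).path.rsplit("/", 1)[-1] == LOGOUT_PAGE:
        return None  # never let the crawler reset the session (and the score)

    # Merge whatever cookies the scanner sent, then pin the session identifier.
    raw_cookie = next((v for k, v in headers.items() if k.lower() == "cookie"), "")
    jar = SimpleCookie()
    jar.load(raw_cookie)
    jar[SESSION_COOKIE] = session_id

    forwarded = {k: v for k, v in headers.items() if k.lower() != "cookie"}
    forwarded["Cookie"] = "; ".join(f"{m.key}={m.value}" for m in jar.values())
    return forwarded
```

A complete setup would call this from a small forwarding proxy for every request the scanner sends, answering blocked requests locally (for example with a 404) instead of forwarding them.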

The scan configuration used with each scanner against WIVET was documented in detail in each scanner's "scan log", and the comparison of the scanners' WIVET scores is presented in the following section of sectoolmarket:
http://sectoolmarket.com/wivet-score-unified-list.html

Result Update (29/03/2014): the impressive 96% result of Webinspect can be achieved by selecting the "depth first" mode in the scan wizard. The default option in the wizard is slightly less efficient, but still yields a great result that competes with the best result of any other scanner (94%).

The WIVET Score of Web Application Scanners – Commercial Tools & SAAS 


Due to technical difficulties and time constraints, the WIVET results of ScanToSecure are not yet included; it can be assumed to have the same score as Netsparker, since that is the engine at its core.

The WIVET Score of Web Application Scanners – Free and Open Source Tools


The WIVET Score of Web Application Scanners – Unified List



Although the scan success rate was much higher than in previous years, some of the scanners were still not able to scan this platform despite all my efforts. The scores of these projects will be updated as soon as they enhance their crawling mechanisms enough to scan WIVET.

It's crucial to remind the reader that scanners with burp-log parsing features (such as sqlmap and IronWASP) can effectively be assigned the WIVET score of Burp, and that scanners with internal proxy features (such as ZAP, Burp, etc.) can be used with the crawling mechanisms of other scanners (such as Netsparker CE).
Thus, any scanner that supports any of those features can be artificially "enhanced" and assigned the WIVET score of any other scanner in the possession of the tester.

9. Introduction to the Accuracy Assessments
The following sections present the results of the detection accuracy assessments performed for *Unvalidated Redirect*, *Old, Backup and Unreferenced Files*, *Path Traversal / LFI*, *(XSS via) Remote File Inclusion*, *Reflected XSS* and *SQL Injection*, six of the most commonly supported features in web application scanners.

Since two of these assessments are *NEW* to this yearly benchmark (the backup files and unvalidated redirect accuracy assessments - which were not disclosed to the various vendors prior to the publication of this benchmark), two more were new in the 2012 benchmark (the path traversal/LFI and the remote file inclusion accuracy assessments), and two existed in the benchmark from day one (SQL injection and reflected XSS) – there's an interesting combination of results that can help assess the overall scanner's performance.

Sure - the detection accuracy of a specific exposure might not reflect the overall condition of the scanner on its own, but the careful reader can go back and analyze previous benchmarks to identify patterns, and as always, these results serve as a crucial indicator for how good a scanner is at detecting specific vulnerability instances.

The assessments were performed against the test cases of WAVSEP v1.5, which emulate different common test case scenarios for generic technologies.

Reasoning: a scanner that is not accurate enough will not be able to identify many exposures, and might classify non-vulnerable entry points as vulnerable. These tests aim to assess how good each tool is at detecting the vulnerabilities it claims to support, in a supported input vector located in a known entry point, without any restrictions that could prevent the tool from operating properly.

These accuracy assessments were also performed under optimal conditions (or at least as optimal as we could create), since the purpose was to see how well the detection logic functions, with no interference from the various barriers that can affect it in real applications.
Such optimal conditions included scanning relatively small groups of URLs, using a limited number of threads, defining optimal configuration entries (in some cases), and so on.

Therefore, to reproduce these results, it is necessary to follow the exact instructions listed in the various scan logs included in sectoolmarket.

10. Test III - Unvalidated Redirect Detection
The third assessment criterion was the detection accuracy of Unvalidated Redirect, a common exposure which is also a commonly implemented feature in web application scanners, and most importantly, a NEW TEST in WAVSEP which the vendors were not aware of prior to the publication of this article.

It's also included in OWASP TOP 10 2010 and in OWASP TOP 10 2013, and represents a continued effort to make WAVSEP as compliant as possible with the various OWASP TOP 10 lists.

This score chart differs from the rest of the detection accuracy charts: it calculates the score based only on QueryString/GET test cases, and does not take the HTTP POST test cases into account.

The reason to include only GET test cases in the score calculation is related to the properties of an unvalidated redirect attack:

It's essentially a phishing-enhancing attack which relies on website redirection features that redirect the browser to user-controlled addresses sent in the input. These attacks eventually redirect the user to an attacker-controlled website, while misleading even cautious users who verify the domain address prior to accessing a link.
For example:

Original URL -
Abused URL -

A case could be made that since submitting malicious redirect values in POST parameters requires the user to first access an HTML form in an attacker-controlled website, there's no point in performing this attack at all, since the user already "trusted" the attacker's website.

In fact, this view is well ingrained in the perception of many tool authors, who usually don't submit any redirect payloads in POST parameters.

Several arguments can be made against that perception:

(*) Detecting persistent unvalidated redirect attacks (like persistent XSS attacks) in which the payload is "injected" into the database and affects other users, may very well justify sending redirect payloads in POST parameters.

(*) Detecting session-hosted unvalidated redirect attacks and pages in the actual website that embed externally supplied URLs in a form that will later be submitted using POST may justify performing POST tests as well.

Regardless of whether the argument is true or not, due to the lack of support for POST unvalidated redirect tests in most of the tested products, I decided not to include the POST test cases in this benchmark, despite the fact that they are already included in WAVSEP, and despite the various scenarios in which testing POST parameters with unvalidated redirect payloads may lead to valid vulnerabilities (persistent redirect, session redirect, reprinted redirect form, etc).

The POST test cases may however be included in the next benchmark, in one way or the other, and the full results are already included in the relevant scan logs of sectoolmarket.

In order to assess the detection accuracy of different unvalidated redirect instances, I used a total of 30-60 test cases (for 302 redirection, and even for JS redirection). I also used a bunch of false positive test cases, to see how permissive the detection process is.
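Since the test cases cover both 302 redirection and JS redirection, a detector has to inspect both the `Location` header and the page body. The Python sketch below is one possible check, assuming a hypothetical canary domain is injected as the redirect target; the regular expression covers only a few common JavaScript idioms and is not exhaustive.

```python
import re
from typing import Dict
from urllib.parse import urlsplit

# Hypothetical canary domain injected as the redirect target.
CANARY_HOST = "redirect-canary.example"

# Rough patterns for common JavaScript redirection idioms (not exhaustive).
JS_REDIRECT = re.compile(
    r"""(?:window\.|document\.)?location(?:\.href)?\s*=\s*['"]([^'"]+)['"]"""
    r"""|location\.(?:replace|assign)\(\s*['"]([^'"]+)['"]"""
)


def points_to_canary(url: str) -> bool:
    """True only when the canary is the redirect's host, not merely part of the URL."""
    return urlsplit(url).hostname == CANARY_HOST


def is_open_redirect(status: int, headers: Dict[str, str], body: str) -> bool:
    """Flag a response that redirects (via 3xx or JavaScript) to the canary host."""
    if 300 <= status < 400:
        location = next((v for k, v in headers.items() if k.lower() == "location"), "")
        if points_to_canary(location):
            return True
    for match in JS_REDIRECT.finditer(body):
        target = match.group(1) or match.group(2)
        if points_to_canary(target):
            return True
    return False
```

Comparing the parsed host rather than searching for the canary string is what keeps responses that merely echo the payload (in a query string or in plain text) from being reported, which is the kind of behavior the false positive test cases probe.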

The comparison of the scanners' unvalidated redirect detection accuracy is documented in detail in the following section of sectoolmarket:

Result Chart Glossary
Note that the GREEN bar represents the vulnerable test case detection accuracy, while the RED bar represents the false positive categories detected by the tool (which may amount to more instances than the bar actually presents, when compared to the detection accuracy bar).
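To make the difference between the two bars concrete, here is a minimal Python sketch of how such numbers could be derived from per-test-case results. It assumes the GREEN value is the percentage of vulnerable test cases reported and the RED value counts distinct false positive categories; the `CaseResult` record and its category labels are hypothetical, not WAVSEP's actual data format.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class CaseResult:
    case_id: str
    vulnerable: bool  # True: vulnerable test case; False: false-positive test case
    category: str  # hypothetical grouping label for false-positive scenarios
    reported: bool  # whether the scanner raised an alert for this case


def detection_accuracy(results: Iterable[CaseResult]) -> float:
    """GREEN bar: percentage of vulnerable test cases the scanner reported."""
    vulnerable = [r for r in results if r.vulnerable]
    if not vulnerable:
        return 0.0
    hits = sum(1 for r in vulnerable if r.reported)
    return round(100.0 * hits / len(vulnerable), 2)


def false_positive_categories(results: Iterable[CaseResult]) -> Set[str]:
    """RED bar: distinct false-positive categories in which the scanner raised an alert.

    Several alerts in one category count once, which is why the RED bar can
    understate the raw number of false alarms.
    """
    return {r.category for r in results if not r.vulnerable and r.reported}
```

With three of four vulnerable cases reported and two false alarms in the same category, this yields 75.0 for the GREEN value but only one RED category.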

The Unvalidated Redirect Detection Accuracy of Commercial/SAAS Scanners

The Unvalidated Redirect Detection Accuracy of Opensource/Free Scanners


The Unvalidated Redirect Detection Accuracy of Scanners – Unified List





11. Test IV - Backup/Hidden File Detection
The fourth assessment criterion was the detection accuracy of Old, Backup and Unreferenced Files, a very common exposure, that may lead to source code and configuration theft, which is also a commonly implemented feature in web application scanners, and once again, a NEW TEST in WAVSEP which the vendors were not aware of prior to the publication of this article.

This is also the test in which the results are MOST SURPRISING.

To make it clear, this test assessed the capability of scanners to locate backup files with non-executable extensions, compressed versions of files and directories that developers may have forgotten, sequential files or copies of files and directories that are remnants of various development tests, and additional hazards that may lead to source code or configuration disclosure.
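As a rough illustration of the categories listed above, the following Python sketch derives candidate leftover names from a known resource. The suffix lists are deliberately short and hypothetical (real tools use far larger, customized lists), and the naming rules are common conventions rather than WAVSEP's exact test cases: `Copy of x` is the Windows XP Explorer copy name, `x - Copy.ext` the Windows 7 one, and `.x.swp` a vim swap file.

```python
from posixpath import join, split, splitext
from typing import List

# Hypothetical, deliberately short suffix lists.
APPENDED_SUFFIXES = [".bak", ".old", ".orig", ".tmp", "~", ".swp", ".txt"]
ARCHIVE_EXTENSIONS = [".zip", ".tar.gz", ".rar"]


def backup_candidates(path: str) -> List[str]:
    """Derive likely leftover copies of a known resource such as /app/login.php."""
    directory, filename = split(path)
    stem, ext = splitext(filename)
    names = [filename + s for s in APPENDED_SUFFIXES]  # login.php.bak, login.php~
    names += [stem + s for s in (".bak", ".old")]  # login.bak: non-executable extension
    names += [f"Copy of {filename}", f"{stem} - Copy{ext}"]  # Windows Explorer copies
    names += [f".{filename}.swp"]  # vim swap file
    names += [f"{stem}{n}{ext}" for n in (1, 2)]  # sequential copies: login1.php
    names += [stem + a for a in ARCHIVE_EXTENSIONS]  # compressed copies
    return [join(directory, n) for n in names]


def directory_archives(directory: str) -> List[str]:
    """Compressed copies of a whole directory, e.g. /app -> /app.zip."""
    directory = directory.rstrip("/")
    return [directory + a for a in ARCHIVE_EXTENSIONS]
```

Names containing spaces must be URL-encoded before they are requested.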

For those of you who doubt the importance of this vector, it's an exposure that I personally abused as a pen-tester to download the entire source code of banks, e-commerce websites and credit card companies, to obtain connection strings and hard-coded credentials from obsolete source code fragments and configuration files, and to locate numerous hidden entry points that were vulnerable to exposures the rest of the application was not prone to.

What I'm trying to say is that while some instances of this exposure may yield insignificant results, some severe instances could mean "game over" for the application, and expose every server-side vulnerability or hidden credential to the attacker.

Back in the old days, I used a collection of tools and lists to identify such issues;
I made heavy use of Sensepost's Wikto with customized lists of files and extensions; I used the backup/hidden file detection features of the earliest published version of W3AF to download the source code of several banks, and from time to time, even suffered through the false positives of the mythical Paros Proxy obsolete file detection features.

However, since then, many open source and commercial tools mastered those attacks, and tried to make the detection task easier.
But as the results obviously show, something bad happened along the way, which is not necessarily related to this specific vulnerability, as much as it is related to a major problem that affects the entire automated vulnerability detection industry.

Insufficient Implementation of TDD
If there's any obvious conclusion the reader can draw from this benchmark, this is probably it:
There is a serious problem in the implementation (or insufficient use) of TDD in the development of many web application vulnerability scanners:

Test Driven Development (TDD) is a development process in which software developers write unit tests for code modules, often even before writing the modules themselves, and in which the product's build process uses these tests to verify that the code modules function properly and that there aren't any unexpected behaviors.
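To make the idea concrete for a scanner, here is a minimal sketch of such a test in Python's `unittest`. The plugin (`sql_error_plugin`, which flags MySQL error messages) and its fixtures are hypothetical; the point is that every build replays known vulnerable and known false-positive responses, so an engine change that silently breaks the plugin fails the build instead of shipping.

```python
import re
import unittest

# Hypothetical detection plugin: flags responses that leak a MySQL error message.
MYSQL_ERROR = re.compile(r"You have an error in your SQL syntax|mysql_fetch_\w+\(\)", re.I)


def sql_error_plugin(response_body: str) -> bool:
    return bool(MYSQL_ERROR.search(response_body))


class SqlErrorPluginRegressionTest(unittest.TestCase):
    """Run on every build, so a plugin broken by an engine change fails the build."""

    VULNERABLE = [
        "You have an error in your SQL syntax; check the manual ...",
        "Warning: mysql_fetch_array() expects parameter 1 to be resource",
    ]
    FALSE_POSITIVES = [
        "<p>Search results for: syntax error</p>",
        "<p>No records found</p>",
    ]

    def test_detects_vulnerable_cases(self):
        for body in self.VULNERABLE:
            with self.subTest(body=body):
                self.assertTrue(sql_error_plugin(body))

    def test_ignores_false_positive_cases(self):
        for body in self.FALSE_POSITIVES:
            with self.subTest(body=body):
                self.assertFalse(sql_error_plugin(body))
```

In a real build this would run via `python -m unittest`, with fixtures recorded from an actual test bed (such as WAVSEP responses) rather than hand-written strings.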

TDD is usually very costly to implement, but in my opinion, it pays off in the long run, and in many respects.

Now don't get me wrong, I'm certain that almost all vendors use TDD to some extent; however, after experiencing what I have in this benchmark, I'm also fairly certain it's insufficient (at least for some products).

And honestly, I find it very hard to blame the vendors.

Allow me to elaborate:
There is a lot of competition in this product category, and new features are often rushed to market as soon as possible. It also takes a major effort to write unit-tests that include network communication and scanning, and to review the results, even for a single vulnerability detection plugin.

Although it makes sense that the same outcome could be accomplished using traditional QA processes, which may very well be true for small to mid-scale projects, one need only look at the insane number of plugins and features in products like Qualys, Appscan, Webinspect and W3AF to understand the futility of leaving all the testing to humans.

Imagine how much effort it would take to manually test that 200 generic detection plugins function properly… Implementing unit tests for all those modules isn't a small investment either.
And what about 50000 signature-based, product-specific vulnerability checks? How long would it take to manually test (or develop unit tests to verify) that those features work?

During the testing process, I have seen plugins in several tools which were actually named after the various extensions of obsolete files I was trying to detect in WAVSEP, and still, scanning the platform with some or all of them did not yield results for many tools.

My assumption is that the same problem is also responsible for the results of tools that scored 100% in previous benchmarks but got different results in this round of tests, even though the testing framework (WAVSEP/WIVET) did not include any changes in the test cases scanned.

My Assumption:
The various plugins and features are based on a scan engine, and changes made to the engine (or plugins) may cause some of them to malfunction.
Since there wasn't a unit test (or other pre/post build test method) for those plugins, newer versions were released while those plugins were not functioning, maybe even for years, and without anybody knowing about it.

Not so scary when considering, let's say, small-scale projects,
but VERY scary when you consider a product update that causes many plugins to malfunction in a scanner with 50000 plugins, released after the organization had successfully tested and used the product for years, and while the vendor's official recommendation was to install the update.

The vendor may never know, and the customer/user may only discover the issue after vulnerabilities the product was supposed to identify are exploited.

Customers that are currently not aware of a problem, vendors that may never be, and entities that can abuse that problem are a terrible combination… No malice intended.

In order to assess the detection accuracy of different old/backup/hidden file instances, I used a total of 184 test cases (many of them simulating files created on Windows XP / Windows 7 developer stations, as well as on common Linux flavors such as Ubuntu, Debian and Fedora). I also used three main groups of false positive behaviors, each representing real-life scenarios that vulnerability scanners can experience.
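One common false positive source in this category (not necessarily one of the three groups used in WAVSEP) is a server that answers requests for missing files with HTTP 200 and a generic page. The Python sketch below shows one possible control, assuming a caller-supplied `fetch(path) -> (status, body)` hook, which is hypothetical: a candidate is reported only if it differs from the response to a name that cannot exist. The 0.9 similarity threshold is an arbitrary assumption.

```python
import difflib
import uuid
from posixpath import splitext
from typing import Callable, Tuple

# fetch(path) -> (status_code, body). Hypothetical hook supplied by the caller,
# e.g. a thin wrapper around whatever HTTP client the scanner already uses.
Fetch = Callable[[str], Tuple[int, str]]


def looks_like_real_file(fetch: Fetch, directory: str, candidate: str,
                         threshold: float = 0.9) -> bool:
    """Report a backup-file candidate only if it differs from the 'not found' baseline."""
    status, body = fetch(candidate)
    if status != 200:
        return False

    # Ask for a name that cannot exist, with the same extension, to learn how the
    # server answers missing files (custom 404 pages, catch-all routes, etc.).
    extension = splitext(candidate)[1]
    probe = f"{directory.rstrip('/')}/{uuid.uuid4().hex}{extension}"
    probe_status, probe_body = fetch(probe)

    similar = difflib.SequenceMatcher(None, body, probe_body).ratio() >= threshold
    if probe_status == 200 and similar:
        return False  # "soft 404": the server answers everything with the same page
    return True
```

Keeping the candidate's extension in the probe matters because some servers route each extension to a different handler, each with its own "not found" behavior.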

The comparison of the scanners' old, backup and unreferenced files detection accuracy is documented in detail in the following section of sectoolmarket:

Note: as mentioned earlier, I saw various features in several of the tested tools that were supposed to identify additional results, but for some reason did not function. My current assumption (and that's all it is, an assumption) is that the reason is related to bugs in the engine or the modules of those tools.
As luck (or lack thereof) would have it, the same problem seemed to persist for many vendors in that specific category of tests.

Disclaimer:
The results of OWASP ZAP in the obsolete file detection test were obtained using an external ZAP extension called Good-Old-Files (GoF, available in ZAP's built-in marketplace).
The extension was written by a colleague of mine, Michal Goldstein, and, like the work of previous extension authors, was originally inspired by various modules in W3AF.
She was not aware of the benchmark, or of the fact that I was assessing her project, and when I built the testing platform, I used input from a collection of tools and sources to build the benchmark test-bed, including GoF/W3AF.
Those of you who believe that might have affected the testing process may feel free to ignore the results of that tool.

Result Chart Glossary
Note that the GREEN bar represents the vulnerable test case detection accuracy, while the RED bar represents the false positive categories detected by the tool (which may amount to more instances than the bar actually presents, when compared to the detection accuracy bar).

The Old/Backup/Hidden File Detection Accuracy of Commercial/SAAS Scanners


The Old/Backup/Hidden File Detection Accuracy of Opensource/Free Scanners


The Old/Backup/Hidden File Detection Accuracy of Scanners – Unified List


12. Test V - Path Traversal / LFI Detection
The fifth assessment criterion is identical to the one in the previous benchmark - the detection accuracy of Path Traversal (a.k.a. Directory Traversal), an assessment feature that was implemented in WAVSEP v1.2 and tested in the 2012 benchmark for the first time.

It's also the third most commonly implemented attack vector in web application scanners, and a significant attack vector in its own right.

Many scanners had a difficult time locating a variety of traversal test cases in 2012, but this time, the results of many tools show a significant improvement, indicating that many vendors invested major efforts in improving their products.

Path Traversal vs. Local File Inclusion – Reminder
As I explained in the past, the reason Path Traversal was tagged along with Local File Inclusion (LFI) is simple: many scanners don't differentiate between inclusion and traversal, and neither do several online vulnerability documentation sources. In addition, the results obtained from the tests performed on the vast majority of tools lead to the same conclusion: many plugins listed under the name LFI detected the path traversal test cases.

While implementing the path traversal test cases in 2012 and consuming nearly every relevant piece of documentation I could find on the subject, I decided to take the current path, in spite of some acute differences suggested by some of the documentation sources (although I did implement an infrastructure in WAVSEP for "true" inclusion exposures).

The point is not to get into a discussion of whether or not path traversal, directory traversal and local file inclusion should be classified as the same vulnerability, but simply to explain why in spite of the differences some organizations / classification methods have for these exposures, they were listed under the same name.

The evaluation was performed on a WAVSEP v1.2 instance hosted on a Windows XP VM, and although there are specific test cases meant to emulate servers running with low-privileged OS user accounts (using the servlet context file access method), many of the test cases emulate web servers running with administrative user accounts.

[Note - in addition to the WAVSEP installation, to reproduce the results of this benchmark, a file named content.ini must be placed in the root installation directory of the Tomcat server, which is different from the root directory of the web server.
It's also crucial to install WAVSEP on Windows and run the Tomcat server with administrative privileges, as some of the test cases rely on Windows-specific paths or require access to directories outside of the web server scope]

In order to assess the detection accuracy of different path traversal instances, I used a total of 816 path traversal test cases, and a bunch of false positive test cases as well.
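Test cases like these vary in traversal depth, path separators and encodings, and a scanner has to cover that matrix. The following Python sketch shows one possible payload generator and response check. The target files, markers, maximum depth and encodings are illustrative assumptions and do not describe how WAVSEP's test cases or any tested scanner's plugin are built.

```python
from typing import Dict, Iterator

# Hypothetical target files and markers; a real scanner uses many more.
TARGET_MARKERS: Dict[str, str] = {
    "windows/win.ini": "[fonts]",  # section present in a default Windows win.ini
    "etc/passwd": "root:",  # first entry of most Unix passwd files
}


def traversal_payloads(target: str, max_depth: int = 8) -> Iterator[str]:
    """Yield relative, backslash, URL-encoded and absolute variants of one target."""
    for depth in range(1, max_depth + 1):
        yield "../" * depth + target
        yield "..\\" * depth + target.replace("/", "\\")
        yield "%2e%2e%2f" * depth + target  # pre-encoded, to be sent as-is
        yield "../" * depth + target + "%00"  # null-byte trick for legacy platforms
    yield "/" + target  # absolute path


def response_matches(target: str, body: str) -> bool:
    """True if the response body contains the marker expected for `target`."""
    return TARGET_MARKERS[target] in body
```

Depth matters because the number of `../` steps needed depends on where the vulnerable code resolves paths from, and a too-shallow payload silently misses the file.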

The comparison of the scanners' path traversal detection accuracy is documented in detail in the following section of sectoolmarket:

Note:
During the testing of the development version of W3AF (the latest stable I could get was 1.2 which was tested in 2012, and the current development version was 1.6+) I experienced several bugs, specifically bugs that prevented the scanner from scanning HTML forms submitted using HTTP POST (or in short, POST parameters).
One of these bugs was related to the LFI/Path Traversal detection plugin, which caused the scan to crash whenever it was used, after detecting only a few vulnerable test cases.
I tried various methods to overcome that bug artificially, but failed to do so, so I was not able to obtain the actual results of the latest version of W3AF, and thus decided to use the results from the previous benchmark to represent its score.
The bugs were reported to the project leader, and hopefully, will be fixed in the future.

I had similar issues trying to use the various LFI/RFI plugins of Qualys, and unfortunately wasn't able to overcome them and get an actual score by the publication of this benchmark (which is why Qualys is absent from the LFI/RFI charts). I'm currently not sure whether the reason is a bug in the product or in the configuration used during my testing process.

Result Chart Glossary
Note that the GREEN bar represents the vulnerable test case detection accuracy, while the RED bar represents the false positive categories detected by the tool (which may amount to more instances than the bar actually presents, when compared to the detection accuracy bar).

Result Update (29/03/2014): The results of arachni were improved from 30.88% to 100% (!!!) according to vendor recommendations provided AFTER the original benchmark publication, by using the source code disclosure plugin, in addition to the local file inclusion and path traversal plugins, after verifying that the plugin behavior is relevant to the exposure (the name may deceive), and while using the same version. 

The results of Webinspect were likewise improved from 72.06% to 91.18% by using a custom configuration provided by the vendor AFTER the original benchmark publication (with the same tested version), which included the following plugins:
  i.   10287 – Local File Include
  ii.  10271 – Local File Inclusion/Reading Vulnerability
  iii. 10272 – Possible Local File Inclusion/Reading Vulnerability
  iv.  11327 – LFI Tomcat
  v.   11332 – LFI IIS

The Path Traversal / LFI Detection Accuracy of Commercial /SAAS Scanners




The Path Traversal / LFI Detection Accuracy of Opensource /Free Scanners


The Path Traversal / LFI Detection Accuracy of Scanners – Unified List



13. Test VI - (XSS via) RFI Detection
The sixth assessment criterion was, again, identical to the 2012 benchmark: the detection accuracy of Remote File Inclusion (or more accurately, vectors of RFI that can result in XSS or phishing, and currently not necessarily in server code execution), an assessment suite implemented in WAVSEP v1.2, which was tested in the 2012 benchmark for the first time, with interesting results indeed.

A reminder: although in the 2012 benchmark several products identified the vulnerable test cases properly, some products with RFI detection features ignored them completely.

Obviously, 1.5 years after the 2012 publication, that's no longer the case for the vast majority of vendors; the detection accuracy and support for (XSS via) RFI was dramatically improved in many tools, and we – the users – can reap the rewards in penetration tests.

In order to assess the detection accuracy of different remote file inclusion exposures, I used a total of 108 (XSS via) remote file inclusion test cases, and as always, a bunch of false positive cases that represent common scenarios.
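For the XSS flavor of RFI, a detector can inject a URL on a host it controls and check whether that host's markup ends up embedded in the response. The Python sketch below assumes a hypothetical canary host that serves a uniquely tagged script snippet at each generated URL; it distinguishes real inclusion from the common false positive of merely reflecting or linking the URL.

```python
import secrets
from typing import Tuple

# Hypothetical host controlled by the tester; it must serve `remote_html` at `url`.
CANARY_BASE = "http://rfi-canary.example/"


def make_rfi_probe() -> Tuple[str, str, str]:
    """Return (token, payload URL, markup the canary host serves at that URL)."""
    token = secrets.token_hex(8)
    url = f"{CANARY_BASE}{token}.html"
    remote_html = f"<script>/*{token}*/</script>"
    return token, url, remote_html


def remote_content_included(remote_html: str, body: str) -> bool:
    """Vulnerable only if the remote markup is embedded verbatim, not merely linked or encoded."""
    return remote_html in body
```

A per-probe random token keeps results from different parameters (and from cached responses) apart.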

The comparison of the scanners' (XSS via) remote file inclusion detection accuracy is documented in detail in the following section of sectoolmarket:

Result Chart Glossary
Note that the GREEN bar represents the vulnerable test case detection accuracy, while the RED bar represents the false positive categories detected by the tool (which may amount to more instances than the bar actually presents, when compared to the detection accuracy bar).

The (XSS via) RFI Detection Accuracy of Commercial/SAAS Scanners

The (XSS via) RFI Detection Accuracy of Opensource/Free Scanners


The (XSS via) RFI Detection Accuracy of Scanners – Unified List


14. Test VII - Reflected XSS Detection
The seventh assessment criterion has been a part of the yearly WAVSEP assessment for four years now (!), and the results of the various vendors that maintain their tools reflect that well.
As the title suggests, this section deals with the detection accuracy of Reflected Cross Site Scripting, a very common exposure which is the 2nd most commonly implemented feature in web application vulnerability scanners.

The assessment was performed using 66 different Reflected XSS test cases and a bunch of false positive test cases, while ignoring the results of the various experimental RXSS test cases included in WAVSEP 1.5 (although the "experimental" results are included in most of the individual tools' scan logs in sectoolmarket).
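The core distinction these test cases probe, unencoded reflection versus encoded (neutralized) reflection, can be sketched in a few lines of Python. This is a deliberate simplification under stated assumptions: it ignores output context (attributes, scripts, comments), which real scanners must handle, and the `<wvs…>` marker format is a hypothetical choice.

```python
import html
import secrets
from typing import Tuple


def make_xss_probe() -> Tuple[str, str]:
    """Return (token, probe); the probe is markup, so an encoded echo cannot match it."""
    token = secrets.token_hex(6)
    return token, f"<wvs{token}>"


def classify_reflection(probe: str, body: str) -> str:
    if probe in body:
        return "vulnerable"  # reflected without encoding
    if html.escape(probe) in body:
        return "encoded"  # reflected but neutralized: a false-positive trap
    return "absent"
```

Reporting only the "vulnerable" class is what keeps an encoded echo, one of the classic false positive scenarios, out of the results.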

There's not much to say in this section that wasn't already said in previous articles and benchmarks, except to present the current (and generally IMPRESSIVE) results of the various maintained products / projects.

The comparison of the scanners' reflected cross site scripting detection accuracy is documented in detail in the following section of sectoolmarket:

Note
Bugs in certain products (notably arachni/W3AF) seem to have affected their detection accuracy for Reflected XSS; in the past, these products obtained higher results.

Result Chart Glossary
Note that the GREEN bar represents the vulnerable test case detection accuracy, while the RED bar represents the false positive categories detected by the tool (which may amount to more instances than the bar actually presents, when compared to the detection accuracy bar).


Note:
During the assessment of Qualys it is highly likely that an optimization mechanism affected the scan results of POST test cases (compared to WAVSEP 2012 results). Although in the case of other vendors disabling similar mechanisms solved the problem, in the case of Qualys this optimization mechanism could not be disabled via the configuration interface. We are currently trying to find solutions to the problem.

The Reflected XSS Detection Accuracy of Commercial/SAAS Scanners


The Reflected XSS Detection Accuracy of Opensource/Free Scanners


The Reflected XSS Detection Accuracy of Scanners - Unified List


15. Test VIII - SQL Injection Detection
The eighth assessment criterion was the good old SQL Injection detection accuracy, another assessment suite that's been with us for the last four years (!) of WAVSEP benchmarks.

As one of the most famous exposures (and powerful attacks) and the most commonly implemented attack vector in web application scanners, it's also one of the aspects in which maintained projects showed the greatest improvement over the years.

Although the WAVSEP 1.5 release includes optional vulnerable SQL injection test cases that were adjusted to support other databases (such as MSSQL, Oracle, etc., contributed thanks to the endless generosity of the ZAP team members), due to time constraints the evaluation was only performed on an application that used MySQL 5.5.x as its data repository, and thus can only reflect the detection accuracy of the tool when scanning an application that uses similar data repositories.

My assumption, however, is that the detection results of error-based and behavior-based test cases will be nearly identical if the underlying database is different, but that some of the tested tools will show a difference in test cases that require time-based detection methods (since some scanners may not support using the appropriate database-specific time-delay function).
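The point about database-specific delay functions can be illustrated with a Python sketch. The payload templates assume injection into a string context, and the stacked-query forms only work where the database driver permits them; the timing rule (delayed response at least n seconds slower than the slowest baseline, minus a tolerance) is one simple heuristic among many, not the method of any tested product.

```python
from typing import Dict, Sequence

# Assumed string-context payloads; {n} is the delay in seconds. Stacked queries
# (the "; ..." forms) only work where the driver allows them.
TIME_PAYLOADS: Dict[str, str] = {
    "mysql": "' AND SLEEP({n})-- -",
    "postgresql": "'; SELECT pg_sleep({n})--",
    "mssql": "'; WAITFOR DELAY '0:0:{n}'--",
    "oracle": "' AND 1=DBMS_PIPE.RECEIVE_MESSAGE('x',{n})--",
}


def payloads_for(n: int) -> Dict[str, str]:
    """Instantiate every database-specific payload with an n-second delay."""
    return {db: template.format(n=n) for db, template in TIME_PAYLOADS.items()}


def is_time_based_hit(baseline: Sequence[float], delayed: float, n: int,
                      tolerance: float = 0.5) -> bool:
    """Flag only if the delayed request is ~n seconds slower than the slowest baseline."""
    return delayed - max(baseline) >= n - tolerance
```

A scanner that only ships the MySQL `SLEEP()` variant would score well against a MySQL-backed WAVSEP instance yet miss the same vulnerability on MSSQL or Oracle, which is exactly the gap the assumption above anticipates.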

The comparison of the scanners' SQL injection detection accuracy is documented in detail in the following section of sectoolmarket:

Result Chart Glossary
Note that the GREEN bar represents the vulnerable test case detection accuracy, while the RED bar represents the false positive categories detected by the tool (which may amount to more instances than the bar actually presents, when compared to the detection accuracy bar).


Note:
During the assessment of Qualys it is highly likely that an optimization mechanism affected the scan results of POST test cases (compared to WAVSEP 2012 results). Although in the case of other vendors disabling similar mechanisms solved the problem, in the case of Qualys this optimization mechanism could not be disabled via the configuration interface. We are currently trying to find solutions to the problem.

The SQL Injection Detection Accuracy of Commercial/SAAS Scanners


The SQL Injection Detection Accuracy of Opensource/Free Scanners


The SQL Injection Detection Accuracy of Scanners – Unified List


16. Test IX - Attack Vector Support
The ninth assessment criterion is the number of audit features each tool supports.

For the purpose of the benchmark, an audit feature was defined as a common generic application-level scanning feature, supporting the detection of exposures which could be used to attack the tested web application, gain access to sensitive assets or attack legitimate clients.

The definition of the assessment criterion rules out product specific exposures and infrastructure related vulnerabilities, while unique and extremely rare features were documented and presented in a different section of this research, and were not taken into account when calculating the results.

Reasoning: An automated tool can't detect an exposure without a code module designed to identify the issue, and therefore, the number of audit features will affect the type (and amount) of exposures that the tool will be able to detect (assuming the audit features are implemented properly, that vulnerable entry points will be detected, that the tool will be able to handle the relevant scan barriers and scanning prerequisites, and that the tool will manage to scan the vulnerable input vectors).

Although I have typically placed the assessment of supported audit features in a position of higher importance in the benchmark, my current research led me to make some changes.

I still consider the number of supported generic vulnerability detection features (a.k.a. audit plugins) to be a very significant aspect, probably more than I ever did.
Unfortunately, I came to the conclusion that the current list that the WAVSEP project documents is just a drop in the ocean.

WAVSEP currently contains information on which scanners are relatively more audit-feature rich – relative, as in relation to other projects, not to the actual variety of attacks out there.
Although "relative" may still be very useful to the consumer, in my opinion, it's not as useful to the industry as I had hoped.

Originally, when I created the list of supported audit plugins which is currently used (and covers 32 attack categories at the moment), I composed it from the list of plugins that were commonly supported by scanners at the time (2009-2010).
Although the list was somewhat limited, and by no means representative of the overall list of attacks that scanners should detect (and hopefully will one day be able to detect), it was enough to represent the differences between the products.

Five years passed – and many things changed.

Numerous new generic application-level attacks were invented, published or re-classified.
Projects like CWE, CAPEC, OWASP Testing Guide, Attacks and Vulnerabilities, WASC and others added more and more attack classifications, and that's without taking into account the numerous vectors that were published in blogs, conferences and competitions, which often didn't get the attention they deserved.

While the scanning features commonly implemented in scanners were usually derived from feature demands, attack vectors receiving higher levels of "popularity" and publicity, vulnerabilities that the vendors (and to some extent the users) perceived to be the most common or severe, and sometimes some vendor-specific "exotic" vectors, there was never any roadmap that clarified to consumers which features were MORE important for vendors to support.

So after figuring that out, prior to the benchmark, I decided to expand my list of attacks and vulnerabilities so I could properly map the contribution of the various tools against the overall risk map. During the research stages that preceded this publication, I started researching which of the attack vectors that scanners could potentially identify actually exist, and which of those are supported by the individual scanners.


Well, it went pretty well…

In fact, it went so well that I have classified 227 distinct application attacks so far.

227 attacks, not including multipliers due to persistent/session/indirect states, and I'm not even done mapping and classifying them.

Needless to say, that's a lot of mapping tasks for each individual product.

In fact, the effort of classifying and prioritizing those vectors while verifying which products supported them was so high, that I had to postpone their publication, or else the research you are currently reading might not have been published any time soon.

So, at the moment, this section describes the relative support for various audit features, and the rest of the content collected during the research will have to wait for another publication.

The scanners' support for the various audit features is documented in detail in the following section of sectoolmarket:

Note
The audit-feature count results of Webinspect may change in the coming days due to additional verification processes I'm currently conducting. If there are any changes, I will announce them via the comparison's dedicated Twitter account: @sectoolmarket

The Number of Audit Features in Scanners – Commercial/SAAS Tools


The Number of Audit Features in Scanners – Opensource/Free Tools


The Number of Audit Features in Scanners – Unified List


17. Test X - Adaptability - Scan Barriers
Applications may contain a variety of mechanisms and technologies that could pose a barrier to a scanner, and in fact effectively prevent it from scanning the application properly.

Scan barriers such as Anti-CSRF tokens, CAPTCHA mechanisms, platform specific tokens (such as required viewstate values) or account lock mechanisms have already become an integral part of many applications. Complicated RIA client technologies such as Flash, Applets and Silverlight are certainly not rare. 

Although not necessarily a measurable quality, the ability of the scanner to handle different technologies and scan barriers is an important prerequisite, and in a sense almost as important as being able to scan the input delivery method.

Reasoning: An automated tool can't detect a vulnerability in a point-and-shoot scenario if it can't locate and scan the vulnerable location, whether due to the lack of support for a certain browser add-on, the lack of support for extracting data from certain non-standard vectors, or the inability to overcome a specific barrier, such as a required token or challenge. The more barriers the scanner is able to handle, the more useful it is when scanning complex applications that employ various technologies and scan barriers.
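To make the token barrier more concrete: when a form carries an anti-CSRF token, a scanner has to fetch the page, extract the current token and send it back together with its injected payload, or the application may reject the test request before the vulnerable code is ever reached. The following Python sketch shows that single step using only the standard library's `html.parser`; the form, the `csrf_token` field name and the `build_attack_request` helper are hypothetical, and real scanners implement this in their own ways.

```python
from html.parser import HTMLParser


class HiddenTokenParser(HTMLParser):
    """Collects the names and values of hidden <input> fields in an HTML page."""

    def __init__(self):
        super().__init__()
        self.hidden_fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        # Attribute values can be None (e.g. <input type>), so guard before lower().
        field_type = (attributes.get("type") or "").lower()
        if field_type == "hidden" and attributes.get("name"):
            self.hidden_fields[attributes["name"]] = attributes.get("value") or ""


def build_attack_request(form_html, injected_params):
    """Hypothetical helper: merge freshly extracted hidden fields (such as an
    anti-CSRF token) with the scanner's injected parameters."""
    parser = HiddenTokenParser()
    parser.feed(form_html)
    params = dict(parser.hidden_fields)
    params.update(injected_params)  # injected values win over hidden defaults
    return params


page = """
<form action="/transfer" method="post">
  <input type="hidden" name="csrf_token" value="a1b2c3">
  <input type="text" name="amount">
</form>
"""
print(build_attack_request(page, {"amount": "1' OR '1'='1"}))
```

Since many applications issue a fresh (sometimes single-use) token per page view, a scanner would typically have to repeat this extraction before every injected request rather than caching the token, which is part of what makes this barrier costly to support.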

The scanners' support for the various barriers is documented in detail in the following section of sectoolmarket:

The following charts show how many types of barriers each product claims to be able to handle (note that many of these features were not verified, and the information currently relies on documentation, research and vendor supplied information):

The Adaptability Score of Commercial/SAAS Scanners


The Adaptability Score of Opensource/Free Scanners


The Adaptability Score of Web Application Scanners – Unified List


18. Test XI - Authentication/Usability
Although supporting the authentication method required by the application seems like a crucial quality (and certainly is a convenient feature), in reality, certain scanner proxy chaining features can make up for the lack of support for most authentication methods, by using a 3rd party proxy to authenticate on the scanner's behalf.

For example, if we wanted to use a scanner that does not support NTLM authentication (but does support an upstream proxy), we could define the relevant credentials in Burp Suite FE, and then define Burp as an upstream proxy for the scanner we intend to use.

However, chaining the scanner to an external tool that supports the authentication still has some disadvantages, some of them major, such as reduced performance, potential stability issues, thread limitation and general inconvenience.
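To illustrate the chaining idea from the scanner's side, here is a minimal Python sketch in which urllib's `ProxyHandler` stands in for a scanner's "upstream proxy" setting. It assumes Burp Suite FE is listening on its default `127.0.0.1:8080` listener and has already been given the application's NTLM credentials; `build_chained_opener` is a hypothetical helper, not part of any scanner.

```python
import urllib.request

# Assumption: Burp Suite FE listens here and holds the NTLM credentials
# for the target application, so it can authenticate on our behalf.
UPSTREAM_PROXY = "http://127.0.0.1:8080"


def build_chained_opener(proxy_url=UPSTREAM_PROXY):
    """Return an opener that routes every HTTP/HTTPS request through the
    upstream proxy instead of connecting to the application directly."""
    proxy_handler = urllib.request.ProxyHandler(
        {"http": proxy_url, "https": proxy_url}
    )
    return urllib.request.build_opener(proxy_handler)


opener = build_chained_opener()
```

Calling `opener.open(url)` would then send each request through Burp, which performs the NTLM handshake with the application. The extra hop for every request is one source of the performance and threading overhead mentioned above.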

The following comparison table shows which authentication methods and features are supported by the various assessed scanners:

19. Test XII - Results/Features vs. Pricing
The following assessment is in fact a summary of the important results, in comparison to the product price and features.

This section will probably be the most useful section for anyone looking to purchase a commercial or SAAS solution, or is debating whether or not to use open source products instead.

As I mentioned in the introduction, since web application scanners might actually be a bundle of several semi-independent products (generic vulnerability scanner, known vulnerability scanner, infection scanner, etc.), it's very important to notice which modules are included in each offer, especially in relation to commercial scanner pricing.

WAVSEP currently focuses on assessing the generic vulnerability scanning module of web application scanners, whereas the price you're paying might reflect the other modules the product contains (or does not contain), which matters only if you actually need them.

In short, the scanner price might (or might not) reflect a set of products that could probably have been priced separately as independent products.

For your convenience, I invested some effort in mapping which of these products contain additional modules, although some classification of modules might still be missing.
The mapped modules include generic web-app scanning modules, generic web service scanning modules, flash application scanning modules and CGI scanning modules (a.k.a web server scanning modules or known vulnerability scanning modules).

The mapped categories don't yet include SAST and IAST scanning modules, Applet/Silverlight scanning modules, website infection scanning modules and additional categories which may be mapped in the future.

Another important issue to pay attention to is the type of license acquired.
In general, I did not cover non-commercial prices in this comparison, and did not include any vendor-specific bundles, sales, discounts or sales pitches.
I presented the base prices listed in the vendor website or provided to me by the vendors, according to a total of 6 predefined categories, which are in fact, combinations of the following concepts:

Consultant Licenses: although there isn't a commonly accepted term, I defined "Consultant" licenses as licenses that fit the common requirements of a consulting firm - scanning an unrestricted number of IP addresses, without any boundaries or limitations.
Limited Enterprise Licenses: Any license that allowed scanning an unlimited but restricted set of addresses (for example - internal network addresses or organization-specific assets) was defined as an enterprise license, which might not be suited for a consultant, but will usually suffice for an organization interested in assessing its own applications.
Website/Year - a license to install the software on a single station and use it for one year against a single IP address (the exception to this rule is Netsparker, in which the price per website reflects 3 Websites).
Seat/Year - a license to install the software on a single station and use it for a single year.
Perpetual Licenses - pay once, and it's yours (might still be limited by seat, website, enterprise or consultant restrictions). The vendor's website usually includes additional prices for optional support and product updates.

The various prices can be viewed in the dedicated comparison in sectoolmarket, available at the following address:

It is important to remember that these prices might change, vary or be affected by numerous variables, from special discounts and sales to a strategic conscious decision of vendors to invest in you as a customer or as a beta testing site.

20. Additional Comparisons
The following section contains additional information on the tested tools that was documented throughout the research, and may be of use to the reader.

List of Tools
The list of tools tested in this benchmark, and in the previous benchmarks, can be accessed through the following link:

Additional Features
Complementary scan features that were not evaluated or included in the benchmark:

In order to clarify what each column in the report table means, use the following glossary table:
Configuration and Usage Scale
(*) Very Simple - GUI + Wizard
(*) Simple - GUI with simple options, Command line with scan configuration file or simple options
(*) Complex - GUI with numerous options, Command line with multiple options
(*) Very Complex - Manual scanning feature dependencies, multiple configuration requirements

Stability Scale
(*) Very Stable - Rarely crashes, Never gets stuck
(*) Stable - Rarely crashes, Gets stuck only in extreme scenarios
(*) Unstable - Crashes every once in a while, Freezes on a consistent basis
(*) Fragile - Freezes or Crashes on a consistent basis, Fails to perform the operation in many cases

Performance Scale
(*) Very Fast - Fast implementation with limited amount of scanning tasks
(*) Fast - Fast implementation with plenty of scanning tasks
(*) Slow - Slow implementation with limited amount of scanning tasks
(*) Very Slow - Slow implementation with plenty of scanning tasks

Scan Logs
In order to access the scan logs and detailed scan results of each scanner, simply access the scan-specific information for that scanner, by clicking on the scanner version in the various comparison charts: http://sectoolmarket.com/

21. What Changed?
Since the latest benchmark, many open source and commercial tools added new features and improved their detection accuracy.

The following list presents a summary of changes in the detection accuracy and coverage of Commercial tools that were tested in the previous benchmark (as well as newly tested ones):

(*) NTOSpider – NTOSpider's last assessment took place in 2011, and since then there has been a significant improvement in all the test categories, as well as new results for tests not performed in 2011. It also came out FIRST in the WIVET category (along with 3 other products) and the XSS category (along with 10 others), and got high scores in many others. Its rankings in the new tests (redirect/backup) were mixed.

(*) N-Stalker (Commercial Edition) – The commercial edition of N-Stalker was assessed in this benchmark for the first time. The only comparable result was the XSS result of the free version tested in 2012, and in that case there was a significant improvement. Its remaining results placed it at various ranks.

(*) Qualys – Qualys was first tested in 2012. Since then, its WIVET score hasn't changed (it is still one of the highest), and there are some new test results as well, but the SQL Injection and Reflected XSS results are actually worse, which I currently attribute to temporary bugs, either in the product or (less likely) in my testing procedure.

(*) ScanToSecure – Another new SAAS service which is assessed for the first time, and got results that were almost identical to those of Netsparker.

(*) Netsparker (Commercial Edition) – Netsparker results were improved in almost every single category. The WIVET score was slightly improved (one of the highest), it came out FIRST in the Reflected XSS (along with 10 others) and Remote File Inclusion (along with 4 others) categories, dramatically improved the previous Local File Inclusion results (one of the highest results), and got great results in many other tests. Like the vast majority of the products in the industry, its results were somewhat mixed in the new tests (backup/redirect).

(*) WebInspect – WebInspect significantly improved its scores in various categories: it was the only winner in the client/barrier coverage feature comparison; came out FIRST in the WIVET assessment (along with 3 others) and in the Remote File Inclusion (along with 4 others), Reflected XSS (along with 10 others) and SQL Injection (along with 4 others) categories; got a surprisingly high score in the new (and secret) Unvalidated Redirect category (the highest among commercial scanners); and earned plenty of other high scores in different categories, but didn't get a good score in the backup/hidden file detection assessment.

(*) AppScan – AppScan too significantly improved its scores in various categories: It was the only winner in the Local File Inclusion and Supported Audit Features categories, got one of the highest WIVET scores, came out FIRST in the SQL Injection (along with 4 others), Reflected XSS (along with 10 others) and Remote File Inclusion (along with 4 others) categories, got plenty of high scores in other categories, but got mixed results in the new tests (backup/hidden files and unvalidated redirect).

(*) Acunetix WVS (Commercial Edition) – Acunetix slightly improved the results from the previous benchmarks, and got some very interesting new results: it got the BEST SCORE in the NEW Backup/Hidden Files category among commercial scanners (and some would argue, in total), came out FIRST in WIVET (along with 3 others), SQL Injection (along with 4 others), Reflected XSS (along with 10 others), got great results in many other categories, but didn't get a good score in the new unvalidated redirect category.

(*) Syhunt Dynamic – Syhunt dramatically improved their WIVET score (came out FIRST, along with 3 others), and slightly improved other scores as well (LFI, etc). They got a mixed result when scanning backup/hidden files, and didn't have a plugin to scan unvalidated redirect test cases (at least as far as I could tell).

(*) Burp Suite Pro – Burp is the undisputed winner of the (overall) versatility category, was the only winner in the input vector support category (followed closely by NTO, and less closely by Appscan, Webinspect and IronWASP), got one of the highest scores in detecting Backup/Hidden Files (relative), and decent scores in many other categories. It also came out FIRST in the SQL Injection (along with 4 others) and Reflected XSS (along with 10 others) categories, and dramatically improved its RFI score, but alas, didn't get a good score in the WIVET test (same as last year).

(*) WebCruiser – No significant changes compared to previous versions in the tested categories.

(*) ParosPro – was not retested, since no updates were released, so it has identical results.
(*) JSky – was not retested, since no updates were released, so it has identical results.
(*) Ammonite – was not retested, since no updates were released, so it has identical results.

The following list presents a summary of changes in the detection accuracy and coverage of Opensource/Free tools that were tested in the previous benchmark:

(*) ZAP – ZAP significantly improved almost all of its results. It implemented a new AJAX crawling feature that dramatically improved its WIVET score (the highest among open source tools), but this feature is optional and takes time to use. It came out FIRST in the Reflected XSS category (along with 10 others), got one of the highest scores in SQL Injection, Remote File Inclusion and Local File Inclusion, as well as decent results in many other categories. If you take into account the external GoF plugin, ZAP is also the winner of the Backup/Hidden file detection category, although I'm leaving that interpretation to the reader. ZAP, however, didn't get a good score when tested against unvalidated redirect test cases.

(*) IronWASP – Although IronWASP also had a new AJAX crawling feature, it was released too late for me to test it properly, and in my opinion it required a little more polishing (although rumors say it gets an insane WIVET score). It did, however, score a clean (and unexpected) win as the only winner in the new and hidden Unvalidated Redirect category, with an impressive score that included test cases no other tool detected. It also co-won the Reflected XSS category (along with 10 others), and got some great results in many other tests. Due to technical difficulties, I still don't have a WIVET score for it, but hopefully will have one soon.

(*) Skipfish – Skipfish is back in the game. Although previous versions were relatively buggy, the currently tested version had very impressive results, notable result consistency (which unfortunately I did not measure), and a dramatic improvement in almost every test category I used it in. It got very impressive results in many categories, and also a relatively high result in the unvalidated redirect category.

(*) Vega – Vega was definitely a surprising player in this benchmark. It came out FIRST in both Reflected XSS (along with 10 others) and Remote File Inclusion (along with 4 others). It got a fantastic WIVET result for an open source tool (the best open source result achieved without a visible browser, something no other open source tool with a good result managed, and an approach worth reusing in other Java tools), and got very impressive results in both the Local File Inclusion (although with lots of false positives) and SQL Injection categories. Sadly, it didn't have plugins for unvalidated redirect or backup/hidden files that I could test.

(*) Arachni – Anyone who installs and uses the latest version of Arachni will immediately notice a significant improvement in usability (a very impressive one, if I may add), as well as probably the most consistent behavior I saw, which unfortunately I did not measure. The idea behind the "AutoThrottle" feature is very interesting and is probably responsible for some of the consistent results, since the tool got the same results regardless of how many URLs it scanned, which is very rare in this industry. However, a bug in the XSS plugins seemed to reduce its score in that category in comparison to the previous assessment, and another bug caused the backup/hidden file detection plugin not to function at all. It still came out FIRST in the Remote File Inclusion test (along with 4 others), improved some other results, got the third best score in the NEW Unvalidated Redirect category (along with Webinspect), and also got me thinking about how easy it is to start a new SAAS business just by using it.

(*) W3AF – The development version of W3AF had several bugs that affected its score, and in fact, some results were actually worse than in the last benchmark (the bugs were reported to the vendor). It did, however, still manage to surprise, getting the best score for an open source tool in the Unvalidated Redirect category (and the second best score in that category overall), a relatively good result in the Backup/Hidden File detection category, and a couple of other results that were impressive, especially in the context of the open source industry (WIVET, features).

(*) WATOBO – WATOBO significantly improved both its SQL Injection and Reflected XSS scores, got the same scores in LFI, and got above average (relative) results in backup/hidden file detection (which were generally bad to mediocre for most tools), but at the time of the test did not have any RFI or Unvalidated Redirect features I could test.

(*) WAPITI – Those who recall this tool, which got surprisingly high scores in previous benchmarks, will be delighted to know that the project has recently been revived and that a new version was released. It got relatively good results (an impressive WIVET score for an open source tool), as well as improvements in almost every category. It did, however, have a hard time with the Backup/Hidden file category, in which it got a low score.

(*) N-Stalker 2012 FE – Significantly improved its Reflected XSS score compared to the previous benchmark.

(*) Netsparker Community/Free Edition – Got some slight improvements in some of its scores, and still has one of the best WIVET scores for a free tool, but overall there were no major changes compared to the previous benchmark.

(*) SQLMap, WebSecurify, Acunetix FE and a couple of other projects were not retested, and most of the features of Syhunt Mini, AndiParos and Paros were not retested (although the latter three got some new results for unvalidated redirect and backup/hidden files). 

22. Opensource vs. Commercial - Insights?
The conclusions I have this year regarding the open source vs. commercial tools enigma are not as decisive as last year's.

Part of that is because I haven't yet completed all the analysis processes I planned, and part of it is because there really was a significant improvement in the open source industry (without taking lightly the significant improvements that took place in the commercial sector).

Projects such as ZAP and IronWASP started supporting scanning input delivery methods of modern web applications, including JSON/AJAX, XML, and even nearly unique vectors such as OData and GWT, that even most commercial vendors don't support.

Projects like W3AF have long been almost as feature-rich as Webinspect and Appscan (although they still lack stability), Vega is coming closer to having a crawling mechanism that can produce results similar to those of a commercial vendor, and if I were Qualys (or any other cloud vendor), I would watch the improvement of the Arachni project CLOSELY. Seriously, install it and give it a shot; the results don't reflect the maturity level it has reached.

However, in sheer numbers, as an overall solution, most open source tools still lag a bit behind some of the major commercial players, at least if you take into account all the categories… although I admit that I don't say that with the same confidence as I did before, and I believe that further analysis is required to get to a practical conclusion. 

23. Verifying the Benchmark Results
The results of the benchmark can be verified by replicating the scan methods described in the scan log of each scanner (accessible in sectoolmarket through the version link of each product), and by testing the scanner against WAVSEP v1.5 (obtained from the sourceforge WAVSEP repository) and WIVET v3-revision148.

The same methodology can be used to assess vulnerability scanners that were not included in the benchmark.
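When replicating the results, a per-category score can be tallied by comparing the scanner's findings against the known test cases. The following Python sketch assumes that detection accuracy is the share of vulnerable test cases the scanner reported, and separately lists the false-positive test cases it wrongly flagged. The function name and the test-case paths are hypothetical, and the benchmark's own charts may aggregate differently (for example, the false positive bars count categories rather than individual instances).

```python
def score_scan(vulnerable_cases, false_positive_cases, reported_cases):
    """Compare a scanner's reported test cases against the known ones.

    Returns a tuple: (detection accuracy in percent, sorted list of
    false-positive test cases that the scanner wrongly flagged).
    """
    vulnerable = set(vulnerable_cases)
    if not vulnerable:
        raise ValueError("at least one vulnerable test case is required")
    reported = set(reported_cases)
    detected = vulnerable & reported                   # true positives
    false_alarms = set(false_positive_cases) & reported
    accuracy = 100.0 * len(detected) / len(vulnerable)
    return accuracy, sorted(false_alarms)


# Hypothetical test-case paths, for illustration only.
vulnerable = ["sqli/Case1.jsp", "sqli/Case2.jsp", "sqli/Case3.jsp", "sqli/Case4.jsp"]
false_positive = ["sqli-fp/Case1.jsp", "sqli-fp/Case2.jsp"]
reported = ["sqli/Case1.jsp", "sqli/Case3.jsp", "sqli-fp/Case2.jsp"]

print(score_scan(vulnerable, false_positive, reported))  # (50.0, ['sqli-fp/Case2.jsp'])
```

Using sets makes duplicate findings for the same test case (common when a scanner reports several parameters or payloads per page) count only once.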

24. So What's Next?
During this research, which I have been conducting for the past 18 months or so (7 of those just to gather the results you are currently seeing), I gathered a ton of information.

Due to my consistently tight schedule, too many adventurous endeavors and the fact that I didn't want to delay the publication any longer, I didn't publish A LOT of the content I gathered, so in the next couple of weeks I'm going to try to wrap it up so it can come to fruition ASAP. In my opinion, the conclusions from the unpublished content could be very interesting with regard to the technological trends in this industry.

The benchmark was branded as part I for a reason, and although I might add the results of additional products soon, in the upcoming weeks, I plan to focus on trying to see how much effort will be required to release part II, which will have a very different result format compared to the typical WAVSEP benchmark.

25. Recommended Read-List: Benchmarks
The following resources include additional information on previous benchmarks, comparisons and assessments in the field of web application vulnerability scanners:

(*) "HackMiami Web Application Scanner 2013 PwnOff", by James Ball, Alexander Heid, Rod Soto (a comparison of 5 web application scanners published at the HackMiami 2013 conference)
(*) "Top 10: The Web Application Vulnerability Scanners Benchmark, 2012", one of the predecessors of the current benchmark, by Shay Chen (a comparison of 60 commercial and open source scanners, July 2012)
(*) "Enemy of the State: A State-Aware Black-Box Web Vulnerability Scanner", by Adam Doupé, Ludovico Cavedon, Christopher Kruegel, and Giovanni Vigna (a comparison of 3 scanners published in 2012)
(*) "SQL Injection through HTTP Headers", by Yasser Aboukir (an analysis and enhancement of the 2011 60 scanners benchmark, with a different approach for interpreting the results, March 2012)
(*) "The Scanning Legion: Web Application Scanners Accuracy Assessment & Feature Comparison", one of the predecessors of the current benchmark, by Shay Chen (a comparison of 60 commercial and open source scanners, August 2011)
(*) "Building a Benchmark for SQL Injection Scanners", by Andrew Petukhov (a commercial and open source scanner SQL injection benchmark with a generator that produces 27680 (!!!) test cases, August 2011)
(*) "Webapp Scanner Review: Acunetix versus Netsparker", by Mark Baldwin (commercial scanner comparison, April 2011)
(*) "Effectiveness of Automated Application Penetration Testing Tools", by Alexandre Miguel Ferreira and Harald Kleppe (commercial and freeware scanner comparison, February 2011)
(*) "Web Application Scanners Accuracy Assessment", one of the predecessors of the current benchmark, by Shay Chen (a comparison of 43 free and open source scanners, December 2010)
(*) "State of the Art: Automated Black-Box Web Application Vulnerability Testing", by Jason Bau, Elie Bursztein, Divij Gupta, John Mitchell (original paper, May 2010)
(*) "Analyzing the Accuracy and Time Costs of Web Application Security Scanners", by Larry Suto (commercial scanners comparison, February 2010)
(*) "Why Johnny Can’t Pentest: An Analysis of Black-box Web Vulnerability Scanners", by Adam Doupé, Marco Cova, Giovanni Vigna (commercial and open source scanner comparison, 2010)
(*) "Web Vulnerability Scanner Evaluation", by AnantaSec (commercial scanner comparison, January 2009)
(*) "Analyzing the Effectiveness and Coverage of Web Application Security Scanners", by Larry Suto (commercial scanners comparison, October 2007)
(*) "Rolling Review: Web App Scanners Still Have Trouble with Ajax", by Jordan Wiens (commercial scanners comparison, October 2007)
(*) "Web Application Vulnerability Scanners – a Benchmark", by Andreas Wiegenstein, Frederik Weidemann, Dr. Markus Schumacher, Sebastian Schinzel (anonymous scanners comparison, October 2006)

26. Acknowledgements
While performing the research described in this article, I have received help from plenty of individuals and resources, and I’d like to take the opportunity to acknowledge them all.

To the researchers Ozhan Sisic and Sharath Unni, who contributed content and results to the assessment at the expense of their own time, under tight deadlines and often at unreasonable hours.

To the various additional volunteers that did their best to assist me whenever they could, especially to the ones that chose to stay anonymous.

To the various members of Denim Group, and especially Dan Cornell, who assisted throughout the project, adapted their excellent platform ThreadFix to fit my needs, and enabled me to handle nearly unreadable results and share information with volunteers who participated in the tests around the world.

To the various entities and projects that contributed code to WAVSEP, including (but not limited to) the various authors of the ZAP project and Lavakumar Kuppan from the IronWASP project.

To Dan Kuykendall from NTOBJECTives, who permitted me to use their enhanced online adaptation of WIVET as an additional verification mechanism for the local WIVET results.

To all the open source tool authors who assisted me in testing the various tools at unreasonably late hours, bothered to adjust their tools for me, discussed their various features and invested their time in explaining how I can optimize their use.
To the CEOs, Product Managers, Marketing Executives, QA engineers, Support Personnel and Development teams of commercial vendors, who saved me tons of time, supported me throughout the process, helped me overcome obstacles and made my experience a pleasant one.

To the various information sources that helped me gather the list of scanners over the years, spread the news about the previous benchmarks, and gain knowledge, ideas, and insights, including (but not limited to) information security sources such as Security Sh3ll (http://security-sh3ll.blogspot.com/), PenTestIT (http://www.pentestit.com/), The Hacker News (http://thehackernews.com/), Toolswatch (http://www.vulnerabilitydatabase.com/toolswatch/), Darknet (http://www.darknet.org.uk/), Packet Storm (http://packetstormsecurity.org/), Google (of course), Twitter (and the never-ending list of favorites I keep there) and many other great sources that I have used over the years to gather the list of tools.

I can't thank you all enough, and wish you all the best.

27. Appendix A: Tools That Were Not Included
The following commercial web application vulnerability scanners were not included in the benchmark, due to deadlines and time restrictions on my part:

Commercial Scanners not included in this benchmark
(*)Websure
(*)Hailstorm (Cenzic)
(*)McAfee Vulnerability Manager (McAfee / Foundstone)
(*)Retina Web Application Scanner (eEye Digital Security)
(*)SAINT Scanner Web Application Scanning Features (SAINT co.)
(*)WebApp360 (NCircle)
(*)Parasoft Web Application Scanning Features (a.k.a WebKing, by Parasoft)
(*)Falcove (BuyServers ltd, currently Unmaintained)
(*)Safe3WVS 13.1 Commercial Edition (Safe3 Network Center)

The following open source web application vulnerability scanners were not included in the benchmark, mainly due to time restrictions, but might be included in future benchmarks:
Open Source Scanners not included in this benchmark
(*)Kayra
(*)2gwvs
(*)Fiddler XSSInspector/XSRFInspector Plugins
(*)Vulnerability Scanner 1.0 (by cmiN, RST)

The following is a partial list of SAAS scanners that were not included in the benchmark, mainly due to time restrictions, but might be included in future benchmarks:
SAAS Online Scanning Services
Appscan On Demand (IBM), Click To Secure, Sentinel (WhiteHat), Veracode (Veracode), Quatrashield, Veracode Dynamic Analysis, edgescan, VUPEN Web Application Security Scanner (VUPEN Security), WebInspect (online service - HP), WebScanService (Elanize KG), Gamascan (GAMASEC – currently offline), Cloud Penetrator (Secpoint), Zero Day Scan, DomXSS Scanner, Golem Technologies, etc.

Web Application Testing Tools that use Dynamic Runtime Analysis (IAST):
(*)Seeker (Quotium)
(*)Contrast (Aspect Security)
(*)PuzlBox (currently named PHP Vulnerability Hunter)

The benchmark focused on web application scanners that are able to detect at least Reflected XSS or SQL Injection vulnerabilities, can be locally installed, and are also able to scan multiple URLs in the same execution.
As a result, the test did not include the following types of tools:
Scanners without RXSS / SQLi detection features:
(*)Dominator (Firefox Plugin)
(*)fimap
(*)lfimap
(*)LFI/RFI Checker (astalavista)
(*)Etc.

Passive Scanners (response analysis without verification):
(*)Watcher (Fiddler Plugin by Casaba Security)
(*)Skavanger (OWASP)
(*)Pantera (OWASP)
(*)Ratproxy (Google)
(*)Etc.


Scanners of specific products or services (CMS scanners, Web Services, etc):
(*)WSDigger
(*)Sprajax
(*)ScanAjax
(*)Joomscan
(*)wpscan
(*)Joomlascan
(*)Joomsq
(*)WPSqli
(*)Etc.

Uncontrollable Scanners
Scanners that can’t be controlled or restricted to scan a single site, since they either receive the list of URLs to scan from a Google dork, or continue and scan external sites that are linked to the tested site. This list currently includes the following tools (and might include more):
(*)Darkjumper 5.8 (scans additional external hosts that are linked to the given tested host)
(*)Bako's SQL Injection Scanner 2.2 (only tests sites from a Google dork)
(*)Serverchk (only tests sites from a Google dork)
(*)XSS Scanner by Xylitol (only tests sites from a Google dork)
(*)Hexjector by hkhexon – also falls into other categories
(*)d0rk3r by b4ltazar

Deprecated Scanners
Incomplete tools that were not maintained for a very long time; currently includes the following tools (and might include more):
(*)Wpoison (development stopped in 2003 and the new official version was never released, although the 2002 development version can be obtained by manually composing the SourceForge URL, which does not appear on the web site: http://sourceforge.net/projects/wpoison/files/ )

De facto Fuzzers
Tools that scan applications in a similar way to a scanner; but where a scanner attempts to conclude whether or not the application is vulnerable (according to some sort of “intelligent” set of rules), the fuzzer simply collects abnormal responses to various inputs and behaviors, leaving the task of drawing conclusions to the human user.
(*)Lilith 0.4c/0.6a (both versions 0.4c and 0.6a were tested, and although the tool seems to be a scanner at first glance, it doesn’t perform any intelligent analysis on the results).
(*)Spike proxy 1.48 (although the tool has XSS and SQLi scan features, it acts more like a fuzzer than a scanner: it sends partial XSS and SQLi payloads, and does not verify that the context of the returned output is sufficient for execution or that the error presented by the server is related to a database syntax injection, leaving the verification task to the user).

Fuzzers
Scanning tools that lack the independent ability to conclude whether a given response represents a vulnerable location, by using some sort of verification method (this category includes tools such as JBroFuzz, Firefuzzer, Proxmon, st4lk3r, etc). Fuzzers that had at least one type of exposure that was verified were included in the benchmark (Powerfuzzer).

CGI Scanners
Vulnerability scanners that focus on detecting hardening flaws and version-specific hazards in web infrastructures (Nikto, Wikto, WHCC, st4lk3r, N-Stealth, etc.).

Single URL Vulnerability Scanners
Scanners that can only scan one URL at a time, or can only scan targets obtained from a Google dork (uncontrollable):
(*)Havij (by itsecteam.com)
(*)Hexjector (by hkhexon)
(*)Simple XSS Fuzzer [SiXFu] (by www.EvilFingers.com)
(*)Mysqloit (by muhaimindz)
(*)PHP Fuzzer (by RoMeO from DarkMindZ)
(*)SQLi-Scanner (by Valentin Hoebel)
(*)Etc.

Vulnerability Detection Toolkits
Tools that aid in discovering vulnerabilities, but do not detect the vulnerability themselves; for example:
(*)Exploit-Me Suite (XSS-Me, SQL Inject-Me, Access-Me)
(*)XSSRays (Chrome add-on)

Exploitation Tools
Tools that can exploit vulnerabilities without any independent ability to automatically detect vulnerabilities on a large scale. Examples:
(*)MultiInjector
(*)XSS-Proxy-Scanner
(*)Pangolin
(*)FGInjector
(*)Absinth
(*)Safe3 SQL Injector (an exploitation tool with scanning features (pentest mode) that are not available in the free version)
(*)Etc.

Exceptional Cases

(*)SecurityQA Toolbar (iSec) – various lists and rumors include this tool in the collection of free/open-source vulnerability scanners, but I wasn’t able to obtain it from the vendor’s web site, or from any other legitimate source, so I’m not really sure it fits the “free to use” category.

8 comments:

  1. Great article and analysis. Thanks for sharing

  2. Absolutely amazing! Thank you so much for your hard work! You are making an enormous contribution to the state of web scanners out there.

  3. I'm a little confused about the false positive bar. In some areas you show 100% false positives, but then also say 18% were accurately detected. How is this possible?

    1. The percentage score of the false positive ratio is relative to the false positive test cases, not to the whole.

      For example, for backup/hidden file detection (which is the category the percentages you mentioned came from), there are 184 different REAL vulnerable hidden/backup files, and 3 categories (not necessarily single files) of false positive behaviors.

      In the case of the scanner that got this score, it means it found 18% out of the 184 real vulnerabilities, but found numerous false positives that fall into each one of the 3 false positive categories (it identifies dynamic 200-404 responses, custom 200/404 responses and default file responses as hidden files).

      The same idea applies to all the rest of the tests - false positive test cases represent application behavior categories that may be identified as a vulnerability - and these test cases are separated from the actual vulnerable test cases.
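      To make the two separate denominators concrete, here is a minimal sketch in Python. It assumes a hypothetical detection count of 33 files (the exact count behind the 18% figure isn't stated here; 33 of 184 rounds to 18%) and assumes the false positive score is simply the share of the 3 false positive categories the scanner misfired on.

```python
def ratio(hits: int, total: int) -> float:
    """Percentage of cases flagged, relative only to its own test-case group."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * hits / total

# Group sizes from the backup/hidden file test described above.
TRUE_CASES = 184      # real vulnerable backup/hidden files
FP_CATEGORIES = 3     # false positive behavior categories

# Hypothetical scanner result (assumed numbers, for illustration only).
detected = 33         # real files found
fp_triggered = 3      # scanner misfired in every false positive category

detection_rate = ratio(detected, TRUE_CASES)              # ~17.9, shown as 18%
false_positive_rate = ratio(fp_triggered, FP_CATEGORIES)  # 100.0

print(f"detection: {detection_rate:.0f}%")
print(f"false positives: {false_positive_rate:.0f}%")
```

      Because each ratio is computed against its own group, a scanner can score 18% on detection and 100% on false positives at the same time; the two numbers don't share a denominator and don't need to add up to anything.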

  4. Hi,

    Can you please share with me your expert comments on whether any of the open source tools listed above are Linux-friendly?

    Apart from this can you please share with me the benchmark comparison for Kali Linux and OpenVas.

    Thank you in anticipation.
    Regards,
    Dhruv Trehan

    1. All the Java ones should be fairly Linux-friendly (ZAP, Vega, etc.), and in fact, many of them should already be included in Kali Linux.

      The reason Kali isn't included is that it's a collection of tools, not a scanner, and in fact, it contains many of the tools assessed (check the various Kali menus and you'll recognize the names).

      As for OpenVAS, it's a network/infrastructure penetration testing tool, not an application penetration testing tool (at least not in focus), and therefore was not in the scope of this benchmark.

  5. I am missing BeyondTrust's (former eEye) Web Application Scanner or BeyondSaas.
