JSON backend
All the data should be stored in individual JSON files, one for each vulnerability. These files should be stored in db/
and the name should be <vuln-id>-<vulnerability-name>.json
, where <vuln-id>
is a unique ID (integer) that we'll use to reference/find the vulnerabilities and the <vulnerability-name>
is a human readable name to make it easier for us to find a vulnerability in the output of an "ls" command. The vulnerability name should be dash separated. Examples:
1-cross-site-scripting.json
2-sql-injection.json
3-missing-hsts-header.json
JSON files should look similar to this example:
{
"id": 12345,
"title": "Cross-Site Scripting",
"description": "A very long description for Cross-Site Scripting",
"severity": "medium",
"wasc": ["0003"],
"tags": ["xss", "client side"],
"cwe": ["0003", "0007"],
"owasp_top_10": {
"2010": [1],
"2013": [2]
},
"fix": {
"guidance": "A very long text explaining how to fix XSS vulnerabilities",
"effort": 50
},
"references": [
{"url": "http://foo.com/xss", "title": "First reference to XSS vulnerability"},
{"url": "http://asp.net/xss", "title": "How to fix XSS vulns in ASP.NET"},
{"url": "http://owasp.org/xss", "title": "OWASP desc for XSS"}
]
}
Notes about the JSON file:
- "id" holds the vulnerability unique ID. The contents of the "id" field must match filename
- The "OWASP" fields points to the OWASP Top10 category. We have different versions of this (2010 and 2013) to match the different releases of the OWASP Top10 project
- "fix-effort" gives the user an idea of how much time it will take to solve this vulnerability, it's stored in minutes
- Since a long blob of unformatted text is hard to read for any user, the description and solution text MUST use markdown for formatting. There are various python libraries which translate markdown to HTML, which will allow us to show the text in the GUI and or Web UI.
Long strings in JSON
JSON has an awful limitation: "No multi-line strings". In our case this is very limiting since we don't want to have awfully long lines in these sections:
"description": "A very long description for Cross-Site Scripting",
"solution": "A very long text explaining how to fix XSS vulnerabilities",
So, the parser should check if the description
and solution
fields are strings or arrays. If they are arrays, the contents of the array should be joined using an empty space. For example:
"description": ["A very long description for Cross-Site Scripting",
" which has more than one line"]
Will be parsed as A very long description for Cross-Site Scripting which has more than one line
and then if the user wants to enter new lines, he needs to do so explicitly:
"description": ["Line1\n",
"Line2\n"]
Will be parsed as:
Translations
One of the cool things about this architecture is that it will allow us to easily add translations. Just adding a new set of JSON files (keeping the vulnerability IDs) would work.
Implementation details:
- The English version will be in
db/
- Unix's environment variable
LANG
is used for setting the language
- If there is no translation for the current
LANG
then the English version is used
- Translations will be in
db/ru/
, db/es/
, etc.
- The translation json files will override the parts of the main JSON file which is in English, for example, the main file contains:
"title": "Cross-Site Scripting",
And then the Russian translation file (for the same vulnerability id) says:
"title": "Межсайтовый скриптинг",
And if the LANG
environment variable is set to Russian then the python/ruby/go wrapper should return Cross-Site Scripting in Russian
as the title for this vulnerability. Any field from the main JSON file can be overridden (for example you can override a link to wikipedia to point to the XSS description in Russian).
Content
We'll use the content already created for the Arachni scanner, gently contributed to vulndb/data by Tasos.
Unittesting
- Unittests must be in the
tests/
directory
- We should write a json-schema for our JSON, and then validate the JSON after each push with a unittest. See https://python-jsonschema.readthedocs.org/en/latest/ for more information about json-schemas
- Assert that the
severity
field is one of high
, medium
, low
, informational
- Assert that these fields are present and contain at least 30 chars:
- Assert that these fields are present:
- id
- title
- fix_effort
- severity
- Assert that all references have
url
and title
fields, and that they are not empty
- For each value in WASC/OWASP, make sure that we can generate a link like
http://owasp.org/foo/<id>
, the URL is valid, site is online, not 404
- For each URL field in the schema, we need to check that:
- It's a valid URL
- The site is online, and no 404 is returned
fix_effort
value is in the right format
- The markdown in both description and solution are well formed
- The ID in the file name is the same as the one inside the JSON file
- The lines are not longer than 90 columns
References
andresriancho/w3af#53