Always Protect Against XSS Attacks

From The Socknet


This document is not specific to the Socknet, or even applicable to the API. But for some reason, not everyone gets an education in XSS sanitization, and we don't want the Socknet to get a reputation for being easily hackable.

XSS
Cross-Site Scripting. Somewhat of a misnomer for code injected onto a website via untrusted user input. Generally used to load a larger amount of code from a separate website (thus the name), but this code can be dangerous without ever contacting a separate website.

Example: This code, if successfully injected into a website, would show the user's cookies in a popup as soon as the user moves the mouse over the link, which happens before any click.

<a href="http://icanhascheezburger.com/" onmouseover="alert(document.cookie)">Funny Cats</a>

This code could also be written to send the cookies to an offsite location. The cookies contain important data that allows the website to identify the user. If an attacker obtained the cookies, he could use them to log in to the website under the name of the victim. The link would still send the user to see funny cats, so it's likely that no one would detect the security breach.

There are more nefarious ways to do this which would run the code as soon as the page is loaded.

Libraries

There are libraries out there that can do this work for you. Look into them.

Sanity

Some say that XSS can only be avoided with a whitelist mechanism; that you must avoid all HTML in user input except a very few items that you have thoroughly verified are OK. This is excessively strict, and in fact, a whitelist is often not completely effective.

Protecting against XSS is a simple activity with a small learning curve, and it can be completely effective.

First, keep on top of changes in browsers. But that should go without saying.

Currently, XSS in the form of JavaScript can be run in exactly three ways.

  1. a <script> or <style> tag
  2. an on.* attribute
  3. an attribute with a javascript: protocol link (more dangerous than you may know)
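
As a rough illustration of the three cases above, here is a sketch in Python that only reports them, without removing anything. It assumes the standard library's html.parser is an acceptable stand-in for "an HTML parser"; the names VectorFinder and find_vectors are made up for this example.

```python
from html.parser import HTMLParser


class VectorFinder(HTMLParser):
    """Records each occurrence of the three script-carrying constructs."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        # html.parser hands us lowercased tag and attribute names.
        if tag in ("script", "style"):
            self.findings.append(("tag", tag))
        for name, value in attrs:
            if name.startswith("on"):
                self.findings.append(("handler", name))
            elif value and value.strip().lower().startswith("javascript:"):
                self.findings.append(("url", name))


def find_vectors(html_text):
    """Return a list of (kind, name) pairs found in html_text."""
    finder = VectorFinder()
    finder.feed(html_text)
    finder.close()
    return finder.findings


sample = ('<a href="http://icanhascheezburger.com/" '
          'onmouseover="alert(document.cookie)">Funny Cats</a>')
print(find_vectors(sample))
```

For the Funny Cats link this reports a single "handler" finding for onmouseover.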

The safest way to combat these vulnerabilities, while keeping all other HTML available, is to parse the HTML (using some library already available in your language) and then find and remove dangerous nodes.

An example in a made-up language:

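// Parse the untrusted HTML, strip the dangerous nodes, and serialize it again.
function clean_html(html){
	tree = parse_html(html);
	prune(tree);
	return tree.to_html();
}

// Recursively remove script-carrying elements and attributes from a branch.
function prune(branch){
	// Walk backwards so that removing an item does not shift the ones still to visit.
	for (i = branch.children.length - 1; i >= 0; i--)
		if (branch.children[i].name == 'script' || branch.children[i].name == 'style')
			branch.remove(branch.children[i]);
	for (i = branch.attributes.length - 1; i >= 0; i--){
		// Event handler attributes: onclick, onmouseover, and so on.
		if (branch.attributes[i].name.substr(0,2) == 'on')
			branch.remove(branch.attributes[i]);
		// Any attribute whose value is a javascript: URL, not just href.
		else if (branch.attributes[i].value.substr(0,11) == 'javascript:')
			branch.remove(branch.attributes[i]);
	}
	// Descend into the children that survived.
	for (i = branch.children.length - 1; i >= 0; i--)
		prune(branch.children[i]);
}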

This code will:

  1. convert the HTML code into a common programming tree format in which each element has a 'children' array of elements within it and an 'attributes' array of the attributes it has.
  2. send this tree to the prune method, which removes any child named 'script' or 'style', any attribute starting with 'on', and any attribute with a URL starting with 'javascript:'. (Note: Due to a bug in some browsers, we guard against all attributes with values starting with "javascript:", not just href attributes.)
  3. send each child element to the prune method too.

When finished, the tree contains no dangerous elements or attributes. Convert it back to HTML code and you have safe HTML.
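
For readers who want something they can actually run, here is a minimal Python sketch of the same idea. It is not a translation of the made-up code: Python's standard html.parser is event-based, so instead of pruning a tree this version filters the stream of parser events and re-emits whatever survives. The names Pruner and clean_html are chosen for this example, and the sketch is not hardened for production use.

```python
from html import escape
from html.parser import HTMLParser

BLOCKED_TAGS = {"script", "style"}


class Pruner(HTMLParser):
    """Re-emits parsed HTML, leaving out dangerous elements and attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases names and decodes character references
        # in attribute values before we see them.
        if tag in BLOCKED_TAGS:
            self.skip_depth += 1
            return
        if self.skip_depth:
            return
        kept = [tag]
        for name, value in attrs:
            if name.startswith("on"):
                continue  # event handler attribute
            if value is None:
                kept.append(name)  # bare attribute such as "disabled"
                continue
            if value.strip().lower().startswith("javascript:"):
                continue  # javascript: URL in any attribute
            kept.append(f'{name}="{escape(value, quote=True)}"')
        self.out.append("<" + " ".join(kept) + ">")

    def handle_startendtag(self, tag, attrs):
        # Treat <br/> like <br>, as browsers do for HTML.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag in BLOCKED_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text inside a blocked element is dropped along with the element.
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def clean_html(html_text):
    pruner = Pruner()
    pruner.feed(html_text)
    pruner.close()
    return "".join(pruner.out)


print(clean_html('<a href="http://icanhascheezburger.com/" '
                 'onmouseover="alert(document.cookie)">Funny Cats</a>'))
```

Note that an unclosed <script> tag makes this sketch drop everything after it, which errs on the safe side. Comments and doctype declarations are dropped as well, because the sketch does not re-emit them.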

Important: In the example, there should be better attention paid to capitalization, whitespace in attributes, and HTML encoding. HTML encoding in particular can be used to hide the "javascript:" protocol identifier, but it will be handled automatically by any good HTML parser. Whitespace in attributes is sometimes trimmed automatically.
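
To make those three concerns concrete, here is a small Python sketch of a check that deals with capitalization, whitespace and HTML encoding before looking for the protocol. The function name is_javascript_url is hypothetical. It deliberately over-matches: it removes every space and control character, while browsers only ignore tabs and newlines inside a URL plus leading and trailing spaces and control characters.

```python
import html
import re

# Spaces and ASCII control characters, removed before the comparison.
_IGNORABLE = re.compile(r"[\x00-\x20]+")


def is_javascript_url(raw_value):
    """Return True if an attribute value would act as a javascript: link."""
    decoded = html.unescape(raw_value)            # undo &#65; style encoding
    compact = _IGNORABLE.sub("", decoded).lower() # undo whitespace and case tricks
    return compact.startswith("javascript:")


for value in ("javascript:alert(1)",
              "  JaVaScRiPt:alert(1)",
              "j&#65;vascript:alert(1)",
              "java\tscript:alert(1)",
              "http://icanhascheezburger.com/"):
    print(repr(value), is_javascript_url(value))
```

If your parser already decodes character references in attribute values, the html.unescape step is redundant but harmless for this check.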

Note: In the example above, the href attribute is removed from <a> tags if it has a javascript: link. This behavior works fine, but many browsers will render this as a link that goes nowhere, or goes to the current page. You probably don't want that.

Other considerations:

  • Style attributes can be a nuisance; think about position:absolute
  • Class attributes can be annoying too
  • ID attributes might stop a page from functioning correctly
  • Object elements can run unsafe Flash (though you may whitelist some such elements by domain)
  • Some elements are simply useless without their 'on' attributes, e.g. buttons. They might be removed entirely.
  • Read about CSRF, a related danger with a very different solution



Oh, I'm just gonna use regex to filter for javascript.

No, don't do this. Just as an example, javascript: links can be disguised with HTML character encoding such as "j&#65;vascript:", along with 2,097,151 other permutations (2 ways to encode the ":" and 4 for each letter). Web browsers will run JavaScript just fine when it is written in this form, but a regex that looks for the literal text will be fooled.
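
The arithmetic, and the way a naive regex gets fooled, can be checked with a few lines of Python. This sketch assumes the four forms per letter are lowercase, uppercase, and the decimal character reference of each (the text does not spell them out), and the helper names are made up for the example.

```python
import html
import re

SCHEME_LETTERS = "javascript"
FORMS_PER_LETTER = 4  # assumed: a, A, &#97;, &#65;
FORMS_FOR_COLON = 2   # assumed: ":" and "&#58;"

NAIVE_FILTER = re.compile(r"javascript:", re.IGNORECASE)


def count_disguises():
    """Number of spellings other than the plain lowercase 'javascript:'."""
    total = FORMS_FOR_COLON * FORMS_PER_LETTER ** len(SCHEME_LETTERS)
    return total - 1


def naive_filter_catches(text):
    """What a regex sees when it is run on the raw markup."""
    return NAIVE_FILTER.search(text) is not None


def decoded_filter_catches(text):
    """The same regex, run after decoding character references."""
    return NAIVE_FILTER.search(html.unescape(text)) is not None


disguised = "j&#65;vascript:alert(1)"
print(count_disguises())
print(naive_filter_catches(disguised), decoded_filter_catches(disguised))
```

The count matches the figure quoted above. The regex misses the disguised link on the raw text and only finds it once the text has been decoded, which is work an HTML parser does for you anyway.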

Wait, what's wrong with a whitelist?

First off, whitelists tend to be too restrictive. More importantly, they usually do not work. For example, most whitelists include the <b> tag because its function is safe. However, many browsers allow a <b> tag to have an onmouseover attribute containing JavaScript. Most whitelist writers don't think of that. There may be dozens of things to filter for, and in the end your solution might look like a messy version of what was presented in this article.
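
Here is a small Python sketch of that blind spot. The whitelist contents and the names TagOnlyWhitelist and audit are hypothetical; the point is only that a check on tag names never looks at the attributes.

```python
from html.parser import HTMLParser

ALLOWED_TAGS = {"b", "i", "em", "strong", "p"}  # hypothetical whitelist


class TagOnlyWhitelist(HTMLParser):
    """Mimics a whitelist that inspects tag names and nothing else."""

    def __init__(self):
        super().__init__()
        self.rejected = []       # tags the whitelist would refuse
        self.handlers_seen = []  # on* attributes it never looked at

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            self.rejected.append(tag)
        # A tag-only whitelist stops at the check above. We record the
        # event handlers separately to show what slipped through.
        self.handlers_seen.extend(
            name for name, _ in attrs if name.startswith("on"))


def audit(html_text):
    """Return (rejected_tags, unnoticed_handlers) for html_text."""
    checker = TagOnlyWhitelist()
    checker.feed(html_text)
    checker.close()
    return checker.rejected, checker.handlers_seen


print(audit('<b onmouseover="alert(document.cookie)">bold</b>'))
```

Nothing is rejected, because <b> is on the list, yet the onmouseover handler is still there.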

If you are concerned that new browsers will have new tags that you won't know about, you may find that a whitelist (a long one) is useful when paired with the XSS solution here. However, at this time, the tags planned for HTML5 have been published for a few years and there is no obvious danger.

TODO verify that claim...

Does this work against VBS too?

We should all work against VBS.

But the real answer is: probably. Anyone reading this want to take a stab at that?

Wait, XSS doesn't have to be Cross-Site?

No. For example, with the right XSS, an attacker could change a victim's password using just the information on the page, without ever contacting another site.

Though generally, an attacker is interested in gaining access without being found out, so he'll send current login information to a separate website. So cross-site is usually correct, but it's circumstantial.
