WebMatrix - Protecting Your Web Pages Site

Your Web Pages site is under threat. There are people out there who want to break into restricted areas, download files they shouldn't have access to, mess up your database and steal your passwords. Worse still, they want to use your application as a gateway to the web server so that they can take full control over it. This article examines those threats and how you can protect your application against them.

The Open Web Application Security Project web site lists the top 10 threats that your site is subject to. SQL Injection and Cross Site Scripting (XSS) have consistently appeared at or near the top of that list, although the order changes depending on the number of instances found on the Internet. Both of these threats are very real, and can have catastrophic effects on your web application or the server it sits on. Most people think these things will never happen to their little site. After all, it's not as if you are building sites that manage people's money or credit card details, is it? While it's true that many serious hackers are interested in pecuniary gain, a lot of them are just losers and other lower forms of life who like causing disruption just because they can. Many of these people also use automated bots to scan for security vulnerabilities and don't even know it's your site that they are using as a means to take control over a server, or steal people's identities.

SQL Injection

SQL Injection is a technique whereby malicious users "inject" valid SQL into your code which then gets executed by the database causing unwanted side effects. You might be wondering how this can be done when you have already hardcoded the SQL in your web pages, but it's actually very easy. If your SQL is only partially written, and depends on dynamic values which come from the user, you have a potential security hole. Have a look at the following:

var sql = "Select Count(*) From Users Where Username = '" + Request["username"] + "' And Password = '" + Request["password"] + "'";

The SQL above is waiting for some values to come from the user, which will be provided via a log in form. Often, a test is applied to ensure that at least one row in the database table matches the values provided by the user, so if the returned value from the database is greater than 0, you validate the user and let them in to the restricted area, for instance. You expect a user to simply provide their username and password, but there's nothing to stop them adding a bit more. For example, if the user enters the following into the username and password text boxes:

' or ''='

the resulting SQL that gets executed against the database is as follows:

Select Count(*) From Users Where Username = '' or ''='' And Password = '' or ''=''

Because AND is evaluated before OR, the final or ''='' stands on its own and is always true, so every row in the table matches and the Count value will be greater than 0. Now the user has passed the test, but didn't even have to try to guess a valid user name and password combination. This example of SQL injection works because the apostrophes are valid SQL delimiters.
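To see why the apostrophes matter, the concatenation can be reproduced outside the page. The following is only a minimal sketch, not code from a real site: the InjectionDemo class and its BuildLoginSql method are hypothetical names, and the method deliberately repeats the unsafe pattern so that the resulting command text can be inspected.

```csharp
// Hypothetical helper for illustration only: it reproduces the UNSAFE
// concatenation pattern so the resulting SQL text can be inspected.
public static class InjectionDemo
{
    public static string BuildLoginSql(string username, string password)
    {
        // User input is spliced straight into the command text.
        return "Select Count(*) From Users Where Username = '" + username +
               "' And Password = '" + password + "'";
    }
}
```

Passing ' or ''=' for both arguments returns exactly the injected statement shown earlier, while ordinary input such as alice and secret produces the query that was intended.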

But there's more... SQL CE, the default database platform for Web Pages, is relatively immune from some of the more serious attacks because it doesn't support batch statements. But SQL Server Express and the full edition do, which means that a user can add extra commands to your existing ones. Consider the following input from a user:

'DROP table Users--

The dashes -- act as SQL Server comments, which means anything after "DROP table Users" is ignored, but if you have a table called Users in your database, it's just been deleted.

These are fairly simple examples of what SQL Injection is, but they illustrate that it is a real issue, and easy to achieve. SQL injection can cause far more serious problems than this, such as getting SQL Server to run Operating System commands on the web server, which effectively gives the hacker full control over it.

Cross Site Scripting (XSS)

XSS is the practice of injecting malicious JavaScript into a page so that it is executed in the browser of whoever views that page. If you have used jQuery, you will already know that JavaScript is extremely powerful. Using this technique, it's pretty simple for a hacker to inject JavaScript that changes a form's action attribute to point to a page on their server, for example, and steal your user name and password. For examples of how this can happen, and what the potential dangers are (which include cookie theft or tampering, stealing a person's identity, content injection and popping up adverts, to name a few), have a look at this video from Joe Stagner of Microsoft.

Protection

Both SQL Injection and XSS vulnerabilities result from the same source - user input. More specifically, unsanitised user input. Rule number one in protecting your site is to validate all user input for data type and range. But before we look at how to do this, we should define what "user input" actually is. So far, the examples have concentrated on textboxes in forms, but there are other avenues where incoming values are accepted by a web application and could potentially form part of a SQL statement. These are query strings in URLs, cookies, hidden fields, dropdown lists, checkboxes, radio buttons, submit buttons...

You might at this point be wondering how on earth a dropdown list might be an avenue of attack. After all, you provide the values for that in your code, don't you? All you do is check to see which of your values the user chose. Actually, what gets sent back might not be one of your values. It's easy for a hacker to change your hardcoded values. They can save a local copy of your page to their desktop, edit it in Notepad and then submit it from there. Or they can use Firebug or a similar tool to change the values before submitting the form. Never trust any value that comes from the client.

There are a number of helpers for checking the data type of values:

IsBool(): checks whether the value can be converted to a Boolean
IsDateTime(): checks whether the value can be converted to a .NET DateTime
IsDecimal(): checks whether the value can be converted to a Decimal
IsFloat(): checks whether the value can be converted to a floating point number
IsInt(): checks whether the value can be converted to an integer

Note that all values coming from the user are strings initially, so these methods check to see if a conversion to the relevant data type is possible. If it is, true is returned. Let's assume that your site shows a number of articles. Each one has an ArticleId, and your main page lists the articles with links to each one. The link URL is likely to look something like this:

http://mysite.com/Article/20

In the code within Article.cshtml, you are likely to use the value 20 to retrieve the correct item from the database. Before you even attempt that, make sure it is an integer by using the IsInt() method:

@if(UrlData[0].IsInt()){
    @:It's a number <br />
}
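Rule number one mentioned checking range as well as data type. As a rough sketch of how both checks might be combined in plain C#, the hypothetical InputChecks.TryGetArticleId helper below (it is not part of Web Pages) uses int.TryParse, accepts digits only, and rejects values below 1. Treating 1 as the lowest valid ArticleId is an assumption made purely for this example.

```csharp
using System.Globalization;

// Hypothetical helper (not part of Web Pages) that could live in App_Code.
public static class InputChecks
{
    // Returns true only when raw consists solely of digits, fits in an int,
    // and is at least 1 (assumed here to be the lowest valid ArticleId).
    public static bool TryGetArticleId(string raw, out int articleId)
    {
        // NumberStyles.None permits digits only: no sign, spaces or separators.
        bool isNumber = int.TryParse(raw, NumberStyles.None,
            CultureInfo.InvariantCulture, out articleId);

        if (!isNumber || articleId < 1)
        {
            articleId = 0;
            return false;
        }
        return true;
    }
}
```

In Article.cshtml this might be called with UrlData[0] as the first argument, and the page would only go on to query the database when the method returns true.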

HTML Input And XSS Attacks

By default, ASP.NET prevents people posting html tags via a form or a URL. This is a defensive measure specifically designed to stop people injecting <script> tags into your page from which they can initiate an XSS attack. The mechanism by which this is managed is called Request Validation. We can see this mechanism in action through the following exercise. Create a page like this:

<!DOCTYPE html>
<html lang="en">
    <head>
        <meta charset="utf-8" />
        <title></title>
    </head>
    <body>
    <form method="post" action="">
        @Html.TextArea("input")
        <br />
        <input type="submit" name="action" value="Submit" />
        <br />
        @Request["input"]
    </form>
    </body>
</html>

It's a simple form with a textarea and a submit button for posting the form, together with a line of code to render whatever was submitted. I've used the Html.TextArea helper for this. If you run the page and enter the following in the textarea:

<strong>Hello World!</strong>

and then hit submit, you'll see a YSOD (Yellow Screen Of Death) reporting that a potentially dangerous Request.Form value was detected from the client.

This is good to know, but there may be times when you want to allow users to submit html to your site, for example if you provide them with a Rich Text Editor so that they can format their submission for display. This is a common requirement in forums, blog comments and content management systems where editors might submit new articles. One of the great things about Web Pages is that you can switch Request Validation off for individual form fields, while still maintaining protection for all other user input, including URLs. You can do this with the Request.Unvalidated() method, passing in the name of the form field that you do not want to be protected:

<!DOCTYPE html>
<html lang="en">
    <head>
        <meta charset="utf-8" />
        <title></title>
    </head>
    <body>
    <form method="post" action="">
        @Html.TextArea("input")
        <br />
        <input type="submit" name="action" value="Submit" />
        <br />
        @Request.Unvalidated("input")
    </form>
    </body>
</html>

Run the page again, entering the same <strong>Hello World!</strong>. This time you get no error, but the result isn't quite what you might expect, either: the tags are displayed as literal text instead of the words appearing in bold.

This demonstrates another inbuilt layer of defense that Web Pages provides against XSS attacks. By default, all output to be rendered to the browser is HTML encoded. Again, this may not be desirable - you are allowing users to enter html, so presumably you want to display it as it was intended. However, you do not want people getting away with injecting <script> tags. One very simple way to manage this is to simply reject any input if it contains "<script>":


<!DOCTYPE html>
<html lang="en">
    <head>
        <meta charset="utf-8" />
        <title></title>
    </head>
    <body>
    <form method="post" action="">
        @Html.TextArea("input")
        <br />
        <input type="submit" name="action" value="Submit" />
        <br />
        @if(!Request.Unvalidated("input").IsEmpty()){
            if(!Request.Unvalidated("input").ToString().Contains("<script>"))
            {
                @Request.Unvalidated("input")
            }
        }
    </form>
    </body>
</html>
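The test in the page above is an exact, case-sensitive substring match. The sketch below isolates that test in a hypothetical NaiveScriptFilter class (the class and method names are made up for this illustration) so that it is easier to try different inputs against it.

```csharp
// Hypothetical class for illustration; it mirrors the Contains("<script>")
// test used in the page and does nothing more.
public static class NaiveScriptFilter
{
    // Returns true when the naive check would allow the input to be rendered.
    public static bool WouldAllow(string input)
    {
        return !input.Contains("<script>");
    }
}
```

WouldAllow returns false for <script>alert(1)</script>, but it returns true for <SCRIPT>alert(1)</SCRIPT>, for <script type="text/javascript">alert(1)</script> and for <img src=x onerror=alert(1)>. That matters as soon as the value is rendered without HTML encoding, since a browser may still run script from all three.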

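This is a fairly crude way of managing things, and it is easy to sidestep: it will not catch <SCRIPT>, a <script> tag that carries attributes, or script placed in an event handler attribute such as onerror. A more robust option is a "white list" approach, where you determine which html tags are allowable, as in the following class: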

using System;
using System.Web;
using System.Text.RegularExpressions;

public static class HtmlSanitizer
{
    private static Regex _tags = new Regex("<[^>]*(>|$)", 
        RegexOptions.Singleline | 
        RegexOptions.ExplicitCapture | 
        RegexOptions.Compiled);
    
    private static Regex _whitelist = new Regex(@"
        ^</?(b(lockquote)?|code|d(d|t|l|el)|em|h(1|2|3)|i|kbd|
        li|ol|p(re)?|s(ub|up|trong|trike)?|ul)>$|
        ^<(b|h)r\s?/?>$",
        RegexOptions.Singleline | 
        RegexOptions.ExplicitCapture | 
        RegexOptions.Compiled | 
        RegexOptions.IgnorePatternWhitespace);
    
    private static Regex _whitelist_a = new Regex(@"
        ^<a\s
        href=""(\#\d+|(https?|ftp)://[-a-z0-9+&@#/%?=~_|!:,.;\(\)]+)""
        (\stitle=""[^""<>]+"")?\s?>$|
        ^</a>$",
        RegexOptions.Singleline | 
        RegexOptions.ExplicitCapture | 
        RegexOptions.Compiled | 
        RegexOptions.IgnorePatternWhitespace);
    
    private static Regex _whitelist_img = new Regex(@"
        ^<img\s
        src=""https?://[-a-z0-9+&@#/%?=~_|!:,.;\(\)]+""
        (\swidth=""\d{1,3}"")?
        (\sheight=""\d{1,3}"")?
        (\salt=""[^""<>]*"")?
        (\stitle=""[^""<>]*"")?
        \s?/?>$",
        RegexOptions.Singleline | 
        RegexOptions.ExplicitCapture | 
        RegexOptions.Compiled | 
        RegexOptions.IgnorePatternWhitespace);


    public static string AsSafeHtml(this string html)
    {
        string tagname;
        Match tag;
    
        // match every HTML tag in the input
        MatchCollection tags = _tags.Matches(html);
        for (int i = tags.Count - 1; i > -1; i--)
        {
            tag = tags[i];
            tagname = tag.Value.ToLowerInvariant();
            
            if(!(_whitelist.IsMatch(tagname) || _whitelist_a.IsMatch(tagname) || _whitelist_img.IsMatch(tagname)))
            {
                html = html.Remove(tag.Index, tag.Length);
            }
        }
        return html;
    }
}

This class has been placed in a C# class file in App_Code, and the file is named HtmlSanitizer.cs. It might look quite complicated, but it uses Regular Expressions to simply strip out any tags that do not appear in the white lists. There are 3 white lists - the first covers the general HTML tags that I will allow, the second allows <a href> tags, and the third allows <img> tags. <script> is not allowed. Nor, in this example, are h4, h5 or h6 tags. You could change that by altering h(1|2|3) to read h(1|2|3|4|5|6) in the _whitelist Regex pattern.

The return type for the AsSafeHtml() method is a string. This still doesn't take care of the problem we had earlier, where by default all output is Html encoded. It needs to be converted to an HtmlString. Since the AsSafeHtml() method has been made an extension method, using it is simple:

@{
    var input = Request.Unvalidated("input");
}
<!DOCTYPE html>
<html lang="en">
    <head>
        <meta charset="utf-8" />
        <title></title>
    </head>
    <body>
    <form method="post" action="">
        @Html.TextArea("input")
        <br />
        <input type="submit" name="action" value="Submit" />
        <br />
        @if(!input.IsEmpty()){
            @(new HtmlString(input.ToString().AsSafeHtml()))
        }
    </form>
    </body>
</html>

Now when <strong>Hello World!</strong> is entered into the form, the output is rendered in bold, as desired.

If someone tries to enter html tags that do not appear in the white list, such as <script>, the tags are removed, although any text that sat between them is left in place as plain text.

SQL Injection And Parameters

Now that we have learned how to validate input and how to accept html safely, we still need to prevent users from trying to inject SQL. The first form of defence has already been put in place - if a value is expected to be anything other than a string, the IsInt(), IsDecimal() etc tests will prevent arbitrary strings from successfully being appended by an attacker. Nevertheless, strings are still vulnerable. We could consider adopting a black list approach to prevent SQL injection. This would entail screening input against known SQL keywords (Drop, Create, Execute etc) and syntax such as single quotes and double dashes, but there are two problems with this. The first is that users can have perfectly legitimate reasons to supply them: many SQL keywords are in common English usage, and an apostrophe appears in names such as O'Brien. The second is that the black list will need to be maintained as new keywords are introduced.

The only acceptable protection against SQL injection is to use Parameters. Fortunately, Web Pages makes this trivial, so there really is no excuse whatsoever for not doing this. Each of the methods that you use to execute a command against a database, Database.Query(), Database.QuerySingle(), Database.QueryValue() and Database.Execute(), takes a string representing the command to execute and an optional array of objects which represent parameter values. Let's assume that you have a form for adding a new article to a database. The article is made up from a number of elements - a title, an introduction, an author and a body. You decide to allow html in the body so that the author can format the resulting output as he or she likes. The code for the page should look something like this:

@{

    if(IsPost){
        var title = Request["title"];
        var intro = Request["intro"];
        var body = Request.Unvalidated("body");
        if(!body.IsEmpty()){
            body = body.AsSafeHtml();
        }
        var author = Request["author"];
        var db = Database.Open("MyDatabase");
        var sql = "INSERT INTO Articles (Title, Intro, Body, Author) VALUES (@0, @1, @2, @3)";
        db.Execute(sql, title, intro, body, author);
    }
}
<!DOCTYPE html>
<html lang="en">
    <head>
        <meta charset="utf-8" />
        <title></title>
    </head>
    <body>
    <form method="post" action="">
        <div>Title</div>
        <div>@Html.TextBox("title")</div>
        <div>Intro</div>    
        <div>@Html.TextArea("intro", new{cols="40", rows="4"})</div>
        <div>Body</div>
        <div>@Html.TextArea("body", new{cols="40", rows="8"})</div>
        <div>Author</div>
        <div>@Html.TextBox("author")</div>
        <input type="submit" name="action" value="Submit" />
    </form>
      
    </body>
</html>

We have excluded the textarea that contains the body part of the article from Request Validation. Other form fields are still protected. If the form is submitted, each of the fields of data is transferred to a variable and then a connection to the database is opened. The SQL statement looks quite normal up until the values. The @0, @1 etc are the parameter placeholders and tell the database to expect some values to be passed in separately. Calling the parameters @0, @1 and so on is essential to getting them to work correctly. Always start at 0 and increment by 1 each time. When the Database.Execute() method is called, the first argument is the SQL, and then the values for the parameters are passed in, each one separated by a comma, and in the order in which they appear in the SQL itself. Again, position is all-important. SQL Server (the full and Express versions) works on named parameters, which means the matching of values to parameters is done on the names given, but this is not the case with the Web Pages Database methods.

How does this protect us against SQL injection? If someone added "DROP Table Articles" to one of the form fields, why wouldn't this get executed against the database? Simply put, parameter values are sent to the database separately from the command text, so they are only ever treated as data and are never parsed as SQL. A value aimed at a text-based field is stored as a literal string, single quotes and all. For numeric types, once the database provider has established that the destination field in the database is expecting some kind of number, anything passed in the parameter value that cannot be converted to a numeric type will cause an error to be raised. Simple, and very safe.
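Applying the same idea to the log in query from the start of the article might look like the sketch below. It assumes a database named MyDatabase that contains the Users table used earlier. The LoginChecker class and its IsValidLogin method are hypothetical names for a file in App_Code, and the plain-text password comparison simply mirrors the earlier example.

```csharp
using WebMatrix.Data;

// Hypothetical App_Code helper: the log in check rewritten with parameters.
public static class LoginChecker
{
    public static bool IsValidLogin(string username, string password)
    {
        // "MyDatabase" and the Users table are assumed from the earlier examples.
        using (var db = Database.Open("MyDatabase"))
        {
            // @0 and @1 are placeholders; the values are passed as separate arguments.
            var sql = "SELECT COUNT(*) FROM Users WHERE Username = @0 AND Password = @1";
            int count = (int)db.QueryValue(sql, username, password);
            return count > 0;
        }
    }
}
```

With this version, entering ' or ''=' in both text boxes just looks for a user whose name is literally ' or ''=', finds no match, and returns false.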

Conclusion

What constantly amazes me is that protecting your application is pretty easy to do, yet so many people fail to take the basic precautions. OK, it requires a bit more work than not doing so, and I'm sure that many people don't take these kinds of precautions because they think that hackers will never target their application. The most recent attacks reported in the ASP.NET forums show that anyone is vulnerable. Not only that, but the more common attacks actually take advantage of both SQL injection AND XSS, in that they first probe for a SQL injection vulnerability, and once found, use it to inject script into every row in a database table. Now that you know what the threats are and how to code defensively against them, don't let it happen to you.