Handling Legacy URLs with ASP.NET MVC

According to Google's Webmaster Tools, there are about 15,000 incoming links to my site. 13,000 of those reference an .aspx file on disk. When I convert to MVC, with new search-engine-friendly URLs, all those links will break unless I do something about it. Presenting users with a 404 (File Not Found) is not an option. I need to show them the content they were expecting, and let search engines know that things have changed. Here's how I will be managing those legacy URLs.

Quick Overview of Routing

System.Web.Routing was introduced to ASP.NET with 3.5 SP1 to be used primarily with Dynamic Data applications. With the vast majority of web technologies, a URL maps to a physical file on disk: .aspx, .ashx, .asp, .htm, .php, .gif and so on. The main purpose of System.Web.Routing is to provide a means to map, or route, requests (URLs) to resources other than these types of physical file. Within an ASP.NET MVC application, these resources are generally methods exposed by classes that inherit from System.Web.Mvc.Controller, otherwise known as controller actions. To begin with, understanding how this all works can be as difficult as getting to grips with Regular Expressions. However, we'll give it a try.

All routes derive from an abstract class called RouteBase. This defines the methods that you need to implement to manage your own route, although in the main, you will use the out-of-the-box System.Web.Routing.Route subclass. A collection of routes is kept in the RouteTable.Routes property. The default Global.asax of an MVC application contains a class, MvcApplication, which includes a static method (RegisterRoutes) that populates the RouteTable.Routes collection. This static method is called within the Application_Start() event handler, and the resulting configuration of routes persists for the lifetime of the application.
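For reference, the arrangement just described looks roughly like this in a project created from the MVC template. This is a minimal sketch rather than the actual Global.asax of this site: the MikesDotnetting namespace and the "Default" route are assumptions based on the standard template.

```csharp
using System.Web;
using System.Web.Mvc;
using System.Web.Routing;

namespace MikesDotnetting // assumed namespace, for illustration only
{
  public class MvcApplication : HttpApplication
  {
    // Builds the application's route table. Order matters:
    // routes are tested in the order they are added.
    public static void RegisterRoutes(RouteCollection routes)
    {
      routes.IgnoreRoute("{resource}.axd/{*pathInfo}");

      // The catch-all route that the template registers by default
      routes.MapRoute(
        "Default",
        "{controller}/{action}/{id}",
        new { controller = "Home", action = "Index", id = "" }
      );
    }

    // Runs once when the application starts, so the routes
    // are registered a single time and kept in memory.
    protected void Application_Start()
    {
      RegisterRoutes(RouteTable.Routes);
    }
  }
}
```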

When a request is made to an MVC application, an HttpModule (UrlRoutingModule) checks RouteTable.Routes for an entry that matches the pattern of the requested URL. It starts from the top and checks each entry in turn until it finds one that matches. Once one is found, the matching Route object's GetRouteData() method returns a RouteData object, which provides information about the route, such as how it is to be handled. The GetRouteData() method takes one parameter, an HttpContextBase object, which holds all the information you need about the HTTP request that was made, including query string values, form values, cookie data, HTTP headers etc. Finally, the HttpModule invokes the RouteData's RouteHandler, which is typically of type MvcRouteHandler. It passes the handler a RequestContext object, which contains the HttpContextBase object and the RouteData object. The reason I have covered all of this is that this is the point at which we need to step in and examine whether a request is for a legacy .aspx file. If it is, we need to respond in a way that sends the visitor to the correct controller action.
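The matching process described above can be sketched in a few lines. This is a simplified illustration, not the real UrlRoutingModule source: the RoutingSketch class and ResolveHandler method are hypothetical names, and the real module also takes a read lock on the collection and copes with routes that stop routing altogether.

```csharp
using System.Web;
using System.Web.Routing;

public static class RoutingSketch // hypothetical helper, for illustration only
{
  // Walks the route table from the top and returns the handler
  // for the first route that matches, or null if none do.
  public static IHttpHandler ResolveHandler(HttpContextBase httpContext)
  {
    foreach (RouteBase route in RouteTable.Routes)
    {
      // A route returns null to say "not mine, try the next one"
      RouteData routeData = route.GetRouteData(httpContext);
      if (routeData == null || routeData.RouteHandler == null)
        continue;

      // The handler receives both the HTTP context and the route data
      var requestContext = new RequestContext(httpContext, routeData);
      return routeData.RouteHandler.GetHttpHandler(requestContext);
    }
    return null; // no match: the request falls through to normal ASP.NET handling
  }
}
```

The important point for what follows is the null check: any route can decline a request by returning null from GetRouteData().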

Before looking at how all that is achieved, there are some other problems that need to be addressed. The first is SEO, or Search Engine Optimisation. I have already decided that my replacement for, say, Article.aspx?ArticleID=100 will be Article/100/Experimenting-with-jQuery-Draggables-and-ASP.NET. In other words, as well as the ID of the article (see later for why), I am using the title. Instead of spaces (or %20) between words in the title, there will be hyphens. It appears that search engines like finding key words in the URL, and will give more weight to their relevance. It also appears that search engines are happy with hyphens, which they see as spaces. Not all of the titles of my articles lend themselves to this approach. For example, anything that currently includes some form of punctuation might look strange. So the first thing I need is a method to clean them up:


namespace MikesDotnetting.Helpers
{
  public static class UrlTidy
  {
    public static string ToCleanUrl(string urlToTidy)
    {
      var url = urlToTidy.Trim();
      url = url.Replace("  ", " ").Replace(" - "," ").Replace(" ", "-").Replace(",", "").Replace("...","");
      return url;
    }
  }
}
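To see what the helper does, here is a small console sketch. It repeats the UrlTidy class so that it compiles on its own; the second title is made up for the example. Note that only double spaces, " - ", commas and "..." are handled, so other punctuation such as "?" or "&" would pass through unchanged.

```csharp
using System;

public static class UrlTidy
{
  // Same logic as the helper above, copied so the example is self-contained
  public static string ToCleanUrl(string urlToTidy)
  {
    var url = urlToTidy.Trim();
    url = url.Replace("  ", " ").Replace(" - ", " ").Replace(" ", "-").Replace(",", "").Replace("...", "");
    return url;
  }
}

public static class Program
{
  public static void Main()
  {
    // prints "Experimenting-with-jQuery-Draggables-and-ASP.NET"
    Console.WriteLine(UrlTidy.ToCleanUrl("Experimenting with jQuery Draggables and ASP.NET"));

    // prints "Hello-World-again"
    Console.WriteLine(UrlTidy.ToCleanUrl("Hello, World - again..."));
  }
}
```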

The next thing involves the controller that is responsible for managing requests for articles. The Route entry that causes the relevant controller action to be invoked is as follows:


routes.MapRoute(
     "Show",
     "{controller}/{id}/{title}",
     new { controller = "Article", action = "Show", id = "", title = "" }
 );

And the action itself:


public ActionResult Show(int id)
{
  return View(repository.GetArticle(id));
}

If you compare the two, you will see that the {title} parameter is ignored by the controller action. All it looks for is the id, which gets passed to the GetArticle() method of my Repository. The reason for this is that it is a lot quicker to find data according to the primary key of a table than it is to do a string comparison. In addition, I am changing the title that appears in the URL by the addition of hyphens and the removal of other punctuation, so trying to compare an actual title to the representation of one within a URL is going to be problematic. In other words, the title part of the URL is purely decorative as far as MVC is concerned. There is another reason why the title part of the URL is purely decorative, and that is that I may want to edit a title at some stage after links have been published. If I do, the article will still be found so long as the ID appears correctly in a request. Nevertheless, I need a title when handling legacy Article.aspx requests. I explain why when I come to use it, but in the meantime, I'll add a GetArticleTitle() method to the Repository:


public IEnumerable<ArticleTitle> GetArticleTitle(int id)
{
  // Project only the headline, rather than loading the whole article
  return de.ArticleSet
           .Where(a => a.ArticleID == id)
           .Select(a => new ArticleTitle
                          {
                            Head = a.Headline
                          });
}
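The ArticleTitle type used in that projection isn't shown in the article. A minimal sketch of what it might look like, assuming it is nothing more than a simple class with a single string property (the namespace is also an assumption):

```csharp
namespace MikesDotnetting.Models // assumed namespace, for illustration only
{
  // Carries just the headline of an article, so the redirect code
  // doesn't need to load the full article from the database.
  public class ArticleTitle
  {
    public string Head { get; set; }
  }
}
```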

Now to the real business - building my custom Route object. I'll plonk all the code for the LegacyUrlRoute class in one go and then explain it:


using System;
using System.Web;
using System.Web.Routing;
using MikesDotnetting.Controllers;

namespace MikesDotnetting.Helpers
{
  public class LegacyUrlRoute : RouteBase
  {
    public override RouteData GetRouteData(HttpContextBase httpContext)
    {
      const string status = "301 Moved Permanently";
      var request = httpContext.Request;
      var response = httpContext.Response;
      var title = "";
      var legacyUrl = request.Url.ToString();
      var newUrl = "";
      var id = request.QueryString.Count != 0 ? request.QueryString[0] : "";
      
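      // Only redirect when the query string value is a valid integer;
      // Convert.ToInt32 would throw on an empty or malformed value.
      int articleId;
      if (legacyUrl.Contains("Article.aspx") && int.TryParse(id, out articleId))
      {
        var rep = new ArticleRepository();
        var article = rep.GetArticleTitle(articleId);
        foreach (var a in article)
          title = UrlTidy.ToCleanUrl(a.Head);
        newUrl = "Article/" + id + "/" + title;
        response.Status = status;
        response.RedirectLocation = newUrl;
        response.End();
      }
      return null;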
    }

    public override VirtualPathData GetVirtualPath(RequestContext requestContext, 
				RouteValueDictionary values)
    {
      return null;
    }
  }

  
}

First thing to notice - all Route objects must derive from RouteBase. LegacyUrlRoute is no different. Both of the abstract methods of RouteBase are overridden - GetRouteData() (which returns a RouteData object) and GetVirtualPath() (which returns a VirtualPathData object). However, my overridden GetRouteData() method never actually returns a RouteData object; it only ever returns null. That's because any request that matches the condition within the method is answered with a redirect and goes no further through the routing system.

Initially, some variables and one constant are created. The constant is an HTTP status code that informs user agents (browsers and search engine bots) that the resource they are looking for has been moved to another location. It should make no difference to existing links on blogs and forums that human visitors follow, but search engines will hopefully update their indexes, and this is why I need a title: I want the search engines to store the whole link. The other variables reference the current HTTP Request and Response "contexts" (Microsoft really love that word, don't they?), the currently requested URL and a query string value (where it exists).

If the currently requested URL contains the string "Article.aspx", it's a legacy URL. The first thing that happens is that the article title is obtained from the method in the Articles Repository that was introduced earlier. (I suspect that when this goes live, I shall map IDs to titles in an XML file and reference that instead of calling the database.) The title is then tidied up by the helper ToCleanUrl() method, and used to construct a new MVC URL. From there, an HTTP Response is prepared and sent. The status code is provided using the constant, and the new location for future requests is passed in to the RedirectLocation property. Response.End() is called, which prevents any further processing for this particular request, and the response is sent back to the user agent. No RouteData structures were built or referenced, and no HttpHandlers invoked.

If the requested URL does not contain "Article.aspx", null is returned so that the UrlRoutingModule can continue to try to match the URL to other routes within the RouteTable.Routes collection.

One final task remains, and that is to register the LegacyUrlRoute in the application's RouteTable. That's done right at the beginning of the RegisterRoutes method in Global.asax:


public static void RegisterRoutes(RouteCollection routes)
{
  routes.IgnoreRoute("{resource}.axd/{*pathInfo}");
  
  routes.Add(new LegacyUrlRoute());

  // ... the remaining routes (such as "Show") are registered after this point
}

And now, if a request is made to an old URL, such as http://www.mikesdotnetting.com/Article.aspx?ArticleID=100, it is automatically redirected to http://www.mikesdotnetting.com/Article/100/Experimenting-with-jQuery-Draggables-and-ASP.NET with the correct header sent to the user agent.
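One way to confirm that the correct header really is being sent is to request an old URL with automatic redirects switched off and inspect the raw response. The following console sketch assumes the site is reachable at the address used above; the RedirectCheck class is a hypothetical name.

```csharp
using System;
using System.Net;

public static class RedirectCheck // hypothetical helper, for illustration only
{
  public static void Main()
  {
    var request = (HttpWebRequest)WebRequest.Create(
      "http://www.mikesdotnetting.com/Article.aspx?ArticleID=100");

    // Stop the client from following the redirect, so that the
    // 301 response itself can be examined.
    request.AllowAutoRedirect = false;

    using (var response = (HttpWebResponse)request.GetResponse())
    {
      // A permanent redirect should report status code 301
      Console.WriteLine((int)response.StatusCode);

      // The Location header holds the new address for the article
      Console.WriteLine(response.Headers["Location"]);
    }
  }
}
```

A browser's network tools or any HTTP debugging proxy will show the same information if you prefer not to write code for the check.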

Date Posted: Monday, June 8, 2009 7:37 AM
Last Updated: Friday, October 10, 2014 9:10 PM
Posted by: Mikesdotnetting
Total Views to date: 53994

16 Comments

Tuesday, June 9, 2009 11:39 AM - Peter

This is one of the clearest explanations I have found for Routing so far. Thanks very much!

Btw, when does your MVC version Go Live?

Tuesday, June 9, 2009 9:08 PM - Nick Berardi

Wouldn't it be a lot easier to use a URL Rewriter to handle legacy URLs instead of building tests into a route? It seems like a lot of extra work, and it would require a newly compiled application and deployment each time a new route was discovered. Wouldn't a much better solution be a URL Rewriter that used an external config, such as http://urlrewriter.codeplex.com?

Tuesday, June 9, 2009 9:17 PM - Nick Berardi

For example, if you used the URL Rewriter at http://urlrewriter.codeplex.com, and created an extension module for your database lookups, as indicated here http://www.coderjournal.com/2008/12/creating-extension-module-net-url-rewriter-reverse-proxy/ you could accomplish the same thing, without having to try to integrate old legacy URLs into your brand spanking new app. :)

I just like to keep new Apps clean of legacy stuff, and let outside sources handle all the legacy translations.

Tuesday, June 9, 2009 10:33 PM - Mike

@Nick

I don't usually approve comments that promote other people's stuff (you'll notice my site is an advert-free zone...), but in your case I made an exception :o)

Would it be a better solution to use a 3rd party Open Source solution? I'll let users judge that for themselves. If they are allowed to use OSS, of course.

Without having gone into your product (with your company's logo all over it) in any great detail, I wonder how having to build extension modules to said 3rd party products using Regex is easier or cleaner than using System.Web.Routing.

It could be said that just plopping the conditional tests into Application_BeginRequest() is easier than creating a custom Route as demonstrated in the article.

All depends on what floats your boat ;-)

Wednesday, June 10, 2009 1:44 AM - Nick Berardi

I don't usually go around posting my link everywhere either. Also, to go to your point about the logo, the URL Rewriter space is very saturated; even calling it .NET URL Rewriter would bump up against 3 other products. So I just chose to put my company's name on it, which is basically my private consulting business. It is no different than naming it Nick Berardi's Url Rewriter. I make absolutely no money off this software, and I probably spend way too much time on it. :)

But to go more to your point. The URL Rewriter that I posted is based off a common mod_rewrite syntax from Apache. I have just extended the syntax slightly to allow for pluggable modules, developed in .NET using a common interface. The post I referred you to is basically the same thing you are doing taking an old URL, doing a database lookup and then redirecting.

The only difference is that Routing was designed to be an API definition for your application. Not a handler for the manipulation of the URL. I actually get asked this question a lot:

"Now that ASP.NET MVC has routing you don't need a URL Rewriter right?"

And I always have to explain that while routing makes it easier to create nice-looking URLs, it is still designed as an API definition for your application, not a handler for the URL. I have always believed that you should use the right software for the task at hand. And that is why if you need to handle the URL, such as what you are doing above, you need to use a piece of software designed to handle URLs. Whether it is a URL Rewriter or, like you said, Application_BeginRequest (which is basically where my rewriter starts its process).

I probably don't have to explain all the different URL actions you have to check when moving and updating a blog, or even the day-to-day SEO upkeep. Such as making sure everything redirects to www.yourdomain, making sure old URLs don't sneak through, etc. There are all these common examples from [http://www.helicontech.com/isapi_rewrite/doc/examples.htm] that would be tough to implement in straight-up code. That is why a URL Rewriter with its own DSL is so important.

I don't really know where I am going with this. :) But my point boils down to in my experience it is hard to say up front what URL's you are going to need to modify and redirect in a blog until you start seeing stuff come in from Google and other user agents. And it would really stink if you had to recompile your application each time you needed to tweak a URL handler. Plus IMO System.Web.Routing is for API definition not manipulation.

But either way it was a good write up about getting started with writing Routes. :)

Wednesday, June 10, 2009 8:10 AM - Mike

@Nick,

"Nick Berardi's URL Rewriter" has a nice ring to it ;o)

Wednesday, June 10, 2009 9:58 AM - Binary

Another possibility which is a bit simpler than creating a route is to just create a stub article.aspx page and put the redirect code in there.

This can be a bit more flexible if you need to do other stuff in the page before executing the redirect (e.g. if your site has a pagebase that executes code for authentication or logging)

ASP.NET routing only kicks in if the file is not physically there, so this works fine.

Wednesday, June 10, 2009 4:59 PM - Mike

@Binary

Indeed that is a possibility. It's what I had to resort to when managing old classic ASP urls for a site that was converted to ASP.NET some years back - put an old "Article.asp" file up, and put a redirect in it. But if you have a large number of those, it will definitely clutter things up.


Friday, August 21, 2009 4:30 PM - Matt Roberts

This is a well-written article - thanks :)

Tuesday, September 1, 2009 7:58 AM - Asif Ashraf

The article is well written, and I have read it all the way through to the end. But the problem I was trying to address is still not resolved. It's not that I am getting some older .aspx links, but it's about Routing for sure.
Perhaps you can help me with it.
There is a website http://radmade.com which is just using the basic ASP.NET MVC Template. There is nothing customized; it was a completely raw template and I deployed it there.

The routing problem I am facing is that the Home page comes up fine. And the global file is telling me that this is Views/Home/Index.aspx hitting the Index() controller action fine.
But if you click on the About link it will show a 404 error. The Index() controller action was being hit for sure, but not About().

The big problem is that when I enter http://radmade.com the Home/Index view shows okay. But if I enter http://radmade.com/Home or http://radmade.com/Home/Index it will give a 404 error.

What can be wrong there?

Wednesday, September 2, 2009 7:23 AM - Mike

@Asif

You will get a much quicker answer to your question if you post it to the forums at www.asp.net. As part of your question, you should also provide the routing code that you are trying to use. Otherwise all anyone can do is guess.

Tuesday, January 25, 2011 3:16 PM - Tom Teman

Thank you very much! I rewrote a website in ASP.NET MVC and this was exactly what I needed!

However, I followed a tip from another stackoverflow to also add this line when creating the response:

response.AppendHeader("Location", "/" + newUrl + "/");

otherwise, the redirection goes into a loop. I think it is also more SEO friendly that way.



In addition, in regards to adding the route:

routes.Add(new LegacyUrlRoute());

very important that it is placed right after:

routes.IgnoreRoute("{resource}.axd/{*pathInfo}");

and not before! (also causes a loop)

Tuesday, August 14, 2012 7:07 AM - Sagar

Hi,

How are we to do a redirect to the site home page? Presently, when I try to redirect to the site home using the following syntax,

response.Redirect(www.domain.com);
response.End();

I end up having the following url in the browser,

www.domain.com/www.domain.com,

Thanks.

Tuesday, August 14, 2012 7:51 AM - Mike

@Sagar,

Prefix your url with "http://"

Response.Redirect("http://www.domain.com");

Tuesday, November 26, 2013 9:50 PM - Adam Tal

Amazing work & well written..

Going to use your technique in my next project which has the same url legacy problem..

After going through the comments, I'm still sure your solution is the best (better than Nick's and much better than Binary's).

One thing that I'm missing is some detail about search engines (especially Google).
You've stated that this is SEO friendly, which makes me guess that when a search engine stumbles on the old URL and receives a "301 Moved Permanently" with a redirect response, it updates the information it has about that page (updates the URL) and keeps the rating of the page and the site.

Is that correct? Do you think there's a chance such a migration will hurt the site's page rank? And if such a thing is possible, wouldn't it be better to map the legacy URL to the relevant action and give the new response without the redirect?

Saturday, December 7, 2013 8:48 PM - Mike

@Adam,

I'm sorry - I'm no SEO expert. I don't know the answers to your questions.