Create a Google Site Map with ASP.NET

Over Christmas, I had a bit of spare time, so I decided to add some functionality to this site. One of the things that I added was the ability for people to post comments about specific articles, and you will see that a form has been added at the end of this article. The other thing I have been meaning to get round to is to add a Google Site Map. I believe I have created my site to be as friendly to Google's bots as possible, but there is no harm in submitting a site map to guarantee that Google knows about all the pages I want indexed.

Google publish extremely helpful guidelines for site owners (are they still called Web Masters??), including details on the specification of the sitemap format that they are happy to accept. The current standard is SiteMap Protocol 0.9, and is the definition of a fairly simple xml file. One way to create ths is manually, but doing so means that I would have to update it every time I added an article, or even deleted one. So, much like my current RSS feed, I decided to generate the sitemap dynamically on demand.

The xml format is simply this:

 

<?xml version="1.0" encoding="UTF-8"?>

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">

  <url>

    <loc>http://www.mikesdotnetting.com/</loc>

    <lastmod>2009-01-01</lastmod>

    <changefreq>monthly</changefreq>

    <priority>0.5</priority>

  </url>

  <url>

    <loc>http://www.mikesdotnetting.com/Contact.aspx</loc>

    <priority>0.3</priority>

  </url>

</urlset>

 

For each <url> tag, there must be a <loc> tag pointing to the URL of the page. <lastmod>, <changefreq> and <priority> are optional. More details can be found at the link above for the specification.

I have four distinct sections that I want to include: the home page, the articles, the list of articles by type and the list of articles by category. I'm not bothered about Google indexing the About Me or Contact pages. Very few people are that interested in finding out how boring I am and subsequently wanting to get in touch. The Home page is always the same URL, so most of the <url> section could be hardcoded for that, but I need to get the creation date of the most recent article for the <lastmod> tag. I also need to get the IDs and the last modified date for each of the articles for section 2, and the list of IDs for the Article Types and the Categories for the remaining sections. So the stored procedure is simple, and contains just 4 SELECT statements, one after the other. Rather than post bits of the code as I usually do, here it is all in one go, with some explanation at the end as to how it works:

using System;
using System.Configuration;
using System.Data;
using System.Data.SqlClient;
using System.Text;
using System.Web;
using System.Xml;

namespace Mikesdotnetting
{
    
    public class Sitemap : IHttpHandler
    {

        public void ProcessRequest(HttpContext context) {
            context.Response.ContentType = "text/xml";
            using (XmlTextWriter writer = new XmlTextWriter(context.Response.OutputStream, Encoding.UTF8)) {
                writer.WriteStartDocument();
                writer.WriteStartElement("urlset");
                writer.WriteAttributeString("xmlns", "http://www.sitemaps.org/schemas/sitemap/0.9");
                writer.WriteStartElement("url");

                string connect = ConfigurationManager.ConnectionStrings["MyConnString"].ConnectionString;
                string url = "http://www.mikesdotnetting.com/";
                using (SqlConnection conn = new SqlConnection(connect)) {
                    using (SqlCommand cmd = new SqlCommand("GetSiteMapContent", conn)) {
                        cmd.CommandType = CommandType.StoredProcedure;
                        conn.Open();
                        using (SqlDataReader rdr = cmd.ExecuteReader()) {
                            // Get the date of the most recent article
                            rdr.Read();
                            writer.WriteElementString("loc", string.Format("{0}Default.aspx", url));
                            writer.WriteElementString("lastmod", string.Format("{0:yyyy-MM-dd}", rdr[0]));
                            writer.WriteElementString("changefreq", "weekly");
                            writer.WriteElementString("priority", "1.0");
                            writer.WriteEndElement();
                            // Move to the Article IDs
                            rdr.NextResult();
                            while (rdr.Read()) {
                                writer.WriteStartElement("url");
                                writer.WriteElementString("loc", string.Format("{0}Article.aspx?ArticleID={1}", url,  rdr[0]));

                                if (rdr[1] != DBNull.Value)
                                    writer.WriteElementString("lastmod", string.Format("{0:yyyy-MM-dd}", rdr[1]));
                                writer.WriteElementString("changefreq", "monthly");
                                writer.WriteElementString("priority", "0.5");
                                writer.WriteEndElement();
                            }
                            // Move to the Article Type IDs
                            rdr.NextResult();
                            while (rdr.Read()) {
                                writer.WriteStartElement("url");
                                writer.WriteElementString("loc", string.Format("{0}ArticleTypes.aspx?Type={1}", url, rdr[0]));
                                writer.WriteElementString("priority", "0.5");
                                writer.WriteEndElement();
                            }
                            // Finally move to the Category IDs
                            rdr.NextResult();
                            while (rdr.Read()) {
                                writer.WriteStartElement("url");
                                writer.WriteElementString("loc", string.Format("{0}Category.aspx?Category={1}", url, rdr[0]));
                                writer.WriteElementString("priority", "0.5");
                                writer.WriteEndElement();
                            }
                            writer.WriteEndElement();
                            writer.WriteEndDocument();
                            writer.Flush();
                        }
                        context.Response.End();
                    }
                }
            }
        }

        public bool IsReusable {
            get {
                return false;
            }
        }
    }
}

The code above is placedin the ProcessRequest method of a Generic Handler (ashx file). This is the most efficient way of outputting formatted text which is not HTML-based to the browser in ASP.NET Web Forms.

Then it's just a question of iterating the different ResultSets within the SqlDataReader object. The result of the first SELECT statement gives me the date that the most recent story was added. This tells me when the Home page was last modified. The String.Format() method is used to ensure that the format of the date complies with one of the two options that the Sitemap Protocol will accept as valid. I set the change frequency to weekly, as that's about how often I hope to add articles. Then I gave the page the priority of 1.0. That means nothing in terms of page ranking or search engine results. It simply tells Google that I think the home page is the most important page in relation to all others on my site.

Then I use the NextResult() method of the SqlDataReader. This moves things on to the results of the second SELECT statement, which has the AricleID and the DateModified in it. If there is a DateModified, I add the optional <lastmod> tag, and I give all the pages the same priority. It occurred to me that, since I keep a count of the article views in the database, I could have weighted each article based on that count, and applied a different priority to each of them. However, that can keep to another time, perhaps.

The final two calls to NextResult() are pretty self-explanatory.

I then logged into my Google account, and submitted the URL (http://www.mikesdotnetting.com/SiteMap.ashx) for my site map, having tested that it validates against the protocol.