RSS Newsfeed Workbench

Robyn and Phil decide to build an RSS newsfeed in TSQL, using the power of SQL Server's XML.

Have you ever fancied being able to put Simple-Talk’s content into a database? We haven’t either, but we’ll do it just for practice. The objective of this workbench is to show just how easy it is to implement an RSS-based datafeed in SQL Server 2005. The purpose is to try out downloading an XML file from the internet, checking it for validity, and if not, seeing if it is a redirect. We’ll show a cool SQL Server 2005 way of reading a text file into a VARCHAR(MAX) variable. We’ll do some shredding of an XML file too. By the time we’ve finished we hope we will have illustrated some useful techniques and given you an application that you can turn into a production-quality newsreader.

It is very easy to read RSS feeds and OPML feeds into SQL Server 2005. This is because the files are XML and are easily eaten by the XML functions. RSS, which doesn’t actually stand for anything other than RDF Site Summary, is a simple way of publishing, or broadcasting, frequently updated Web content such as articles and blog entries. OPML, originally conceived for transferring outlines, is a format that has been used widely for distributing collections of RSS feeds. However, it has a large number of uses in transferring simple structured list-based information. As it is used for exchanging lists of RSS feeds, we’ll show how to read in an OPML list of feeds.

Please remember before we start that this code is for illustration and is not ‘production quality’. You will need to sort out the security issues (see the process delegation workbench) and add logging and more error-checking.

We won’t attempt to store all the information that RSS2 defines, simply because we can’t find any sites that use all this information. In fact, most feeds use only a small part of the information and few aggregators seem to display the extra information anyway. We’ll boil it down to two relational tables.

Firstly, create a new database called RSSFeedWorkbench. Make sure you have the rights to execute xp_cmdshell too!

The RSSFeed table contains every RSS feed that we want to read from. It is used to poll around the feeds. This routine checks to see whether the TYPE field is set to RSS (or RSS2 etc). If it isn’t then it isn’t read.

Although there are more portable ways of getting the contents of the URL, we’ll use CURL.EXE, which is free, and very easy to install, and useful for a wide range of purposes. You’ll need to download and install this to get the Workbench to run. You’ll need to install the OpenSSL package too from here .

Some RSS feeds will try to stop non-browser software from reading an RSS feed. Curl has useful facilities that allow it to mimic a browser to the extent of allowing cookies and letting you specify the user agent.

This is the stored procedure that grabs the contents of the feed, checks to see if is one we know about and, if not, saves its attributes. then it saves any items it has not already saved. As it saves the url of the feed, you can use it to add the feed to your repository, and then it will automatically refresh it (if you install the spUpdateAllFeeds stored procedure in the right place. This stored procedure is more complex than it might be because one can try to access a feed only to find it is a redirection to the actual feed. If this is the case then one has to read the file as an HTML file, get the anchor, and use the contents of the HREF as the source of the newsfeed, before trying again!

In the list of files to download will be some sample OMPL files to try out. Now when you’ve loaded in a few RSS feeds, all you have to do is to put this stored procedure on the scheduler and this will keep your News up-to-date. Instant Blogrolls! The first person to write an OMPL exporter for this will get a Simple-Talk goodie bag!