{"id":5211,"date":"2013-03-18T19:12:46","date_gmt":"2013-03-18T19:12:46","guid":{"rendered":"https:\/\/test.simple-talk.com\/uncategorized\/a-powershell-rss-reader-using-an-opml-file\/"},"modified":"2017-11-01T15:38:53","modified_gmt":"2017-11-01T15:38:53","slug":"a-powershell-rss-reader-using-an-opml-file","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/blogs\/a-powershell-rss-reader-using-an-opml-file\/","title":{"rendered":"A PowerShell RSS Reader using an OPML file"},"content":{"rendered":"<p>To celebrate the announcement of the planned demise of Google Reader, I&#8217;ve written a PowerShell script that gives you the items from the OPML collection of feeds that you import into, or export from, your feed readers. Basically, you create your own primitive feed reader. I&#8217;m afraid it isn&#8217;t as good as Google Reader.<\/p>\n<p>So what is involved? RSS\/Atom is a rather loose definition: in practice, about all you can count on a feed item having is a link and some content. The spec has been liberally interpreted too, so there isn&#8217;t much you can guarantee to find in every RSS file&#8230;<\/p>\n<p>Reading a well-constructed RSS feed is trivial: in PowerShell v3, it is a one-liner with Invoke-RestMethod. The problem is resilience. Getting every feed to work is a struggle, and so I apologise for giving up at a certain point.<\/p>\n<p>Because I can throw a list of feed URLs at this routine instead of an OPML file, or use it in a function with several OPML files, I use this type of PowerShell routine for specific tasks such as checking whether particular groups of sites have posted anything new. It is very easy to set up an alert when a particular site publishes a post.<\/p>\n<p>I&#8217;ve added code to the script to strip all the HTML tags from the description and show just the first five hundred characters. 
I&#8217;ve limited it to the first hundred feeds, just for testing, and to report only the articles published in the last day. You&#8217;ll want to change all that, I expect.<\/p>\n<p>You&#8217;ll need to fill in the path to your OPML file (basically an XML list of links) and the number of days back you want to read items from, and either change or delete the &#8216;Select -first 100 |&#8217; bit, which just takes the first hundred feeds. You&#8217;ll also want to change the (truncate ($_.xxx -replace &#8220;&lt;.*?&gt;&#8221;) 500) expressions, which take out all the HTML tags and truncate the text to 500 characters or fewer, to suit your tastes.<\/p>\n<p>At the end of the pipeline you can, of course, save the results to a database or file, send them as an email, or format them into an HTML file, but there is no sense in my adding all that stuff because you know how to do it already!<\/p>\n<pre class=\"theme:powershell-ise lang:ps decode:true\">$MyOPMLFile = '.\\AllMyFeeds.opml' #change this to the name of your OPML file\r\n$RestError = [xml]'&lt;broken&gt;&lt;\/broken&gt;' #a dummy document that stands in for any feed that fails\r\n$DaysBack = [int]-1 #the number of days back you want articles from\r\n\r\nfunction truncate([string]$value, [int]$MaxLength)\r\n{#can you believe there is no PowerShell built-in way of doing this?\r\n    if ($value.Length -gt $MaxLength) { $value.Substring(0, $MaxLength) }\r\n    else { $value }\r\n}\r\n\r\n[xml]$opml = Get-Content $MyOPMLFile # grab the OPML file of feeds\r\n$opml.opml.body.outline.outline.xmlurl | Select -First 100 | # only the first few, for testing\r\n foreach {try {Invoke-RestMethod $_} catch {$RestError}} | # substitute the dummy document if an error happened\r\n  where {try {$null -ne $_.SelectSingleNode('link')} catch {$false}} | #filter out 404s, malformed items and bad links\r\n\r\n &lt;# Each &lt;item&gt; within a feed represents an article. 
The RSS 2.0 spec makes every element of an &lt;item&gt; optional (provided there is a &lt;title&gt; or a &lt;description&gt;), so this script insists only on:\r\n    &lt;link&gt;: The canonical URL for the article.\r\nBut you are also likely to find...\r\n    &lt;content:encoded&gt;: The full HTML content of the article.\r\n    &lt;title&gt;: The article's headline. If it isn't there, you'd need to find it in the content.\r\n    &lt;pubDate&gt;: The date of the article's publication, in RFC 822 format.\r\n    &lt;description&gt;: A short summary or abstract of the article.\r\n    &lt;dc:creator&gt;: Name of the person who wrote the article.\r\n    &lt;media:content&gt; and &lt;media:group&gt;: URLs and metadata for image, video, and audio assets.\r\n#&gt;\r\n   Select @{name=\"Title\"; Expression = {try {$_.title} catch {'Unknown title'}}},\r\n     @{name=\"Description\"; # not mandatory, so fall back to the encoded content if it is missing\r\n       Expression = {try {if ($_.SelectSingleNode('description') -eq $null)\r\n                 { truncate ($_.encoded.'#cdata-section' -replace \"&lt;.*?&gt;\") 500 }\r\n               elseif ($_.description.ToString() -eq 'description')\r\n                 { truncate ($_.description.'#cdata-section' -replace \"&lt;.*?&gt;\") 500 
}\r\n               else { truncate ($_.description -replace \"&lt;.*?&gt;\") 500 }}\r\n             catch {'error'}}},\r\n     @{name=\"PubDate\"; Expression = {try {Get-Date ($_.PubDate -replace \"UT\")} # force it into a PS date\r\n             catch {Get-Date '01 January 2006 00:00:00'}}},\r\n     @{name=\"author\"; Expression = {try {if ($_.author.length -eq 0) {$_.creator}\r\n             else {$_.author}}\r\n             catch {'Unknown Author'}}},\r\n     link | #we already checked for a link!\r\n  where-object {$_.Pubdate -gt (Get-Date).AddDays($DaysBack)}\r\n  # keep only the fresh news, from the period set by $DaysBack\r\n<\/pre>\n<p>If you 
don&#8217;t already have an OPML file to practise on, here is one I&#8217;ve put together that gives you exciting articles and blogs from Simple-Talk. Just save it to a file, extend it with your favourite blogs and sites, and you&#8217;ll soon be wondering why you ever felt that Google Reader was essential! Of course, you can still use the routine above with a simple list of RSS feed URLs, but then you wouldn&#8217;t have something that could be stitched into your feed reader&#8217;s OPML file.<\/p>\n<pre class=\"theme:vs2012 lang:xhtml decode:true \">&lt;?xml version=\"1.0\" encoding=\"ISO-8859-1\"?&gt;\r\n     &lt;opml version=\"1.1\"&gt;\r\n        &lt;head&gt;\r\n            &lt;title&gt;SimpleTalk Subscriptions&lt;\/title&gt;\r\n            &lt;dateModified&gt;Wed, 20 Mar 2013 07:21:56 GMT&lt;\/dateModified&gt;\r\n        &lt;\/head&gt;\r\n        &lt;body&gt;\r\n            &lt;outline text=\"simple-talk\"&gt;\r\n                &lt;outline text=\"Home Page\" title=\"Simple Talk Home Page\" type=\"rss\" xmlUrl=\"https:\/\/www.simple-talk.com\/feed\/\" htmlUrl=\"https:\/\/www.simple-talk.com\/\"\/&gt;\r\n                &lt;outline text=\"SQL Articles\" title=\"SQL Home\" type=\"rss\" xmlUrl=\"https:\/\/www.simple-talk.com\/sql\/rss.aspx\" htmlUrl=\"https:\/\/www.simple-talk.com\/sql\/\"\/&gt;\r\n                &lt;outline text=\".NET Articles\" title=\".NET Articles\" type=\"rss\" xmlUrl=\"https:\/\/www.simple-talk.com\/dotnet\/rss.aspx\" htmlUrl=\"https:\/\/www.simple-talk.com\/dotnet\/\"\/&gt;\r\n                &lt;outline text=\"SysAdmin Articles\" title=\"SysAdmin Articles\" type=\"rss\" xmlUrl=\"https:\/\/www.simple-talk.com\/sysadmin\/rss.aspx\" htmlUrl=\"https:\/\/www.simple-talk.com\/sysadmin\/\"\/&gt;\r\n                &lt;outline text=\"Opinion and Geeks\" title=\"Opinion and Geeks\" type=\"rss\" xmlUrl=\"https:\/\/www.simple-talk.com\/opinion\/rss.aspx\" htmlUrl=\"https:\/\/www.simple-talk.com\/opinion\/\"\/&gt;\r\n                
&lt;outline text=\"Books and Book Reviews\" title=\"Books and Book Reviews\" type=\"rss\" xmlUrl=\"https:\/\/www.simple-talk.com\/books\/rss.aspx\" htmlUrl=\"https:\/\/www.simple-talk.com\/books\/\"\/&gt;\r\n                &lt;outline text=\"Cloud\" title=\"Cloud\" type=\"rss\" xmlUrl=\"https:\/\/www.simple-talk.com\/cloud\/rss.aspx\" htmlUrl=\"https:\/\/www.simple-talk.com\/cloud\/\"\/&gt;\r\n                &lt;outline text=\"Blogs\" title=\"Blogs\" type=\"rss\" xmlUrl=\"https:\/\/www.simple-talk.com\/blogs\/feed\/\" htmlUrl=\"https:\/\/www.simple-talk.com\/blogs\/\"\/&gt;\r\n            &lt;\/outline&gt;\r\n            &lt;outline text=\"SQL Server Central\"&gt;\r\n                &lt;outline title=\"Main Articles\" text=\"www.sqlservercentral.com\/Xml\/Rss\/articles\" type=\"rss\" xmlUrl=\"http:\/\/www.sqlservercentral.com\/Xml\/Rss\/articles\"\/&gt;\r\n                &lt;outline title=\"SQL Server Central Blogs\" text=\"www.sqlservercentral.com\/blogs\/feed\/\" type=\"rss\" xmlUrl=\"http:\/\/www.sqlservercentral.com\/blogs\/feed\/\"\/&gt;\r\n                &lt;outline title=\"Ask Sqlservercentral Questions\" text=\"ask.sqlservercentral.com\/feed\/questions.rss\" type=\"rss\" xmlUrl=\"http:\/\/ask.sqlservercentral.com\/feed\/questions.rss\"\/&gt;\r\n            &lt;\/outline&gt;\r\n        &lt;\/body&gt;\r\n     &lt;\/opml&gt;\r\n    <\/pre>\n<h2>References<\/h2>\n<ul>\n<li><a href=\"http:\/\/cyber.law.harvard.edu\/rss\/rss.html\">RSS Spec<\/a><\/li>\n<li><a href=\"http:\/\/www.rssboard.org\/media-rss\">Media RSS Spec<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>To celebrate the announcement of the planned demise of Google Reader, I&#8217;ve written a PowerShell script that gives you the items from the OPML collection of feeds that you import into, or export from, your feed readers. Basically, you create your own primitive feed reader. I&#8217;m afraid it isn&#8217;t as good as Google Reader. 
So what&#8230;&hellip;<\/p>\n","protected":false},"author":154613,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[2],"tags":[],"coauthors":[6813],"class_list":["post-5211","post","type-post","status-publish","format-standard","hentry","category-blogs"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/5211","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/154613"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=5211"}],"version-history":[{"count":13,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/5211\/revisions"}],"predecessor-version":[{"id":75709,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/5211\/revisions\/75709"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=5211"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=5211"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=5211"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=5211"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}