John Kettley is a weatherman…
It’s often been said that the easiest and most accurate way to get a weather report is simply to look out of the window. While that gives you an accurate reading of the weather at that exact moment, it does little to warn you of the need for shorts or a brolly in the coming days. The internet makes sourcing multiple forecasts easy, and this script demonstrates how it can be done. Note that the script is concerned with sourcing forecast commentaries (the sort of blurb one gets at the end of the news report) rather than quantitative forecasting data.
I originally wrote a version of this script to email a weather forecast to myself before heading off for a morning run; one of the benefits of Python is the ease with which code can be repurposed. The code I present here queries two sources, MetGroup and the BBC. There was originally a third, the Met Office, but it is in the process of migrating to a new website, so the code to source forecasts from it would soon be outdated. There are, of course, numerous other weather sources that could be queried using similar approaches. Should you write such code, I’d be delighted to see a copy; perhaps we could build a ‘WeatherPy’ library.
…and so is Michael Fish
On to the code, however. As I mentioned, this code originally just scraped the forecast and emailed it to me via a scheduled job. I realised, however, that it might also be useful to store the comments in a database, with one eye on analysing their accuracy at some later date. Over time I set the emailing code aside (although I would be happy to demonstrate how it could be reinstated), leaving the database and sourcing elements as presented below; the database elements are commented out for ease of demonstration. Additionally, I’ve localised the forecasts to a general UK view, but feel free to add your own sources as you see fit.
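The database code itself isn’t shown, so here is a minimal sketch of storing each commentary for later accuracy analysis. It swaps PyODBC for Python 3’s built-in `sqlite3` module so it runs without a database server; the `forecasts` table, its columns and the `store_forecast` helper are all hypothetical.

```python
import datetime
import sqlite3


def store_forecast(conn, source, text, day=None):
    """Insert one forecast commentary into a hypothetical 'forecasts' table."""
    day = day or datetime.date.today().isoformat()
    conn.execute(
        "CREATE TABLE IF NOT EXISTS forecasts ("
        " day TEXT, source TEXT, commentary TEXT)"
    )
    # Parameterised query: the scraped text is never spliced into the SQL string
    conn.execute(
        "INSERT INTO forecasts (day, source, commentary) VALUES (?, ?, ?)",
        (day, source, text),
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # pass a file path instead to keep a history
    store_forecast(conn, "BBC", "Sunny spells, scattered showers later.", "2010-06-01")
    store_forecast(conn, "MetGroup", "Cloudy with light rain.", "2010-06-01")
    for row in conn.execute("SELECT day, source, commentary FROM forecasts ORDER BY source"):
        print(row)
```

The same parameterised-query pattern carries over to PyODBC, which also uses `?` placeholders.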
The code starts by importing a fairly standard set of libraries (os, urllib2, re, sys, datetime, threading and time) plus two old friends in the shape of BeautifulSoup and PyODBC. If you’ve no intention of interfacing with a database, then the latter need not be included, and the time and threading imports could be trimmed as well.
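Rather than deleting the PyODBC import outright, one option is to treat it as an optional dependency. A minimal sketch, assuming you want the script to fall back to text-file output when the module is missing; `database_enabled` is a hypothetical helper.

```python
# Optional dependency: only needed if forecasts are written to a database
try:
    import pyodbc
except ImportError:
    pyodbc = None


def database_enabled():
    """Return True when pyodbc is installed and database writes are possible."""
    return pyodbc is not None


if __name__ == "__main__":
    if database_enabled():
        print("pyodbc found: database writes available")
    else:
        print("pyodbc not installed: writing to text files only")
```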
From there, it starts the BBC import on one thread and the MetGroup import on another. Each thread connects to its website, passes the page to BeautifulSoup and then runs regular expressions against the extract. You’ll also see lists of strings whose job is to clean up the extract: there are elements we don’t need in the returned text, and replacing them with empty strings is one easy way to remove them. If there are additional elements you’d like to remove, add them to these lists. Finally, the code writes the extract out to a text file, but you can uncomment the database code and update it with your own connection details should you wish to.
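The listing as reproduced here fetches the two sources one after the other. As a hedged Python 3 sketch of the one-thread-per-source arrangement described above, using the standard `threading` module: `run_in_threads`, `fetch_bbc` and `fetch_metgroup` are hypothetical names, and the fetchers return canned text instead of hitting the network.

```python
import threading


def run_in_threads(fetchers):
    """Run each fetcher on its own thread and return {name: result}."""
    results = {}
    lock = threading.Lock()

    def worker(name, fetch):
        text = fetch()
        with lock:  # guard the shared dict while threads write to it
            results[name] = text

    threads = [threading.Thread(target=worker, args=(name, fetch))
               for name, fetch in fetchers.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every source before using the results
    return results


# Stand-ins for the real scrapers so the sketch runs offline
def fetch_bbc():
    return "Today: Sunny spells."


def fetch_metgroup():
    return "Tonight: Clear skies."


if __name__ == "__main__":
    print(run_in_threads({"BBC": fetch_bbc, "MetGroup": fetch_metgroup}))
```

Because each source is network-bound, running them on separate threads lets the slower site’s wait overlap with the faster one’s.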
Code for Weather downloading
```python
# Python 2 listing: uses BeautifulSoup 3 and urllib2
from BeautifulSoup import BeautifulSoup
import os, urllib2, re, sys, datetime, threading, time

# --- MetGroup forecast ---
rawMetGroup = urllib2.urlopen("http://www.weathercast.co.uk/united-kingdom/united-kingdom-weather.html").read()
soup = BeautifulSoup(rawMetGroup)
MetGroup = soup.findAll('p')
strMetGroup = str(MetGroup)

# Regular expressions for the period headings we don't want in the output
reStrings = [r'T\w+\s\(\w+\)\:\s+', r'Tonight\s+\(\w+\s+\w+\)\:\s+', r'UK Outlook\s+\(\w+\s+\w+\s+\w+\)\:\s+']
# Literal HTML fragments to strip
replaceStrings = ['<p>', '<br /><br />', '</p>']
for items in reStrings:
    strMetGroup = re.sub(items, "", strMetGroup)
for items in replaceStrings:
    strMetGroup = strMetGroup.replace(items, "")

# --- BBC forecast ---
rawBBCWeather = urllib2.urlopen("http://news.bbc.co.uk/weather/").read()
soup = BeautifulSoup(rawBBCWeather)
# The line that populated BBC is missing from the listing; findAll('p') is an assumption
BBC = soup.findAll('p')
strBBC = str(BBC)
replaceStrings = ['<p class="today">Today: ', '</p>', '<p lang="en-GB">']
for items in replaceStrings:
    strBBC = strBBC.replace(items, "")
strBBC = strBBC.strip()

if __name__ == "__main__":
    format = "%Y-%m-%d"
    # ... remainder of the listing (writing the output file) not shown
```
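The listing stops just after setting up the `%Y-%m-%d` date format. As a hedged Python 3 sketch of the clean-up and text-file steps described earlier, the following bundles the regex-then-literal clean-up into one function and writes a date-stamped file. `clean_extract` and `write_forecast` are hypothetical helpers, and the `<source>_<date>.txt` naming scheme is an assumption; only the date format comes from the listing.

```python
import datetime
import os
import re
import tempfile


def clean_extract(text, patterns, literals):
    """Strip regex matches, then literal fragments, from a scraped extract."""
    for pattern in patterns:
        text = re.sub(pattern, "", text)
    for fragment in literals:
        text = text.replace(fragment, "")
    return text.strip()


def write_forecast(directory, source, text, day=None):
    """Write text to <directory>/<source>_<YYYY-MM-DD>.txt and return the path."""
    day = day or datetime.date.today()
    filename = "%s_%s.txt" % (source, day.strftime("%Y-%m-%d"))
    path = os.path.join(directory, filename)
    with open(path, "w", encoding="utf-8") as f:
        f.write(text + "\n")
    return path


if __name__ == "__main__":
    raw = "<p>Tonight (Monday night): Clear spells.</p>"
    cleaned = clean_extract(raw, [r"Tonight\s+\(\w+\s+\w+\):\s+"], ["<p>", "</p>"])
    print(cleaned)  # Clear spells.
    out_dir = tempfile.mkdtemp()  # temporary directory so the demo leaves no clutter
    print(write_forecast(out_dir, "MetGroup", cleaned))
```

Keeping the patterns and literal fragments as arguments means adding a new unwanted element is just a matter of extending a list, as the original script intends.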