Prices permeate the web

The BMRS website provides near real-time and historic data about the Balancing Mechanism, which is used by National Grid as a means of balancing power flows around Great Britain.

There's an awful lot of useful information on there but Elexon – operators of the website – make it a little tricky to access the data. Whilst there is an API available, it's not the easiest for novice users and doesn't interoperate with common desktop software.

It's perfectly possible to navigate to one of the pages on the website and manually copy the data into our desktop software, but why do the hard work when we can employ a little code to do it for us?

CSV to ODBC – A change in acronym but just as easy

In a previous post I'd highlighted how it was possible to use Python, BeautifulSoup and a little bit of code to extract oil prices from the web and save them to our computer. In that example we were saving them as TXT/CSV, which is great if we want to store the data in individual files (or append to one growing master file), but storing them in a database offers far more flexibility. For example, we may wish to do some statistical analysis, to use the extract in another process, or simply not want the hassle of dealing with many CSV files.

As with any modern language, Python has the ability, through its libraries, to talk to almost any database system you're likely to encounter. I'm writing this code for deployment to a Windows server running SQL Server 2005 and have found the pyodbc library to be perfect for this task. Obviously, if you're using a different OS or database backend you'll need to adjust your library and connection string accordingly.
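
To give a flavour of what that looks like, here's a minimal sketch of connecting with pyodbc. The first form uses a pre-configured DSN (the approach taken later in this post); the second is a DSN-less string for SQL Server. The DSN, server and database names are placeholders you'd swap for your own.

[sourcecode language="python"]
import pyodbc

# Option 1: connect via a pre-configured ODBC DSN (as used later in this post)
cnxn = pyodbc.connect('DSN=DSN_NAME;UID=<userID>;PWD=<userPWD>')

# Option 2: a DSN-less connection string for SQL Server; the server and
# database names here are placeholders
cnxn = pyodbc.connect('DRIVER={SQL Server};SERVER=my_server;DATABASE=my_database;UID=<userID>;PWD=<userPWD>')
[/sourcecode]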

In the previous example we passed the website data through a regular expression in order to isolate the data we wanted, before saving it as text/CSV. Saving this information to a database simply involves swapping out the text-writing elements with database-writing ones. There are three main steps we need to tackle to achieve this: connect to the data source, formulate our SQL INSERT, and then commit this to the database.
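
As a rough sketch, those three steps look like this in pyodbc (the table and column names match those used later in this post, and the sample values are illustrative, taken from the record shown further down):

[sourcecode language="python"]
import pyodbc

# Step 1: connect to the data source (placeholder DSN and credentials)
cnxn = pyodbc.connect('DSN=DSN_NAME;UID=<userID>;PWD=<userPWD>')
cursor = cnxn.cursor()

# Step 2: formulate and execute our SQL INSERT
cursor.execute("INSERT INTO my_schema_name.dbo.my_table_name (ReportDate,Period,SSP,SBP) VALUES (?,?,?,?)",
               '2012-04-25', 17, 49.2, 101.99753)

# Step 3: commit the change to the database
cnxn.commit()
[/sourcecode]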

First step – Setting up our dates and connecting to the website

For simplicity I am going to assume that we're only interested in extracting the previous day's System Sell and System Buy price from BMRS. Obviously, we could change the URL to access other information or write some extra code to loop through a range of days (a quick sketch of this follows the first code block below), but I'll leave that up to you. Feel free to modify the existing code as you wish but I would ask that you provide a link back to this page. Also, in this example I'm not expressly setting out to write the most efficient code, but rather the most readable.

Let's kick things off by creating variables for our dates – we'll need two – one in YYYY-MM-DD format and one in YYYYMMDD. We'll also need to tell our program the URL for the data and, while we're at it, we might as well embed a date variable in there so that the correct date will be requested each day.

[sourcecode language="python"]
# Yesterday's date in the two formats we need
longdate = (datetime.datetime.today() - datetime.timedelta(1)).strftime("%Y-%m-%d")
shortdate = (datetime.datetime.today() - datetime.timedelta(1)).strftime("%Y%m%d")

# Fetch the page for that date and parse it
url = urllib2.urlopen("http://www.bmreports.com/servlet/com.logica.neta.bwp_SspSbpServlet?param2=" + longdate + "#")
soup = BeautifulSoup(url)
[/sourcecode]
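
As an aside, if you did want to loop through a range of days as mentioned earlier, a sketch along these lines would do it (the seven-day window is arbitrary):

[sourcecode language="python"]
import datetime

# Walk back over the last seven days, building both date strings for each
for days_back in range(1, 8):
    day = datetime.datetime.today() - datetime.timedelta(days_back)
    longdate = day.strftime("%Y-%m-%d")
    shortdate = day.strftime("%Y%m%d")
    # ...fetch and process that day's page here...
[/sourcecode]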

Next step – Using a regular expression to extract the prices

The data that comes back from the website contains not just the data we want but also an awful lot of extra stuff we don't; regular expressions provide a fantastic way to match patterns in text and then to process these matches. Thus, we can tell the regular expression engine what our data looks like and it will find every instance that matches that pattern. From testing, it was found that the data is contained in a string of the form:

 SSB,20120425,17,49.20000,101.99753,

Our regular expression matches this pattern of letters, numbers and commas and stores the matches in a variable called data_list.

[sourcecode language="python"]
# Capture every "SSB,date,period,price,price" pattern in the page
data_list = re.findall(r'(SSB,[0-9]*,[0-9]*,[0-9]*\.[0-9]*,[0-9]*\.[0-9]*)', str(soup))
[/sourcecode]
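
You can sanity-check the pattern against the sample line above before pointing it at the live page:

[sourcecode language="python"]
import re

# Test the pattern against the sample record shown earlier
sample = "SSB,20120425,17,49.20000,101.99753,"
print(re.findall(r'(SSB,[0-9]*,[0-9]*,[0-9]*\.[0-9]*,[0-9]*\.[0-9]*)', sample))
# ['SSB,20120425,17,49.20000,101.99753']
[/sourcecode]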

Last step – Uploading to our Database

For the next step we need to loop through each of the returned records in our data_list and upload these to our database. Depending upon the database you're connecting to you'll need to modify the exact connection string. I've always found ConnectionStrings.com to be useful for this. If you're using SQL Server, as I am, then simply update the code by replacing 'my_schema_name' and 'my_table_name' with the appropriate names. As an aside, my original table is configured with ReportDate (datetime) and Period, SSP and SBP (float).
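
If you need to create that table from scratch, a minimal sketch using the same placeholder names might look like this:

[sourcecode language="python"]
import pyodbc

# Placeholder DSN and credentials, as used elsewhere in this post
cnxn = pyodbc.connect('DSN=DSN_NAME;UID=<userID>;PWD=<userPWD>')
cursor = cnxn.cursor()

# One-off sketch of a matching table: ReportDate as datetime, the rest float
cursor.execute("""
    CREATE TABLE my_schema_name.dbo.my_table_name (
        ReportDate datetime,
        Period float,
        SSP float,
        SBP float
    )
""")
cnxn.commit()
[/sourcecode]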

One last tweak before we put all the code together is to streamline the records. This is simply a matter of removing the extra guff and swapping each comma for ',' so that every value is individually quoted and the SQL INSERT will work.

[sourcecode language="python"]
for record in data_list:
    # Swap YYYYMMDD for YYYY-MM-DD so SQL Server parses the date cleanly
    record = record.replace(shortdate, longdate)
    # Drop the leading "SSB," marker and quote the remaining values
    record = record.replace("SSB,", "")
    record = record.replace(",", "','")
    sSQL = "INSERT INTO my_schema_name.dbo.my_table_name (ReportDate,Period,SSP,SBP) VALUES ('" + record + "')"
    cursor.execute(sSQL)
cnxn.commit()
[/sourcecode]
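
Incidentally, rather than hand-building the quoted string, pyodbc will happily take a parameterised query, which side-steps the quoting (and any injection worries) altogether. A sketch of the same loop using the names above:

[sourcecode language="python"]
# The same loop as a parameterised query; pyodbc handles the quoting
for record in data_list:
    fields = record.replace("SSB,", "").split(",")  # [date, period, SSP, SBP]
    fields[0] = longdate  # swap YYYYMMDD for YYYY-MM-DD
    cursor.execute("INSERT INTO my_schema_name.dbo.my_table_name (ReportDate,Period,SSP,SBP) VALUES (?,?,?,?)",
                   fields)
cnxn.commit()
[/sourcecode]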

Source Code – Putting it all together

With all the hard lifting out of the way, we're now able to mix each of our separate code elements together and generate a complete program.

[sourcecode language="python"]
from BeautifulSoup import BeautifulSoup
import urllib2, re
import datetime
import pyodbc

def SourceSystemPrices():
    # Fetch and parse yesterday's System Sell/Buy price page
    # (longdate and shortdate are set as globals in the __main__ block below)
    url = urllib2.urlopen("http://www.bmreports.com/servlet/com.logica.neta.bwp_SspSbpServlet?param2=" + longdate + "#")
    soup = BeautifulSoup(url)

    # Pull out every "SSB,date,period,price,price" record
    data_list = re.findall(r'(SSB,[0-9]*,[0-9]*,[0-9]*\.[0-9]*,[0-9]*\.[0-9]*)', str(soup))

    # Connect to the database (replace the DSN and credentials with your own)
    cnxn = pyodbc.connect('DSN=DSN_NAME;UID=<userID>;PWD=<userPWD>')
    cursor = cnxn.cursor()

    for record in data_list:
        record = record.replace(shortdate, longdate)
        record = record.replace("SSB,", "")
        record = record.replace(",", "','")
        sSQL = "INSERT INTO my_schema_name.dbo.my_table_name (ReportDate,Period,SSP,SBP) VALUES ('" + record + "')"
        cursor.execute(sSQL)
    cnxn.commit()

if __name__ == "__main__":
    longdate = (datetime.datetime.today() - datetime.timedelta(1)).strftime("%Y-%m-%d")
    shortdate = (datetime.datetime.today() - datetime.timedelta(1)).strftime("%Y%m%d")
    SourceSystemPrices()
[/sourcecode]

I'd love to hear your thoughts... please feel free to leave a reply!