
Background

I wanted to start building a tool to model the impact of wind speed on the output of the UK’s wind farms. There are three principal components to this: the location of each wind farm, accurate weather data for that location, and a generation function that calculates the farm’s output from its physical and environmental conditions.

The first two elements can be sourced professionally, but I wanted to build this tool from publicly available information as far as possible. This post describes how to source the physical data about each wind farm.

Sourcing all wind farm data

RenewableUK is a great source of renewable power information, and its website holds detailed information about every wind farm. Rather frustratingly, that detail sits behind a members-only link, but thankfully RenewableUK also publishes an up-to-date table on the site that includes all the data we need: longitude, latitude, type and capacity.

As of June 2012, the data is presented in a simple HTML table, so, using techniques similar to those we used with BMRA prices, we can easily write some code to scrape it.
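The scraping idea itself is straightforward: visit every row (`<tr>`) of the table and collect the text of each cell (`<td>`). As a minimal, dependency-free sketch of that idea, the Python 3 standard library's `html.parser` can stand in for BeautifulSoup. This is a swapped-in technique rather than the code used in this post, and the `TableCellParser` class and the sample table (with made-up values) are hypothetical.

```python
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collect the text of every <td> cell, row by row (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []         # one list of cell strings per <tr>
        self._in_cell = False  # True while inside a <td>
        self._cell = []        # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td":
            self._in_cell = True
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._in_cell:
            # Join the fragments; an empty cell becomes ''
            self.rows[-1].append("".join(self._cell).strip())
            self._in_cell = False

    def handle_data(self, data):
        # Ignore whitespace between tags; keep only text inside cells
        if self._in_cell:
            self._cell.append(data)


# Made-up sample table, not the real RenewableUK markup
sample = """
<table>
  <tr><td>Wind Farm A</td><td>55.1</td><td>-3.2</td></tr>
  <tr><td>Wind Farm B</td><td></td><td>-1.9</td></tr>
</table>
"""

parser = TableCellParser()
parser.feed(sample)
print(parser.rows)
```

Note that the empty cell in the second row comes back as `''` rather than causing an error, which matters when a table has blank entries.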

Coding and demonstrating the extraction

The code presented here is a starting point that further code can be added to: we might want to save the information to a separate file (as we did with the oil prices), upload it to a data source, or keep it in memory for further analysis, and each of these functions could easily be added.
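As one example of such an extension, the scraped rows can be saved to a separate file with the standard `csv` module. The sketch below is an assumption of how this might look, not the code used for the oil prices: `save_wind_farms`, the file name `wind_farms.csv` and the sample rows are all hypothetical, and it targets Python 3 (under Python 2.7 the file would be opened in binary mode, `'wb'`, without the `newline` argument).

```python
import csv
import os
import tempfile


def save_wind_farms(rows, path):
    """Write an iterable of row tuples to a CSV file (hypothetical helper)."""
    # newline='' is what the csv docs recommend, so the csv module controls line endings
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerows(rows)


# Made-up rows standing in for the scraped output
rows = [("Wind Farm A", "55.1", "-3.2"), ("Wind Farm B", "", "-1.9")]
path = os.path.join(tempfile.gettempdir(), "wind_farms.csv")
save_wind_farms(rows, path)

# Read the file back to confirm the round trip
with open(path, newline="") as f:
    loaded = [tuple(r) for r in csv.reader(f)]
print(loaded == rows)  # True
```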

I’m using Python 2.7.x and BeautifulSoup (version 3) to handle the HTML parsing; full installation and configuration instructions are available from their respective websites. Note that the source data table has a fixed width of 13 columns, but its number of rows varies as new units are commissioned or decommissioned.
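Because every row has the same 13 columns, the cells can be collected into one flat list and regrouped afterwards, which is what the `pairs` function in the complete code does with slicing and `zip`. One caveat is that `zip` stops at its shortest input, so a cell count that is not a multiple of 13 silently loses the trailing partial row. A minimal Python 3 sketch of the same regrouping, with a hypothetical `group_rows` name and an added sanity check that is not in the original code:

```python
def group_rows(cells, width=13):
    """Split a flat list of table cells into tuples of `width` cells each."""
    if len(cells) % width != 0:
        # A partial row would otherwise be dropped silently by zip
        raise ValueError("%d cells is not a multiple of %d" % (len(cells), width))
    # cells[i::width] picks column i from every row; zip stitches the columns back into rows
    return list(zip(*[cells[i::width] for i in range(width)]))


flat = ["c%d" % n for n in range(26)]  # two made-up 13-column rows
rows = group_rows(flat)
print(len(rows))   # 2
print(rows[1][0])  # c13
```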

Complete Code

The complete code is shown below. You’re welcome to use it as you wish, but please attribute it back to this site.

[sourcecode language="python"]
#!/usr/bin/env python
"""Locates and scrapes Wind Farm data (Python 2.7, BeautifulSoup 3)"""

__author__ = "Patrick Avis"
__email__ = "python@patrickavis.com"

from BeautifulSoup import BeautifulSoup
import urllib2

def pairs(l, n):
    """Regroup the flat list l into consecutive n-tuples, one per table row."""
    return zip(*[l[i::n] for i in range(n)])

def main():
    url = urllib2.urlopen("http://www.bwea.com/ukwed/operational.asp")
    soup = BeautifulSoup(url)
    # The wind farm data is in the third table on the page
    table_extract = soup.findAll('table')[2]
    rows = table_extract.findAll('tr')
    outputwind = []

    # Walk every row once and collect the text of each cell
    for tr in rows:
        for td in tr.findAll('td'):
            # find(text=True) returns None for an empty cell, so fall back to ''
            text = td.find(text=True) or ''
            # Encode to a byte string so non-ASCII names don't raise UnicodeEncodeError
            outputwind.append(text.strip().encode('utf-8'))

    # Drop the first three cells, which precede the wind farm rows
    del outputwind[0:3]

    # The table is 13 columns wide, so regroup the cells into rows
    all_wind_farms = pairs(outputwind, 13)
    for windfarm in all_wind_farms:
        print windfarm

if __name__ == "__main__":
    main()
[/sourcecode]
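Even once the rows are extracted, numeric fields such as latitude, longitude and capacity are still strings, and blank cells arrive as empty strings. A minimal sketch for converting them, assuming (hypothetically) that a blank or non-numeric cell should become `None` and that large numbers may contain comma thousands separators; `to_float` is an illustrative helper, not part of the original script, and the sample row is made up:

```python
def to_float(value):
    """Convert a scraped cell to a float, or None if it is blank or non-numeric."""
    # Treat a missing cell (None) like an empty one, and drop assumed thousands separators
    cleaned = (value or "").strip().replace(",", "")
    try:
        return float(cleaned)
    except ValueError:
        return None


# A made-up row: name, latitude (blank), longitude
row = ("Wind Farm B", "", "-1.9")
print([to_float(cell) for cell in row[1:]])  # [None, -1.9]
print(to_float("1,200"))                     # 1200.0
```

Returning `None` keeps a blank cell from stopping the whole run; rows with missing coordinates can then be filtered out or flagged later.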

3 COMMENTS

  1. Thank you, this is helping me learn Beautiful Soup! Should line 27 read “while i <len(rows):” like in the video? Thanks again for this write-up!

    • You’re quite right – thanks for letting me know! That’s what happens when you upload a post late at night!

      I’ve corrected the code and hopefully that should be fine now.

  2. Thanks for the tutorial, just a quick question. How can the code be modified to check if a cell is empty? I’m getting a TypeError because some of the cells are blank.

    Cheers
