Oil prices with ease
The web is alive with fantastic sources of information on all manner of topics, and many websites provide an RSS feed or API to link into their data. Many sites, however, do not, and we have to resort to more elaborate methods to extract data from them. It would be great if we could get the computer to automatically query the web for the latest prices and return them to us. In this example, we are going to pull in the latest Brent Crude price from Bloomberg, the Financial Times and the BBC, work out the average price and save the figures off into text files before displaying them.
So… what's the idea?
One of the many great things about Python is how easy it is to throw together scripts for relatively complex tasks which, in other languages, would take much more code. The standard library in Python is rich, and there's an extensive set of additional libraries we can plug into. In this task, urllib2 will be used to connect to the websites and read the underlying HTML, Beautiful Soup will help process that information, and a regular expression or two will finish things off.
Beautiful Soup makes it all so easy
Python has an inbuilt HTML parser, but we can do an awful lot more using the rather fantastic Beautiful Soup, which makes pulling and processing data from websites really simple: pass in a page and it will pull out all the links, find selected tables based on any criteria, or simply match whatever you'd like to find with ease. We'll use Beautiful Soup on two of the websites we visit to pick out the prices from certain table rows.
The first stage is to import the four libraries we're going to use. urllib2, re and time are part of the standard library and will be in your Python build automatically. BeautifulSoup will need to be installed (see link below); I'm running 3.2.0, which is fully compatible with Python 2.7. After the imports, the current system time is recorded, which we'll use to calculate the total elapsed time at the end of the run. Strictly speaking, this is a superfluous step and one which can be removed in a production environment.
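The timing step on its own amounts to the short sketch below; the `sum(range(...))` line is just a stand-in for the real scraping work:

```python
import time

start = time.time()            # record the moment the script begins
total = sum(range(1000000))    # placeholder for the actual work being timed
elapsed = time.time() - start  # seconds spent on the work above
print("Elapsed: %.2f seconds" % elapsed)
```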
The three websites get scraped next. In the case of Bloomberg and the Financial Times, urllib2 is passed the URL, the raw contents of the page are read and then passed to BeautifulSoup for data extraction. The page elements are static but populated by a backend data source, so the location of the table elements can be hard-coded – BloomSoup.findAll('tr') and FTsoup.findAll('tr'). In the case of the BBC data, the price is not held within a table element, so BeautifulSoup does not strictly need to be used. Finally, for each of the three sources, a simple regular expression extracts the actual price from the parsed data.
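The regular-expression step can be tried in isolation. The HTML fragment below is invented for illustration, but the pattern is the one used in the script:

```python
import re

# A made-up table row of the kind findAll('tr') might return
row = "<tr><td>Brent Crude ($/Brl)</td><td>111.55</td></tr>"

# \d+\.\d* matches one or more digits, a decimal point, then any further digits
match = re.search(r"\d+\.\d*", row)
price = float(match.group())
print(price)
```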
Saving and displaying
The objects BloomPrice, FTPrice and BBCPrice now hold the three extracted oil prices, the float() function having converted them from strings into numerical values. The penultimate step is to use the print statement to display the values on the console along with a short mention of the source. An average could be calculated by reaching for a dedicated library, but for our purposes that would be overkill; a simple arithmetic mean is calculated by summing the three prices and dividing by three.
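The averaging amounts to nothing more than this (the prices here are invented for illustration):

```python
# Illustrative values standing in for the three scraped prices
BloomPrice, FTPrice, BBCPrice = 111.55, 111.60, 111.50

# Dividing by 3.0 forces float division on Python 2 as well as Python 3
average = (BloomPrice + FTPrice + BBCPrice) / 3.0
print(" Average         : %.2f" % average)
```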
Finally, the prices are saved to the location given in the OutputPath variable and then presented on the console for 10 seconds.
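A minimal sketch of the save step: the article writes to C:\Test\, but the version below uses a temporary directory so it runs anywhere, and the one-file-per-source filenames are an assumption for illustration:

```python
import os
import tempfile

OutputPath = tempfile.mkdtemp()  # stands in for "C:\\Test\\"

# Illustrative prices standing in for the scraped values
prices = {"Bloomberg": 111.55, "FT": 111.60, "BBC": 111.50}

# Write each price to its own text file (filenames are hypothetical)
for source, price in prices.items():
    path = os.path.join(OutputPath, source + ".txt")
    with open(path, "w") as f:
        f.write("%.2f\n" % price)

print(sorted(os.listdir(OutputPath)))
```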
Python Source Code
The source code presented here has been updated – thanks to SoonerBourne34 – to reflect changes to the BBC and Bloomberg websites. As a result, the code displayed here does not perfectly match that shown in the YouTube video.
from BeautifulSoup import BeautifulSoup
import urllib2, re, time

start = time.time()

# Find Bloomberg Brent price
rawBloomData = urllib2.urlopen("http://www.bloomberg.com/energy/").read()
BloomSoup = BeautifulSoup(rawBloomData)
brent = BloomSoup.findAll('tr')
BloomPrice = float(re.search(r"\d+\.\d*", str(brent)).group())

# Find FT Brent price
rawFTData = urllib2.urlopen("http://markets.ft.com/tearsheets/performance.asp?s=1054972").read()
FTsoup = BeautifulSoup(rawFTData)
FT = FTsoup.findAll('tr')
FTPrice = float(re.search(r"\d+\.\d*", str(FT)).group())

# Find BBC Brent price
rawBBCData = urllib2.urlopen("http://www.bbc.co.uk/news/business/market_data/commodities/default.stm").read()
BBCSoup = BeautifulSoup(rawBBCData)
oyell = BBCSoup.findAll('tr')
BBCPrice = float(re.search(r"\d+\.\d*", str(oyell)).group())

# Compile for display
print " "
print " Brent Crude ($/Brl)"
print " ------------------------"
print " Bloomberg       : %.2f" % BloomPrice
print " Financial Times : %.2f" % FTPrice
print " BBC             : %.2f" % BBCPrice
print " ------------------------"
print " Average         : %.2f" % ((BloomPrice + FTPrice + BBCPrice) / 3)
print " ------------------------"
print " Elapsed         : %.2f seconds" % (time.time() - start)
print " "

# Write to files (one file per source; filenames are illustrative)
OutputPath = "C:\\Test\\"
for source, price in (("Bloomberg", BloomPrice), ("FT", FTPrice), ("BBC", BBCPrice)):
    outFile = open(OutputPath + source + ".txt", "w")
    outFile.write("%.2f\n" % price)
    outFile.close()

# Keep the console window open for 10 seconds before exiting
time.sleep(10)
BeautifulSoup : http://www.crummy.com/software/BeautifulSoup/