Screen Scrape Your Utility Bills
At the heart of my last project, MinuteMate, was the ability to programmatically retrieve billing information from my Vodafone online account.
The motivation for doing this was an unexpected £90 bill I received from Vodafone and the discovery that the company had no mechanism to prevent or warn about such an occurrence.
The system allowed me - and I hoped others - to receive alerts when my usage surpassed my monthly “free” allowance.
In my ideal world Vodafone - in fact all utility companies - would provide APIs to programmatically query your usage. Instead, we largely have to rely on the (awful) web interface and (not bad) mobile apps to check usage.
I decided to open source the web scraping component of the system so anyone can use and expand upon it. Please feel free to wrap it in an API! I’ve released it as a command-line tool called *vodafone-scraper* which includes basic alert functionality with thresholds.
You can see the code and examples at https://github.com/paulfurley/vodafone-scraper or just install and run: $ pip install vodafone-scraper
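To give a flavour of what threshold alerts involve, here is a minimal sketch. It is not the actual vodafone-scraper code: the function name `usage_alerts`, its parameters and the 80%/100% defaults are all assumptions made for illustration.

```python
def usage_alerts(used_minutes, allowance_minutes, thresholds_percent=(80, 100)):
    """Return one message per threshold that the current usage has reached.

    Hypothetical helper: names and default thresholds are illustrative only.
    """
    alerts = []
    for percent in sorted(thresholds_percent):
        # Compare in whole numbers to avoid floating-point rounding surprises.
        if used_minutes * 100 >= allowance_minutes * percent:
            alerts.append(
                f"{used_minutes} of {allowance_minutes} minutes used "
                f"({percent}% threshold reached)"
            )
    return alerts


for message in usage_alerts(250, 300):
    print(message)
```

With 250 of 300 minutes used, only the 80% threshold fires; once usage reaches the full allowance, both messages are returned.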
Not a coder? Interested in scraping a utility website? Tell me about it!
Horrors of Vodafone Online
I’m no stranger to web scraping as I work in the Data Services team at ScraperWiki - we see a lot of weird and wonderful sites.
Furthermore, each page of the site takes between 5 and 100 seconds to load (yes, I’ve actually had to allow a *two minute* timeout to get it working).
Use Selenium to perform full browser automation and *actually* access the site through Firefox as if I were a real user.
If I were doing these requests on behalf of thousands of users simultaneously, however, the tradeoff would be different - the overhead of running many full browsers could lead to significant computing costs. An obvious downside, however, is that a subtle change to the website could send the scraper back to square one.
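For the curious, driving Firefox through Selenium with a generous page-load timeout might look roughly like this. It is a sketch rather than the code in vodafone-scraper: `fetch_page_source`, `main` and the example.com URL are placeholders I have made up, and the 120 seconds simply mirrors the two-minute timeout mentioned earlier.

```python
PAGE_LOAD_TIMEOUT_SECONDS = 120  # two minutes, for very slow pages


def fetch_page_source(driver, url, timeout_seconds=PAGE_LOAD_TIMEOUT_SECONDS):
    """Load url in an already-running browser and return the rendered HTML."""
    # If the page is still loading after this many seconds, driver.get()
    # raises selenium.common.exceptions.TimeoutException.
    driver.set_page_load_timeout(timeout_seconds)
    driver.get(url)
    return driver.page_source


def main():
    # Imported here so the helper above can be tried without a browser.
    from selenium import webdriver

    driver = webdriver.Firefox()  # requires a local Firefox install
    try:
        # Placeholder URL, not the real Vodafone address.
        html = fetch_page_source(driver, "https://example.com/")
        print(len(html), "characters of HTML retrieved")
    finally:
        driver.quit()  # always close the browser, even after a timeout
```

Calling `main()` opens a real Firefox window, so it needs the selenium package and Firefox installed. Passing the driver in as an argument keeps the helper easy to exercise with a stand-in object.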
The last time I tried to use PhantomJS, it unfortunately didn’t “just work” and I didn’t have the time to try harder. However, I plan to have another go as I see this as an important future scraping and web-automation tool.
I offer web scraping as a professional service. Find out more.