DeadlineExceededError, really? That wasn’t me
One of the issues with using a managed hosting environment like Google AppEngine is that they measure everything and penalize you when you exceed the limits. For example, if you look at this stack trace for the DeadlineExceededError, it was clearly not my fault that their runtime spent too much time loading the regular expression system module. Anyway, the moral of the story is to always check your logs. Just to make sure this post doesn’t end up in the lightweight category, I’ll link to a good post on emailing tracebacks to admins when exceptions occur. In case you didn’t notice, that post now points to a new AppEngine feature called ereporter that stores exceptions, keeps counters and generates cool reports. Unfortunately, I have not had much luck configuring that one. However, I did finally get a hook into Django 1.1 to capture exceptions, but not as
EzCloud: An exercise in laziness
At Lookery, I must wear many hats, and one of them is systems administrator. In fact, I’d say that I’m becoming quite dangerous since I now deploy all of my configuration and code through Debian (apt-get) packages, including my own repository, but more on that in a later post. The issue I had yesterday was managing Amazon’s Elastic Load Balancers. We’re now running several load balancers (for big customers) sending requests across multiple availability zones. The process is a bit clunky. Of course, I acknowledge I might be doing it all wrong, but the best thing about being a developer AND system administrator is that you can script any command line task to your heart’s content. Normally, I use a mixture of RightScale, the AWS Console and the AWS command line tools. The process is as follows:
- Create Load Balancer (configure health check, etc)
- Launch Instances (but wait until they’re ready)
- Register with LB
Simple,
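The three steps above can be sketched as a single function. This is a minimal sketch, not the tool described in this post: it assumes boto3-style `elb` (Classic Load Balancer) and `ec2` clients are passed in, and the function name, listener ports, health check values and instance type are all hypothetical choices for illustration. Note that the `instance_running` waiter only confirms the instance state, not that the application on it is ready to serve traffic.

```python
def provision_balanced_instances(elb, ec2, lb_name, zones, image_id, count,
                                 instance_type="t3.micro"):
    """Create a load balancer, launch instances, wait, then register them.

    `elb` and `ec2` are assumed to behave like boto3.client("elb") and
    boto3.client("ec2"); all default values here are illustrative only.
    """
    # Step 1: create the load balancer and configure its health check.
    elb.create_load_balancer(
        LoadBalancerName=lb_name,
        Listeners=[{"Protocol": "HTTP", "LoadBalancerPort": 80, "InstancePort": 80}],
        AvailabilityZones=zones,
    )
    elb.configure_health_check(
        LoadBalancerName=lb_name,
        HealthCheck={
            "Target": "HTTP:80/",
            "Interval": 30,
            "Timeout": 5,
            "UnhealthyThreshold": 2,
            "HealthyThreshold": 2,
        },
    )

    # Step 2: launch instances round-robin across the availability zones,
    # then block until EC2 reports all of them as running.
    instance_ids = []
    for i in range(count):
        response = ec2.run_instances(
            ImageId=image_id,
            MinCount=1,
            MaxCount=1,
            InstanceType=instance_type,
            Placement={"AvailabilityZone": zones[i % len(zones)]},
        )
        instance_ids.extend(inst["InstanceId"] for inst in response["Instances"])
    ec2.get_waiter("instance_running").wait(InstanceIds=instance_ids)

    # Step 3: register the running instances with the load balancer.
    elb.register_instances_with_load_balancer(
        LoadBalancerName=lb_name,
        Instances=[{"InstanceId": instance_id} for instance_id in instance_ids],
    )
    return instance_ids
```

With real credentials, the clients would come from `boto3.client("elb")` and `boto3.client("ec2")`; passing them in as arguments keeps the sequencing easy to exercise against stand-in objects first.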
Hadoop Timelines
At one of the last sessions of the Hadoop Summit 2009, Arun Murthy (Yahoo) went over the changes that were necessary in Hadoop to sort a terabyte of data in less than 60 seconds. Besides all of the good wisdom in the work, what I liked the most was his use of charts to understand where Hadoop could use some optimization. He described one of the charts (see image on the right) as the “ideal hadoop job”. I don’t remember everything, but the gist is that you see smooth lines/waves of both mappers and reducers, quick startup time, few wasted jobs and so on. This left me thinking: what would my jobs look like? Hence, the reason for Hadoop Timelines.
Hadoop Timelines is a Web service built using App Engine and a Python script using Dumbo that will take
Project Voldemort and Hadoop at Lookery
It’s definitely an interesting time as we create next generation tools to deal with lots of data. I just came back from SFO after attending the Hadoop Summit and nosql.net, where I got to hear developers from most of the distributed stores available today explain parts of their current and future work. Many of the talks left me thinking of ways in which we can improve Project Voldemort. For now, I’m just doing my part by documenting and helping others try what’s already there:
At Lookery, we have been working very hard to transform most of our data processing tasks into batch-oriented workflows in order to deal with growth. For example, we were already using Hadoop to compute our index and data files for our largest database, but the process of serving that information took place over too many network hops (load balancers, reverse proxies and Amazon