<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Elias Torres</title>
	<atom:link href="http://torrez.us/feed/" rel="self" type="application/rss+xml" />
	<link>http://torrez.us</link>
	<description>Hi.</description>
	<lastBuildDate>Thu, 19 Aug 2010 04:50:52 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Boston CTOs, Lunch is ready!</title>
		<link>http://torrez.us/archives/2010/08/19/705/</link>
		<comments>http://torrez.us/archives/2010/08/19/705/#comments</comments>
		<pubDate>Thu, 19 Aug 2010 04:50:52 +0000</pubDate>
		<dc:creator>Elias Torres</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://torrez.us/?p=705</guid>
		<description><![CDATA[A couple of weeks ago, a few colleagues suggested we get together for an intimate lunch where we could discuss technology and leadership in startups. I mentioned it to David, and he thought it was a great idea, so he said JFDI. A tweet and a landing page later, we had our first lunch. We [...]]]></description>
			<content:encoded><![CDATA[A couple of weeks ago, a few colleagues suggested we get together for an intimate lunch where we could discuss technology and leadership in startups. I mentioned it to <a href="http://www.davidcancel.com/">David</a>, and he thought it was a great idea, so he said JFDI. A <a href="http://twitter.com/dcancel/status/19938430967">tweet</a> and <a href="http://www.bostonctolunch.com/">a landing page</a> later, we had our first lunch.

<p>We met on Aug. 12th at the Black Sheep in Cambridge with 10 very interesting CTOs from Boston. It was only our first lunch, but I already learned a great deal. The conversation hovered around hiring, devops and QA, but there was a lot to be gleaned from the experiences of many around the table. I know there are other <a href="http://bostoncto.com/">similar groups</a>, but nothing beats face-to-face conversation and great food with good folks (<a href="http://twitter.com/justinsheehy">@justinsheehy</a>, <a href="http://twitter.com/graysky">@graysky</a>, <a href="http://twitter.com/shearic">@shearic</a>, <a href="http://twitter.com/agoeldi">@agoeldi</a>, <a href="http://twitter.com/YoavShapira">@YoavShapira</a>, <a href="http://twitter.com/rseanlindsay">@rseanlindsay</a> and a few others).</p>

<p>Our next lunch will be on Sept. 9th, also at the Black Sheep, and we&#8217;ll be rotating a small number of folks to let some of the other interested people join us. Please forgive me for the backlog, but we want folks to get to know each other, and rotating the entire table every time might just not work.</p>

<p>Please don&#8217;t worry about the backlog: if you think you could benefit from meeting other CTOs and having them for a whole lunch to ask whatever you want in a relaxed yet confidential setting, please sign up for our <a href="http://www.bostonctolunch.com/">Boston CTO</a> Lunch.</p>]]></content:encoded>
			<wfw:commentRss>http://torrez.us/archives/2010/08/19/705/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
		<item>
		<title>DeadlineExceededError, really? That wasn&#8217;t me</title>
		<link>http://torrez.us/archives/2009/09/27/693/</link>
		<comments>http://torrez.us/archives/2009/09/27/693/#comments</comments>
		<pubDate>Sun, 27 Sep 2009 22:30:23 +0000</pubDate>
		<dc:creator>Elias Torres</dc:creator>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[appengine]]></category>
		<category><![CDATA[django]]></category>
		<category><![CDATA[ereporter]]></category>
		<category><![CDATA[google]]></category>

		<guid isPermaLink="false">http://torrez.us/?p=693</guid>
		<description><![CDATA[One of the issues with using a managed hosting environment like Google AppEngine is that they measure everything and penalize you when you exceed the limits. For example, if you look at this stack trace for the DeadlineExceededError, it was clearly not my fault that their runtime spent too much time loading the regular expression [...]]]></description>
			<content:encoded><![CDATA[One of the issues with using a managed hosting environment like <a href="http://appengine.google.com/">Google AppEngine</a> is that they measure everything and penalize you when you exceed the limits. For example, if you look at the stack trace below for the DeadlineExceededError, it was clearly not my fault that their runtime spent too much time loading the standard regular expression module (<code>re</code>). Anyway, the moral of the story: always check your logs.
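<p>In the spirit of checking your logs, here is a small sketch that scans request-log summary lines and flags failed or slow requests. The line layout (date, time, seconds, path, status, wall-clock ms, CPU ms) is inferred from the single example line in the log excerpt below, not from App Engine documentation, and the 25-second threshold is a hypothetical choice meant to leave some headroom under the request deadline (roughly 30 seconds at the time).</p>

```python
import re

# Assumed layout, inferred from one sample line:
# "09-26 08:38PM 11.392 / 500 28360ms 1400cpu_ms 0kb ..."
LOG_LINE = re.compile(r"^(\S+ \S+ \S+) (\S+) (\d{3}) (\d+)ms (\d+)cpu_ms")


def slow_or_failed(lines, threshold_ms=25000):
    """Yield (path, status, wall_ms, cpu_ms) for requests that returned a 5xx
    status or took at least threshold_ms (a hypothetical default)."""
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # not a request summary line (e.g. a traceback line)
        _when, path, status, wall_ms, cpu_ms = m.groups()
        status, wall_ms, cpu_ms = int(status), int(wall_ms), int(cpu_ms)
        if status >= 500 or wall_ms >= threshold_ms:
            yield path, status, wall_ms, cpu_ms
```

<p>Run over a day of downloaded logs, the request shown below would come back as <code>("/", 500, 28360, 1400)</code>.</p>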

<p>Just to make sure this post doesn&#8217;t end up in the lightweight category, I&#8217;ll link to a good post on <a href="http://andialbrecht.wordpress.com/2009/04/30/app-engine-tracebacks-via-email/">emailing tracebacks to admins</a> when exceptions occur. In case you didn&#8217;t notice, that post now points to a new App Engine feature called <a href="http://code.google.com/p/googleappengine/wiki/SdkReleaseNotes">ereporter</a> that stores exceptions, keeps counters and generates cool reports. Unfortunately, I have not had much luck configuring that one. However, I finally did get a hook into Django 1.1 to capture exceptions, though not as advertised in this <a href="http://code.google.com/appengine/articles/django.html">AppEngine Django article</a>; instead, I ended up using a <a href="http://docs.djangoproject.com/en/dev/topics/http/middleware/">Django Middleware&#8217;s process_exception</a> interface.</p>
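<p>For reference, an old-style (Django 1.1) middleware is just a plain class: Django calls its <code>process_exception(request, exception)</code> method when a view raises, and returning <code>None</code> lets the normal 500 handling continue. The sketch below is not the exact hook described above. The class name and the <code>notify(subject, body)</code> callable are hypothetical; in a real project you might pass Django&#8217;s <code>mail_admins</code> or App Engine&#8217;s mail API instead of the logging default, and enable the class by adding its dotted path to <code>MIDDLEWARE_CLASSES</code>.</p>

```python
import logging
import traceback


class ExceptionMailerMiddleware(object):
    """Hypothetical old-style middleware that reports unhandled view errors."""

    def __init__(self, notify=None):
        # notify(subject, body) is an assumption; defaults to logging.
        self.notify = notify or self._log

    @staticmethod
    def _log(subject, body):
        logging.error("%s\n%s", subject, body)

    def process_exception(self, request, exception):
        # Django invokes this hook while the exception is being handled,
        # so format_exc() can still see the active traceback.
        body = traceback.format_exc()
        path = getattr(request, "path", "?")
        subject = "Exception at %s: %s" % (path, exception.__class__.__name__)
        self.notify(subject, body)
        return None  # let Django carry on with its usual 500 response
```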

<code><pre>09-26 08:38PM 11.392 / 500 28360ms 1400cpu_ms 0kb Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6; en-us) AppleWebKit/531.9 (KHTML, like Gecko) Version/4.0.3 Safari/531.9,gzip(gfe)
24.x.x.x - - [26/Sep/2009:20:38:39 -0700] "GET / HTTP/1.1" 500 0 - "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6; en-us) AppleWebKit/531.9 (KHTML, like Gecko) Version/4.0.3 Safari/531.9,gzip(gfe)" "www.abtests.com"

E 09-26 08:38PM 39.749
&lt;class 'google.appengine.runtime.DeadlineExceededError'&gt;:
Traceback (most recent call last):
  File "/base/data/home/apps/.../3.336622978473149430/main.py", line 29, in &lt;module&gt;
    appengine_django.InstallAppEngineHelperForDjango()
  File "/base/data/home/apps/.../3.336622978473149430/appengine_django/__init__.py", line 517, in InstallAppEngineHelperForDjango
    InstallAuthentication()
  File "/base/data/home/apps/.../3.336622978473149430/appengine_django/__init__.py", line 407, in InstallAuthentication
    from django.contrib.auth import tests as django_tests
  File "/base/python_lib/versions/third_party/django-1.1/django/contrib/auth/tests/__init__.py", line 2, in &lt;module&gt;
    from django.contrib.auth.tests.views \
  File "/base/python_lib/versions/third_party/django-1.1/django/contrib/auth/tests/views.py", line 9, in &lt;module&gt;
    from django.test import TestCase
  File "/base/python_lib/versions/third_party/django-1.1/django/test/__init__.py", line 6, in &lt;module&gt;
    from django.test.testcases import TestCase, TransactionTestCase
  File "/base/python_lib/versions/third_party/django-1.1/django/test/testcases.py", line 1, in &lt;module&gt;
    import re</pre></code>]]></content:encoded>
			<wfw:commentRss>http://torrez.us/archives/2009/09/27/693/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>EzCloud: An exercise in laziness</title>
		<link>http://torrez.us/archives/2009/07/17/675/</link>
		<comments>http://torrez.us/archives/2009/07/17/675/#comments</comments>
		<pubDate>Fri, 17 Jul 2009 16:41:19 +0000</pubDate>
		<dc:creator>Elias Torres</dc:creator>
				<category><![CDATA[Software]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[Work]]></category>
		<category><![CDATA[AWS]]></category>
		<category><![CDATA[Cloud]]></category>
		<category><![CDATA[EC2]]></category>

		<guid isPermaLink="false">http://torrez.us/?p=675</guid>
		<description><![CDATA[At Lookery, I must wear many hats; one of them is systems administrator. In fact, I&#8217;d say that I&#8217;m becoming quite dangerous since I now deploy all of my configuration and code through Debian (apt-get) packages, including my own repository, but more on that in a later post. The issue I had yesterday was managing [...]]]></description>
			<content:encoded><![CDATA[At <a href="http://www.lookery.com/">Lookery</a>, I must wear many hats; one of them is systems administrator. In fact, I&#8217;d say that I&#8217;m becoming quite dangerous since I now deploy all of my configuration and code through Debian (apt-get) packages, including my own repository, but more on that in a later post. The issue I had yesterday was managing Amazon&#8217;s <a href="http://aws.amazon.com/elasticloadbalancing/">Elastic Load Balancers</a>. We&#8217;re now running several load balancers (for big customers) sending requests across multiple availability zones, and the process is a bit clunky. Of course, I acknowledge I might be doing it all wrong, but the best thing about being a developer AND a system administrator is that you can automate any command-line task to your heart&#8217;s content. Normally, I use a mixture of RightScale, the AWS Console and the AWS command-line tools. The process is as follows:

<ul>
	<li>Create Load Balancer (configure health check, etc.)</li>
	<li>Launch Instances (but wait until they&#8217;re ready)</li>
	<li>Register with LB</li>
</ul>
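<p>The last two steps, launching and waiting, then registering with the LB, are easy to script. Below is a rough sketch, not EzCloud&#8217;s code, that assumes boto-style objects (as in boto 2.x): an EC2 connection whose <code>run_instances()</code> returns a reservation with <code>.instances</code>, instances exposing <code>.id</code>, <code>.state</code> and <code>update()</code>, and a load balancer with <code>register_instances()</code>. The polling interval and limit are hypothetical defaults.</p>

```python
import time


def launch_and_register(ec2, lb, image_id, count=1, poll_seconds=15,
                        max_polls=40, sleep=time.sleep):
    """Launch `count` instances, wait until all are 'running', then register
    them with the load balancer. Returns the registered instance IDs."""
    reservation = ec2.run_instances(image_id, min_count=count, max_count=count)
    instances = reservation.instances
    for _ in range(max_polls):
        for inst in instances:
            inst.update()  # refresh .state from the EC2 API
        if all(inst.state == "running" for inst in instances):
            break
        sleep(poll_seconds)
    else:
        # the for-loop finished without a break: we gave up waiting
        raise RuntimeError("instances did not reach 'running' in time")
    ids = [inst.id for inst in instances]
    lb.register_instances(ids)
    return ids
```

<p>Passing <code>sleep</code> in as a parameter keeps the waiting logic testable without actually pausing.</p>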

<p>Simple, right? The issues arise when things start to go wrong. For example, the output of all those commands uses those pesky instance IDs. If I want to ssh into one of those machines, I have to look up the hostname. Annoying. If I want to check URLs on those machines, it takes too many steps to write here. Annoying. If I want to drop into distributed shell (dsh), I have to create a group file. Annoying. If I want to get some metrics or check the health of the instances, I have to set up more command-line tools. Annoying. I think by now you get the point. Enter <a href="http://github.com/eliast/ezcloud/tree/master">EzCloud</a>.</p>
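<p>The first annoyance, turning instance IDs into hostnames, is a small lookup. A minimal sketch, assuming a boto 2.x EC2 connection whose <code>get_all_instances(instance_ids=...)</code> returns reservations of instances carrying <code>.id</code> and <code>.public_dns_name</code>; the helper name is hypothetical.</p>

```python
def hostnames_for(ec2, instance_ids):
    """Map EC2 instance IDs to their public DNS names."""
    if not instance_ids:
        # An empty ID filter would make the API list every instance.
        return {}
    names = {}
    for reservation in ec2.get_all_instances(instance_ids=list(instance_ids)):
        for inst in reservation.instances:
            names[inst.id] = inst.public_dns_name
    return names
```

<p>With a boto ELB object, the IDs would come from something like <code>[i.id for i in lb.instances]</code>.</p>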

<code>
<pre>
[elias@manta] ~/envs/boto $ ezcloud
Welcome to ezcloud!

# List available/configured load balancers
>>> cloud.lbs()
LoadBalancers:lookery-lb1,lookery-lb2,...

# Show their status. If all in service, then just show OK.
>>> cloud.lbs().status()
Load Balancer:  lookery-lb1
Everything is OK.
Load Balancer:  lookery-lb2
Everything is OK.

# If there's a problem, show full details.
>>> cloud.lbs().status()
Load Balancer:  lookery-lb1
Everything is OK.
Load Balancer:  lookery-lb2
1 instances out of service.
ec2-XXX.compute-1.amazonaws.com wrong availability zone

# Now to the good stuff. If I want to see instances, I want to see HOSTNAMEs!
>>> cloud.lbs()[0].instances
ec2-XXX.compute-1.amazonaws.com	us-east-1b	2009-07-16T22:54:28	running	ami-AAA
ec2-YYY.compute-1.amazonaws.com	us-east-1b	2009-07-16T22:54:28	running	ami-AAA
...

# Metrics, you ask me? Here you go.
>>> cloud.lbs()[1].metrics()
Metrics(lookery-lb1):[u'requestcount', u'healthyhostcount', u'latency', u'unhealthyhostcount']

# Functions under metrics take start, end, period, zones.
>>> cloud.lbs()[1].metrics().requestcount()
Time Sum
2009-07-17T03:12:00Z 1149389.0
2009-07-17T02:12:00Z 1222028.0
2009-07-17T01:12:00Z 1266844.0
...

# VS:
# bin/mon-get-stats RequestCount --start-time 2009-07-17T00:00:00 \
#      --end-time 2009-07-18T00:00:00 --period 3600 --namespace "AWS/ELB" \
#      --statistics "Sum" --dimensions "LoadBalancerName=lookery-lb1, AvailabilityZone=us-east-1a, AvailabilityZone=us-east-1b"
#
# You decide.

# Now, what if I want to do a quick HTTP GET check on all of the machines (besides the health check)
>>> cloud.lbs()[1].instances.get('/some-test-uri-path')
(200, u'ec2-XXX.compute-1.amazonaws.com')
(200, u'ec2-YYY.compute-1.amazonaws.com')
(200, u'ec2-ZZZ.compute-1.amazonaws.com')
...

# OK, I want the whole response
>>> cloud.lbs()[1].instances.get('/some-test-uri-path').debug()
----------------------------------------
(200) ec2-XXX.compute-1.amazonaws.com
content-length : 118
x-powered-by : PHP/5.2.6-1+lenny3
vary : Accept-Encoding
server : Apache/2.2.9
connection : close
date : Fri, 17 Jul 2009 16:14:52 GMT
content-type : text/plain
{"error":"bad_request","description":"Missing required parameter. Please pass either format or valid redirect url."}
...

# Filter by response code, anything that doesn't return 200, please show me.
>>> cloud.lbs()[0].instances.get('/some-test-uri-path').expect(200)
ec2-NNN.compute-1.amazonaws.com	us-east-1b	2009-07-16T22:54:28	running	ami-AAA

# Many operations return EzCloud's InstanceList, so you can do things like write a dsh group file to do some debugging
>>> cloud.lbs()[0].instances.get('/some-test-uri-path').expect(200).dsh('foo')
user@ec2-NNN.compute-1.amazonaws.com
>>> ^Z 
[1]+  Stopped                 ezcloud
(boto)[elias@manta] ~/envs/boto $ dsh -g foo -M -c -- 'hostname'
user@ec2-NNN.compute-1.amazonaws.com: ec2-NNN

# You can even do some basic filtering/orderby (placement equals ZONE)
(boto)[elias@manta] ~/envs/boto $ fg
>>> cloud.lbs()[0].instances.get('/some-test-uri-path').expect(200).orderby('placement')
ec2-X.compute-1.amazonaws.com	us-east-1a	2009-07-16T22:33:32	running	ami-A
ec2-X.compute-1.amazonaws.com	us-east-1a	2009-07-16T22:33:32	running	ami-A
ec2-X.compute-1.amazonaws.com	us-east-1a	2009-07-16T22:33:32	running	ami-A
ec2-X.compute-1.amazonaws.com	us-east-1b	2009-07-16T22:54:28	running	ami-A
ec2-X.compute-1.amazonaws.com	us-east-1b	2009-07-16T22:54:28	running	ami-A
ec2-X.compute-1.amazonaws.com	us-east-1c	2009-07-16T22:24:47	running	ami-A
ec2-X.compute-1.amazonaws.com	us-east-1c	2009-07-16T22:24:47	running	ami-A
ec2-X.compute-1.amazonaws.com	us-east-1c	2009-07-16T22:24:47	running	ami-A
</pre>
</code>
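<p>The <code>get(...).expect(200)</code> and <code>dsh('foo')</code> steps above boil down to a few standard-library calls. This is a sketch, not EzCloud&#8217;s code: plain HTTP on port 80 is an assumption, the helper names are hypothetical, and <code>~/.dsh/group</code> is where the Debian dsh package commonly looks for per-user groups, so check your own dsh configuration.</p>

```python
import http.client
import os


def http_status(host, path, timeout=5):
    """GET http://host/path and return the status code, or None on failure."""
    conn = http.client.HTTPConnection(host, 80, timeout=timeout)
    try:
        conn.request("GET", path)
        return conn.getresponse().status
    except (OSError, http.client.HTTPException):
        return None
    finally:
        conn.close()


def unexpected(hosts, path, expected=200, fetch=http_status):
    """Keep only the hosts whose status for `path` differs from `expected`,
    similar in spirit to the console's get(path).expect(200)."""
    return [host for host in hosts if fetch(host, path) != expected]


def write_dsh_group(name, hosts, user, group_dir="~/.dsh/group"):
    """Write one user@host line per host so `dsh -g name` can target them."""
    group_dir = os.path.expanduser(group_dir)
    os.makedirs(group_dir, exist_ok=True)
    path = os.path.join(group_dir, name)
    with open(path, "w") as f:
        for host in hosts:
            f.write("%s@%s\n" % (user, host))
    return path
```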


<p>I have a couple of functions left to write (like actually launching instances, waiting for them and registering them with the LB, hehe), but this is already useful to me as it is. It even saves your history and supports some TAB completion. This is my first DSL, and it&#8217;s something I just hacked together in a couple of hours. It&#8217;s not well thought out, but it was fun learning how to use Python&#8217;s <a href="http://docs.python.org/library/code.html">InteractiveConsole</a>, and as always it&#8217;s a pleasure to use <a href="http://code.google.com/p/boto">boto</a>.</p>
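<p>History and TAB completion on top of <code>code.InteractiveConsole</code> take only a few standard-library calls. This is a sketch in the spirit of EzCloud, not its source: the history file name is hypothetical, and Python builds that use libedit instead of GNU readline (common on macOS) may need a different syntax for the <code>tab: complete</code> binding.</p>

```python
import atexit
import code
import os

try:
    import readline
    import rlcompleter
except ImportError:  # platforms without a readline module
    readline = None


def make_console(namespace, history_path="~/.ezcloud_history"):
    """Build an InteractiveConsole over `namespace` with TAB completion of
    names in that namespace and history persisted across sessions."""
    history_path = os.path.expanduser(history_path)
    if readline is not None:
        readline.set_completer(rlcompleter.Completer(namespace).complete)
        readline.parse_and_bind("tab: complete")
        try:
            readline.read_history_file(history_path)
        except OSError:  # first run: no history file yet
            pass
        atexit.register(readline.write_history_file, history_path)
    return code.InteractiveConsole(locals=namespace)
```

<p>Starting the shell would then be <code>make_console({"cloud": cloud}).interact(banner="Welcome to ezcloud!")</code>, where <code>cloud</code> stands for whatever object exposes the DSL.</p>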

<p>NOTE: you must get <a href="http://code.google.com/p/boto/source/checkout">boto trunk</a> for this to work. Enjoy the <a href="http://github.com/eliast/ezcloud/tree/master">ezcloud source</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://torrez.us/archives/2009/07/17/675/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Hadoop Timelines</title>
		<link>http://torrez.us/archives/2009/06/29/660/</link>
		<comments>http://torrez.us/archives/2009/06/29/660/#comments</comments>
		<pubDate>Mon, 29 Jun 2009 05:41:12 +0000</pubDate>
		<dc:creator>Elias Torres</dc:creator>
				<category><![CDATA[Hadoop]]></category>

		<guid isPermaLink="false">http://torrez.us/?p=660</guid>
		<description><![CDATA[At one of the last sessions during the Hadoop Summit 2009, Arun Murthy (Yahoo) was going over the changes that were necessary in Hadoop to sort a terabyte of data in less than 60 seconds. Besides all of the good wisdom in the work, what I liked the most was his use of charts to [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://tinyurl.com/np9gol" class="top pull-1 right" alt="ideal hadoop job"/> At one of the last sessions during the Hadoop Summit 2009, <a href="http://twitter.com/acmurthy">Arun Murthy</a> (Yahoo) went over the changes that were necessary in Hadoop to <a href="http://developer.yahoo.net/blogs/hadoop/Yahoo2009.pdf">sort a terabyte of data</a> in less than 60 seconds. Besides all of the good wisdom in the work, what I liked most was his use of charts to understand where Hadoop could use some optimization. He described one of the charts (see image on the right) as the &#8220;ideal hadoop job&#8221;. I don&#8217;t remember every detail, but the gist is that you see smooth lines/waves of both mappers and reducers, quick startup time, little wasted work and so on. This left me thinking: what would my jobs look like? Hence, the reason for <a href="http://hadoop-timelines.appspot.com/">Hadoop Timelines</a>.</p>

<p>Hadoop Timelines consists of a Web service built on App Engine and a Python script using Dumbo that together take care of everything needed to replicate Arun&#8217;s Task Timelines for your own Hadoop jobs. My goal with this project is to raise Hadoop developers&#8217; awareness of job execution and performance and, maybe even crazier, to get us collaborating on the analysis of individual jobs through comments on specific graphs.</p>
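<p>The heart of a task timeline is simple bookkeeping: at each moment of the job, count how many map and reduce tasks are running. The sketch below is not the code of timelines.py; it assumes task start and finish times have already been parsed from the job logs into integer seconds, and the <code>(kind, start, finish)</code> tuple layout and <code>"MAP"</code>/<code>"REDUCE"</code> labels are hypothetical.</p>

```python
from collections import Counter


def task_timeline(tasks, step=1):
    """Return (t, running_maps, running_reduces) rows, with t measured from
    the first task start. A task counts as running on [start, finish)."""
    if not tasks:
        return []
    t0 = min(start for _, start, _ in tasks)
    t_end = max(finish for _, _, finish in tasks)
    rows = []
    for t in range(t0, t_end + 1, step):
        running = Counter(kind for kind, start, finish in tasks
                          if t >= start and finish > t)
        rows.append((t - t0, running["MAP"], running["REDUCE"]))
    return rows
```

<p>Plotting the two counts against t gives the waves from Arun&#8217;s chart. Scanning every task at every step is fine for a sketch; a sweep over sorted start/finish events would scale better for jobs with many thousands of tasks.</p>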

<p>If you&#8217;re already comfortable with Hadoop, using Timelines should be really easy. The first thing you&#8217;ll need to do is follow Klaas&#8217; tip for collecting job logs into HDFS using a <a href="http://dumbotics.com/2009/03/04/simple-job-logs-analysis/">simple cron job</a>. If you don&#8217;t already have Dumbo and want to keep your dev environment clean, you can follow another excellent post from Klaas on using <a href="http://dumbotics.com/2009/05/24/virtual-pythonenvironments/">virtualenv with dumbo</a>. Once you are up and running with Dumbo, download my <a href="http://github.com/eliast/hadoop-timelines/tree/master">dumbo/timelines.py</a> job script, which will process joblogs.txt and submit the results to Hadoop Timelines for public viewing.</p>

<code>dumbo start timelines.py -input joblogs.txt -output results</code>

<p>WARNING: Please be aware that very basic information on your job tasks will be uploaded for public viewing. Anybody will be able to see the number of tasks, the job duration, and start and end times, but nothing else. Please take a look at an <a href="http://hadoop-timelines.appspot.com/timeline/1053">example job</a> if you want to be sure you&#8217;ll be comfortable uploading the same information for your jobs. The entire <a href="http://github.com/eliast/hadoop-timelines/tree/master">source</a> for the project is available as well.</p>

<p>Now that I&#8217;ve wrapped up this little side project, I&#8217;m going to start looking into my own scary-looking job graphs and will possibly blog about whatever lessons I extract from them in the near future. Many thanks to Arun and Owen for uploading their <a href="http://people.apache.org/~omalley/tera-2009/">code and data</a> to compute the TeraSort graphs, <a href="http://twitter.com/klbostee">Klaas</a> for his amazing <a href="http://wiki.github.com/klbostee/dumbo">Dumbo</a> and everyone else who makes writing these types of projects so much fun.</p>
]]></content:encoded>
			<wfw:commentRss>http://torrez.us/archives/2009/06/29/660/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Project Voldemort and Hadoop at Lookery</title>
		<link>http://torrez.us/archives/2009/06/18/653/</link>
		<comments>http://torrez.us/archives/2009/06/18/653/#comments</comments>
		<pubDate>Fri, 19 Jun 2009 02:10:27 +0000</pubDate>
		<dc:creator>Elias Torres</dc:creator>
				<category><![CDATA[Lookery]]></category>
		<category><![CDATA[Technology]]></category>

		<guid isPermaLink="false">http://torrez.us/?p=653</guid>
		<description><![CDATA[It&#8217;s definitely an interesting time as we create next-generation tools to deal with lots of data. I just came back from SFO after attending the Hadoop Summit and nosql.net, where I got to hear developers from most of the distributed stores available today explain parts of their current and future work. Many of [...]]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s definitely an interesting time as we create next-generation tools to deal with lots of data. I just came back from SFO after attending the <a href="http://developer.yahoo.com/events/hadoopsummit09/">Hadoop Summit</a> and <a href="http://nosql.eventbrite.com/">nosql.net</a>, where I got to hear developers from most of the distributed stores available today explain parts of their current and future work. Many of the talks left me thinking about ways in which we can improve Project Voldemort. For now, I&#8217;m just doing my part by documenting and helping others try what&#8217;s already <a href="http://project-voldemort.com/blog/2009/06/voldemort-and-hadoop/">there</a>:

<blockquote>At <a href="http://www.lookery.com/">Lookery</a>, we have been working very hard to transform most of our data processing tasks into batch-oriented workflows in order to deal with growth. For example, we were already using Hadoop to compute our index and data files for our largest database, but the process of serving that information took place over too many network hops (load balancers, reverse proxies and Amazon S3). Therefore, as soon as I learned that Project Voldemort supported offline building of distributed stores, I decided to try it and we’re now running it in production.</blockquote></p>]]></content:encoded>
			<wfw:commentRss>http://torrez.us/archives/2009/06/18/653/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Hadoop User Group (Boston)</title>
		<link>http://torrez.us/archives/2009/03/11/638/</link>
		<comments>http://torrez.us/archives/2009/03/11/638/#comments</comments>
		<pubDate>Wed, 11 Mar 2009 16:16:07 +0000</pubDate>
		<dc:creator>Elias Torres</dc:creator>
				<category><![CDATA[Lookery]]></category>
		<category><![CDATA[Technology]]></category>

		<guid isPermaLink="false">http://torrez.us/?p=638</guid>
		<description><![CDATA[Yesterday I attended WebInno21 with @dcancel and @meattle, and after talking to a few startup folks about infrastructure and deployment, I remembered that I had been meaning to probe the local gang to see if there was enough interest in a Hadoop User meeting. The Hadoop User Group (Bay Area) meets pretty regularly. Others [...]]]></description>
			<content:encoded><![CDATA[Yesterday I attended <a href="http://webinno21.eventbrite.com/">WebInno21</a> with <a href="http://twitter.com/dcancel">@dcancel</a> and <a href="http://twitter.com/meattle">@meattle</a>, and after talking to a few startup folks about infrastructure and deployment, I remembered that I had been meaning to probe the local gang to see if there was enough interest in a <a href="http://hadoop.apache.org/core/">Hadoop</a> User meeting. The Hadoop User Group (Bay Area) meets <a href="http://upcoming.yahoo.com/search/?quick_date=past&#038;q=Hadoop+User+Group&#038;loc=CA&#038;rt=0">pretty regularly</a>. Others are meeting in <a href="http://www.meetup.com/Hadoop-SD/">San Diego</a>, <a href="http://www.meetup.com/Hadoop-DC/">DC</a>, <a href="http://www.meetup.com/Hadoop-NYC/">NY</a>, <a href="http://www.meetup.com/hadoopla">LA</a>, etc. At <a href="http://www.lookery.com/">Lookery</a>, we have a couple of decent-sized Hadoop clusters running on Amazon EC2 full-time. It&#8217;s at the core of our infrastructure: log analysis, report generation, data warehousing, creating databases for our APIs, billing, etc. I think I can share a good amount on using Hadoop, but I&#8217;d rather not do this alone. We don&#8217;t have to meet regularly or have bleeding-edge project presentations; we could simply pick a few topics from deployment, usage, programming model, current experiences, etc. I personally know a few good folks at places like <a href="http://www.hubspot.com/">HubSpot</a>, <a href="http://the.echonest.com/">The Echo Nest</a>, <a href="http://www.stylefeeder.com/">StyleFeeder</a>, etc. who have expressed interest in working with it.

<p>If you&#8217;re interested in either sharing or attending please drop me a comment, <a href="http://twitter.com/eliast">@eliast</a>, mail, whatever so I can gauge the interest level on this.</p>]]></content:encoded>
			<wfw:commentRss>http://torrez.us/archives/2009/03/11/638/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Another weapon against WordPress Theme hacking</title>
		<link>http://torrez.us/archives/2009/02/25/623/</link>
		<comments>http://torrez.us/archives/2009/02/25/623/#comments</comments>
		<pubDate>Wed, 25 Feb 2009 04:36:41 +0000</pubDate>
		<dc:creator>Elias Torres</dc:creator>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[WordPress]]></category>
		<category><![CDATA[wordpress theme hacking]]></category>

		<guid isPermaLink="false">http://torrez.us/?p=623</guid>
		<description><![CDATA[I was trying out the new Safari 4 Beta when I discovered that my site was being considered armed and dangerous. At first, I thought it was Google acting up again, but I didn&#8217;t think they could make the same mistake twice like that. After I read their report, I figured maybe it was because I&#8217;m [...]]]></description>
			<content:encoded><![CDATA[I was trying out the new Safari 4 Beta when I discovered that my site was being considered armed and dangerous. At first, I thought it was <a href="http://googleblog.blogspot.com/2009/01/this-site-may-harm-your-computer-on.html">Google acting up again</a>, but I didn&#8217;t think they could make the same mistake twice like that. After I read <a href="http://google.com/safebrowsing/diagnostic?site=202.73.57.6">their report</a>, I figured maybe it was because I&#8217;m hosted at DreamHost and my server IP address is shared with other malicious sites. I posted on Twitter, and <a href="http://enthusiasm.cozy.org/">Ben Hyde</a> suggested I check #dreamhost on freenode for a possible answer, and I&#8217;m glad I did. A guy asked for my URL, checked it and asked whether I meant to have a hidden iframe at the bottom of my page pointing to some shady domain. Bingo! I&#8217;ve been compromised again. This simply sucks.

<p>Fool me once, shame on me and you can <a href="http://video.google.com/videosearch?q=fool%20me%20once%20bush">ask Bush for the rest</a>, but the first thing that came to mind was to write a cron job that saves a hash of my template files, checks it daily and mails me whenever it changes. And don&#8217;t worry, I was thinking of you too, WordPress-using reader. So I wrote the function as a WordPress plugin so others can benefit from it, until the spammers catch up with it. It&#8217;s very simple and to the point, but I made sure it was as hands-off as possible.</p>

<h5>Instructions</h5>
<ul>
	<li>Make sure your WordPress installation and template are safe.</li>
	<li><a href="http://torrez.us/code/donthackmytemplate/">Download</a> the code to your WP plugins folder and rename the file extension to .php</li>
	<li>Activate the plugin.</li>
        <li>That&#8217;s pretty much it.</li>
</ul>

<p>The first time it runs, it will compute a signature based on your template files&#8217; content. Then, once a day, it will check that the content hasn&#8217;t changed. If your theme gets hacked and the contents of your files change, my plugin will discover that and mail you immediately. In fact, it will continue emailing you every day until you fix it. Additionally, it&#8217;ll show a notice in your admin panel to warn you that someone changed your theme. If you want to reset the warning, simply deactivate and reactivate the plugin to start fresh.</p>
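<p>The plugin itself is PHP and is not reproduced here. As a rough illustration of the same idea (signature on first run, compare on later runs), here is a Python sketch; the function names and the state-file location are hypothetical, and per-file SHA-1 hashes keyed by relative path are one reasonable way to build the signature, not necessarily the plugin&#8217;s.</p>

```python
import hashlib
import os


def theme_signature(theme_dir):
    """Combine a SHA-1 per file (keyed by relative path) into one digest, so
    adding, removing or editing any file changes the signature."""
    combined = hashlib.sha1()
    for root, dirs, files in os.walk(theme_dir):
        dirs.sort()  # os.walk honors in-place sorting, giving a stable order
        for name in sorted(files):
            path = os.path.join(root, name)
            with open(path, "rb") as f:
                file_hash = hashlib.sha1(f.read()).hexdigest()
            rel = os.path.relpath(path, theme_dir)
            combined.update(("%s %s\n" % (rel, file_hash)).encode("utf-8"))
    return combined.hexdigest()


def check_theme(theme_dir, state_file):
    """Return True while the theme matches the saved signature. The first call
    only records the signature, like the plugin's first run."""
    current = theme_signature(theme_dir)
    if not os.path.exists(state_file):
        with open(state_file, "w") as f:
            f.write(current)
        return True
    with open(state_file) as f:
        return f.read().strip() == current
```

<p>A daily cron job could call <code>check_theme</code> and send mail whenever it returns <code>False</code>; deleting the state file plays the role of deactivating and reactivating the plugin.</p>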

<p>I must say that as much as I hate having to figure out how WordPress works in its entirety every time I write one of these, the end result is short, sweet and very powerful.</p>]]></content:encoded>
			<wfw:commentRss>http://torrez.us/archives/2009/02/25/623/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Erlang and Distributed Key Value Stores</title>
		<link>http://torrez.us/archives/2009/02/24/609/</link>
		<comments>http://torrez.us/archives/2009/02/24/609/#comments</comments>
		<pubDate>Tue, 24 Feb 2009 19:44:55 +0000</pubDate>
		<dc:creator>Elias Torres</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[erlang]]></category>
		<category><![CDATA[Lookery]]></category>
		<category><![CDATA[mongodb]]></category>
		<category><![CDATA[tokyo tyrant]]></category>

		<guid isPermaLink="false">http://torrez.us/?p=609</guid>
		<description><![CDATA[I was reading yet another post from James Hamilton that reminded me to blog about a weekend project I released earlier this week: an Erlang client for MongoDB. MongoDB is written by the folks at 10gen and it&#8217;s kind of like CouchDB and many of the other distributed key value stores, except that unfortunately it [...]]]></description>
			<content:encoded><![CDATA[I was reading yet another post from <a href="http://perspectives.mvdirona.com/2009/02/22/KeyValueStores.aspx">James Hamilton</a> that reminded me to blog about a weekend project I released earlier this week: an <a href="http://github.com/eliast/mongo-erlang-driver/tree/master">Erlang client</a> for <a href="http://www.mongodb.org/display/DOCS/Home">MongoDB</a>. MongoDB is written by the folks at <a href="http://www.10gen.com/">10gen</a>, and it&#8217;s kind of like CouchDB and many of the other distributed key-value stores, except that unfortunately it was only mentioned in the comments of <a href="http://www.metabrew.com/article/anti-rdbms-a-list-of-distributed-key-value-stores/">Richard&#8217;s post</a>. Anyway, because I&#8217;ve known <a href="http://blogs.codehaus.org/people/geir/">Geir</a> through friends in the Boston tech scene, I decided to take a crack at writing a client in Erlang for them. I had been following a couple of similar projects, <a href="http://code.google.com/p/erlang-mysql-driver/">erlang-mysql-driver</a> from the Erlang-prolific <a href="http://code.google.com/u/yarivvv/">Yariv</a> and <a href="http://blog.poundbang.in">Harish Mallipeddi</a>&#8217;s <a href="http://github.com/mallipeddi/tora/tree/master">Tora</a> project, a <a href="http://tokyocabinet.sourceforge.net/tyrantdoc/">Tokyo Tyrant</a> Erlang driver, so I was looking for an excuse. The good thing is that much of the scaffolding was already in place, so it was now a &#8220;simple&#8221; matter of reading the MongoDB <a href="http://www.mongodb.org/display/DOCS/Mongo+Wire+Protocol">wire protocol</a> and <a href="http://www.mongodb.org/display/DOCS/BSON">BSON binary format</a>, writing a few binary patterns, gen_servers, eunit tests and voila!
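<p>To give a feel for the binary layouts involved, here is a Python sketch (not the Erlang driver) of two pieces from the BSON and wire protocol docs: a BSON document is a little-endian 32-bit total length, a list of typed elements and a trailing 0x00 byte, and every wire message starts with a 16-byte header of four little-endian 32-bit integers. Only strings (type 0x02), 32-bit integers (0x10) and booleans (0x08) are covered; everything else is left out of this sketch.</p>

```python
def _int32(n):
    # BSON and the wire protocol use little-endian signed 32-bit integers.
    return n.to_bytes(4, "little", signed=True)


def _cstring(s):
    return s.encode("utf-8") + b"\x00"


def bson_encode(doc):
    """Encode a flat dict of str, bool and int32 values as a BSON document."""
    body = b""
    for key, value in doc.items():
        if isinstance(value, bool):  # test bool first: bool subclasses int
            body += b"\x08" + _cstring(key) + (b"\x01" if value else b"\x00")
        elif isinstance(value, int):
            body += b"\x10" + _cstring(key) + _int32(value)
        elif isinstance(value, str):
            data = value.encode("utf-8") + b"\x00"
            body += b"\x02" + _cstring(key) + _int32(len(data)) + data
        else:
            raise TypeError("unsupported type in this sketch: %r" % type(value))
    # The total length counts the 4-byte length prefix and the trailing 0x00.
    return _int32(len(body) + 5) + body + b"\x00"


def message_header(body_len, request_id, response_to, op_code):
    """16-byte MsgHeader: messageLength, requestID, responseTo, opCode."""
    return (_int32(16 + body_len) + _int32(request_id)
            + _int32(response_to) + _int32(op_code))
```

<p>For example, <code>message_header(len(body), request_id, 0, 2004)</code> would prefix an OP_QUERY body.</p>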

<p>My next step would be to do the same for <a href="http://project-voldemort.com/">Project Voldemort</a>, since I had already discussed it with Jay Kreps, but I am waiting for the Protocol Buffers implementation of their network protocol to be stable enough. It should be a one-nighter instead of two. Anybody interested in hacking on this one?</p>]]></content:encoded>
			<wfw:commentRss>http://torrez.us/archives/2009/02/24/609/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Scaling counts and uniques using Hadoop</title>
		<link>http://torrez.us/archives/2009/02/24/599/</link>
		<comments>http://torrez.us/archives/2009/02/24/599/#comments</comments>
		<pubDate>Tue, 24 Feb 2009 19:05:57 +0000</pubDate>
		<dc:creator>Elias Torres</dc:creator>
				<category><![CDATA[Lookery]]></category>

		<guid isPermaLink="false">http://torrez.us/?p=599</guid>
		<description><![CDATA[Reading James Hamilton&#8217;s post on Google App Engine reminded me to blog about a quick solution to a memory problem I was having in one of our Hadoop jobs. In the analytics world, every metric usually comes with two numbers: counts and uniques. At Lookery we use Hadoop to process all of the data from [...]]]></description>
			<content:encoded><![CDATA[Reading James Hamilton&#8217;s <a href="http://perspectives.mvdirona.com/2009/02/23/BuildingScalableWebAppsWithGoogleAppEngine.aspx">post on Google App Engine</a> reminded me to blog about a quick solution to a memory problem I was having in one of our Hadoop jobs. In the analytics world, every metric usually comes with two numbers: counts and uniques. At Lookery we use <a href="http://hadoop.apache.org/">Hadoop</a> to process all of the data from our logs and <a href="http://www.audioscrobbler.net/development/dumbo/">Dumbo</a> to quickly write new jobs using Hadoop Streaming and Python. Furthermore, we started using the <a href="http://hadoop.apache.org/core/docs/r0.15.2/api/org/apache/hadoop/mapred/lib/aggregate/package-summary.html">Hadoop Aggregate package</a> for efficiency, since we try to keep as much of the combiner/reducer work as possible inside the JVM. Everything was going great until I began to get out-of-memory errors while performing a <a href="http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/lib/aggregate/UniqValueCount.html">UniqValueCount</a>. The solution was as simple and elegant as Brett Slatkin&#8217;s suggestion for counters in Google App Engine:

<blockquote>For example, Brett shows how to implement a scalable counter and (nearly) ordered comments using App Engine Megastore. For the former, shard the counter to get write scale and sum them on read.</blockquote>
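
<p>In case the quoted pattern is unfamiliar: a sharded counter spreads increments over several independent sub-counters and only adds them up when someone reads the total. The following is an in-memory Python illustration, not App Engine datastore code; the ShardedCounter class and the choice of 8 shards are assumptions made for the example.</p>

```python
import random

class ShardedCounter:
    # Hypothetical in-memory stand-in; on App Engine each shard would be
    # its own datastore entity so concurrent writers rarely collide.
    def __init__(self, num_shards=20):
        self.shards = [0] * num_shards

    def increment(self, amount=1):
        # Write scale: each increment touches one randomly chosen shard
        self.shards[random.randrange(len(self.shards))] += amount

    def value(self):
        # Sum on read: the total is the sum of every shard
        return sum(self.shards)

counter = ShardedCounter(num_shards=8)
for _ in range(1000):
    counter.increment()
print(counter.value())  # 1000
```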

<p>Imagine millions of log lines containing &#8216;timestamp+TAB+userid&#8217;; the goal is to count how many calls we saw in total and how many unique users we saw during that timeframe. With the Hadoop Aggregate package, all you have to do is emit records like &#8220;LongValueSum:Key+TAB+Count&#8221; or &#8220;UniqValueCount:Key+TAB+Value&#8221;, and Hadoop will use a built-in combiner/reducer to return either &#8220;Key+TAB+TotalSum&#8221; or &#8220;Key+TAB+UniqueCount&#8221;. But if Hadoop throws out-of-memory exceptions while performing unique counts, all you have to do is &#8220;Shard It!&#8221; Simply put, hash your value and extend the key to include the partition. Because a given value always hashes to the same partition, no value is counted in two partitions, so the per-partition unique counts add up to the overall unique count. I use a &#8216;#&#8217; separator to mark keys I have sharded, and a second iteration of the job then uses LongValueSum to add up all of the individual partitions into the final count. Below is a simplified Python+Dumbo script that shows the idea. In production, you just let Hadoop take over the combiner/reducer functions.</p>
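
<p>The two-pass logic can be checked on a single machine without Hadoop or Dumbo. This Python simulation is a sketch, not the production job: NUM_SHARDS = 16, the function names, and zlib.crc32 as the stable hash are all assumptions. Pass one builds a set of users per "N#Uniques" key; pass two strips the shard prefix and sums the partial counts.</p>

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 16  # arbitrary partition count for this sketch

def shard_of(userid):
    # Stable hash, so every occurrence of a user maps to the same shard
    return zlib.crc32(userid.encode("utf-8")) % NUM_SHARDS

def count_and_uniques(lines):
    # Hypothetical helper simulating both Dumbo iterations in memory
    total = 0
    shard_sets = defaultdict(set)
    # Pass 1: LongValueSum for the total, UniqValueCount per "N#Uniques" key
    for line in lines:
        timestamp, userid = line.split("\t")
        total += 1
        shard_sets["%d#Uniques" % shard_of(userid)].add(userid)
    # Pass 2: strip the "N#" prefix and LongValueSum the per-shard counts
    sums = defaultdict(int)
    for key, users in shard_sets.items():
        _, base = key.split("#", 1)
        sums[base] += len(users)
    return total, sums["Uniques"]

log = ["1\talice", "2\tbob", "3\talice", "4\tcarol", "5\tbob"]
print(count_and_uniques(log))  # (5, 3)
```

<p>No single set ever has to hold every user, which is the whole point: each shard&#8217;s set is roughly 1/NUM_SHARDS of the total.</p>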

<p>Beware, though, that I was too lazy to track down the actual issue in Hadoop, since it shouldn&#8217;t need much memory to perform a unique count if the values are sorted, but at least I solved my problem and got a blog post out of it.</p>
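
<p>Why sorting matters: once values are sorted, equal values sit next to each other, so counting distinct values only means counting the points where the value changes, and no set is needed. A minimal Python sketch of that idea (the function name is hypothetical, and this is not taken from Hadoop&#8217;s UniqValueCount; note that Hadoop sorts map output by key, not by value, so getting sorted values into a reducer would take something like a secondary sort):</p>

```python
from itertools import groupby

def count_uniques_sorted(values):
    # Hypothetical helper: values must already be sorted.
    # groupby yields one group per run of equal adjacent values, and we
    # never materialize the groups, so memory use stays constant.
    return sum(1 for _ in groupby(values))

print(count_uniques_sorted(iter(["alice", "alice", "bob", "carol", "carol"])))  # 3
```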

<code>
<pre>
#!/usr/bin/env python
import zlib

# 32768 partitions for the sharded unique counts
NUM_SHARDS = 2 &lt;&lt; 14

def mapper1(key, value):
    # Input lines look like "timestamp\tuserid"
    timestamp, userid = value.split('\t')
    yield "LongValueSum:TotalCount", 1
    # Stable hash so a user always lands in the same partition across mappers
    shard = zlib.crc32(userid.encode('utf-8')) % NUM_SHARDS
    yield "UniqValueCount:%d#Uniques" % (shard,), userid

def mapper2(key, value):
    # In a chained Dumbo job, this mapper receives the first
    # iteration's (key, value) pairs directly
    if "#" in key:
        # Drop the partition prefix so every shard sums into one key
        shard, key = key.split('#', 1)
    yield "LongValueSum:%s" % (key,), int(value)

def aggregate_combiner(key, values):
    agg, subkey = key.split(":", 1)
    if agg == 'LongValueSum':
        yield key, sum(values)
    elif agg == 'UniqValueCount':
        # Pass along each distinct value once to shrink the shuffle
        for value in set(values):
            yield key, value

def aggregate_reducer(key, values):
    agg, subkey = key.split(":", 1)
    if agg == 'LongValueSum':
        yield subkey, sum(values)
    elif agg == 'UniqValueCount':
        yield subkey, len(set(values))

if __name__ == "__main__":
    import dumbo
    job = dumbo.Job()
    job.additer(mapper1, aggregate_reducer, aggregate_combiner)
    job.additer(mapper2, aggregate_reducer, aggregate_combiner)
    job.run()
</pre>
</code>]]></content:encoded>
			<wfw:commentRss>http://torrez.us/archives/2009/02/24/599/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Fun with Fluid for monitoring load balancers</title>
		<link>http://torrez.us/archives/2009/02/24/593/</link>
		<comments>http://torrez.us/archives/2009/02/24/593/#comments</comments>
		<pubDate>Tue, 24 Feb 2009 18:14:08 +0000</pubDate>
		<dc:creator>Elias Torres</dc:creator>
				<category><![CDATA[Lookery]]></category>

		<guid isPermaLink="false">http://torrez.us/?p=593</guid>
		<description><![CDATA[I guess you could say that I&#8217;ve been on an admin/ops kick for the past week. Here&#8217;s another quick thing I did recently to monitor HAProxy status pages. We use a number of HAProxy instances to load balance across different services at Lookery, so like my previous problem, I want to look at several [...]]]></description>
			<content:encoded><![CDATA[I guess you could say that I&#8217;ve been on an admin/ops kick for the past week. Here&#8217;s another quick thing I did recently to monitor <a href="http://haproxy.1wt.eu/">HAProxy</a> <a href="http://demo.1wt.eu/">status pages</a>. We use a number of HAProxy instances to load balance across different services at Lookery, so, as with my <a href="http://torrez.us/archives/2009/02/24/579/">previous problem</a>, I wanted to look at the status of several load balancers at the same time. This time I took a different approach: I wanted HTML and pretty colors.

<ul>
	<li>If necessary, set up a shell script to create SSH tunnels to all of your load balancers</li>
	<li>Create a local HTML file with iframe elements pointing to each of the tunnels</li>
	<li>Hack on the CSS for 20 minutes until you reach the layout you want (I used overflow:hidden, position:absolute and z-index to get the desired look)</li>
	<li>Install <a href="http://fluidapp.com/">Fluid</a> and create a new application pointing to your local HTML file</li>
	<li>Make sure you uncheck the &#8216;Fluid attempts to show badge labels&#8217; preference in YourFluid.app so it doesn&#8217;t conflict with your own userscript</li>
	<li>Create a new userscript to update the badge label so it includes current activity if above a certain threshold</li>
	<li>Visit the <a href="http://www.flickr.com/groups/fluid_icons/pool/">Flickr Fluid Icon Pool</a> to pick a cool transparent icon for your dock</li>
</ul>
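
<p>The userscript itself is JavaScript running inside Fluid, but its threshold logic is easy to illustrate on its own. The rough Python sketch below reads HAProxy&#8217;s CSV stats export (served when you append ;csv to the stats URI) instead of scraping the HTML status page. The function name, the default threshold of 50, and the trimmed sample data are all made up for the example.</p>

```python
import csv
import io

def badge_label(csv_text, threshold=50):
    # Hypothetical helper. HAProxy's CSV export starts with a
    # "# pxname,svname,..." header; drop the leading "# " so
    # csv.DictReader sees plain column names.
    if csv_text.startswith("# "):
        csv_text = csv_text[2:]
    rows = csv.DictReader(io.StringIO(csv_text))
    # Sum current sessions (scur) over the BACKEND summary rows only,
    # treating empty fields as zero.
    active = sum(int(row["scur"] or 0)
                 for row in rows if row["svname"] == "BACKEND")
    # Show a badge only when activity reaches the threshold.
    return str(active) if active >= threshold else ""

# Trimmed sample with only the first few CSV columns (made-up numbers).
sample = (
    "# pxname,svname,qcur,qmax,scur,smax\n"
    "web,FRONTEND,,,40,90\n"
    "web,BACKEND,0,0,35,80\n"
    "api,BACKEND,0,0,30,60\n"
)
print(badge_label(sample))  # 65
```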

<p><a href="http://flickr.com/photos/eliast/3306312609/"><img alt="" src="http://farm4.static.flickr.com/3624/3306312609_3f3a6a3ba8.jpg" title="Load Balancer Monitoring using Fluid" class="alignnone" width="500" height="285" /></a></p>

<p>At first I tried using Dashboard and the Web Clip function in Safari to create a widget for each server so I wouldn&#8217;t have to lay them out myself, but the widgets kept resizing themselves and I wasn&#8217;t getting the dock badge labels. But that&#8217;s up to you and your skillz.</p>]]></content:encoded>
			<wfw:commentRss>http://torrez.us/archives/2009/02/24/593/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

