The road so far….

October 29, 2012

Let's Crunch big data

Filed under: java — Tags: , — Rahul Sharma @ 2:39 pm

As developers, our focus is on simple, effective solutions, and so one of our most valued principles is "Keep it simple, stupid". With Hadoop MapReduce, though, this was a bit hard to stick to. If we evaluate data across multiple MapReduce jobs, we end up with code that is related less to the business problem and more to infrastructure. Most non-trivial business data processing involves quite a few MapReduce tasks, which means longer development times and solutions that are harder to test.

Google presented a solution to these issues in their FlumeJava paper, and Apache Crunch is an implementation based on that paper. In a nutshell, Crunch is a Java library that simplifies the development of MapReduce pipelines. It provides a set of lazily evaluated collections whose operations are executed as MapReduce jobs. (more…)
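To make "lazily evaluated collections" a little more concrete, here is a minimal plain-Java sketch of the idea. It does not use Crunch's API: `LazyCollection`, `words` and `count` are hypothetical names made up for this illustration, and everything runs in memory rather than as MapReduce jobs. The point is only that chaining operations records a plan, and no work happens until a result is requested.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.Supplier;

class LazyPipelineSketch {

    // Hypothetical stand-in for a lazily evaluated collection (NOT Crunch's API).
    static final class LazyCollection<T> {
        private final Supplier<List<T>> plan; // deferred computation

        private LazyCollection(Supplier<List<T>> plan) {
            this.plan = plan;
        }

        static <T> LazyCollection<T> of(List<T> source) {
            return new LazyCollection<T>(() -> source);
        }

        // Records a transformation; nothing is computed at this point.
        <R> LazyCollection<R> flatMap(Function<T, List<R>> fn) {
            Supplier<List<T>> upstream = this.plan;
            return new LazyCollection<R>(() -> {
                List<R> out = new ArrayList<>();
                for (T item : upstream.get()) {
                    out.addAll(fn.apply(item));
                }
                return out;
            });
        }

        // Runs the whole recorded plan and counts occurrences of each element.
        Map<T, Long> count() {
            Map<T, Long> counts = new HashMap<>();
            for (T item : plan.get()) {
                counts.merge(item, 1L, Long::sum);
            }
            return counts;
        }
    }

    // Builds (but does not run) a plan that splits lines into words.
    // splitCalls only exists so that we can observe when the work happens.
    static LazyCollection<String> words(List<String> lines, AtomicInteger splitCalls) {
        return LazyCollection.of(lines).flatMap(line -> {
            splitCalls.incrementAndGet();
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                return Collections.<String>emptyList();
            }
            return Arrays.asList(trimmed.split("\\s+"));
        });
    }

    public static void main(String[] args) {
        AtomicInteger splitCalls = new AtomicInteger();
        LazyCollection<String> words =
                words(Arrays.asList("big data", "crunch big data"), splitCalls);
        System.out.println("lines split before count(): " + splitCalls.get());
        Map<String, Long> counts = words.count();
        System.out.println("lines split after count(): " + splitCalls.get());
        System.out.println("count of \"big\": " + counts.get("big"));
    }
}
```

In this sketch the split function is not called while the pipeline is being assembled; it runs only once `count()` asks for a result. A real library built on this idea can use that delay to decide how the recorded operations should be grouped into jobs.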

June 1, 2010

MapReduce and The Cloud

Filed under: java — Tags: , , — Rahul Sharma @ 8:50 pm

The other day I was discussing the spring-vmforce-GAE cloud platform, which is yet to be launched, with one of my colleagues, and the discussion led to MapReduce as a cloud platform. There seems to be a perception that Hadoop MapReduce is some kind of competing cloud platform. (more…)