The road so far….

October 29, 2012

Let's Crunch big data

Filed under: java — Rahul Sharma @ 2:39 pm

As developers, we focus on simple, effective solutions, and one of our most valued principles is “Keep it simple, stupid” (KISS). With Hadoop MapReduce, however, this was hard to stick to. When data is processed across multiple MapReduce jobs, we end up with code that deals with infrastructure rather than with the business problem. Most non-trivial business data processing involves quite a few MapReduce jobs, which means longer development times and solutions that are harder to test.

Google presented a solution to these issues in its FlumeJava paper, and Apache Crunch is an implementation of the ideas from that paper. In a nutshell, Crunch is a Java library that simplifies the development of MapReduce pipelines. It provides a set of lazily evaluated collections: operations on them are only recorded at first, and are later executed in the form of MapReduce jobs.
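To make the idea of a lazily evaluated collection concrete, here is a small in-memory sketch in plain Java. It is a toy illustration of deferred execution only: `LazyCollection`, `parallelDo`, `count` and `stagesRun` below are hypothetical names chosen to echo Crunch's vocabulary, not Crunch's actual API (in Crunch the corresponding types are `Pipeline`, `PCollection` and `DoFn`, and the work is triggered by `Pipeline.run()` or `done()`). Calling `parallelDo` only extends a plan; nothing runs until `count()` asks for a result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Supplier;

/** Toy illustration of deferred evaluation; hypothetical names, NOT the Crunch API. */
public class LazyPipelineDemo {

    // Counts how many transformation stages have actually executed.
    static int stagesRun = 0;

    static final class LazyCollection<T> {
        // Deferred computation that produces this collection's elements on demand.
        private final Supplier<List<T>> plan;

        private LazyCollection(Supplier<List<T>> plan) {
            this.plan = plan;
        }

        static <E> LazyCollection<E> of(List<E> data) {
            return new LazyCollection<>(() -> new ArrayList<>(data));
        }

        // Flat-map style transformation: each input may emit zero or more outputs.
        // This only wraps the existing plan; no element is processed here.
        <R> LazyCollection<R> parallelDo(Function<T, List<R>> fn) {
            return new LazyCollection<>(() -> {
                stagesRun++;
                List<R> out = new ArrayList<>();
                for (T item : plan.get()) {
                    out.addAll(fn.apply(item));
                }
                return out;
            });
        }

        // Forces evaluation of the whole chain, then counts equal elements
        // (insertion order of first occurrence is preserved).
        Map<T, Long> count() {
            Map<T, Long> counts = new LinkedHashMap<>();
            for (T item : plan.get()) {
                counts.merge(item, 1L, Long::sum);
            }
            return counts;
        }
    }

    public static void main(String[] args) {
        LazyCollection<String> lines = LazyCollection.of(Arrays.asList("to be or", "not to be"));
        LazyCollection<String> words = lines.parallelDo(line -> Arrays.asList(line.split(" ")));
        System.out.println("stages run before count(): " + stagesRun); // 0
        Map<String, Long> counts = words.count();
        System.out.println("stages run after count(): " + stagesRun);  // 1
        System.out.println(counts); // {to=2, be=2, or=1, not=1}
    }
}
```

The point of deferring work is that the library sees the whole chain of operations before running anything. The FlumeJava paper uses that view to optimize the plan, for example by fusing consecutive parallel operations into fewer MapReduce jobs; this toy does no such planning and simply re-runs the chain every time `count()` is called.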

Back in action

Filed under: java — Rahul Sharma @ 2:37 pm

It has been quite some time since I last blogged. My last post was around a year ago, and a lot has happened since then. I have learned new tricks of the trade in Salesforce, Maven practices, Camel, Solr, Hadoop MapReduce, Netty, Javassist, Crunch, etc. There are so many things to talk about and so little time to write about them. Hopefully I will do them justice.