The road so far….

June 1, 2010

MapReduce and The Cloud

Filed under: java | Rahul Sharma @ 8:50 pm

The other day I was discussing the yet-to-be-launched spring-vmforce-GAE cloud platform with one of my colleagues, and the discussion turned to MapReduce as a cloud platform. There seems to be a perception that Hadoop MapReduce is some kind of competing cloud platform. In my opinion, MapReduce is a programming paradigm rather than a platform. Cloud computing, on the other hand, is a shared-resource platform, e.g. PaaS.

Cloud computing is about using the resources that various vendors already have in place, according to your needs. MapReduce can be deployed on such a platform so that the vast pool of resources is used as and when required. But not all vendors support this paradigm. I believe only Amazon supports it, via Elastic MapReduce on EC2, and correspondingly Hadoop supports using the Amazon cloud for storage via the S3FileSystem API. It would be interesting if other vendors like GAE or the new spring-vmforce-GAE cloud platform added some support for such a paradigm.

MapReduce is a programming model that Google introduced in its paper on data-intensive jobs. It is possible to build an application on the Google MapReduce model if your architecture has the following characteristics:

  • It should support master-worker data structures, i.e. the master keeps track of each task and the worker it is assigned to.
  • It should be fault tolerant, i.e. it should cope with failures in various places, e.g. the master or a worker.
  • It should support data locality, i.e. the data for a task should be on the node on which the task is running.
  • It should be able to break a job into granular tasks.
  • It should support backup tasks.

These are just the prerequisites for an architecture to enable development of a MapReduce application; the Google paper also describes further characteristics that are desirable for building better MapReduce applications.
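
To make two of these prerequisites concrete, here is a minimal single-JVM sketch of a master that breaks a job into granular tasks and re-executes a task when its worker fails. It is only an illustration under my own assumptions: the class name `RetryingMaster`, the methods `split` and `runAll`, and the parameters `workers` and `maxAttempts` are hypothetical and are not taken from the Google paper or from Hadoop. A thread pool stands in for the worker machines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical master: splits a job into granular tasks and re-runs failed ones.
final class RetryingMaster {

    // Break the input into chunks of at most chunkSize elements (the granular tasks).
    static <T> List<List<T>> split(List<T> input, int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be at least 1");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < input.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, input.size());
            chunks.add(new ArrayList<>(input.subList(start, end)));
        }
        return chunks;
    }

    // Run every chunk on a pool of worker threads. A chunk whose task throws is
    // resubmitted until it succeeds or maxAttempts is reached.
    static <T, R> List<R> runAll(List<List<T>> chunks, Function<List<T>, R> task,
                                 int workers, int maxAttempts) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            // Submit all tasks up front so they can run in parallel.
            List<Future<R>> inFlight = new ArrayList<>();
            for (List<T> chunk : chunks) {
                Callable<R> call = () -> task.apply(chunk);
                inFlight.add(pool.submit(call));
            }
            // Collect results in chunk order, rescheduling failures.
            List<R> results = new ArrayList<>();
            for (int i = 0; i < chunks.size(); i++) {
                List<T> chunk = chunks.get(i);
                Future<R> future = inFlight.get(i);
                for (int attempt = 1; ; attempt++) {
                    try {
                        results.add(future.get());
                        break;
                    } catch (ExecutionException failure) {
                        if (attempt >= maxAttempts) {
                            throw new IllegalStateException("task " + i + " failed", failure.getCause());
                        }
                        Callable<R> retry = () -> task.apply(chunk);
                        future = pool.submit(retry);
                    } catch (InterruptedException interrupted) {
                        Thread.currentThread().interrupt();
                        throw new IllegalStateException("master interrupted", interrupted);
                    }
                }
            }
            return results;
        } finally {
            pool.shutdownNow();
        }
    }
}
```

For example, `RetryingMaster.runAll(RetryingMaster.split(numbers, 3), chunk -> chunk.size(), 2, 3)` would count the elements of each chunk on two worker threads. The sketch does not model data locality, backup tasks for slow workers, or failure of the master itself; those need support from the underlying architecture.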

Google used this model to analyze the data available on the internet. The same model has been implemented in Apache Hadoop and has been used by Yahoo! to analyze and process data. You can also build your own MapReduce implementation and use it, provided your architecture gives you enough support.
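
As a rough idea of what building your own model could look like, the sketch below runs the map, shuffle and reduce phases inside a single JVM. It only shows the programming model and not a distributed implementation; the class `MiniMapReduce` and its methods `run` and `wordCount` are names I made up for illustration and are not part of Hadoop or any other library.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical single-JVM illustration of the map -> shuffle -> reduce flow.
final class MiniMapReduce {

    // Run one job: map each input to (key, value) pairs, group the pairs by key,
    // then reduce each group to a single result.
    static <I, K extends Comparable<K>, V, R> Map<K, R> run(
            List<I> inputs,
            Function<I, List<Map.Entry<K, V>>> mapper,
            BiFunction<K, List<V>, R> reducer) {
        // Map and shuffle: collect all values emitted for the same key.
        Map<K, List<V>> grouped = new TreeMap<>();
        for (I input : inputs) {
            for (Map.Entry<K, V> pair : mapper.apply(input)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        // Reduce: fold the values of each key into one result.
        Map<K, R> results = new TreeMap<>();
        for (Map.Entry<K, List<V>> group : grouped.entrySet()) {
            results.put(group.getKey(), reducer.apply(group.getKey(), group.getValue()));
        }
        return results;
    }

    // Word count: the mapper emits (word, 1), the reducer sums the ones.
    static Map<String, Integer> wordCount(List<String> lines) {
        return MiniMapReduce.<String, String, Integer, Integer>run(
                lines,
                line -> {
                    List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
                    for (String word : line.toLowerCase().split("\\s+")) {
                        if (!word.isEmpty()) {
                            pairs.add(new AbstractMap.SimpleImmutableEntry<>(word, 1));
                        }
                    }
                    return pairs;
                },
                (word, counts) -> counts.stream().mapToInt(Integer::intValue).sum());
    }
}
```

Calling `MiniMapReduce.wordCount(Arrays.asList("the cloud and the grid", "The cloud"))` counts "the" three times and "cloud" twice. In a real deployment the mapper and reducer calls would be spread over many nodes, which is where the architectural prerequisites listed earlier come in.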

Besides Apache Hadoop, Sun JavaSpaces and the corresponding Space-Based Architecture also, in my opinion, support this kind of paradigm. There are vendor-specific implementations of the JavaSpaces specification, e.g. GigaSpaces, and corresponding MapReduce frameworks have been developed on top of them. There are many more implementations of the MapReduce paradigm, and they are implementations of a paradigm, not cloud services.

References:

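Google MapReduce paper: Jeffrey Dean and Sanjay Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters", OSDI 2004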


1 Comment »

  1. Reblogged this on HadoopEssentials.

    Comment by Nitin Kumar — August 17, 2014 @ 7:20 am

