A trillion…anything…in your Hadoop cluster is cool

My colleague Vinay Goel is running some Hadoop jobs on the link graph of a recent web-wide crawl of ours, a dataset similar to the 80TB web-wide crawl we made available for research purposes.

For one of his jobs, which processes all of the links found on all the archived pages, the Hadoop JobTracker statistics page reads:

Map output records | 1,127,187,232,984
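The post doesn't show the job's code, so the following is only a minimal sketch of how a counter like that can reach the trillions. It assumes, hypothetically, that the mapper emits exactly one record per link found on a page, keyed by the link's target (as an inverted link-graph job typically would). Under that assumption, "Map output records" equals the total number of links in the crawl. The names `map_links` and `run_map_phase` and the `(page_url, outlinks)` input shape are illustrative, and this is plain Python rather than Hadoop's Java API.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical input record: (page_url, [outlink_url, ...]) for one archived page.
Page = Tuple[str, List[str]]


def map_links(page_url: str, outlinks: List[str]) -> Iterator[Tuple[str, str]]:
    """Assumed mapper: emit one (target, source) record per link on the page."""
    for target in outlinks:
        yield target, page_url


def run_map_phase(pages: Iterable[Page]) -> Tuple[List[Tuple[str, str]], int]:
    """Run the mapper over every page and count emitted records,
    mimicking Hadoop's "Map output records" counter."""
    outputs: List[Tuple[str, str]] = []
    map_output_records = 0
    for url, links in pages:
        for record in map_links(url, links):
            outputs.append(record)
            map_output_records += 1  # one increment per emitted record
    return outputs, map_output_records


if __name__ == "__main__":
    pages = [
        ("http://a.example/", ["http://b.example/", "http://c.example/"]),
        ("http://b.example/", ["http://a.example/"]),
        ("http://c.example/", []),
    ]
    records, count = run_map_phase(pages)
    print(count)  # 3 links in total, so 3 map output records
```

With one record per link, the counter grows linearly with the number of links, which is why a crawl-wide link graph can push it past a trillion.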

Seeing our Hadoop cluster process more than a trillion of anything is pretty cool.
