[Editor’s Note] This Throwback Thursday celebrates Teradata’s recent announcement making Presto enterprise-ready.
This coming fall, Facebook is set to release Presto as open source. For those not in the loop, Presto is not a consumer novelty or mobile accessory like Home – it is part of Facebook’s internal backend. The Presto engine is Facebook’s answer to its massive scale, capable of querying 250 petabytes of data.
Facebook, like many companies applying big data analytics, has maintained a Hadoop and Hive implementation – the largest in the world, in fact. But there has been one problem: Hadoop, geared towards batch processing and matrix-oriented workloads such as PageRank, does not map well to Facebook’s needs. Hence Presto, which replaces Hadoop’s batch layer for query execution (Hive remains the underlying data warehouse). Presto can handle all of the data under Facebook’s ownership and has been shown to execute queries eight to ten times faster than comparable Hive-on-Hadoop implementations.
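Part of that speedup is architectural: MapReduce-style engines materialize every intermediate result between job stages (on disk, in a real cluster), while Presto pipelines rows through in-memory operators. A toy Python sketch of the difference – illustrative only, not Presto’s actual code, with a made-up aggregation standing in for a real query:

```python
def staged_like_mapreduce(rows):
    """Materialize each stage fully before the next one starts,
    mimicking MapReduce jobs handing off intermediate data."""
    filtered = [r for r in rows if r % 2 == 0]  # stage 1: full pass, stored
    doubled = [r * 2 for r in filtered]         # stage 2: full pass, stored
    return sum(doubled)                         # stage 3: aggregate

def pipelined_like_presto(rows):
    """Stream each row through all operators in one fused in-memory pass,
    with no intermediate materialization."""
    return sum(r * 2 for r in rows if r % 2 == 0)

data = range(10)
# Both models compute the same answer; only the execution strategy differs.
assert staged_like_mapreduce(data) == pipelined_like_presto(data) == 40
```

In a single Python process the timing gap is negligible; at cluster scale, skipping the disk round-trips between stages is a large part of why an interactive engine like Presto feels so much faster than batch jobs.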
It is hard to tell at this point what Presto’s open-source release will mean for data science, however. Facebook’s needs are not necessarily those of every outfit dealing in big data: its workloads are primarily social-graph-based, and its scale of data is exceptionally large.
In addition to Hadoop and the soon-to-be-released Presto, the big data space also contains HPCC, a solution with origins at LexisNexis, one of the original players in electronic data search. Presto has also drawn comparisons to Cloudera Impala, even though the two are entirely separate technologies. Ultimately, the primary tools for big data analysis are multiplying and evolving – an excellent sign for the viability of big data.
For careers in Big Data, please visit our Career Portal.