How Nostradamus Predicted Presto: An Engine for Querying 250 PB of Data


Who is Nostradamus?

Nostradamus was a French seer, famous for his predictions of the future.

As per the Bible, God created man (here, Mark Zuckerberg).



Man (Mark Zuckerberg) created Facebook.

When he started, the database and the amount of information were small. What follows is a summary of the cloud-related technology that made Facebook's scale possible.

MapReduce 

MapReduce is a programming model for distributed computing. A map phase transforms input records into intermediate key/value pairs across many machines; a reduce phase then aggregates the values for each key, after which the final results are produced.
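The two phases can be sketched in miniature with plain Python (the real thing runs across a cluster; this single-process word count is only an illustration of the model):

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.lower().split():
            yield (word, 1)

def reduce_phase(pairs):
    """Reduce: sum the counts emitted for each word."""
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

docs = ["the quick brown fox", "the lazy dog", "the fox"]
result = reduce_phase(map_phase(docs))
print(result)  # {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```

In a real Hadoop job, the pairs emitted by the mappers are shuffled across the network so that all pairs for a given key land on the same reducer.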

 

HBase (database)


Apache HBase™ is the Hadoop database: a distributed, scalable, big-data store.
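HBase's data model is essentially a sparse, sorted map keyed by row key and column family:qualifier (real HBase also versions each cell by timestamp). A toy Python sketch of that layout, purely for illustration:

```python
from collections import defaultdict

class TinyHBase:
    """Toy model of HBase's layout: row key -> {'family:qualifier': value}.
    Real HBase additionally versions each cell by timestamp and keeps
    rows sorted on disk."""
    def __init__(self):
        self.rows = defaultdict(dict)

    def put(self, row_key, family, qualifier, value):
        self.rows[row_key][f"{family}:{qualifier}"] = value

    def get(self, row_key, family, qualifier):
        return self.rows.get(row_key, {}).get(f"{family}:{qualifier}")

    def scan(self, start, stop):
        # Row keys are visited in sorted order, like an HBase range scan.
        for key in sorted(self.rows):
            if start <= key < stop:
                yield key, self.rows[key]

table = TinyHBase()
table.put("user#001", "info", "name", "alice")
table.put("user#002", "info", "name", "bob")
print(table.get("user#001", "info", "name"))  # alice
```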


 
Hadoop and HDFS
 
Hadoop was created by Doug Cutting and Michael J. Cafarella. Doug, who was working at Yahoo at the time, named it after his son's toy elephant. It was originally developed to support distribution for the Nutch search engine project.
[Figure: Hadoop cluster architecture]

Hadoop and its "Hadoop Distributed File System" (HDFS) are open-source Java products.
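The core idea of HDFS is simple: files are split into fixed-size blocks (128 MB by default in recent Hadoop versions) and each block is replicated across DataNodes. The splitting step can be sketched like this (the tiny block size is only for demonstration):

```python
def split_into_blocks(data: bytes, block_size: int = 128 * 1024 * 1024):
    """Split a byte stream into fixed-size blocks, the way HDFS splits
    files. In a real cluster, each block would then be replicated
    across several DataNodes for fault tolerance."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

# Tiny block size so the example is readable:
blocks = split_into_blocks(b"abcdefghij", block_size=4)
print(blocks)  # [b'abcd', b'efgh', b'ij']
```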

Problems with Hive and others

"Historically, our data scientists and analysts have relied on Hive for data analysis," Traverso said. "The problem with Hive is it's designed for batch processing. We have other tools that are faster than Hive, but they're either too limited in functionality or too simple to operate against our huge data warehouse. Over the past few months, we've been working on Presto to basically fill this gap."



Why PRESTO?

Presto solves the problems with Hive and the other tools mentioned above.
Presto is an open-source project from Facebook.
 
According to the Netflix team:
We had been in search of an interactive querying engine that could work well for us. Ideally, we wanted an open source project that could handle our scale of data & processing needs, had great momentum, was well integrated with the Hive metastore, and was easy for us to integrate with our DW on S3. We were delighted when Facebook open sourced Presto.
 
In terms of scale, we have a 10 petabyte data warehouse on S3. Our users from different organizations query diverse data sets across expansive date ranges. For this use case, caching a specific dataset in memory would not work because cache hit rate would be extremely low unless we have an unreasonably large cache. The streaming DAG execution architecture of Presto is well-suited for this sporadic data exploration usage pattern.
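The "streaming DAG execution" mentioned above means rows flow continuously through a pipeline of operators instead of each stage being materialized to disk, as happens between Hive's MapReduce jobs. Python generators give a loose single-machine analogy of that idea (this is an illustration of the concept, not Presto's actual implementation):

```python
# Analogy only: rows stream through pipelined operators one at a time,
# so no intermediate stage is ever fully materialized.

def scan(rows):
    for row in rows:            # source operator: emit rows lazily
        yield row

def filter_op(rows, predicate):
    for row in rows:            # rows pass through without buffering
        if predicate(row):
            yield row

def project(rows, column):
    for row in rows:            # keep only the requested column
        yield row[column]

table = [{"country": "US", "views": 10},
         {"country": "BR", "views": 7},
         {"country": "US", "views": 3}]

pipeline = project(filter_op(scan(table), lambda r: r["country"] == "US"),
                   "views")
total = sum(pipeline)
print(total)  # 13
```

Because each operator pulls one row at a time from the one before it, this pattern suits sporadic, exploratory queries over huge data where caching whole datasets in memory is impractical.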
 
In terms of integrating with our big data platform, Presto has a connector architecture that is Hadoop friendly. It allows us to easily plug in an S3 file system. We were up and running in test mode after only a month of work on the S3 file system connector in collaboration with Facebook.
 
Hope this helps

 
