Who is Nostradamus?
Nostradamus was a French seer famous for his predictions of the future.
As per the Bible, God created man (Mark Zuckerberg).
Man (Mark Zuckerberg) created Facebook.
The database and the information in it were tiny when he started. What follows after this quote is a summary of all the cloud-related technology that grew up around it.
MapReduce
MapReduce is a programming model used in distributed computing environments. A map step transforms the input data points into intermediate key-value pairs; after the computation and logical operations, a reduce step aggregates those pairs and produces the results.
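The map and reduce steps can be sketched in miniature on a single machine with the classic word-count example. This is a hedged illustration of the data flow only; real MapReduce frameworks such as Hadoop run many map and reduce tasks in parallel across a cluster, and the function names here are illustrative.

```python
# Single-machine sketch of the MapReduce data flow (illustration only;
# real frameworks distribute these phases across a cluster).
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield word, 1

def reduce_phase(pairs):
    """Reduce: group the emitted pairs by key and sum the counts per word."""
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

docs = ["big data store", "big data platform"]
word_counts = reduce_phase(map_phase(docs))
# word_counts == {'big': 2, 'data': 2, 'store': 1, 'platform': 1}
```

In a real cluster the shuffle between the two phases routes all pairs with the same key to the same reducer; here the single `reduce_phase` dictionary plays that role.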
HBase (database)
Apache HBase™ is the Hadoop database, a distributed, scalable, big data store.
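Conceptually, an HBase table behaves like a sorted, multi-level map: row key → column family → column qualifier → cell value. The toy sketch below uses plain Python dictionaries and assumed helper names (`put`, `get`, `scan`) to illustrate just that data model; real HBase additionally versions every cell with a timestamp and splits the sorted row space into regions served by different servers.

```python
# Hedged toy model of HBase's data model with nested dictionaries:
# row key -> column family -> column qualifier -> value.
# (Real HBase also timestamps cells and distributes rows across regions.)
table = {}

def put(row, family, qualifier, value):
    """Write one cell, creating the row/family maps on demand."""
    table.setdefault(row, {}).setdefault(family, {})[qualifier] = value

def get(row, family, qualifier):
    """Read one cell, or None if it is absent."""
    return table.get(row, {}).get(family, {}).get(qualifier)

def scan(start_row, stop_row):
    """Yield rows in sorted key order within [start_row, stop_row),
    mimicking an HBase range scan."""
    for row in sorted(table):
        if start_row <= row < stop_row:
            yield row, table[row]

put("user1", "info", "name", "Alice")
put("user2", "info", "name", "Bob")
# get("user1", "info", "name") -> 'Alice'
```

Keeping rows sorted by key is what makes cheap range scans possible, and it is the same property HBase relies on when it assigns contiguous key ranges to region servers.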
Hadoop and HDFS
Hadoop was created by Doug Cutting and Michael J. Cafarella. Doug, who was working at Yahoo at the time, named it after his son’s toy elephant. It was originally developed to support distribution for the Nutch search engine project.
Hadoop Architecture
Hadoop and its “Hadoop Distributed File System” (HDFS) form an open-source Java product.
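HDFS stores a file by splitting it into fixed-size blocks and replicating each block on several DataNodes (the defaults are a 128 MB block size and a replication factor of 3). The sketch below uses tiny made-up values so it runs instantly, and its round-robin placement is a deliberate simplification; real HDFS placement is rack-aware.

```python
# Hedged single-machine sketch of HDFS storage: split a file into
# fixed-size blocks and replicate each block on several DataNodes.
# Sizes and node names are made up; HDFS defaults to 128 MB blocks
# and a replication factor of 3.
BLOCK_SIZE = 8                      # stand-in for 128 MB
REPLICATION = 3
DATANODES = ["dn1", "dn2", "dn3", "dn4"]

def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Chop the file's bytes into consecutive fixed-size blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(num_blocks, datanodes=DATANODES, replication=REPLICATION):
    """Assign each block to `replication` DataNodes, round-robin.
    (Real HDFS placement is rack-aware, not round-robin.)"""
    return [[datanodes[(b + r) % len(datanodes)] for r in range(replication)]
            for b in range(num_blocks)]

data = b"hello hadoop distributed file system"
blocks = split_into_blocks(data)          # 36 bytes -> 5 blocks
placement = place_replicas(len(blocks))   # block 0 on dn1, dn2, dn3
```

The NameNode keeps exactly this kind of block-to-DataNode mapping as metadata, while the DataNodes hold the block contents themselves.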
Problems with Hive and others
“Historically, our data scientists and analysts have relied on Hive for data analysis,” Traverso said. “The problem with Hive is it’s designed for batch processing. We have other tools that are faster than Hive, but they’re either too limited in functionality or too simple to operate against our huge data warehouse. Over the past few months, we’ve been working on Presto to basically fill this gap.”
Why Presto?
Presto solves the Hive and related problems mentioned above.
Presto is an open-source project from Facebook.
According to the Netflix team:
We had been in search of an interactive querying engine that could work well for us. Ideally, we wanted an open source project that could handle our scale of data & processing needs, had great momentum, was well integrated with the Hive metastore, and was easy for us to integrate with our DW on S3. We were delighted when Facebook open sourced Presto.
In terms of scale, we have a 10 petabyte data warehouse on S3. Our users from different organizations query diverse data sets across expansive date ranges. For this use case, caching a specific dataset in memory would not work because cache hit rate would be extremely low unless we have an unreasonably large cache. The streaming DAG execution architecture of Presto is well-suited for this sporadic data exploration usage pattern.
In terms of integrating with our big data platform, Presto has a connector architecture that is Hadoop friendly. It allows us to easily plug in an S3 file system. We were up and running in test mode after only a month of work on the S3 file system connector in collaboration with Facebook.
Hope this helps