Yahoo is one of the top search engine websites on the internet today, and after Google it is one of the best-known search engines. Yahoo also provides many services to its users and builds many business intelligence applications using a range of technologies. Hadoop is one of the major technologies in use at Yahoo. In this post I will explain how Yahoo uses Hadoop in real time.
Yahoo is the birthplace of Hadoop, and when it comes to the size of its Hadoop clusters, Yahoo beats everyone, with 42,000 nodes across about 20 YARN (aka MapReduce 2.0) clusters and 600 petabytes of data on HDFS.
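To get a feel for what those numbers mean per cluster and per machine, here is a small back-of-the-envelope sketch in Python. It uses only the figures quoted above and gives plain averages; the post does not say how nodes are actually spread across clusters or whether the 600 petabytes is raw or replicated storage, so the function names and the decimal unit conversion (1 PB = 1000 TB) are assumptions made for illustration.

```python
# Figures quoted in the post; treat them as approximate.
TOTAL_NODES = 42_000
CLUSTER_COUNT = 20
HDFS_PETABYTES = 600

# Assumption: decimal units, 1 PB = 1000 TB.
TB_PER_PB = 1_000


def average_nodes_per_cluster(total_nodes, cluster_count):
    """Mean cluster size if nodes were spread evenly (they may not be)."""
    return total_nodes / cluster_count


def average_terabytes_per_node(petabytes, total_nodes):
    """Mean HDFS data per node, ignoring replication and skew."""
    return petabytes * TB_PER_PB / total_nodes


nodes_per_cluster = average_nodes_per_cluster(TOTAL_NODES, CLUSTER_COUNT)
tb_per_node = average_terabytes_per_node(HDFS_PETABYTES, TOTAL_NODES)

print(f"Average nodes per cluster: {nodes_per_cluster:.0f}")
print(f"Average HDFS data per node: {tb_per_node:.1f} TB")
```

So on average each cluster would have roughly two thousand nodes, and each node would hold on the order of fourteen terabytes of HDFS data. Real clusters are unlikely to be this uniform.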
Yahoo uses Hadoop to block around 20.5 billion messages, checking them before they are allowed to enter its email servers. Yahoo’s spam detection abilities have improved many times over since it started using Hadoop.
In the ever-growing Hadoop family, Yahoo has been one of the major contributors.
Yahoo has pioneered many new technologies that have since become part of the Hadoop ecosystem.
A few notable technologies that Yahoo uses apart from MapReduce and HDFS are Apache Tez and Spark.
One of the main vehicles of Yahoo’s Hadoop chariot is Apache Pig, which started at Yahoo and still tops the chart: 50-60 percent of jobs are processed using Pig scripts.
Yahoo’s primary Hadoop users are data analysts and data scientists, who use the technology’s large-scale analytics horsepower to optimize many customer-facing business processes.
These include optimizing ad placement, improving real-time customer experiences, targeting offers through behavioral analysis, and performing high-volume content mining and transformation.
Business process professionals who are considering exploring the connection between big data and business process transformation should follow Yahoo’s lead in setting up the technology as a shared enterprise component with robust availability, administration, and resource allocation across disparate business processes.