



Facebook Publishes Super Nerdy Big Data Engineering Blog Post To Attract Hardcore Coders 

100 petabyte clusters! 60,000 Hive queries a day! Facebook’s latest 1,800-word engineering blog post has one goal: proving to the world’s top programmers that if they want a challenge, they should work for the social network. There’s not much here for the layman beyond the fact that Facebook’s data warehouse is 2,500 times bigger than it was in 2008. This is back-end geek porn, and it’s critical to Facebook’s long-term success.
Facebook has the same talent retention problem as any tech startup that goes public. Without the massive upside of a little stock potentially being worth a lot of money one day, getting the best coders, designers, product visionaries, and biz whizzes to come aboard or not jump ship is tough.
There’s the lure of founding a company and calling the shots. There’s the excitement of joining an ass-kicking little startup as it hits its hockey stick. If Facebook can’t outshine those, it could stagnate in its maturity and become more vulnerable to disruption.
But Facebook has one thing young startups don’t have. Or should I say one billion things. Its massive user base means that what it builds seriously influences the world, and it’s trying to solve engineering problems at the forefront of computer science. At first glance, though, it might just seem like another consumer product. That’s why it needs blog posts like “Under the Hood: Scheduling MapReduce jobs more efficiently with Corona”.
The note details the limits of the Hadoop MapReduce scheduling framework, and how Facebook built its own replacement, Corona, to surpass them. Facebook has open-sourced Corona, and it’s now on GitHub. The benefits include dropping slot refill times from 10 seconds with MapReduce to just 600 milliseconds, cutting job latency in half, and better cluster utilization and scheduling fairness. I’m not going to paraphrase further, so if that stuff fascinates you, read the post.
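To see why slot refill takes so long in classic Hadoop, recall that its JobTracker hands out work only when a TaskTracker checks in via periodic heartbeat (pull), so a freed slot sits idle until the next heartbeat, while Corona’s scheduler pushes work to slots as they free up. Here is a toy Python sketch of that difference — not Facebook’s code, just an illustration assuming a 10-second heartbeat interval and the 600 ms push latency cited in the post:

```python
import math

# Toy model of pull-based (heartbeat) vs. push-based slot scheduling.
# Numbers are illustrative, taken from the figures in Facebook's post,
# not measured benchmarks.

HEARTBEAT_INTERVAL = 10.0  # seconds between TaskTracker heartbeats
PUSH_LATENCY = 0.6         # seconds for a Corona-style push assignment

def pull_refill_delay(slot_freed_at: float) -> float:
    """Time a freed slot waits until the next heartbeat picks it up."""
    next_heartbeat = math.ceil(slot_freed_at / HEARTBEAT_INTERVAL) * HEARTBEAT_INTERVAL
    return next_heartbeat - slot_freed_at

def push_refill_delay(slot_freed_at: float) -> float:
    """A push-based scheduler refills the slot almost immediately."""
    return PUSH_LATENCY

# A slot that frees up 1 second after a heartbeat idles ~9 seconds
# under pull-based scheduling, versus a fraction of a second under push.
print(pull_refill_delay(11.0))  # -> 9.0
print(push_refill_delay(11.0))  # -> 0.6
```

The worst-case pull delay is a full heartbeat interval, which is why shrinking the scheduling path itself, rather than tuning the interval, is the bigger win.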
Facebook has been publishing engineering blog posts for years, but the Under the Hood series started right around when it filed to IPO. Earlier engineering posts were more about the human story of building Facebook’s back-end; they seem to have gotten more hardcore since the company went public. And that’s smart, because it no longer has the financial windfall of a rapidly rising valuation to attract engineers.
Facebook must show it is a riddle, wrapped in a mystery, inside an enigma, because that’s what gets great programmers fired up.

