Facebook Publishes Super Nerdy Big Data Engineering Blog Post To Attract Hardcore Coders 

100 petabyte clusters! 60,000 Hive queries a day! Facebook’s latest 1,800-word engineering blog post has one goal: proving to the world’s top programmers that if they want a challenge, they should work for the social network. There’s not much here for the layman beyond the fact that Facebook’s data warehouse is 2,500 times bigger than it was in 2008. This is back-end geek porn, and it’s critical to Facebook’s long-term success.
Facebook has the same talent retention problem as any tech startup that goes public. Without the massive upside of a little stock potentially being worth a lot of money one day, getting the best coders, designers, product visionaries, and biz whizzes to come aboard or not jump ship is tough.
There’s the lure of founding a company and calling the shots. There’s the excitement of joining an ass-kicking little startup as it hits its hockey stick. If Facebook can’t outshine those, it could stagnate in its maturity and become more vulnerable to disruption.
But Facebook has one thing young startups don’t have. Or should I say one billion things. Its massive user base means that what it builds seriously influences the world, and it’s trying to solve engineering problems on the forefront of computer science. At first glance, though, it might just seem like another consumer product. That’s why it needs blog posts like “Under the Hood: Scheduling MapReduce jobs more efficiently with Corona”.
The note details the limits of the Hadoop MapReduce scheduling framework, and how Facebook built its own replacement, Corona, to surpass them. Facebook has open-sourced Corona, and it’s now on GitHub. The benefits include dropping slot refill times from 10 seconds with MapReduce to just 600 milliseconds, cutting job latency in half, and better cluster utilization and scheduling fairness. I’m not going to paraphrase any further, so if that stuff fascinates you, read the post.
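To get an intuition for why slot refill time matters: in Hadoop's classic pull model, a worker reports a freed task slot only on its next periodic heartbeat, so on average a freed slot sits idle for half the heartbeat interval before the scheduler can refill it. A push-style scheduler like Corona can assign work almost as soon as a slot frees. The toy simulation below is a minimal sketch of that averaging argument; the function name, the 10-second interval, and the 50 ms push latency are illustrative assumptions, not measurements from Facebook's post.

```python
import random

def avg_refill_latency(heartbeat_s, n=100_000, seed=0):
    """Mean time a freed slot waits before the scheduler learns of it,
    assuming slots free at uniformly random points within a heartbeat cycle
    (pull model: the slot waits until the next heartbeat tick)."""
    rng = random.Random(seed)
    return sum(heartbeat_s - rng.uniform(0, heartbeat_s) for _ in range(n)) / n

# Pull model with a ~10 s heartbeat: slots idle ~5 s on average.
pull = avg_refill_latency(10.0)

# Push model: latency is roughly a network round trip (illustrative constant).
push = 0.05

print(f"pull-model mean idle time: {pull:.2f} s")
print(f"push-model idle time:      {push:.2f} s")
```

The half-interval average explains why shrinking the heartbeat alone doesn't close the gap: polling faster burns scheduler throughput, while pushing work on slot-free events removes the waiting entirely.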
Facebook has been publishing engineering blog posts for years, but the Under The Hood series started right about when it filed to IPO. Its older engineering posts were more about the human story of building Facebook’s back-end; they seem to have gotten more hardcore since the company went public. And that’s smart, because Facebook no longer has the financial windfall of a rapidly rising valuation to attract engineers.
Facebook must show it is a riddle, wrapped in a mystery, inside an enigma, because that’s what gets great programmers fired up.
