FACEBOOK THRIFT
Thrift is a software library and a set of code-generation tools developed at
Facebook's office in Palo Alto, California, to expedite the development and
implementation of scalable, efficient backend services. The primary goal of
Thrift is to enable efficient and reliable communication across programming
languages by abstracting the portions of each language that tend to require the
most customization into a common library that is implemented in each language.
It does this by letting users define data types and service interfaces in a
single language-neutral Interface Definition Language file (IDL file), from
which Thrift generates all the code necessary to build Remote Procedure Call
(RPC) clients and servers. This report explains Thrift's design choices and
implementation details, and also tries to demonstrate a sample Thrift service.
The concept of Thrift grew out of the need for a new approach to the resource
demands of many of Facebook's on-site applications, which could not be met by
staying within the LAMP stack. LAMP is an acronym for Linux, Apache, MySQL and
PHP, and Facebook was originally built from the ground up on this stack. By
2006 Facebook had become a widely used social networking site, and its traffic
had grown accordingly, creating the need to scale many of its on-site
applications, such as search, ad selection and delivery, and event logging.
Scaling these operations to match their resource demands was not possible
within the LAMP framework. In implementing services such as search and event
logging, Facebook's engineers had selected various programming languages to
optimize for the right combination of performance, ease and speed of
development, availability of existing libraries, and so on. Facebook's
engineering culture has also long preferred choosing the best tools and
implementations over standardizing on any one programming language and
begrudgingly accepting its inherent limitations. Meanwhile, most existing
cross-language solutions either suffered from subpar performance or offered too
little freedom in data types. Given these technical challenges and design
choices, Facebook's engineers faced the herculean task of building a scalable,
transparent and high-performance bridge across many programming languages.
Thrift Design Features
The primary idea behind Thrift is a language-neutral software stack,
implemented across various programming languages, together with a
code-generation engine that transforms a simple interface and data definition
language into client and server remote procedure call libraries. Thrift is
designed to be as simple as possible for developers, who can define all the
data structures and interfaces for a complex service in a single short file.
This file is called the Thrift Interface Definition Language file, or Thrift
IDL file. While evaluating the technical challenges of cross-language
interaction in a networked environment, the Thrift developers identified the
following key components.
Types
A common type system must exist across programming languages without requiring
developers to write their own serialization code. Serialization is the process
of converting an in-memory object into a sequence of bytes (or another format)
that can be stored or transmitted and later reconstructed into an equivalent
object. For example, a C++ programmer should be able to exchange a strongly
typed STL map with a Python program that uses a dictionary, and neither
programmer should be forced to write any code below the application layer to
make this work. A dictionary is Python's built-in mapping type: it stores a
collection of values indexed by keys, much like an associative array.
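To make serialization concrete, here is a minimal Python sketch that turns a string-to-integer dictionary (the Python counterpart of a C++ STL map from strings to 32-bit integers) into bytes and back. The byte layout is a hypothetical format chosen for illustration: a 4-byte entry count, then for each entry a 4-byte key length, the UTF-8 key, and a 4-byte big-endian signed value. It is not Thrift's generated code, only a sketch of the kind of work Thrift spares the developer.

```python
import struct

# Hypothetical wire layout (assumption, for illustration only):
#   i32 count, then per entry: i32 key length, UTF-8 key bytes, i32 value
# All integers are big-endian, signed, 32-bit.

def serialize_map(d: dict[str, int]) -> bytes:
    """Encode a str -> i32 dictionary into bytes."""
    out = bytearray(struct.pack(">i", len(d)))
    for key, value in d.items():
        key_bytes = key.encode("utf-8")
        out += struct.pack(">i", len(key_bytes))
        out += key_bytes
        out += struct.pack(">i", value)  # struct.error if value does not fit in 32 bits
    return bytes(out)

def deserialize_map(data: bytes) -> dict[str, int]:
    """Decode bytes produced by serialize_map back into a native dictionary."""
    (count,) = struct.unpack_from(">i", data, 0)
    offset = 4
    result = {}
    for _ in range(count):
        (key_len,) = struct.unpack_from(">i", data, offset)
        offset += 4
        key = data[offset:offset + key_len].decode("utf-8")
        offset += key_len
        (value,) = struct.unpack_from(">i", data, offset)
        offset += 4
        result[key] = value
    return result

if __name__ == "__main__":
    ages = {"alice": 30, "bob": 25}
    wire = serialize_map(ages)
    print(deserialize_map(wire) == ages)  # True
```

The C++ side would only need to read the same layout into its own map type; agreeing on one layout is what lets each program keep its native container.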
Transport
Each language must have a common interface to bidirectional raw data transport.
Consider two servers, one written in Java and the other in Python. A service
running on the Java server should be able to send raw data through a common
interface that the Python server understands, and vice versa; the transport
layer simply moves the raw bytes between the two ends. The specifics of how
this transport is implemented should not matter to the service developer: the
same application code should be able to run against TCP stream sockets, raw
data in memory, or files on disk.
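The idea that application code should not care where bytes go can be sketched with a small abstract interface and two interchangeable implementations. The class names Transport, MemoryTransport and DiskTransport and the rewind method are hypothetical, not Thrift's API; rewind stands in for "the other end reads what was written". A socket-backed class could implement the same three methods.

```python
import io
import os
import tempfile
from abc import ABC, abstractmethod

class Transport(ABC):
    """Hypothetical minimal transport: moves raw bytes without interpreting them."""

    @abstractmethod
    def write(self, data: bytes) -> None: ...

    @abstractmethod
    def read(self, n: int) -> bytes: ...

    @abstractmethod
    def rewind(self) -> None:
        """Make previously written bytes readable (toy stand-in for a remote peer)."""

class MemoryTransport(Transport):
    """Hypothetical transport that keeps the bytes in an in-memory buffer."""

    def __init__(self) -> None:
        self._buf = io.BytesIO()

    def write(self, data: bytes) -> None:
        self._buf.write(data)

    def read(self, n: int) -> bytes:
        return self._buf.read(n)

    def rewind(self) -> None:
        self._buf.seek(0)

class DiskTransport(Transport):
    """Hypothetical transport that keeps the bytes in a file on disk."""

    def __init__(self, path: str) -> None:
        self._file = open(path, "w+b")

    def write(self, data: bytes) -> None:
        self._file.write(data)

    def read(self, n: int) -> bytes:
        return self._file.read(n)

    def rewind(self) -> None:
        self._file.flush()
        self._file.seek(0)

    def close(self) -> None:
        self._file.close()

def echo_roundtrip(transport: Transport, payload: bytes) -> bytes:
    """Application code: identical regardless of which transport it is given."""
    transport.write(payload)
    transport.rewind()
    return transport.read(len(payload))

if __name__ == "__main__":
    print(echo_roundtrip(MemoryTransport(), b"hello"))
    with tempfile.TemporaryDirectory() as tmp:
        disk = DiskTransport(os.path.join(tmp, "data.bin"))
        print(echo_roundtrip(disk, b"hello"))
        disk.close()
```

echo_roundtrip never mentions memory or files; swapping the storage is a one-line change at the call site.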
Protocol
Before raw data can be transported, it has to be encoded into a particular
format such as binary or XML. Thrift therefore pairs each transport with a
protocol, which encodes data structures into bytes written to the transport and
decodes them again on the other side. Again, the application developer need not
be concerned with these details; all that matters is that the data can be
written and read back in a deterministic manner.
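The split between protocol (how values become bytes) and transport (where the bytes go) can be sketched as two interchangeable encoders writing to the same in-memory buffer. BinaryProtocol and TextProtocol are hypothetical classes; the binary layout (big-endian 32-bit integers, strings as a 32-bit length followed by UTF-8 bytes) resembles Thrift's binary encoding but is stated here as an assumption, and the text layout is invented for illustration.

```python
import io
import struct

class BinaryProtocol:
    """Hypothetical binary encoding: big-endian i32; strings as i32 length + UTF-8 bytes."""

    def __init__(self, transport: io.BytesIO) -> None:
        self.t = transport

    def write_i32(self, value: int) -> None:
        self.t.write(struct.pack(">i", value))

    def read_i32(self) -> int:
        return struct.unpack(">i", self.t.read(4))[0]

    def write_string(self, s: str) -> None:
        data = s.encode("utf-8")
        self.write_i32(len(data))
        self.t.write(data)

    def read_string(self) -> str:
        return self.t.read(self.read_i32()).decode("utf-8")

class TextProtocol:
    """Hypothetical human-readable encoding: one value per line (strings must not contain newlines)."""

    def __init__(self, transport: io.BytesIO) -> None:
        self.t = transport

    def write_i32(self, value: int) -> None:
        self.t.write(f"{value}\n".encode("utf-8"))

    def read_i32(self) -> int:
        return int(self.t.readline().decode("utf-8"))

    def write_string(self, s: str) -> None:
        self.t.write(f"{s}\n".encode("utf-8"))

    def read_string(self) -> str:
        return self.t.readline().decode("utf-8").rstrip("\n")

def write_user(proto, user_id: int, name: str) -> None:
    """Application code: knows the field order, not the byte format."""
    proto.write_i32(user_id)
    proto.write_string(name)

def read_user(proto) -> tuple[int, str]:
    return proto.read_i32(), proto.read_string()

if __name__ == "__main__":
    for protocol_cls in (BinaryProtocol, TextProtocol):
        buf = io.BytesIO()
        write_user(protocol_cls(buf), 7, "NB")
        print(protocol_cls.__name__, buf.getvalue())
        buf.seek(0)
        print(read_user(protocol_cls(buf)))
```

Both protocols are deterministic, so whatever one side writes, the same protocol on the other side reads back exactly.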
Versioning
For services to be robust, they must be able to evolve beyond their present
version and incorporate new features. To support this, the data types involved
in a service must provide a mechanism for adding or removing fields of an
object, or altering the argument list of a function, without any interruption
in service. This is called versioning.
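The reason numeric field identifiers make versioning possible can be shown with a small sketch: every field travels tagged with its ID, a reader skips IDs it does not know, and fills in defaults for IDs that are missing. JSON is used here only as a stand-in wire format, and the schemas (including the email field) are hypothetical.

```python
import json

def encode(fields_by_id: dict[int, object]) -> bytes:
    """Write every field tagged with its numeric ID; field names never go on the wire."""
    pairs = [[fid, value] for fid, value in sorted(fields_by_id.items())]
    return json.dumps(pairs).encode("utf-8")  # JSON is a stand-in encoding (assumption)

def decode(data: bytes, schema: dict[int, tuple[str, object]]) -> dict[str, object]:
    """Read fields by ID: unknown IDs are skipped, missing known IDs get their defaults."""
    result = {name: default for name, default in schema.values()}
    for fid, value in json.loads(data):
        if fid in schema:
            result[schema[fid][0]] = value
        # Otherwise the field was added by a newer writer; an older reader ignores it.
    return result

# Hypothetical schemas: version 2 adds field 5 without touching IDs 1 and 4.
OLD_SCHEMA = {1: ("number", 10), 4: ("name", "NB")}
NEW_SCHEMA = {1: ("number", 10), 4: ("name", "NB"), 5: ("email", "")}

if __name__ == "__main__":
    from_new_writer = encode({1: 42, 4: "Ada", 5: "ada@example.com"})
    print(decode(from_new_writer, OLD_SCHEMA))  # old reader: extra field skipped
    from_old_writer = encode({1: 42})
    print(decode(from_old_writer, NEW_SCHEMA))  # new reader: defaults fill the gaps
```

Because old and new code agree on what each ID means, either side can be upgraded first without breaking the other.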
Processors
Processors are the components that process data streams to carry out Remote
Procedure Calls; Thrift generates them from the service definitions.
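A processor's job (decode an incoming call, dispatch it to the right handler method, encode the reply) can be sketched in a few lines. This is a hypothetical illustration: the request format is JSON with "method" and "args" keys, CalculatorHandler is an invented service, and dispatch uses getattr for brevity, whereas generated processors would contain an explicit mapping for the functions declared in the IDL file.

```python
import json

class Processor:
    """Hypothetical processor: decode a call, dispatch it to the handler, encode the reply."""

    def __init__(self, handler: object) -> None:
        self._handler = handler

    def process(self, request: bytes) -> bytes:
        call = json.loads(request)  # stand-in request format: {"method": ..., "args": {...}}
        name = call["method"]
        # Refuse private or special attributes so only public service methods are callable.
        method = None if name.startswith("_") else getattr(self._handler, name, None)
        if not callable(method):
            return self._reply({"error": f"unknown method {name!r}"})
        return self._reply({"result": method(**call.get("args", {}))})

    @staticmethod
    def _reply(payload: dict) -> bytes:
        return json.dumps(payload).encode("utf-8")

class CalculatorHandler:
    """Hypothetical service implementation written by the application developer."""

    def add(self, a: int, b: int) -> int:
        return a + b

if __name__ == "__main__":
    processor = Processor(CalculatorHandler())
    request = json.dumps({"method": "add", "args": {"a": 2, "b": 3}}).encode("utf-8")
    print(processor.process(request))
```

The handler contains only business logic; everything about reading the call and writing the answer lives in the processor.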
Thrift allows programmers to work entirely with their language's native data
types rather than with wrapper objects or special dynamic types. It also does
not require the developer to write any serialization or transport code.
Instead, developers annotate their data structures in the Thrift Interface
Definition Language (IDL) file with the minimal amount of extra information the
code generator needs to transport the objects safely across languages.
Structs
A Thrift struct defines a common object to be used across languages. A struct
is essentially equivalent to a class in object-oriented programming languages:
it has a set of strongly typed fields, each with a unique name. The basic
syntax for a Thrift struct is very similar to that of a C struct. Each field
may be annotated with an integer field identifier that is unique within the
scope of the struct, and with an optional default value. Field identifiers may
be omitted, in which case they are assigned automatically, but they are
strongly encouraged because they are what make versioning possible.
This is what a Thrift struct looks like:

struct Example {
  1: i32 number = 10,
  2: i64 bignumber,
  3: double decimals,
  4: string name = "NB"
}
As the example shows, each field inside the struct is labeled with a unique
integer identifier, and the number and name fields also carry default values.
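What the code generator does with a struct like Example can be sketched as a toy generator that turns a language-neutral field list into native Python source. Everything here is a hypothetical illustration: the field tuples are written by hand (a real IDL compiler would parse them from the file), the Thrift-to-Python type mapping is an assumption, and the emitted class is far simpler than real generated code, which also contains serialization logic.

```python
# Hypothetical, already-parsed form of the Example struct:
# (field id, Thrift type, field name, default value or None)
EXAMPLE_FIELDS = [
    (1, "i32", "number", 10),
    (2, "i64", "bignumber", None),
    (3, "double", "decimals", None),
    (4, "string", "name", "NB"),
]

# Assumed mapping from Thrift base types to native Python types.
PY_TYPES = {"i32": "int", "i64": "int", "double": "float", "string": "str"}

def generate_python_class(struct_name: str, fields: list) -> str:
    """Emit Python source for a plain class mirroring a Thrift struct."""
    ids = [fid for fid, _, _, _ in fields]
    if len(ids) != len(set(ids)):
        raise ValueError(f"duplicate field identifier in struct {struct_name}")
    params = ", ".join(f"{name}={default!r}" for _, _, name, default in fields)
    lines = [f"class {struct_name}:", f"    def __init__(self, {params}):"]
    for fid, ttype, name, _ in fields:
        lines.append(f"        self.{name} = {name}  # field {fid}: {ttype} -> {PY_TYPES[ttype]}")
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    source = generate_python_class("Example", EXAMPLE_FIELDS)
    print(source)
    namespace: dict = {}
    exec(source, namespace)  # load the generated class
    example = namespace["Example"]()
    print(example.number, example.name)  # 10 NB
```

The generated class uses only native Python values (int, float, str), which is the point of the type system: the IDL carries just enough information to produce it.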
Facebook Thrift Services
Thrift has been employed in a large number of applications at Facebook,
including search, logging, mobile, ads and the developer platform. Two specific
usages are discussed below.
Search
Thrift is used as the underlying protocol and transport layer for the Facebook
Search service. The multi-language code generation is well suited for search
because it allows for application development in an efficient server-side
language (C++) and allows the Facebook PHP-based web application to make calls
to the search service using Thrift PHP libraries. There is also a large variety
of search stats, deployment and testing functionality that is built on top of
generated Python code. Additionally, the Thrift log file format is used as a
redo log for providing real-time search index updates. Thrift has allowed the
search team to leverage each language for its strengths and to develop code at
a rapid pace.
Logging
The Thrift TFileTransport functionality is used for structured logging. Each
service function definition, along with its parameters, can be considered a
structured log entry identified by the function name. This log can then be used
for a variety of purposes, including online and offline processing, stats
aggregation, and as a redo log.
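Treating each function call as a log entry, and replaying the log to rebuild state, can be sketched as follows. This does not reproduce TFileTransport's actual file format: entries here are length-prefixed JSON records (an assumption), and add_document and remove_document are hypothetical service functions standing in for something like search index updates.

```python
import json
import os
import struct
import tempfile

def append_call(log_path: str, function_name: str, **args) -> None:
    """Append one structured entry: the function name plus its named parameters."""
    record = json.dumps({"fn": function_name, "args": args}).encode("utf-8")
    with open(log_path, "ab") as f:
        f.write(struct.pack(">I", len(record)))  # length prefix lets the reader split entries
        f.write(record)

def read_calls(log_path: str):
    """Yield (function_name, args) pairs in the order they were logged."""
    with open(log_path, "rb") as f:
        while header := f.read(4):
            (length,) = struct.unpack(">I", header)
            entry = json.loads(f.read(length))
            yield entry["fn"], entry["args"]

def replay(log_path: str) -> dict[str, str]:
    """Redo-log use: rebuild a toy index by re-applying every logged call in order."""
    index: dict[str, str] = {}
    for fn, args in read_calls(log_path):
        if fn == "add_document":  # hypothetical service function
            index[args["doc_id"]] = args["text"]
        elif fn == "remove_document":  # hypothetical service function
            index.pop(args["doc_id"], None)
    return index

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "calls.log")
        append_call(path, "add_document", doc_id="1", text="hello")
        append_call(path, "add_document", doc_id="2", text="world")
        append_call(path, "remove_document", doc_id="1")
        print(replay(path))  # {'2': 'world'}
```

The same log could feed other consumers (for example, counting calls per function for stats) without any change to the code that writes it.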
Thrift has enabled Facebook to build scalable backend services efficiently by
letting engineers divide and conquer. Application developers can focus on
application code without worrying about the sockets layer, and duplicated work
is avoided by writing buffering and I/O logic in one place rather than
interspersing it in each application. The Thrift authors report that the
marginal performance cost incurred by an extra layer of software abstraction is
far eclipsed by the gains in developer efficiency and systems reliability.
Finally, Thrift was contributed to the Apache Software Foundation as the Apache
Thrift project, making it an open-source framework for implementing
cross-language services.