What do you use Apache Thrift for?


jacques01
Posts: 42
Joined: Thu Oct 08, 2015 4:56 am UTC

What do you use Apache Thrift for?

Postby jacques01 » Thu Jun 09, 2016 11:54 pm UTC

I've read a little bit about Apache Thrift.

However, all the articles I've read talk about what it is, and how to use it.

What they don't mention is when you should actually use it.

For example, if I have an embarrassingly parallel problem and access to a cluster of computers, I can get the job done faster by using Hadoop rather than running the computation on a single machine.

Can Apache Thrift make code more scalable or faster? If I write my heavy-lifting code in C/C++ and use Thrift to let my Java web server talk to it, will I see a significant gain in performance?
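To make the scenario concrete, here's my rough understanding of what the interface definition for that setup would look like. (The service and type names are made up for illustration; Thrift would generate the C++ and Java stubs from this one file.)

Code: Select all

// heavy.thrift -- hypothetical interface between the C++ worker
// and the Java web server. Thrift generates serialization code
// and client/server stubs for both languages from this file.
namespace cpp heavy
namespace java com.example.heavy

struct Matrix {
  1: i32 rows,
  2: i32 cols,
  3: list<double> values
}

service HeavyLifter {
  // implemented in C++, called from the Java web server
  Matrix multiply(1: Matrix a, 2: Matrix b)
}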

The original white paper (see http://thrift.apache.org/static/files/t ... 1465515663) even concedes that Thrift adds a small performance cost:

We have found that the marginal performance cost incurred by an extra layer of software abstraction is far eclipsed by the gains in developer efficiency and systems reliability.


So it appears that the aim of Thrift is not to provide performance gains, but rather a common interface to facilitate cross-language communication?

Tub
Posts: 326
Joined: Wed Jul 27, 2011 3:13 pm UTC

Re: What do you use Apache Thrift for?

Postby Tub » Fri Jun 10, 2016 6:43 pm UTC

Thrift is indeed for communication only. We evaluated it, but decided against it for three reasons:
* their main feature is cross-language portability, which we don't need
* we need guarantees of zero-copy transmission, because some of our buffers fill more than half the available memory. If such guarantees exist, they're undocumented.
* it does not seem to have a mechanism for executing code before transmitting a buffer, e.g. to acquire a read lock (see the sketch below). We need that.
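
To make that third point concrete, here's roughly the hook we'd need, sketched in plain Java with made-up names (and ignoring our zero-copy requirement for the moment): serialization has to happen entirely under a read lock, so writers can't mutate the shared buffer mid-transmission. As far as we could tell, Thrift's generated code gives you no place to put that lock.

Code: Select all

import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical helper showing the pattern: snapshot the shared
// buffer under a read lock before anything is transmitted.
class LockedSender {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private double[] sharedBuffer = new double[0]; // writers mutate this under the write lock

    byte[] snapshotForSend() {
        lock.readLock().lock();
        try {
            // The copy/serialization happens entirely under the lock,
            // so no writer can touch sharedBuffer mid-serialization.
            byte[] out = new byte[sharedBuffer.length * Double.BYTES];
            java.nio.ByteBuffer.wrap(out).asDoubleBuffer().put(sharedBuffer);
            return out;
        } finally {
            lock.readLock().unlock();
        }
    }
}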

Hence, I use Apache Thrift for nothing. But if it fits your IPC requirements, it does a good job of generating all that boilerplate code from very small and readable interface definition files.
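
For reference, once the stubs are generated, the calling side really is short. A minimal client sketch, assuming the hypothetical HeavyLifter service from the first post, a server already listening on port 9090, and Thrift's standard Java library:

Code: Select all

import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class HeavyClient {
    public static void main(String[] args) throws Exception {
        // TSocket and TBinaryProtocol are Thrift's stock Java transport
        // and wire protocol; HeavyLifter.Client and Matrix come from the
        // code generated out of the IDL file.
        TTransport transport = new TSocket("localhost", 9090);
        transport.open();
        HeavyLifter.Client client = new HeavyLifter.Client(new TBinaryProtocol(transport));

        Matrix a = new Matrix(2, 2, java.util.Arrays.asList(1.0, 2.0, 3.0, 4.0));
        Matrix b = new Matrix(2, 2, java.util.Arrays.asList(5.0, 6.0, 7.0, 8.0));
        Matrix result = client.multiply(a, b);

        System.out.println("got " + result.rows + "x" + result.cols + " result");
        transport.close();
    }
}

All the framing, versioning and marshalling sits in the generated code; none of it is handwritten.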


Whether homebrew distribution + Thrift will beat the one-size-fits-all package from Hadoop depends on your specific problem. Hadoop was made to scale up, and it does. Scaling down... eh... no. We have parallel problems that take weeks, and we have parallel problems that take seconds. The latter really should not use a framework that takes a minute just to initialize itself. Hence, we don't use Hadoop either.

Don't worry too much about Thrift's overhead, though; in most cases it should be minimal, and it's probably still faster than 90% of the handwritten IPC code out there. IPC is really not trivial. Besides, the overhead you incur by using Hadoop is a lot larger than the overhead from Thrift, and you seem to be fine with that.

