
What do you use Apache Thrift for?

Posted: Thu Jun 09, 2016 11:54 pm UTC
by jacques01
I've read a little bit about Apache Thrift.

However, all the articles I've read talk about what it is, and how to use it.

What they don't mention is WHEN you would use it.

For example, if I have an embarrassingly parallel problem and access to a cluster of computers, I can get my job done faster by using Hadoop, rather than doing a parallelizable computation on a single machine.

Can Apache Thrift make code more scalable or faster? If I write my heavy-lifting code in C/C++ and use Thrift to let my Java web server talk to it, will I see a significant gain in performance?

The original white paper (see ... 1465515663) suggests that Thrift can even cause performance losses:

We have found that the marginal performance cost incurred by an extra layer of software abstraction is far eclipsed by the gains in developer efficiency and systems reliability.

So it appears that the aim of Thrift is not to provide performance gains, but rather a common interface to facilitate cross-language communication?

Re: What do you use Apache Thrift for?

Posted: Fri Jun 10, 2016 6:43 pm UTC
by Tub
Thrift is indeed for communication only. We evaluated it, but decided against it for three reasons:
* its main feature is cross-language portability, which we don't need
* we need guarantees of zero-copy transmission, because some of our buffers fill more than half the available memory. If Thrift has such guarantees, they're undocumented.
* it does not seem to have a mechanism for executing code before transmitting a buffer, e.g. to acquire a read lock or something. We need that.

Hence, I use Apache Thrift for nothing. But if it fits your IPC requirements, it does a good job of generating all that boilerplate code from very small and readable definition files.
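For the OP's C++-backend-plus-Java-frontend scenario, such a definition file might look roughly like this (the service and type names here are made up for illustration, not from any real project):

```thrift
// heavylifting.thrift -- hypothetical interface definition.
// `thrift --gen cpp heavylifting.thrift` and `thrift --gen java heavylifting.thrift`
// generate the serialization and client/server stubs for both sides.

struct Matrix {
  1: required i32 rows,
  2: required i32 cols,
  3: required list<double> values,  // row-major
}

service HeavyLifting {
  // Implemented by the C++ server, called from the Java web server.
  Matrix multiply(1: Matrix a, 2: Matrix b),
}
```

That one file replaces the hand-written wire format, the parsers on both ends, and the connection plumbing, which is where the "developer efficiency" claim from the white paper comes from.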

Whether homebrew distribution + Thrift will beat the one-size-fits-all package from Hadoop depends on your specific problem. Hadoop was made to scale up, and it does. Scaling down.. eh.. no. We have parallel problems that take weeks, and we have parallel problems that take seconds. The latter set of problems really should not use a framework that takes a minute just to initialize itself. Hence, we don't use Hadoop either.

Don't worry too much about Thrift's overhead, though; in most cases it should be minimal, and it's probably still faster than 90% of the handwritten IPC code out there. IPC is really not trivial. Besides, the overhead you're incurring by using Hadoop is a lot larger than the overhead from Thrift, and you seem to be fine with that.