I have been doing some research on Entropy coding and discovered some interesting relationships between Arithmetic Coding and Huffman Coding. I made an article and am presenting it here for review:

h t t p (colon) (slash) (slash) siara.cc (slash) arithmetic_coding_new_approach

Sorry about the above URL. It does not allow me to post if I include the regular URL. Please copy paste to browser. My article is too lengthy to paste here.

Please let me know your views.

Regards

Arun

## A formula based approach to Arithmetic Coding

**Moderators:** phlip, Moderators General, Prelates

### A formula based approach to Arithmetic Coding

Last edited by siara on Sat Sep 12, 2015 3:44 am UTC, edited 1 time in total.

### Re: A formula based approach to Arithmetic Coding

The URL is :

h t t p (colon) (slash) (slash) siara.cc (slash) arithmetic_coding_new_approach

Arun

h t t p (colon) (slash) (slash) siara.cc (slash) arithmetic_coding_new_approach

Arun

### Re: A formula based approach to Arithmetic Coding

http://siara.cc/arithmetic_coding_new_approach/

Arithmetic coding is easy if you're doing maths, but a nightmare if you're doing computer science and can't just assume arbitrary precision.

Your approach seems mathematically sound. You could independently calculate each symbol's contribution to the final value and then add everything up. But that requires you to do all calculations with at least as many bits of precision as the shannon entropy of your plaintext. Which is possible with "Hello world" (a 64-byte float should be enough for <32 bits of compressed data), but will get you into a lot of trouble once you try to apply the principle to any reasonably large amount of data. Requiring this precision means that your runtime complexity goes from O(n) to O(n^2) or worse, and no amount of parallelization can save you from that.

Arithmetic coding is easy if you're doing maths, but a nightmare if you're doing computer science and can't just assume arbitrary precision.

Your approach seems mathematically sound. You could independently calculate each symbol's contribution to the final value and then add everything up. But that requires you to do all calculations with at least as many bits of precision as the shannon entropy of your plaintext. Which is possible with "Hello world" (a 64-byte float should be enough for <32 bits of compressed data), but will get you into a lot of trouble once you try to apply the principle to any reasonably large amount of data. Requiring this precision means that your runtime complexity goes from O(n) to O(n^2) or worse, and no amount of parallelization can save you from that.

### Re: A formula based approach to Arithmetic Coding

Tub wrote:Arithmetic coding is easy...

I agree with you on all points. You seem to have gone through it fully. Thanks. Particularly I liked the statement "no amount of parallelization can save you from that.".

I didn't intend it for practical implementation. The formula could be useful for further research, thats why I made the article. I am also hoping that someday I would be able to find a way to get over the precision problem.

I implemented it on C++. As you said it loses precision after 64 bits and can't decode any further. But thats a problem with any approach. Thats why they are talking of reduced precision, range encoding etc.

By the way, how were you able to post URL in the reply? I couldn't.

Thanks again. Nice meeting you.

### Re: A formula based approach to Arithmetic Coding

siara wrote:By the way, how were you able to post URL in the reply? I couldn't.

The answer to that question can be found in the xkcd Forum Rules, along with other useful information.

### Who is online

Users browsing this forum: No registered users and 8 guests