## Is this a known probability distribution?

Ingolifs
Posts: 195
Joined: Thu Mar 19, 2009 6:35 am UTC
Location: Victoria university, New Zealand

### Is this a known probability distribution?

I originally posted this on Stack Exchange, but it got no answers or even comments, so I'm posting it here:

For work, I plotted a set of shops by how many times they did business in the past week. The plot I got out looked something like this:

[Image: distribution.png]

This looked obviously like a Pareto distribution. After all, that's the distribution of earthquakes and rich people, so it would make sense in this sort of context. However, when I plotted this graph on log-log axes, instead of a straight line I got this:

[Image: distributionlog.png]

After further play with the data, I found that this curve could be straightened if I took a root of the log of the y value (in this case, the 3.9th root), and from that I derived a general formula for this and similar distributions I had seen:

$$p = \exp\left(\sqrt[k]{-a\log(x) + b}\right)$$

where k, a, and b are all positive, and $\sqrt[k]{\cdot}$ is the k-th root.
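
A minimal sketch of the straightening step described above, reading the root in the formula as a k-th root, so that (log p)^k = b - a·log(x) is linear in log(x). The parameter values, the helper names `model` and `fit_line`, and the synthetic data are all assumptions for illustration, not the real shop data. The transform only makes sense where p > 1 and b - a·log(x) stays positive.

```python
import math

def model(x, k, a, b):
    """p = exp((b - a*ln x)**(1/k)); needs b - a*ln(x) >= 0."""
    return math.exp((b - a * math.log(x)) ** (1.0 / k))

def fit_line(us, vs):
    """Ordinary least squares for v = slope*u + intercept."""
    n = len(us)
    mean_u = sum(us) / n
    mean_v = sum(vs) / n
    sxx = sum((u - mean_u) ** 2 for u in us)
    sxy = sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))
    slope = sxy / sxx
    return slope, mean_v - slope * mean_u

# Made-up parameters, chosen only so b - a*ln(x) stays positive for x <= 200.
K, A, B = 3.9, 40.0, 250.0
ranks = range(1, 201)
values = [model(x, K, A, B) for x in ranks]

# Linearise: (ln p)**k = b - a*ln(x), a straight line in ln(x).
us = [math.log(x) for x in ranks]
vs = [math.log(p) ** K for p in values]
slope, intercept = fit_line(us, vs)
a_hat, b_hat = -slope, intercept
print(a_hat, b_hat)
```

With real data k is not known in advance, so one option is to repeat the fit over a grid of k values and keep the one with the smallest residual.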

I've done various searches and have looked through lists of named distributions, and haven't come across any distributions that resemble this one. Is this a known distribution?
I belong to the tautologist's school of thought, that science is by definition, science.

DaBigCheez
Posts: 835
Joined: Tue Jan 04, 2011 8:03 am UTC

### Re: Is this a known probability distribution?

What are X and Y in these graphs? Presumably one of them is business-transactions-per-week, but what's the other axis?

This may be way off the mark, but what it reminds me of most strongly is when I've accidentally taken data intended to be used as a scatter plot, sorted it, and plotted it based on its index in the sorted list - which, in my applications, tended to produce very pretty-looking and totally meaningless graphs. Would the data perhaps be more usefully examined as a histogram or box-and-whisker plot or the like, if "transactions per week" is in fact the only 'real' variable, or is there another variable you didn't mention that has the extremely tight correlation?

Ingolifs

### Re: Is this a known probability distribution?

DaBigCheez wrote:What are X and Y in these graphs? Presumably one of them is business-transactions-per-week, but what's the other axis?

Y is transactions per week, and X is Order, or Count, or whatever you want to call it. The graphs are recreations of the data, because I'm at home sick and the data is not to leave the workplace in any case.
So yes, this is a real effect and not something I accidentally mashed together. The actual data has a few more lumps in it, but I managed a correlation of 0.996 for my best fit, so there isn't any significant deviation from the graphs I presented.
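
For what it's worth, a plot of value against sorted order carries the same information as an empirical complementary CDF, just with the axes swapped. The sketch below assumes the shops are ranked from busiest to quietest (rank 1 = most transactions). The function name and the counts are made up for illustration.

```python
def rank_plot_to_ccdf(values):
    """Return sorted (value, fraction of shops with at least that value) pairs."""
    ordered = sorted(values, reverse=True)  # rank 1 = busiest shop
    n = len(ordered)
    ccdf = {}
    for rank, v in enumerate(ordered, start=1):
        # For tied values the largest rank wins, which counts every shop >= v.
        ccdf[v] = rank / n
    return sorted(ccdf.items())

weekly = [120, 45, 45, 30, 12, 7, 7, 7, 3, 1]  # made-up transaction counts
for value, fraction in rank_plot_to_ccdf(weekly):
    print(value, fraction)
```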

DeGuerre
Posts: 48
Joined: Mon Feb 04, 2008 6:41 am UTC

### Re: Is this a known probability distribution?

Have you tried fitting a log-normal distribution? Without knowing anything, my first hypothesis would be that your data follows Gibrat's Law.

Ingolifs

### Re: Is this a known probability distribution?

Yes, I looked at that. On a Log-Log plot, a lognormal distribution will show up as a parabola. This data shows up as an Nth root in the Log-Log plot.
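
To spell out the parabola claim: it holds for the lognormal density f, with μ and σ the mean and standard deviation of log(x). Writing u = log(x):

$$f(x) = \frac{1}{x\sigma\sqrt{2\pi}} \exp\left(-\frac{(\log x - \mu)^2}{2\sigma^2}\right)$$

$$\log f = -u - \frac{(u - \mu)^2}{2\sigma^2} - \log\left(\sigma\sqrt{2\pi}\right)$$

This is a quadratic in u with leading coefficient $-1/(2\sigma^2)$, so on log-log axes the density is a downward-opening parabola.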

SuicideJunkie
Posts: 354
Joined: Sun Feb 22, 2015 2:40 pm UTC

### Re: Is this a known probability distribution?

DaBigCheez wrote:This may be way off the mark, but what it reminds me of most strongly is when I've accidentally taken data intended to be used as a scatter plot, sorted it, and plotted it based on its index in the sorted list - which, in my applications, tended to produce very pretty-looking and totally meaningless graphs.
I've done that kind of plot intentionally a few times.
I find it is good for deciding on cutoff points: you can visually see step changes in your data, and then quickly pick a point on the near-vertical portion of the step to be the cutoff between two groups (e.g., plot fault rate, and there tends to be a fuzzy step change between 'good' and 'broken' units, with only a few marginal ones on the step itself).
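
A rough automated stand-in for eyeballing the step, assuming there is a single dominant jump: sort the values and put the cutoff in the middle of the largest gap between neighbours. The function name and the fault counts are invented for illustration. On noisy data a plot is still the better judge.

```python
def largest_gap_cutoff(values):
    """Midpoint of the biggest jump between consecutive sorted values."""
    ordered = sorted(values)
    if len(ordered) < 2:
        raise ValueError("need at least two values to find a gap")
    # Pair each gap with its index so max() returns where the jump occurs.
    gap, i = max(
        (ordered[j + 1] - ordered[j], j) for j in range(len(ordered) - 1)
    )
    return (ordered[i] + ordered[i + 1]) / 2

fault_counts = [1, 2, 3, 50, 52, 2, 51]  # made-up: 'good' units near 2, 'broken' near 50
cutoff = largest_gap_cutoff(fault_counts)
good = [v for v in fault_counts if v < cutoff]
broken = [v for v in fault_counts if v >= cutoff]
print(cutoff, good, broken)
```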

Derek
Posts: 2179
Joined: Wed Aug 18, 2010 4:15 am UTC

### Re: Is this a known probability distribution?

Ingolifs wrote:Yes, I looked at that. On a Log-Log plot, a lognormal distribution will show up as a parabola. This data shows up as an Nth root in the Log-Log plot.

If you're checking for a log-normal distribution, you wouldn't apply the log-log transform to this graph of sales-versus-orders. You would apply it to a density curve (x-axis is sales, y-axis is the number of stores with that many sales). I'm not sure of the best way to get a density curve from a set of discrete samples. If you want to check the log-normal hypothesis, though, I guess the thing to do would be to take the log of all the sales numbers, find the mean and standard deviation of those logs to get your normal distribution, and then somehow measure how well that distribution matches the actual data.
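
One way that procedure might look, using only the Python standard library. The "measure the accuracy" step is filled in here with the Kolmogorov-Smirnov distance (the largest gap between the empirical CDF of the logs and the fitted normal CDF). That is just one possible choice, not something specified above. The sales figures are synthetic stand-ins, and every value is assumed to be strictly positive so the log exists.

```python
import math
import random
from statistics import NormalDist, fmean, stdev

def lognormal_ks_distance(samples):
    """Fit a normal to log(samples); return (mu, sigma, KS distance)."""
    logs = sorted(math.log(s) for s in samples)  # requires every sample > 0
    mu, sigma = fmean(logs), stdev(logs)
    fitted = NormalDist(mu, sigma)
    n = len(logs)
    d = 0.0
    for i, v in enumerate(logs):
        cdf = fitted.cdf(v)
        # The empirical CDF jumps from i/n to (i+1)/n at v; check both sides.
        d = max(d, abs(cdf - i / n), abs((i + 1) / n - cdf))
    return mu, sigma, d

# Synthetic stand-in for weekly sales per shop (the real data is not available).
rng = random.Random(0)
sales = [rng.lognormvariate(3.0, 0.8) for _ in range(500)]
mu_hat, sigma_hat, ks = lognormal_ks_distance(sales)
print(f"mu={mu_hat:.3f} sigma={sigma_hat:.3f} KS distance={ks:.3f}")
```

A small distance means the log-normal fit tracks the data closely. Because mu and sigma are estimated from the same sample, standard KS critical values would be too lenient, so treat the number as a rough comparison between candidate distributions rather than a formal test.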