Is this a known probability distribution?


Ingolifs
Posts: 190
Joined: Thu Mar 19, 2009 6:35 am UTC
Location: Victoria university, New Zealand

Is this a known probability distribution?

Postby Ingolifs » Thu Aug 10, 2017 5:38 am UTC

Originally posted on stackexchange, but with no answers or even comments, I'm posting it here:

For work, I plotted a set of shops by how many times they did business in the past week. The plot I got out looked something like this:

distribution.png


This looked obviously like a Pareto distribution. After all, that's the distribution of earthquakes and rich people. It would make sense in this sort of context. However, when plotting the log-log of this graph, instead of a straight line, I get this:

distributionlog.png


After further play with the data, I found that this curve could be straightened if I took a root of the log of the y value (in this case, the 3.9th root), and from that I derived a general formula for this and similar distributions I had seen:

p = exp((-a log(x) + b)^(1/k))

where k, a, and b are all positive, i.e. log(p) is the k-th root of -a log(x) + b.
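
A minimal sketch of how such a fit could be checked, assuming the formula is read as log(p) = (b - a log(x))^(1/k), so that log(p)^k is a straight line in log(x). The helper names `fit_root_log` and `best_k`, the grid of candidate k values, and the synthetic data are all hypothetical; this is not the procedure actually used on the work data.

```python
import math

def fit_root_log(xs, ps, k):
    """Least-squares line through (log x, log(p)**k); returns (a, b, r).

    Assumes every p > 1 so that log(p) is positive.
    """
    u = [math.log(x) for x in xs]
    v = [math.log(p) ** k for p in ps]
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suu = sum((ui - mu) ** 2 for ui in u)
    svv = sum((vi - mv) ** 2 for vi in v)
    suv = sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))
    slope = suv / suu
    a = -slope                      # model is log(p)**k = b - a*log(x)
    b = mv - slope * mu
    r = suv / math.sqrt(suu * svv)  # Pearson correlation of the straightened data
    return a, b, r

def best_k(xs, ps, candidates):
    """Pick the candidate k whose straightened data is closest to a line."""
    return max(candidates, key=lambda k: abs(fit_root_log(xs, ps, k)[2]))

# Synthetic data generated from the formula itself (made-up parameters).
k_true, a_true, b_true = 3.9, 40.0, 300.0
xs = list(range(1, 501))
ps = [math.exp((b_true - a_true * math.log(x)) ** (1 / k_true)) for x in xs]

a_fit, b_fit, r_fit = fit_root_log(xs, ps, k_true)
k_fit = best_k(xs, ps, [i / 10 for i in range(10, 61)])
print(a_fit, b_fit, r_fit, k_fit)
```

On real data the correlation will not be exactly -1, and a high correlation on a sorted, monotone curve is easy to get, so it is weak evidence on its own.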

I've done various searches and have looked through lists of named distributions, and haven't come across any distributions that resemble this one. Is this a known distribution?
I belong to the tautologist's school of thought, that science is by definition, science.

DaBigCheez
Posts: 809
Joined: Tue Jan 04, 2011 8:03 am UTC

Re: Is this a known probability distribution?

Postby DaBigCheez » Fri Aug 11, 2017 12:24 am UTC

What are X and Y in these graphs? Presumably one of them is business-transactions-per-week, but what's the other axis?

This may be way off the mark, but what it reminds me of most strongly is when I've accidentally taken data intended to be used as a scatter plot, sorted it, and plotted it based on its index in the sorted list - which, in my applications, tended to produce very pretty-looking and totally meaningless graphs. Would the data perhaps be more usefully examined as a histogram or box-and-whisker plot or the like, if "transactions per week" is in fact the only 'real' variable, or is there another variable you didn't mention that has the extremely tight correlation?

Ingolifs
Posts: 190
Joined: Thu Mar 19, 2009 6:35 am UTC
Location: Victoria university, New Zealand

Re: Is this a known probability distribution?

Postby Ingolifs » Fri Aug 11, 2017 2:25 am UTC

DaBigCheez wrote:What are X and Y in these graphs? Presumably one of them is business-transactions-per-week, but what's the other axis?

Y is transactions per week, and X is Order, or Count, or whatever you want to call it. The graphs are recreations of the data, because I'm at home sick and the data is not to leave the workplace in any case.
So yes, this is a real effect and not something I accidentally mashed together. The actual data has a few more lumps in it, but I managed a correlation of 0.996 for my best fit, so there isn't any significant deviation from the graphs I presented.

DeGuerre
Posts: 46
Joined: Mon Feb 04, 2008 6:41 am UTC

Re: Is this a known probability distribution?

Postby DeGuerre » Tue Aug 15, 2017 2:46 am UTC

Have you tried fitting a log-normal distribution? Without knowing anything, my first hypothesis would be that your data follows Gibrat's Law.

Ingolifs
Posts: 190
Joined: Thu Mar 19, 2009 6:35 am UTC
Location: Victoria university, New Zealand

Re: Is this a known probability distribution?

Postby Ingolifs » Wed Aug 16, 2017 9:39 am UTC

Yes, I looked at that. On a Log-Log plot, a lognormal distribution will show up as a parabola. This data shows up as an Nth root in the Log-Log plot.
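
A small numerical check of the parabola claim, assuming the curve being plotted is the log-normal probability density (not a sorted value-versus-rank curve). With t = log(x), the log-density is -t - (t - mu)^2 / (2 sigma^2) plus a constant, which is quadratic in t, so its second differences on an evenly spaced grid of t should all equal -h^2 / sigma^2. The values of `mu`, `sigma`, and the grid are made up for illustration.

```python
import math

def lognormal_log_pdf(x, mu, sigma):
    """Natural log of the log-normal density at x > 0."""
    z = (math.log(x) - mu) / sigma
    return -math.log(x * sigma * math.sqrt(2 * math.pi)) - 0.5 * z * z

mu, sigma = 3.0, 0.8                  # hypothetical parameters
h = 0.5                               # spacing in log(x)
t = [h * i for i in range(1, 12)]     # evenly spaced values of log(x)
log_f = [lognormal_log_pdf(math.exp(ti), mu, sigma) for ti in t]

# For a parabola, second differences are constant and equal to h^2 * f''.
second_diffs = [log_f[i - 1] - 2 * log_f[i] + log_f[i + 1]
                for i in range(1, len(t) - 1)]
expected = -(h ** 2) / sigma ** 2
print(second_diffs[:3], expected)
```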

SuicideJunkie
Posts: 143
Joined: Sun Feb 22, 2015 2:40 pm UTC

Re: Is this a known probability distribution?

Postby SuicideJunkie » Wed Aug 16, 2017 2:36 pm UTC

DaBigCheez wrote:This may be way off the mark, but what it reminds me of most strongly is when I've accidentally taken data intended to be used as a scatter plot, sorted it, and plotted it based on its index in the sorted list - which, in my applications, tended to produce very pretty-looking and totally meaningless graphs.
I've done that kind of plot intentionally a few times.
I find it is good for deciding on cutoff points - you can visually see step changes in your data, and then quickly pick a point on the near-vertical portion of the step to be the cutoff between two groups (e.g., plot fault rate, and there tends to be a fuzzy step change between 'good' and 'broken' units with only a few marginal ones on the step itself).
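
A rough sketch of automating that kind of cutoff, under the assumption that the step shows up as the single largest gap between neighbouring values in the sorted list. This largest-gap rule is a stand-in for picking the point by eye, not the method described above, and the function name `largest_gap_cutoff` and the fault rates are invented for illustration.

```python
def largest_gap_cutoff(values):
    """Return the midpoint of the largest gap between adjacent sorted values."""
    s = sorted(values)
    if len(s) < 2:
        raise ValueError("need at least two values")
    # Index of the lower neighbour of the widest gap.
    i = max(range(len(s) - 1), key=lambda j: s[j + 1] - s[j])
    return (s[i] + s[i + 1]) / 2

# Made-up fault rates: mostly 'good' units plus a few 'broken' ones.
fault_rates = [0.01, 0.02, 0.015, 0.03, 0.025, 0.41, 0.02, 0.55, 0.6, 0.035, 0.5]
cutoff = largest_gap_cutoff(fault_rates)
good = [v for v in fault_rates if v < cutoff]
broken = [v for v in fault_rates if v >= cutoff]
print(cutoff, len(good), len(broken))
```

With several marginal units sitting on the step itself, the largest gap may land in the wrong place, so looking at the sorted plot is still worthwhile.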

Derek
Posts: 2136
Joined: Wed Aug 18, 2010 4:15 am UTC

Re: Is this a known probability distribution?

Postby Derek » Thu Aug 17, 2017 5:36 am UTC

Ingolifs wrote:Yes, I looked at that. On a Log-Log plot, a lognormal distribution will show up as a parabola. This data shows up as an Nth root in the Log-Log plot.

If you're looking for a log-normal distribution, you wouldn't apply the log-log transform to this graph of sales-versus-orders. You would apply it to a density curve (x-axis is sales, y-axis is number of stores with that many sales). I'm not sure of the best way to get a density curve from a set of discrete samples, but if you want to check the log-normal hypothesis, I guess the thing to do would be to take the log of all the sales numbers, find the mean and standard deviation of those logs to get your normal distribution, then somehow measure how well this distribution fits the actual data.
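
One possible way to do that last step, sketched with only the Python standard library: fit a normal distribution to the logged sales and compute the Kolmogorov-Smirnov distance between the fitted CDF and the empirical CDF. The KS distance is just one choice of accuracy measure, the function name `lognormal_ks` is hypothetical, and the samples are synthetic because the real data is not available. Since the mean and standard deviation are estimated from the same sample, standard KS critical values do not apply directly; the distance is best used for comparison.

```python
import math
import random
from statistics import NormalDist, fmean, stdev

def lognormal_ks(sales):
    """Fit a normal to log(sales); return (fitted NormalDist, KS distance).

    Assumes every value is strictly positive (zero counts must be handled first).
    """
    logs = sorted(math.log(s) for s in sales)
    fitted = NormalDist(fmean(logs), stdev(logs))
    n = len(logs)
    d = 0.0
    for i, v in enumerate(logs):
        c = fitted.cdf(v)
        # Empirical CDF jumps from i/n to (i+1)/n at v; check both sides.
        d = max(d, abs(c - i / n), abs((i + 1) / n - c))
    return fitted, d

rng = random.Random(0)
# Synthetic 'sales': one truly log-normal sample, one Pareto sample for contrast.
lognormal_sales = [rng.lognormvariate(3.0, 0.8) for _ in range(2000)]
pareto_sales = [rng.paretovariate(1.5) for _ in range(2000)]

fit_ln, d_ln = lognormal_ks(lognormal_sales)
fit_pa, d_pa = lognormal_ks(pareto_sales)
print(fit_ln.mean, fit_ln.stdev, d_ln, d_pa)
```

The log-normal sample should give a much smaller distance than the Pareto one, which is the kind of contrast to look for on the real sales numbers.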

