## How often do you use statistics?

Given the diversity of fields on this board, I'm quite curious to see to what degree you or your colleagues use statistical analytical methods in your research. They're pretty well essential in ecology and wildlife biology, and are used to some degree in most non-theoretical quantitative studies.

Personally, not so often, as I'm not taking any experimental physics at the moment and instead get to play with theoretical quantum stuff. On the other hand, the basic ideas of probability are part of statistics, so I might be lying and in fact use it regularly.

In the realm of physics, there's plenty of less trivial statistics in the world of statistical mechanics (which spawns thermodynamics, but you can do thermo without knowledge of stats), and the experimentalists are hopefully always using it to give suitable error bars. Pretty much every science needs to collect experimental data, and so they'd all use it plenty in that regard for any real research. The only ones who might be able to get away from it entirely are perhaps pure mathematicians, but then they're the sort to potentially be deriving the statistical methods in the first place.

I'm in a theoretical biology / nano-particle simulation stuff / computational group, so I do end up using a bit of statistics. However, just about all of it is pretty basic - just enough to compute error bars, averages, spreads, et cetera.
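
For anyone wondering what "just enough to compute error bars" tends to look like in practice, here is a minimal sketch using only the Python standard library. The function name `mean_with_error` and the measurement values are made up for illustration, and it assumes the error bar wanted is the standard error of the mean, which may not be what a given group actually reports.

```python
import math
import statistics

def mean_with_error(samples):
    """Return (mean, standard error of the mean) for a list of measurements."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to estimate a spread")
    mean = statistics.fmean(samples)
    # Sample standard deviation (n - 1 in the denominator).
    spread = statistics.stdev(samples)
    # Standard error of the mean: spread shrinks with the square root of n.
    return mean, spread / math.sqrt(n)

measurements = [9.8, 10.1, 10.0, 9.9, 10.2]  # hypothetical values
m, err = mean_with_error(measurements)
print(f"{m:.2f} +/- {err:.2f}")
```
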
I use statistics all the time for the error bars in my experiments (I do most of my stuff in nuclear physics), as well as for modelling & calculations of physical systems (e.g. you can't predict what one radium nucleus will do, but a mole behaves pretty predictably).
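
The one-nucleus-versus-a-mole point can be seen in a toy simulation. This is only a sketch: it assumes each nucleus independently survives a half-life with probability 1/2, the function name `surviving_fraction` is hypothetical, and 100,000 nuclei stand in for a mole since nobody is going to loop over 6e23 of them.

```python
import random

def surviving_fraction(n_nuclei, n_half_lives, rng):
    """Fraction of n_nuclei independent nuclei still undecayed after n_half_lives."""
    p_survive = 0.5 ** n_half_lives
    survivors = sum(1 for _ in range(n_nuclei) if rng.random() < p_survive)
    return survivors / n_nuclei

rng = random.Random(0)  # fixed seed so the run is repeatable

# One nucleus at a time: the answer is always all-or-nothing.
single = [surviving_fraction(1, 1, rng) for _ in range(5)]

# Many nuclei: the surviving fraction settles close to one half.
many = surviving_fraction(100_000, 1, rng)

print(single, many)
```
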
I'm a molecular biologist and use stats fairly often. Mostly simple stuff like t-tests, chi square, some basic anova.
They use statistics quite a lot in the papers I read which are all to do with medical science fields (immunology, physiology, etc).
Biostatistician/epidemiologist grad student here, so... it's about 90% of what I do.
I have a degree in applied mathematics and I work at a national statistical organisation. I use stats ... surprisingly rarely.
Tuesday and Friday for one hour each, seeing as I'm doing maths at A level, at least the teacher's hot.
miedvied wrote:Biostatistician/epidemiologist grad student here, so... it's about 90% of what I do.

Really? I'm a full-time statistical consultant and 75% of my time is spent in perl and SQL trying to herd data into a usable form so that I can do statistics in the remaining 75% of my time.
Every hour of every day?

...I do econometrics and statistical programming. I may be an outlier. KF
That's 'cause you're out of school. I still get my data in neat .csv files provided by professors.

BTW, wtf is up with data messiness? I mean, really, why the hell can't people ever put things in in something resembling a clean format? I consulted at an academic MD's office and their access DB looked like cocaine-fueled apes threw feces at the keyboard.
miedvied wrote:BTW, wtf is up with data messiness? I mean, really, why the hell can't people ever put things in in something resembling a clean format? I consulted at an academic MD's office and their access DB looked like cocaine-fueled apes threw feces at the keyboard.

The basic answer is that the data you receive is never really designed for you.

On one end, you have data designed for accounting, tracking, or other purposes. For instance, data at a doctor's office might be designed only for the purpose of looking up accounts and forwarding information to insurance companies. Things like missing values, inconsistent coding methods, etc., might not matter for those purposes. In my experience, data that we get is usually for basic recording purposes, and we have to figure a way to clean and format it to run several different tests (and you never know which ones pan out until you get to the end, meaning a lot of dead-ends in the meantime).

On the other end, even data designed for statisticians is going to involve a lot of cleaning (e.g., data provided by the US Census Bureau, BLS, CDC, etc.). This is because the data has to be designed so that anyone can use it, regardless of the software, operating system, encoding method (like ASCII, EBCDIC, Unicode), etc. Some places are pretty experienced and provide, say, the raw data plus programs to encode it in a dozen different ways, but usually all you'll have is the codebook and data.

The kicker is that the more useful your analysis, the more work is gonna be involved in reshaping the data. If the data is already nice and clean, chances are it's because there are a ton of people who've already done all the analyses possible and squeezed a bunch of info out of the data (which is what you usually deal with while you're a student). If it's some new problem that people haven't tackled before, or if it's all-new data that not many people have checked out yet, then you're more likely to be on your own.

I use statistics every D days, where D is a normally-distributed variable with mean 12 and standard deviation 4, and the lower tail (D < 0) omitted.
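
For the curious, a distribution like that can be sampled by simple rejection: draw from the normal and throw away anything below zero. This is a rough sketch using the standard library; `sample_days` is a made-up name, and rejection is just one reasonable way to do the truncation, not the only one.

```python
import random

def sample_days(rng, mean=12.0, sd=4.0):
    """Draw D from normal(mean, sd), redrawing whenever D < 0."""
    while True:
        d = rng.gauss(mean, sd)
        if d >= 0:
            return d

rng = random.Random(42)  # fixed seed so the run is repeatable
draws = [sample_days(rng) for _ in range(10_000)]
print(min(draws), sum(draws) / len(draws))
```

Since zero sits three standard deviations below the mean, only a tiny share of draws get rejected, so the truncation barely moves the average.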

I'm a biochemistry/biophysics master's student, and depending on how you define it, I use statistics quite often or never at all.

As in, all my experiments need error bars, and all my work is considered alongside those errors, but I practically never touch the statistics myself. It's an automated function in the programs I use to analyze my data, so I never have to actually think about it; it just calculates the data value for each peak and a nice corresponding error (which occasionally is actually smaller than the data value itself!)

miedvied - raw data is messy, learn python if you want to do it the easy way - the hard way being pasting columns from one .txt file to another for all eternity in the wonderful "a monkey could do this!" style.
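
As a rough idea of the "easy way", here is a small sketch that puts one column from each of two whitespace-separated text files side by side. The name `paste_columns` and the sample data are hypothetical, and it assumes both files have the same number of rows with no blank lines; real data will need more checking than this.

```python
import io

def paste_columns(left, right, left_col, right_col):
    """Yield tab-separated pairs: one column from `left` beside one from `right`."""
    # zip stops at the shorter input, so mismatched lengths are silently cut.
    for left_line, right_line in zip(left, right):
        left_fields = left_line.split()
        right_fields = right_line.split()
        yield f"{left_fields[left_col]}\t{right_fields[right_col]}"

# In-memory stand-ins for two .txt files; open("a.txt") would work the same way.
a = io.StringIO("1 0.5\n2 0.7\n3 0.9\n")
b = io.StringIO("1 10\n2 20\n3 30\n")
merged = list(paste_columns(a, b, 1, 1))
print("\n".join(merged))
```
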
Ulc wrote:miedvied - raw data is messy, learn python if you want to do it the easy way - the hard way being pasting columns from one .txt file to another for all eternity in the wonderful "a monkey could do this!" style.

If your raw data is in a table-like structure with the values you are interested in, it is not messy.

The probability that I will use statistics on day X depends on the usage of statistics on all previous days, with decreasing influence for large time differences.
Ulc wrote:miedvied - raw data is messy, learn python if you want to do it the easy way - the hard way being pasting columns from one .txt file to another for all eternity in the wonderful "a monkey could do this!" style.

If you've a table-like structure, then it's hardly messy; I've found that AWK is pretty good at this sort of text-file in columns thing, and fairly easy to set up and use.
I do reliability calculations for electronic components and model hardware systems in space environments. I swim in statistics but typically only rely on the same few tricks over and over and over again.

Also used a ton of statistics for thesis work. Another tidbit: I've also found it easier to think of quantum mechanics as a "real world" example of statistics. The electron really is in both places at the same time. But you still only have a 50% chance of finding it in one place or the other. Don't fight it. Feel it.
raike wrote:
Ulc wrote:miedvied - raw data is messy, learn python if you want to do it the easy way - the hard way being pasting columns from one .txt file to another for all eternity in the wonderful "a monkey could do this!" style.

If you've a table-like structure, then it's hardly messy; I've found that AWK is pretty good at this sort of text-file in columns thing, and fairly easy to set up and use.

It can easily be, when the dataset becomes large enough.
All the time in my everyday life: I look at what information I'm given and choose an outcome, in a system I try to balance, to get myself the things most positive for my life.
I'm an Epidemiology graduate student. Essentially every day of my life involves statistics in one form or another, to the point that my home page is the StackExchange statistics site.

It ranges from very simple summary statistics to survival analysis and some techniques for describing the results of large stochastic simulations.

My background is in experimental nanotechnology and nanoscience, and as far as statistical analysis goes, my software gives me error bars, and I don't have to worry about chi-squared tests and all that stuff (honestly, if I needed a chi-squared test to determine the validity of my data, I have done a shit experiment). But, of course, quantum mechanics and statistical thermodynamics are all based on probability, and my research exploits a lot of quantum mechanical features of low-dimensional materials (van Hove singularities, for example), so in that sense, I use lots of statistics (just not directly).
I'm currently taking 2 upper division economics courses, for which stats is a requirement, and we haven't used, and apparently aren't going to use, much beyond some very basic stuff.

I can only assume it gets more complex later, but I'm wondering when and where exactly. I know there are a lot of complex economic models, but their accuracy is about on par with average weather reporting. As in, unless I'm going to help program the exascale supercomputer that can actually make accurate predictions, I'm not sure when truly advanced statistics will actually come in handy for practical purposes.

A lot of economics seems to be more arguing about the theory than actually applying it. But I suppose if the theory were particularly accurate the world would be doing a lot better in terms of efficient material output.
A great many of the papers I've read involving security and risk analysis contain a great deal of statistics. This has led, on several occasions, to my having to read through large reports, pull out the important information and statistics, and quickly summarize them. Is it weird that I enjoyed this process? In presentations I have also been required to find and show relevant statistics, which I also enjoyed (although I was also angered, as it appeared that many others didn't do the work and still got decent grades on it).
It's not weird ... statistics are fun!
Second year biochemistry grad student here. And I'm becoming increasingly aware of a lack of statistics understanding, in myself, my fellow grad students and even seniors. It's just not something that is taught/practiced in the master's programs around here. Since 2008, however, there's been a mandatory course equivalent to a quarter of a semester in statistics, which apparently is despised by many. Ideally I would say that statistics is best used and taught in the context of ordinary courses, so that the techniques and way of thinking are gradually introduced over a long period of time.

Anyway, to answer your question. I use statistics maybe 10-25% of my time, depending on how you define "use".
