The use of statistics has long been important in the human sciences. An early example is an analysis by William Sealy Gosset (alias “Student”) of biometric data obtained by Scotland Yard around 1900. The heights of 3,000 male criminals fit a bell curve almost perfectly.
Standard statistical methods allow the identification of correlations, which mark possible causal links.

XKCD teaches us that “Correlation doesn’t imply causation, but it does waggle its eyebrows suggestively and gesture furtively while mouthing ‘look over there.’”
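As a minimal sketch of why a correlation is only a hint, the R snippet below simulates two variables driven by a hidden common cause. The data are made up and not drawn from any study mentioned here; the variable names and the sample size are assumptions chosen for illustration.

```r
set.seed(42)                      # make the simulated data reproducible
n <- 500                          # hypothetical sample size

z <- rnorm(n)                     # hidden common cause (confounder)
x <- z + rnorm(n)                 # x depends on z, not on y
y <- z + rnorm(n)                 # y depends on z, not on x

# Pearson correlation test: x and y are clearly correlated ...
raw <- cor.test(x, y)
print(raw$estimate)
print(raw$p.value)

# ... but the association largely disappears once z is accounted for:
# correlate what is left of x and y after regressing each on z.
partial <- cor(resid(lm(x ~ z)), resid(lm(y ~ z)))
print(partial)
```

Here `x` and `y` are strongly correlated even though neither influences the other, which is exactly the situation the eyebrow-waggling warns about.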
Newer, more sophisticated statistical methods allow the exploration of time series and spatial data. For example, this project looks at the spatial distribution of West Nile virus (WNV), asking which disease clusters are significant and which are merely tragic coincidence.
SPSS has been the mainstay of statistical analysis in the human sciences, but many newer techniques are better supported in the free R toolkit. For example, this paper discusses detecting significant clusters of diseases using R. The New York Times has commented on R’s growing popularity, and James Holland Jones points out three advantages: R is used by the majority of academic statisticians (and hence includes the newest developments in statistics), it has good help resources, and it makes really cool graphics.

A really cool graph in R, using the ggplot2 R package (from Jeromy Anglim’s Psychology and Statistics Blog)
An increasing amount of instructional material for the human sciences is built around R, including:
- The psych and other R packages supporting the e-book An introduction to psychometric theory with applications in R
- An R Companion and datasets supporting the book Statistics for Archaeologists
- Notes on the use of R for psychology experiments and questionnaires
- R resources at the R psychologist blog
- A list of Must-Have R Packages for Social Scientists
- Why use R? An (economics) grad student’s 2 cents
- The Edinburgh Psychology R-users group
Through the igraph, sna, and other packages (and the statnet suite), R also provides easy-to-use facilities for social network analysis, a topic dear to my heart. For example, the following code defines the valued centrality measure proposed in this paper:
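    library("igraph")

    # Mean reciprocal shortest-path distance from each node to all others
    valued.centrality <- function (g) {
      recip <- function (x) if (x == 0) 0 else 1/x   # self-distance 0 counts as 0
      f <- function (r) sum(sapply(r, recip)) / (length(r) - 1)
      apply(shortest.paths(g), MARGIN=1, f)   # distances(g) in newer igraph
    }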
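This definition has the advantage of allowing disconnected network components: an unreachable node has an infinite distance, whose reciprocal is zero, so every node still receives a finite score. We can therefore use these centrality scores to add colour to a standard plot (using the igraph package within R).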
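To see how disconnected components are handled, here is a minimal base-R sketch that applies the same reciprocal-distance formula to a hand-written distance matrix instead of an igraph object. The four-node network (a path A, B, C plus an isolated node D), the matrix `d`, and the helper name `row.score` are assumptions made up for illustration; `Inf` marks an unreachable node, which is how igraph reports such distances.

```r
# Hypothetical shortest-path distances: path A - B - C, isolated node D.
nodes <- c("A", "B", "C", "D")
d <- matrix(c(0,   1,   2,   Inf,
              1,   0,   1,   Inf,
              2,   1,   0,   Inf,
              Inf, Inf, Inf, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(nodes, nodes))

# Same formula as valued.centrality, applied row by row.
recip <- function(x) if (x == 0) 0 else 1 / x      # 1/Inf is 0 in R
row.score <- function(r) sum(sapply(r, recip)) / (length(r) - 1)

scores <- apply(d, MARGIN = 1, row.score)
print(round(scores, 3))
```

The isolated node D scores 0, the end nodes A and C score 0.5, and B, in the middle of the path, scores highest at 2/3. No node produces an infinite or undefined value, which is what makes the scores usable for colouring a plot.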
– Tony