The use of statistics has long been important in the human sciences. An early example is an analysis by William Sealy Gosset (who published as “Student”) of biometric data obtained by Scotland Yard around 1900, in which the heights of 3,000 male criminals fit a bell curve almost perfectly.

Standard statistical methods allow the identification of correlations, which mark **possible** causal links:
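As a minimal sketch (not from the original post), base R's `cor.test()` computes a Pearson correlation together with a significance test. The data here are R's built-in `women` dataset (average heights and weights of American women), chosen only because it ships with R.

```r
# Pearson correlation between height and weight in the built-in `women` data
result <- cor.test(women$height, women$weight, method = "pearson")
print(result$estimate)  # correlation coefficient r
print(result$p.value)   # p-value for H0: the true correlation is zero
# A strong, significant r establishes an association only; any causal story
# needs evidence beyond the correlation itself.
```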

*XKCD teaches us that “Correlation doesn’t imply causation, but it does waggle its eyebrows suggestively and gesture furtively while mouthing ‘look over there.’”*

Newer, more sophisticated statistical methods allow the exploration of time series and spatial data. For example, this project looks at the spatial distribution of West Nile virus (WNV) to determine which disease clusters are statistically significant and which are merely tragic coincidences.

SPSS has long been the mainstay of statistical analysis in the human sciences, but many newer techniques are better supported in the free R toolkit. For example, this paper discusses using R to detect significant disease clusters. The New York Times has commented on R’s growing popularity, and James Holland Jones points out three advantages: R is used by most academic statisticians (so new statistical methods tend to appear in it first), it has good help resources, and it makes really cool graphics.

*A really cool graph in R, using the **ggplot2** R package (from Jeromy Anglim’s Psychology and Statistics Blog)*

An increasing amount of instructional material for the human sciences is available in R, including:

- The **psych** and other R packages supporting the e-book *An introduction to psychometric theory with applications in R*
- An R Companion and datasets supporting the book *Statistics for Archaeologists*
- Notes on the use of R for psychology experiments and questionnaires
- R resources at the R psychologist blog
- A list of Must-Have R Packages for Social Scientists
- Why use R? An (economics) grad student’s 2 cents
- The Edinburgh Psychology R-users group

Through the **igraph**, **sna**, and other packages (and the **statnet** suite), R also provides easy-to-use facilities for social network analysis, a topic dear to my heart. For example, the following code defines the valued centrality measure proposed in this paper:

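```r
library("igraph")

# Valued centrality: for each node, the mean of the reciprocal shortest-path
# distances to all other nodes. Unreachable nodes (distance Inf) contribute
# 1/Inf = 0, so the measure stays finite on disconnected graphs.
valued.centrality <- function(g) {
  d <- distances(g)     # all-pairs shortest paths (formerly shortest.paths())
  recip <- 1 / d
  recip[d == 0] <- 0    # a node's zero distance to itself contributes nothing
  rowSums(recip) / (vcount(g) - 1)
}
```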

This definition copes with disconnected network components, because an unreachable node simply contributes zero to the sum. We can therefore use these centrality scores to add colour to a standard plot made with the **igraph** package within R.
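A minimal sketch of that colouring step, assuming a small hypothetical graph with two disconnected components (a 4-node star and a 3-node ring). The palette endpoints and the 50-step colour scale are illustrative choices, not taken from the original plot, and the centrality function is repeated so the snippet runs on its own.

```r
library(igraph)

# Hypothetical example graph: a 4-node star (vertex 1 is the hub) plus a
# separate 3-node ring, so the graph has two disconnected components.
g <- disjoint_union(make_star(4, mode = "undirected"), make_ring(3))

# Valued centrality (repeated here so the snippet is self-contained):
# mean reciprocal shortest-path distance, with 1/Inf = 0 for unreachable nodes.
valued.centrality <- function(g) {
  d <- distances(g)
  recip <- 1 / d
  recip[d == 0] <- 0
  rowSums(recip) / (vcount(g) - 1)
}
scores <- valued.centrality(g)
print(round(scores, 3))  # hub 0.5, every other node 1/3

# Map scores onto a 50-step palette: pale = peripheral, dark red = central.
pal  <- colorRampPalette(c("lightyellow", "darkred"))(50)
vcol <- pal[cut(scores, breaks = 50, labels = FALSE)]
plot(g, vertex.color = vcol, vertex.label = round(scores, 2))
```

Binning with `cut()` keeps the colour mapping linear in the score; ranking the scores instead would spread colours more evenly when many nodes share similar values.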

– Tony


…and I’ll bet that any male falling in that height range is more likely to be a criminal. I hope our Homeland Security uses that to flag suspicious plane passengers.

Well, I think that height range (an average of about 3 inches below current English heights) probably reflects malnutrition due to poverty.
