We present new tools for categorizing chords based on corpus data, applicable to a variety of representations from Roman numerals to MIDI notes. Using methods from information theory, we propose that harmonic theories should be evaluated by at least two criteria: accuracy (how well the theory describes the musical surface) and complexity (how economical the theory is, in the sense of Occam’s razor). We apply our methods to a range of approaches in music theory, including function theory, root functionality, and the figured-bass tradition. Using new corpus data as well as eleven datasets from five published works, we argue that our framework produces results consistent with both musical intuition and previous work, primarily by recovering the tonic/subdominant/dominant categorization central to traditional music theory. By showing that functional harmony can be analysed as a clustering problem, we link machine learning, information theory, corpus analysis and music theory.
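As an illustration of the two criteria, the following minimal sketch scores a chord categorization on a toy progression. It is not necessarily the measure used in this work: it assumes that complexity can be approximated by the entropy of the category labels and accuracy by the mutual information between the current chord's category and the next chord. The progression, the category maps, and the names `entropy`, `mutual_information`, and `evaluate` are all hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Plug-in Shannon entropy (in bits) of a list of hashable labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def mutual_information(pairs):
    """Plug-in mutual information I(X;Y) = H(X) + H(Y) - H(X,Y) in bits."""
    pairs = list(pairs)
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    return entropy(xs) + entropy(ys) - entropy(pairs)


def evaluate(chords, categorization):
    """Return (accuracy, complexity) for a chord -> category mapping.

    Assumed measures, for illustration only:
      complexity = H(category)
      accuracy   = I(category of current chord; next chord)
    """
    cats = [categorization[c] for c in chords]
    complexity = entropy(cats)
    accuracy = mutual_information(zip(cats[:-1], chords[1:]))
    return accuracy, complexity


# Made-up Roman-numeral progression, not taken from any corpus.
chords = "I IV V I ii V I vi IV V I IV I V vi ii V I".split()

# Three candidate "theories" of increasing coarseness.
identity = {c: c for c in set(chords)}          # every chord is its own category
functional = {"I": "T", "vi": "T", "ii": "S", "IV": "S", "V": "D"}
trivial = {c: "X" for c in set(chords)}         # a single category

for name, theory in [("identity", identity),
                     ("functional", functional),
                     ("trivial", trivial)]:
    acc, comp = evaluate(chords, theory)
    print(f"{name:10s} accuracy={acc:.3f} bits  complexity={comp:.3f} bits")
```

Under these assumptions, coarser categorizations can only lose accuracy and complexity relative to the identity mapping, so comparing theories amounts to asking how much predictive information each one keeps per bit of complexity. Plug-in estimates of this kind are biased on small samples, so a real corpus study would need far more data than this toy sequence.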