The Times analyzed Los Angeles Police Department violent crime data from 2005 to 2012. The records were obtained through a California Public Records Act request.
The analysis expanded on an investigation by The Times last year that reviewed 12 months of LAPD crime data ending in fall 2013.
To conduct the new analysis, The Times used a machine-learning algorithm. The computer program pulled crime data from the previous Times review to learn key words that identified an assault as serious or minor. The algorithm then analyzed nearly eight years of data in search of classification errors.
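This note does not name the specific algorithm, so the following is only a minimal sketch of the general idea: a naive Bayes word-count classifier (a stand-in technique, not necessarily the one The Times used) learns which words go with serious or minor assaults, then flags records logged as minor whose narratives read as serious. The narratives, field names and function names are all invented for illustration.

```python
import math
import re
from collections import Counter

LABELS = ("serious", "minor")

def tokenize(text):
    # Lowercase the narrative and keep alphabetic words only.
    return re.findall(r"[a-z]+", text.lower())

def train(labeled):
    # labeled: list of (narrative, label) pairs from an already-reviewed data set.
    word_counts = {label: Counter() for label in LABELS}
    doc_counts = Counter()
    for narrative, label in labeled:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(narrative))
    vocab = set(word_counts["serious"]) | set(word_counts["minor"])
    return word_counts, doc_counts, vocab

def predict(model, narrative):
    # Score each label with log prior + log likelihood (add-one smoothing).
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in LABELS:
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for word in tokenize(narrative):
            if word in vocab:  # ignore words never seen in training
                score += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocab))
                )
        scores[label] = score
    return max(scores, key=scores.get)

def flag_possible_errors(model, records):
    # Keep records logged as minor whose narrative is predicted serious.
    return [
        r for r in records
        if r["recorded_as"] == "minor" and predict(model, r["narrative"]) == "serious"
    ]

# Invented training narratives (assumption: not real LAPD records).
TRAINING = [
    ("suspect stabbed victim with knife", "serious"),
    ("suspect struck victim with bat causing broken bone", "serious"),
    ("suspect shot at victim with handgun", "serious"),
    ("suspect pushed victim during argument", "minor"),
    ("suspect slapped victim no injury", "minor"),
    ("suspect shoved victim and fled", "minor"),
]

# Invented records to screen (assumption: hypothetical field names).
records = [
    {"id": 1, "recorded_as": "minor",
     "narrative": "suspect stabbed victim with knife in the arm"},
    {"id": 2, "recorded_as": "minor",
     "narrative": "suspect pushed victim no injury"},
]

model = train(TRAINING)
flagged = flag_possible_errors(model, records)
print([r["id"] for r in flagged])
```

In this toy example only the first record is flagged, because words such as "stabbed" and "knife" appear solely in the serious training narratives.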
Reporters refined the algorithms and then selected a random sample of nearly 2,400 minor crimes from 2005 to 2012 to test the algorithms' accuracy. The sample was stratified by crime category, and the margin of error was plus or minus 2 percentage points.
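As a rough check on the sampling figure, the textbook margin-of-error formula for a proportion gives about 2 percentage points for a sample of 2,400. The sketch below assumes a 95% confidence level (z = 1.96), the most conservative proportion of 0.5, and a simple random sample; the note states none of these, and a stratified design would change the exact figure somewhat.

```python
import math

def margin_of_error(sample_size, proportion=0.5, z=1.96):
    # Standard formula for a proportion: z * sqrt(p * (1 - p) / n).
    # Assumes a simple random sample; stratification is ignored here.
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)

# "Nearly 2,400" is rounded to 2,400 for illustration.
moe = margin_of_error(2400)
print(f"{moe:.1%}")
```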
The results were manually checked to establish what proportion of incidents were flagged correctly as misclassified crimes. The algorithms incorrectly identified classification errors in 24% of flagged incidents, the manual review found. The Times adjusted the estimated tally of misclassified crimes based on the error rate.
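The adjustment described above can be sketched as simple arithmetic: if 24% of flagged incidents were false alarms, roughly 76% of any flagged total would be expected to be genuine misclassifications. The flagged count below is a made-up placeholder, not a figure from the analysis, and the note does not say whether the adjustment was applied overall or separately by crime category.

```python
FALSE_POSITIVE_RATE = 0.24  # share of flagged incidents found to be wrongly flagged

def adjust_tally(flagged_count, false_positive_rate=FALSE_POSITIVE_RATE):
    # Scale the raw flagged count down by the share of false alarms.
    return round(flagged_count * (1 - false_positive_rate))

hypothetical_flagged = 10_000  # placeholder value for illustration only
adjusted = adjust_tally(hypothetical_flagged)
print(adjusted)
```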
Nearly a third of all minor assaults — 84,000 incidents — were missing narratives during the years examined. These reports were excluded from the analysis.