Do you miss the wacky misspellings once endemic to written communication in the days before computers caught our mistakes? Take heart. One of L.A.'s most prolific classes of scribblers still misspells with abandon.
We're talking about traffic cops.
The chances are better than 1 in 5 that the traffic citations they write will contain a spelling error in what is often a key piece of evidence: the address where the violation occurred.
A painstaking Los Angeles Times analysis of 75,000 computerized traffic citations has found the street name mangled beyond all but the most hopeful inference about 20% of the time.
Los Angeles Police Department cops have come up with 49 ways to spell Cahuenga, among them Cahaunga, Caheinga, Cahenga, Caheunga, Cahienga, Cahlenga, Cahoengia, Cahu, Cahubnga and Cahue.
Of course it's possible that Cahu and Cahue may actually be failed attempts to spell some other street. But how would we know?
In addition, the record contains 34 variants of Figueroa, 28 of Sepulveda and 15 of Lankershim. Our favorite is Lanrumshim, but it's really hard to say with authority whether that's better than Lanicershim, Lankershire or the tongue-twister Lankserhim.
The street with the most spelling variations is one of the shortest streets in the city: Chick Hearn Court outside Staples Center, with 56 spellings, from Chick Heave to Chick Horn.
Some of the lists of street misspellings read with the lilting rhythm of a Latin conjugation: "Colf, Colfa, Colfar, Colfax, Colfay, Colfayan, Colfx." Was there even a pluperfect subjunctive in there?
Others are simply impenetrable -- Gacameda, Gaickgtt, Grammferly -- or outright funny: Feswando (San), Figubroa, Flusthtute.
We're not so cynical that we'd spend our time sifting through this mayhem just to get a laugh. Our purpose was to determine whether traffic citations or other civic inputs such as crossing lights have any effect in reducing pedestrian deaths.
We know when and where the accidents occur. That information comes from the California Highway Patrol's massive vehicle accident database commonly known as SWITRS.
We thought it would be a good idea to see if accidents are less likely in places where the LAPD bears down on drivers who violate pedestrian right-of-way and pedestrians who jaywalk.
To do that, we needed to convert the accident locations and the citation locations into points on an electronic map, a process called geocoding.
Our goal was to build a statistical model that included demographic data from the census, pedestrian counts from the Los Angeles Department of Transportation, accident records and records of street and crosswalk improvements.
But instead of sailing on the wings of regression analysis, our hapless interns found themselves wading through the swamp of Hanritrost, Howtharn and Hillhuest.
They've applied a number of tools to the job.
For starters, they wrote a lot of code that roughly follows this form: "if Street_Name = 'Jeffeiso, Jeffer, Jeffercon, Jeffereson, Jefferon, Jeffers, Jefferseon' then Street_Name really = 'Jefferson.'"
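In practice, rules of that form amount to a lookup table from each known misspelling to the street that was meant. Here is a minimal sketch in Python, which is not necessarily the language the interns used; the names CORRECTIONS and clean_street_name are hypothetical, and the table holds only the Jefferson variants quoted above.

```python
# Hypothetical lookup table: every known misspelling points to the intended name.
# Only the Jefferson variants quoted in the article are included here.
JEFFERSON_VARIANTS = [
    "Jeffeiso", "Jeffer", "Jeffercon", "Jeffereson",
    "Jefferon", "Jeffers", "Jefferseon",
]

CORRECTIONS = {variant.upper(): "JEFFERSON" for variant in JEFFERSON_VARIANTS}


def clean_street_name(raw):
    """Normalize case and spacing, then apply a correction if one is known."""
    key = " ".join(raw.upper().split())
    # Names with no entry in the table pass through unchanged (but normalized).
    return CORRECTIONS.get(key, key)


if __name__ == "__main__":
    for name in ["Jeffercon", "  jeffers ", "Colfax"]:
        print(repr(name), "->", clean_street_name(name))
```

The drawback is plain: someone has to spot and type in every variant by hand, which is why a table like this grows quickly with data this messy.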
Then they moved on to a program called Google Refine. It does a lot of the work by identifying possible matches and allowing the user to simply choose yes or no.
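That suggest-then-confirm workflow can be approximated in a few lines. The sketch below is not Google Refine's own matching method; it substitutes the string-similarity matcher in Python's standard difflib module. The list of correct street names and the 0.7 similarity cutoff are assumptions chosen for illustration.

```python
from difflib import get_close_matches

# Assumption: a short, hand-picked list of correct street names.
KNOWN_STREETS = [
    "CAHUENGA", "FIGUEROA", "SEPULVEDA",
    "LANKERSHIM", "JEFFERSON", "COLFAX",
]


def suggest(raw, known=KNOWN_STREETS, cutoff=0.7):
    """Return the closest known street name, or None if nothing is similar enough."""
    matches = get_close_matches(raw.upper().strip(), known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def review(raw_names):
    """Pair each raw name with a suggestion for a person to accept or reject."""
    return [(raw, suggest(raw)) for raw in raw_names]


if __name__ == "__main__":
    for raw, guess in review(["Lankserhim", "Figubroa", "Gaickgtt"]):
        print(raw, "->", guess if guess else "no suggestion")
```

A person still makes the final call on each pair. The matcher only narrows the choices, and a truly impenetrable entry such as Gaickgtt comes back with no suggestion at all.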
This process, called data cleaning, is the assiduous work that data journalists know is a prelude to any successful investigative project. As we say, "All data is dirty."
But in 15 years of examining the data of public institutions, The Times' data team had never seen such unrelenting inventiveness permeating such a large data set, from A to Zalzah, Zelezah, Zelza, Zelzah, Zelzam, Zelzan, Zelzoh and Zlzah.
So how do these spelling atrocities happen?
Like a doctor writing a prescription, the traffic cop scrawls the street name on a ticket pad, often fighting off the distractions of wind, rain or the rant of an unhappy citizen. Later, a copy is sent to the Los Angeles Superior Court. It hands the copies off to a contractor -- Boeing subsidiary CDG -- that turns them into data.
CDG's marketing director, who fielded The Times' call, wasn't sure how the process worked but said it could include both human labor -- a roomful of keypunchers -- and a form of scanning called optical character recognition, in which a clever computer does its best to read the scrawl of LAPD cops.
Theoretically, a cop could have accurately, even if illegibly, written Chick Hearn, only to have a computer change it to Chtck Hiarn. So it's possible the physical ticket could still back up the charge.
But sadly, whatever the explanation, we've ruled the citation information inadmissible in our analysis of pedestrian deaths. The only thing we can say with 90% certainty about data like this is "Argh!"
Someday, no doubt, traffic cops will use devices like those on UPS delivery trucks, which will automatically relay their location, the offense, the identity of the violator and the time of day to headquarters, where it probably will be used to keep track of how the officer is doing his job.
But the data could also end up in a UCLA computer, where it could be used to learn what it takes to safeguard the growing ranks of L.A.'s walking public.
Until then, it's a shame that a decade after former Chief William Bratton brought computer crime-busting to Los Angeles, the LAPD still has no way to know whether its traffic citations are doing any good.
Photo: Los Angeles Police officers are seen at the corner of Hollywood and Cahuenga Boulevards. Credit: Los Angeles Times
Jennifer Chuu and Joanne Lo graduated this spring from the UCLA school of statistics. Doug Smith is The Times' database editor.
- CHICK HARD
- CHICK HEDRN
- CHTCK HIARN
- CHICK HEDR
- CHICK HEATON
- CHICK HEAVEN
- CHICK MEARN
- CHICK HEARRY
- CHICK HEALTH
- CHECK HERNA
- CHICK HEAANS
- CHICK HENRY
- LANKER SHIM
- LAN KERSHIM