Saturday, March 30, 2013

Extracting Radio Format Information from Wikipedia

I have two radio station apps in the App Store: AM Towers USA, which is free, and FM Towers USA, which is not. They try to map every active radio station in the United States, using information from two sources: the FCC and Wikipedia.
If one of my apps is missing a radio station, it's most likely because I couldn't find a format for it on Wikipedia. Assuming the station has a Wikipedia page, as the vast majority do, either the format is missing from the page's radio infobox or it's badly formatted. As a public service, I've reformatted dozens of these boxes, including this one for KDQN, a small station in Arkansas:
{{Infobox radio station
| name                 = KDQN
| image                = 
| city                 = [[De Queen, Arkansas|De Queen]], [[Arkansas]]
| area                 = 
| slogan               =
| branding             = 
| frequency            = 1390 [[kHz]]
| repeater             = 
| airdate              = 
| language             = Spanish
| power                = 500 [[watt]]s day
| class                = D
| facility_id          = 30600
| coordinates          = {{coord|34|1|57|N|94|19|43|W|region:US-AR_type:landmark|display=inline,title}}
| callsign_meaning     = 
| former_callsigns     = 
| owner                = Jay W. Bunyard & Anne W. Bunyard
| licensee             = 
| sister_stations      = 
| webcast              = 
| website              = 
| affiliations         = 
}}
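The apps' own parsing code isn't shown in this post, but the idea is simple enough to sketch. Here is a rough Python version that finds an `{{Infobox radio station` template, walks the `{{ }}` pairs to find where it ends, and reads each `| field = value` line into a dictionary. The name `parse_infobox` is hypothetical, and the sketch assumes one field per line, as in the KDQN box; a value spread over several lines would be cut short.

```python
import re

# One "| field = value" line; the value runs to the end of the line.
FIELD = re.compile(r"^[ \t]*\|[ \t]*([\w ]+?)[ \t]*=(.*)$", re.MULTILINE)

def parse_infobox(wikitext: str) -> dict[str, str]:
    """Return {field: value} for the first radio station infobox (hypothetical helper)."""
    start = wikitext.find("{{Infobox radio station")
    if start == -1:
        return {}
    # Walk {{ and }} pairs so nested templates like {{coord|...}} don't end the box early.
    depth, i, end = 0, start, None
    while i < len(wikitext) - 1:
        pair = wikitext[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}":
            depth -= 1
            i += 2
            if depth == 0:
                end = i
                break
        else:
            i += 1
    if end is None:
        return {}  # unbalanced braces: this box can't be parsed reliably
    body = wikitext[start:end - 2]  # drop the closing "}}"
    return {m.group(1): m.group(2).strip() for m in FIELD.finditer(body)}
```

Fed the KDQN box above, this returns entries such as `name` and `language` but no `format` key, which is exactly the case where a station drops off the map.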

The KDQN box has no format field, so AM Towers didn't pick the station up, even though the article text says it broadcasts a Spanish Music format. I don't know what specific kind of Spanish music the station plays, so I can't in good faith edit this entry, but let's assume it's Spanish Contemporary. I'd insert a field like so:
| format = [[Spanish Contemporary]]
The [[ ]] brackets are important because they make the format display as a link to the Wikipedia page on that radio format, or in this case to the page on the various regional styles of Mexican music.
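Wikilinks come in two shapes: `[[Target]]`, and `[[Target|shown text]]` as in the `city` field of the KDQN box. A minimal Python sketch of pulling them apart follows; the helper names `wikilinks` and `plain_text` are hypothetical, and the pattern handles only simple, non-nested links.

```python
import re

# [[Target]] or [[Target|shown text]]; group 2 is absent for the first form.
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")

def wikilinks(value: str) -> list[tuple[str, str]]:
    """Return (target page, text shown to readers) for each link in a field value."""
    links = []
    for m in WIKILINK.finditer(value):
        target = m.group(1).strip()
        shown = (m.group(2) or target).strip()
        links.append((target, shown))
    return links

def plain_text(value: str) -> str:
    """Replace each link with the text a reader would see on the rendered page."""
    return WIKILINK.sub(lambda m: (m.group(2) or m.group(1)).strip(), value)
```

`plain_text` is handy for displaying a format in an app, while the link target tells you which Wikipedia page the editor pointed at.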
If the station also broadcasts a second format, such as Spanish Oldies, I'd insert:
| format = [[Spanish Contemporary]]/[[Spanish Oldies]]

There isn't any consistency in how people separate a list of formats: some use semicolons, others slashes, spaces, or HTML line breaks. I wish they were consistent; it would make my job easier.
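One way to cope with the inconsistency is to trust the links first: if the editor bracketed each format, the link texts are the list, whatever sits between them. Otherwise, fall back to splitting on semicolons, slashes, and HTML breaks. This Python sketch (the name `split_formats` is hypothetical) does exactly that. An unlinked, space-separated list can't be split reliably, since format names like Spanish Contemporary contain spaces themselves, so it comes back as a single entry.

```python
import re

# [[Target]] or [[Target|shown text]]
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")
# Unlinked separators: semicolons, slashes, and <br>, <br/>, <br /> in any case.
SEPARATORS = re.compile(r"\s*(?:;|/|<br\s*/?>)\s*", re.IGNORECASE)

def split_formats(value: str) -> list[str]:
    """Split an infobox format value into individual format names."""
    links = WIKILINK.findall(value)  # unmatched optional group comes back as ""
    if links:
        # Linked formats: use the text each link displays, ignoring separators.
        return [(shown or target).strip() for target, shown in links]
    # Unlinked formats: split on the known separators and drop empty pieces.
    return [part for part in SEPARATORS.split(value.strip()) if part]
```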
Then I make sure that every [ is balanced by a ], and every { by a }. A single stray } can ruin an otherwise parseable infobox.
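Checking the balance by eye gets tedious on a long infobox. A small stack-based check in Python (the name `unbalanced` is hypothetical) reports the position of every `[` or `{` left unclosed and every `]` or `}` with no partner. It treats each character on its own, so `[[` counts as two openers, and it knows nothing about `<nowiki>` sections or HTML comments, where stray brackets are legitimate.

```python
def unbalanced(wikitext: str) -> list[tuple[int, str]]:
    """Return (offset, char) for every bracket or brace without a partner, sorted by offset."""
    partner = {"]": "[", "}": "{"}
    stack: list[tuple[int, str]] = []     # openers still waiting for a closer
    problems: list[tuple[int, str]] = []  # closers that matched nothing
    for offset, ch in enumerate(wikitext):
        if ch in "[{":
            stack.append((offset, ch))
        elif ch in partner:
            if stack and stack[-1][1] == partner[ch]:
                stack.pop()
            else:
                problems.append((offset, ch))
    # Whatever is left on the stack was never closed.
    return sorted(problems + stack)
```

An empty result means every bracket has a partner; a lone `}` shows up as a single entry with its offset, which makes it easy to find.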
 
Google+