POINT DETECTION SYSTEM
1 The Architecture of the Event Detection System
The top-level architecture of the system is presented in Fig. 14.1. After video decoding, the stream analysis is split into two independent threads. The first, termed the scene thread, works on extracted video frames and audio packets. The second, the scoreboard thread, needs only video frames. The common data is stored in a shared buffer in order to simplify the processing flow. All their features are presented bottom-up: we start with a description of the low-level modules that gather the data extracted by feature detectors. These data are then sent to the mid-level modules where, depending on the processing thread, they are used either to detect the text (scoreboard) region or to predict what kind of scene is presented to the viewer. The top module gathers the events from both threads, completing the description of the system. The inputs to this module (i.e., a recognized score change or a specific, event-related sequence of shots) are treated separately, which means that there are two kinds of event markers at the system output. In general, the primary design constraint was the latency introduced by the modules working at all levels throughout the system. This is reflected in the following sections, where time efficiency is the main parameter investigated in each module. The two threads are described in detail below.
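The two-thread layout described above can be illustrated with a minimal sketch. It is not the system's actual implementation: the names `SharedBuffer`, `Event`, `scene_thread`, `scoreboard_thread` and `run_pipeline` are hypothetical, and the two detector callbacks stand in for the low-level and mid-level modules described in the following sections. The sketch only shows that both threads read the same decoded data and that the top module receives two distinct kinds of event markers.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class Event:
    source: str       # "scene" or "scoreboard" (two kinds of event markers)
    frame_index: int  # position of the frame that triggered the marker


class SharedBuffer:
    """Decoded data read by both threads (hypothetical structure)."""

    def __init__(self, frames, audio_packets):
        self.frames = frames
        self.audio_packets = audio_packets


def scene_thread(buffer, events, is_event):
    # The scene thread consumes video frames and audio packets.
    for i, (frame, audio) in enumerate(zip(buffer.frames, buffer.audio_packets)):
        if is_event(frame, audio):
            events.put(Event("scene", i))


def scoreboard_thread(buffer, events, score_changed):
    # The scoreboard thread consumes video frames only.
    previous = None
    for i, frame in enumerate(buffer.frames):
        if previous is not None and score_changed(previous, frame):
            events.put(Event("scoreboard", i))
        previous = frame


def run_pipeline(buffer, is_event, score_changed):
    """Top module: run both threads and gather their event markers."""
    events = queue.Queue()
    workers = [
        threading.Thread(target=scene_thread, args=(buffer, events, is_event)),
        threading.Thread(target=scoreboard_thread, args=(buffer, events, score_changed)),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    collected = []
    while not events.empty():
        collected.append(events.get())
    # Sort for a deterministic order, since the threads finish independently.
    return sorted(collected, key=lambda e: (e.frame_index, e.source))
```

In this sketch the detectors are passed in as plain functions, so either thread can be exercised on its own with toy data before real feature detectors are plugged in.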
2 Feature-Based Event Detection System
The scene analysis thread is responsible for detecting events that can subsequently be observed on the scoreboard (i.e., goals), but also situations such as close misses or any other actions that can be classified as interesting from the viewer's point of view. It belongs to the class of algorithms that analyze the sequence of shots presented. Earlier works on event detection state that 97% of the interesting moments are followed by close-up shots (i.e., shots that present a player). If we add to this the information about where the action took place (e.g., end of pitch/court, close to the goal-post/basket) before the close-up shot occurred, we can distinguish more precisely between usual and unusual events.
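The shot-sequence idea can be sketched as a simple rule over per-shot labels: a close-up that directly follows a shot located near the goal or basket is flagged as a candidate event. The label names (`"goal_area"`, `"close_up"`, `"midfield"`) and the function `find_event_candidates` are assumptions made for illustration only; the actual class set and decision logic of the system are not reproduced here.

```python
def find_event_candidates(shot_labels, zone_labels=("goal_area",), closeup_label="close_up"):
    """Return indices of close-up shots that directly follow a shot
    taken in one of the zones of interest (hypothetical label names)."""
    candidates = []
    for i in range(1, len(shot_labels)):
        follows_zone_shot = shot_labels[i - 1] in zone_labels
        if shot_labels[i] == closeup_label and follows_zone_shot:
            candidates.append(i)
    return candidates


# Example: only the close-up that follows the goal-area shot is flagged.
shots = ["midfield", "goal_area", "close_up", "midfield", "close_up"]
flagged = find_event_candidates(shots)
```

The second close-up in the example is ignored because it follows a midfield shot, which is how location information helps separate usual from unusual events.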
The scene analysis thread utilizes extracted video frames and audio packets in order to detect the events. Most of the information needed for this task is stored in the visual data. For this reason we have chosen 14 different classes that represent most of the situations presented on the field/court during a match. In order to evaluate the generalization properties of the algorithm, some of the classes were split into three subclasses: shot with simple background (a dominant color easily distinguishable), shot with complex background (no dominant color in the background) and a mixture of the two (see Table).
However, we have also included simple audio analysis to improve the robustness of the algorithm. The reason is that we have observed that an interesting moment in the game is usually followed by intervals of increased audio activity (e.g., a round of applause or excitement of the commentator). From a real-time perspective, audio track analysis does not introduce significant overhead in processing time, since the decoder extracts audio in parallel with the analysis of the video stream and, moreover, we only calculate the temporal energy of the audio signal, which is a relatively simple operation.
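A minimal sketch of a temporal (short-time) energy computation is given below to show why this step is cheap. The function names, the non-overlapping window scheme, and the thresholding rule (a window counts as "high activity" when its energy exceeds a multiple of the mean energy) are assumptions for illustration; the text states only that a temporal energy of the audio signal is calculated.

```python
def short_time_energy(samples, window):
    """Mean squared amplitude over consecutive, non-overlapping windows."""
    return [
        sum(s * s for s in samples[i:i + window]) / window
        for i in range(0, len(samples) - window + 1, window)
    ]


def high_activity_windows(samples, window, ratio=2.0):
    """Indices of windows whose energy exceeds `ratio` times the mean energy.

    The relative threshold is an assumed heuristic, not the system's rule.
    """
    energies = short_time_energy(samples, window)
    if not energies:
        return []
    mean_energy = sum(energies) / len(energies)
    return [i for i, e in enumerate(energies) if e > ratio * mean_energy]


# Example: a loud burst (e.g., applause) between two quiet intervals.
signal = [0.1] * 4 + [1.0] * 4 + [0.1] * 4
loud = high_activity_windows(signal, window=4)
```

Each window costs one multiplication and one addition per sample, which is consistent with the observation that audio analysis adds little processing overhead.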