A Script for Combining the Outputs of Keyword Spotting Systems

Keyword spotting (KWS) has become a popular task lately. For some of my work, I need to quickly merge the outputs of multiple systems (and sometimes several outputs of the same system), so I wrote a simple Python script to take care of it.

KWS is the task of detecting specific words in audio. The typical output of a KWS system is an XML file (at least in the case of NIST-style evaluations). The basic format is:

<kwslist system_id="ID1" language="Unknown" kwlist_filename="hitlist.xml">
  <detected_kwlist kwid="KW001" search_time="1" oov_count="0">
    <kw tbeg="195.23" dur="0.27" file="file1" score="0.482474" channel="1" decision="NO" />
    <kw tbeg="314.55" dur="0.83" file="file2" score="0.470213" channel="1" decision="NO" />
  </detected_kwlist>
</kwslist>
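
For readers who want to inspect a hitlist themselves, here is a minimal sketch of reading this format with Python's standard library. It is not the code from the script; the function name `read_hits` and the dictionary layout are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# A well-formed sample in the format shown above.
SAMPLE = """<kwslist system_id="ID1" language="Unknown" kwlist_filename="hitlist.xml">
  <detected_kwlist kwid="KW001" search_time="1" oov_count="0">
    <kw tbeg="195.23" dur="0.27" file="file1" score="0.482474" channel="1" decision="NO" />
    <kw tbeg="314.55" dur="0.83" file="file2" score="0.470213" channel="1" decision="NO" />
  </detected_kwlist>
</kwslist>"""


def read_hits(xml_text):
    """Return a list of hit dicts parsed from a kwslist XML string.

    Hypothetical helper: the name and the dict keys are illustrative.
    """
    root = ET.fromstring(xml_text)
    hits = []
    for kwlist in root.iter("detected_kwlist"):
        for kw in kwlist.iter("kw"):
            hits.append({
                "kwid": kwlist.get("kwid"),
                "file": kw.get("file"),
                "channel": kw.get("channel"),
                "tbeg": float(kw.get("tbeg")),
                "dur": float(kw.get("dur")),
                "score": float(kw.get("score")),
                "decision": kw.get("decision"),
            })
    return hits


hits = read_hits(SAMPLE)
```

Each hit carries its keyword ID along with it, which makes it easy to group hits by keyword later.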

My script takes a set of these XML files and combines them in one of several ways; run it with the -h option to get the full help. I refer to each of these files as a hitlist and to each individual entry as a hit.

If we can assume that the XML files contain no overlapping entries, we can use the fast option (-m fast). With this option the merging is very fast, because the script does not have to search over the previous entries when adding a new one. I often perform KWS with a separate process for each audio file and then merge the resulting hitlists using this script.
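
The idea behind a no-overlap merge can be sketched as plain concatenation per keyword. This is only an illustration under assumed data structures (a dict from keyword ID to a list of hit tuples, and a hypothetical `merge_fast` function), not the script's actual implementation.

```python
from collections import defaultdict


def merge_fast(hitlists):
    """Concatenate hits per keyword without checking for overlaps.

    Each hitlist is assumed to be a dict: keyword id -> list of hits.
    Adding a hit never looks at the hits already stored.
    """
    merged = defaultdict(list)
    for hitlist in hitlists:
        for kwid, hits in hitlist.items():
            merged[kwid].extend(hits)
    return dict(merged)


# Hypothetical per-file outputs; each hit is (file, tbeg, dur, score).
list_a = {"KW001": [("file1", 195.23, 0.27, 0.482474)]}
list_b = {"KW001": [("file2", 314.55, 0.83, 0.470213)]}
merged = merge_fast([list_a, list_b])
```

Because no earlier hit is ever inspected, the cost grows linearly with the total number of hits.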

When hits overlap, a decision must be made on how to combine their probabilities. The script gives five options: min, max, mean, gmean, and rank. Min, max, and mean take the minimum, maximum, and arithmetic mean of the overlapping probabilities. Gmean is like mean, but uses the geometric mean instead of the arithmetic mean.
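
To make the difference between these options concrete, here is a small sketch that applies each of them to the probabilities of one group of overlapping hits. The `COMBINERS` table and the `gmean` helper are assumptions for illustration; returning 0.0 when a score is not positive is one possible convention, not necessarily what the script does.

```python
import math
from statistics import mean


def gmean(scores):
    """Geometric mean of a list of scores.

    Returns 0.0 if any score is not positive (an assumed convention),
    which also avoids taking the log of zero.
    """
    if any(s <= 0.0 for s in scores):
        return 0.0
    return math.exp(sum(math.log(s) for s in scores) / len(scores))


# Hypothetical lookup table from option name to combining function.
COMBINERS = {"min": min, "max": max, "mean": mean, "gmean": gmean}

# Probabilities that two systems assigned to the same overlapping hit.
scores = [0.2, 0.8]
combined = {name: fn(scores) for name, fn in COMBINERS.items()}
```

For these two scores the arithmetic mean is 0.5, while the geometric mean is about 0.4: the geometric mean is pulled toward the lower score, so it penalizes disagreement between systems more strongly.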

The final option is rank. Rank simply considers the XML files in order. If there are overlapping entries, it keeps the probability of the entry seen first and ignores the later ones.
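
A rough sketch of this first-seen-wins behaviour for the hits of a single keyword follows. The overlap test used here (same file, intersecting time spans) is an assumption, since the exact definition of overlap is not spelled out above, and `overlaps` and `merge_rank` are hypothetical names.

```python
def overlaps(a, b):
    """True if two hits share a file and their time spans intersect.

    Assumed definition of overlap, for illustration only.
    """
    return (a["file"] == b["file"]
            and a["tbeg"] < b["tbeg"] + b["dur"]
            and b["tbeg"] < a["tbeg"] + a["dur"])


def merge_rank(hitlists):
    """Keep a hit only if it does not overlap a hit that was already kept.

    Hitlists are visited in the order given, so earlier lists win.
    """
    merged = []
    for hitlist in hitlists:
        for hit in hitlist:
            if not any(overlaps(hit, kept) for kept in merged):
                merged.append(hit)
    return merged


# Hypothetical hits for one keyword from two systems.
first = [{"file": "file1", "tbeg": 195.23, "dur": 0.27, "score": 0.48}]
second = [
    {"file": "file1", "tbeg": 195.30, "dur": 0.25, "score": 0.90},
    {"file": "file2", "tbeg": 10.0, "dur": 0.5, "score": 0.30},
]
merged = merge_rank([first, second])
```

The second system's hit in file1 overlaps the first system's hit, so it is dropped even though its score is higher; the file2 hit has no competitor and is kept.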

There is one other option I want to mention: '--unscored'. When this option is given, any XML file that has no overlapping entry for a detection seen in another file is treated as having assigned that detection a probability of 0.
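
The effect on a mean merge can be sketched as follows. This assumes that, without the option, a file with no overlapping entry is simply left out of the average; that default is a guess, and the `combine_mean` function is hypothetical.

```python
from statistics import mean


def combine_mean(scores_by_system, unscored=False):
    """Average the scores for one detection across hitlists.

    scores_by_system holds one entry per hitlist; None marks a hitlist
    with no overlapping hit for this detection. With unscored=True a
    missing hit counts as probability 0; otherwise it is skipped
    (assumed default behaviour).
    """
    if unscored:
        scores = [0.0 if s is None else s for s in scores_by_system]
    else:
        scores = [s for s in scores_by_system if s is not None]
    return mean(scores)


# One detection, found by the first and third systems only.
detection = [0.6, None, 0.3]
default_score = combine_mean(detection)
unscored_score = combine_mean(detection, unscored=True)
```

Here the average drops from about 0.45 to about 0.3 once the silent system is counted as a zero, so detections that only some systems report are pushed down the ranking.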

To be honest, I only use '--merge fast' when merging files from the same system, and '--merge mean' when merging files from separate systems. I have also never had a use for the '--unscored' option. In all of my experiments, '--merge mean' gives the best results out of all the merging options. However, others may find a use for the other options.

The script is called merge_hitlists.py.
