Speaker identification under noisy conditions is a challenging topic in speech processing. Motivated by the fact that neural responses are robust against noise, this paper proposes a new speaker identification system using 2-D neurograms constructed from the responses of a physiologically based computational model of the auditory periphery. The responses of auditory-nerve fibers across a wide range of characteristic frequencies to speech signals were simulated to construct the neurograms. The neurogram coefficients were used to train a Gaussian mixture model-universal background model (GMM-UBM) classifier, which generated an identity model for each speaker. Three text-independent speaker databases and one text-dependent speaker database were employed to test the identification performance of the proposed method. The robustness of the method was also investigated using speech signals distorted by three types of noise (white Gaussian, pink, and street noise) at different signal-to-noise ratios. The identification results of the proposed neural-response-based method were compared with those of traditional speaker identification methods using features such as Mel-frequency cepstral coefficients, gammatone frequency cepstral coefficients, and frequency-domain linear prediction. Although the classification accuracy of the proposed method was comparable to that of the traditional techniques in quiet, the new feature provided lower classification error rates in noisy environments.
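The noise-robustness test described above depends on mixing clean speech with noise at a prescribed signal-to-noise ratio. The following is a minimal sketch of that step, not the authors' procedure: the function names (`mean_power`, `mix_at_snr`, `measure_snr_db`), the 200 Hz sine standing in for speech, the 16 kHz sampling rate, and the 5 dB target are all assumptions chosen for illustration.

```python
import math
import random


def mean_power(x):
    """Average power (mean of squared samples) of a sequence."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(speech, noise, target_snr_db):
    """Scale `noise` so that speech-to-noise power ratio equals target_snr_db.

    Returns (noisy_speech, scaled_noise). Hypothetical helper, not from the paper.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_speech = mean_power(speech)
    p_noise = mean_power(noise)
    if p_noise == 0:
        raise ValueError("noise must have non-zero power")
    # SNR_dB = 10*log10(P_speech / (g^2 * P_noise)); solve for the gain g.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (target_snr_db / 10)))
    scaled_noise = [gain * n for n in noise]
    noisy = [s + n for s, n in zip(speech, scaled_noise)]
    return noisy, scaled_noise


def measure_snr_db(speech, noise):
    """SNR in dB computed from the clean speech and the (scaled) noise."""
    return 10 * math.log10(mean_power(speech) / mean_power(noise))


# Toy stand-ins: a 200 Hz sine as "speech" and white Gaussian noise (assumed values).
rng = random.Random(0)
fs = 16000
speech = [math.sin(2 * math.pi * 200 * t / fs) for t in range(fs)]
noise = [rng.gauss(0.0, 1.0) for _ in range(fs)]

noisy, scaled_noise = mix_at_snr(speech, noise, 5.0)
print(measure_snr_db(speech, scaled_noise))  # close to the 5 dB target
```

The same scaling applies unchanged to pink or street noise recordings; only the `noise` sequence changes.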
Various studies on medial olivocochlear (MOC) efferents have implicated them in multiple roles in the auditory system (e.g., dynamic range adaptation, masking reduction, and selective attention). This study presents a systematic simulation of inferior colliculus (IC) responses with and without electrical stimulation of the MOC. Phenomenological models of the responses of auditory nerve (AN) fibers and IC neurons were used to this end. The simulated responses were highly consistent with physiological data: they replicated 3 of the 4 known MOC effects on rate-level responses (shifts, high-stimulus-level reduction, and enhancement). Complex MOC efferent effects that were previously thought to require integration across neurons with different characteristic frequencies (CFs) were simulated using the same-frequency inhibition-excitation circuitry. MOC-induced enhancing effects were found only in neurons with CFs from 750 Hz to 2 kHz. This limited effect is indicative of the role of MOC activation on the AN responses at the stimulus offset.
A phenomenological model of the auditory periphery in cats was previously developed by Zilany and colleagues [J. Acoust. Soc. Am. 126, 2390-2412 (2009)] to examine the detailed transformation of acoustic signals into the auditory-nerve representation. In this paper, a few issues arising from the responses of the previous version have been addressed. The parameters of the synapse model have been readjusted to better simulate reported physiological discharge rates at saturation for higher characteristic frequencies (CFs) [Liberman, J. Acoust. Soc. Am. 63, 442-455 (1978)]. This modification also corrects the responses of higher-CF model fibers to low-frequency tones, which were erroneously much higher than the responses of low-CF model fibers in the previous version. In addition, an analytical method has been implemented to compute the mean discharge rate and variance from the model's synapse output that takes into account the effects of absolute refractoriness.
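One way to see how absolute refractoriness changes the mean and variance of a discharge rate is the textbook dead-time-modified Poisson process, in which each interspike interval is a fixed dead time plus an exponential waiting time. The sketch below uses that standard renewal-theory result; whether it matches the paper's analytical method in detail is an assumption, and the function name, the 0.75 ms default dead time, and the example rates are illustrative only.

```python
def refractory_rate_stats(synapse_rate, t_abs=0.75e-3):
    """Mean rate and count-variance rate of a Poisson process with dead time.

    synapse_rate: driving (free) rate in spikes/s, e.g. a synapse output sample.
    t_abs: absolute refractory period in seconds (illustrative default, assumed).

    Each interspike interval is t_abs + Exp(synapse_rate), so
      mean ISI = t_abs + 1/rate,  var ISI = 1/rate**2.
    Renewal theory then gives, per unit time,
      mean rate     = 1 / mean ISI       = rate / (1 + rate*t_abs)
      variance rate = var ISI / mean**3  = rate / (1 + rate*t_abs)**3
    """
    if synapse_rate < 0 or t_abs < 0:
        raise ValueError("rate and dead time must be non-negative")
    denom = 1.0 + synapse_rate * t_abs
    mean_rate = synapse_rate / denom
    var_rate = synapse_rate / denom ** 3
    return mean_rate, var_rate


# Illustrative driving rates in spikes/s (assumed values, not model output).
for rate in (50.0, 200.0, 1000.0):
    mean_rate, var_rate = refractory_rate_stats(rate)
    print(rate, mean_rate, var_rate)
```

With no dead time the process reduces to a Poisson process, so the mean and variance rates both equal the driving rate; a nonzero dead time lowers both and makes the discharge more regular than Poisson.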
In forest clearings of the Malaysian rainforest, chirping and trilling Mecopoda species often live in sympatry. We investigated whether a phenomenon known as stochastic resonance (SR) improved the ability of individuals to detect a low-frequency signal component typical of chirps when members of the heterospecific trilling species were simultaneously active. This phenomenon may explain the fact that the chirping species upholds entrainment to the conspecific song in the presence of the trill. We therefore evaluated the response probability of an ascending auditory neuron (TN-1) in individuals of the chirping Mecopoda species to triple-pulsed 2, 8 and 20 kHz signals that were broadcast 1 dB below the hearing threshold while the intensity of either white noise or a typical triller song was increased. Our results demonstrate the existence of SR over a rather broad range of signal-to-noise ratios (SNRs) of input signals when periodic 2 kHz and 20 kHz signals were presented at the same time as white noise. Using the chirp-specific 2 kHz signal as a stimulus, the maximum TN-1 response probability frequently exceeded the 50% threshold if the trill was broadcast simultaneously. Playback of an 8 kHz signal, a common frequency band component of the trill, yielded a similar result. Nevertheless, with the trill as a masker, the signal-related TN-1 spiking probability was rather variable. The variability on an individual level resulted from correlations between the phase relationship of the signal and syllables of the trill. For the first time, these results demonstrate the existence of SR in acoustically communicating insects and suggest that the calling song of heterospecifics may facilitate the detection of a subthreshold signal component in certain situations.
The results of the simulation of sound propagation in a computer model suggest a wide range of sender-receiver distances in which the triller can help to improve the detection of subthreshold signals in the chirping species.
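The core idea of stochastic resonance, that a moderate amount of noise can make a subthreshold signal detectable, can be illustrated with a toy threshold detector. The sketch below is not a model of the TN-1 neuron or of the experiments above: it assumes a hard threshold, additive Gaussian noise, a signal amplitude 1 dB below threshold, and a hypothetical detection measure (probability of a threshold crossing with the signal present minus the probability with noise alone).

```python
import math


def exceed_prob(level, threshold, sigma):
    """P(level + Gaussian noise > threshold) for noise standard deviation sigma."""
    if sigma == 0:
        return 1.0 if level > threshold else 0.0
    return 0.5 * math.erfc((threshold - level) / (sigma * math.sqrt(2)))


def detection_gain(signal, threshold, sigma):
    """Signal-related crossing probability: P(cross | signal) - P(cross | noise only)."""
    return exceed_prob(signal, threshold, sigma) - exceed_prob(0.0, threshold, sigma)


THRESHOLD = 1.0                        # arbitrary units (assumed)
SIGNAL = THRESHOLD * 10 ** (-1 / 20)   # amplitude 1 dB below threshold

# Noise levels from silence to strong masking (assumed values).
sigmas = [0.0, 0.05, 0.2, 0.5, 1.0, 5.0, 20.0]
gains = [detection_gain(SIGNAL, THRESHOLD, s) for s in sigmas]

for s, g in zip(sigmas, gains):
    print(f"noise sigma = {s:5.2f}  signal-related crossing probability = {g:.3f}")
```

In this toy setting the signal-related crossing probability is zero without noise, rises to a maximum at an intermediate noise level, and falls again when the noise dominates, which is the non-monotonic signature usually associated with SR.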