Abstract
Evidence from infant studies indicates that language learning can be facilitated by multimodal cues. We extended this observation to adult language learning by studying the effects of simultaneous visual cues (nonassociated object images) on speech segmentation performance. Our results indicate that segmentation of new words from a continuous speech stream is facilitated by simultaneous visual input that it is presented at or near syllables that exhibit the low transitional probability indicative of word boundaries. This indicates that temporal audio-visual contiguity helps in directing attention to word boundaries at the earliest stages of language learning. Off-boundary or arrhythmic picture sequences did not affect segmentation performance, suggesting that the language learning system can effectively disregard noninformative visual information. Detection of temporal contiguity between multimodal stimuli may be useful in both infants and second-language learners not only for facilitating speech segmentation, but also for detecting word–object relationships in natural environments.