The timing of acoustic events is central to human speech and music. Tempo tends to be slower in aesthetic contexts: rates in poetic speech and music are slower than in non-poetic, running speech. We tested whether a general aesthetic preference for slower rates can account for this, using birdsong as a stimulus: it structurally resembles human sequences but is unbiased by their production or processing constraints. When listeners selected the birdsong playback tempo that was most pleasing, they showed no bias towards any range of note rates. However, upon hearing a novel stimulus, listeners rapidly formed a robust, implicit memory of its temporal properties and developed a stimulus-specific preference for the memorized tempo. Interestingly, tempo perception in birdsong stimuli was strongly determined by individual, internal preferences for rates of 1–2 Hz. This suggests that processing complex sound sequences relies on a default time window, while aesthetic appreciation appears flexible, experience-based, and not determined by absolute event rates.