^Speech analysis-synthesis system ... an analysis method that exploits the characteristics of speech and represents speech as parameters (Morise 2018, p. 53)
^aperiodicity is defined as the power ratio between the speech signal and the aperiodic component of the signal. Quoted from: Morise (2016). “D4C, a band-aperiodicity estimator for high-quality speech synthesis”. Speech Communication. 84: 57–65.
^Aperiodicity, an index corresponding to the degree of hoarseness in voiced sounds (Morise 2018b, p. 608)
^the mean overall spectrum characteristics during ongoing speech and the singing of a song through the utilization of long-term-average spectrum (LTAS). An LTAS typically stabilizes after some 30-40 seconds of running speech or singing (Cleveland 2001, p. 54)
^it reveals the sound level, averaged over time along the frequency axis and provides a reproducible representation of overall voice spectral characteristics. (Cleveland 2001, p. 54)
^The LTAS contour reflects contributions from both the voice source and the resonance or formant characteristics of the voice. (Cleveland 2001, p. 55)
^peak typically occurs near 500 Hz, presumably because F1 is often located in this frequency range in speech and singing. (Cleveland 2001, p. 55)
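^Because the characteristics of speech change over time, it is desirable to use short-time analysis to observe both the properties within each short interval and how those properties change over time. (Morise 2018, p. 19)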
^pitch-synchronous analysis windows ... Their lengths are ... proportional to the local pitch period (Moulines 1990, pp. 454–455)
^Studies on unsupervised speech representation learning can roughly be divided into reconstruction and self-supervised learning methods. Quoted from: Polyak et al. (2021). Speech Resynthesis from Discrete Disentangled Self-Supervised Representations.