MT

Supervisor

RNDr. Marek Nagy, PhD.

Annotation

A voice activity detector (VAD) has many uses. It identifies which positions in an audio signal contain recorded voice and which contain unwanted noise (silence). This helps, for example, to reduce data transfer in audio-conferencing applications. It also benefits a speech recognizer, which receives shorter segments of the recording to recognize and therefore makes fewer errors.

Goal

Create an algorithm for detecting voice in an audio recording in Octave (MATLAB). Keep in mind that the VAD will be used in real time. Then port the algorithm efficiently to JavaScript so that it can be used as a module in a web application with audio-conferencing capability.
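
To make the frame-by-frame nature of the task concrete, here is a minimal JavaScript sketch of a plain energy-threshold detector. It is only an illustration of the input and output such a module might have, not the formant-based method from the resources and not the thesis implementation. The function name `detectVoiceFrames` and the parameters `frameSize` and `threshold` are hypothetical.

```javascript
// Hypothetical baseline: marks each frame as voice (true) or noise/silence (false)
// by comparing its mean energy with a fixed threshold.
// Assumption: `samples` is an array of amplitudes in the range [-1, 1].
function detectVoiceFrames(samples, frameSize, threshold) {
  const flags = [];
  // Process only complete frames; a trailing partial frame is ignored.
  for (let start = 0; start + frameSize <= samples.length; start += frameSize) {
    let energy = 0;
    for (let i = start; i < start + frameSize; i++) {
      energy += samples[i] * samples[i];
    }
    // Mean energy of the frame decides the label.
    flags.push(energy / frameSize > threshold);
  }
  return flags;
}
```

Because each frame is classified independently, a detector of this shape can run on a live stream one frame at a time. A fixed threshold is the weak point in noisy recordings, which is the problem the noise-estimation resources address.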

Main Chapters

Introduction
Problem statement and its existing solutions
Technical details
Formant-Based Robust Voice Activity Detection
Noise Spectrum Estimation in Adverse Environments
Proposal, Implementation, Results
Conclusion

Time schedule

October 2020 : Digital Signal Processing

November 2020 : Prepare and study resources

December 2020 : Implement the first resource

January 2021 : Research datasets

February 2021 : Implement mixing of recordings

March 2021 : Run experiments

April 2021 : Implement results evaluation

May 2021 : Evaluate experiments

10.5.2021 : Create website and presentation

November 2021 : Implement second resource

November 2021 : Run experiments

December 2021 : Prepare for presentation

January 2022 : Evaluate experiments

February 2022 : Propose, choose a solution

March 2022 : Implement the solution

April 2022 : Evaluate the solution

May 2022 : Final touches, presentation

Resources

Type	Title	Author(s)
Article	Formant-Based Robust Voice Activity Detection	I. Yoo, H. Lim, D. Yook
Article	Vowel formants compared with resonances of the vocal tract	Daniel Aalto, Antti Huhtala, A. Kivelä, Jarmo Malinen, Pertti Palo, Jani Saunavaara, Martti Vainio
Dataset	English multi-speaker corpus for CSTR voice cloning toolkit	Junichi Yamagishi
Dataset	MUSAN: A Music, Speech, and Noise Corpus	David Snyder, Guoguo Chen, Daniel Povey
Article	Signal-to-noise ratio (SNR) as a measure of reproducibility: Design, estimation, and application	Naser Elkum, Mohamed Shoukri
Article	Speech enhancement for non-stationary noise environments	Israel Cohen, Baruch Berdugo
Article	Noise power spectral density estimation based on optimal smoothing and minimum statistics	R. Martin
Article	Computationally Efficient Speech Enhancement By Spectral Minima Tracking In Subbands	Gerhard Doblinger
Article	Assessing local noise level estimation methods: Application to noise robust ASR	Christophe Ris, Stéphane Dupont
Article	Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator	Y. Ephraim, D. Malah
Article	Speech enhancement using a soft-decision noise suppression filter	R. McAulay, M. Malpass

Current Version

Theory

Check it out here!

Application

Code is here! (Datasets are not present)

Presentation

Presentation can be viewed here!