2020
- Zhang M, Sisman B, Zhao L, Li H. DeepConversion: Voice conversion with limited parallel training data[J]. Speech Communication, 2020.
paper, samples
- Zhou K, Sisman B, Zhang M, et al. Converting Anyone’s Emotion: Towards Speaker-Independent Emotional Voice Conversion[J]. arXiv preprint arXiv:2005.07025, 2020.
preprint
2019
- Berrak Sisman, Mingyang Zhang, Minghui Dong, and Haizhou Li, “On the Study of Generative Adversarial Networks for Cross-Lingual Voice Conversion”, in Proc. IEEE Automatic Speech Recognition and Understanding (ASRU) Workshop 2019, Sentosa Island, Singapore, December 2019.
paper
- Zhang, M., Wang, X., Fang, F., Li, H., Yamagishi, J. (2019) Joint Training Framework for Text-to-Speech and Voice Conversion Using Multi-Source Tacotron and WaveNet. Proc. Interspeech 2019, 1298-1302, DOI: 10.21437/Interspeech.2019-1357.
paper, samples
- ZHANG Mingyang; ZHA Cheng; Tashpolat Nizamidin; XU Xinzhou; ZHAO Li, “Continuous speech emotion trend detection based on data field emotion space and shuffled frog-leaping algorithm”, Acta Acustica, 2019, v.44(01) 12-19
- B. Sisman, M. Zhang and H. Li, “Group Sparse Representation with WaveNet Vocoder Adaptation for Spectrum and Prosody Conversion,” in IEEE/ACM Transactions on Audio, Speech, and Language Processing.
doi: 10.1109/TASLP.2019.2910637
paper
2018
- M. Zhang, B. Sisman, S. S. Rallabandi, H. Li and L. Zhao, “Error Reduction Network for DBLSTM-based Voice Conversion,” 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Honolulu, HI, USA, 2018, pp. 823-828.
paper, samples
- Sisman B, Zhang M, Li H. A Voice Conversion Framework with Tandem Feature Sparse Representation and Speaker-Adapted WaveNet Vocoder. Proc. Interspeech 2018, 2018: 1978-1982.
paper
- B. Sisman, M. Zhang, S. Sakti, H. Li and S. Nakamura, “Adaptive Wavenet Vocoder for Residual Compensation in GAN-Based Voice Conversion,” 2018 IEEE Spoken Language Technology Workshop (SLT), Athens, Greece, 2018, pp. 282-289.
paper
- Xiao, J., Yang, S., Zhang, M., Sisman, B., Huang, D., Xie, L., … & Li, H. The I2R-NWPU-NUS Text-to-Speech System for Blizzard Challenge 2018.
paper
2016
- M. Zhang, C. Zou, R. Liang and L. Zhao, “Speech Recognition and Synthesis Algorithm for Digital Hearing Aids under Background Noise,” 2016 International Conference on Information System and Artificial Intelligence (ISAI), Hong Kong, 2016, pp. 347-351.
paper