2020

Zhang M, Sisman B, Zhao L, Li H. DeepConversion: Voice conversion with limited parallel training data[J]. Speech Communication, 2020.
paper, samples
Zhou K, Sisman B, Zhang M, et al. Converting Anyone’s Emotion: Towards Speaker-Independent Emotional Voice Conversion[J]. arXiv preprint arXiv:2005.07025, 2020.
preprint
2019
Berrak Sisman, Mingyang Zhang, Minghui Dong, and Haizhou Li, “On the Study of Generative Adversarial Networks for Cross-Lingual Voice Conversion”, in Proc. IEEE Automatic Speech Recognition and Understanding (ASRU) Workshop 2019, Sentosa Island, Singapore, December 2019.
paper
Zhang, M., Wang, X., Fang, F., Li, H., Yamagishi, J. (2019) Joint Training Framework for Text-to-Speech and Voice Conversion Using Multi-Source Tacotron and WaveNet. Proc. Interspeech 2019, 1298-1302, DOI: 10.21437/Interspeech.2019-1357.
paper, samples
ZHANG Mingyang; ZHA Cheng; Tashpolat Nizamidin; XU Xinzhou; ZHAO Li, “Continuous speech emotion trend detection based on data field emotion space and shuffled frog-leaping algorithm”, Acta Acustica, 2019, vol. 44, no. 1, pp. 12-19.
B. Sisman, M. Zhang and H. Li, “Group Sparse Representation with WaveNet Vocoder Adaptation for Spectrum and Prosody Conversion,” in IEEE/ACM Transactions on Audio, Speech, and Language Processing.
doi: 10.1109/TASLP.2019.2910637
paper

2018

M. Zhang, B. Sisman, S. S. Rallabandi, H. Li and L. Zhao, “Error Reduction Network for DBLSTM-based Voice Conversion,” 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Honolulu, HI, USA, 2018, pp. 823-828.
paper, samples
Sisman B, Zhang M, Li H. A Voice Conversion Framework with Tandem Feature Sparse Representation and Speaker-Adapted WaveNet Vocoder. Proc. Interspeech 2018, 2018: 1978-1982.
paper
B. Sisman, M. Zhang, S. Sakti, H. Li and S. Nakamura, “Adaptive WaveNet Vocoder for Residual Compensation in GAN-Based Voice Conversion,” 2018 IEEE Spoken Language Technology Workshop (SLT), Athens, Greece, 2018, pp. 282-289.
paper
Xiao, J., Yang, S., Zhang, M., Sisman, B., Huang, D., Xie, L., … & Li, H. The I2R-NWPU-NUS Text-to-Speech System for Blizzard Challenge 2018.
paper

2016

M. Zhang, C. Zou, R. Liang and L. Zhao, “Speech Recognition and Synthesis Algorithm for Digital Hearing Aids under Background Noise,” 2016 International Conference on Information System and Artificial Intelligence (ISAI), Hong Kong, 2016, pp. 347-351.
paper