# Google AI Blog: Joint Speech Recognition and Speaker Diarization via Sequence Transduction

## An Integrated Speech Recognition and Speaker Diarization System

The integrated model can be trained just like a speech recognition system. The reference transcripts used for training contain the words spoken by each speaker, followed by a tag that identifies that speaker's role. For example: “When is the homework due?” ≺student≻ “I expect you to turn them in tomorrow before class.” ≺teacher≻. Once the model has been trained on examples of audio paired with their reference transcripts, a user can feed in a recording of a conversation and expect output in the same tagged form.

Our analyses show that the improvements from the RNN-T system extend to every category of error, including short speaker turns, splitting at word boundaries, incorrect speaker assignment in the presence of overlapping speech, and poor audio quality. Moreover, the RNN-T system performed consistently across conversations, with substantially lower variance in the per-conversation average error rate than the conventional system.
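The tagged transcript format described above can be illustrated with a minimal Python sketch: one helper serializes speaker turns into a single training target string, and another splits a tagged output back into turns. This is only an assumption-laden illustration, not the system's actual data pipeline. The helper names `to_tagged_transcript` and `from_tagged_transcript` are hypothetical, and in the real model the role tags are presumably symbols in the output vocabulary rather than plain substrings matched with a regular expression.

```python
import re
from typing import List, Tuple

# Hypothetical tag syntax: a role name wrapped in "≺" and "≻", mirroring the
# example transcript. The real system's tag tokens may differ.
TAG_RE = re.compile(r"≺(\w+)≻")


def to_tagged_transcript(turns: List[Tuple[str, str]]) -> str:
    """Serialize (role, words) turns into one target string in which each
    turn's words are followed by a tag naming the speaker's role."""
    return " ".join(f"{words} ≺{role}≻" for role, words in turns)


def from_tagged_transcript(text: str) -> List[Tuple[str, str]]:
    """Split a tagged transcript back into (role, words) turns.

    Words after the last tag (an unfinished turn) are dropped, as are
    tags with no preceding words."""
    turns: List[Tuple[str, str]] = []
    start = 0
    for match in TAG_RE.finditer(text):
        words = text[start:match.start()].strip()
        if words:
            turns.append((match.group(1), words))
        start = match.end()
    return turns


if __name__ == "__main__":
    turns = [
        ("student", "When is the homework due?"),
        ("teacher", "I expect you to turn them in tomorrow before class."),
    ]
    target = to_tagged_transcript(turns)
    print(target)
    # Round trip: parsing the serialized target recovers the original turns.
    print(from_tagged_transcript(target) == turns)
```

Because the speaker tag is just another output token in this framing, attributing words to a speaker and recognizing those words become one sequence-prediction problem, which is what lets the model be trained exactly like an ordinary speech recognizer.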

A comparison of errors committed by the conventional system vs. the RNN-T system, as categorized by human annotators.

Furthermore, this integrated model can predict other labels needed to produce more reader-friendly ASR transcripts. For example, using appropriately matched training data, we have successfully added punctuation and capitalization to our transcripts. Our outputs contain fewer punctuation and capitalization errors than those of our previous models, which were trained separately and applied as a post-processing step after ASR.

This model has now become a standard component in our project on understanding medical conversations and is also being adopted more widely in our non-medical speech services.

## Acknowledgements

We would like to thank Hagen Soltau without whose contributions this work would not have been possible. This work was performed in collaboration with Google Brain and Speech teams.
