{"id":77,"date":"2022-12-06T23:39:33","date_gmt":"2022-12-06T23:39:33","guid":{"rendered":"http:\/\/blog.vaniila-ai.catie-na.fr\/?p=77"},"modified":"2022-12-08T13:50:57","modified_gmt":"2022-12-08T13:50:57","slug":"conformer-convolution-augmented-transformer-for-speech-recognition","status":"publish","type":"post","link":"https:\/\/blog.vaniila-ai.catie-na.fr\/?p=77","title":{"rendered":"Conformer: Convolution-augmented Transformer for Speech Recognition"},"content":{"rendered":"<p>[et_pb_section fb_built=\u00a0\u00bb1&Prime; custom_padding_last_edited=\u00a0\u00bbon|tablet\u00a0\u00bb admin_label=\u00a0\u00bbHeader\u00a0\u00bb _builder_version=\u00a0\u00bb4.18.0&Prime; _module_preset=\u00a0\u00bbdefault\u00a0\u00bb use_background_color_gradient=\u00a0\u00bbon\u00a0\u00bb background_color_gradient_stops=\u00a0\u00bbrgba(0,0,0,0) 0%|#000000 86%\u00a0\u00bb background_color_gradient_overlays_image=\u00a0\u00bbon\u00a0\u00bb background_image=\u00a0\u00bbhttp:\/\/blog.vaniila-ai.catie-na.fr\/wp-content\/uploads\/2022\/12\/web-developer-28.jpg\u00a0\u00bb custom_padding=\u00a0\u00bb5%||||false|false\u00a0\u00bb custom_padding_tablet=\u00a0\u00bb60px||||false|false\u00a0\u00bb custom_padding_phone=\u00a0\u00bb60px||||false|false\u00a0\u00bb collapsed=\u00a0\u00bbon\u00a0\u00bb global_colors_info=\u00a0\u00bb{}\u00a0\u00bb][et_pb_row _builder_version=\u00a0\u00bb4.19.2&Prime; _module_preset=\u00a0\u00bbdefault\u00a0\u00bb global_colors_info=\u00a0\u00bb{}\u00a0\u00bb][et_pb_column type=\u00a0\u00bb4_4&Prime; _builder_version=\u00a0\u00bb4.18.0&Prime; _module_preset=\u00a0\u00bbdefault\u00a0\u00bb global_colors_info=\u00a0\u00bb{}\u00a0\u00bb][et_pb_text _builder_version=\u00a0\u00bb4.19.2&Prime; _module_preset=\u00a0\u00bbe6504a1b-67eb-4b3d-b023-bcab277610b6&Prime; text_font=\u00a0\u00bb|||on|||||\u00a0\u00bb header_4_font=\u00a0\u00bbArchivo|700||on|||||\u00a0\u00bb header_4_text_color=\u00a0\u00bbgcid-f1414204-51c0-48ff-bc68-c545a86d03e7&Prime; 
header_4_font_size=\u00a0\u00bb14px\u00a0\u00bb header_4_letter_spacing=\u00a0\u00bb1px\u00a0\u00bb header_4_line_height=\u00a0\u00bb1.5em\u00a0\u00bb text_orientation=\u00a0\u00bbcenter\u00a0\u00bb custom_margin=\u00a0\u00bb||0px||false|false\u00a0\u00bb global_colors_info=\u00a0\u00bb{%22gcid-f1414204-51c0-48ff-bc68-c545a86d03e7%22:%91%22header_4_text_color%22%93}\u00a0\u00bb]<\/p>\n<h1 class=\"style-scope ytd-watch-metadata\"><span style=\"color: #ffcc00;\"><span style=\"left: 121.038px; top: 138.261px; font-size: 23.9103px; font-family: sans-serif; transform: scaleX(0.976188);\" role=\"presentation\" dir=\"ltr\">Conformer: Convolution-augmented Transformer for Speech Recognition<\/span><\/span><\/h1>\n<p>[\/et_pb_text][\/et_pb_column][\/et_pb_row][et_pb_row _builder_version=\u00a0\u00bb4.19.2&Prime; _module_preset=\u00a0\u00bbdefault\u00a0\u00bb filter_opacity=\u00a0\u00bb75%\u00a0\u00bb global_colors_info=\u00a0\u00bb{}\u00a0\u00bb][et_pb_column type=\u00a0\u00bb4_4&Prime; _builder_version=\u00a0\u00bb4.19.2&Prime; _module_preset=\u00a0\u00bbdefault\u00a0\u00bb global_colors_info=\u00a0\u00bb{}\u00a0\u00bb][et_pb_cta button_text=\u00a0\u00bbOpen the document\u00a0\u00bb _builder_version=\u00a0\u00bb4.19.2&Prime; _module_preset=\u00a0\u00bbdefault\u00a0\u00bb hover_enabled=\u00a0\u00bb0&Prime; global_colors_info=\u00a0\u00bb{}\u00a0\u00bb button_url=\u00a0\u00bbhttp:\/\/blog.vaniila-ai.catie-na.fr\/wp-content\/uploads\/2022\/12\/2-1.pdf\u00a0\u00bb link_option_url=\u00a0\u00bbhttp:\/\/blog.vaniila-ai.catie-na.fr\/wp-content\/uploads\/2022\/12\/2-1.pdf\u00a0\u00bb sticky_enabled=\u00a0\u00bb0&Prime; background_color=\u00a0\u00bb#000000&Prime; custom_button=\u00a0\u00bbon\u00a0\u00bb]<\/p>\n<h3 style=\"text-align: center;\"><strong>Abstract<\/strong><\/h3>\n<p style=\"text-align: left;\">Recently, Transformer and convolutional neural network (CNN) based models have shown promising results in Automatic Speech Recognition (ASR), outperforming recurrent neural 
networks (RNNs). Transformer models are good at capturing content-based global interactions, while CNNs exploit local features effectively. In this work, we achieve the best of both worlds by studying how to combine convolutional neural networks and transformers to model both local and global dependencies of an audio sequence in a parameter-efficient way.<\/p>\n<p style=\"text-align: left;\">To this end, we propose the convolution-augmented transformer for speech recognition, named Conformer. Conformer significantly outperforms the previous Transformer and CNN based models, achieving state-of-the-art accuracies. On the widely used LibriSpeech benchmark, our model achieves a WER of 2.1%\/4.3% without using a language model and 1.9%\/3.9% with an external language model on test\/test-other. We also observe competitive performance of 2.7%\/6.3% with a small model of only 10M parameters.<\/p>\n<p style=\"text-align: left;\">Index Terms: speech recognition, attention, convolutional neural networks, transformer, end-to-end<\/p>\n<p>[\/et_pb_cta][\/et_pb_column][\/et_pb_row][\/et_pb_section]<\/p>\n","protected":false},"excerpt":{"rendered":"<p>zdgaigfuefuafgliuQGVCUF<br \/>\nFOPFJOZIFHOZMFH<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_et_pb_use_builder":"on","_et_pb_old_content":"<!-- wp:paragraph -->\n<p>Welcome to WordPress. This is your first post. 
Edit or delete it, then start writing!<\/p>\n<!-- \/wp:paragraph -->","_et_gb_content_width":"","footnotes":""},"categories":[3],"tags":[],"class_list":["post-77","post","type-post","status-publish","format-standard","hentry","category-audio"],"_links":{"self":[{"href":"https:\/\/blog.vaniila-ai.catie-na.fr\/index.php?rest_route=\/wp\/v2\/posts\/77","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.vaniila-ai.catie-na.fr\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.vaniila-ai.catie-na.fr\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.vaniila-ai.catie-na.fr\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.vaniila-ai.catie-na.fr\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=77"}],"version-history":[{"count":6,"href":"https:\/\/blog.vaniila-ai.catie-na.fr\/index.php?rest_route=\/wp\/v2\/posts\/77\/revisions"}],"predecessor-version":[{"id":89,"href":"https:\/\/blog.vaniila-ai.catie-na.fr\/index.php?rest_route=\/wp\/v2\/posts\/77\/revisions\/89"}],"wp:attachment":[{"href":"https:\/\/blog.vaniila-ai.catie-na.fr\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=77"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.vaniila-ai.catie-na.fr\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=77"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.vaniila-ai.catie-na.fr\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=77"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}