Libri-Light offers 60k+ hours of unlabelled speech, a small training set for limited supervision (10 hours, 1 hour, or 10 minutes of labelled speech), and a common set of metrics for evaluating three settings.
Documentation for downloading Libri-Light or preparing the source files from scratch can be found in data_preparation.
The eval directory contains ABX, PER and WER evaluations on pretrained CPC models.
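For readers unfamiliar with the metrics, PER (phoneme error rate) and WER (word error rate) are both edit-distance error rates: the number of substitutions, insertions and deletions needed to turn the hypothesis into the reference, divided by the reference length. The following is a minimal sketch of that general definition, not the code in the eval directory; the function names `edit_distance` and `error_rate` and the example sentences are assumptions made for illustration.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    # prev[j] is the distance between the first i-1 reference tokens
    # and the first j hypothesis tokens.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Edit distance normalised by reference length (WER or PER)."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical example: tokens are words, so this is a WER.
reference = "the cat sat on the mat".split()
hypothesis = "the cat sit on mat".split()
wer = error_rate(reference, hypothesis)
```

Passing phoneme sequences instead of word sequences to the same function gives a PER. Note that the rate can exceed 1.0 when the hypothesis contains many insertions.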
The baselines directory contains pretrained wav2letter baseline models and information about reproduction.
@INPROCEEDINGS{librilight,
author={J. {Kahn} and M. {Rivière} and W. {Zheng} and E. {Kharitonov} and Q. {Xu} and P. E. {Mazaré} and J. {Karadayi} and V. {Liptchinsky} and R. {Collobert} and C. {Fuegen} and T. {Likhomanenko} and G. {Synnaeve} and A. {Joulin} and A. {Mohamed} and E. {Dupoux}},
booktitle={ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
title={Libri-Light: A Benchmark for ASR with Limited or No Supervision},
year={2020},
pages={7669-7673},
note = {\url{https://github.com/facebookresearch/libri-light}},
}
The Libri-Light code is released under the MIT license. See LICENSE for additional details.