Libri-Light Speech Dataset


Libri-Light: Facebook speech dataset

Libri-Light provides more than 60k hours of unlabelled speech and a small training set for limited supervision (10 hours, 1 hour, or 10 minutes of labelled speech).


Dataset publisher: 数据集市 (dataset marketplace)

Published: December 31, 2020

Data size: 3.5 TB


Dataset Description

Libri-Light offers 60k+ hours of unlabelled speech, a small training set for limited supervision (10h, 1h or 10 minutes of labelled speech), and a common set of metrics to evaluate three settings:

the unsupervised/zero-resource setting. Here, models are trained only on unlabelled speech and attempt to construct 'good' speech representations. They are evaluated with the ABX metric.
the semi-supervised setting. Here, models are trained with the limited supervision dataset and exploit the unlabelled speech in various ways (as pretraining, to get pseudo-labels, etc.). The models are evaluated using either phone error rate (PER) or word error rate (WER).
the distant supervision setting. Here, models can use additional unaligned text to build a decoder. These models are evaluated using WER.
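The ABX metric asks whether a representation places a token x closer to another token a of the same category than to a token b of a different category; the error rate is the fraction of (a, b, x) triplets where it does not. Below is a minimal sketch under stated assumptions: tokens are sequences of frame-level feature vectors, sequences are compared with dynamic time warping (DTW) over cosine distance, the DTW cost is simply divided by n + m, and the score is averaged over both choices of target category. All function names are hypothetical, and the real evaluation in the repository involves further details (for example, within-speaker and across-speaker conditions) that this sketch omits.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]   # one feature vector (e.g. one frame of a learned representation)
Token = List[Frame]       # a variable-length sequence of frames


def cosine_distance(u: Frame, v: Frame) -> float:
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 1.0
    return 1.0 - dot / (nu * nv)


def dtw_distance(x: Token, y: Token) -> float:
    """DTW alignment cost over frame-wise cosine distances, divided by len(x) + len(y)."""
    n, m = len(x), len(y)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = cosine_distance(x[i - 1], y[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Simplified length normalisation (an assumption of this sketch).
    return cost[n][m] / (n + m)


def one_way_abx_error(targets: List[Token], others: List[Token]) -> float:
    """Fraction of (a, b, x) triplets where x (same category as a) is not closer to a than to b."""
    errors, total = 0.0, 0
    for i, a in enumerate(targets):
        for k, x in enumerate(targets):
            if i == k:
                continue  # a and x must be distinct tokens of the same category
            d_ax = dtw_distance(a, x)
            for b in others:
                d_bx = dtw_distance(b, x)
                if d_ax > d_bx:
                    errors += 1.0
                elif d_ax == d_bx:
                    errors += 0.5  # ties count as half an error
                total += 1
    if total == 0:
        raise ValueError("need >= 2 target tokens and >= 1 other token")
    return errors / total


def abx_error(cat_a: List[Token], cat_b: List[Token]) -> float:
    """Symmetrised ABX error: average over both choices of target category."""
    return 0.5 * (one_way_abx_error(cat_a, cat_b) + one_way_abx_error(cat_b, cat_a))
```

With two well-separated categories the error is 0.0; a representation that cannot tell the categories apart pushes it towards (or above) 0.5. The O(n·m) DTW per pair is fine for a toy example but would need vectorisation for realistic amounts of data.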

Documentation

Documentation for downloading Libri-Light or preparing the source files from scratch can be found in the data_preparation directory.

The eval directory contains ABX, PER and WER evaluations on pretrained CPC models.
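PER and WER are both edit-distance error rates: the minimum number of substitutions, deletions and insertions needed to turn the hypothesis into the reference, divided by the number of reference tokens (phones for PER, words for WER). The sketch below is a minimal, hypothetical illustration of that definition, pooling errors over a whole test set; it is not the scorer used in the eval directory, and its function names are invented for this example.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))  # row for an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete reference token r
                curr[j - 1] + 1,         # insert hypothesis token h
                prev[j - 1] + (r != h),  # substitute (free when tokens match)
            )
        prev = curr
    return prev[-1]


def error_rate(refs: List[Sequence[str]], hyps: List[Sequence[str]]) -> float:
    """Total edit errors divided by total reference length (use phone lists for PER)."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same number of utterances")
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    if total == 0:
        raise ValueError("references are empty")
    return errors / total


def wer(ref_sentences: List[str], hyp_sentences: List[str]) -> float:
    """Word error rate over whitespace-tokenised sentences."""
    return error_rate([s.split() for s in ref_sentences],
                      [s.split() for s in hyp_sentences])
```

For example, `wer(["the cat sat"], ["the bat sat"])` is one substitution over three reference words, i.e. 1/3. Summing errors before dividing weights long utterances more heavily than averaging per-utterance rates would.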

The baselines directory contains pretrained wav2letter baseline models and information about reproduction.

Citing

@INPROCEEDINGS{librilight,
  author={J. {Kahn} and M. {Rivière} and W. {Zheng} and E. {Kharitonov} and Q. {Xu} and P. E. {Mazaré} and J. {Karadayi} and V. {Liptchinsky} and R. {Collobert} and C. {Fuegen} and T. {Likhomanenko} and G. {Synnaeve} and A. {Joulin} and A. {Mohamed} and E. {Dupoux}},
  booktitle={ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  title={Libri-Light: A Benchmark for ASR with Limited or No Supervision},
  year={2020},
  pages={7669-7673},
  note = {\url{https://github.com/facebookresearch/libri-light}},
}

License

The Libri-Light code is released under the MIT license. See LICENSE for additional details.



Related Datasets

Google randomly generated 3D model dataset

European Parliament parallel corpus 1996-2011

IMDB movie review dataset
