
Libri-Light Speech Dataset


Libri-Light Facebook Speech Dataset

Libri-Light provides over 60k hours of unlabelled speech and small training sets for limited supervision (10 hours, 1 hour, or 10 minutes of labelled speech).

December 31, 2020
3.5TB

Dataset Description

Libri-Light offers over 60k hours of unlabelled speech, a small training set for limited supervision (10 hours, 1 hour, or 10 minutes of labelled speech), and a common set of metrics to evaluate three settings:

  1. the unsupervised/zero-resource setting. Here, models are trained only on unlabelled speech and attempt to construct 'good' speech representations. They are evaluated with the ABX metric.
  2. the semi-supervised setting. Here, models are trained with the limited supervision dataset and exploit the unlabelled data in various ways (as pretraining, to get pseudo-labels, etc.). The models are evaluated using either phoneme error rate (PER) or word error rate (WER).
  3. the distant supervision setting. Here, models can use additional unaligned text to build a decoder. These models are evaluated using WER.
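As a rough illustration of the PER and WER metrics named in the list above, here is a minimal sketch that computes an error rate as the edit distance between a reference and a hypothesis token sequence, divided by the reference length. The function names (`edit_distance`, `error_rate`, `wer`) are hypothetical and chosen for this example; the scoring code in the Libri-Light repository may normalise text or aggregate over utterances differently.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    # prev[j] holds the distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            ))
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance normalised by reference length (tokens: words or phonemes)."""
    if not ref_tokens:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def wer(reference, hypothesis):
    """Word error rate for two whitespace-tokenised transcripts."""
    return error_rate(reference.split(), hypothesis.split())


if __name__ == "__main__":
    # One word deleted out of six reference words.
    print(wer("the cat sat on the mat", "the cat sat on mat"))
    # PER works the same way on phoneme sequences (one substitution out of three).
    print(error_rate(["k", "ae", "t"], ["k", "ah", "t"]))
```

The same `error_rate` function serves both metrics; only the tokenisation (words versus phonemes) changes.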

Documentation

Documentation for downloading Libri-Light or preparing the source files from scratch can be found in data_preparation.

The eval directory contains ABX, PER and WER evaluations on pretrained CPC models.
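To give an intuition for the ABX metric, here is a simplified sketch. It assumes each speech token has already been reduced to a fixed-length vector, uses cosine distance, and counts ties as half an error; all of these are assumptions made for this example, and the names `cosine_distance` and `abx_error` are hypothetical. The evaluation code in the eval directory works on real model features and may differ in every one of these choices.

```python
import math


def cosine_distance(u, v):
    """1 minus the cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def abx_error(triplets, distance=cosine_distance):
    """Fraction of (a, b, x) triplets where x is not closer to a than to b.

    In each triplet, a and x belong to the same category and b to a
    different one. A lower score means the representation separates
    the categories better.
    """
    if not triplets:
        raise ValueError("need at least one triplet")
    errors = 0.0
    for a, b, x in triplets:
        d_ax = distance(a, x)
        d_bx = distance(b, x)
        if d_ax > d_bx:
            errors += 1.0   # x is closer to the wrong category
        elif d_ax == d_bx:
            errors += 0.5   # tie counted as half an error (assumption)
    return errors / len(triplets)


if __name__ == "__main__":
    toy_triplets = [
        ([1.0, 0.0], [0.0, 1.0], [0.9, 0.1]),  # x close to a: correct
        ([1.0, 0.0], [0.0, 1.0], [0.1, 0.9]),  # x close to b: error
    ]
    print(abx_error(toy_triplets))
```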

The baselines directory contains pretrained wav2letter baseline models and information about reproduction.

Citing

@INPROCEEDINGS{librilight,
  author={J. {Kahn} and M. {Rivière} and W. {Zheng} and E. {Kharitonov} and Q. {Xu} and P. E. {Mazaré} and J. {Karadayi} and V. {Liptchinsky} and R. {Collobert} and C. {Fuegen} and T. {Likhomanenko} and G. {Synnaeve} and A. {Joulin} and A. {Mohamed} and E. {Dupoux}},
  booktitle={ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  title={Libri-Light: A Benchmark for ASR with Limited or No Supervision},
  year={2020},
  pages={7669-7673},
  note = {\url{https://github.com/facebookresearch/libri-light}},
}

License

The Libri-Light code is released under the MIT license. See LICENSE for additional details.
