Movie review datasets provided by Cornell University

Cornell Movie Review Data

Cornell University movie review text datasets

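The collection consists of movie reviews: 1,000 positive and 1,000 negative. It also includes sentences labeled for sentiment polarity (5,331 positive and 5,331 negative) and sentences labeled for subjectivity (5,000 subjective and 5,000 objective).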

Dataset publisher: 数据集市

Published: October 31, 2020

Sentiment polarity datasets

polarity dataset v2.0 (3.0Mb) (includes README v2.0): 1000 positive and 1000 negative processed reviews. Introduced in Pang/Lee ACL 2004. Released June 2004.
Pool of 27886 unprocessed html files (81.1Mb) from which the polarity dataset v2.0 was derived. (This file is identical to movie.zip from data release v1.0.)
sentence polarity dataset v1.0 (includes sentence polarity dataset README v1.0): 5331 positive and 5331 negative processed sentences / snippets. Introduced in Pang/Lee ACL 2005. Released July 2005.
archive:
- polarity dataset v1.0 (2.8Mb) (includes README): 700 positive and 700 negative processed reviews. Released July 2002.
- polarity dataset v1.1 (2.2Mb) (includes README.1.1): approximately 700 positive and 700 negative processed reviews. Released November 2002. This alternative version was created by Nathan Treloar, who removed a few non-English/incomplete reviews and changed some of the labels (judging some polarities to be different from the original author's rating). The complete list of changes made to v1.1 can be found in diff.txt.
- polarity dataset v0.9 (2.8Mb) (includes a README): 700 positive and 700 negative processed reviews. Introduced in Pang/Lee/Vaithyanathan EMNLP 2002. Released July 2002. Please read the "Rating Information - WARNING" section of the README.
- movie.zip (81.1Mb): all html files we collected from the IMDb archive.
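
A minimal sketch of how one of the document-level polarity datasets might be read into labeled examples after unpacking. The listing does not describe the archive layout, so the subdirectory names `pos` and `neg`, the `.txt` extension, and the file encoding are all assumptions here; the bundled README is the place to confirm them. `load_polarity_reviews` is a hypothetical helper, not part of the release.

```python
from pathlib import Path


def load_polarity_reviews(root):
    """Return (text, label) pairs; label is 1 for positive, 0 for negative."""
    root = Path(root)
    examples = []
    # "pos" and "neg" are assumed subdirectory names, not confirmed by the listing.
    for dirname, label in (("pos", 1), ("neg", 0)):
        for path in sorted((root / dirname).glob("*.txt")):
            # latin-1 decodes any byte sequence; the true encoding is an assumption.
            examples.append((path.read_text(encoding="latin-1"), label))
    return examples
```

For polarity dataset v2.0, a loader like this would be expected to return 2000 examples, 1000 per class, which makes a quick sanity check on the unpacked files.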

Sentiment scale datasets

scale dataset v1.0 (includes scale data README v1.0): a collection of documents whose labels come from a rating scale. Introduced in Pang/Lee ACL 2005. Released July 2005.
- Sep 30, 2009: Yanir Seroussi points out that due to some misformatting in the raw html files, six reviews are misattributed to Dennis Schwartz (29411 should be Max Messier, 29412 should be Norm Schrager, 29418 should be Steve Rhodes, 29419 should be Blake French, 29420 should be Pete Croatto, 29422 should be Rachel Gordon) and one (23982) is blank.
original reviews for scale dataset v1.0 (includes scale data README v1.0): original reviews from which the subjective extracts in scale dataset v1.0 were extracted.
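
The attribution errors noted above for scale dataset v1.0 could be patched at load time. The sketch below copies the review ids and author names from that note; the record layout, a sequence of `(review_id, author)` pairs, and the helper name `apply_corrections` are hypothetical, since the listing does not say how ids and authors are stored in the release.

```python
# Corrections reported by Yanir Seroussi (Sep 30, 2009), copied from the note above.
CORRECTED_AUTHORS = {
    29411: "Max Messier",
    29412: "Norm Schrager",
    29418: "Steve Rhodes",
    29419: "Blake French",
    29420: "Pete Croatto",
    29422: "Rachel Gordon",
}
BLANK_REVIEW_IDS = {23982}


def apply_corrections(records):
    """records: iterable of (review_id, author) pairs (a hypothetical layout)."""
    fixed = []
    for review_id, author in records:
        if review_id in BLANK_REVIEW_IDS:
            continue  # drop the review reported as blank
        # Replace the author only for the six misattributed reviews.
        fixed.append((review_id, CORRECTED_AUTHORS.get(review_id, author)))
    return fixed
```

Whether to drop the blank review or keep it with an empty text is a design choice; dropping it is simply the more conservative option for per-author experiments.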

Subjectivity datasets

subjectivity dataset v1.0 (508K) (includes subjectivity README v1.0): 5000 subjective and 5000 objective processed sentences. Introduced in Pang/Lee ACL 2004. Released June 2004.
Pool of unprocessed source documents (9.3Mb) from which the sentences in the subjectivity dataset v1.0 were extracted. Note: On April 2, 2012, we replaced the original gzipped tarball with one in which the subjective files are now in the correct directory (so that the subjectivity directory is no longer empty; the subjective files were mistakenly placed in the wrong directory, although distinguishable by their different naming scheme).
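
The sentence-level datasets (subjectivity and sentence polarity) could be read with a line-based loader and checked against the counts stated in this listing. This is a sketch that assumes each class is stored as a text file with one sentence per line; that layout, the encoding, and the helper names are assumptions to verify against the README.

```python
from pathlib import Path


def load_sentences(path, label):
    """Read one sentence per non-empty line and pair each with `label`."""
    text = Path(path).read_text(encoding="latin-1")  # encoding is an assumption
    return [(line.strip(), label) for line in text.splitlines() if line.strip()]


def has_expected_size(examples, expected=5000):
    """Check a class against the per-class sentence count stated in the listing."""
    return len(examples) == expected
```

For the subjectivity dataset the expected size per class is 5000; for the sentence polarity dataset it would be 5331.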
