Tsinghua University - Science and Technology Knowledge Graph

SciKG

Tags: knowledge graph, science and technology, Tsinghua University, text dataset

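SciKG is a large-scale, research-centered knowledge graph. It currently covers the field of computer science and consists of concepts, experts, and papers.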

Date: April 21, 2020
Size: 273 MB

Dataset Introduction

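SciKG is a large-scale, research-centered knowledge graph. It currently covers the field of computer science and consists of concepts, experts, and papers. The scientific concepts and the relations between them are extracted from the ACM Computing Classification System, and each concept is supplemented with a definition (mostly taken from Wikipedia). We further use AMiner to link each concept to its top experts and most relevant papers. Each expert has attributes such as position, affiliation, and research interests, along with a link to the AMiner system. Each paper carries metadata such as title, authors, abstract, publication venue, and year. SciKG can be used to better understand the dynamics and evolution of computer science, and to help users search for, and receive recommendations of, experts and papers in the field.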
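
As a small illustration of the expert-search use case, the sketch below ranks the experts attached to one concept record by h-index. It assumes records shaped like the example record on this page; ranking by h-index is only a simple heuristic chosen for illustration, not SciKG's own recommendation method, and the function name `top_experts` is hypothetical.

```python
from typing import Any


def top_experts(concept: dict[str, Any], k: int = 3) -> list[tuple[str, int]]:
    """Return up to k (name, h_index) pairs for one concept record, highest h-index first."""
    experts = concept.get("experts", [])
    # Experts with a missing or null h_index are ranked as if their h-index were 0.
    ranked = sorted(experts, key=lambda e: e.get("h_index") or 0, reverse=True)
    return [(e["name"], e.get("h_index") or 0) for e in ranked[:k]]


# Values from the example record, plus one HYPOTHETICAL expert so there is something to order.
concept = {
    "id": 2,
    "name": "Document types",
    "experts": [
        {"id": "hypothetical-expert-id", "name": "Hypothetical Expert", "h_index": 12},
        {"id": "54107080dabfae92b4283fd7", "name": "Ehud Reiter", "h_index": 35},
    ],
}
print(top_experts(concept))  # [('Ehud Reiter', 35), ('Hypothetical Expert', 12)]
```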

Statistics:

| Object Type | Count |
|---|---|
| Concept | 908 |
| Expert | 206,240 |
| Publication | 512,698 |
| Keyword | 9,668 |
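
If an expert or paper can be linked to more than one concept, counting distinct objects means de-duplicating by AMiner ID rather than counting links. The sketch below shows that tally, assuming concept records shaped like the example record; the function name and the sample records are hypothetical. It does not cover the Keyword row, since this page does not say how keywords relate to concept records.

```python
from typing import Any, Iterable


def count_objects(concepts: Iterable[dict[str, Any]]) -> dict[str, int]:
    """Tally distinct concepts, experts, and publications across concept records.

    Experts and papers are de-duplicated by their AMiner ID, so one that is
    linked to several concepts is counted only once.
    """
    concept_ids: set[int] = set()
    expert_ids: set[str] = set()
    paper_ids: set[str] = set()
    for c in concepts:
        concept_ids.add(c["id"])
        expert_ids.update(e["id"] for e in c.get("experts", []))
        paper_ids.update(p["id"] for p in c.get("publications", []))
    return {
        "Concept": len(concept_ids),
        "Expert": len(expert_ids),
        "Publication": len(paper_ids),
    }


# Two HYPOTHETICAL records that share one expert and one paper.
records = [
    {"id": 2, "experts": [{"id": "e1"}], "publications": [{"id": "p1"}]},
    {"id": 3, "experts": [{"id": "e1"}, {"id": "e2"}], "publications": [{"id": "p1"}]},
]
print(count_objects(records))  # {'Concept': 2, 'Expert': 2, 'Publication': 1}
```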

Field Descriptions:

| Field Name | Field Type | Description | Example |
|---|---|---|---|
| id | int | keyword ID | 2 |
| name | string | keyword | Document types |
| name_zh | string | keyword in Chinese | 文献类型 |
| level | int | path length to the root node | 1 |
| definition | string | definition | Document types is ... |
| definition_zh | string | definition in Chinese | 文献类型是... |
| child_nodes | list of int | IDs of the child nodes | [3, 34, 62] |
| parent | int | ID of the parent node | 1 |
| experts.id | string | AMiner expert ID | 53f4394cdabfaedce554a943 |
| experts.name | string | name of the expert | Ehud Reiter |
| experts.name_zh | string | name of the expert in Chinese | |
| experts.position | string | position of the expert | Department's Research Training Coordinator |
| experts.aff | string | affiliation of the expert | Department of Computing Science University of Aberdeen |
| experts.h_index | int | h-index of the expert | 35 |
| experts.interests | list of strings | research interests of the expert | ["Natural Language Generation"] |
| publication.id | string | AMiner paper ID | 53e9b8b4b7602d97044905e7 |
| publication.title | string | title of the paper | Object and reference immutability using Java generics |
| publication.author.id | string | ID of a paper author | 53f44cc6dabfaee43ec972a0 |
| publication.author.name | string | name of a paper author | Yoav Zibin |
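
The schema above maps naturally onto typed records. The following is a minimal sketch, assuming each concept is stored as one JSON object with the key names used in the example record (`publications`, `authors`); the class and function names are hypothetical. Because the field table lists `experts.aff` while the example record uses `affiliation`, the parser accepts either key.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Expert:
    id: str
    name: str
    name_zh: str = ""
    position: str = ""
    affiliation: str = ""
    h_index: Optional[int] = None
    interests: list[str] = field(default_factory=list)


@dataclass
class Author:
    id: str
    name: str


@dataclass
class Publication:
    id: str
    title: str
    authors: list[Author] = field(default_factory=list)


@dataclass
class Concept:
    id: int
    name: str
    name_zh: str = ""
    level: int = 0
    definition: str = ""
    definition_zh: str = ""
    child_nodes: list[int] = field(default_factory=list)
    parent: Optional[int] = None
    experts: list[Expert] = field(default_factory=list)
    publications: list[Publication] = field(default_factory=list)


def parse_concept(raw: dict[str, Any]) -> Concept:
    """Convert one decoded JSON concept record into typed objects."""
    experts = [
        Expert(
            id=e["id"],
            name=e["name"],
            name_zh=e.get("name_zh", ""),
            position=e.get("position", ""),
            # The field table says "aff"; the example record says "affiliation". Accept either.
            affiliation=e.get("affiliation", e.get("aff", "")),
            h_index=e.get("h_index"),
            interests=list(e.get("interests", [])),
        )
        for e in raw.get("experts", [])
    ]
    publications = [
        Publication(
            id=p["id"],
            title=p["title"],
            authors=[Author(a["id"], a["name"]) for a in p.get("authors", [])],
        )
        for p in raw.get("publications", [])
    ]
    return Concept(
        id=raw["id"],
        name=raw["name"],
        name_zh=raw.get("name_zh", ""),
        level=raw.get("level", 0),
        definition=raw.get("definition", ""),
        definition_zh=raw.get("definition_zh", ""),
        child_nodes=list(raw.get("child_nodes", [])),
        parent=raw.get("parent"),
        experts=experts,
        publications=publications,
    )


concept = parse_concept({"id": 2, "name": "Document types", "level": 1, "parent": 1, "child_nodes": [3, 34]})
print(concept.child_nodes)  # [3, 34]
```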

Example Record:

{
  "id":2,
  "name":"Document types",
  "name_zh":"文献类型",
  "level":1,
  "definition":"",
  "definition_zh":"",
  "child_nodes":[
    3,
    34
  ],
  "parent":1,
  "experts":[
    {
      "id":"54107080dabfae92b4283fd7",
      "name":"Ehud Reiter",
      "name_zh":"埃胡德赖特",
      "position":"Department's Research Training Coordinator",
      "affiliation":"Department of Computing Science University of Aberdeen",
      "h_index":35,
      "interests":[
        "Natural Language",
        "Knowledge Acquisition"
      ]
    }
  ],
  "publications":[
    {
      "id":"53e9b8b4b7602d97044905e7",
      "title":"Object and reference immutability using Java generics",
      "authors":[
        {
          "id":"53f44cc6dabfaee43ec972a0",
          "name":"Yoav Zibin"
        },
        {
          "id":"53f4740adabfaedf4367b0c0",
          "name":"Alex Potanin"
        }
      ]
    }
  ]
}
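
The `level` field is described as the path length to the root node. In the record above, concept 2 has `parent` 1 and `level` 1, which is consistent with concept 1 being the root at level 0. The sketch below recomputes levels by following `parent` links and flags records whose stored `level` disagrees. It assumes the root either has no `parent` key or has `parent` set to null, and that parent links form a tree; the function names and the extra records are hypothetical.

```python
from typing import Any, Optional


def compute_levels(concepts: list[dict[str, Any]]) -> dict[int, int]:
    """Return the number of parent hops from each concept to its root."""
    parent: dict[int, Optional[int]] = {c["id"]: c.get("parent") for c in concepts}
    levels: dict[int, int] = {}
    for cid in parent:
        depth, node = 0, cid
        seen: set[int] = set()
        # A parent ID with no record of its own is treated as a root.
        while parent.get(node) is not None:
            if node in seen:
                raise ValueError(f"cycle detected at concept {node}")
            seen.add(node)
            node = parent[node]
            depth += 1
        levels[cid] = depth
    return levels


def check_levels(concepts: list[dict[str, Any]]) -> list[int]:
    """Return IDs of records whose stored level differs from the recomputed one."""
    computed = compute_levels(concepts)
    return [c["id"] for c in concepts if "level" in c and c["level"] != computed[c["id"]]]


# Concept 2 mirrors the example record; concepts 1 and 3 are HYPOTHETICAL.
records = [
    {"id": 1, "name": "hypothetical root"},
    {"id": 2, "name": "Document types", "level": 1, "parent": 1},
    {"id": 3, "name": "hypothetical child", "parent": 2},
]
print(compute_levels(records))  # {1: 0, 2: 1, 3: 2}
print(check_levels(records))  # []
```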

Citation:

Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, and Zhong Su. ArnetMiner: Extraction and Mining of Academic Social Networks. In Proceedings of the Fourteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD'2008), pp. 990-998.

Files:

| No. | File Name | Size |
|---|---|---|
| 1 | SciKG_min_jsonld_1.0.tar.gz | 168.3 MB |
| 2 | SciKG_min_1.0.tar.gz | 105.4 MB |
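
This page does not describe the internal layout of the two archives, so a sensible first step is to list their contents before writing a parser. The sketch below uses only Python's standard `tarfile` module; the function name `list_members` is hypothetical, and it assumes the archive has already been downloaded to a local path.

```python
import tarfile


def list_members(path: str) -> list[tuple[str, int]]:
    """Return (member name, size in bytes) for each regular file in a gzip-compressed tar archive."""
    with tarfile.open(path, mode="r:gz") as tar:
        return [(m.name, m.size) for m in tar.getmembers() if m.isfile()]
```

For example, `list_members("SciKG_min_1.0.tar.gz")` would show which files the archive holds and how large each one is, which helps decide whether to load them whole or stream them record by record.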