Snapshot

  • Model: Coarse-to-Fine Entity Representation (CFER)

  • Source: arXiv, 2020-12-04, Peking University

  • Task: DocRED

  • Motivation: 1. GNNs on a document-level graph fail to model the interactions between long-distance entities (because of the over-smoothing problem?). 2. Encoding paths between entities can partially handle long-distance entity interactions, but fails to capture global contextual information (since paths usually integrate only local contextual information);

    CFER: global contextual information and modeling of long-distance entity interactions, it wants both!

  • Scores:

    • Ign F1: 57.89, F1: 59.82 (BERT-base)
  • Quick take:

    1. The over-smoothing problem with multi-hop aggregation is understandable, but it arises because the relevant nodes in the constructed graph are so far apart that many hops are needed; so the problem lies in graph construction rather than in the GNN itself. The second motivation is harder to follow: why must global contextual information be used when encoding nodes? Isn't encoding paths precisely about discarding irrelevant information? So I remain skeptical about this paper's motivation.
    2. Also worth noting: the graph in this paper is built from syntactic dependency parse trees.
  • Code reproduction: none yet

  • The two kinds of graph-based methods and their problems (quoted from the paper)

    • Integrate neighborhood information for each node: although these methods consider the entire graph structure, they may fail to model the interactions between long-distance entities due to the inherent over-smoothing problem in GNNs.
    • Encode path information between the target entity pair in the graph: these methods can alleviate the problem of modeling long-distance entity interactions, but they may fail to capture more global contextual information, since they usually integrate only local contextual information for nodes in the graph.

Method

Graph construction

  • syntactic dependency edge: dependency parsing of each sentence with spaCy (see the construction sketch after this list);

  • adjacent word edge: connects adjacent words;

    Adding edges between adjacent words can mitigate dependency-parser errors.

  • self-loop edge: a self-loop so that a node keeps its own information;

  • adjacent sentence edge: connects the dependency-tree root nodes of adjacent sentences;

  • coreferential mention edge: connects coreferential mentions;
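
A minimal sketch of how such a graph could be assembled with spaCy and networkx; the function name, the coreference-cluster input format, and the edge-type labels are illustrative assumptions, not the authors' implementation.

```python
import spacy
import networkx as nx

nlp = spacy.load("en_core_web_sm")

def build_document_graph(text, coref_clusters=None):
    """coref_clusters: hypothetical list of clusters, each a list of token indices
    (one representative token per coreferent mention)."""
    doc = nlp(text)
    g = nx.Graph()
    g.add_nodes_from(range(len(doc)))

    def add(u, v, etype):
        # a pair of words may carry several edge types, so store them as a set
        if g.has_edge(u, v):
            g[u][v]["etypes"].add(etype)
        else:
            g.add_edge(u, v, etypes={etype})

    prev_root = None
    for sent in doc.sents:
        for tok in sent:
            add(tok.i, tok.i, "self-loop")                       # self-loop edge
            if tok.head.i != tok.i:
                add(tok.i, tok.head.i, "dependency")             # syntactic dependency edge
        if prev_root is not None:
            add(prev_root.i, sent.root.i, "adjacent-sentence")   # adjacent sentence edge (root to root)
        prev_root = sent.root

    for i in range(len(doc) - 1):
        add(i, i + 1, "adjacent-word")                           # adjacent word edge

    for cluster in (coref_clusters or []):                       # coreferential mention edge
        for a, b in zip(cluster, cluster[1:]):
            add(a, b, "coref")
    return g
```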

Coarse-to-Fine Entity Representations

  • Text encoding module

    • embedding layer (BERT/GloVe) + contextual layer (Bi-GRU)

      This yields the contextual embedding $\mathbf{h}_{i}=\left[\overrightarrow{\mathbf{h}}_{i} ; \overleftarrow{\mathbf{h}}_{i}\right] \in \mathbb{R}^{d_{h}}$
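
A minimal PyTorch-style sketch of this encoder, with a plain embedding lookup standing in for BERT/GloVe; the dimensions and names are hypothetical.

```python
import torch
import torch.nn as nn

class TextEncoder(nn.Module):
    def __init__(self, vocab_size=30000, d_emb=300, d_h=256):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, d_emb)    # stand-in for BERT/GloVe embeddings
        self.gru = nn.GRU(d_emb, d_h // 2, batch_first=True, bidirectional=True)

    def forward(self, token_ids):                     # token_ids: (batch, seq_len)
        x = self.emb(token_ids)
        h, _ = self.gru(x)                            # h: (batch, seq_len, d_h), forward/backward concatenated
        return h
```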

  • Coarse-level Representation Module

    DCGCN (Densely Connected Graph Convolutional Networks), see TODO #1

    DCGCN can capture both local and global contextual information (a claim carried over from the DCGCN paper).

    $\mathcal{N}(i)$ is the set of neighbors of node $i$ (so aggregation does not distinguish edge types; the graph is treated as homogeneous), and $\mathbf{x}_{j}^{(k)}$ is the input to the $k$-th block. Within each block, each layer's input is the block input concatenated with the outputs of all previous layers (dense connections, essentially skip connections?).

    The output of the $k$-th block is built from the concatenation of all its layers' outputs; a sketch of one such block follows.
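
A rough PyTorch sketch of one densely connected GCN block in the spirit of DCGCN, aggregating over $\mathcal{N}(i)$ via a dense adjacency matrix; the sub-layer sizes and the final concatenation are assumptions based on the description above, not the authors' code.

```python
import torch
import torch.nn as nn

class DenseGCNBlock(nn.Module):
    def __init__(self, d=256, num_sublayers=4):
        super().__init__()
        assert d % num_sublayers == 0
        d_sub = d // num_sublayers
        # sub-layer l sees the block input concatenated with the l previous outputs
        self.layers = nn.ModuleList(
            nn.Linear(d + l * d_sub, d_sub) for l in range(num_sublayers)
        )

    def forward(self, x, adj):
        # x: (num_nodes, d) node features; adj: (num_nodes, num_nodes) normalized
        # adjacency with self-loops, treated as a homogeneous graph
        outs = []
        for layer in self.layers:
            inp = torch.cat([x] + outs, dim=-1)          # dense (skip) connections
            outs.append(torch.relu(adj @ layer(inp)))    # aggregate over N(i)
        return torch.cat(outs, dim=-1)                   # block output, width d again
```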

  • Fine-level Representation Module

    Coarse-level representations cannot model interactions between long-distance entities;

    In the fine-level module, the coarse-level representations are used as guidance, and path information is exploited to alleviate this problem;

    For an entity pair $(e_1, e_2)$, take all of its mention pairs and construct $\left|e_{1}\right| \times\left|e_{2}\right|$ shortest paths (using only syntactic dependency and adjacent sentence edges); the $i$-th path is $\left[w_{1}, \ldots, w_{\mathrm{len}_{i}}\right]$ (extraction sketched below).
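
A sketch of the path-extraction step, reusing the (hypothetical) graph from the construction sketch above and keeping only dependency and adjacent-sentence edges; `mention_pair_paths` and the edge filtering are illustrative.

```python
import networkx as nx

def mention_pair_paths(g, head_mentions, tail_mentions):
    """head_mentions / tail_mentions: token indices, one per mention of e1 / e2."""
    # keep only syntactic-dependency and adjacent-sentence edges
    sub = g.edge_subgraph(
        (u, v) for u, v, a in g.edges(data=True)
        if a["etypes"] & {"dependency", "adjacent-sentence"}
    )
    paths = []
    for h in head_mentions:
        for t in tail_mentions:
            try:
                paths.append(nx.shortest_path(sub, h, t))   # [w_1, ..., w_len_i]
            except (nx.NetworkXNoPath, nx.NodeNotFound):
                pass                                        # skip unreachable mention pairs
    return paths
```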

    Path encoder (the $j$-th node in the $i$-th path):

    $\mathbf{m}_{i}^{(h)}, \mathbf{m}_{i}^{(t)}$ are the path-aware representations of the head and tail entities. Of course, not every path is useful: how much useful information do the $\left|e_{1}\right| \times\left|e_{2}\right|$ shortest paths actually carry? The coarse-level representations are used as guidance for this selection ($\widetilde{\mathbf{h}}, \widetilde{\mathbf{t}}$ are the coarse-level head and tail entity representations); a generic aggregator sketch follows after this note.

    Is there any real justification for computing the attention score this way? # TODO 3

    Selecting shortest paths in the graph may already cover what we would call inference paths, and then letting attention pick among them makes the whole thing feel a bit less hand-crafted?
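
For reference, a generic attention aggregator over the path-aware representations, using the coarse-level representations as the query; the dot-product scoring here is an assumption, not necessarily the paper's exact formula (hence TODO 3).

```python
import torch

def aggregate_paths(m_head, m_tail, h_tilde, t_tilde):
    """m_head, m_tail: (num_paths, d) path-aware head/tail representations;
    h_tilde, t_tilde: (d,) coarse-level head/tail entity representations."""
    scores = m_head @ h_tilde + m_tail @ t_tilde     # (num_paths,) path relevance scores
    alpha = torch.softmax(scores, dim=0)             # attention weights over paths
    head_fine = alpha @ m_head                       # (d,) fine-level head representation
    tail_fine = alpha @ m_tail                       # (d,) fine-level tail representation
    return head_fine, tail_fine
```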

  • Classification Module

    A plain binary classification, nothing much to say; the loss is binary cross-entropy (a minimal sketch follows);
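
A minimal sketch of such a per-relation binary classifier with BCE loss; feeding the concatenated head/tail representations into a single linear layer is an assumption about the input features.

```python
import torch
import torch.nn as nn

num_relations, d = 96, 512                        # DocRED defines 96 relation types
classifier = nn.Linear(2 * d, num_relations)      # one logit per relation
criterion = nn.BCEWithLogitsLoss()

def pair_loss(head_rep, tail_rep, labels):
    """head_rep, tail_rep: (d,) entity representations;
    labels: (num_relations,) multi-hot float vector."""
    logits = classifier(torch.cat([head_rep, tail_rep], dim=-1))
    return criterion(logits, labels)
```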

Experiment

  • Results

    The static graph used by CFER is sparser and simpler than LSR's dynamic graph;

  • More robust on long-tail relation classes with few training samples (though the comparison is only against BERT-2-step and Bi-LSTM, i.e., the two easiest baselines to beat)

    • The authors attribute this to the fact that "we can capture both global information and subtle clues that may include special features of long-tail relations"; is it fair to call that a stretch?
  • Ablation study

    • the attention aggregator
    • replacing the shortest paths used with random shortest paths
    • DCGCN really is quite strong at capturing global and local contextual information
  • The case study is quite interesting, but I did not fully understand it…

TODO

  1. Densely Connected Graph Convolutional Networks for Graph-to-Sequence Learning.
  2. Look at the code once it is open-sourced later
  3. The attention computation is puzzling; first time I have seen it done this way