Relation Extraction

Automatic Knowledge Base Construction

Denis Savenkov
Emory NLP Group Meeting
February 9, 2015

"Scientia Potentia Est"

Solving difficult problems requires a lot of knowledge
Humans spend a significant part of their lives learning
Computers need to have this knowledge as well!

Structured Information

Most of the information in the world is unstructured:
- text
- images
- video
- ...
Unstructured information is hard to work with
We need to add some structure...

Example: Question answering

Who was the PhD advisor of Prof. Jinho Choi?

What if we had a database with facts?

We could just query the database and get the answer:

Who was the PhD advisor of Prof. Jinho Choi?


SELECT advisor
FROM phd_advisors
WHERE student = 'Jinho D. Choi'
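A minimal runnable version of this lookup, assuming a hypothetical two-column `phd_advisors(student, advisor)` table in an in-memory SQLite database. The advisor value is a made-up placeholder, not the real answer:

```python
import sqlite3

# In-memory database with a hypothetical phd_advisors(student, advisor) table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE phd_advisors (student TEXT, advisor TEXT)")
# "Advisor X" is a made-up placeholder value, not real data.
conn.execute("INSERT INTO phd_advisors VALUES (?, ?)", ("Jinho D. Choi", "Advisor X"))

# The query from the slide, with the student name passed as a bound parameter.
rows = conn.execute(
    "SELECT advisor FROM phd_advisors WHERE student = ?", ("Jinho D. Choi",)
).fetchall()
print(rows)  # [('Advisor X',)]
```

The query is the easy part; the hard part, and the subject of the rest of this talk, is filling such a table automatically.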

From words to entities

Entities

We may use different words to refer to the same thing
- Rafa, The King of Clay, Rafael Nadal
Attributes of entities
- types: e.g. tennis player
- characteristics: e.g. height, weight, birth date
Some entities are related
- [Tony Nadal] <coach-of> [Rafael Nadal]

Predicates

Entities are related in different ways:
- [Tony Nadal] <coach-of> [Rafael Nadal]
- [Tony Nadal] <uncle-of> [Rafael Nadal]
- [Rafael Nadal] <parents> [Sebastián Nadal]
- [Rafael Nadal] <parents> [Ana María Parera]
Entities with their relations constitute a knowledge base
We can represent a knowledge base as a graph
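As a sketch of the graph view, the triples above can be stored as a directed multigraph with labeled edges. This uses plain Python dictionaries, not any particular graph database, and the helper name `objects` is hypothetical:

```python
from collections import defaultdict

# Triples from the slide: (subject, predicate, object).
triples = [
    ("Tony Nadal", "coach-of", "Rafael Nadal"),
    ("Tony Nadal", "uncle-of", "Rafael Nadal"),
    ("Rafael Nadal", "parents", "Sebastián Nadal"),
    ("Rafael Nadal", "parents", "Ana María Parera"),
]

# Directed, labeled multigraph: node -> list of (edge label, neighbor).
graph = defaultdict(list)
for s, p, o in triples:
    graph[s].append((p, o))

def objects(subject, predicate):
    """All nodes reachable from `subject` by one edge labeled `predicate`."""
    return [o for p, o in graph[subject] if p == predicate]

print(objects("Rafael Nadal", "parents"))  # ['Sebastián Nadal', 'Ana María Parera']
```

Two edges between the same pair of nodes (coach-of and uncle-of) are kept separately, which is why a multigraph rather than a simple graph is needed.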

Knowledge Base

Knowledge Graphs

* image from KDD'14 "Constructing and Mining Web-scale Knowledge Graphs" workshop slides

Some challenges

Validation: knowledge graphs are not always correct
Interface: how to make it easier to access the knowledge?
Intelligence: how to create AI from knowledge graphs?
Growth: knowledge graphs are incomplete
- link prediction
- ontology matching
- knowledge extraction (this presentation)

from KDD 2014 Tutorial on Constructing and Mining Web-scale Knowledge Graphs, New York, August 24, 2014

Applications

Entity summarization

Applications

Question Answering

Freebase

47M entities and 2.5B facts
fully structured (entities and relations come from a fixed lexicon rather than free text)
constructed by community members
Built by Metaweb and acquired by Google in 2010
Data is publicly available
Will be shut down in 2015, with its data transitioned to Wikidata
Tuple: [/m/0jcx, /m/04m8, /m/019xz9] means Albert Einstein was born in Ulm

Incompleteness

71% of people in Freebase have no information on place of birth and 75% have no known nationality *

Long-tail distribution: we know a lot about popular entities, but there is a heavy tail of less known entities
How to increase coverage?
- Ask people: crowdsourcing
- Merge with other knowledge bases: ontology matching
- Extract from the available data

* from "Knowledge Vault: A Web-Scale Approach to Probabilistic Knowledge Fusion" by X.Dong et al. 2014

Web of Data

Avatar
Director: James Cameron (born August 16, 1954)
Science fiction
Trailer

see http://schema.org
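Pages annotated with the schema.org vocabulary expose facts like these through microdata attributes (`itemscope`, `itemtype`, `itemprop`). A minimal sketch using Python's standard `html.parser`, run on hypothetical markup modeled on the Avatar example above. It only collects text values under dotted paths such as `director.name`; a full microdata parser would also take URL values from attributes like `href` and keep repeated properties separate, which this sketch does not:

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they are not pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}

class MicrodataExtractor(HTMLParser):
    """Collects text of leaf itemprop elements under dotted paths like 'director.name'."""

    def __init__(self):
        super().__init__()
        self.stack = []  # one (tag, itemprop or None, has_itemscope) entry per open element
        self.props = {}  # dotted itemprop path -> accumulated text

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        attrs = dict(attrs)
        self.stack.append((tag, attrs.get("itemprop"), "itemscope" in attrs))

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                return

    def handle_data(self, data):
        # Attribute text to the innermost itemprop, unless it is itself a nested item.
        for _, prop, is_scope in reversed(self.stack):
            if prop:
                if not is_scope:
                    path = ".".join(p for _, p, _ in self.stack if p)
                    self.props[path] = self.props.get(path, "") + data
                return

def extract_microdata(html):
    parser = MicrodataExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace inside each value.
    return {path: " ".join(text.split()) for path, text in parser.props.items()}

page = """
<div itemscope itemtype="http://schema.org/Movie">
  <h1 itemprop="name">Avatar</h1>
  <div itemprop="director" itemscope itemtype="http://schema.org/Person">
    Director: <span itemprop="name">James Cameron</span>
    (born <span itemprop="birthDate">August 16, 1954</span>)
  </div>
  <span itemprop="genre">Science fiction</span>
</div>
"""
print(extract_microdata(page))
# {'name': 'Avatar', 'director.name': 'James Cameron', 'director.birthDate': 'August 16, 1954', 'genre': 'Science fiction'}
```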

Wrapper Induction

"Wrapper Induction for Information Extraction" by N.Kushmerick et al. 1997

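Kushmerick et al. learn wrappers from labeled example pages. Below is a much-simplified sketch of an LR-style wrapper (one left and one right delimiter string per attribute), assuming every labeled value occurs in page order. Here each delimiter is taken as the longest common suffix/prefix of the text around the labeled values, which is one consistent choice rather than the paper's exact search procedure; the pages and the function names are hypothetical:

```python
import os

def _common_suffix(strings):
    return os.path.commonprefix([s[::-1] for s in strings])[::-1]

def learn_lr_wrapper(pages, labels):
    """pages[i]: an HTML string; labels[i]: its records, each a tuple of K values in page order."""
    k_attrs = len(labels[0][0])
    before = [[] for _ in range(k_attrs)]
    after = [[] for _ in range(k_attrs)]
    for page, records in zip(pages, labels):
        # Locate every labeled value, in document order.
        spans, pos = [], 0
        for record in records:
            for value in record:
                start = page.index(value, pos)
                spans.append((start, start + len(value)))
                pos = start + len(value)
        # Collect the text between each value and its neighbours.
        for n, (start, end) in enumerate(spans):
            prev_end = spans[n - 1][1] if n > 0 else 0
            next_start = spans[n + 1][0] if n + 1 < len(spans) else len(page)
            before[n % k_attrs].append(page[prev_end:start])
            after[n % k_attrs].append(page[end:next_start])
    wrapper = [(_common_suffix(b), os.path.commonprefix(a)) for b, a in zip(before, after)]
    if any(not left or not right for left, right in wrapper):
        raise ValueError("no non-empty delimiters fit the examples")
    return wrapper

def apply_lr_wrapper(wrapper, page):
    """Find l_k, read up to r_k, for each attribute in turn; repeat for the next record."""
    records, pos = [], 0
    while True:
        record = []
        for left, right in wrapper:
            i = page.find(left, pos)
            if i < 0:
                return records
            start = i + len(left)
            end = page.find(right, start)
            if end < 0:
                return records
            record.append(page[start:end])
            pos = end
        records.append(tuple(record))

train = "<HTML><BODY><B>Congo</B> <I>242</I><BR><B>Egypt</B> <I>20</I><BR></BODY></HTML>"
wrapper = learn_lr_wrapper([train], [[("Congo", "242"), ("Egypt", "20")]])
new_page = "<HTML><BODY><B>Spain</B> <I>34</I><BR><B>Japan</B> <I>81</I><BR></BODY></HTML>"
print(apply_lr_wrapper(wrapper, new_page))  # [('Spain', '34'), ('Japan', '81')]
```

An LR wrapper breaks as soon as a delimiter also appears outside the data region (for example a bold page title); richer wrapper classes such as HLRT add head and tail delimiters for that case.
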
Tables on the web

Relational data on the web is often represented as tables and it is possible to extract this data (e.g. [1])

[1] "WebTables: Exploring the Power of Tables on the Web", M.Cafarella et al. 2008
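Once a table has been judged to hold relational data (the hard part that WebTables addresses and that is not shown here), turning it into records is mechanical. A minimal sketch with Python's standard `html.parser`, assuming a well-formed table whose first row holds the column names; the class and function names are hypothetical:

```python
from html.parser import HTMLParser

class TableExtractor(HTMLParser):
    """Collects the cell texts of every <tr>, row by row."""

    def __init__(self):
        super().__init__()
        self.rows, self.row, self.cell = [], None, None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.row, self.cell = [], None
        elif tag in ("td", "th") and self.row is not None:
            self.cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self.cell is not None and self.row is not None:
            self.row.append(" ".join("".join(self.cell).split()))
            self.cell = None
        elif tag == "tr" and self.row is not None:
            self.rows.append(self.row)
            self.row = None

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)

def table_to_records(html):
    """Map each body row onto the header row; rows with the wrong width are skipped."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body if len(row) == len(header)]

html = ("<table><tr><th>School</th><th>Founded</th></tr>"
        "<tr><td>Emory College</td><td>1836</td></tr></table>")
print(table_to_records(html))  # [{'School': 'Emory College', 'Founded': '1836'}]
```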

DeepWeb

Large volumes of data are accessible only through HTML form interfaces
We can automatically issue queries and extract the hidden knowledge (e.g. [1])

[1] "Web-Scale Extraction of Structured Data" by M.Cafarella et al. 2008
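One simple way to issue such queries is to enumerate form submissions. A minimal sketch, assuming a form submitted via HTTP GET whose candidate input values are already known; the URL, field names, and values are hypothetical, and the hard parts (choosing informative values, deciding which result pages are worth extracting) are not shown:

```python
from itertools import product
from urllib.parse import urlencode

def form_queries(action_url, field_values):
    """Yield one GET URL per combination of candidate values for the form's inputs."""
    names = sorted(field_values)
    for combo in product(*(field_values[n] for n in names)):
        yield action_url + "?" + urlencode(dict(zip(names, combo)))

# Hypothetical search form with two inputs.
urls = list(form_queries("http://example.com/search",
                         {"make": ["ford", "honda"], "year": ["2010"]}))
print(urls)
# ['http://example.com/search?make=ford&year=2010', 'http://example.com/search?make=honda&year=2010']
```

The number of URLs grows as the product of the candidate-set sizes, which is why real systems must be selective about which inputs and values to try.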

Relation extraction from text

[Emory College] was founded in [1836] in [Oxford, Georgia] by the [Methodist Episcopal Church].

Focused extraction: need to find a particular attribute of a particular entity (slot-filling)
Unfocused extraction: process text and extract everything we can

TREC KBA (knowledge base acceleration) http://trec-kba.org/

Focused extraction

"Knowledge Base Completion via Search-Based Question Answering" by R.West et al. 2014 (WWW)

Relation extraction from NL

Structured extraction (fixed entity/relation lexicon)
1. Supervised relation extraction
2. Semi-supervised relation extraction
3. Distant supervision for relation extraction
Open information extraction (entities and relations expressed in natural language)

Relation extraction from NL

Today, computers can't understand natural language text
How do we teach them to extract knowledge then?
M.Hearst* proposed to extract hyponyms using simple patterns (Hearst patterns)
- Bruises, wounds, broken bones or other injuries...
- temples, treasuries, and other important civic buildings
- All common-law countries, including Canada and England...
- ...

* "Automatic Acquisition of Hyponyms from Large Text Corpora" by Marti Hearst, 1992
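Three of these lexico-syntactic patterns ("such as", "and/or other", "including") can be sketched as regular expressions. This minimal sketch assumes the input has already been noun-phrase chunked, with every NP wrapped in square brackets (done by hand in the examples below, standing in for a real chunker); the names `NP`, `LIST`, and `hearst_pairs` are hypothetical:

```python
import re

NP = r"\[[^\]]+\]"                           # a bracketed noun phrase
LIST = rf"{NP}(?:,? (?:(?:and|or) )?{NP})*"  # NP, NP, ... and/or NP

# (compiled pattern, True if the hypernym is capture group 1)
PATTERNS = [
    (re.compile(rf"({NP}),? such as ({LIST})"), True),
    (re.compile(rf"({LIST}),? (?:and|or) other ({NP})"), False),
    (re.compile(rf"({NP}),? including ({LIST})"), True),
]

def hearst_pairs(chunked):
    """Return (hyponym, hypernym) pairs found in an NP-bracketed sentence."""
    pairs = []
    for pattern, hypernym_first in PATTERNS:
        for m in pattern.finditer(chunked):
            hyper, hypos = (m.group(1), m.group(2)) if hypernym_first else (m.group(2), m.group(1))
            for hypo in re.findall(r"\[([^\]]+)\]", hypos):
                pairs.append((hypo, hyper[1:-1]))  # strip the brackets
    return pairs

print(hearst_pairs("All [common-law countries], including [Canada] and [England]"))
# [('Canada', 'common-law countries'), ('England', 'common-law countries')]
```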

Supervised relation extraction

Training dataset with sentence-level labels for each relation
- Emory College was founded in 1836 (+)
- Founded in 1836, Emory College ... (+)
- Emory College opened in 1838 (-)
Datasets: ACE 2004 (Automatic Content Extraction), MUC-7 (Message Understanding Conference), BioNLP challenges
Treats relation extraction as a binary classification problem
Research studied various features* and training methods

* "Combining Lexical, Syntactic, and Semantic Features with Maximum Entropy Models for Extracting Relations" by N.Kambhatla 2004

Features

words between entities
types of entities (person, location, organization, etc)
# of words between entities
path between entities in a parse tree
...
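A minimal sketch of the first three features above for one candidate entity pair, assuming tokenization and entity typing are already done; the parse-tree path is omitted because it needs a parser, and the feature-name strings are hypothetical:

```python
def relation_features(tokens, e1, e2, types):
    """tokens: list of words; e1, e2: (start, end) token spans with e1 before e2;
    types: (type of e1, type of e2). Returns a set of string-valued features."""
    between = tokens[e1[1]:e2[0]]
    features = {f"between={w.lower()}" for w in between}  # words between entities
    features.add(f"types={types[0]}|{types[1]}")            # entity type pair
    features.add(f"num_between={len(between)}")             # number of words between
    return features

tokens = ["Emory", "College", "was", "founded", "in", "1836"]
print(sorted(relation_features(tokens, (0, 2), (5, 6), ("ORGANIZATION", "DATE"))))
# ['between=founded', 'between=in', 'between=was', 'num_between=3', 'types=ORGANIZATION|DATE']
```

Each feature set would then be mapped to a sparse vector and passed to a binary classifier, for instance a maximum entropy model as in Kambhatla's work.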

Kernel-based methods

Alternatively, one can define a kernel (think: a similarity measure) between text fragments and apply a kernel-based ML method (e.g. an SVM)

Parse-tree kernels: similarity of the parse trees of two text fragments
Each node can have multiple attributes (word, POS, NER type, etc.), which are then used to compute the kernel

"A shortest path dependency kernel for relation extraction" by R. Bunescu & R. Mooney, 2005
"Tree Kernel-based Relation Extraction with Context-Sensitive Structured Parse Tree Information" by GuoDong Zhou et al, 2007

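In Bunescu & Mooney's shortest path dependency kernel, each position on the dependency path between the two entities is a set of features (word, POS, entity type, or edge direction), and the kernel is $K(x, y) = \prod_{i=1}^{n} |x_i \cap y_i|$ when both paths have length $n$, and $0$ otherwise. A minimal sketch of that formula; the two paths are illustrative, hand-built examples rather than parser output:

```python
from math import prod

def shortest_path_kernel(x, y):
    """x, y: dependency paths as lists of feature sets, one set per path position."""
    if len(x) != len(y):
        return 0
    # Multiply the number of shared features at each position.
    return prod(len(a & b) for a, b in zip(x, y))

x = [{"protesters", "NNS", "Noun", "PERSON"}, {"->"}, {"seized", "VBD", "Verb"},
     {"<-"}, {"stations", "NNS", "Noun", "FACILITY"}]
y = [{"troops", "NNS", "Noun", "PERSON"}, {"->"}, {"raided", "VBD", "Verb"},
     {"<-"}, {"churches", "NNS", "Noun", "FACILITY"}]
print(shortest_path_kernel(x, y))  # 3 * 1 * 2 * 1 * 3 = 18
```

Because it is a product, a single position with no shared features (or a length mismatch) makes the whole similarity zero, so the kernel is strict about path shape.
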
Supervised relation extraction

Cons:
1. Expensive to obtain the data!
2. Adding a new relation requires labeling more data
Pros:
1. High quality training data
2. Explicit negative examples

Semi-supervised relation extraction

Reduce the amount of supervision required
Examples: DIPRE, Snowball, KnowItAll
Based on bootstrapping (iteratively improving the system)

From "Snowball: Extracting Relations from Large Plain-Text Collections" by E.Agichtein & L.Gravano, 2000
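The bootstrapping loop can be sketched in the spirit of DIPRE/Snowball: learn patterns from the contexts of known pairs, then apply the patterns to harvest new pairs. This is a minimal sketch, assuming that entities are runs of capitalized words and that a pattern is just the exact text between the two entities; it has none of Snowball's pattern and tuple confidence scores. The corpus uses made-up company names:

```python
import re

# Assumption: an entity is a run of capitalized words.
ENT = r"([A-Z]\w*(?: [A-Z]\w*)*)"

def bootstrap(corpus, seeds, iterations=2):
    pairs, patterns = set(seeds), set()
    for _ in range(iterations):
        # 1) Pattern induction: keep the exact text between the two entities of a known pair.
        for sentence in corpus:
            for a, b in pairs:
                i, j = sentence.find(a), sentence.find(b)
                if i >= 0 and j > i + len(a):
                    patterns.add(sentence[i + len(a):j])
        # 2) Pattern application: every match yields a new (organization, location) pair.
        for sentence in corpus:
            for middle in patterns:
                for m in re.finditer(ENT + re.escape(middle) + ENT, sentence):
                    pairs.add((m.group(1), m.group(2)))
    return pairs, patterns

corpus = [
    "Acme is headquartered in Springfield.",
    "Globex is headquartered in Cypress Creek.",
    "Globex, based in Cypress Creek, sells widgets.",
    "Initech, based in Austin, sells software.",
]
pairs, patterns = bootstrap(corpus, {("Acme", "Springfield")})
print(sorted(pairs))
# [('Acme', 'Springfield'), ('Globex', 'Cypress Creek'), ('Initech', 'Austin')]
```

The second iteration picks up ", based in " from the pair found in the first, which is exactly how coverage grows; a generic pattern like this is also how semantic drift creeps in.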

Semi-supervised relation extraction

Pros:
1. Less supervision required
2. Can extract more knowledge triples thanks to bootstrapping
Cons:
1. Semantic drift: as we iterate, the system extracts more and more incorrect patterns/triples
2. Extending to new relations still requires seed data

Distant supervision

Use an existing knowledge base to label data and train a model

Image from KDD 2014 Tutorial on Constructing and Mining Web-scale Knowledge Graphs, New York, August 24, 2014

Distant supervision assumptions

Assume we have a knowledge triple $(e_1, p, e_2)$

All sentences that mention $e_1$ and $e_2$ together express the predicate $p$
At least one sentence that mentions $e_1$ and $e_2$ together expresses the predicate $p$ (multi-instance setting)
A sentence that mentions $e_1$ and $e_2$ together might express the predicate $p$, and a pair of entities can be related by several different predicates (multi-instance multi-label setting)

"Distant supervision for relation extraction without labeled data" by M.Mintz et al 2009
"Modeling Relations and Their Mentions without Labeled Text" by S.Riedel et al 2010
"Multi-instance Multi-label Learning for Relation Extraction" by M.Surdeanu et al 2012

Distant supervision training

Extract features for all sentences that mention a related pair of entities
Randomly sample sentences with non-related entities as negative examples
Train a multiclass classification model
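The labeling step can be sketched as follows, under the first (Mintz-style) assumption that every sentence mentioning a related pair expresses the relation. This minimal sketch assumes entity mentions are already detected and linked to the knowledge base; the function name, the `NA` label, and the `neg_ratio` parameter are hypothetical:

```python
import random

def distant_label(mentions, kb, neg_ratio=1.0, seed=0):
    """mentions: (sentence, e1, e2) tuples whose entities are already linked to the KB.
    kb: set of (e1, p, e2) triples. Returns (sentence, e1, e2, label) examples."""
    relations = {}
    for e1, p, e2 in kb:
        relations.setdefault((e1, e2), []).append(p)
    positives, unrelated = [], []
    for sentence, e1, e2 in mentions:
        if (e1, e2) in relations:
            # Assume the sentence expresses every KB relation of the pair.
            positives += [(sentence, e1, e2, p) for p in relations[(e1, e2)]]
        elif (e2, e1) not in relations:
            unrelated.append((sentence, e1, e2, "NA"))
    # Randomly sample pairs that are unrelated in the KB as negatives.
    k = min(len(unrelated), int(neg_ratio * len(positives)))
    return positives + random.Random(seed).sample(unrelated, k)

kb = {("Emory College", "founded_in", "1836")}
mentions = [
    ("Emory College was founded in 1836.", "Emory College", "1836"),
    ("Founded in 1836, Emory College grew quickly.", "Emory College", "1836"),
    ("Emory College opened in 1838.", "Emory College", "1838"),
]
for example in distant_label(mentions, kb):
    print(example[3], "|", example[0])
```

No human labels the sentences here; any sentence that merely co-mentions Emory College and 1836 would be labeled `founded_in` all the same, which is the noise the multi-instance variants try to tolerate.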

Distant supervision

Pros:
1. Scalable!
2. Can be applied in different languages
Cons:
1. Training data is noisy!
2. No explicit negative examples

Open Information Extraction

Introduced in [1]
Extracts natural language triples from text:
- Apple announced a new iPhone 6. =>
  (Apple, announced, iPhone 6)
Extracts noun phrases as entities and verb phrases as predicates
A trained classifier is used to predict whether an extraction is good

TextRunner [trained extractor] -> ReVerb [chunking] -> Ollie [dependency tree] -> OpenIE 4 (Srlie + RelNoun) [semantic roles]

[1]. "Open Information Extraction from the Web" by M.Banko et al. 2007
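A toy extractor in the spirit of ReVerb, whose relation phrases follow the part-of-speech constraint V | V P | V W* P. This minimal sketch loosens that to one or more verbs optionally followed by W* P (to admit auxiliaries like "was founded in"), takes already POS-tagged input instead of running a tagger, treats any maximal run of noun-like tags as an argument, and omits ReVerb's lexical constraint and confidence classifier; all names are hypothetical:

```python
import re

def word_class(tag):
    """Map a Penn Treebank tag to V (verb), P (preposition/particle), W (other content word) or O."""
    if tag.startswith("VB"):
        return "V"
    if tag in ("IN", "RP", "TO"):
        return "P"
    if tag.startswith(("NN", "JJ", "RB", "PRP", "DT", "CD")):
        return "W"
    return "O"

def is_np(tag):
    return tag.startswith(("NN", "JJ", "DT", "CD", "PRP"))

def open_ie(tagged):
    """tagged: list of (word, tag). Returns (arg1, relation, arg2) triples."""
    classes = "".join(word_class(t) for _, t in tagged)  # one character per token
    words = [w for w, _ in tagged]
    triples = []
    for m in re.finditer(r"V+(?:W*P)?", classes):
        i, j = m.start(), m.end()
        while i > 0 and is_np(tagged[i - 1][1]):    # grow arg1 leftwards
            i -= 1
        while j < len(tagged) and is_np(tagged[j][1]):  # grow arg2 rightwards
            j += 1
        if i < m.start() and j > m.end():
            triples.append((" ".join(words[i:m.start()]),
                            " ".join(words[m.start():m.end()]),
                            " ".join(words[m.end():j])))
    return triples

sentence = [("Apple", "NNP"), ("announced", "VBD"), ("a", "DT"),
            ("new", "JJ"), ("iPhone", "NNP"), ("6", "CD")]
print(open_ie(sentence))  # [('Apple', 'announced', 'a new iPhone 6')]
```

Unlike the slide's cleaned-up triple, this sketch keeps the determiner and adjective in the second argument; real extractors normalize arguments further.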

Open Information Extraction

Pros:
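1. Even more scalable! ($O(N)$ vs $O(N|R|)$ for $N$ documents and $|R|$ target relations: one pass over the corpus instead of one per relation)
2. Do not require relation-specific training data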
Cons:
1. Lack of structure: need to cluster predicates

Link prediction

Some knowledge can be inferred from already acquired knowledge
- [Kyle Korver] plays_for [Atlanta Hawks]
  + [Atlanta Hawks] league [NBA]
  => [Kyle Korver] is [basketball player]

"Random Walk Inference and Learning in A Large Scale Knowledge Base" by N.Lao et al, 2011
"Logistic Tensor Factorization for Multi-Relational Data" by M.Nickel and V.Tresp, 2013

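Lao et al. learn weighted combinations of such relation paths from data rather than writing them by hand. As a minimal sketch, the single path from the Kyle Korver example can be applied as a hand-coded rule over the triples; the function name is hypothetical, and a learned model would output a confidence rather than a hard yes/no:

```python
def infer_basketball_players(triples):
    """Apply one hand-written path rule:
    (x, plays_for, t) and (t, league, NBA)  =>  (x, is, basketball player)."""
    nba_teams = {s for s, p, o in triples if p == "league" and o == "NBA"}
    inferred = {(s, "is", "basketball player")
                for s, p, o in triples if p == "plays_for" and o in nba_teams}
    return inferred - set(triples)  # report only facts not already in the KB

kb = [("Kyle Korver", "plays_for", "Atlanta Hawks"),
      ("Atlanta Hawks", "league", "NBA")]
print(infer_basketball_players(kb))  # {('Kyle Korver', 'is', 'basketball player')}
```
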
Never Ending Language Learning

"Toward an Architecture for Never-Ending Language Learning" by A. Carlson et al 2010

Google Knowledge Vault

"Knowledge Vault: A Web-Scale Approach to Probabilistic Knowledge Fusion" by X.Dong et al 2014

Google Knowledge Vault

from KDD 2014 Tutorial on Constructing and Mining Web-scale Knowledge Graphs, New York, August 24, 2014

Summary

Computers need structured data
Knowledge graphs can be used to structure knowledge: entities and relations (RDF graphs)
Knowledge can be effectively acquired from unstructured data, e.g. natural language text

Thanks!

Questions?