Open Information Extraction

The goal of Information Extraction is to extract triples of the form (argument1, relation, argument2) from an input sentence. Each argument is typically a noun phrase (NP), and the relation denotes the relationship between the two arguments. A triple can be seen as a unit of knowledge, so extracting the right triples amounts to extracting the right information.
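
As a minimal sketch of this unit of knowledge, a triple can be represented as a small record with two arguments and a relation. The class name Triple, its field names, and the example sentence are illustrative assumptions, not part of any particular system.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One unit of extracted knowledge: two arguments linked by a relation."""
    arg1: str      # first argument, typically a noun phrase
    relation: str  # phrase describing how the two arguments are related
    arg2: str      # second argument, typically a noun phrase


# Hypothetical input sentence: "Edison invented the phonograph."
example = Triple(arg1="Edison", relation="invented", arg2="the phonograph")
```

A named tuple keeps the record immutable and hashable, so extracted triples can be deduplicated in a set or compared directly.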

Our research focuses on Open Information Extraction. An Open Information Extraction system extracts triples without any prior domain knowledge, which makes it domain-independent: it considers only the input sentence itself.

Open Information Extraction is a necessary paradigm when extending Information Extraction to web scale. Because the web contains a practically unbounded variety of domains and sentence forms, the system must cope with unexpected and uncommon cases. Below are some issues and methodologies related to Open Information Extraction.

  1. Training data
    • Gathering training data for an Open Information Extraction system is very laborious, because the web contains a practically unlimited amount of input data.
    • We gather templates for each sentence pattern automatically from a corpus. This is called distant supervision or self-supervision. These templates are then applied to each input sentence.
  2. Output constraints
    • Semi-supervised learning suffers from semantic drift, which is caused by errors accumulating during the training process.
    • We couple the training processes of the extractors, so that each extractor constrains the others' training. This constrains the extractors' output efficiently.
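
The template-gathering idea under "Training data" can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the actual system: the function names learn_templates and apply_templates, the seed pairs, and the example sentences are all hypothetical, and arguments are approximated as runs of capitalized words instead of real noun phrases.

```python
import re
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (argument1, relation, argument2)


def learn_templates(corpus: List[str], seed_pairs: List[Tuple[str, str]]) -> Set[str]:
    """Collect the text found between known argument pairs as relation templates."""
    templates = set()
    for sentence in corpus:
        for arg1, arg2 in seed_pairs:
            # The words between the two known arguments become a template.
            pattern = re.escape(arg1) + r"\s+(.+?)\s+" + re.escape(arg2)
            match = re.search(pattern, sentence)
            if match:
                templates.add(match.group(1))
    return templates


def apply_templates(sentence: str, templates: Set[str]) -> List[Triple]:
    """Extract a triple wherever a learned template links two candidate arguments."""
    # Assumption: an argument is a run of capitalized words (a crude stand-in for an NP).
    arg = r"([A-Z]\w*(?:\s+[A-Z]\w*)*)"
    triples = []
    for template in sorted(templates):
        pattern = arg + r"\s+" + re.escape(template) + r"\s+" + arg
        for m in re.finditer(pattern, sentence):
            triples.append((m.group(1), template, m.group(2)))
    return triples


# Hypothetical corpus and seed argument pairs, used only for illustration.
corpus = [
    "Paris is the capital of France.",
    "Marie Curie was born in Warsaw in 1867.",
]
seeds = [("Paris", "France"), ("Marie Curie", "Warsaw")]

templates = learn_templates(corpus, seeds)
result = apply_templates("Tokyo is the capital of Japan.", templates)
```

No sentence is labeled by hand here: the seed pairs alone decide which sentences contribute templates, which is the sense in which the supervision is distant. A real system would also need to filter noisy templates, which is where the output constraints described above come in.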