Semi-supervised Information Extraction
We are developing an automatic content extraction system for information
agents. To reduce the cost of building the resources that automatic content
extraction requires, with minimal loss in performance, we present a
semi-supervised information extraction approach. We concentrate on improving
not only the precision of the results but also the coverage of the
method.
- Automatic Content Extraction
- Extracting structured content-bases from natural language documents
- A Semi-supervised Information Extraction Approach
- Input : Natural Language Documents, Seed Instances
- Output : Structured Content-bases, such as RDBs or Semantic Web
Ontologies
- Context Pattern Induction is based on simple surface
templates.
- Candidate Instance Extraction is based on a sequence alignment
model.
- Content Assessment is based on a redundancy-based assessment
scheme.
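The three steps above can be sketched roughly as follows. This is a minimal illustration, not the system's actual implementation: it assumes single-token arguments, whitespace tokenization, a plain global (Needleman-Wunsch style) token alignment, and a simple support count as the redundancy measure. The function names, scoring weights, thresholds, and example sentences are all hypothetical.

```python
from collections import Counter

SLOT_X, SLOT_Y = "<X>", "<Y>"  # hypothetical slot markers for the two arguments


def induce_patterns(sentences, seeds):
    """Turn every sentence that mentions a seed pair into a surface template."""
    patterns = []
    for sent in sentences:
        tokens = sent.split()
        for x, y in seeds:
            if x in tokens and y in tokens:
                patterns.append(
                    [SLOT_X if t == x else SLOT_Y if t == y else t for t in tokens]
                )
    return patterns


def align(pattern, tokens, match=2, mismatch=-1, gap=-1):
    """Globally align a template with a sentence; return (score, slot bindings)."""

    def sub(p, t):
        if p in (SLOT_X, SLOT_Y):  # a slot is allowed to match any token
            return match
        return match if p == t else mismatch

    n, m = len(pattern), len(tokens)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(pattern[i - 1], tokens[j - 1]),
                score[i - 1][j] + gap,  # template token left unmatched
                score[i][j - 1] + gap,  # sentence token skipped
            )
    # Trace back one optimal path to recover which tokens filled the slots.
    bound, i, j = {}, n, m
    while i > 0 and j > 0:
        if score[i][j] == score[i - 1][j - 1] + sub(pattern[i - 1], tokens[j - 1]):
            if pattern[i - 1] in (SLOT_X, SLOT_Y):
                bound[pattern[i - 1]] = tokens[j - 1]
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return score[n][m], bound


def extract_candidates(patterns, sentences, threshold=0.7, match=2):
    """Collect (sentence index, x, y) for sentences that align well with a template."""
    found = set()
    for idx, sent in enumerate(sentences):
        tokens = sent.split()
        for pattern in patterns:
            score, bound = align(pattern, tokens, match=match)
            if score / (match * len(pattern)) >= threshold and len(bound) == 2:
                found.add((idx, bound[SLOT_X], bound[SLOT_Y]))
    return found


def assess(found, min_support=2):
    """Keep candidate pairs that are supported by enough distinct sentences."""
    support = Counter((x, y) for _, x, y in found)
    return {pair: n for pair, n in support.items() if n >= min_support}


# Hypothetical toy corpus in a TV-programme domain.
sentences = [
    "News airs on KBS tonight",
    "Drama airs on MBC tonight",
    "Drama airs weekly on MBC tonight",
    "Sports fans love the weekend",
]
seeds = [("News", "KBS")]
patterns = induce_patterns(sentences, seeds)
accepted = assess(extract_candidates(patterns, sentences))
print(accepted)
```

Because the alignment tolerates gaps, the template induced from the seed sentence still matches the third sentence despite the extra word, which is how soft matching of this kind can raise coverage. The support threshold then filters out pairs seen only once, trading some of that coverage back for precision.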
- Cross-lingual Weakly-supervised Learning of Semantic Relations
- Motivation : Lack of resources for most languages
- Method : Leverage bilingual texts to learn a relation
extractor for resource-poor languages without significant annotation
effort
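One common way to exploit bilingual text is annotation projection: a relation annotated on the resource-rich side of a sentence pair is carried over to the other side through word alignments. The sketch below illustrates only that projection step under stated assumptions. The data layout, the `project_relation` name, the `born_in` label, and the hand-written alignment are hypothetical; in practice the alignment would come from an automatic word aligner and would be noisy.

```python
def project_relation(src_relation, alignment):
    """Project a source-side relation onto the target sentence.

    src_relation: {"arg1": (start, end), "arg2": (start, end), "label": str},
                  with half-open token spans on the source sentence.
    alignment:    set of (source_index, target_index) word-alignment links.
    Returns the projected relation, or None if an argument has no aligned token.
    """

    def project_span(span):
        start, end = span
        targets = sorted(t for s, t in alignment if start <= s < end)
        if not targets:
            return None
        # Use the smallest contiguous span that covers all aligned tokens.
        return (targets[0], targets[-1] + 1)

    arg1 = project_span(src_relation["arg1"])
    arg2 = project_span(src_relation["arg2"])
    if arg1 is None or arg2 is None:
        return None
    return {"arg1": arg1, "arg2": arg2, "label": src_relation["label"]}


# Hypothetical English-Korean sentence pair with a hand-written alignment.
source = ["Obama", "was", "born", "in", "Hawaii"]
target = ["오바마", "는", "하와이", "에서", "태어났다"]
alignment = {(0, 0), (2, 4), (3, 3), (4, 2)}
relation = {"arg1": (0, 1), "arg2": (4, 5), "label": "born_in"}

projected = project_relation(relation, alignment)
print(projected)
```

The projected annotations can then serve as training data for a target-language extractor. Returning `None` when an argument has no aligned token is one simple way to discard unreliable projections rather than guess.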
- Demo Video
- A demo video of the semi-supervised information extraction system,
shown extracting an EPG content-base, is available for download
- Publications
- Seokhwan Kim, Minwoo Jeong, Jonghoon Lee, Gary Geunbae Lee. A
Cross-lingual Annotation Projection Approach for Relation Detection.
Proceedings of the 23rd International Conference on Computational
Linguistics (COLING 2010), pp 564-571, Beijing, Aug 2010
- Seokhwan Kim, Minwoo Jeong, Gary Geunbae Lee. A Local Tree
Alignment-based Soft Pattern Matching Approach for Information
Extraction. Proceedings of the North American Chapter of the Association
for Computational Linguistics/Human Language Technology (NAACL HLT 2009), pp
169-172, Colorado, May 2009
- Seokhwan Kim, Minwoo Jeong, Gary Geunbae Lee. An Alignment-based
Pattern Representation Model for Information Extraction. Proceedings of
the 31st Annual International ACM SIGIR Conference (poster), pp 875-876,
Singapore, July 2008
- Seokhwan Kim, Minwoo Jeong, Gary Geunbae Lee, Kwangil Ko, Zino Lee.
An alignment-based approach to semi-supervised relation extraction
including multiple arguments. Proceedings of the fourth Asian
Information Retrieval Symposium (AIRS 2008), Harbin, Jan 2008
- Seokhwan Kim, Minwoo Jeong, Gary Geunbae Lee. A semi-supervised
method for efficient construction of statistical spoken language
understanding resources. Proceedings of the Interspeech 2007-Eurospeech,
Antwerp, Aug 2007