Knowledge Extraction from Natural Language Texts

Knowledge extraction is the creation of structured knowledge from unstructured sources (e.g. texts). The resulting knowledge needs to be in a machine-interpretable format that facilitates inference. The tasks of Named Entity Recognition (NER), Entity Linking (EL), and Relation Extraction (RE) are instrumental for knowledge extraction from natural language. While NER deals with the identification of named entity mentions (spans) and their semantic types in texts, EL deals with disambiguating a mention (a named entity or its coreference) by mapping it to a unique entity in a Knowledge Graph. Relation Extraction, in turn, deals with extracting the semantic relations that hold between two entity mentions.
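
To make the division of labor concrete, here is a minimal Python sketch with hand-filled values (the Mention class and the specific identifiers are illustrative, not the output of an actual extraction system) showing what each task contributes for the sentence "Barack Obama was born in Honolulu.":

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Mention:
        text: str                    # surface form of the mention span
        start: int                   # character offset where the span starts
        end: int                     # character offset where the span ends
        etype: str                   # semantic type assigned by NER, e.g. "PER"
        kg_id: Optional[str] = None  # KG identifier assigned later by EL

    # Sentence: "Barack Obama was born in Honolulu."
    # NER identifies the spans and their semantic types:
    obama = Mention("Barack Obama", 0, 12, "PER")
    honolulu = Mention("Honolulu", 25, 33, "LOC")

    # EL grounds each mention to a unique KG entity (Wikidata IDs here):
    obama.kg_id = "Q76"        # Barack Obama
    honolulu.kg_id = "Q18094"  # Honolulu

    # RE extracts a relation between the two grounded mentions:
    triple = (obama.kg_id, "P19", honolulu.kg_id)  # P19 = place of birth
    print(triple)  # ('Q76', 'P19', 'Q18094')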

Although the above-mentioned tasks are closely related, they have traditionally been studied in isolation, without leveraging their mutual dependencies. Most existing systems adopt a pipelined approach, in which errors made in an earlier stage propagate to later stages. To mitigate this, several neural models have been proposed that jointly address either NER and EL or NER and relation extraction. I propose an end-to-end trainable model that solves all three tasks simultaneously within a multi-task learning framework, as sketched below.
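
One common way to realize such a model is a shared encoder with one output head per task, trained on the sum of the task losses so that gradients from every task shape a common representation. The PyTorch sketch below illustrates only this general pattern; the BiLSTM encoder, the head sizes, and the first/last-token stand-in for mention-pair scoring are simplifying assumptions, not the proposed architecture.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class JointExtractionModel(nn.Module):
        """Shared encoder with one output head per task (illustrative only)."""
        def __init__(self, vocab_size, hidden=256, n_ner_tags=9,
                     n_entities=10_000, n_relations=50):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, hidden)
            self.encoder = nn.LSTM(hidden, hidden // 2, bidirectional=True,
                                   batch_first=True)
            self.ner_head = nn.Linear(hidden, n_ner_tags)      # BIO tag per token
            self.el_head = nn.Linear(hidden, n_entities)       # KG entity per token
            self.re_head = nn.Linear(2 * hidden, n_relations)  # relation per pair

        def forward(self, token_ids):
            h, _ = self.encoder(self.embed(token_ids))  # shared representation
            # first/last token pair is a stand-in for real mention-pair scoring
            pair = torch.cat([h[:, 0], h[:, -1]], dim=-1)
            return self.ner_head(h), self.el_head(h), self.re_head(pair)

    # Joint training: one backward pass over the summed task losses, so the
    # three tasks share gradients instead of running as a brittle pipeline.
    model = JointExtractionModel(vocab_size=30_000)
    tokens = torch.randint(0, 30_000, (2, 16))  # toy batch: 2 sentences, 16 tokens
    ner_logits, el_logits, re_logits = model(tokens)
    loss = (F.cross_entropy(ner_logits.flatten(0, 1), torch.randint(0, 9, (32,)))
            + F.cross_entropy(el_logits.flatten(0, 1), torch.randint(0, 10_000, (32,)))
            + F.cross_entropy(re_logits, torch.randint(0, 50, (2,))))
    loss.backward()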

Knowledge Graph Representation Learning and Reasoning

Reasoning capability is one of the greatest hallmarks of human intelligence and also one of the long-standing challenges for Artificial Intelligence. Neuroscientists and psychologists postulate that the human brain can store a virtually unbounded number of concepts. Complex reasoning arises from manipulating these concepts and inferring new relations among them. Incorporating reasoning capability into an AI system therefore depends on the ability to model how concepts in the real world interact with one another. To this end, Relational Reasoning, i.e. learning and inference with relational data, is a promising research direction.

Relational reasoning mostly focuses on learning representations for the concepts and relations in existing relational data such as knowledge graphs, which helps to infer new relations among concepts. However, reasoning capability is not limited to simply inferring new relations (e.g. knowledge base completion). It often requires the composition and abstraction of several concepts and their interactions (relations). Additionally, in the real world, the concept space is not limited to factual knowledge; commonsense knowledge also plays an important role in reasoning.
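
A canonical instance of this representation-learning view is link prediction with translational embeddings. The sketch below is not our model but an illustration of the well-known TransE scoring function (Bordes et al., 2013), in which a relation acts as a translation vector and a triple (h, r, t) is plausible when h + r ≈ t:

    import torch
    import torch.nn as nn

    class TransE(nn.Module):
        """TransE (Bordes et al., 2013): a triple (h, r, t) is plausible
        when the embeddings satisfy h + r ≈ t."""
        def __init__(self, n_entities, n_relations, dim=100):
            super().__init__()
            self.ent = nn.Embedding(n_entities, dim)
            self.rel = nn.Embedding(n_relations, dim)

        def score(self, h, r, t):
            # lower translation distance = more plausible triple
            return (self.ent(h) + self.rel(r) - self.ent(t)).norm(p=1, dim=-1)

        def loss(self, pos, neg, margin=1.0):
            # margin ranking: observed triples should score better (lower)
            # than corrupted ones by at least the margin
            return torch.relu(margin + self.score(*pos) - self.score(*neg)).mean()

    model = TransE(n_entities=1000, n_relations=20)
    pos = tuple(torch.tensor([i]) for i in (1, 3, 7))   # an observed triple
    neg = tuple(torch.tensor([i]) for i in (1, 3, 42))  # corrupted tail entity
    model.loss(pos, neg).backward()

    # Knowledge base completion: rank all candidate tails for a query (h, r, ?)
    h = torch.full((1000,), 1, dtype=torch.long)
    r = torch.full((1000,), 3, dtype=torch.long)
    predicted_tail = model.score(h, r, torch.arange(1000)).argmin()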

Our goal is to design a deep reasoning model that combines both factual and commonsense knowledge and can perform the selective composition and abstraction required for complex reasoning.

Generating Fact-Allegiant Entity Summaries

Textual descriptions of entities enable both humans and machines to quickly grasp the most discernible information about an entity, thereby providing an intuitive basic understanding of the identity of an otherwise unknown entity. Large-scale knowledge graphs such as DBpedia, Wikidata, the Google Knowledge Graph, and Diffbot's Knowledge Graph sparsely provide textual descriptions of varying length for some of their entities. Whereas Wikidata and the Google Knowledge Graph provide very short textual descriptions in the form of noun phrases, DBpedia and Diffbot's Knowledge Graph provide multi-sentence textual descriptions (abstracts) that are more detailed than noun phrases and often incorporate additional factual details. Unfortunately, many millions of entities in these knowledge graphs still lack textual descriptions of either type.

The shorter form of textual description has a wide range of applications, such as question answering and named entity disambiguation. Descriptions of this sort can also be used to determine the fine-grained ontological type of an entity. Although many knowledge graphs already provide a fixed inventory of ontological types, many of these types are abstract in nature (e.g. "person", "company"). Considering the vast diversity of entities described on the Web, such small type inventories are not sufficient; it is better to have an open-domain solution that can generate descriptions such as "American basketball player" instead of "person". A multi-sentence textual description, on the other hand, can highlight key facts about an entity and serve as a summary.
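
As a rough sketch of how such an open-domain generator could be interfaced, the snippet below linearizes an entity's facts into a text prompt and decodes a description with a sequence-to-sequence model. It uses an off-the-shelf T5 checkpoint from the transformers library purely to show the input/output shape; the prompt format is an assumption, and producing sensible descriptions would require fine-tuning on (facts, description) pairs.

    from transformers import T5ForConditionalGeneration, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained("t5-small")
    model = T5ForConditionalGeneration.from_pretrained("t5-small")

    # Linearize the entity's KG facts into a flat input sequence (format assumed)
    facts = [("occupation", "basketball player"),
             ("country of citizenship", "United States")]
    prompt = "describe entity: " + " ; ".join(f"{p} = {o}" for p, o in facts)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=12)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
    # after fine-tuning, the target output would be e.g. "American basketball player"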
