Mining Fun Facts from Wikipedia Tables & Real-Time Event Stream Processing

Mining Fun Facts from Wikipedia Tables

Introduction – Problem Definition

  • Beyond the ten blue links, modern search engines provide contextual information about query entities in the form of information cards.
  • Among the various attributes displayed about entities, fun facts have recently attracted interest.
  • Obtaining such trivia at a large scale is, however, non-trivial: hiring professional content creators is expensive, and extracting statements from the Web is prone to uninteresting, out-of-context, and/or unreliable facts.
  • In this paper, the authors show how fun facts can be mined from superlative tables in Wikipedia.
  • The underlying table content is dynamic (i.e., updated over time), so the generated facts must be kept up to date.
  • Efforts to mine fun facts from the Web, rather than hiring professionals to curate material, are motivated not only by cost and scalability but also by reliability and freshness.
  • A natural first approach to automating this process is text extraction. Unfortunately, text extraction is prone to pulling statements out of context.
  • Wikipedia has thus served as a rich source for generating fun facts due to its associated structured data.

Contribution

  • They show how to identify a large number of relational tables in Wikipedia, and attributes within each table, that are an excellent source for generating fun facts.
  • They propose a templated approach that, when instantiated using table data, automatically generates many fun facts from a single table.
  • They propose two general classes of templates, namely rank-ordered and distributional, that lead to interesting sentences when instantiated with table data.
  • They give a semi-automated method for turning structured facts from tables into natural language templates.
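The templated approach can be illustrated with a minimal Python sketch of a rank-ordered template. The toy table, the template string, and the helper names (ordinal, rank_ordered_facts, all_facts) are hypothetical; the sketch only shows how one table, ranked once globally and again within each group, yields several facts. It is not the authors' implementation.

```python
from typing import Dict, List

# Hypothetical toy superlative table; names and values are illustrative only.
TABLE: List[Dict] = [
    {"name": "Tower A", "city": "Springfield", "height_m": 310},
    {"name": "Tower B", "city": "Shelbyville", "height_m": 295},
    {"name": "Tower C", "city": "Springfield", "height_m": 280},
]

# Hypothetical rank-ordered template; {rank} is empty for first place ("the tallest").
TEMPLATE = "{entity} is the {rank}{adjective} building in {scope}."


def ordinal(n: int) -> str:
    """Return an English ordinal such as '2nd', '3rd', or '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def rank_ordered_facts(rows: List[Dict], key: str, adjective: str,
                       scope: str, top_k: int = 3) -> List[str]:
    """Instantiate TEMPLATE for the top_k rows ranked by `key`, largest first."""
    ranked = sorted(rows, key=lambda r: r[key], reverse=True)
    facts = []
    for i, row in enumerate(ranked[:top_k], start=1):
        rank_phrase = "" if i == 1 else ordinal(i) + " "
        facts.append(TEMPLATE.format(entity=row["name"], rank=rank_phrase,
                                     adjective=adjective, scope=scope))
    return facts


def all_facts(rows: List[Dict]) -> List[str]:
    """Rank over the whole table, then rank again within each city."""
    facts = rank_ordered_facts(rows, "height_m", "tallest", "the list")
    for city in sorted({r["city"] for r in rows}):
        group = [r for r in rows if r["city"] == city]
        facts += rank_ordered_facts(group, "height_m", "tallest", city)
    return facts


for fact in all_facts(TABLE):
    print(fact)
```

Re-ranking within each value of an unordered column (here, city) is one simple way to get more facts out of a single table; filtering the results for interestingness is a separate step this sketch does not attempt.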

Methodology

  • They propose two general template view classes.
  • The rank-ordered view class describes how exceptional an entity is compared to other entities in a given set, with respect to some given ordered attribute.
  • The distributional view class describes how exclusive an entity is compared to the other entities with respect to membership in some given unordered attribute value.
  • Inference Model (for Phrasal Verb Components)
  • Dynamic Maintenance.
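A distributional view can be sketched in the same spirit: count how many rows share each value of an unordered attribute and phrase rare values as exclusivity facts. The table, the max_count threshold, and the sentence wording below are assumptions for illustration, not the paper's scoring of interestingness.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical toy table; names and values are illustrative only.
ROWS: List[Dict[str, str]] = [
    {"name": "Lake A", "continent": "Asia"},
    {"name": "Lake B", "continent": "Africa"},
    {"name": "Lake C", "continent": "Asia"},
    {"name": "Lake D", "continent": "Europe"},
    {"name": "Lake E", "continent": "Asia"},
    {"name": "Lake F", "continent": "Europe"},
]


def distributional_facts(rows: List[Dict[str, str]], attr: str, noun: str,
                         max_count: int = 2) -> List[str]:
    """Emit an exclusivity fact for each row whose `attr` value is shared
    by at most max_count rows (max_count is an assumed threshold)."""
    counts = Counter(r[attr] for r in rows)
    facts = []
    for r in rows:
        k = counts[r[attr]]
        if k == 1:
            facts.append(f"{r['name']} is the only {noun} in the list located in {r[attr]}.")
        elif k <= max_count:
            facts.append(f"{r['name']} is one of only {k} {noun}s in the list located in {r[attr]}.")
    return facts


for fact in distributional_facts(ROWS, "continent", "lake"):
    print(fact)
```

Common values (Asia, shared by three rows here) produce no fact, which reflects the idea that exclusivity, not mere membership, is what makes a statement interesting.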

GUZEL – Real-Time Multi-Pattern Detection over Event Streams

What is Event Processing?

Rapid advances in data-driven applications over recent years have intensified the need for efficient mechanisms capable of monitoring and detecting arbitrarily complex patterns in massive data streams.

  • This task is usually performed by complex event processing (CEP) systems.
  • CEP engines are required to process hundreds or even thousands of user-defined patterns in parallel under tight real-time constraints.

Real-Life Example

Consider a security system monitoring a corporate building. Every room entrance is equipped with a sensor that sends a signal to the main controller whenever a large object passes through the doorway. We are interested in detecting a scenario in which an intruder is detected at doorway A, then immediately passes through doorway B, and finally enters through doorway C.
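The security scenario is a sequence pattern, often written SEQ(A, B, C) in CEP notation. The minimal Python sketch below detects it over a stream of sensor events within a time window. The Event fields, the 60-second window, and the skip-till-any-match semantics (unrelated events between A, B, and C are ignored and every valid combination is reported) are assumptions; enforcing "immediately" strictly would need a stricter contiguity policy between A and B.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Event:
    ts: float   # timestamp in seconds (assumed unit)
    door: str   # which doorway sensor fired


def detect_sequence(stream: Sequence[Event], pattern: Tuple[str, ...] = ("A", "B", "C"),
                    window: float = 60.0) -> List[Tuple[Event, ...]]:
    """Return all matches of the sequence `pattern` spanning at most `window` seconds."""
    partial: List[Tuple[Event, ...]] = []   # partial matches: proper prefixes of the pattern
    matches: List[Tuple[Event, ...]] = []
    for ev in stream:
        # Drop partial matches that can no longer complete within the window.
        partial = [p for p in partial if ev.ts - p[0].ts <= window]
        # Try to extend every partial match whose next expected doorway is ev.door;
        # the old partial match is kept so later events can also extend it.
        extended = []
        for p in partial:
            if ev.door == pattern[len(p)]:
                q = p + (ev,)
                if len(q) == len(pattern):
                    matches.append(q)
                else:
                    extended.append(q)
        partial.extend(extended)
        # Start a new partial match after extending, so an event never pairs with itself.
        if ev.door == pattern[0]:
            if len(pattern) == 1:
                matches.append((ev,))
            else:
                partial.append((ev,))
    return matches


stream = [Event(0, "A"), Event(5, "A"), Event(10, "B"), Event(20, "C"), Event(100, "C")]
for m in detect_sequence(stream):
    print([(e.door, e.ts) for e in m])
```

Each entry in partial is a partial match; the number of such entries kept alive at once is what the optimization techniques below aim to reduce.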

Optimization Techniques

  1. Pattern reordering: Changing the order in which the events of a pattern are evaluated, for example handling rarer event types first so that fewer partial matches are created.
  2. Pattern sharing: Exploiting structural similarities between different patterns to unify the processing of their common subexpressions.
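Pattern reordering can be illustrated with a toy cost model. The sketch assumes that events of type t arrive at rate rates[t], that the pattern has no predicates, and that a plan's cost is the total number of partial matches created for each prefix of its evaluation order over one window. CEP optimizers typically use richer models that include predicate selectivities; the function names here are hypothetical.

```python
from itertools import permutations
from math import prod
from typing import Dict, Sequence, Tuple


def plan_cost(order: Sequence[str], rates: Dict[str, float], window: float) -> float:
    """Toy cost: sum over prefixes order[:k] of prod(rate * window), i.e. the
    expected number of partial matches each prefix creates in one window."""
    return sum(prod(rates[t] * window for t in order[:k])
               for k in range(1, len(order) + 1))


def best_order(types: Sequence[str], rates: Dict[str, float],
               window: float) -> Tuple[str, ...]:
    """Exhaustively pick the cheapest evaluation order (fine for short patterns only)."""
    return min(permutations(types), key=lambda o: plan_cost(o, rates, window))


rates = {"A": 10.0, "B": 1.0, "C": 5.0}   # assumed events per second
print(plan_cost(("A", "B", "C"), rates, 1.0), best_order(("A", "B", "C"), rates, 1.0))
```

With these assumed rates, evaluating in arrival order (A, B, C) costs 70, while starting from the rarest type in the order (B, C, A) costs 56, because fewer short partial matches are kept around waiting for completion.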

Objective

  • Rather than merely maximizing the degree of sharing or creating locally optimal plans for individual patterns, they aim to produce a globally optimal plan for the given workload of patterns by combining reordering and sharing.
  • At the core of the framework lies the optimizer that uses sharing and reordering techniques to generate candidate evaluation plans.
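The gap between locally and globally optimal plans can be shown with two patterns that share event types A and B. In this sketch, identical plan prefixes are computed once and shared, the cost of a prefix is the same assumed rate-times-window product used for reordering, and the exhaustive search over all combinations of per-pattern orders is a toy stand-in, not the framework's actual optimizer.

```python
from itertools import permutations, product
from math import prod
from typing import Dict, List, Sequence, Tuple

Plan = Tuple[str, ...]


def prefix_cost(prefix: Plan, rates: Dict[str, float], window: float) -> float:
    # Toy model (assumption): expected partial matches for this prefix over one window.
    return prod(rates[t] * window for t in prefix)


def workload_cost(plans: Sequence[Plan], rates: Dict[str, float], window: float) -> float:
    """Cost of a multi-pattern plan; identical prefixes are shared and counted once."""
    prefixes = {p[:k] for p in plans for k in range(1, len(p) + 1)}
    return sum(prefix_cost(x, rates, window) for x in prefixes)


def locally_optimal(patterns: Sequence[Plan], rates, window) -> List[Plan]:
    # Optimize each pattern in isolation, ignoring any sharing opportunities.
    return [min(permutations(p), key=lambda o: workload_cost([o], rates, window))
            for p in patterns]


def globally_optimal(patterns: Sequence[Plan], rates, window) -> Tuple[Plan, ...]:
    # Try every combination of per-pattern orders (exponential; toy sizes only).
    return min(product(*(permutations(p) for p in patterns)),
               key=lambda plans: workload_cost(plans, rates, window))


patterns = [("A", "B", "C"), ("A", "B", "D")]
rates = {"A": 4.0, "B": 4.0, "C": 3.0, "D": 3.0}   # assumed events per second
local = locally_optimal(patterns, rates, 1.0)
best = globally_optimal(patterns, rates, 1.0)
print(local, workload_cost(local, rates, 1.0))
print(best, workload_cost(best, rates, 1.0))
```

With these assumed rates, optimizing each pattern alone starts with its rarest type (C or D) and costs 126 in total, whereas choosing the orders (A, B, C) and (A, B, D) lets the A and AB prefixes be shared, lowering the total to 116 even though each order is individually worse.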