Mining Fun Facts from Wikipedia Tables & Real-Time Event Stream Processing

Posted on Jan 12, 2025 in Computers

Mining Fun Facts from Wikipedia Tables

Introduction – Problem Definition

Modern search engines provide contextual information surrounding query entities beyond ten blue links in the form of information cards.
Among the various attributes displayed about entities, there has been recent interest in providing fun facts.
Obtaining such trivia at a large scale is, however, non-trivial: hiring professional content creators is expensive, and extracting statements from the Web is prone to uninteresting, out-of-context, and/or unreliable facts.
In this paper, the authors show how fun facts can be mined from superlative tables in Wikipedia.
The content of these tables is dynamic (i.e., updated over time).
Efforts to mine fun facts from the Web, rather than hiring professionals to curate material, are motivated not only by cost and scalability but also by reliability and freshness.
The first approach to automating this process is text extraction. Unfortunately, text extraction is prone to pulling statements out of context.
Wikipedia has thus served as a rich source for generating fun facts due to its associated structured data.

Contribution

They show how to identify from Wikipedia a large number of relational tables, and attributes within each table, that are an excellent source for generating fun facts.
They propose a templated approach that, when instantiated using table data, automatically generates many fun facts from a single table.
They propose two general classes of templates, namely rank-ordered and distributional, that lead to interesting sentences when instantiated with table data.
They give a semi-automated method for turning structured facts from tables into natural language templates.

Methodology

They propose two general template view classes.
The rank-ordered view class describes how exceptional an entity is compared to other entities in a given set, with respect to some given ordered attribute.
The distributional view class describes how exclusive an entity is compared to the other entities with respect to membership of some given unordered attribute value.
Inference Model (for Phrasal Verb Components)
Dynamic Maintenance.
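
To make the two view classes concrete, here is a minimal Python sketch of how templates might be instantiated from table rows. It is an illustration under assumptions, not the authors' implementation: the toy table, the column names, the template wording, and the function names rank_ordered_facts and distributional_facts are all hypothetical.

```python
from collections import Counter

# Hypothetical toy table; the rows and column names are illustrative only.
ROWS = [
    {"name": "Tower A", "city": "Springfield", "height_m": 300},
    {"name": "Tower B", "city": "Springfield", "height_m": 250},
    {"name": "Tower C", "city": "Shelbyville", "height_m": 400},
    {"name": "Tower D", "city": "Springfield", "height_m": 200},
]

# Ordinal prefixes for the first three ranks (assumed template wording).
ORDINALS = {1: "", 2: "second ", 3: "third "}


def rank_ordered_facts(rows, entity_col, order_col, superlative, noun, top_k=3):
    """Rank-ordered view: how exceptional an entity is on an ordered attribute."""
    ranked = sorted(rows, key=lambda r: r[order_col], reverse=True)
    limit = min(top_k, len(ORDINALS))  # only ranks we have wording for
    return [
        f"{row[entity_col]} is the {ORDINALS[rank]}{superlative} {noun} in the table."
        for rank, row in enumerate(ranked[:limit], start=1)
    ]


def distributional_facts(rows, entity_col, attr_col, noun):
    """Distributional view: entities whose unordered attribute value is unique."""
    counts = Counter(r[attr_col] for r in rows)
    return [
        f"{r[entity_col]} is the only {noun} in the table located in {r[attr_col]}."
        for r in rows
        if counts[r[attr_col]] == 1
    ]


if __name__ == "__main__":
    for fact in rank_ordered_facts(ROWS, "name", "height_m", "tallest", "tower"):
        print(fact)
    for fact in distributional_facts(ROWS, "name", "city", "tower"):
        print(fact)
```

With the toy table, a single table yields several sentences: three rank-ordered facts about height and one distributional fact about the only tower located in Shelbyville.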

GUZEL – Real-Time Multi-Pattern Detection over Event Streams

What is Event Processing?

Rapid advances in data-driven applications over recent years have intensified the need for efficient mechanisms capable of monitoring and detecting arbitrarily complex patterns in massive data streams.

This task is usually performed by complex event processing (CEP) systems.
CEP engines are required to process hundreds or even thousands of user-defined patterns in parallel under tight real-time constraints.

Real-Life Example

Consider a security system monitoring a corporate building. Every room entrance is equipped with a sensor that emits a signal to the main controller whenever any large object passes through the doorway. We are interested in detecting a scenario in which an intruder is detected near doorway A, then immediately passes through entrance B, and finally enters doorway C.
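
The scenario above is a sequence pattern (A, then B, then C). A minimal Python sketch of detecting it over a stream might look like the following. The Event type, the detect_sequence function, and the 10-second window are assumptions made for illustration; in particular, "immediately" is approximated here by requiring the whole match to fit inside one time window, and unrelated events are allowed in between. This is not the matching algorithm of any particular CEP engine.

```python
from typing import NamedTuple


class Event(NamedTuple):
    timestamp: float  # seconds
    door: str         # doorway that emitted the signal


def detect_sequence(events, pattern=("A", "B", "C"), window=10.0):
    """Return all matches of `pattern` (length >= 2) that fit inside `window` seconds.

    Partial matches are kept after being extended, so one A event can take
    part in several matches, and unrelated events in between are skipped.
    """
    partial = []  # tuples of events matched so far, each shorter than the pattern
    matches = []
    for ev in events:
        # Drop partial matches that can no longer complete within the window.
        partial = [p for p in partial if ev.timestamp - p[0].timestamp <= window]
        extended = []
        for p in partial:
            if ev.door == pattern[len(p)]:
                candidate = p + (ev,)
                if len(candidate) == len(pattern):
                    matches.append(candidate)
                else:
                    extended.append(candidate)
        partial.extend(extended)
        # Every event of the first type opens a new partial match.
        if ev.door == pattern[0]:
            partial.append((ev,))
    return matches


if __name__ == "__main__":
    stream = [
        Event(0, "A"), Event(1, "D"), Event(2, "B"), Event(3, "C"),
        Event(20, "A"), Event(21, "B"), Event(40, "C"),
    ]
    for match in detect_sequence(stream):
        print([(e.door, e.timestamp) for e in match])
```

On this sample stream only the first A, B, C run is reported; the second run is discarded because its C event arrives after the window has expired.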

Optimization Techniques

Pattern reordering: Modifying the order in which the events are processed.
Pattern sharing: Pattern sharing methods utilize the structural similarities between different patterns to unify the processing of common subexpressions.
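
The two techniques can pull in opposite directions, which a small Python sketch helps show. Everything in it is an assumption made for illustration: the arrival rates, the "rarest event type first" heuristic for reordering, and the use of a common prefix as the shareable subexpression. The actual cost model and sharing rules of the framework are not given in this post.

```python
# Hypothetical arrival rates (events per second) for each event type.
RATES = {"A": 100, "B": 5, "C": 20, "D": 1}


def reorder_by_rate(pattern, rates):
    """Reordering heuristic: process the rarest event type first."""
    return tuple(sorted(pattern, key=lambda event_type: rates[event_type]))


def shared_prefix(plan_a, plan_b):
    """Sharing heuristic: the longest common prefix of two evaluation orders."""
    prefix = []
    for a, b in zip(plan_a, plan_b):
        if a != b:
            break
        prefix.append(a)
    return tuple(prefix)


if __name__ == "__main__":
    p1 = ("A", "B", "C")
    p2 = ("A", "B", "D")

    # In their original order, the two patterns share the prefix (A, B).
    print(shared_prefix(p1, p2))

    # Reordering each pattern on its own removes the common prefix.
    r1 = reorder_by_rate(p1, RATES)
    r2 = reorder_by_rate(p2, RATES)
    print(r1, r2, shared_prefix(r1, r2))
```

In this toy workload, the locally reordered plans no longer have anything in common to share, so a per-pattern choice that looks cheap can cost the workload its shared work.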

Objective

Rather than merely maximizing the sharing degree or creating locally optimal plans, they aim to produce a globally optimal plan for the given workload of patterns using a mixture of the two techniques, sharing and reordering.
At the core of the framework lies the optimizer that uses sharing and reordering techniques to generate candidate evaluation plans.
