At Université Marne-la-Vallée, Copernic building, room 4B08R
This meeting will bring together researchers from Mathematics, Computer Science and Linguistics. Its purpose is to compare approaches to text analysis and classification and to look for possible interactions between them.
Abstract of Cristan Martinez & Claude Martineau:
In this talk we will describe a Local Grammar and Dictionary-Graph approach to developing resources for the extraction of complex text segments. A complex text segment extends the notion of the multi-word unit (MWU) to allow a broader description of more complex and syntactically more flexible linguistic patterns. First, we will present some basics of Unitex/GramLab, an open-source corpus processing suite. Then, we will show how to describe complex language constructions with graphs and how to produce electronic dictionary entries on the fly through graph transductions. As an example, we will illustrate a way to combine dictionaries, local grammars and dictionary-graphs to identify some complex text segments as part of an event extraction task. Finally, we will discuss some advantages and drawbacks of our approach and highlight prospects for further research and applications.
Keywords: complex text segment, local grammar, dictionary-graph, event extraction, Unitex/GramLab
Abstract of Yang Wang:
There has been no shortage of controversies in literature, from old questions, such as whether Cao Xueqin wrote all 120 chapters of "Dream of the Red Chamber", widely regarded as the greatest work of Chinese literature, to new ones, such as whether Obama actually wrote his autobiography "Dreams from My Father". For mathematicians, it is interesting to ask whether mathematics can be used to settle these controversies. In this talk, I will give an overview of how mathematics can be applied to analyze the "style" of an author, and of the related field of study called "stylometry". I will show that mathematics can be used to settle many such controversies almost definitively.