asher63 (asher63) wrote,

Texts like networks: Improved stylometry from the Polish Academy of Sciences.

From the country that gave us Copernicus, Marie Curie, Joseph Conrad, and Stanislaw Lem, comes a breakthrough in text analysis:

https://www.eurasiareview.com/12042019-texts-like-networks-how-many-words-are-sufficient-to-recognize-the-author/

So, how can we verify who penned a historical text known only from fragments? How can we establish the true creator of an Internet lampoon? How can we really determine if the text of a thesis or doctoral dissertation is not plagiarized? In many cases, traditional stylometric methods fail or do not lead to sufficiently reliable conclusions.

In Information Sciences, scientists from the Institute of Nuclear Physics of the Polish Academy of Sciences (IFJ PAN) in Cracow have presented their own statistical tool for stylometric analysis. Constructed with the use of graphs, it makes it possible to look at the structure of texts in a qualitatively new way.

“The conclusions of our research are, on the one hand, encouraging. They indicate that the individuality of any person manifests itself clearly in the way they use a surprisingly small number of words. But there is also another, darker side of the coin. Since it turns out we are so original, it will be easier to identify us by our statements,” says Prof. Stanislaw Drozdz (IFJ PAN, Cracow University of Technology). ...


The method hinges on the use of networks and graphs:

“We suggested that the characteristic features of the style be sought in a network representation of the text, using graphs,” explains Tomasz Stanisz, PhD student at the IFJ PAN and the first author of the publication, and he specifies: “The graph is a collection of points, or vertices of the graph, connected by lines, i.e. the edges of the graph. In the simplest case – in the so-called unweighted network – the vertices correspond to individual words and are connected by edges if and only if two given words have occurred adjacent to each other at least once in the text. For example, for the sentence ‘Jane is hungry’, the graph would have three vertices, one for each word, but there would only be two edges, one between ‘Jane’ and ‘is’, the other between ‘is’ and ‘hungry’.” ...
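To make that definition concrete, here's a toy Python sketch of the unweighted network Stanisz describes. Each distinct word becomes a vertex, and two words get an edge if they ever appear next to each other. This is my own illustration and not the IFJ PAN code. The tokenization is an assumption: letters only, lowercased, punctuation dropped. So is the choice to ignore a word repeated back-to-back instead of giving it a self-loop. The helper names `word_adjacency_graph` and `edge_count` are hypothetical.

```python
import re


def word_adjacency_graph(text: str) -> dict[str, set[str]]:
    """Build an unweighted, undirected word-adjacency network.

    Illustrative assumptions (not the published method): words are runs of
    letters, case is folded to lowercase, punctuation is dropped, and a word
    repeated back-to-back does not create a self-loop.
    """
    words = re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)*", text.lower())
    graph: dict[str, set[str]] = {w: set() for w in words}  # every distinct word is one vertex
    for left, right in zip(words, words[1:]):  # consecutive word pairs
        if left != right:  # skip self-loops (assumption)
            graph[left].add(right)  # undirected: store the edge at both endpoints
            graph[right].add(left)
    return graph


def edge_count(graph: dict[str, set[str]]) -> int:
    # Each undirected edge is stored twice, once per endpoint.
    return sum(len(neighbors) for neighbors in graph.values()) // 2


if __name__ == "__main__":
    g = word_adjacency_graph("Jane is hungry")
    print(sorted(g))      # ['hungry', 'is', 'jane']
    print(edge_count(g))  # 2
```

Repeated words collapse into a single vertex. "The cat saw the dog." therefore gives four vertices and four edges rather than a five-word chain, and that folding-back is what turns a text into a network.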

Cool stuff. Read the rest at the link.
Tags: books, geekery, language