Over the past few years, scientists have been working to reduce story plots to data. On February 4, 2015, The Paris Review published an article called “Man In Hole,” with the subtitle “Can a Novel’s Plot Be Reduced to Data Points?” On July 12, 2016, The Atlantic put out an article titled “The Six Main Stories, As Identified by a Computer.” After going through these and similar articles, and diving into the research papers, I’ve decided it’s time I weigh in on the matter.
The Paris Review article describes the work of Matthew Jockers, who at the time of the article was an English professor at the University of Nebraska and had run a study on tens of thousands of books. Jockers approached plot differently than some of us writer types might. Instead of treating plot as the chronological sequence of events in the story, he tracks the emotional trajectory of the text in the order the author presents it, without rearranging scenes into chronological order. That presentation order is the “syuzhet.” Jockers explains,
Syuzhet is concerned with the linear progression of narrative from beginning (first page) to the end (last page)… When we study the syuzhet, we are not so much concerned with the order of the fictional events but specifically interested in the manner in which the author presents those events to readers.
Part of Jockers’s research involved feeding in a database that rates how emotionally positive or negative individual words are, with the ratings gathered by crowdsourced voting. His ultimate finding was that there are “about six” story arcs, but he never revealed much more about them (presumably leaving the details for another project). More information on Jockers’s process can be found here and here.
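To make the word-scoring idea concrete, here is a minimal sketch in Python. It is not Jockers’s code: the function name `emotional_trajectory`, the six-word `TOY_LEXICON`, and its scores are all made up for illustration, and a real crowdsourced lexicon would cover thousands of words. The sketch cuts a text into equal segments in reading order (first page to last, as with the syuzhet) and averages the scores of the rated words in each segment.

```python
import re

# Hypothetical scores for illustration only; a real crowdsourced
# lexicon would rate thousands of words.
TOY_LEXICON = {
    "love": 1.0, "joy": 1.0, "hope": 0.5,
    "loss": -0.5, "fear": -1.0, "death": -1.0,
}

def emotional_trajectory(text, lexicon, n_segments):
    """Return one mean sentiment score per segment, in reading order.

    The text is split into n_segments runs of roughly equal word count.
    Words missing from the lexicon are ignored; a segment with no rated
    words scores 0.0.
    """
    words = re.findall(r"[a-z']+", text.lower())
    if n_segments < 1 or len(words) < n_segments:
        raise ValueError("need at least one word per segment")

    trajectory = []
    for i in range(n_segments):
        start = i * len(words) // n_segments
        end = (i + 1) * len(words) // n_segments
        scores = [lexicon[w] for w in words[start:end] if w in lexicon]
        trajectory.append(sum(scores) / len(scores) if scores else 0.0)
    return trajectory

sample = "fear and death and loss then hope then love and joy"
print(emotional_trajectory(sample, TOY_LEXICON, 2))
```

For the sample sentence the first half scores negative and the second half positive, which is the kind of low-then-high shape the articles describe. A real novel would need many more segments and some smoothing before any shape became visible.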
Naturally, Jockers wasn’t the first to posit that a novel could be put into a machine and analyzed. Kurt Vonnegut proposed it first, and in fact it was the ever-popular video on plot on OpenCulture that pointed Jockers in the specific direction of his research.
Vonnegut figures there are more than six, but the two men follow a similar path, and they are unmistakably heading in the right direction.
Then there’s the article from The Atlantic, where we finally get (possibly) the full picture. Citing Jockers’s work, a group of scientists selected 1,737 works of fiction between 10,000 and 200,000 words long, ran them through a form of sentiment analysis similar to his, and arrived at six core narratives. (Here’s the link to the research paper)
- Rise, or Rags to Riches
- Fall, or Riches to Rags
- Fall then Rise, or Man in a Hole
- Rise then Fall, or Icarus
- Rise then Fall then Rise, or Cinderella
- Fall then Rise then Fall, or Oedipus
It all seems fairly tidy, and I’ve been thinking about what this neat sorting might mean. My assumption (I am not an expert) is that the patterns above not only make logical sense but also reflect, in a very deep way, the human mind’s craving for drama, tension, and/or redemption. Another possibility is that, since storytelling runs so deep in the history of humankind, we have a social or cultural demand for stories that fit these arcs. If either of these is true, and I am currently willing to take any of my own hypotheses with a grain of salt, then it is perfectly human of us to have plots that fit so neatly, data-wise. Of course, part of me wants to ask whether the researchers could be missing information about fiction, or whether by analyzing the fiction they have stripped something critical from its nature. Ultimately I think it’s fine, but I’d like to hear other opinions.
If condensing fiction into data concerns you in the least, I suggest you reconsider and realize that the sole purpose of human creativity is to depart from such set patterns into something new. Statistical analysis of fiction will only let us see clearly the boundaries around us that we’ve been unable to see, and with this new vision we’ll be better able to set out in a more creative, and hopefully better, direction.