What OpenAI is achieving with GPT-3 is, in a nutshell, impressive. This language model can write code, design, hold fluent conversations on complex topics and, now, summarize entire books in just a few sentences.
The system works in a rather peculiar way. Instead of summarizing the entire book in one go, it breaks the task down into smaller tasks. It first summarizes short sections of the original text, then summarizes those summaries, which are in turn summarized again, until it finally arrives at a much shorter summary of the book's main ideas.
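The recursive idea described above can be illustrated with a minimal sketch, assuming the book is split into fixed-size chunks of words. Everything here is hypothetical: `toy_summarize` is a stand-in that simply keeps the first quarter of each chunk, and the 500-word chunk size and 150-word target are arbitrary choices. OpenAI's system uses a fine-tuned GPT-3 model at each step, and this code does not reproduce its actual method; it only shows the "summarize the pieces, then summarize the summaries" structure.

```python
from typing import Callable, List


# Hypothetical stand-in for the language model: the "summary" of a chunk is
# just its first quarter (at least one word). A real system would call a
# trained summarization model here instead.
def toy_summarize(chunk: List[str]) -> List[str]:
    keep = max(1, len(chunk) // 4)
    return chunk[:keep]


def split_into_chunks(words: List[str], chunk_size: int) -> List[List[str]]:
    """Split a word list into consecutive chunks of at most chunk_size words."""
    return [words[i:i + chunk_size] for i in range(0, len(words), chunk_size)]


def recursive_summarize(
    words: List[str],
    summarize: Callable[[List[str]], List[str]],
    chunk_size: int = 500,    # hypothetical chunk length, in words
    target_words: int = 150,  # hypothetical length of the final summary
) -> List[str]:
    """Summarize each chunk, join the summaries, and repeat until short enough."""
    if len(words) <= target_words:
        return words
    # Each chunk is summarized independently: the "smaller tasks".
    summaries = [summarize(chunk) for chunk in split_into_chunks(words, chunk_size)]
    # The chunk summaries, joined in order, form the next (shorter) level of text.
    next_level = [word for summary in summaries for word in summary]
    # Guard against a summarizer that fails to shorten the text.
    if len(next_level) >= len(words):
        return next_level
    return recursive_summarize(next_level, summarize, chunk_size, target_words)


if __name__ == "__main__":
    book = [f"w{i}" for i in range(10_000)]  # a fake 10,000-word "book"
    summary = recursive_summarize(book, toy_summarize)
    print(len(summary))  # prints 39
```

With these toy settings the fake book shrinks level by level (10,000, 2,500, 625, 156 and finally 39 words), which mirrors the tree of ever-shorter summaries the article describes. Because each step only ever sees one chunk, the same procedure works for a book of any length.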
From 26,449 to 136 words
According to OpenAI, its best model, fine-tuned from GPT-3, generates “sensible summaries of entire books, sometimes even matching the average quality of human-written summaries.” The company says that humans who had read the book rated the model's summaries six out of seven 5% of the time, and five out of seven 15% of the time.
To demonstrate this, OpenAI has published a series of book summaries on its website. Below are its summaries of ‘Alice in Wonderland’ and ‘Romeo and Juliet’, which give an idea of the results:
“Alice falls down a rabbit hole and grows to a gigantic size after drinking a mysterious bottle. She decides to concentrate on returning to her normal size and finding her way to the garden. She meets the Caterpillar, who tells her that one side of the mushroom will make her grow taller and the other shorter. She eats the mushroom and returns to her normal size. Alice attends a party with the Mad Hatter and the March Hare. The Queen arrives and orders the execution of the gardeners for making a mistake with the roses. Alice saves them by putting them in a flowerpot. The King and Queen of Hearts preside over a trial. The Queen gets angry and orders that Alice be sentenced to death. Alice wakes up and finds her sister by her side.” (Alice in Wonderland, 26,449 words)
“Romeo locks himself in his room; he is no longer in love with Rosaline. Capulet invites many guests to his party, including Romeo and his love, Juliet. Lady Capulet tells Juliet that Paris, a brave man, seeks her hand in marriage. Romeo sees Juliet and falls in love with her. Juliet confesses her love to Romeo and they agree to marry. Romeo kills Tybalt in a fight and is sentenced to exile. Juliet agrees to fake her death and go to Mantua with Romeo. Romeo buys poison to kill Juliet in her grave. He and Paris fight and both die. Juliet wakes up and commits suicide. The Prince says that heaven has punished the hatred between the Capulets and the Montagues.” (Romeo and Juliet, 25,433 words)
According to the company, large pretrained models are not good at summarizing on their own. One way to improve them is to train them with human feedback, but for books that requires the human evaluators to have read the entire book, which can take hours.
The solution? Recursive task decomposition: splitting a difficult task into easier ones. This lets humans evaluate the model's summaries more quickly, because they can check summaries of smaller parts of the book, and it allows the model to summarize books of any length.
OpenAI trained the model on a dataset of mostly fiction books averaging about 100,000 words each. To evaluate the model, it selected the 40 most popular books of 2020 and commissioned two people to read each one, write their own summary, and then rate the summary produced by the AI.
The results are good, but not without problems. The most notable are that the model can produce inaccurate statements (possibly because of the context lost when a large task is broken into small ones) and that, on some occasions, the summary becomes little more than a list of important events in the book.
Speaking to VentureBeat, OpenAI confirmed that the company “has no plans to make the book summarization model publicly available or open source.”
Via | OpenAI