Robo-journalism: How a computer describes a sports match

Much of the promise of artificial intelligence is yet to be realised, but in some areas it's already proving its worth. Meet the robot journalists that one day might steal my job.
Robo-journalism is the process of automatically writing complete and complex news stories without any human intervention. Here are two "robo"-written articles. The first was penned by a program called Wordsmith, created by US company Automated Insights.
News organisation the Associated Press plans to use Wordsmith to write thousands of sports reports, like the one below. But how does a robot journalist work? The short articles below have been chopped up, with key bits highlighted and annotations under each snippet to explain the workings.

Sports reporting

Short headline: UNC beats Louisville 72-71 on late Paige basket. Long headline: Led by a Paige game-winner, North Carolina defeats Louisville 72-71
Image caption: Everything from the headline to the text of this basketball match report was written by a computer program. It has to know how to fit the format - saving characters here by using the abbreviation "UNC", short for University of North Carolina.
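As a rough illustration of fitting a headline to a character limit, here is a minimal Python sketch. It is not Wordsmith's actual code: the function name `fit_headline`, the `ABBREVIATIONS` table and the list of phrasings are all assumptions made up for this example.

```python
# Hypothetical abbreviation table; a real system would need a far larger one.
ABBREVIATIONS = {"North Carolina": "UNC"}


def fit_headline(winner, loser, score, detail, max_chars):
    """Return the first (longest) headline variant that fits within max_chars."""
    short_name = ABBREVIATIONS.get(winner, winner)
    # Variants are ordered from most to least detailed.
    candidates = [
        f"{winner} defeats {loser} {score} {detail}",
        f"{winner} beats {loser} {score} {detail}",
        f"{short_name} beats {loser} {score} {detail}",
        f"{short_name} beats {loser} {score}",
    ]
    for headline in candidates:
        if len(headline) <= max_chars:
            return headline
    return candidates[-1]  # nothing fits: fall back to the shortest variant


print(fit_headline("North Carolina", "Louisville", "72-71",
                   "on late Paige basket", max_chars=50))
```

With a 50-character limit the full team name no longer fits, so the sketch falls through to the abbreviated variant, much as the short headline above does.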
CHAPEL HILL, N.C. _ Marcus Paige scored with nine seconds remaining in the game to give North Carolina a 72-71 lead over Louisville. The Heels held on to win by that same score following a missed 3-pointer by Wayne Blackshear and an unsuccessful second-chance attempt by Terry Rozier.
Image caption: Each story starts out as a collection of data, logged during matches. These are things like tables, graphs and lists that might be hard to digest unless you're an expert. The system has a record of who did what and when, down to the second.
The Paige basket capped off a 13-point comeback for the Tar Heels, who trailed 63-50 after a Blackshear 3-pointer with 8:43 left in the game. UNC finished the game on a 22-8 run to secure the victory. After a basket by Brice Johnson gave North Carolina a 70-69 lead with 39 seconds left, Rozier responded with a hoop to give Louisville a one-point advantage with 26 seconds remaining.
Image caption: The software scours its trove of data looking for "insights" - facts that it can figure out from the data. Like a human journalist, it's trying to answer the questions: who won, by how much, and why? Here it has understood the concept of a "comeback" and has recognised that it's exciting for the reader that the points were scored with only a few seconds on the clock.
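To give a feel for what hunting for "insights" might involve, here is a small Python sketch that works out the winner, the margin, the size of the comeback and whether the winning basket came late. It is an illustration only, not how Wordsmith is implemented: the `ScoreEvent` structure, the two thresholds and the `find_insights` function are assumptions, and the data is limited to the four score states quoted in the report above rather than a full game log.

```python
from dataclasses import dataclass


@dataclass
class ScoreEvent:
    seconds_left: int  # time remaining in the game
    unc: int           # North Carolina's score after this event
    louisville: int    # Louisville's score after this event


# Only the four score states mentioned in the report, not a complete log.
events = [
    ScoreEvent(8 * 60 + 43, 50, 63),
    ScoreEvent(39, 70, 69),
    ScoreEvent(26, 70, 71),
    ScoreEvent(9, 72, 71),
]

COMEBACK_THRESHOLD = 10  # assumed: a deficit of 10+ points counts as a comeback
LATE_SECONDS = 30        # assumed: a go-ahead basket this late is worth a mention


def find_insights(events):
    """Derive simple story facts from score states (assumes the game was not a tie)."""
    final = events[-1]
    unc_won = final.unc > final.louisville

    def margin(event):
        # Positive when the eventual winner is ahead, negative when behind.
        return (event.unc - event.louisville) if unc_won else (event.louisville - event.unc)

    largest_deficit = max(0, max(-margin(e) for e in events))

    # Find the last moment the eventual winner went from not leading to leading.
    go_ahead = None
    for previous, current in zip(events, events[1:]):
        if margin(previous) <= 0 < margin(current):
            go_ahead = current

    return {
        "winner": "North Carolina" if unc_won else "Louisville",
        "final_margin": margin(final),
        "comeback_points": largest_deficit if largest_deficit >= COMEBACK_THRESHOLD else None,
        "late_winner": go_ahead is not None and go_ahead.seconds_left <= LATE_SECONDS,
    }


print(find_insights(events))
```

Each key in the returned dictionary is a fact a later stage could turn into a sentence, such as "capped off a 13-point comeback".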
The streaky second half followed a back-and-forth first 20 minutes that featured four lead changes and five ties, including at 34 points entering the half. Kennedy Meeks led a balanced North Carolina attack with 13 points. Brice Johnson (11 points), J.P. Tokoto (10) and Paige (10) were also double-digit scorers for the Heels. Justin Jackson chipped in with eight points, four assists and a season-high three blocked shots.
Image caption: To make the article sound natural it has to know the lingo. Each type of story, from finance to sport, has its own vocabulary and style. It also has to match the house rules of the news organisation - an article written for AP might be different to one for Forbes.
For the Cardinals, Rozier led the way with 25 points, five assists and three steals. Chris Jones added 19 points on 8-for-12 shooting, as well as five assists and four rebounds. The reserves for North Carolina outscored their Louisville counterparts 20-0, with Nate Britt providing eight points off the bench. The Tar Heels also controlled the offensive glass, grabbing 17 offensive rebounds (OR% of 44.7) versus only nine for the Cardinals (OR% of 28.1).
Image caption: To figure out how to structure an article, Wordsmith uses a virtual "tree". Each branch of the tree is a possible way to tell the story; by comparing the data, it can decide which branch it should follow. This sentence was only included because it decided the reserves scored particularly well.
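The idea of a story "tree" can be sketched in a few lines of Python. This is a toy version under stated assumptions, not Wordsmith's real structure: the `Branch` class, the `walk` function, the 15-point `BENCH_GAP` threshold and the wording of the losing-side sentence are all invented for illustration. Only the 20-0 bench score comes from the report.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Branch:
    test: Callable[[dict], bool]            # question asked of the data
    if_true: Union["Branch", str, None]     # next branch, a sentence template, or nothing
    if_false: Union["Branch", str, None]


def walk(node, stats):
    """Follow branches until reaching a sentence template or None (skip the sentence)."""
    while isinstance(node, Branch):
        node = node.if_true if node.test(stats) else node.if_false
    return node.format(**stats) if node is not None else None


BENCH_GAP = 15  # assumed threshold for the reserves scoring "particularly well"

bench_tree = Branch(
    test=lambda s: s["winner_bench"] - s["loser_bench"] >= BENCH_GAP,
    if_true="The reserves for {winner} outscored their {loser} counterparts "
            "{winner_bench}-{loser_bench}.",
    if_false=Branch(
        test=lambda s: s["loser_bench"] - s["winner_bench"] >= BENCH_GAP,
        if_true="{loser} got {loser_bench} points from its bench in a losing effort.",
        if_false=None,  # unremarkable bench scoring: say nothing
    ),
)

stats = {"winner": "North Carolina", "loser": "Louisville",
         "winner_bench": 20, "loser_bench": 0}
print(walk(bench_tree, stats))
```

When neither bench stands out, `walk` returns `None` and the sentence is simply left out of the story.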
It marked the first league loss of the season for Louisville, which dropped to 14-2 overall and 2-1 in the ACC. With the win, North Carolina climbed into a conference tie with the Cardinals at 2-1, improving to 12-4 in all games.
The same game was also covered by human journalists. Compare the automated effort to their reports: ESPN, FOX10TV and CBS Sports.
While the facts in the articles are largely the same, ESPN's story opens lyrically: "Marcus Paige ignored the pain in his twice-injured right foot, put his head down and drove toward the rim." Storytelling like this may take computers a while to imitate.
The same article also includes the quote: "'I said jokingly to my teammates that I was back,' Paige said." There's still some way to go before we can expect computers to source and write quotes like this. Fully understanding natural language is one of the biggest challenges in artificial intelligence. 
It's not all about sports though. Narrative Science, another company working on robo-journalism tools, can also write convincing articles automatically with its Quill system.
The excerpts below are taken from a Quill-written report on the performance of a stock portfolio.

Intelligent Machines - a BBC News series looking at AI and robotics

Financial reporting

Headline: Value Strategy Performed Better Than Benchmark in the Quarter. Subheading: Stock selection in the health care and financials sectors added to returns. Stock selection in the health care sector contributed the most to relative performance. Within the sector, stock selection in the health care equipment and supplies industry in particular boosted results. Stock selection in the financials and utilities sectors also contributed to relative results. In financials, industry allocation in real estate investment trusts (REITs) added to returns, while industry allocation in the electric utilities industry also contributed
Image caption: This article has a completely different language and style. It may not make for enthralling reading, but that's because it's been intentionally designed to match the look of similar human-written reports. In this case, Quill tries to explain why the portfolio performed the way it did by highlighting trends and other interesting or important data it finds.
As of June 28, 2013, the health care, industrials, and energy sectors were the portfolio's largest overweight positions relative to the benchmark. The most notable sector underweight positions were in financials, consumer discretionary, and materials stocks. Financials stocks were the portfolio's single largest sector allocation on an absolute basis.
Image caption: This sentence started life as a single row of data in a table. Take a look at the full data set that Quill used to create the story:
IPC Demo Sample
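To show how a row in a table might become a sentence, here is a brief Python sketch. It is not Quill's method, and the sector weights are invented placeholders rather than figures from the data set; the function names `position` and `largest_allocation_sentence` are likewise assumptions.

```python
# Invented placeholder weights (in percent), NOT figures from the real data set.
rows = [
    {"sector": "financials", "portfolio": 20.0, "benchmark": 25.0},
    {"sector": "health care", "portfolio": 18.0, "benchmark": 12.0},
    {"sector": "energy", "portfolio": 12.0, "benchmark": 9.0},
]


def position(row):
    """Classify one row relative to the benchmark."""
    if row["portfolio"] > row["benchmark"]:
        return "overweight"
    if row["portfolio"] < row["benchmark"]:
        return "underweight"
    return "in line"


def largest_allocation_sentence(rows):
    """Turn the row with the biggest portfolio weight into a sentence."""
    top = max(rows, key=lambda r: r["portfolio"])
    return (f"{top['sector'].capitalize()} stocks were the portfolio's single "
            "largest sector allocation on an absolute basis.")


print(largest_allocation_sentence(rows))
print(position(rows[0]))
```

The same row can be the largest holding in absolute terms while still being underweight relative to the benchmark, which is why the report can say both things about financials.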
