A big focus of enterprise work these days is to automate human tasks for greater efficiency. Computer giant IBM asks in its most recent research whether generative artificial intelligence (AI), such as large language models (LLMs), can be a stepping stone to automation.
Called “SNAP”, IBM’s proposed software framework trains an LLM to generate a prediction of the next action to take place in a business process given all of the events that have come before. Those predictions, in turn, can serve as suggestions for what steps a business can take.
“SNAP can improve the next activity prediction performance for various BPM [business process management] datasets,” write Alon Oved and colleagues at IBM Research in a new paper, SNAP: Semantic Stories for Next Activity Prediction, published this week on the arXiv pre-print server.
IBM’s work is just one example of a trend toward using LLMs to try to predict a next event or action in a series. Scholars have been doing work with what’s called time series data — data that measures the same variables at different points in time to spot trends. The IBM work doesn’t use time series data, but it does focus on the notion of events in sequence, and likely outcomes.
SNAP is an acronym for “semantic stories for the next activity prediction”. Next-activity prediction (the NAP part of SNAP) is an existing, decades-old area of systems research. NAP typically uses older forms of AI to predict what will happen next after all the steps up to that point have been input, usually from a log of the business, a practice known as “process mining”.
The semantic stories element of SNAP is the part that IBM adds to the framework. The idea is to use the richness of language in programs such as GPT-3 to go beyond the activities of traditional AI programs. The language models can capture more details of a business process, and turn them into a coherent “story” in natural language.
Older AI programs can’t handle all the data about business processes, write Oved and team. They “utilize only the sequence of activities as input to generate a classification model,” and, “Rarely are the additional numerical and categorical attributes taken into account within such a framework for predictions.”
An LLM, in contrast, can pick out many more details and mold them into a story. An example is a loan application. The application process contains several steps. The LLM can be fed various items from the database about the loan amount, such as “amount = $20,000” and “request start date = Aug 20, 2023”.
Those data items can be automatically fashioned by the LLM into a natural language narrative, such as:
“The requested loan amount was 20,000$, and it was requested by the customer. The activity “Register Application” took place on turn 6, which occurred 12 days after the case started […]”
The SNAP system involves three steps. First, a template for a story is created. Then, that template is used to build a full narrative. And lastly, the stories are used to train the LLM to predict the next event that will happen in the story.
In the first step, the attributes — such as loan amount — are fed to the language model prompt, along with an example of how they can be turned into a template, which is a scaffold for a story. The language model is told to do the same for a new set of attributes, and it spits out a new template.
In step two, that new template is fed into the language model and filled out by the model as a finished story in natural language.
The final step is to feed many such stories into an LLM to train it to predict what will happen next. The conclusion of each story, the activity that actually came next, serves as the “ground truth” for the training examples.
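The three steps can be sketched in miniature. The Python snippet below is an illustration of the flow only, not IBM's actual implementation: the attribute names, template wording, and helper functions are hypothetical, and the two LLM calls are mocked with plain string formatting.

```python
# Illustrative sketch of the SNAP pipeline; all names are hypothetical
# and the LLM calls are stand-ins using simple string operations.

def build_template(attribute_names):
    """Step 1: an LLM would turn attribute names into a story template.
    Here a fixed sentence pattern stands in for the model's output."""
    return " ".join('The {name} was "{{{name}}}".'.format(name=n)
                    for n in attribute_names)

def fill_template(template, event):
    """Step 2: fill the template with one event's attributes,
    producing a natural-language story fragment."""
    return template.format(**event)

def make_training_example(events, template):
    """Step 3: the story of all events so far is the model input;
    the activity that actually came next is the ground-truth label."""
    story = " ".join(fill_template(template, e) for e in events[:-1])
    label = events[-1]["activity"]
    return story, label

events = [
    {"amount": "$20,000", "activity": "Register Application", "turn": 6},
    {"amount": "$20,000", "activity": "Approve Application", "turn": 7},
]

template = build_template(["amount", "activity", "turn"])
story, label = make_training_example(events, template)
# story reads like: 'The amount was "$20,000". The activity was
# "Register Application". The turn was "6".'
# label is "Approve Application"
```

In a real system, the story would be the prompt and the label the prediction target when fine-tuning the language model.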
In their research, Oved and team test out whether SNAP is better at next-action prediction than older AI programs. They use four publicly available data sets, including car-maker Volvo’s actual database of IT incidents, a database of environmental permitting process records, and a collection of imaginary human resources cases.
The authors use three different “language foundational models”: OpenAI’s GPT-3, Google’s BERT, and Microsoft’s DeBERTa. They say all three “yield superior outcomes compared to the established benchmarks”.
Interestingly, although GPT-3 is far larger than the other two models, its performance edge on the tests is relatively modest. The authors conclude that “even relatively small open-source LFMs like BERT have solid SNAP results compared to large models.”
The authors also find that the full sentences of the language models seem to matter for performance.
“Does semantic story structure matter?” they ask, before concluding: “Design of coherent and grammatically correct semantic stories from business process logs constitutes a key step in the SNAP algorithm.”
They compare the stories from GPT-3 and the other models with a different approach where they simply combine the same information into one, long text string. They find the former approach, which uses full, grammatical sentences, has far greater accuracy than a mere string of attributes.
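The contrast the authors test can be illustrated with a toy example. Both representations below are hypothetical stand-ins for the paper's actual inputs:

```python
# Hypothetical contrast between the two input styles compared in the paper:
# a flat concatenation of attribute values versus a coherent sentence.

event = {"amount": "$20,000", "activity": "Register Application", "turn": 6}

# Baseline: the same information as one long, ungrammatical string.
flat = " ".join(str(v) for v in event.values())

# SNAP-style: the same information as a full natural-language sentence.
story = (f'The requested loan amount was {event["amount"]}, and the '
         f'activity "{event["activity"]}" took place on turn {event["turn"]}.')
```

Both strings carry identical facts; the finding is that the grammatical version makes a markedly better model input.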
The authors conclude generative AI is useful in helping to mine all the data about processes that traditional AI can’t capture: “That is particularly useful where the categorical feature space is huge, such as user utterances and other free-text attributes.”
On the flip side, the advantages of SNAP decrease when it uses data sets that don’t have much semantic information — in other words, written detail.
“A central finding in this work is that the performance of SNAP increases with the amount of semantic information within the dataset,” they write.
Importantly for the SNAP approach, the authors suggest it’s possible that data sets will increasingly be enhanced by newer technologies, such as robotic process automation, “where the user and system utterances often contain rich semantic information that can be used to improve the accuracy of predictions.”