AI makes the case for legal tech firm
25 February 2020
Artificial Intelligence (AI) researchers at the University of Sheffield Advanced Manufacturing Research Centre (AMRC) used smart software and powerful IBM hardware to help a legal tech firm de-risk a ‘deep learning’ solution to the time-consuming, paper-based processes its clients face.
Sheffield-based DISPUTED.IO’s Casefunnel product enables law firms to handle high volumes of clients by replacing offline form filling, making client work easier and automating legal workflows.
To improve productivity within law firms, the company asked the AMRC to explore how AI-driven solutions and digital automation technologies could replace time-consuming email and phone exchanges with a client-orientated online user experience.
Steven Shinn, chief executive officer and co-founder of DISPUTED.IO, says the expertise of research engineers at the AMRC proved very valuable to the company: “Access to AMRC AI developers to build a proof-of-concept for interpreting legal documents provided much-needed validation that a problem space our customers face can be solved with Artificial Intelligence. Without access to the team at the AMRC we would have carried all the risk of this exploratory work.”
Sean Wilson, technical lead for AI at the AMRC’s Integrated Manufacturing Group, investigated how machine learning could identify relevant information in documents so that it could be automatically extracted and passed to text recognition software, or stored directly in a database, removing the need for manual data entry.
Sean said: “Companies deal with lots of client documents which are often either old or have been scanned and DISPUTED.IO believed AI could be a way of gathering information from these documents. We were asked to help put together a proof-of-concept to show that an AI solution was possible.
“We looked at whether advances in computer vision algorithms that use ‘deep learning’, a type of machine learning, could be harnessed to develop and train a model that is able to detect features in documents.
“The challenge with any AI solution, however, is that it’s neither simple nor quick unless you use something like the IBM cell we have here at the AMRC. This was the first dedicated piece of work we did for an SME using this very powerful hardware.”
IBM’s PowerAI Vision software, combined with the computational muscle of the IBM Power9 AC922 server, allowed Sean to quickly develop an algorithmic model that can recognise features within a form. But before he could begin testing the company’s theory, he first needed the right data to train the model.
Training a model to acceptable accuracy usually requires about 5,000 labelled examples, which can take months to gather. But Sean was able to slash both the amount of data and the time needed by using a technique called ‘transfer learning’, which takes a model already trained on one task and adapts it to learn another, so a much smaller dataset is enough.
“There’s quite a lot going on in legal documents, so we went through each PDF page and marked up all of the different data areas, and there were between 15 and 30 entries on each page,” said Sean. “Once the data labelling was complete it was super easy: the IBM box did the hard work.
“The model we chose to use was ‘Faster R-CNN’; it takes a little longer to analyse images, but it understands what is in the image much more accurately and that’s what we wanted.”
The data was split with 80 per cent used for training and 20 per cent for validation. This means that for every training ‘pass’, the model learns which features to detect from a randomised 80 per cent of the documents, and its accuracy is then tested against the remaining 20 per cent.
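To make the 80/20 arithmetic concrete, here is a minimal Python sketch of a single randomised split of labelled pages into training and validation sets. The article does not say how PowerAI Vision performs the split internally; the function `split_documents` and the page names are hypothetical, chosen only for illustration.

```python
import random

def split_documents(doc_ids, train_fraction=0.8, seed=None):
    """Hypothetical helper: shuffle document IDs and split them into
    a training set and a validation set by the given fraction."""
    rng = random.Random(seed)          # seeded RNG so the split is reproducible
    shuffled = list(doc_ids)           # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical example: 50 labelled PDF pages identified by file name.
pages = [f"page_{i:03d}.pdf" for i in range(50)]
train, val = split_documents(pages, seed=42)
print(len(train), len(val))  # 40 10
```

Every page lands in exactly one of the two sets, so the validation pages are never used to fit the model and give an honest measure of accuracy.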
“We ran 4,000 iterations on the IBM server and an NVIDIA Tesla V100 graphics card was dedicated to the process, which allowed the training to be completed very quickly,” said Sean. “Without the IBM power box it would have taken a month to generate, train and test the deep learning model. Doing it this way, we were able to upload and train the model in just over an hour with up to 86% accuracy.”
Steven said Sean led the project with ‘great enthusiasm’, adding: “He rapidly found ways to use IBM Watson (a suite of enterprise-ready AI services) to produce a full proof of concept. We have since moved on and found other uses of Watson to meet other client needs.”
Sean said the outcome was a great result: “In a very short time the AMRC was able to provide the company with the proof of concept it needed to proceed with commercialising the new Casefunnel feature. By using the power of AI, which is a cornerstone of Industry 4.0 technologies, we helped an SME innovate rapidly, testing their ideas in hours rather than weeks so they could determine whether it was an idea worth pursuing further.”
The AMRC is part of the High Value Manufacturing (HVM) Catapult and the project was paid for using funds from the HVM Catapult as part of a commitment to working with small and medium-sized businesses.