Syntheia and the Future of Legal AI - Part 1

Horace Wu is the founder and director of Syntheia Pty Ltd, a legaltech platform that launched to market in July 2020, offering functionality that represents a new era in legal AI. I recently interviewed Horace and his partner, Martin Karlsson of Lateral.io, for an ILTA podcast, but over the course of the conversation it became clear there was far more to say than could easily fit within the target 15 minutes. Happily, Horace agreed to catch up again for an in-depth discussion on Tower of Babel.

Horace, during the podcast, you talked about how much the landscape around AI has evolved over the last five years. Can you expand on that?

The landscape around AI has changed a lot in the last five years. Both buyers and sellers of AI technology have matured. Your readers have undoubtedly heard the buzz about the “AI revolution” and the “rise of the machines”. While some of that is sensationalized hype, there has been a lot of real and concrete progress in computer science and data science in the last few years. Many of these changes have already brought measurable benefits to industries. A few examples you might have experienced include the improved computer vision that powers self-driving cars, voice recognition and speech synthesis that feel like talking to a real person, hyper-personalized shopping and content recommendations, and computers generating sensible summaries of long text.

The key underlying advancements for the sellers of AI technology have been:

1. New and improved data science techniques - like deep learning, transfer learning, and attention-based NLP (and more open source libraries available for experimentation);

2. Better hardware that has multiplied computational power;

3. Increased access to better data; and

4. Cloud-based SaaS making technology more accessible.

These technological advancements are beginning to seep through to the legal profession.

When you say these have already started to impact the legal profession, where are you seeing that?

When we look at the legal profession specifically, we are only starting to feel the impact of recent AI advancements. Before we talk about these technology advancements, let’s briefly address the demand side of the equation - the buyers of legal AI technology - because they are a fundamental part of painting the picture of the impact on the legal profession. There is a softly spoken sentiment in the market that “legal AI” has not met expectations in the past. In the last few years, the buying behavior in the legal profession has changed.

One common trend we have observed with law firms, in-house departments and even alternative legal services providers is a steady maturing of buying behavior. The motivation of buyers is changing from picking up shiny toys with a sense of excited curiosity to making procurement decisions based on identified problems they want to solve. In this context of buyer behavior becoming more mature, we have noticed two trends on the supply or provider side of the equation:

  • First, the increased breadth and depth of available technology. New software goes beyond traditional technologies that use expert rule systems or older NLP techniques like TF-IDF. The incredible advancements in machine learning and natural language processing are enabling us to solve more difficult problems than before. Many of these technologies are open source, so it has become much easier and cheaper for vendors of AI technology to experiment with the latest advances; and

  • Second, more agile methods for adapting and delivering technology. Some vendors are starting to change how software is delivered, with more integrated solutions and products that are updated after observing how end users actually use the tools provided to them.

The result is that AI technology can now do a lot more than before, and better solve specific problems that buyers have identified.

What do you think has brought about these changes? Is it possible to look at the way the market has been impacted and tie it back to specific causes?

There has been a confluence of causes that have led to these changes. Just to list a few of them - better technology, like advances in deep learning and natural language processing, better use of agile software development techniques, better understanding of the problems that should be solved, and cloud-based computing. From where we stand as a vendor, we are focused primarily on changes to natural language processing - because NLP is one of the core technologies used in building legal AI software.

A quick recap on what NLP is - NLP is the use of a computer to process and analyze human language. This may sound easy, but NLP is actually very hard because of the ambiguity of meaning and a slew of other linguistic challenges.

In the last few years, there have been rapid and unprecedented improvements in NLP. We think the advances in the field of natural language processing are among the key causes that have had, and will continue to have, an enormous impact on legal technology.

Can you tell me a bit about what that history has looked like?

I’d love to. Let’s expand on what we covered very briefly on our podcast. We can break the history of NLP into three periods - before 2012, between 2013 and 2018, and after 2018.

Before 2012

Before 2012, people were using handwritten rules (like complex “if-then” rules) or older statistical techniques (like counting the frequency of words in a document). Advanced techniques prior to 2012 included using support vector machines to classify samples of text, and using syntactic analysis to tag parts of speech in a sentence. Generally, these older techniques treated words as discrete entities, which has certain limitations. Nowadays, NLP practitioners generally do not treat words only as discrete entities.
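To make the “counting the frequency of words” idea concrete, here is a minimal sketch of classic TF-IDF in plain Python. The toy corpus and scores are invented for illustration; real systems of that era used far larger vocabularies and corpora.

```python
import math
from collections import Counter

# Toy corpus: each document is a list of lowercase tokens.
docs = [
    "the lease term is five years".split(),
    "the lease may be terminated early".split(),
    "the parties agree to arbitration".split(),
]

def tf_idf(term, doc, docs):
    """Classic TF-IDF: how often a term appears in one document,
    scaled by how rare the term is across the whole corpus."""
    tf = Counter(doc)[term] / len(doc)
    df = sum(1 for d in docs if term in d)
    idf = math.log(len(docs) / df) if df else 0.0
    return tf * idf

# "the" appears in every document, so its IDF (and TF-IDF) is zero;
# "arbitration" is rare, so it scores highly where it occurs.
print(tf_idf("the", docs[0], docs))
print(tf_idf("arbitration", docs[2], docs))
```

Note the limitation the text mentions: every word is a discrete entity here, so “terminated” and “termination” would count as completely unrelated terms.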

Between 2013 and 2018

A seismic shift happened in 2013, building on a long-standing idea in linguistics: words (or expressions) that can be used in similar ways are likely to have related meanings. There is a famous quote from John Firth that illustrates the philosophy behind the change in this period: “you shall know a word by the company it keeps.” The technique that implemented that philosophy is the use of distributed representations of words to capture their contextual meaning as vectors, or “word embeddings”. Famously, Tomas Mikolov used neural networks to create Word2vec - an implementation of the idea that words can be represented as mathematical vectors.

To visualize how this works, imagine a 3D space with x, y and z axes, filled with points, where each point is a word. Now, imagine the same space, but with the words scattered across 300 dimensions instead of three. These vectors turned out to be very good at capturing certain semantic meanings of words. Imagine a cloud of words like we mentioned, where words that mean similar things are clumped together in that cloud, and words that mean different things are pushed further apart. For example, the word “tiger” would be near the word “lion”, and both of them would be far away from the word “computer”. More than just clumping words together, the vectors somehow captured relationships between words. A famous example is where you take the vector for the word “king”, subtract the vector for the word “man”, and then add the vector for the word “woman”. Somehow, in that mathematical model, the resultant vector landed very near to the word “queen”. Following Word2vec, there was GloVe from Stanford in 2014, and fastText from Facebook in 2016, which built on the same principle.
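The “king - man + woman ≈ queen” arithmetic can be sketched with hand-made toy vectors. These 3-dimensional values are invented purely for illustration - real embeddings have hundreds of dimensions and are learned from text, not written by hand.

```python
import math

# Hand-made 3-dimensional "word vectors" (illustrative only).
vectors = {
    "king":     [0.9, 0.9, 0.1],
    "queen":    [0.9, 0.1, 0.1],
    "man":      [0.1, 0.9, 0.2],
    "woman":    [0.1, 0.1, 0.2],
    "tiger":    [0.10, 0.5, 0.05],
    "lion":     [0.12, 0.5, 0.06],
    "computer": [0.0, 0.5, 0.9],
}

def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

# king - man + woman ...
target = [k - m + w for k, m, w in
          zip(vectors["king"], vectors["man"], vectors["woman"])]

# ... lands nearest to queen (excluding the three input words).
nearest = max((w for w in vectors if w not in {"king", "man", "woman"}),
              key=lambda w: cosine(target, vectors[w]))
print(nearest)  # queen
```

The same function shows the clumping effect from the text: `cosine(vectors["tiger"], vectors["lion"])` is far higher than `cosine(vectors["tiger"], vectors["computer"])`.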

Around this period, researchers were also using machine learning techniques like recurrent neural networks and long short-term memory to propel the field forward. These techniques allowed people to infer vectors from sequences of words - for example, sentences can be meaningfully represented as vectors using RNNs and LSTMs.
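The core mechanic - folding a sequence of word vectors into a single sentence vector - can be sketched with a minimal recurrent cell. The weights below are hand-picked and untrained, so the output is not meaningful; a real RNN or LSTM learns these weights from data.

```python
import math

def rnn_sentence_vector(word_vectors, W, U):
    """A minimal recurrent cell: read word vectors one at a time and
    fold them into one hidden state, h = tanh(W.x + U.h)."""
    h = [0.0] * len(U)          # hidden state starts at zero
    for x in word_vectors:
        pre = [sum(W[i][j] * x[j] for j in range(len(x))) +
               sum(U[i][j] * h[j] for j in range(len(h)))
               for i in range(len(h))]
        h = [math.tanh(p) for p in pre]
    return h                    # final state = the "sentence vector"

# Toy 2-d word vectors and hand-picked 2x2 weights (illustrative only).
W = [[0.5, -0.3], [0.2, 0.8]]
U = [[0.1, 0.0], [0.0, 0.1]]
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three "words"
print(rnn_sentence_vector(sentence, W, U))
```

Because the same weights are reused at every step, the cell can process a sentence of any length and still emit a fixed-size vector - which is what made RNNs useful for representing whole sentences.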

After 2018

2018 was the next inflection point for NLP. A new NLP concept, and a technique for implementing that concept, allowed us to represent the meaning and relationships of words and sentences much better than before. The conceptual leap was “attention”, and the implementation architecture was “transformers”. “Attention” is the idea that different words in a sentence have different levels of importance, and machines should give different weights to words based on their importance. With attention, vector representations of text are more meaningful because they weight the parts that matter most.
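The weighting idea can be sketched as simple dot-product attention: each word vector is scored against a query, the scores are normalized into weights with a softmax, and the output is the weighted mix of the word vectors. The toy vectors are invented, and real transformers add learned projections and many attention heads on top of this.

```python
import math

def attention(query, keys, values):
    """Minimal dot-product attention: score each key against the query,
    softmax the scores into weights, and blend the values accordingly."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(scores)                                  # numerical stability
    exps = [math.exp(s - m) for s in scores]
    weights = [e / sum(exps) for e in exps]
    out = [sum(w * v[i] for w, v in zip(weights, values))
           for i in range(len(values[0]))]
    return weights, out

# Toy example: the query points "towards" word 0, so word 0 should
# receive the largest attention weight.
keys = values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights, out = attention([2.0, 0.0], keys, values)
print(weights)
```

The weights always sum to one, so attention is a soft choice: instead of picking one important word, the model blends all of them in proportion to their relevance.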

The “transformer” architecture was introduced by Google in their paper, “Attention Is All You Need”. Shortly after that, the Google team introduced BERT (Bidirectional Encoder Representations from Transformers). BERT was a landmark moment. BERT broke several state-of-the-art records back in 2018, and could handle multiple language-based tasks - like translation, reading comprehension and sentiment analysis. Since then, there have been no fewer than 170 papers that relate to or build on BERT, and many more on other transformers. GPT-3, which people are talking about now, is also based on the transformer architecture.

For developers, perhaps the greatest impact was that BERT, and many other transformer models, were open sourced. Anyone can download and use language models that were pre-trained on massive datasets. This enabled highly effective transfer learning in NLP. It means we can rapidly experiment, training specialized models to understand the meaning of legal text with less labelled training data, and achieve state-of-the-art performance that wasn’t possible a few years ago.
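Transfer learning in miniature: a frozen “pretrained” encoder (here just a fixed lookup of word vectors, standing in for a model like BERT) plus a small trainable classifier head, trained on only a handful of labelled examples. All the vectors, labels and clause types below are made up for illustration - the point is only that the pretrained part stays frozen while a tiny head learns the task.

```python
import math

pretrained = {                     # frozen "encoder": never updated
    "terminate": [1.0, 0.1], "breach": [0.9, 0.2],
    "renew":     [0.1, 1.0], "extend": [0.2, 0.9],
}

def encode(sentence):
    """Average the frozen word vectors into one sentence vector."""
    vecs = [pretrained[w] for w in sentence.split() if w in pretrained]
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(2)]

# A few labelled examples: 1 = "termination clause", 0 = "renewal clause".
data = [("terminate breach", 1), ("breach terminate", 1),
        ("renew extend", 0), ("extend renew", 0)]

w, b = [0.0, 0.0], 0.0             # the only trainable parameters
for _ in range(200):               # plain logistic-regression training
    for text, y in data:
        x = encode(text)
        p = 1 / (1 + math.exp(-(w[0]*x[0] + w[1]*x[1] + b)))
        g = p - y                  # gradient of the log loss
        w = [w[i] - 0.5 * g * x[i] for i in range(2)]
        b -= 0.5 * g

x = encode("terminate")
p = 1 / (1 + math.exp(-(w[0]*x[0] + w[1]*x[1] + b)))
print(round(p, 2))  # close to 1: classified as a termination clause
```

Because the encoder already “knows” the words, the head needs only four labelled examples - the same economics, in cartoon form, that make fine-tuning pretrained transformers on legal text attractive.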

It is about how you use technology

We should add an important note - over the last two years, we have seen tremendous progress in NLP. Technologies like BERT are amazing, but they tend to be trained in a general-purpose way and need further fine-tuning to address specific business needs. To harness these new techniques to solve complex problems for lawyers, we build AI pipelines that stack together a number of specialized modules, each solving a different problem. You can imagine them like a production line for a car: one module welds the chassis, the next drops in the engine, and another bolts on the doors.
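The production-line idea can be sketched as a chain of small, single-purpose functions. The module names and the naive keyword “tagger” below are invented stand-ins - in a real pipeline each stage would be a trained model - but the structure is the point: each module does one job and hands its output to the next.

```python
# A toy "AI pipeline": each stage is a small function that does one job,
# and the document flows through the stages in order.
def split_into_clauses(doc):
    """Stage 1: break the document into clauses."""
    return [c.strip() for c in doc.split(".") if c.strip()]

def tag_clause(clause):
    """Stage 2: label each clause (a naive keyword stand-in for a model)."""
    label = "termination" if "terminate" in clause else "other"
    return {"text": clause, "label": label}

def summarise(tagged):
    """Stage 3: report which clause types were found."""
    return sorted({t["label"] for t in tagged})

def pipeline(doc, stages):
    result = doc
    for stage in stages:            # each stage consumes the last one's output
        result = stage(result)
    return result

doc = "Either party may terminate this agreement. Notices must be in writing."
stages = [split_into_clauses,
          lambda clauses: [tag_clause(c) for c in clauses],
          summarise]
print(pipeline(doc, stages))  # ['other', 'termination']
```

Because the stages only agree on their inputs and outputs, any one module can be retrained or swapped out without rebuilding the rest of the line - the same reason car plants are organized into stations.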

Tune in to ToB later this week for part 2 of this interview.
