Extracting insurance requirements from leases might just be the single most difficult task a property management team faces on the path to insurance compliance.

The ambiguity of legal language used in lease agreements and an utter lack of standardization in how these documents are structured and worded are just two of the many obstacles one has to deal with when creating a set of lease-based requirements. This task is so complicated that some companies just give up and don’t do it.

Roughly 20 percent of new Jones clients never extracted insurance requirements from their tenants’ leases even though they were supposed to. One client even admitted that they weren’t collecting certificates of insurance (COIs) from their tenants because they didn’t have the resources to extract requirements and, therefore, did not know what was acceptable and what wasn’t.

Then, of course, there are COI management tools that can do the lease extraction work if you are a client. Some tools do it at no cost to customers (Jones offers free and unlimited extraction services), but most will charge you a pretty penny.

And now—enter generative AI technology.

AI tools, such as ChatGPT, are taking the world by storm in multiple ways, but what matters to us today is their potential ability to revolutionize the process of abstracting leases and extracting insurance data from documents.

Can we reach the point where creating lease-based requirements for your building will become a matter of pasting a script into an AI tool window and receiving a usable data set?

Can Proptech tools provide better services to CRE by shipping faster-than-ever, more-accurate-than-ever lease extractions?

Will we see a world where property management teams have the ability to do lease extractions in-house quickly and easily, thanks to generative AI?

The Jones team has been testing ChatGPT and other AI tools for over a month, seeking answers to the questions above.

In this blog post, we’ll list the capabilities and limitations of these tools and discuss how AI technology can benefit property management teams with lease extraction tasks.

Note: interested in exploring how Jones can help you automate your COI management end-to-end, including extraction of insurance requirements?

Talk to our team of experts!

Does ChatGPT do a good job of extracting insurance requirements?

Part I—Human gives prompts to ChatGPT (no coding)

For the first part of our experiment, we chose not to use any code or help from our tech department. The goal was to see if a series of prompts from a non-tech person (e.g. a property manager) could get ChatGPT to produce a usable set of insurance requirements.

Alex Kario (Director of Operations at Jones) gave ChatGPT a sample lease agreement and asked it to extract insurance requirements.

The first ChatGPT output was too verbose to be useful, so Alex asked the AI to rewrite it in a bullet-point format. The second output produced hallucinated data based on the Harry Potter universe (no, we are not making this up).

ChatGPT - hallucinated data

ChatGPT is known for its occasional hallucinations, so Alex ignored it and repeated the prompt. This time, ChatGPT produced a more concise version.

ChatGPT extracts insurance requirements

Here’s our review of the results.

It’s still too verbose and contains lots of unnecessary fluff:

ChatGPT produces verbose results when extracting insurance requirements

It lists as requirements things that would not be evidenced on COIs or endorsements:

ChatGPT produces data that should be omitted when abstracting leases

This irrelevant information could be an issue for a property management team as they will mistakenly search for this on COIs.

Take the bodily injury example above. This isn’t something you’d ever find on a COI. What you’d be looking for is Each Occurrence, which is a combined single limit for bodily injury and property damage.

Here are examples where ChatGPT did a decent job:

ChatGPT - useful example

Part II — We use coding and prompt engineering to train AI

It was clear that in order to get more accurate results, we needed a more scientific approach. Michael Rudman, CTO & Co-Founder of Jones, used the ChatGPT API (for non-tech folks, this is basically ChatGPT for developers) for his round of tests.

Here is the process in a nutshell:

We’ve created a list of 80+ most common insurance requirements. For every requirement, we’ve run a series of 10-15 prompts.

One of the traits of ChatGPT that makes it unreliable is the fact that it provides incomplete data. For example, if a lease agreement lists four Additional Insured entities, ChatGPT might only provide three and omit one of them.

Our goal was to come up with a finite set of different prompts for the same insurance requirement that will get the most accurate and complete answer possible.

For example, here’s what a series of prompts would look like for the Additional Insured for General Liability requirement:

— What are the Additional Insureds listed on the following lease?
— Are any company names mentioned on the following lease?
if yes, follow up with: Are they referring to Additional Insureds?
— Are there addresses mentioned on the following lease?
if yes, follow up with: Are they referring to Additional Insureds?
— Is the [Company Name] mentioned as Additional Insured in the following lease?

Example of JSON file output

For every requirement, we repeated a similar process programmatically with several lease agreements.
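As a rough illustration, the prompt series above could be driven by a small script like the one below. This is a sketch under stated assumptions, not the code we actually ran: `ask` is a hypothetical stand-in for whatever function sends a prompt to the model and returns its text answer, and all other names are made up for the example.

```python
from typing import Callable, Dict, List

# `ask` is a hypothetical stand-in for a call to a chat model:
# it takes a prompt string and returns the model's text answer.
Ask = Callable[[str], str]

FOLLOW_UP = "Are they referring to Additional Insureds?"


def is_yes(answer: str) -> bool:
    """Crude yes/no check; a real pipeline would need something sturdier."""
    return answer.strip().lower().startswith("yes")


def run_additional_insured_prompts(
    lease_text: str, candidate_names: List[str], ask: Ask
) -> Dict[str, str]:
    """Run the prompt series for one lease and collect every answer."""
    answers: Dict[str, str] = {}

    def put(question: str) -> str:
        # Each prompt carries the lease text so it stands on its own.
        answer = ask(f"{question}\n\n{lease_text}")
        answers[question] = answer
        return answer

    put("What are the Additional Insureds listed on the following lease?")

    # Conditional follow-ups: only asked when the opening answer is "yes".
    for opener in (
        "Are any company names mentioned on the following lease?",
        "Are there addresses mentioned on the following lease?",
    ):
        if is_yes(put(opener)):
            put(f"{opener} {FOLLOW_UP}")

    # One targeted check per entity, to catch names the model left out.
    for name in candidate_names:
        put(f"Is {name} mentioned as Additional Insured in the following lease?")

    return answers
```

Asking about each candidate entity separately is one way to counter the incomplete-list problem described earlier, since a name the model skipped in the open-ended question still gets its own yes/no check.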

Next, we built NLP logic that condenses ChatGPT’s answers into a concise form.

It took a ChatGPT answer like “The lease requires the tenant to have $1,000,000 in GL each occurrence” and reduced it to a key-value pair: “GL Each Occurrence” and “$1,000,000”, respectively.
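To make that reduction concrete, here is a minimal sketch using a regular expression and a phrase lookup. The actual NLP logic is not described in this post, so treat the alias table, the regex, and the function name as assumptions for illustration only.

```python
import re
from typing import Optional, Tuple

# Hypothetical lookup from phrases the model tends to use to the
# canonical requirement name; a real table would be much longer.
REQUIREMENT_ALIASES = {
    "gl each occurrence": "GL Each Occurrence",
    "general liability each occurrence": "GL Each Occurrence",
}

# Dollar amounts such as $1,000,000 or $500,000.00
AMOUNT_RE = re.compile(r"\$\d+(?:,\d{3})*(?:\.\d{2})?")


def to_key_value(answer: str) -> Optional[Tuple[str, str]]:
    """Reduce a free-text answer to (requirement, amount), or None."""
    lowered = answer.lower()
    amount = AMOUNT_RE.search(answer)
    for phrase, key in REQUIREMENT_ALIASES.items():
        if phrase in lowered and amount:
            return key, amount.group(0)
    return None
```

A lookup like this only handles answers that name a known requirement and contain a dollar figure; anything else returns `None` and would need a different rule or a human look.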

Now that we had key-value pairs from 10+ prompts, we built a classifier for every insurance requirement: for example, a logistic regression for binary questions such as “does this requirement exist?” or an SVM (support vector machine) for questions about a list of entities, such as a list of Additional Insureds.

Here’s an example of an input for logistic regression:

— eight prompts said “GL each occurrence – $1,000,000”
— one prompt said “GL each occurrence – not required”
— one prompt said “GL each occurrence – required, no amount”

The logistic regression model would then return 95 percent confidence that “GL each occurrence – $1,000,000” is required.
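The idea of turning agreement between prompts into a confidence score can be sketched as follows. This is a deliberately simplified stand-in, not the model described above: it uses a single feature (the share of prompts that agree) and hand-picked placeholder coefficients rather than weights fitted on labeled leases.

```python
import math
from collections import Counter
from typing import List, Tuple

# Placeholder coefficients chosen by hand for illustration only.
# A fitted logistic regression would learn these from labeled leases.
WEIGHT = 10.0
BIAS = -5.0


def confidence_from_votes(answers: List[str]) -> Tuple[str, float]:
    """Return the most common answer and a logistic confidence score."""
    if not answers:
        raise ValueError("need at least one answer")
    top_answer, top_count = Counter(answers).most_common(1)[0]
    agreement = top_count / len(answers)  # single feature: vote share
    score = 1.0 / (1.0 + math.exp(-(WEIGHT * agreement + BIAS)))
    return top_answer, score


# The ten answers from the example above.
votes = (
    ["GL each occurrence - $1,000,000"] * 8
    + ["GL each occurrence - not required"]
    + ["GL each occurrence - required, no amount"]
)
answer, score = confidence_from_votes(votes)
print(answer, round(score, 2))
```

With these placeholder values, eight matching answers out of ten happen to map to a score of about 0.95. A real classifier would likely use more features than vote share alone.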

Key Takeaway

This whole process got us to an average accuracy of up to 70 percent for most lease extractions, but no higher.

We’ve concluded that ChatGPT can’t fly solo. However, it also became clear that it can be a valuable addition to our existing process if we use it as a decision-support mechanism.

Do you know how Grammarly suggests ways to rephrase something? What we hope to achieve is a similar algorithm that highlights a part of a lease agreement and provides a suggestion for extracting an insurance requirement. The human expert then reviews it side-by-side and either accepts the suggestion or modifies it.
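One possible shape for such a suggestion is sketched below. The class and field names are hypothetical; the point is only that each AI suggestion stays tied to the lease passage it came from, and that nothing becomes final until a human accepts or changes it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Suggestion:
    """One AI-proposed requirement tied to the lease text it came from."""

    requirement: str      # e.g. "GL Each Occurrence"
    suggested_value: str  # e.g. "$1,000,000"
    start: int            # character offsets of the highlighted span
    end: int
    final_value: Optional[str] = None  # set only after human review

    def highlighted(self, lease_text: str) -> str:
        """Return the lease passage shown next to the suggestion."""
        return lease_text[self.start:self.end]

    def accept(self) -> None:
        """Reviewer agrees with the AI's value."""
        self.final_value = self.suggested_value

    def modify(self, corrected_value: str) -> None:
        """Reviewer overrides the AI's value."""
        self.final_value = corrected_value
```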

In our estimation, the overall speed would increase by roughly 5-15 percent. It’s important to note that the Jones team specializes in lease extractions, and our current expertise allows us to complete an extraction in eight minutes on average.

What we’ve learned along the way

The human review will remain essential for quality control.
With enough prompt engineering, AI can extract and structure requirements to an impressive extent, capturing up to 70 percent of the check-up items.

However, leases are just not standardized and parsable enough to fully let AI go unchecked. Therefore, it’s not good enough to forgo the final review by a human expert.

How long until generative AI can gain the ability to do quality control without human involvement? Given how rapidly AI changes, we can only take an educated guess: this won’t happen in the foreseeable future (let’s say, for at least 6-12 months), which is a great segue to our next point.

ChatGPT output quality varies wildly depending on how the lease is written
AI does best when requirements are written in a straightforward way, especially when the lease already structures clauses in numbered or lettered bullet points. But the more convoluted and ambiguous the lease language is, the more AI struggles.

Take the Business Income example from our experiment. The language in the lease (“…at least 50,000,000.00 for loss of business income and continuing expenses…”) doesn’t specify whether the amount refers to policy limits or self-insured retention. In order to interpret this clause correctly, we need to go all the way to the beginning of the paragraph, which is something that AI will struggle with.

Extracting insurance requirements from leases requires an understanding of context

This is exactly why human quality control will remain necessary for a long time—lack of context and ambiguous language in most lease agreements are to blame.

Can Property Management Teams Use ChatGPT For Extracting Requirements In-House?

The answer is a very cautious yes.

ChatGPT could speed things up significantly for property managers (eventually, up to 50 percent)—if they have time and resources to invest in prompt engineering and in post-extraction quality control.

Take for example the challenging requirement of Waivers of Subrogation. This item is particularly difficult to extract for two reasons:

— An ambiguity of which policy they are meant to apply to:
There is normally a reference to property damage, but it might be unclear whether the author intended first-party property damage (covered by Property Insurance) or third-party property damage (covered by General Liability).

— Sometimes the lease itself waives the signers’ rights of subrogation:
Whether the lease itself waives those rights or merely requires that they be waived, as evidenced by an endorsement, is a nuanced but important distinction: it determines whether a Waiver of Subrogation requirement needs to be extracted at all.

How would a property manager go about training AI to improve its performance in the task of extracting the Waiver of Subrogation from leases correctly?

They could provide specific examples of language used in lease agreements to refer to a Waiver of Subrogation, along with clear explanations of its meaning and how it relates to the relevant insurance policy. This can help the model learn the relevant patterns and improve its accuracy in identifying the relevant clauses in a lease agreement.
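In practice, "providing examples" often means placing a few labeled clauses in the prompt itself. The sketch below assembles such a prompt. The two sample clauses, the label names, and the function name are all invented for illustration; a real set would come from leases your team has already reviewed.

```python
from typing import List, Tuple

# Each example pairs lease language with a label and a short explanation.
# These clauses and labels are invented for illustration only.
EXAMPLES: List[Tuple[str, str, str]] = [
    (
        "Tenant's property insurance policy shall include a waiver of "
        "subrogation in favor of Landlord.",
        "POLICY_MUST_WAIVE",
        "The lease asks for the waiver to be part of the policy, so it "
        "would be evidenced by an endorsement.",
    ),
    (
        "Landlord and Tenant each hereby waive all rights of recovery "
        "against the other for loss covered by property insurance.",
        "LEASE_ITSELF_WAIVES",
        "The parties waive their rights in the lease itself; no policy "
        "endorsement is requested.",
    ),
]


def build_prompt(clause: str) -> str:
    """Assemble a few-shot prompt that ends with the clause to classify."""
    parts = [
        "Classify each lease clause as POLICY_MUST_WAIVE or "
        "LEASE_ITSELF_WAIVES and explain why."
    ]
    for text, label, why in EXAMPLES:
        parts.append(f"Clause: {text}\nLabel: {label}\nWhy: {why}")
    # Leave the final label blank for the model to fill in.
    parts.append(f"Clause: {clause}\nLabel:")
    return "\n\n".join(parts)
```

The same pattern could be extended with examples that separate first-party from third-party property damage, the other source of ambiguity mentioned above.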

The good news is that generative AI is definitely trainable in tasks like the above. This is something that the Jones team is currently working on (in partnership with Sensible, a company that specializes in machine-learning applications). We will continue sharing our learning with the community as we get better results at extracting insurance requirements with the help of AI.

Final verdict

Overall, ChatGPT can make things faster for both property management teams and COI management tools’ teams—as long as the human team that reviews the end result is well-versed in insurance and knows exactly what areas they need to double-check.

Ready to automate COI collection at your company?