© Munich American Reassurance Company

The Future of Group Underwriting

April 2021

The Group Benefits Market is highly competitive, with carriers striving to offer the best product, price, capabilities, and service. Additionally, brokers expect rapid quotes and often provide limited background data to enable underwriters to quickly and accurately assess the risk. How does a carrier ensure the rates they quote for Group Life, LTD, and supplemental benefits are appropriate for the risk when so much is left unknown?

Furthermore, what if a carrier could rapidly produce competitive offers on attractive business and, at the same time, identify less attractive business for standard processing? What if underwriters had access to a tool that analyzed available census data, appended that data with a myriad of additional valuable data elements, and developed a risk score with complete confidence? What if that risk score could be translated to a rate that enabled a response to the broker in a matter of hours?

This study illustrates how Munich Re's data analytics team developed a comprehensive approach to streamline and transform the competitive group quoting process.

Data has always been integral in the group insurance and quoting process. However, as digital data becomes more ubiquitous, traditional sources will not be enough to keep up with the ever-changing landscape. Carriers need to expedite the expansion of their digital capabilities to generate real growth. A data-driven approach using predictive models for rapid and accurate risk evaluation, leveraging third-party data to enhance traditional census data, is the future of group underwriting.

The following case study uses actual reinsured LTD census and claims data combined with third-party data to build a predictive model and assess the impact on pricing. The results demonstrate a proof of concept for using these models as a supplement to, or replacement for, current methods.

The key objective is to identify new features that segment claim incidence better than traditional variables alone and to incorporate them rapidly into underwriting and pricing. We accomplished this by developing a predictive model using a subset of historical Group LTD reinsurance data enhanced with third-party data. Since there is no single industry incidence rating manual, a model built only on traditional variables (such as age, gender, and occupation) was our baseline for comparison against models built on both traditional and third-party variables. The third-party data model aims to predict claim incidence more reliably than the model that uses only traditional variables.

For this case study, Munich Re matched employer census data and fifteen years of historical claims to a U.S. consumer information database that includes certain demographic attributes. Identifiers such as name, date of birth, and state of residence were used to perform the match, for a final match rate of approximately 75 percent. The market offers many tools that can be leveraged to match individuals to third-party datasets, including those that return sensitive information in an aggregated, de-identified form, which can work seamlessly for the Group market.
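
As a rough illustration of the matching step, the sketch below joins census records to a third-party table on a normalized key built from name, date of birth, and state, and then reports the match rate. The record layout, the field names, and the `make_key` and `match_census` helpers are hypothetical, and the data is made up; commercial matching tools typically use fuzzier logic than this exact-key join.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, str]

def make_key(rec: Record) -> Tuple[str, str, str]:
    """Build a normalized match key (field names are assumptions)."""
    return (
        " ".join(rec["name"].lower().split()),  # lower-case, collapse whitespace
        rec["dob"],                             # assumed to be an ISO date string
        rec["state"].upper(),
    )

def match_census(
    census: List[Record], third_party: List[Record]
) -> Tuple[List[Tuple[Record, Optional[Record]]], float]:
    """Exact-key join; returns (census, match-or-None) pairs and the match rate."""
    lookup = {make_key(r): r for r in third_party}
    pairs = [(c, lookup.get(make_key(c))) for c in census]
    matched = sum(1 for _, tp in pairs if tp is not None)
    rate = matched / len(census) if census else 0.0
    return pairs, rate

# Made-up records for illustration only.
census = [
    {"name": "Ann  Lee", "dob": "1980-02-01", "state": "il"},
    {"name": "Bob Ray", "dob": "1975-07-15", "state": "TX"},
    {"name": "Cy Doe", "dob": "1990-11-30", "state": "NY"},
    {"name": "Di Fox", "dob": "1968-03-09", "state": "CA"},
]
third_party = [
    {"name": "ann lee", "dob": "1980-02-01", "state": "IL", "attr": "a"},
    {"name": "bob ray", "dob": "1975-07-15", "state": "TX", "attr": "b"},
    {"name": "cy doe", "dob": "1990-11-30", "state": "NY", "attr": "c"},
]

pairs, match_rate = match_census(census, third_party)
print(f"match rate: {match_rate:.0%}")
```

Unmatched census records keep a `None` partner rather than being dropped, so the modeler can decide later how to treat lives with no third-party data.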

Feature selection is the process of reducing the large number of available variables to the few that are most predictive for the goal of interest. For example, fields with many missing values are typically a challenge to use in any model. Users of the final model may also prefer predictor variables whose relationship to the outcome is easy to interpret, as this helps build confidence in the reliability of the model’s output. While the application of third-party data to predicting claims is new in the Group Underwriting space, variables found to be useful in other lines of business are an excellent place to start. Munich Re is constantly evaluating new data sources and their value for mortality and morbidity risk assessment and applying that knowledge across multiple lines of business.
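
One simple first-pass screen consistent with the point about missing values is to drop any candidate field whose share of missing values exceeds a threshold. The sketch below shows that idea only. The 30 percent threshold, the field names, and the `screen_features` helper are assumptions for illustration, not the selection procedure used in the study.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]

def missing_fraction(rows: List[Row], field: str) -> float:
    """Share of rows in which `field` is absent or None."""
    return sum(1 for r in rows if r.get(field) is None) / len(rows)

def screen_features(
    rows: List[Row], candidates: List[str], max_missing: float = 0.3
) -> List[str]:
    """Keep candidates whose missing share is at most `max_missing` (assumed cutoff)."""
    return [f for f in candidates if missing_fraction(rows, f) <= max_missing]

# Made-up rows; feature_a and feature_b stand in for third-party fields.
rows = [
    {"age": 34, "feature_a": 1.0, "feature_b": None},
    {"age": 51, "feature_a": 0.0, "feature_b": None},
    {"age": 45, "feature_a": None, "feature_b": 2.0},
    {"age": 29, "feature_a": 1.0, "feature_b": None},
]

kept = screen_features(rows, ["age", "feature_a", "feature_b"])
print(kept)
```

A screen like this only removes fields that are hard to use; judging predictive value and interpretability still requires the modeler's review.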

Our goal for this case study was to understand additional drivers of claim incidence beyond the traditional variables. We emphasized retaining only a small number of novel predictor variables during feature selection. Working with a small subset of features facilitated understanding each feature's impact in isolation while also controlling for current pricing factors. The novel features applied in this study include data that is generally not found in a census. The variables that prove most valuable will vary by carrier, target market, and distribution.

When building predictive models, it is best practice to experiment with various models and configurations to determine the appropriate approach. We considered both parametric and tree-based regression methods for this study. Parametric models such as generalized linear models (GLMs) are based on distributional assumptions about the data. The modeler must predefine certain relationships expected in the data, such as variable interactions or non-linear relationships. Tree-based methods are helpful when there is no prior knowledge of variable relationships. These models learn patterns directly from the data, with the tradeoff that they generally require a more extensive training dataset for increased accuracy [1]. Since the goal is to incorporate novel features in claim incidence segmentation, we selected a tree-based method (“random forest”). However, we have also successfully used GLMs in similar modeling exercises.

We used the traditional and third-party features in this case study as the model inputs and claim incidence as the model output. Finally, 70 percent of the available data was randomly selected to train the random forest, and the remaining 30 percent was set aside for later use in model validation.
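
The 70/30 random split described above can be sketched with the standard library as follows. The `train_test_split` helper, the fixed seed, and the integer stand-ins for records are assumptions for illustration; the random forest itself is not shown.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def train_test_split(
    records: Sequence[T], train_frac: float = 0.7, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the records and cut it into train and holdout parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    cut = round(train_frac * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

# Integers stand in for matched census records.
records = list(range(1000))
train, holdout = train_test_split(records)
print(len(train), len(holdout))
```

Fixing the seed keeps the holdout set stable between runs, so validation results stay comparable as the model is refined.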

Validating the predictive model on data that was not used to build it is crucial in any analytic project, because it gives the user confidence in how well the model will perform on new cases. We assessed the models by comparing actual to predicted claim incidence across age, gender, and industry groups. Figures 1 and 2 below compare the mean absolute error of the traditional data model with that of the third-party data model across age bands for women and men, respectively. The third-party data model produces a lower error across most age bands for both men and women. Thus, the third-party data model met our goal: it predicts claim incidence better than the model with traditional variables alone. Using only the limited data available to a reinsurer and readily available external data, we built a model that predicts disability incidence better than a model built on traditional data alone. These insights can be used to estimate claims more accurately and set better premiums for key risks.
Figure 1. Model Error: Monthly Benefit: 12.5K-15K Women
Figure 2. Model Error: Monthly Benefit: 12.5K-15K Men
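
A comparison of mean absolute error by age band can be sketched as below. The study does not state the level at which the error was computed, so this sketch assumes it is the mean absolute difference between actual and predicted incidence rates across the cells within each age band. The `mae_by_band` helper and all numbers are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Cell = Tuple[str, float, float]  # (age_band, actual_rate, predicted_rate)

def mae_by_band(cells: Iterable[Cell]) -> Dict[str, float]:
    """Mean of |actual - predicted| over the cells in each age band."""
    total: Dict[str, float] = defaultdict(float)
    count: Dict[str, int] = defaultdict(int)
    for band, actual, predicted in cells:
        total[band] += abs(actual - predicted)
        count[band] += 1
    return {band: total[band] / count[band] for band in total}

# Made-up incidence rates; same actuals, different model predictions.
traditional = [
    ("30-39", 0.004, 0.006),
    ("30-39", 0.005, 0.003),
    ("40-49", 0.008, 0.005),
]
third_party = [
    ("30-39", 0.004, 0.005),
    ("30-39", 0.005, 0.004),
    ("40-49", 0.008, 0.007),
]

mae_trad = mae_by_band(traditional)
mae_tp = mae_by_band(third_party)
for band in mae_trad:
    print(band, round(mae_trad[band], 4), round(mae_tp[band], 4))
```
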
A different way of visualizing this information is to compare the actual claim experience with the claim incidence predicted by the traditional and third-party data models. The plots in Figures 3 and 4 compare actual-to-expected ratios (A/Es) across age bands for women and men, using the traditional data model as one expected basis and the third-party data model as the other. In most age bands, the A/Es from the third-party data model are closer to 100 percent than those from the traditional data model, which is consistent with the lower error rates. The exception is men under 40 and over 60 years old, for whom the A/Es from the traditional data model are slightly closer to 100 percent. Views such as these are crucial to understanding where the models perform well and where they do not, so that the insights can be applied appropriately and groups targeted properly.
Figure 3. A/E: Monthly Benefit: 12.5K-15K Women
Figure 4. A/E: Monthly Benefit: 12.5K-15K Men
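
An actual-to-expected ratio of the kind plotted here can be computed as actual claims divided by expected claims, where expected claims are the sum of each life's predicted incidence probability. The sketch below assumes that definition and expresses the result on a scale where 100 means actual equals expected. The `ae_ratio` helper and the numbers are hypothetical.

```python
from typing import Sequence

def ae_ratio(actual_claims: int, predicted_probs: Sequence[float]) -> float:
    """A/E on a 0-100+ scale: 100 * actual claims / sum of predicted probabilities."""
    expected = sum(predicted_probs)
    if expected <= 0:
        raise ValueError("expected claims must be positive")
    return 100.0 * actual_claims / expected

# Made-up cohort: 1,000 lives with 5 actual claims.
lives = 1000
actual = 5
traditional_probs = [0.004] * lives   # expects about 4 claims
third_party_probs = [0.005] * lives   # expects about 5 claims

ae_traditional = ae_ratio(actual, traditional_probs)
ae_third_party = ae_ratio(actual, third_party_probs)
print(round(ae_traditional, 1), round(ae_third_party, 1))
```

In this made-up cohort the third-party basis lands closer to 100, which is the pattern the figures report for most age bands.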

Since the third-party data model performs well across different subgroups, we incorporated the results into manual rates by applying discounts or loads for more accurate pricing, resulting in cost savings and improved experience. Figure 5 shows the aggregated results for two employers in the data set. The blue and orange bars represent the predicted incidence rates from the traditional data model and the third-party data model, respectively. The grey bars represent the actual incidence rate for each employer.

Employer 1 performs better than both models predicted, meaning Employer 1 could have been priced more competitively. In fact, the third-party data model predicts a 10 percent lower incidence rate than the traditional data model; had it been used in pricing, it would have allowed for a more competitive offer to Employer 1. On the other hand, Employer 2 could be underpriced, since its claim experience is 30 percent higher than that predicted by the traditional data model. Had this result been used to guide pricing, it could have produced a more accurate price for this higher-risk employer, either improving profit margins if the business was placed or leading the carrier not to pursue this less attractive business if a higher price could not be reached.

Figure 5. Predicted vs. Actual Incidence Rate by Employer
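
One way to turn the two predictions into a discount or load is to scale the manual rate by the ratio of the third-party prediction to the traditional prediction, with a cap on how far the rate can move. The study does not describe its actual translation, so the sketch below is an assumption: the `pricing_factor` helper, the 25 percent cap, and the rates are all hypothetical.

```python
def pricing_factor(
    traditional_pred: float, third_party_pred: float, cap: float = 0.25
) -> float:
    """Ratio of third-party to traditional predicted incidence, limited to 1 +/- cap."""
    if traditional_pred <= 0:
        raise ValueError("traditional prediction must be positive")
    raw = third_party_pred / traditional_pred
    return min(max(raw, 1.0 - cap), 1.0 + cap)

# Made-up example: third-party model predicts 10 percent lower incidence.
manual_rate = 0.60  # hypothetical manual rate
factor = pricing_factor(traditional_pred=0.0050, third_party_pred=0.0045)
adjusted_rate = manual_rate * factor
print(round(factor, 2), round(adjusted_rate, 2))

# A much higher third-party prediction is limited by the cap.
capped = pricing_factor(traditional_pred=0.0050, third_party_pred=0.0080)
print(capped)
```

A factor below 1 acts as a discount and a factor above 1 as a load; the cap keeps a single model output from moving the rate too far from the manual.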

A predictive model has limited effectiveness if it cannot fully integrate into existing workflows. Incorporating third-party variables can allow the carrier to identify specific cohorts that perform better or worse and price the case more accurately. An automated pricing solution that captures the employer census, appends third-party data, and performs the calculations can return the appropriate load within minutes of receiving the census. Integrating with external vendors can take many forms. Munich Re can work with carriers on the easiest way to incorporate third-party data and model predictions into a pre-existing underwriting workbench.

It’s important to monitor data hit rates and model performance over time to ensure that current behavior meets the expectations set during any training or proof-of-concept stages. Periodic re-evaluation is also necessary when consumer behavior or market conditions change, to ensure consistent, high performance from the data and models employed.
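
A basic hit-rate monitor can be sketched as a comparison of each period's match rate against the baseline observed during the proof of concept. In the sketch below, the 75 percent baseline echoes the match rate reported in this study, while the 5-point tolerance, the monthly counts, and the `check_hit_rate` helper are assumptions for illustration.

```python
from typing import Dict, Tuple

def check_hit_rate(
    matched: int, total: int, baseline: float = 0.75, tolerance: float = 0.05
) -> Tuple[float, bool]:
    """Return (hit rate, needs_review); review when the rate falls below baseline - tolerance."""
    if total <= 0:
        raise ValueError("total must be positive")
    rate = matched / total
    return rate, rate < baseline - tolerance

# Made-up monthly counts of (matched lives, submitted lives).
monthly: Dict[str, Tuple[int, int]] = {
    "2021-01": (760, 1000),
    "2021-02": (720, 1000),
    "2021-03": (640, 1000),
}

flags = {month: check_hit_rate(m, t) for month, (m, t) in monthly.items()}
for month, (rate, needs_review) in flags.items():
    print(month, f"{rate:.0%}", "review" if needs_review else "ok")
```

The same pattern extends to model performance by tracking a validation metric per period against the value seen at training time.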

The group market has been relatively flat for many years, and group carriers are constantly competing for the same business. Price and service are two of the most critical factors in winning new business. Making risk decisions with limited data can drive carriers to underprice business and ultimately sacrifice profits.

As the group market continues to evolve, the status quo is no longer an option. Our research demonstrates there are better, more reliable ways to price group risk. Munich Re Life US has deep knowledge of the Group market and proven expertise in building predictive models and leveraging third-party data sources. As a premier reinsurer in both group life and disability, we are focused on transforming how Group carriers evaluate risk – partnering with carriers to target the most profitable business and accelerating the production of accurate quotes. 

Contact the Authors
Dawn McMaster
2nd Vice President, Business Development
Group & Living Benefits
Cynthia Clement
Manager
Integrated Analytics
References
[1] Russell, S. J., & Norvig, P. (2016). Artificial Intelligence: A Modern Approach. Harlow: Pearson.