Insurance companies have always been among the most data-savvy innovators, largely because the ability to calculate risk accurately makes or breaks their business. Today, insurers around the world are using artificial intelligence to build new data-driven capabilities and next-generation insurance products.

Gartner analysts predict that, as of 2022, 85% of algorithms used in the insurance industry will be incorrect because of bias. In most cases, this is because the data sets used to train AI models underrepresent women, people of colour, and other minority groups. This underrepresentation affects not only insurers' business decisions but also consumers' lives. For instance, many customers in minority neighbourhoods are charged higher car insurance premiums than those in primarily wealthy neighbourhoods.

However, bias is just one of the many challenges that insurers have encountered in using artificial intelligence. Others include:

  • 1. Model explainability

    AI models are inherently complex. Sometimes even the data scientists who developed and trained the models do not know exactly how or why they work. This lack of explainability can prevent peers within the organization (such as line-of-business leaders), regulators, and even customers from buying in. As regulatory pressure and customer expectations grow, explainable AI will become a must-have for any organization.

  • 2. Compliance

    In the years to come, insurance providers must become more sophisticated in their use of artificial intelligence and analytics across their operations while staying compliant with regulations, as well as protecting the personal information of their customers. Insurers’ data and AI models must adhere to multiple regulations, including data privacy rules such as GDPR and CCPA. Securing and maintaining regulatory compliance is a constant struggle for the industry.

  • 3. Achieving business value

    Many AI models never reach production for quality reasons, which makes it hard to achieve ROI on AI investments. This inability to generate business value from data can have several causes. In some cases, data is so sensitive that data teams are unable to access it. In other cases, when parts of it can be accessed, so much information has been stripped out to protect sensitive details that the data is of no use for model training or analysis. In both scenarios, companies can generate some value but never reach the data's full business potential.

Synthetic data is emerging as a possible solution to all of these challenges. Synthetic data sets contain the same level of detail as the original customer data, but without the original personal details. Because they contain no personal information, synthetic datasets can also be tweaked to improve balance and representation, and they help insurers comply with privacy regulations like GDPR and CCPA.

Synthetic data is quickly replacing traditional approaches to data anonymization, including data masking, randomization, permutation, and generalization. An approach based on synthetic data is much more secure, saves money, and accelerates AI model development, since datasets can be created in a fraction of the time.
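As a rough illustration of the traditional techniques named above, the following Python sketch applies masking, generalization, permutation, and randomization to a toy table. The records, field names, and helper functions are all hypothetical and exist only for this example; real anonymization pipelines are considerably more involved.

```python
import random

# Hypothetical toy policyholder records; all field names are illustrative.
records = [
    {"name": "Alice Smith", "age": 34, "zip": "28013", "claim": 1200.0},
    {"name": "Bob Jones", "age": 58, "zip": "08002", "claim": 300.0},
    {"name": "Carla Ruiz", "age": 41, "zip": "46001", "claim": 0.0},
]

def mask(value, keep=1):
    """Masking: hide all but the first `keep` characters."""
    return value[:keep] + "*" * (len(value) - keep)

def generalize_age(age, width=10):
    """Generalization: replace an exact age with a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def randomize(amount, rng, scale=0.1):
    """Randomization: perturb a value by up to +/- `scale` of its size."""
    return round(amount * (1 + rng.uniform(-scale, scale)), 2)

def permute(values, rng):
    """Permutation: shuffle a column so values no longer line up with rows."""
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled

def anonymize(rows, seed=0):
    """Apply all four techniques and return new rows; input is not modified."""
    rng = random.Random(seed)
    zips = permute([r["zip"] for r in rows], rng)
    return [
        {
            "name": mask(r["name"]),
            "age": generalize_age(r["age"]),
            "zip": z,
            "claim": randomize(r["claim"], rng),
        }
        for r, z in zip(rows, zips)
    ]

anonymized = anonymize(records)
```

Each step discards something: exact ages become bands, ZIP codes lose their link to the rest of the row, and claim amounts drift from their true values. This is the loss of utility that synthetic data aims to avoid.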

Privacy-preserving synthetic data also offers companies a way to overcome their inability to derive insights and bring models to production, and to reduce the risks associated with model explainability and bias, since model validation can be performed using high-quality AI-generated synthetic data. For local interpretability methods like SHAP, it is important to have access not only to the model but also to a large number of representative and relevant data samples.
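To show why representative data samples matter for this kind of method, here is a minimal Python sketch that computes exact Shapley values for one prediction, the quantity that SHAP-style methods approximate. It does not use the shap library's API. The premium model, the feature order (age, prior claims, mileage), and the background rows are assumptions made up for illustration; in practice the background rows could be drawn from a synthetic dataset.

```python
from itertools import combinations
from math import factorial

def model(row):
    """Hypothetical premium model over (age, prior claims, annual mileage)."""
    age, claims, mileage = row
    return 200.0 + 3.0 * age + 150.0 * claims + 0.01 * mileage * (1 + claims)

def coalition_value(f, x, background, subset):
    """Mean prediction when features in `subset` come from x, the rest from background rows."""
    total = 0.0
    for b in background:
        mixed = [x[i] if i in subset else b[i] for i in range(len(x))]
        total += f(mixed)
    return total / len(background)

def shapley_values(f, x, background):
    """Exact Shapley values by enumerating every coalition (feasible only for few features)."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                gain = (coalition_value(f, x, background, s | {i})
                        - coalition_value(f, x, background, s))
                phi[i] += weight * gain
    return phi

# Background sample: stands in for representative (possibly synthetic) customer data.
background = [(30, 0, 8000), (45, 1, 12000), (60, 0, 5000), (25, 2, 15000)]
x = (50, 1, 10000)

phi = shapley_values(model, x, background)
baseline = coalition_value(model, x, background, set())
```

The attributions in `phi` plus `baseline` add up to `model(x)`, and every term is an average over the background rows. If that sample is small or unrepresentative, the baseline and the attributions shift with it.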

Use cases of synthetic data for insurers

Artificial intelligence fueled with synthetic data eases several insurer pain points while also benefiting the customer. With the advent of advanced machine learning algorithms, underwriters are bringing in more information to better assess risk and offer tailor-made premium pricing. This rapid change means big things for insurers and applicants alike.

Synthetic data can be generated in many shapes and forms, allowing a variety of different use cases. Synthetic data for AI development is one of the richest use case categories with many high-value applications. According to Gartner, by 2024, 60% of the data used for the development of AI and analytics projects will be synthetically generated.

These are the use cases that work well in practice and generate a higher ROI in the insurance industry:

  • 1. Model training / retraining:

    Creating optimized synthetic training datasets enables maximum accuracy on downstream machine learning tasks. In addition, AI models show signs of performance degradation over time. Using synthetic datasets, data science teams can retrain those models to optimize performance and model stability.

  • 2. Risk assessment:

    In the insurance industry, where speed of the purchasing process and reduction in the level of involvement by the insurer and the customer are key, the unique advantages synthetic data offers can prove transformational in risk assessment. Underwriting precision can be improved by providing accurate statistical insights or anonymizing metadata to model risk.

  • 3. Fraud detection:

    Fraud detection systems can be strengthened by training detection models on large volumes of synthetic data. Insurers can create larger datasets and then boost the incidence of fraud in the data, so that fraud detection models pick up patterns more easily and become more accurate.

  • 4. Eliminating bias:

    Racial or gender skew in training data can be corrected by generating additional records for underrepresented groups. This helps classifiers improve predictions for all customers and generalize better to unseen data.

  • 5. Price optimization:

    Rules such as CCPA and HIPAA restrict how insurance companies can use personal data in their modeling, which makes accurate pricing models a challenge. Insurers can now turn to synthetic data to leverage geolocation data and millions of synthetic addresses, making pricing models as accurate as those trained on real data.

  • 6. Experience personalization:

    Synthetic data can also be used to identify, develop, and test new products that answer specific customers' needs, using data that complies with the strictest privacy and legal frameworks. It can also improve the customer journey and conversion rates through real-time, secure exchanges of information across departments and jurisdictions.
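The fraud detection and bias use cases above both come down to adding records for an underrepresented class. The Python sketch below shows the idea with plain random oversampling plus a small jitter on numeric fields. This is a deliberately simple stand-in, not a generative synthetic data model, and the function name, field names, and toy claims are all hypothetical.

```python
import random

def boost_minority(rows, label_key, minority_label, target_share, seed=0, jitter=0.05):
    """Append jittered copies of minority rows until they reach `target_share` of the data."""
    if not 0 < target_share < 1:
        raise ValueError("target_share must be between 0 and 1")
    minority = [r for r in rows if r[label_key] == minority_label]
    if not minority:
        raise ValueError("no minority examples to resample")
    rng = random.Random(seed)
    out = list(rows)
    n_minority = len(minority)
    while n_minority / len(out) < target_share:
        base = rng.choice(minority)
        # Perturb float fields slightly; leave labels and other fields unchanged.
        new_row = {
            k: v * (1 + rng.uniform(-jitter, jitter)) if isinstance(v, float) else v
            for k, v in base.items()
        }
        new_row["synthetic"] = True  # mark generated rows so they can be audited
        out.append(new_row)
        n_minority += 1
    return out

# Toy claims table: 95 legitimate claims and 5 fraudulent ones (5% fraud).
claims = [{"amount": 500.0 + 10 * i, "is_fraud": 0} for i in range(95)]
claims += [{"amount": 9000.0 + 100 * i, "is_fraud": 1} for i in range(5)]

boosted = boost_minority(claims, "is_fraud", 1, target_share=0.3)
fraud_share = sum(r["is_fraud"] for r in boosted) / len(boosted)
```

The same function could rebalance a demographic attribute instead of a fraud label. A generative model would produce more varied records than jittered copies, but the balancing logic stays the same.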

DEDOMENA's platform allows insurance companies to improve their time-to-data and time-to-insight by combining synthetic data generation with data enrichment capabilities. This unlocks access to unlimited quality data for the entire AI development lifecycle, ensuring better performance and faster deployment. The platform comes with advanced features, such as on-premises edge computation and the ability to synthesize complex data structures with referential integrity and time series correlation. As a result, DEDOMENA can serve the broadest range of use cases with suitably generated and labelled synthetic data.