Synthetic Data Use Cases in the Financial Industry

A couple of weeks ago, in our previous post, "The value of synthetic data in financial services", we discussed how generating synthetic data has become a necessary condition in banking for extracting the full potential of data, and of the machine learning models trained on it, in order to meet customer and business goals without compromising privacy.

Synthetic data tools must be aimed at guaranteeing this privacy, in addition to preserving the quality of the original data, allowing the banking sector to enhance all the uses that the original data can offer.

The applications of this technology within advanced analytics and machine learning development are extensive, from improving fraud detection and market simulations to data exchange and increased collaboration across teams.

Under GDPR, the exchange of personal data between banks, and even between internal departments, is severely limited. As a consequence, the possibilities of obtaining relevant insights and optimising the potential of data decrease significantly.

Synthetic data is compliant with all the GDPR requirements and allows companies, including banks, to use all the data they need while avoiding legal or ethical risks and without losing informational or statistical properties. This artificial data therefore supports optimal use in Machine Learning and Artificial Intelligence development. Synthetic data also eases data exchange between heterogeneous institutions, such as partners or even the government.

Main use cases of Synthetic Data for the banking sector

Synthetic data can help financial institutions stay ahead of their competitors by effectively and securely leveraging their data assets and extracting additional value from them. Developers, Engineers and Data Scientists, to mention a few, will be able to use synthetic data with complete confidence for a wide variety of purposes, from training machine learning algorithms and improving deep learning models to testing cloud computing performance, always knowing that they are working with high-quality data that is very similar in value to the real data. The most important takeaway, however, is that they never put clients' privacy at risk and always comply with the different data protection regulations.

The use of synthetic data in banking is very diverse. Next, we will dive into some use cases that show the versatility of synthetic data in this sector:

Quick evaluation of suppliers and solutions

Banks analyse a myriad of providers when they need to adopt and integrate third-party technologies. To validate these solutions, they must be fed with quality data, not just "dummy" data, so that the bank can understand first-hand their scope and performance, beyond what the provider claims in its presentations. Until now, it has taken months to process and protect a small sample that is not always sufficient to validate performance. The simple but inadequate alternative, fake data, limits the analysis and often leads to results below expectations. These drawbacks can be solved by using synthetic data.

Increased data volume

If we want a model that makes accurate predictions, it needs to be trained on a large volume of representative data. In cases where a dataset is unbalanced, incomplete, or sparse, it can be supplemented with synthetically generated data.

For example, to detect certain types of fraud, enough fraudulent examples are needed for the algorithm to learn correctly, since actual observations may be limited. These real observations, when complemented and enriched with synthetic data, help to achieve optimal results.
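As a rough illustration of supplementing scarce fraud examples, the sketch below creates new rows by interpolating between existing minority-class rows (a SMOTE-style technique, chosen here only for illustration and not necessarily what a production synthetic data tool does). The function name `interpolate_minority`, the feature layout and the sample values are all hypothetical.

```python
import random

def interpolate_minority(samples, n_new, k=3, seed=0):
    """Create n_new synthetic rows by interpolating between a minority-class
    row and one of its k nearest minority-class neighbours (SMOTE-style)."""
    if len(samples) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(samples)
        # Rank the other rows by squared Euclidean distance to the base row.
        others = [s for s in samples if s is not base]
        others.sort(key=lambda s: sum((a - b) ** 2 for a, b in zip(base, s)))
        neighbour = rng.choice(others[:k])
        t = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + t * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Hypothetical fraud rows: [amount, hour_of_day]
fraud_rows = [[950.0, 2.0], [1200.0, 3.0], [870.0, 1.0], [1010.0, 4.0]]
new_rows = interpolate_minority(fraud_rows, n_new=6)
```

Each generated row lies on a line segment between two real fraud rows, so it stays within the range of observed values. Simple interpolation like this does not by itself provide any privacy guarantee.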

Data Retention

Privacy protection laws and regulations can limit the retention period of personal data and give consumers the right to request the deletion of all their information held by the entity.

Even when the original data is no longer in the entity's custody, synthetic data has already been generated by learning from customer behaviour, which removes the limitations on how long or for what purpose the (artificially generated) data can be used. Therefore, this information can be retained for future analysis, something that was not feasible until now.

Data monetization

Since privacy compliance and information security regulations will no longer be an issue, the new artificially generated data can be used to generate new revenue streams. The banking sector can take its Open Data and data monetization strategy even further, since synthetic data enables banks to package this data and sell it to third parties without the need for express consent. In addition, this data can be enriched and integrated into pipelines or real-time processes to offer greater solution coverage and thus further increase potential revenue.

Creation and training of new models

Without access to new data it is not possible to develop new models or update pre-existing ones. When a model has already been trained, but after a period of time it drops in performance and accuracy, it has to be re-trained or fine-tuned with new data that addresses the limitations that affect the predictions: change in patterns, different behaviours, etc.

For example, suppose your model does not correctly predict the spending category of a transaction because consumption and business habits have changed. To improve this model, an updated synthetic dataset is needed that includes the information of the last months to fine-tune it, reaching the expected thresholds and metrics again.
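To make the retraining trigger concrete, here is a minimal sketch that tracks the accuracy of a spending-category model per month and flags the months that fall below a threshold. The record layout, the category names, the dates and the 0.8 threshold are illustrative assumptions, not values from a real system.

```python
from collections import defaultdict

def monthly_accuracy(records):
    """records: iterable of (month, predicted_category, actual_category)."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for month, predicted, actual in records:
        totals[month] += 1
        hits[month] += int(predicted == actual)
    return {m: hits[m] / totals[m] for m in sorted(totals)}

def months_needing_retraining(records, threshold=0.8):
    """Return the months whose accuracy dropped below the threshold."""
    return [m for m, acc in monthly_accuracy(records).items() if acc < threshold]

# Hypothetical labelled transactions.
records = [
    ("2023-01", "groceries", "groceries"),
    ("2023-01", "travel", "travel"),
    ("2023-02", "groceries", "delivery"),
    ("2023-02", "travel", "travel"),
]
flagged = months_needing_retraining(records)
```

A flagged month is the signal that an updated dataset covering that period is needed for fine-tuning.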

Migration to the Cloud

Data migrations from on-premises environments to the cloud carry high risk and can lead to problems such as unidentified security vulnerabilities, failures of critical services, human errors due to lack of knowledge about the new infrastructure, performance bottlenecks, or cloud sprawl.

It is extremely important to thoroughly evaluate the new environment and its components before migrating. Large volumes of production data cannot be extracted or used for this purpose, so synthetic data is the best alternative, given that it is almost identical to real data and companies can generate all the volume needed.

Quality Assurance

The use of synthetic data is not new for the software quality departments at financial institutions. However, the data generated based on rules is not useful for testing Artificial Intelligence applications because it does not represent real-world behaviours.
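One way to see the limitation of rule-based data is to compare a relationship between two columns in real and rule-generated data. In the sketch below, both the "real" data and the rules are invented for illustration: the "real" spend depends on income, while the rule-based generator draws each column independently within its valid range.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(42)

# Hypothetical "real" behaviour: monthly spend rises with income, plus noise.
income = [rng.uniform(1000, 5000) for _ in range(1000)]
spend = [0.4 * x + rng.gauss(0, 150) for x in income]

# Rule-based test data: each column drawn independently within its valid range.
rule_income = [rng.uniform(1000, 5000) for _ in range(1000)]
rule_spend = [rng.uniform(min(spend), max(spend)) for _ in range(1000)]

real_corr = pearson(income, spend)            # strong positive correlation
rule_corr = pearson(rule_income, rule_spend)  # close to zero
```

Every rule-based row is individually valid, yet the income-spend relationship that a model would need to learn is missing from the dataset.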

Quality Assurance (QA) teams will no longer have to wait months to validate the security, functionality, and effectiveness of new features, pieces of software and data-centric applications. With synthetic data, they will have data that behaves just like real data, without the need to carry out internal validations for information security and regulatory compliance.

Conclusion

Without any doubt, the application of synthetic data in the banking sector enables banks and financial institutions to address a huge variety of use cases while reducing time and risk.

As we have discussed in other posts on this blog, because synthetic data preserves the value of the original data, it allows the full potential of data assets to be realised. We look forward to seeing the possibilities of this technology continue to grow in the coming months.