Synthetic Data Can Be a Real Solution for Analysis and Security
By: Ginger Schmeltzer, Sr. Analyst, Aite Group
Published December 22, 2020 on PaymentsSource
In the midst of the COVID-19 pandemic, we are all constantly being bombarded with graphs and tables showing the impacts of the Coronavirus to date and predicting the levels of infections, hospitalizations and deaths over the coming months.
The charts and graphs reported in the news are based on a mix of actual data reported by hospitals and governments and synthetic data created from that reported data. Epidemiologists and other data analysts extrapolate from the actual data, based on analysis and assumptions, to build a synthetic data set. That data set feeds the projections we are all so eagerly following to find out when we can start returning to our normal lives.
Data intelligence is critical not only for managing the healthcare impacts of COVID-19 but, just as importantly, for informing businesses as they navigate disaster planning, business continuity, product development and client servicing during this crisis.
Businesses need broad access to both internal and external data to understand the financial risks to the business as well as the health risks to their employees and customers. With that data they can make prudent and practical decisions to survive the (hopefully) short-term revenue losses and plan for the changes in business practices that will likely be with us for a long time to come. Access to comprehensive data is more critical than ever, and ensuring that the data is secure has never been more important than now, when our vulnerability, in terms of our health, our finances and our overall safety, is the highest it has ever been.
What is Synthetic Data?
Synthetic data is an artificial data set that mimics the original data but removes the personal or other sensitive information the original may contain. Raw data is run through special algorithms and generators to create new data sets that cannot be traced back to the original consumer or transaction. This “fake” data set nonetheless retains the accuracy and statistical properties of the original, making it ideal for creating a baseline for future studies or testing, modeling business opportunities, projecting trends and more.
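To make the idea concrete, here is a minimal sketch of one very simple generator. It is an illustration only, not the method used by any provider named in this article: it assumes the data is purely numeric and roughly bell-shaped, fits a multivariate normal distribution to it, and samples new rows. The function name `synthesize` and the two example fields (transaction amount and customer age) are hypothetical. Production generators are considerably more sophisticated, and a fit like this does not by itself guarantee privacy.

```python
import numpy as np

def synthesize(real: np.ndarray, n_rows: int, seed: int = 0) -> np.ndarray:
    """Sample synthetic rows from a multivariate normal fitted to `real`.

    Illustrative assumption: every column is numeric and roughly Gaussian.
    """
    rng = np.random.default_rng(seed)
    mean = real.mean(axis=0)            # per-column averages
    cov = np.cov(real, rowvar=False)    # how the columns vary together
    return rng.multivariate_normal(mean, cov, size=n_rows)

# Hypothetical "real" data: transaction amount and customer age, loosely correlated.
rng = np.random.default_rng(42)
amount = rng.normal(50.0, 10.0, size=5000)
age = 30.0 + 0.2 * amount + rng.normal(0.0, 5.0, size=5000)
real = np.column_stack([amount, age])

synthetic = synthesize(real, n_rows=5000)

# No synthetic row is copied from a real record, yet the summary statistics line up.
print("means:", real.mean(axis=0), synthetic.mean(axis=0))
print("correlation:",
      np.corrcoef(real, rowvar=False)[0, 1],
      np.corrcoef(synthetic, rowvar=False)[0, 1])
```

The point of the comparison at the end is that an analyst working only with `synthetic` would see nearly the same averages and the same relationship between the fields as one working with `real`.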
For researchers and scientists tracking the COVID-19 crisis and working to develop treatments and vaccines, synthetic data can help create a much larger baseline for testing and clinical trials. For business or product owners, synthetic data can be generated on a one-to-one basis so that the final synthetic data set matches the original set field-for-field, but without the privacy risks. This “new” data set can then be safely used for performance analysis, benchmarking, forecasting or product development, producing results as valid as those from the original data without the risk of misusing personally identifiable information.
How is Synthetic Data Contributing?
Data collected from hospitals and health departments have been critical inputs into understanding the health impacts of the COVID-19 pandemic, but they only tell part of the story of the changes COVID-19 has created in our lives. As businesses, retailers and restaurants have been reopening, government officials and public health personnel are using reported infections data to track new outbreaks, trace contacts and make plans to manage the evolving situation.
Businesses themselves need data to understand when and how to reopen their doors, how consumer needs are changing and how best to confidently and competently serve a client base whose interactions and purchasing behavior are now very different from just a few months ago. Consumer transactional data should not be evaluated separately from pandemic trend data; rather, the two need to be combined so that operational decisions are informed by both health and business perspectives and the economy is reopened in the safest and most beneficial manner possible.
One example of an organization enabling broad data sharing across industries is SafeGraph, which has formed an interdisciplinary consortium with more than 1,000 organizations, including the CDC, major academic research institutions, transaction data providers and government organizations at all levels. The consortium’s mission is to support COVID-19 response efforts by sharing aggregated and anonymized (and some synthesized) data on social distancing, foot traffic and consumer spending at retail establishments.
The comprehensive datasets the SafeGraph consortium is collecting from thousands of retail chains and millions of consumers and small businesses provide critical input for government agencies managing the broader economic recovery, as well as for financial institutions planning for the hoped-for rebound in consumer spending.
How Can Synthetic Data Help Businesses Navigate the COVID Crisis?
Combining health data with economic data allows us to construct models for guiding reopening planning: identifying businesses by level of criticality and economic value, and balancing this against the level of health risk posed to customers of those types of businesses. One such model is a reopening roadmap created by Facteus, based on consumer spending data from more than 1,000 banks and input from 26 health analysis sources. Depending on which quadrant a business falls into and the infection levels in a local area, government officials and business owners can use the roadmap to make better-informed decisions as they work to reopen their local economies.
Safe, accessible and comprehensive data is critical to getting our economy going again. Synthesizing data allows for broad sharing of the inputs businesses and municipalities need to make decisions, with far less concern among healthcare professionals, government officials, business owners, compliance officers and PR staff about the risks of personally identifiable information (PII) being misused or stolen.
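A quadrant model of this kind can be sketched in a few lines. The example below is a hypothetical illustration, not the Facteus roadmap itself: the 0-to-1 scores, the 0.5 cutoffs, the example businesses and the names `Business` and `quadrant` are all assumptions made for the sake of the sketch.

```python
from dataclasses import dataclass

@dataclass
class Business:
    name: str
    economic_value: float  # hypothetical score from 0 (low) to 1 (high)
    health_risk: float     # hypothetical score from 0 (low) to 1 (high)

def quadrant(b: Business, value_cut: float = 0.5, risk_cut: float = 0.5) -> str:
    """Place a business in one of four quadrants using assumed cutoffs."""
    value = "high value" if b.economic_value >= value_cut else "low value"
    risk = "high risk" if b.health_risk >= risk_cut else "low risk"
    return f"{value} / {risk}"

# Made-up scores for three business types.
examples = [
    Business("grocery store", economic_value=0.9, health_risk=0.2),
    Business("concert venue", economic_value=0.6, health_risk=0.9),
    Business("car wash", economic_value=0.3, health_risk=0.1),
]

for b in examples:
    print(f"{b.name}: {quadrant(b)}")
```

In practice the scores would come from the combined spending and health data, and the cutoffs could shift with local infection levels.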
Beyond COVID, How Can Synthetic Data Help Us in Financial Services?
Imagine how useful a synthetic data set would be to a product manager or business owner. With easily accessible, rapidly updatable and statistically valid synthetic data, a product manager could be much more proactive in responding to customer issues, predicting future product trends or generating ideas for new product features based on deeper analysis of product usage and customer feedback. All of this comes with far less concern among business owners, compliance officers and PR staff about the risks of PII being misused or stolen. Synthetic data can also be used to train machine learning models and neural networks, which is critical for areas such as fraud detection and management systems that need mountains of reliable data for testing and strengthening.
Start thinking about how you can safely unlock your data to support your own internal business planning, as well as your customers and potentially others in the ecosystem. They may be able to use that data to more effectively support the government and private sector in bringing us out of this crisis and back toward improving employment levels, business performance and consumer confidence. Synthetic data is a powerful tool long leveraged by scientists in their research; now is the time for those of us in the business and financial communities to add it to our tool sets as well.