Quants turn to machine learning to unlock private data
Replication could allow financial firms to use – and monetise – data that was previously off-limits
NEED TO KNOW
- Rudimentary methods of anonymising private data such as masking can be easily reversed.
- Synthetic datasets created by machine learning algorithms are completely different from the originals, while still retaining the same statistical properties.
- Financial firms including American Express, Fidelity and JP Morgan are exploring ways to use the technology to unlock the value of sensitive datasets.
- Erste Group built a retail banking app using synthetic data, while an Italian bank used synthetic data to validate a third-party credit scoring model.
- If the technique proves robust, it could make it easier for investment firms to develop novel strategies based on alternative datasets.
When an investment firm wanted to find out how a new breakfast menu at Wendy’s might affect the fast-food chain’s bottom line, it looked for the answer in time-stamped credit card transaction data.
The data was anonymised, of course. Credit card companies remove sensitive information and add statistical ‘noise’ to this type of data before selling it to investors or even sharing it internally. But these anonymisation techniques are not foolproof, and nervousness about privacy breaches has held back the use of transaction data in areas such as investment analysis, fraud detection and the development of execution algorithms.
A new idea could change that. Rather than anonymising datasets, financial firms are looking at replicating them. Machine learning algorithms can synthesise new, artificial datasets that are completely different from the original, while retaining the same statistical characteristics. Because the new data is essentially fake, it can be shared at will.
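To make the idea of replication concrete, here is a minimal sketch in Python. It is not the method used by any firm named in this article: it assumes a plain multivariate Gaussian as a stand-in for the machine learning model, and the two-column "transaction" table, the `synthesise` function and all the numbers are hypothetical. The point is only that the synthetic rows are newly drawn, yet their means and correlations track those of the original.

```python
import numpy as np


def synthesise(real, n_samples, seed=0):
    """Fit a multivariate Gaussian to `real` and draw fresh rows from it.

    Assumption: a Gaussian stands in for the generative model. Real systems
    would likely use something far richer.
    """
    rng = np.random.default_rng(seed)
    mean = real.mean(axis=0)            # per-column averages
    cov = np.cov(real, rowvar=False)    # covariance between columns
    return rng.multivariate_normal(mean, cov, size=n_samples)


# Hypothetical "private" table: two correlated numeric columns.
rng = np.random.default_rng(42)
true_mean = np.array([3.0, 20.0])
true_cov = np.array([[1.0, 0.6],
                     [0.6, 2.0]])
real = rng.multivariate_normal(true_mean, true_cov, size=5000)

# Replicate the table: same shape, newly sampled values.
synthetic = synthesise(real, n_samples=5000)

# Compare summary statistics of the original and the replica.
real_corr = np.corrcoef(real, rowvar=False)[0, 1]
synth_corr = np.corrcoef(synthetic, rowvar=False)[0, 1]
print("means:", real.mean(axis=0), synthetic.mean(axis=0))
print("correlation:", real_corr, synth_corr)
```

A Gaussian captures only means and covariances, so this sketch would miss features such as skewed spending amounts or time-of-day patterns. That limitation is one reason to reach for more expressive models.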
For investors, synthetic data opens the door to testing datasets more easily.
“When a fund wants to trial alternative data, usually it takes time,” says Gautier Marti, a quant researcher and developer at the Abu Dhabi Investment Authority, and an expert in ways to replicate complex datasets. “With synthetic data you can share samples without needing to sign non-disclosure agreements and so on.”
Marti has looked at different ways that synthetic data might be used in finance and sees anonymisation as the most obvious application.
Hedge funds making big investments in alternative data want assurances that the supply of data won’t be cut off because customers change their privacy settings, says Lorn Davis, vice-president of corporate and product strategy at Facteus, a company that anonymises card transaction data.
To read the full article, visit https://www.risk.net/investing/quant-investing/7849681/quants-turn-to-machine-learning-to-unlock-private-data (paywall)