Friday, 9 September 2016

Australian Bureau of Statistics: the pot of gold that is big data

The privacy concerns arising in respect of big data tend to have two foci. First, there are ethical questions about how private information is captured and subsequently used without the subject’s knowledge or consent. Second, there are concerns that the way governments and corporations store and secure this data fails to reach an appropriate standard, leaving the door open for private individual data to be accessed by unauthorised persons, or otherwise released. [NSW LC, Standing Committee on Law and Justice, Remedies for the serious invasion of Privacy in New South Wales, 3 March 2016]

The Australian Bureau of Statistics (ABS) has shown an interest in accessing the gold mine that is retail scanner data.

Big data refers to the large volume of structured or unstructured data that organisations generate and store. It is characterised as data that generally contain high volume, high velocity and/or high variety information and demands cost-effective, innovative ways of processing for enhanced insight and decision making.

The opportunity that big data presents to statistical agencies is the potential to produce more relevant and timely statistics than traditional data sources such as sample surveys. As an input into official statistics, either for use on its own, or combined with more traditional data sources, Big data could help position National Statistical Offices (NSOs) to improve the accuracy of their measures or the quality of the statistics produced. It can also help improve the comprehensiveness of official statistics by addressing existing data gaps.

An example of Big data is transactions data from major retailers obtained from the electronic capture of product information at the point of sale. Transactions data contain detailed information about the business name and location of the transaction, date and time, quantities, product descriptions, values of products sold as well as their prices.

ABS innovations will meet new and emerging data needs. For example, the ABS is developing a prototype known as the Graphically Linked Information Discovery Environment (GLIDE), which is a suite of tools using Semantic Web methods to help analysts explore and visualise linked data. GLIDE has linked personal income tax data with business tax data to explore new methods to manage, link and analyse cross-sectional and longitudinal data.

A pilot project to inform policy development through the combination of Census and social security information was established between the ABS and the Department of Social Services.

The ABS is already a user of big data - with considerable potential to use much more - as effective use of this government data reduces our need to collect information separately and directly from households and businesses.

ABS is moving beyond the public data environment to draw insights from retail scanner data, to explore options with other data sources such as investigating the use of satellite imagery to measure agriculture crop yields and new methodological approaches to using telecommunication location information.

The spatial opportunities of big data approaches are considerable and have the potential to fundamentally change how we produce population information - especially the extent to which we can measure temporal dynamics which have generally been beyond the reach of traditional approaches.

The following online article gives a strong clue as to why the ABS would like to link national census data on individuals and households to retail scanner data – doing so would increase the commercial value of the statistical data products it offers for sale.

Ad News, 5 September 2013:

One of the country's biggest advertisers, Woolworths, said it doesn't need big splashy ad campaigns to launch its insurance offering. Because its database tells it the people it needs to target directly.

Woolworths Limited director of group retail services Penny Winn said the company has been deliberately shying away from traditional mass advertising for its new insurance business.

Woolworths' combined insurance statistics database and frequent shopper database found those who buy milk and red meat are better insurance risks than those who have pasta, rice and liquor in their shopping baskets. As a result, Woolworths are able to target those good insurance risk customers directly with better insurance offers.

“What we've been able to do is take our insurer's car crash database and overlay it with our Woolworth's Rewards database. I rarely see actuaries get excited but they were very excited about what we found because it was so statistically significant,” said Winn.

“Because you see, customers who drink lots of milk and eat lots of red meat are very, very, very good car insurance risks versus those who eat lots of pasta and rice, fills up their petrol at night, and drink spirits. What that means is we're able to tailor an insurance offer that targets those really good insurance risk customers and give them a good deal via direct channels instead of above-the-line [advertising]. And it helps to avoid the bad insurance risks.”

It seems that along with an individual’s name, address, marital status, income range, education level, ancestry, personal hygiene regimen, criminal or traffic infringement record, taxation liability and/or welfare payment history, medication and health status, the ABS would also like to have the option of assessing the individual’s alcohol consumption and insurance risk level.

What a treasure trove for those with a malicious heart and refined hacking skills, or for overly inquisitive police and national security agencies.
