The Impact of Matching Error on the Efficiency of Regression Data Integration Estimation

Location: ABS office; Canberra ACT preferred, but not limited to Canberra

Duration: 6 months

Proposed start date: April 2019

Project Background

The traditional sample survey method is one of the ABS's main approaches to producing estimates of interest. Recently, the ABS has been exploring the possibility of improving the quality of sample survey estimates by linking survey samples to Big Data sources. Kim and Tam (2018) outlined a methodology to improve the efficiency of sample survey estimates by combining them with Big Data in a regression framework. Their work assumed that there is no linking error. Whilst it is known that linking error will not affect the unbiasedness of the regression estimator, it will have a negative impact on its efficiency. The purpose of this project is to quantify this impact using simulation results and to derive a theoretical or empirical formula for the loss of efficiency caused by linking error.

Research to be Conducted

The ABS has a framework for using administrative data to improve survey estimation. This framework makes strong assumptions about linkage and coverage complications (e.g. it assumes that there are no false positives and that people who have left Australia do not have an administrative record). The project will aim to relax these assumptions by developing a more realistic framework. Research work would include:

  • Reviewing the statistical literature on measurement error and coverage error induced by data linking error, and reviewing the work in these fields by the ABS and other statistical agencies.
  • Extending the current regression data integration framework to include linkage errors.
  • Conducting simulations in a range of situations to assess the loss of efficiency due to linking error.
  • Assisting in writing a paper for submission to a statistics journal.
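
As an illustration only, the simulation step listed above could start from something like the following sketch. It is not the Kim and Tam (2018) estimator or the ABS framework: it assumes a simple regression estimator under simple random sampling, a known population mean of the auxiliary variable from the linked source, and mislinks that attach a randomly chosen record to a sampled unit. The function name `simulate`, the population model and all parameter values are hypothetical.

```python
import random
import statistics


def simulate(mislink_rate, n_pop=2000, n_sample=200, n_reps=300, seed=1):
    """Return (bias, variance) of a simple regression estimator of the
    population mean of y when a share of sampled units is mislinked."""
    rng = random.Random(seed)
    # Hypothetical population: x comes from the linked source, y from the survey.
    x = [rng.gauss(50, 10) for _ in range(n_pop)]
    y = [2.0 * xi + rng.gauss(0, 5) for xi in x]
    x_mean = statistics.fmean(x)      # assumed known from the linked source
    true_mean = statistics.fmean(y)   # target of estimation

    estimates = []
    for _ in range(n_reps):
        idx = rng.sample(range(n_pop), n_sample)  # simple random sample
        ys = [y[i] for i in idx]
        # With probability mislink_rate, a unit receives a random record's x.
        xs = [
            x[rng.randrange(n_pop)] if rng.random() < mislink_rate else x[i]
            for i in idx
        ]
        y_bar = statistics.fmean(ys)
        x_bar = statistics.fmean(xs)
        # Least-squares slope of y on the (possibly mislinked) x.
        s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
        s_xx = sum((a - x_bar) ** 2 for a in xs)
        slope = s_xy / s_xx
        estimates.append(y_bar + slope * (x_mean - x_bar))

    bias = statistics.fmean(estimates) - true_mean
    variance = statistics.pvariance(estimates)
    return bias, variance


if __name__ == "__main__":
    for rate in (0.0, 0.1, 0.3):
        bias, variance = simulate(rate)
        print(f"mislink rate {rate:.1f}: bias {bias:.3f}, variance {variance:.3f}")
```

Comparing the variance across mislink rates gives a rough empirical measure of the loss of efficiency; a real study would vary the sampling design, the linkage error mechanism and the strength of the relationship between the variables.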

Skills Required

We are looking for someone with the following skillset:

  • Expertise in and knowledge of the literature on statistical modelling and data linking.
  • Ability to innovate and solve technical problems.
  • Suitable mathematical programming skills (e.g. SAS or R).

Expected Outcomes

An improved understanding of the efficiency of regression data integration estimation in the presence of linking error.

Additional Details

The intern will receive $3,000 per month of the internship, usually in the form of stipend payments.

It is expected that the intern will primarily undertake this research project during regular business hours, spending at least 80% of their time on-site with the industry partner. The intern will be expected to maintain contact with their academic mentor throughout the internship through either face-to-face or phone meetings, as appropriate.

The intern and their academic mentor will have the opportunity to negotiate the project’s scope, milestones and timeline during the project planning stage.

Applications Close

03 April 2019


APR – 0887