Big Data for Small Area Estimation
Location: ABS Office, Nationally
Duration: 6 months
Proposed start date: April 2019
One of the challenges faced by the ABS, as well as other National Statistical Offices, is the production of reliable estimates for small population domains, at desired levels of disaggregation and frequency. In recent years, users of official statistics have increasingly requested fine-level, frequent, and highly detailed information that at times extends beyond the published estimates.
One such example is the Aboriginal and Torres Strait Islander population estimates, which are published (along with their corresponding standard errors) every five years, following each Census, at the state level. These results are then disaggregated by age and sex (although no standard errors are attached), and are adjusted for annual population changes (i.e., births and deaths). The challenges are: (1) obtaining finer-level geographical (e.g., SA4) population estimates; (2) obtaining more frequent (e.g., annual) estimates; and (3) obtaining other estimates of interest (e.g., employment status, health) disaggregated by area type (e.g., state).
This project will investigate the feasibility of using small area estimation (SAE) techniques and existing data sources, which include survey data and big/administrative data, to address one of these challenges. In particular, the focus will be on one of the following:
- Producing fine-level (e.g., Statistical Area Level 4) geographical estimates for the Aboriginal and Torres Strait Islander population, ideally disaggregated by age and sex.
- Producing state-level estimates for the Aboriginal and Torres Strait Islander population, on an annual basis, ideally disaggregated by age and sex.
- Producing state-level employment estimates for the Aboriginal and Torres Strait Islander population, on an annual basis, disaggregated by age and sex.
The datasets to be considered include the monthly Labour Force Survey, the Census, the Post Enumeration Survey (PES), which measures the undercount/overcount following the Census, Medicare data, and Centrelink data.
Research to be Conducted
The key objectives for this project are to:
- Assess the feasibility of using existing data sources to successfully address (one of) the challenges listed above
- Assess the methodologies required to do so
- Build a prototype methodology that can be applied on actual data
We are looking for someone with the following skillset:
- Strong modelling skills; technical knowledge of small area estimation being an advantage
- Good technical knowledge of survey design and estimation, including stratification and weighting
- Working with large datasets, including integrating multiple sources of data
- Programming experience in SAS or R (preferably), or Python
The expected outcomes of the project are:
- A project report addressing the key objectives identified above
- Literature review and investigation into methods currently used by other national statistical agencies or research organisations
- Algorithms/Prototype software and analytical models
It is expected that the results of this project will inform follow-on research and development work.
The intern will receive $3,000 per month of the internship, usually in the form of stipend payments.
It is expected that the intern will primarily undertake this research project during regular business hours, spending at least 80% of their time on-site with the industry partner. The intern will be expected to maintain contact with their academic mentor throughout the internship through either face-to-face or phone meetings, as appropriate.
The intern and their academic mentor will have the opportunity to negotiate the project’s scope, milestones and timeline during the project planning stage.
03 April 2019
APR – 0889