High Performance Data Management
Location: Canberra, ACT
If you reside in a state other than the one listed above, there may be some flexibility in the arrangement. Please contact the Business Development manager for more information.
Duration: 5 months
Expected start date: February/March 2018
Keywords: Database systems, Software development (SQL), Python, Java, Linux commands, metadata, Geographic Information Systems
Please note: Due to funding requirements, students must have Australian Citizenship or Permanent Residency to apply for this project. Any applicants not meeting this requirement will automatically be deemed ineligible for this project.
The National Earth and Marine Observations (NEMO) branch of Geoscience Australia is responsible for maintaining a large volume and broad range of earth science information products, including collections of satellite-derived and marine data. As the size and diversity of the data grow, so does the complexity of processing and managing them effectively.
The scale of the problem is such that manual processes are no longer effective, and so we must look to ways of structuring and storing data in a manner that allows for automation. These are ‘Big Data’ problems, both in terms of individual data sets and for collections as a whole. Geoscience Australia are seeking an intern to assist with curating and managing large volumes of scientific information.
Research to be Conducted
Geoscience Australia is seeking assistance with developing and implementing operational workflows relating to collections of spatial environmental information. Examples of data sets include bathymetry (seabed mapping) data, sonar collections, water column data, and sediment information, as well as data relating to the geology of the Antarctic. Geoscience Australia is seeking to improve its capabilities for storing and delivering these data sets using cloud database technology (e.g. PostgreSQL and PostGIS using Amazon RDS), with an eye towards delivery using web services (e.g. through GeoServer).
Research projects that relate to better delivery of this information through automation and improved provenance tracking would be of significant interest. Examples of current work include automation of bathymetry processing using MB-System, automated extraction of data footprints, and improving the ability to query data by ingesting data into databases and providing SQL views. Geoscience Australia also have links to a Drupal-based CMS and a GeoNetwork-based metadata catalogue.
Geoscience Australia makes extensive use of Amazon Web Services, and can provide access to compute instances, database services, and the necessary open source software to perform scalable computing. They also have access to a specialist AWS Cloud-Enablement team, and have a number of in-house development teams working with a variety of different programming languages and software kits.
For this project, we are seeking a candidate with:
- Experience working with database systems, specifically PostgreSQL (and PostGIS), although additional experience with technologies such as Hadoop, HBase/BigTable, or NoSQL solutions such as MongoDB would be an asset.
- Some experience with software development (especially SQL) will be essential.
- Programming experience in Python; Java, PHP and possibly Scala are also used.
- Some familiarity with Linux commands and bash scripting will be essential. GA principally operate using the Linux operating system, on EC2 instances provided by Amazon Web Services (AWS).
- Experience working with metadata and metadata workflows (e.g. using XML, XSL, XSD, JSON or YAML) would be beneficial.
- Experience with Geographic Information Systems (e.g. ArcGIS, QGIS) would be an asset, as well as familiarity with delivering information via Web Services (e.g. through GeoServer).
The principal expected outcome will be the development of process workflows to fulfil the objectives described above. Reasonable documentation of workflows will be required (i.e. code alone would be insufficient), such that a reasonable user would be able to understand the work and continue development with it.
The end result will need to be integrated with existing government business workflows and will need to align with ongoing operations. A presentation of the results to the NEMO management team would also be valuable upon completion of the project. The development of a scientific publication, if applicable, would be welcomed.
The intern will receive $3,000 per month of the internship, usually in the form of stipend payments.
It is expected that the intern will primarily undertake this research project during regular business hours, spending at least 80% of their time on-site with the industry partner. The intern will be expected to maintain contact with their academic mentor throughout the internship through either face-to-face or phone meetings, as appropriate.
The intern and their academic mentor will have the opportunity to negotiate the project’s scope, milestones and timeline during the project planning stage.
To participate in the APR.Intern program, all applicants must satisfy the following criteria:
- Be a PhD student currently enrolled at an Australian university.
- Have confirmed PhD candidature.
- Have the written approval of their Principal Supervisor to undertake the internship. This approval must be submitted at the time of application.
- Have Australian Citizenship or Permanent Residency.
- Internships are also subject to any requirements stipulated by the student’s and the academic mentor’s university.
11 February 2018
INT – 0383