Patient-Centric Precision Biomarker Discovery Pipeline for High-Throughput Data Analysis

Engineering, IT, Mathematics and Statistics

ABOUT THE INDUSTRY PARTNER

CSL is a global plasma biotechnology company with a diverse drug portfolio, headquartered in Melbourne. The Biostatistics team within CSL Research provides innovative statistical solutions to research problems across the company’s drug development pipeline.

WHAT’S IN IT FOR YOU?

  • Hands-on experience applying precision medicine and patient stratification concepts to real-world clinical and high-dimensional data (e.g. proteomic data).
  • Practical exposure to data-driven patient similarity and subgroup-specific biomarker discovery, moving beyond one-size-fits-all models.
  • Experience developing modern, reproducible statistical pipelines that explicitly account for patient-level heterogeneity.
  • Insight into how stratified and precision-focused analytics are implemented in a biopharmaceutical research setting.
  • Mentorship and networking opportunities with CSL biostatisticians and collaborators.

RESEARCH TO BE CONDUCTED

High‑dimensional clinical and omics datasets provide rich opportunities for biomarker discovery and patient stratification. However, effectively incorporating patient‑level heterogeneity into biomarker analysis remains a significant challenge. This project aims to address this gap by developing a robust, well‑documented R‑based precision module that supports data‑driven patient stratification and subgroup‑specific biomarker discovery for binary outcomes.

The research will begin with an updated literature review that extends previous work by summarizing best‑practice statistical methods for high‑dimensional biomarker discovery and patient stratification. Particular attention will be paid to approaches grounded in patient similarity, including similarity‑based clustering, neighborhood‑based modeling, and subgroup‑specific inference.

Building on the insights gained, the intern will develop a modular R workflow that stratifies patients using similarity measures derived from clinical metadata, high‑dimensional features, or model‑performance characteristics. This workflow will enable subgroup‑specific biomarker discovery and validation while incorporating essential components such as data preprocessing, feature selection, model development, internal validation, robustness assessment, and interpretability. The development process will emphasize reproducibility, transparency, and seamless integration with existing CSL biomarker discovery pipelines.

The resulting framework is intended to complement CSL’s current biomarker discovery ecosystem by adding a precision‑focused layer that explicitly accounts for patient heterogeneity through similarity‑driven stratification. The intern will apply the pipeline to CSL case studies and/or publicly available datasets to demonstrate its practical utility, quantify any performance gains from stratified and local modeling, and document limitations and opportunities for further methodological extension.

SKILLS WISH LIST

If you’re a postgraduate research student and meet some or all of the criteria below, we want to hear from you. We strongly encourage women, Indigenous and disadvantaged candidates to apply:

  • Background in Statistics, Biostatistics, or Bioinformatics (Master’s or PhD candidate).
  • Solid understanding of regression-based modeling and model validation techniques.
  • Familiarity with clustering and similarity-based methods for patient stratification.
  • Experience working with high-dimensional data (e.g. omics or large feature sets).
  • Strong R programming skills for data analysis and reproducible workflows.
  • Clear written and verbal communication skills for documenting methods and results.

RESEARCH OUTCOMES

By the end of the internship, the student will deliver:

  • An R-based precision biomarker module for data-driven patient stratification and subgroup-specific biomarker discovery in high-dimensional clinical and omics datasets.
  • A modular analysis workflow supporting:
    – Patient similarity construction and clustering
    – Subgroup-specific regression modeling for binary outcomes
    – Internal validation and robustness assessment
  • Reproducible analysis scripts covering:
    – Data preprocessing and quality control
    – Feature selection in high-dimensional settings
    – Model fitting, validation, and interpretability
  • Vignette-style documentation describing the workflow and methods, with practical usage examples.
  • An updated literature review summarizing best-practice methods for data-driven patient stratification and precision biomarker discovery, highlighting methodological gaps and recommendations.
  • Case study applications using CSL datasets and/or public data to demonstrate performance, interpretability, and limitations of stratified and local modeling approaches.
  • A concise summary of limitations and future extensions to inform subsequent development of precision analytics within CSL.

ADDITIONAL DETAILS

The intern will receive $3,300 per month of the internship, usually in the form of scholarship payments.

It is expected that the intern will primarily undertake this research project during regular business hours and maintain contact with their academic mentor throughout the internship, through face-to-face or phone meetings as appropriate.

The intern and their academic mentor will have the opportunity to negotiate the project’s scope, milestones and timeline during the project planning stage.

Please note that applications are reviewed regularly, and this internship may be filled before the advertised closing date if a suitable applicant is identified. Early submissions are encouraged.

LOCATION: Melbourne, Victoria
DURATION: 6 months
CLOSING DATE: 18/02/2026
ELIGIBILITY: PhD & Masters by Research students, both domestic & international
REF NO: APR - 2973

INTERNSHIP CONTACT

APR.Intern
