Detecting & Managing Sentiment Anomalies in Collaborative Information Systems
Location: Melbourne VIC
Duration: 5 months
Proposed start date: May or June 2019
In collaborative information systems, multiple authors aim to produce good interpretations of imperfect, incomplete information from a variety of sources. Effective communication requires a balance between efficiency and careful comprehension, but significant implications are often misunderstood or missed.
This project aims to remedy systemic weaknesses in the collaborative sense-making process using machine learning. Recently, artificial neural network language models have improved significantly in their ability to consider extended context and to train at scale on unlabelled datasets, making new applications practical.
We will produce a real-world proof-of-concept application with our partner Melbourne Health. In a medical setting, patients with complex, long-term disease are examined by a range of specialists, who use different tests performed on different dates and send the results in a variety of formats to a variety of data destinations. As patients’ conditions evolve over time, specific features & events vary in significance. Doctors must regularly review and interpret all the evidence quickly and accurately.
Our approach is to learn the typical course of events concerning individual patients from reports produced during normal business, and then:
- Provide immediate warnings when the sentiment of new written content contradicts the sentiment of prior reports. This aims to mitigate confusion due to information overload.
- Provide retrospective notifications to authors when new events contradict the implied sentiment of their earlier comments. This feedback enables authors to continuously improve.
We aim to deliver these improvements with minimal disruption to normal workflows by using unsupervised learning to model normal progression and relying on minimal expert supervision only to filter detected anomalies for relevance. The potential upside of the project is significant: not only can critical errors be spotted earlier, but the system will enable best-practice continuous improvement via selective feedback.
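As a rough illustration of the two kinds of feedback described above, the following minimal sketch assumes that some upstream language model has already assigned each report a sentiment score between -1 (poor outlook) and +1 (good outlook). The function names, the score scale and the thresholds are hypothetical choices for illustration only; the actual methods are to be determined during the project.

```python
import numpy as np


def contradiction_score(prior_scores, new_score, min_history=3, min_spread=0.05):
    """Deviation of a new report's sentiment from a patient's prior reports.

    Scores are assumed to lie in [-1, 1] and to come from an upstream
    sentiment model (hypothetical; not specified by the project).
    """
    prior = np.asarray(prior_scores, dtype=float)
    if prior.size < min_history:
        return 0.0  # too little history to judge
    # Floor the spread so that very consistent histories do not inflate the score.
    spread = max(float(prior.std()), min_spread)
    return abs(new_score - float(prior.mean())) / spread


def flag_new_report(prior_scores, new_score, threshold=3.0):
    """Immediate warning: the new report contradicts the prior reports."""
    return contradiction_score(prior_scores, new_score) > threshold


def retrospective_notifications(authored_scores, outcome_score, margin=1.0):
    """Authors whose earlier sentiment differs from a later outcome by more than margin.

    authored_scores is a list of (author, score) pairs.
    """
    return [author for author, score in authored_scores
            if abs(score - outcome_score) > margin]


history = [0.6, 0.7, 0.65, 0.7]
print(flag_new_report(history, -0.8))   # strongly negative new report
print(flag_new_report(history, 0.68))   # consistent new report
print(retrospective_notifications([("dr_a", 0.7), ("dr_b", -0.5)], -0.8))
```

A simple deviation score like this stands in for the learned model of normal progression; in practice the expected sentiment would come from that model rather than from a running mean.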
We are seeking an applicant to play a key role in the delivery of this project through the implementation of machine learning algorithms & necessary software infrastructure. The applicant should have a background in machine learning and natural language processing. The applicant will be tasked with the discovery of methods for automatic modelling and learning from limited human feedback.
The solution will exploit machine learning expertise & resources from within our company and domain-specific knowledge and data from our partner. The applicant will be expected to bridge these two groups and will have creative opportunities to determine the technical and practical aspects of the solution. We anticipate that one or more publications may arise from the project, which the applicant will be expected to co-author. This is a great way to gain experience of real-world project delivery using the latest machine learning techniques.
Research to be Conducted
We will work with our partner organization Melbourne Health to validate the proposed system for impact in clinical practice. Medical specialists are often pressured by the number and complexity of information sources concerning individual patients – for example pathology, histology & radiology reports obtained over months or years. In addition, it is neither routine nor easy for clinicians to follow up on individual cases and obtain useful feedback.
Although text sentiment analysis is already widely used, its application to collaborative information sharing in the workflows described above is more novel. In addition, our approach will use the latest unsupervised learning techniques to model the normal development of individual cases and the sentiments expressed in related reports. Training can be performed retrospectively on existing data, allowing rapid deployment of the system.
We will then attempt to identify & filter anomalies within these data to optimize clinical relevance and significance, using a very limited amount of supervisory feedback from clinicians. Use of unsupervised learning for the bulk of the modelling limits the human expert labelling and supervision effort, making it practical to deploy and maintain the system in real-world settings.
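One way to picture this filtering step is a small classifier trained on a handful of clinician judgements. The sketch below is only an assumption about how such a filter might look: it fits a logistic regression in NumPy to a few anomalies that clinicians have marked as relevant (1) or not relevant (0). The single feature (an anomaly score), the function names and the hyperparameters are all hypothetical.

```python
import numpy as np


def _with_bias(features):
    """Append a constant column so the model can learn an offset."""
    X = np.atleast_2d(np.asarray(features, dtype=float))
    return np.hstack([X, np.ones((X.shape[0], 1))])


def train_relevance_filter(features, labels, lr=0.1, epochs=5000, l2=0.01):
    """Fit a logistic regression by gradient descent on a few labelled anomalies."""
    X = _with_bias(features)
    y = np.asarray(labels, dtype=float)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))         # predicted relevance
        grad = X.T @ (p - y) / len(y) + l2 * w   # log-loss gradient plus L2 penalty
        w -= lr * grad
    return w


def relevance_probability(w, features):
    """Probability that each detected anomaly is clinically relevant."""
    return 1.0 / (1.0 + np.exp(-_with_bias(features) @ w))


# Six anomalies reviewed by clinicians: one feature each (an anomaly score).
scores = [[4.0], [3.5], [5.0], [0.5], [1.0], [0.8]]
relevant = [1, 1, 1, 0, 0, 0]
w = train_relevance_filter(scores, relevant)
print(relevance_probability(w, [[4.5], [0.7]]))
```

The point of the sketch is the division of labour: the unsupervised model proposes anomalies, and only the small filter on top needs labels from clinicians.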
Our objectives are:
- Validation of the clinical impact and real-world utility of the system architecture as proposed, including quantification of both retrospective and immediate feedback. If these supervisory interactions are not satisfactory, we will explore alternatives.
- Demonstration of a real-world use-case for our unsupervised learning techniques that shows the utility of the models learned, both practically and empirically. In addition to performance on academic benchmarks, this is crucial for validation of our technology.
- A prototype solution that can be easily adapted to other domains.
The project depends on the use of anonymized medical records data for training and evaluation of the system. Ethics approval for the use of this data is in progress. The project also requires participation by clinicians to supervise the training of relevant filters and to assess clinical impact. We are in the process of formalizing an agreement with Melbourne Health to provide these.
From a technical perspective, we will also require computational resources for training and evaluation of methods and to integrate the prototype solution in a high-reliability environment. Our company can provide these and relevant software infrastructure and tools. We have experience executing research and commercial projects at scale.
Although the outline of the project is clear and it is reasonable to believe that the objectives are achievable given current state-of-the-art methods, the most appropriate specific technical methods and solutions will be identified by the applicant as part of the research project.
We are looking for a PhD student with the following:
- Experience in data science, especially natural language processing
- Knowledge of machine learning approaches in general, and artificial neural networks in particular
- Programming in Python, with experience of NumPy and TensorFlow
- Ability to work with interdisciplinary teams
The expected outcome of the project is a software prototype lightly integrated into existing healthcare systems & workflows to enable empirical validation and measurement. The solution will incorporate novel machine learning architectures. We expect the software component of the project to be complemented by a written report, with consideration for publication as a scientific paper. We hope to obtain evidence that validates the core approach of identifying anomalous sentiments or outcomes in an unsupervised model with minimal, practical supervision. In the medical setting, we aim to quantify clinical impacts and benefits. A second paper focusing on clinical impacts may also result.
The combination of peer-reviewed publications and tangible real-world use-cases is important for ongoing commercial development of Incubator 491 and for meeting criteria for further investment. We may also pursue full commercialization of the solution in healthcare or other domains.
The intern will receive $3,000 per month of the internship, usually in the form of stipend payments.
It is expected that the intern will primarily undertake this research project during regular business hours, spending at least 80% of their time on-site with the industry partner. The intern will be expected to maintain contact with their academic mentor throughout the internship through either face-to-face or phone meetings, as appropriate.
The intern and their academic mentor will have the opportunity to negotiate the project’s scope, milestones and timeline during the project planning stage.
24 April 2019
APR – 0904