Phase one outputs from development of capacities to extract health and safety insights from free-text sources

27/10/20

Data insight, Machine learning, Text mining, Working together

The text mining project, delivered as part of the LRF Discovering Safety Programme, aims to build on existing state-of-the-art text mining and natural language processing methods to develop a suite of tools and techniques for use on unstructured health and safety datasets. These tools will enable the Discovering Safety Programme to generate new health and safety insights and learning from the HSE datasets available to it. The intention is also to make the tools available for industry to use on their own datasets, for their own purposes, extending the benefits of the work undertaken on the programme.

Aims and objectives

The core aim of the Phase 1 work has been to convert the HSE reports archive into a format more amenable to collective analysis, and to demonstrate how it might be put to practical use.

Key findings

The major outcomes from this stage of the project are the tools developed. Their performance is expected to improve significantly once HSE can share larger datasets for annotation, allowing models to be trained on a more substantial document corpus. A team of nine annotators has completed the first phase of annotating RIDDOR reports, and the inter-annotator agreement has been assessed. Detailed analysis of the results has highlighted concepts and entities that are not consistently labelled (for example, confusion between Materials, Equipment and Physical Environment). Such inconsistency could reduce the performance of machine learning algorithms trained on the annotations.
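As an illustration of how agreement between several annotators can be assessed, the sketch below computes Fleiss' kappa and counts which pairs of labels annotators disagree on. The source does not say which agreement measure the project used, so Fleiss' kappa is an assumption here, chosen because it handles more than two annotators. The function names and the toy labels are hypothetical and are not project data.

```python
from collections import Counter
from itertools import combinations


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list of labels (one per annotator).

    Assumes every item was labelled by the same number of annotators (at least two).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("each item needs the same number (>= 2) of labels")

    counts = [Counter(item) for item in ratings]
    categories = sorted({label for item in ratings for label in item})

    # Observed agreement: mean proportion of agreeing annotator pairs per item.
    per_item = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement: from the overall share of each category.
    shares = [sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories]
    p_chance = sum(s * s for s in shares)

    if p_chance == 1.0:  # only one category was ever used
        return 1.0
    return (p_observed - p_chance) / (1.0 - p_chance)


def disagreement_pairs(ratings):
    """Count how often two different labels were given to the same item."""
    pairs = Counter()
    for item in ratings:
        for a, b in combinations(item, 2):
            if a != b:
                pairs[tuple(sorted((a, b)))] += 1
    return pairs


# Toy example (made-up labels, three annotators instead of nine).
toy = [
    ["Equipment", "Equipment", "Materials"],
    ["Materials", "Materials", "Materials"],
    ["Physical Environment", "Physical Environment", "Physical Environment"],
]
print(round(fleiss_kappa(toy), 3))
print(disagreement_pairs(toy).most_common())
```

A low kappa, together with a high count for a particular label pair, points to categories whose definitions may need tightening before the annotations are used for training.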

Recommendations

Planned work aims to develop tools to support two specific tasks: 1) enhanced search and retrieval of individual documents and of specific content within documents, and 2) targeted health and safety knowledge discovery, both descriptive (i.e. identifying existing knowledge within the knowledge base) and inferential (i.e. generating new knowledge through inference). The ability to group documents into clusters based on their content, and the ability to auto-label specific content, are both key to performing enhanced search and retrieval on the reports corpus and then summarising the returned content.
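To make the search and retrieval task concrete, here is a minimal sketch of ranking free-text reports against a query using TF-IDF weighting and cosine similarity. This is a generic baseline technique and not a description of the tools built by the project. The function names and the example sentences are hypothetical, invented for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(docs):
    """Return (idf, vectors): smoothed IDF weights and one TF-IDF vector per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: count * idf[t] for t, count in tf.items()})
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, docs, top_k=3):
    """Return up to top_k (doc_index, score) pairs with non-zero similarity."""
    idf, vectors = build_index(docs)
    # Query terms that never occur in the corpus carry no weight.
    q_tf = Counter(t for t in tokenize(query) if t in idf)
    q_vec = {t: count * idf[t] for t, count in q_tf.items()}
    scored = [(cosine(q_vec, vec), i) for i, vec in enumerate(vectors)]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [(i, score) for score, i in scored[:top_k] if score > 0.0]


# Made-up example reports, not real incident data.
reports = [
    "Worker fell from scaffold while fixing roof sheets",
    "Forklift struck pedestrian in warehouse loading bay",
    "Operative slipped on wet floor near scaffold base",
]
for index, score in search("scaffold fall", reports):
    print(index, round(score, 3), reports[index])
```

Note that this exact-match baseline does not connect "fall" with "fell". Gaps of that kind are one reason clustering and auto-labelling of content matter for retrieval on a reports corpus.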
