By Eric Fischer, Senior Product Manager, Data Solutions
Accruent Data Insights uses real data from 289 million healthcare work orders to provide independent reliability insights. In part 1 of our 5-part blog series, we explore how Accruent Data Insights helps HTM departments clean their medical equipment data through standardized nomenclature.
At some point, nearly every healthcare technology management (HTM) professional will struggle with non-standard medical equipment names in their maintenance systems. As device manufacturers merge, newly acquired hospitals integrate or the demands of everyday life as a biomedical engineer put pressure on staff to keep up, medical equipment data can quickly become a huge mess – and a daunting task to keep clean.
Accruent has developed an in-depth solution to this age-old problem. Accruent Data Insights standardizes medical device manufacturer and model names so that healthcare organizations can trust the data in their systems and be more confident about medical equipment decisions. By leveraging data pre-processing, fuzzy matching and the power of big data infrastructure, our solution helps you create a clean, useful database for your medical equipment.
The First Step: Data Preparation
To develop the solution, Accruent started with the basics. We wrote simple programs to format manufacturer and model names consistently, including converting names to upper case and removing trailing whitespace. This also meant running simple tasks at scale, such as:
- getting rid of parentheses
- removing other special characters
- changing all dashes to spaces
Then, data pre-processing got rid of extraneous, redundant or otherwise unsuitable terms. Frequent terms such as “healthcare,” “inc” or “company” were put into a formal list of stop words, which were removed from all records so they wouldn’t confuse the program that would eventually compare model names.
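The clean-up steps described above can be sketched in a few lines of Python. This is a minimal illustration, not Accruent's actual code: the function name `normalize_name` is hypothetical, and the stop-word list contains only the three examples mentioned in this post.

```python
import re

# Illustrative stop words only; a production list would be much longer.
STOP_WORDS = {"HEALTHCARE", "INC", "COMPANY"}

def normalize_name(raw: str) -> str:
    """Upper-case a name, strip special characters, and drop stop words."""
    text = raw.upper()
    text = text.replace("-", " ")            # hyphens become spaces
    text = re.sub(r"[^A-Z0-9\s]", "", text)  # drop parentheses and other special characters
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return " ".join(tokens)                  # split/join also trims and collapses whitespace

cleaned = normalize_name("Arjo-Huntleigh Healthcare, Inc. (Excel)  ")
print(cleaned)  # ARJO HUNTLEIGH EXCEL
```

Doing the hyphen replacement before stripping special characters matters in this sketch: otherwise "Arjo-Huntleigh" would collapse into a single token.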
After this first pass of cleaning up manufacturer and model names, we could then tackle the problem of non-exact matches.
Fuzzy Matching Techniques
In any medical equipment database, there are bound to be entries that are similar but not exactly the same. And while a human can typically understand the relationship at a glance, a computer program may not. So how did Accruent create a program that knows ‘HUNTLEIGH EXCEL’ is very similar to ‘ARJO HUNTLEIGH HEALTHCARE INC EXCEL’?
Accruent converted each pair of model names into a similarity score using the following similarity scoring methods:
The first method counts the ratio of matching terms. In the case above, both names contain ‘HUNTLEIGH’ and ‘EXCEL’. Remember that ‘HEALTHCARE’ and ‘INC’ are not counted because they are stop words. The ratio of matching terms to total terms becomes the similarity score.
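As a rough sketch of term-ratio scoring, the snippet below divides the number of shared terms by the number of distinct terms across both names (the Jaccard index). The post does not say exactly which ratio Accruent uses, so the formula, the function name `token_ratio` and the stop-word list are all assumptions.

```python
# Illustrative stop words only, taken from the examples in this post.
STOP_WORDS = {"HEALTHCARE", "INC", "COMPANY"}

def token_ratio(name_a: str, name_b: str) -> float:
    """Shared terms divided by all distinct terms (Jaccard index), ignoring stop words."""
    terms_a = set(name_a.upper().split()) - STOP_WORDS
    terms_b = set(name_b.upper().split()) - STOP_WORDS
    if not terms_a and not terms_b:
        return 0.0  # nothing left to compare
    return len(terms_a & terms_b) / len(terms_a | terms_b)

# Two shared terms (HUNTLEIGH, EXCEL) out of three distinct terms (plus ARJO).
score = token_ratio("HUNTLEIGH EXCEL", "ARJO HUNTLEIGH HEALTHCARE INC EXCEL")
```

Here `score` works out to 2/3: the unmatched term ‘ARJO’ is the only thing keeping the two names from a perfect match.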
The second method uses “edit distance,” the number of single-character changes needed to turn one string into another. The most common variant is Levenshtein distance. As an example, “New York Mets” is very similar to “New York Meats” because inserting a single letter makes them equal.
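For readers who want to see edit distance in action, here is a standard dynamic-programming implementation of Levenshtein distance. The `edit_similarity` helper, which scales the distance into a 0-to-1 score by dividing by the longer string's length, is one common convention and an assumption here, not something the post specifies.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]

def edit_similarity(a: str, b: str) -> float:
    """Scale edit distance into a score where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest

distance = levenshtein("New York Mets", "New York Meats")  # one inserted letter
```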
Custom Weighted Scoring
We also developed our own weighted scoring after noticing certain patterns in medical device nomenclature. For example, if two model names contain matching numbers, the score is adjusted upward, because a matching model number is a much stronger signal than a matching string in the medical device naming world.
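One way to picture the number-matching adjustment is shown below. The size of the bonus, the rule for what counts as a “number” (any token containing a digit) and the function names are all hypothetical; the post says only that matching numbers push the score upward.

```python
import re

NUMBER_BONUS = 0.2  # hypothetical weight; the real adjustment is not published

def numeric_tokens(name: str) -> set:
    """Whitespace-separated tokens that contain at least one digit, e.g. 'AC550'."""
    return {t for t in name.upper().split() if re.search(r"\d", t)}

def weighted_score(base_score: float, name_a: str, name_b: str) -> float:
    """Raise a base similarity score when both names share a number-bearing token."""
    if numeric_tokens(name_a) & numeric_tokens(name_b):
        return min(1.0, base_score + NUMBER_BONUS)  # cap the score at 1.0
    return base_score

boosted = weighted_score(0.5, "HUNTLEIGH AC550", "HUNTLEIGH FLOWTRON EXCEL AC550")
unchanged = weighted_score(0.5, "HUNTLEIGH EXCEL", "HUNTLEIGH FLOWTRON EXCEL AC550")
```

In this sketch the base score could come from either of the two methods above, or from a blend of both.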
Once the individual entries had been cleaned up, it was time to look at compiling them into a useful resource.
Creating A Medical Device Dictionary
You might be wondering what happens when the computer sees “HUNTLEIGH TECHNOLOGY INC AC550” or “HUNTLEIGH HEALTHCARE 247001.” (For those who are unfamiliar, these are the same model as the “ARJO HUNTLEIGH EXCEL” but with references to the model number and/or catalog number instead of the brand name.)
It took us some time, but we eventually created a program that learns from the massive amounts of data we had available. The program starts with one device name and continues through a lengthy process to eventually compare similarity scores of all 12 million assets. It then bundles similar model names and numbers together.
This program took months to develop and in its simplest form looked like this:
- Select the next manufacturer and model name.
- Find a similar manufacturer and model name.
- Check whether the two records' categories are similar.
- If the manufacturer and model name similarity score is high enough, bundle the records together as the same device.
- Add the item to the dictionary as an alias of the other device.
- Return to the first step and repeat with the next record.
In short, the program was designed to learn from itself.
Consider the example mentioned above. During the programmatic loop, a record with the name “HUNTLEIGH FLOWTRON EXCEL” was associated with another record called “HUNTLEIGH FLOWTRON EXCEL AC550.” At that point, the program learned that “AC550” is another term to refer to this same device. These learnings were captured in our dictionary, which is a collection of aliases of the same device.
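The loop and the “AC550” example can be illustrated with a small greedy grouping sketch. It is heavily simplified and rests on several assumptions: it uses a plain term-ratio score, a made-up threshold of 0.5, hypothetical function names, and it skips the category check entirely.

```python
def token_ratio(name_a: str, name_b: str) -> float:
    """Shared terms divided by all distinct terms across both names."""
    terms_a, terms_b = set(name_a.split()), set(name_b.split())
    if not terms_a and not terms_b:
        return 0.0
    return len(terms_a & terms_b) / len(terms_a | terms_b)

def build_dictionary(names, threshold=0.5):
    """Group each name with the first device that already has a similar alias."""
    dictionary = {}  # canonical name -> set of aliases (including itself)
    for name in names:
        for aliases in dictionary.values():
            if any(token_ratio(name, alias) >= threshold for alias in aliases):
                aliases.add(name)  # learned: another way to refer to this device
                break
        else:
            dictionary[name] = {name}  # no similar device yet, so start a new entry
    return dictionary

dictionary = build_dictionary([
    "HUNTLEIGH FLOWTRON EXCEL",
    "HUNTLEIGH FLOWTRON EXCEL AC550",
    "HUNTLEIGH AC550",
    "PHILIPS INTELLIVUE MP70",
])
```

The third record, “HUNTLEIGH AC550,” scores only 0.25 against the first name, but 0.5 against the alias that carries “AC550,” so it joins the same device. That is the sense in which earlier matches teach the dictionary new ways to recognize later ones.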
After significant testing on small use cases, Accruent decided to test the program at scale. However, the same code would have taken hundreds of days to run at that scale, so Accruent leveraged the power of big data infrastructure. We migrated the program to large servers with ample CPU and memory and rewrote the code to take advantage of parallel processing.
After developing the code and running it across our huge data footprint, we still had more challenges to overcome because the program occasionally made silly errors. To fix this, we put humans in the loop to validate the program's output. Accruent built an intuitive application, and numerous reviewers used it to validate the proposed matches quickly. Below is a snapshot of the application:
After validating the dictionary, the major efforts involved in cleaning the data were mostly completed.
Today, when Accruent needs to clean a new hospital system’s data, we run a similar version of this program to match the hospital’s data to our dictionary. For the hospital system, the process of cleaning the medical equipment data is reduced from months down to minutes.
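Conceptually, matching a new record against an existing dictionary is a lookup with a fuzzy fallback. The sketch below uses Python's standard `difflib` module as a stand-in for Accruent's own scoring; the dictionary entries, the 0.8 cutoff and the function name `map_to_dictionary` are illustrative assumptions.

```python
import difflib
from typing import Optional

# Hypothetical dictionary: known alias -> standardized device name.
ALIAS_TO_STANDARD = {
    "HUNTLEIGH FLOWTRON EXCEL": "ARJO HUNTLEIGH EXCEL",
    "HUNTLEIGH FLOWTRON EXCEL AC550": "ARJO HUNTLEIGH EXCEL",
    "HUNTLEIGH AC550": "ARJO HUNTLEIGH EXCEL",
}

def map_to_dictionary(name: str, cutoff: float = 0.8) -> Optional[str]:
    """Return the standardized name for a record, or None if nothing is close enough."""
    key = " ".join(name.upper().split())  # light normalization before lookup
    if key in ALIAS_TO_STANDARD:          # exact alias hit
        return ALIAS_TO_STANDARD[key]
    close = difflib.get_close_matches(key, list(ALIAS_TO_STANDARD), n=1, cutoff=cutoff)
    return ALIAS_TO_STANDARD[close[0]] if close else None

exact = map_to_dictionary("huntleigh  flowtron excel")
fuzzy = map_to_dictionary("HUNTLEIGH FLOWTRON EXCELL")  # typo in the hospital's data
unknown = map_to_dictionary("PHILIPS INTELLIVUE MP70")
```

Records that come back as `None` are the ones that would need review before being added to the dictionary.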
In addition, our dictionary keeps improving with each new hospital’s data that we map. We also leverage this technology to match data to AHA, Attainia, FDA and UMDNS systems. The results of all these mappings have made the Accruent Data Insights solution more accurate, complete and intelligent.
How can Accruent Data Insights help you?
When Accruent needed to normalize medical equipment asset data, we invented a whole new way for HTM professionals to keep their data clean. Now, you can support your medical device equipment planning with data-backed insights.
If you’d like to learn more, please reach out to your Account Executive and ask for a free data cleaning report card so you can see how this technology can help you. As always, you can reach us at email@example.com.
Please stay tuned for the rest of our weekly series:
- Accruent Data Insights Deep Dive Part 2 – Standardizing Medical Device Categories
- Accruent Data Insights Deep Dive Part 3 – Mining for End-of-Support Dates
- Accruent Data Insights Deep Dive Part 4 – Determining Medical Device Costs
- Accruent Data Insights Deep Dive Part 5 – Measuring Life Expectancy