
This section provides programming resources useful for beginning work with ICD-10-CM coded data.
Perl Regular Expressions for ICD-10-CM Injury and Drug Overdose Indicators
A regular expression is a sequence of characters that defines a search pattern. There are different syntaxes, or “flavors”, for writing regular expressions; Perl-style syntax is among the most widely supported. The injury and drug overdose indicator definitions, operationalized as Perl regular expressions, can be used in statistical programs to identify the presence of the included ICD-10-CM codes in emergency department (ED) and hospitalization datasets.
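As a minimal sketch of this idea, the Python snippet below flags records whose diagnosis field contains a matching code. The pattern, the record layout, and the names `EXAMPLE_PATTERN`, `has_code`, and `records` are hypothetical and chosen only for illustration; the pattern is not one of the official indicator definitions. Codes are assumed to be stored without the decimal point.

```python
import re

# Hypothetical pattern for illustration only (not an official indicator
# definition): T40.1X1-T40.1X4, stored without the decimal point.
EXAMPLE_PATTERN = re.compile(r"T401X[1-4]", re.IGNORECASE)

def has_code(diagnosis_field: str) -> bool:
    """Return True if the pattern is found anywhere in the diagnosis field."""
    return EXAMPLE_PATTERN.search(diagnosis_field) is not None

# Toy records; "dx" holds space-separated diagnosis codes (assumed layout).
records = [
    {"id": 1, "dx": "T401X1A F1120"},
    {"id": 2, "dx": "S52501A W010XXA"},
    {"id": 3, "dx": "J189"},
]

# Collect the ids of records that contain a matching code.
flagged = [record["id"] for record in records if has_code(record["dx"])]
print(flagged)
```

The same search logic carries over to other tools that support Perl-style patterns; only the function used to apply the pattern changes.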
Notes about the regular expressions:
- Capture groups, written with parentheses (), are used to improve the readability of the regular expressions. Each regular expression is formatted as follows:
  - Capture group 1: the search pattern for the first 6 characters of the ICD-10-CM code
    - Where applicable, alternation (the pipe symbol |) is used within capture group 1 to combine multiple 6-character sub-expressions representing the various codes included in the indicator
    - BLUE and YELLOW colors are used to show the alternation pattern within capture group 1
  - Capture group 2: the search pattern for the 7th character of the ICD-10-CM code
    - The same 7th character inclusion criteria are generally shared by all codes included in the indicator
    - PURPLE color is used to show capture group 2
- Most injury codes in ICD-10-CM should have 7 characters. As outlined in the indicator definitions, the regular expressions include codes that are missing the 7th character (encounter type) but exclude codes truncated any further. Some codes have only 3-6 characters by design (T30-T32, Y07, Y09); this is accounted for and noted where necessary.
- The regular expressions should be updated annually to include any CMS modifications to the code set that affect the indicator definitions
- If you are using SAS, PRX functions can be used in the DATA step to harness Perl pattern matching features
- If you are using R with these regular expressions, replace each ‘\’ with ‘\\’ (R string literals require escaped backslashes) and use the options ‘perl = TRUE’ and ‘ignore.case = TRUE’ when applicable. In addition, R users can simplify capture group 2 by replacing “$|\b” with “$”
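The two-capture-group layout described in the notes above can be sketched in Python as follows. This is an illustration under stated assumptions, not one of the attached indicator definitions: the sub-expressions in `GROUP_1`, the choice of "A" as the included 7th character, and all names in the snippet are hypothetical. Codes are assumed to be stored without the decimal point.

```python
import re

# Illustrative pattern only: these sub-expressions are assumptions chosen for
# the example, not an official indicator definition.
GROUP_1 = r"(T401X[1-4]|T402X[1-4])"  # first 6 characters, joined by alternation
GROUP_2 = r"(A|$|\b)"                 # 7th character "A", or no 7th character
PATTERN = re.compile(GROUP_1 + GROUP_2, re.IGNORECASE)

# Variant that treats a missing 7th character as end-of-string only.
SINGLE_CODE_PATTERN = re.compile(GROUP_1 + r"(A|$)", re.IGNORECASE)

def is_match(pattern: re.Pattern, field: str) -> bool:
    """Return True if the pattern is found anywhere in the field."""
    return pattern.search(field) is not None

examples = [
    "T401X1A",      # full 7-character code with included 7th character
    "T401X1",       # 7th character missing: still included
    "T401X1D",      # 7th character present but not "A": excluded
    "T401X",        # truncated past the 6th character: excluded
    "t402x4a",      # lower case, matched because of IGNORECASE
    "T401X1 J189",  # 6-character code followed by another code in the field
]

results = {field: is_match(PATTERN, field) for field in examples}
for field, matched in results.items():
    print(f"{field!r}: {matched}")
```

In this sketch, `\b` is what lets a 6-character code match when other text follows it in the same field, while `$` covers a code that ends the string. `SINGLE_CODE_PATTERN` is enough when every value holds exactly one code, but it would not match the last example above.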
Attachments:
- Indicator-Specific Regular Expressions - Update 4-8-21
- Other Useful ICD-10-CM Regular Expressions - Update 10-15-20
- Answer Key