Quote of the Day
May I give you a word of advice? Next time you invade Italy, do not start at the bottom.
— General Fridolin von Senger und Etterlin, WW2 German general in Italy during WW2. This was his post-war advice to a British historian. The Allied Italian Campaign was a disaster in many ways – General von Senger und Etterlin's critique of the Allied effort was probably correct.
Introduction
Because many corporations are focused on the Microsoft Office suite of tools, businesses often require that I use Excel/Power Query for my analysis work so that they can work with the tools I develop after I am gone. Fortunately, I really enjoy using Power Query, but I find it irritating that it does not support regular expressions. I must admit that Power Query's standard functions can do a good job of extracting strings, but the process is a bit tedious. I have a large library of regular expressions for extracting email addresses, phone numbers, social security numbers, and the like that would be efficient for me to use if Power Query could run them.
Fortunately, I saw a blog post by Imke Feldmann that showed a hack that allows the use of a regular expression in Power Query. The hack involves writing a query that contains a small HTML page that calls a Javascript regex routine. This may not be the world's most efficient piece of code, but much of my work involves cleaning up relatively small dimension tables where convenience is more important than efficiency.
My M-code example is included in this workbook and is shown below. I used a US Census table that I copied from the Wikipedia for an application example. I use this table all the time for grouping US state data by region. While it is a small table that could have been cleaned up with only Power Query's editor (i.e., no coding), using regular expressions resulted in a tad shorter query.
Reg Ex Query
The Power Query query that contains the regex is shown below.
/* Modified version of regex routine documented by Imke Feldman at https://www.thebiccountant.com/2018/04/25/regex-in-power-bi-and-power-query- in-excel-with-java-script/ Input Parameters: x : String to be searched y: search regex z: replacement regex */ (x,y,z)=> let Source = Web.Page( "<script>var x="&"'"&x&"'"&";var z="&"'"&z& "'"&";var y=new RegExp('"&y&"','g'); var b=x.replace(y,z);document.write(b);</script>") [Data]{0}[Children]{0}[Children]{1}[Text]{0} in Source
Wikipedia Description of US Census Regions and Divisions
This is the text I copied from the Wikipedia.
Region 1: Northeast
Division 1: New England (Connecticut, Maine, Massachusetts, New Hampshire, Rhode Island, and Vermont)
Division 2: Mid-Atlantic (New Jersey, New York, and Pennsylvania)
Region 2: Midwest (Prior to June 1984, the Midwest Region was designated as the North Central Region.)[7]
Division 3: East North Central (Illinois, Indiana, Michigan, Ohio, and Wisconsin)
Division 4: West North Central (Iowa, Kansas, Minnesota, Missouri, Nebraska, North Dakota, and South Dakota)
Region 3: South
Division 5: South Atlantic (Delaware, Florida, Georgia, Maryland, North Carolina, South Carolina, Virginia, District of Columbia, and West Virginia)
Division 6: East South Central (Alabama, Kentucky, Mississippi, and Tennessee)
Division 7: West South Central (Arkansas, Louisiana, Oklahoma, and Texas)
Region 4: West
Division 8: Mountain (Arizona, Colorado, Idaho, Montana, Nevada, New Mexico, Utah, and Wyoming)
Division 9: Pacific (Alaska, California, Hawaii, Oregon, and Washington)
Tidied Table of US State Regions and Divisions
This is a tidy version of the Wikipedia data.
Cleaned Up Table of US States By Region and Division
Region | Division | State |
---|---|---|
Midwest | East North Central | Illinois |
Midwest | East North Central | Indiana |
Midwest | East North Central | Michigan |
Midwest | East North Central | Ohio |
Midwest | East North Central | Wisconsin |
Midwest | West North Central | Iowa |
Midwest | West North Central | Kansas |
Midwest | West North Central | Minnesota |
Midwest | West North Central | Missouri |
Midwest | West North Central | Nebraska |
Midwest | West North Central | North Dakota |
Midwest | West North Central | South Dakota |
Northeast | New England | Connecticut |
Northeast | New England | Maine |
Northeast | New England | Massachusetts |
Northeast | New England | New Hampshire |
Northeast | New England | Rhode Island |
Northeast | New England | Vermont |
Northeast | Mid-Atlantic | New Jersey |
Northeast | Mid-Atlantic | New York |
Northeast | Mid-Atlantic | Pennsylvania |
South | South Atlantic | Delaware |
South | South Atlantic | Florida |
South | South Atlantic | Georgia |
South | South Atlantic | Maryland |
South | South Atlantic | North Carolina |
South | South Atlantic | South Carolina |
South | South Atlantic | Virginia |
South | South Atlantic | West Virginia |
South | East South Central | Alabama |
South | East South Central | Kentucky |
South | East South Central | Mississippi |
South | East South Central | Tennessee |
South | West South Central | Arkansas |
South | West South Central | Louisiana |
South | West South Central | Oklahoma |
South | West South Central | Texas |
West | Mountain | Arizona |
West | Mountain | Colorado |
West | Mountain | Idaho |
West | Mountain | Montana |
West | Mountain | Nevada |
West | Mountain | New Mexico |
West | Mountain | Utah |
West | Mountain | Wyoming |
West | Pacific | Alaska |
West | Pacific | California |
West | Pacific | Hawaii |
West | Pacific | Oregon |
West | Pacific | Washington |
This hack is so hillariusly genius and opens an entire world of new possiblities :-]
Pingback: [SOLVED] Extracting data with an associated tag/unit – BugsFixing
Very, very thank you for your post!