Data is the daily bread-and-butter for any analyst. In order to provide good results you also need good data. Sometimes this data is very well prepared beforehand and you can use it as it is but it is also very common that you need to prepare and transform the data on your own. To do this Microsoft has introduced Power Query (on tool of the Power BI suite). Power Query can be used to extract, transform and load data into Excel and/or Power Pivot directly.
When using any data you usually know what the data looks like and what to expect from certain columns – e.g. a value is delivered as a number, a text contains exactly 4 characters, etc.
Though, sometimes this does not apply for all rows of that dataset and your transformation logics may cause errors because of that. In order to avoid this and still have a clean data load you need to handle those errors. Power Query offers several options to this which I will elaborate in this post.
This is the sample data I will use for the following samples:
A | B | C |
1 | 4 | AXI23 |
2 | 5 | TZ560 |
NA | 6 | UP945 |
we will perform simple transformations and type casts on this table to generate some errors:
Error-handling on row-level
This is the easiest way of handling errors. Whenever a transformation causes an error, we can simply remove the whole row from the result set:
This will generate the following result table:
A | B | C |
1 | 4 | AX312 |
2 | 5 | TZ560 |
As you can see, the third row was simply removed. This is the easiest way on how to remove errors from your result set but of course this may not be what you want as those removed rows may contain other important information in other columns! Assume you want to calculate the SUM over all values in column B. As we removed the third row we also removed a value from column B and the SUM is not the same as without the error-handling (9 vs. 15)!
Error-handling on cell-level
As we now know that column A may result in an error, we can handle this error during our transformation steps. As “NA” could not be converted to a number we see Error as result for that column. Clicking on it gives use some more details on the error itself. To handle this error we need to create a new calculated column where we first check if the value can be converted to a number and otherwise return a default numeric value:
The M-function that we use is “try <expressions to try> otherwise <default if error>” which is very similar to a try-catch block in C#. If the expression causes an error, the default will be used instead. Details on the try-expression can be found in the Microsoft Power Query for Excel Formula Language Specification (PDF) which can be found here and is a must read for everyone interested in the M language.
We could further replace our column A by the new column A_cleaned to hide this error handling transformation.
A | B | C | A_cleaned |
1 | 4 | AXI23 | 1 |
2 | 5 | TZ560 | 2 |
NA | 6 | UP945 | 0 |
Error-handling on cell-level with error details
There may also be cases where it is OK to have one of this errors but you need/want to display the cause of the error so that a power user may correct the source data beforehand. Again we can use the try-function, but this time without the otherwise-clause. This will return a record-object for each row:
After expanding the A_Try column and also the A_Try.Error column we will get all available information on the error:
A | B | C | A_Try.HasError | A_Try.Value | A_Try.Error.Reason | A_Try.Error.Message | A_Try.Error.Detail |
1 | 4 | AXI23 | FALSE | 1 | |||
2 | 5 | TZ560 | FALSE | 2 | |||
6 | UP945 | TRUE | DataFormat.Error | Could not convert to Number. | NA |
As you can see we get quite a lot of columns here. We could e.g. use A_Try.HasError to filter out error rows (similar to error-handling on row-level) or we could use it in a calculated column to mimic error-handling on cell-level. What you want to do with all the information is up to you, but in case you don’t need it you should remove all unnecessary columns.
Downloads:
Power Query Error Handling Workbook: Power Query Error Handling.xlsx