I was working on an imported Power BI semantic model, adding some fiscal year calculations to my date table. The date table was sourced from a view in Databricks Unity Catalog. I didn’t have access to add more fields to the view, so I was adding the fields in Power Query first, with plans to request they be added to the view in the future. I got some unexpected results, which turned into a bug being logged for the ODBC code for Power Query.

If you are only analyzing data in the last 20 years, you won’t see this bug. But if you are doing long-term analysis including years before 2000, you might just run into it.

The Scenario

I encountered the bug when I was adding fiscal start of year and fiscal end of year columns. I used the following formula:
Date.AddMonths(Date.StartOfYear([The Date]), FiscalYearEndMonth - 12)

For the current year, I was getting correct results. For example, my fiscal year end month was 8, so for the date 1 April 2024, I was expecting the result to be 1 September 2023. But for years prior to 2000, the results were off by one for every date in the year except the first day of the year.

I isolated the issue to the Date.StartOfYear function. Then I created a simplified reproduction by making a view with a column of dates using the date data type in Unity Catalog. I populated the view with 6 dates, as shown below. I then created a custom column with the formula Date.StartOfYear([TestDate]).

A table in the Power Query Editor showing that it calculated the start of year for 2 Jan 1990 as 1 Jan 1991.
My Power Query custom column added to a table sourced from Databricks with 6 rows of dates producing incorrect results

Power Query could correctly calculate the year, but it could not calculate the first or last date of the year.

I tried a couple of variations on this test. Changing the data type of the TestDate column to timestamp in Unity Catalog leads to a Date/Time column in Power Query, which produces the same incorrect results. I then tried the same calculations with a view built in Azure SQL Database. This produced correct results! Manually pasting data into a table also produced correct results.

I sent this info to Microsoft, and they confirmed there is a bug in the ODBC code used to query Databricks related to calculating differences in dates. Because this custom column folded back into the query executed in Databricks, the results were calculated in the Databricks engine rather than the Power Query engine.

Issue Summary

Where: Power Query ODBC code used to query Databricks

What’s affected: A handful of date calculations including Date.StartOfYear and Date.EndOfYear when used on date values prior to 1 Jan 2000.

Impact: Low – most people are not doing analysis on data prior to 2000 using an imported table in a Power BI semantic model with a Databricks source.

Workarounds

I found 3 workarounds that can be used:

  1. Move the calculation further upstream into Databricks (this is generally a good idea and follows Roche’s Maxim of Data Transformation).
  2. Use the Table.StopFolding function to keep the date calculation from being performed in the source engine.
  3. Change the data type of the date column from the source data to Date/Time/Timezone in Power Query, then perform the date calculations.

Option 1 is the best solution, but we don’t always have control over our source data. If you can update your source table/view, please do that and don’t bother worrying about this bug.

Option 2 requires you to use the Advanced Editor to modify your M code, but it’s a quick 1-minute change. Add a new line after importing the data but before performing the date calculation. Reference the output from the previous step in the Table.StopFolding function. Change the subsequent line to reference the output of your new step.

let
    Source = Databricks.Catalogs("adb-6021692154917656.16.azuredatabricks.net", "sql/protocolv1/o/6021692154917656/1202-172413-pp03vcis", [Catalog = "", Database = ""]),
    main_Database = Source{[Name="main",Kind="Database"]}[Data],
    default_Schema = main_Database{[Name="default",Kind="Schema"]}[Data],
    datetest_View = default_Schema{[Name="datetest",Kind="View"]}[Data],
    PreventQueryFolding = Table.StopFolding(datetest_View), 
    #"Inserted Start of Year" = Table.AddColumn(PreventQueryFolding, "Start of Year", each Date.StartOfYear([TestDate]), type date),
    #"Inserted Year" = Table.AddColumn(#"Inserted Start of Year", "Year", each Date.Year([TestDate]), Int64.Type)
in
    #"Inserted Year"
Example of using the Table.StopFolding function to work around the date calculation issue

Option 2 works fine on small tables, but I wouldn’t advise disabling query folding on a very large table just to make the date math work. Hopefully, you have a nice star schema with a (somewhat small) date dimension instead of trying to do the date math in a large fact table or a large single-table model.

If you don’t feel comfortable using the Advanced Editor, changing the data type of the source date column from date or date/time to date/time/timezone in Power Query (before you do the date calculations) will achieve the same result. You can change the data type back in a subsequent step after performing the date calculations. This is my least favorite option, but it does achieve the goal and is probably more beginner-friendly than the other options. Once again, this is not ideal on a large table.

Conclusion

As I stated above, this bug is a bit obscure, only affecting older dates in data sourced from non-SQL Server ODBC connections. But I wanted to share some workarounds on the off chance someone else encountered it.

If you have run into this, I’d love to hear about it in the comments.