Power BI – Gerhard Brueckl on BI & Data

Release of Fabric Studio v1.0

Posted on 2024-12-09 by Gerhard Brueckl — 1 Comment ↓

I am very proud to announce the first public release of Fabric Studio v1.0 – a VSCode extension that allows you to manage and develop your Fabric workspace(s). Similar to Power BI Studio, it seamlessly integrates into VSCode for increased productivity for professional developers and admins alike.

It includes a lot of different features of which the most notable are probably these:

a generic workspace browser supporting all Fabric item types and their most common API actions
a custom file system provider allowing you to modify Fabric items as if they were local
a dedicated deployment pipeline manager
an integration of the Fabric Git into VSCode source control
a VSCode Fabric notebook to run arbitrary API calls
…

Workspace Browser

The workspace browser gives you an overview of all items that currently exist in your workspaces. This includes all items that currently exist and automatically extends to new items that might get added in the future. For selected items specific entries in the context menu were added e.g. Copy SQL ConnectionString, Run Notebook, …

There is also a common set of actions that exist for every item like opening the selected item directly in the Fabric Service via your browser or copy its ID or Name.

At the top you will find icons that allow you to filter the list of workspaces, refresh the current item, edit the items (e.g. semantic models, pipelines, … see below) or open a notebook that allows you to run arbitrary calls against the Fabric REST API.

Edit Fabric Items from VSCode

Using the context menu in the Workspace Browser you can select Edit Items which will open the definition of the selected item in your VSCode Solution Explorer as a new folder. You can either do this on the workspace level, a specific item type folder (Pipelines, Notebooks, …) or on an individual item. As of now, not all items are supported – here is a list of items that are supported as of now:

Semantic Models using TMDL (.tmdl)
Reports using PBIR (.json)
Data Pipelines using JSON (.json)
Notebooks using Python (.py) or Jupyter Notebooks (.ipynb)
Spark Job Definitions using JSON (.json)
Mirrored Databases using JSON (.json)
…

This feature is implemented using VSCode Custom File System providers which makes it behave as if it were a local file system. This means you can also copy&paste or drag&drop between Fabric and your local file system – in both directions! The use-cases are unlimited here:

easily copy a semantic model or report from one workspace to another
upload the report of a local PBI Project (.pbip) to Fabric without having to also publish and overwrite the dataset
do bulk-edits on your notebooks or pipelines
…

Once you are done with your changes, you can use “Publish to Fabric” to upload them back to Fabric and make the new version available to your users.

Deployment Pipelines

Selectively deploy individual items or whole item types (multi-select!)into the next stage directly from VSCode.

Fabric Git Integration

If your Fabric workspace is linked to GIT, you can now mange it from VSCode as if it were a local repository. Stage/Unstage/Discard changes or pull the latest changes from the underlying GIT repository.

Fabric API Notebooks

As Fabric Studio is solely based on the REST APIs provided by Fabric, I also wanted to offer a way to make running arbitrary API calls as easy as possible. The main problem when it comes to REST APIs is always authentication. As the API is already authenticated in the background, we can use the same mechanisms to also run any other API calls as well. Notebooks in VSCode offer an intuitive way to to do this. Another reason for this generic way of doing API calls is that not all endpoints will be covered by the UI so it just made sense to offer this option as well.

There would be a lot more features worth being mentioned here but instead I will create short demo videos and publish them via my social media channels (Bluesky, X/Twitter, LinkedIn). So to stay up-to-date with the most recent developments, make sure to also follow me there!

The last thing I want to mention is that the whole project is 100% open source and can be used under the MIT license. The repository is currently hosted in my GitHub account: https://github.com/gbrueckl/FabricStudio. If you are interested in the project and maybe want to contribute to it, please reach out to me!

If you like Fabric Studio but are working mainly with Power BI, make sure to also check out Power BI Studio – another extension developed by me, specifically tailored towards Power BI developers and admins!

Release of Power BI Studio v2.0 (VSCode extension)

Posted on 2024-05-28 by Gerhard Brueckl — 12 Comments ↓

Due to the great feedback I have received for the first version of my VSCode extension to mange Power BI objects from within VSCode I decided to continue working on it and am finally happy to share that I am releasing a new version – v2.0!

If you already had the previous version installed in VSCode, you do not have to do anything as it will update automatically. If you are a new user, you can install it from the gallery or search for “Power BI Studio” in the VSCode extensions tab.

Besides adding some new features I also changed the name to “Power BI Studio” to make it more accessible and ease communication. While it is technically still a VSCode extension and still requires VSCode (desktop client or vscode.dev) to run, I think “Power BI Studio” is a much better term and also refers to other established Power BI tools like DAX Studio and SQL Server Management Studio which a lot of users are already familiar with.

But there are also a lot of new features – here are the most important ones:

Integration with External Tools in Power BI Desktop
Show Memory Statistics for Power BI Datasets
new config powerbi.workspaceFilter to filter workspaces
support for Fabric APIs in Power BI notebooks
a lot of bug fixes and minor improvements on existing features

Integration with External Tools in Power BI Desktop

You can now start any of your External Tools you have configured for Power BI Desktop directly from VSCode to automatically connect to an online dataset. (Seriously, how often did you have to search for the proper connection string when you wanted to used DAX Studio or Tabular Editor?!)

Besides the direct integration with Power BI Desktop External Tools you can also simply copy the connectionstring from the context menu of a dataset!

Show Memory Statistics for Power BI Datasets

When analyzing the performance of a Power BI Datasets it is very crucial to understand the memory footprint of your model. To support admins and developers alike to investigate in memory issues, I integrated the DAX queries from my fellow Microsoft Data Platform MVP Hariharan Rajendran that exposes the memory consumption into a pre-defined notebook which can easily be opened from the context menu of the dataset:

The scripts/DAX queries will be updated constantly whenever new functionality becomes available. If you have any ideas/queries you want to have included, please reach out to me!

Other features

The new config setting powerbi.workspaceFilter allows you to use a Regular Expression (RegEx) to filter the workspaces you want to be shown in the UI. This can be very useful if you are working in a large enterprise with many workspaces and you only want to work with a small subset of them. As it is a RegEx, it allows very fine-grained and also very modular filters and you can use | (=RegEx OR) to chain different conditions. To filter for all workspaces that contain Finance or are production workspaces (suffix [PROD]) you could use “Finance|\[PROD\]”

Using Power BI notebooks you can now also query Fabric REST API endpoints.as they are using the very same authentication in the background. To do so, you need to specify the full URL of the Fabric API you want to call or use the command SET API_PATH = https://api.fabric.microsoft.com upfront. As of now the Fabric API does not come with autocompletion so you need to know which endpoints you want to call. Please refer to the official Fabric REST API documentation for more details.

GET https://api.fabric.microsoft.com/v1/workspaces

There will be some more integration with Fabric in the very near future so please stay tuned!

For all other changes/bugfixes/improvements please refer to the official documentation and changelog.

Power BI Studio will still be developed as Open-Source Software (OSS) and contributors are very welcome. Also if you have any other feedback, feature requests or simply found a bug, please file a ticket in the repository.

Announcing the MS Fabric Users Slack Channel

Posted on 2024-03-27 by Gerhard Brueckl — 5 Comments ↓

Are you a Microsoft Fabric user looking to supercharge your collaboration and networking? I am excited to announce the new Slack channel MS Fabric Users which I just created, tailor-made for people who want to engage with a dynamic and supportive community around Microsoft Fabric. With this channel, I want to create a space that’s built for speed, ease, and connectivity.

Why Did I Choose Slack For Communication:

I am already using different Slack channels for other technologies I use on my daily basis (Delta Lake, Databricks Users, VSCode Development, …) and when it comes to interactive collaboration, I think Slack is much more efficient in getting answers to your immediate questions than traditional forums etc. Here are some other reasons why I think that’s the case:

1. Real-Time Communication:

Slack’s instant messaging platform means you can get answers to your questions and feedback on your projects without waiting for forum replies. Connect with peers in seconds, not minutes or hours.

2. Better Organization with Channels:

Create and participate in topic-specific channels that keep discussions focused and relevant. Whether you’re interested in theming, components, or best practices, channels make it easy to find and partake in conversations that matter to you.

3. Enhanced Collaboration:

With Slack, collaboration is not just about talking – it’s about doing. Share code snippets, files, and resources effortlessly with the drag-and-drop interface. Pair that with integrations like GitHub and Trello, and you’ve got a powerful toolkit right at your fingertips.

4. Accessibility On-the-Go:

Stay connected with the community wherever you are. The Slack mobile app provides a seamless experience, ensuring you never miss out on important discussions no matter where you work from.

5. Advanced Search Capabilities:

Slack’s powerful search function makes it easy to find relevant conversations, shared files, and announcements. No more sifting through pages of forum posts to find the information you need.

Join the Fabric Slack Community:

I believe community is key to learning and growing as a developer. By using Slack, you can enhance the way you connect, engage, and support each other as Microsoft Fabric users.

To get started, join the Slack channel today! Simply visit MS Fabric Users to sign up and dive into the conversations happening right now!

Release of Power BI-VSCode Extension

Posted on 2023-10-30 by Gerhard Brueckl — 16 Comments ↓

Download from VSCode Gallery

I am working a lot with Power BI in my daily business and there have always been a couple of things that bothered me since the very beginning. Most of this is related to the web UI and its usability, mainly that you need too many clicks to get to where you want (e.g. viewing Datasets refreshes) but also that some features are simply not exposed in the UI that are possible with the Power BI REST APIs (e.g. rebinding a report to another dataset). So I thought there must be some better way to do this and make management and usability of Power BI easier and I came up with the idea for a Visual Studio Code extension for Power BI to close this gap.

As you may know, I have already written another VSCode extension for Databricks (Databricks Power Tools) which is basically also “just” a wrapper around the various Databricks APIs but makes various features of Databricks much more accessible, especially for people that spend most of their time in a local IDE anyway and are already used to it. At this point I also want to thank my company paiqo for supporting this engagements and making all this possible!

For about a year now I have been developing the Power BI VSCode extension and it finally reached a state where I want to release it. It has been in the VSCode market place for quite some time now but was never officially released by a blog post like this. To stay up-to-date I highly recommend to follow the repository which will always be updated to include the latest features and documentation.

So what is this Power BI VSCode extension all about and how can it help me in my daily work? There are currently three core components included which all serve different purposes:

Workspace browser
Notebooks to run arbitrary API calls and DAX
TMDL editor (!)

The workspace browser allows you to access all artifacts that you have access to and run the most common API calls directly from the VSCode UI. Besides features that are also available in the web UI like taking over an artifact, triggering a refresh/viewing the history or changing parameters, this includes additional features like Rebind, Clone, Configuring Query Scale Out, Update Report Content, etc. For some features you can also use Drag&Drop instead of the context menu. For example, if you drag a report and drop it on a dataset, a popup will ask you whether you want to rebind the report to that dataset or clone the report and link the clone to the dataset!

Besides the workspace browser there is also a dedicated one for Deployment Pipelines which allows you to configure Power BI deployment pipelines and also run selective deployments directly from VSCode!

There are also UIs for Capacities and Gateways, but those are mainly for informational purposes and are read-only.

The second component of the extension are Power BI Notebooks which allow you to run any arbitrary API call . This is especially useful as not every API call can be built into the UI properly (e.g. due to too many parameters, etc.). Power BI Notebooks also support notebook magics like %dax or %cmd to run DAX queries or to set variables within the notebook. There is also intellisense/autocomplete which should help you a lot to discover and write your final API call. This also includes samples for more complex API calls like calling the Enhanced Refresh API.

To run a DAX statement via the Execute Queries API, you can simply use %dax in the first line of the notebook call and then start writing your DAX query:

The last – but definitely not least(!) – part is the just recently added TMDL (Tabular Model Definition Language) integration, which allows you to modify Power BI datasets using TMDL. If your dataset resides in a premium capacity and the XMLA endpoint is enabled for read/write mode, you can select “Edit TMDL” from the context menu of your dataset. This will add a new folder to your VSCode workspace that represents the TMDL structure of that dataset. You can now navigate the individual .tmdl files, change them and validate them. once you are happy with the changes, you can also publish your changes back to the online dataset. The .tmdl files only reside in memory for the time of your VSCode session and will be reloaded every time. If necessary, you can also force a manual reload at any time to the the most recent version from the Power BI service.

Besides this “online”-mode, you can also save the TMDL definition locally – e.g. if you want to check it into a Git repository. The same features as described above are also available for locally stored TMDL definitions. this also includes TMDL definitions generated by other tools like Tabular Editor or pbi-tools!

To ease debugging, there is also a [Go to Error] button if your TMDL is not valid which jumps directly to the faulty TMDL file and highlights the line with the error:

To make this all work, you need to have ASP.NET Core Runtime 7.0 or higher installed as described in the docs.

So whats next?

While I do have some new features already in the backlog, I am also eagerly looking forward to gather some feedback from the community to drive future developments. So if you have a feature that you want to have added to the extension, simply open a feature request in the repository.

Due to the open architecture of VSCode, the extension also integrates with/leverages all other extensions in the Power BI space. Though, there is not much available at the moment but I hope that his ecosystem grows and sooner or later there will be a language extension for DAX or TMDL that provides intellisense/autocomplete here too, or simply syntax highlighting for the very beginning.

As this is an open-source project, you can also contribute directly by creating pull requests. If you like the extension and make sure I don’t run out of coffee while continuously improving it you can also sponsor a cup of coffee for me to contribute to this extension.

Querying Power BI REST API using Fabric Spark SQL

Posted on 2023-08-31 by Gerhard Brueckl — 29 Comments ↓

Microsoft Fabric has a lot of different components which usually work very well together. However, even though Power BI is a fundamental part of Fabric, there is not really a tight integration between Data Engineering components and Power BI. In this blog post I will show you an easy and reusable way to query the Power BI REST API via Fabric SQL in a very straight forward way. The extracted data can then be stored in the data lake e.g. to create a history of your dataset refreshes, the state of your workspaces or any other information that is provided by the REST API.

To achieve this, we need to prepare a couple of things first:

get an access token to work with the Power BI REST API
expose the access token as a SQL variable
create a PySpark function to query the Power BI REST API
expose the PySpark function as a SQL user-defined function
use SQL to query the Power BI REST API

To get an access token for the Power BI REST API we can use mssparkutils.credentials.getToken and provide the OAuth audience for the Power BI REST API which would be https://analysis.windows.net/powerbi/api

pbi_access_token = mssparkutils.credentials.getToken("https://analysis.windows.net/powerbi/api")

We then need to make this token available in Fabric Spark SQL by storing it in a variable:

spark.sql(f"SET pbi_access_token={pbi_access_token}")

The next part is probably the most complex one. We need to write a Python function that runs a query against the Power BI REST API and returns the results in a standardized way. I will not go into too much detail but simply show the code. It basically queries the REST API via a GET request, checks if the result contains a property value with the results and then returns them as a list of items. Please check e.g. the GET Groups REST API call to better understand the structure of the result. The function further adds a new property to each item to make nesting of API calls easier as you will see in the final example.

import requests

# make sure to support different versions of the API path passed to the function
def get_api_path(path: str) -> str:
    base_path = "https://api.powerbi.com/v1.0/myorg/"
    base_items = list(filter(lambda x: x, base_path.split("/")))
    path_items = list(filter(lambda x: x, path.split("/")))

    index = path_items.index(base_items[-1]) if base_items[-1] in path_items else -1

    return base_path + "/".join(path_items[index+1:])

# call the api_path with the given token and return the list in the "value" property
def pbi_api(api_path: str, token: str) -> object:
    
    result = requests.get(get_api_path(api_path), headers = {"authorization": "Bearer " + token})

    if not result.ok:
        return [{"status_code": result.status_code, "error": result.reason}]

    json = result.json()

    if not "value" in json:
        return []

    values = json["value"]

    for value in values:
        if "id" in value:
            value["apiPath"] = f"{api_path}/{value['id']}"
        else:
            value["apiPath"] = f"{api_path}"

    return values

Once we have our Python function, we can make it accessible to Spark. In order to do this, we need to define a Spark data type that is returned by our function. To make it work with all different kinds of API calls without knowing all potential properties that might get returned, we use a map type with string keys and string values to cover all variations in the different APIs. As the result is always a list of items, we wrap our map type into an array type.
The following code exposes it to PySpark and also Spark SQL.

import pyspark.sql.functions as F
import pyspark.sql.types as T

# schema of the function output - an array of maps to make it work with all API outputs
schema = T.ArrayType(
    T.MapType(T.StringType(), T.StringType())
)

# register the function for PySpark
pbi_api_udf = F.udf(lambda api_path, token: pbi_api(api_path, token), schema)

# register the function for SparkSQL
spark.udf.register("pbi_api_udf", pbi_api_udf)

Now we are finally ready to query the Power BI REST API via Spark SQL. We need to use the magic %%sql to tell the notebook engine, we are running SQL code in this one cell. We then run our function in a simple SELECT statement and provide the API endpoint we want to query and a reference to our token-variable using the variable syntax ${variable-name}.

%%sql 
SELECT pbi_api_udf('/groups', '${pbi_access_token}') as workspaces

This will return a table with a single row and a single cell:

However, that cell contains an array which can be exploded to get our actual list of workspaces and their details:

%%sql
SELECT explode(pbi_api_udf('/groups', '${pbi_access_token}')) as workspace

Once you understood those concepts, it is pretty easy to query the Power BI REST API via SQL as this can also be combined with other Spark SQL capabilities like CTEs, e.g. to get a list of all datasets across all workspaces as shown below:

%%sql
WITH cte_workspaces AS (
    SELECT explode(pbi_api_udf('/groups', '${pbi_access_token}')) as workspace
)
SELECT workspace.name, workspace.id, pbi_api_udf(concat(workspace.apiPath, '/datasets'), '${pbi_access_token}') as datasets
FROM cte_workspaces

As you can see, to show a given property as a separate column, you can just use the dot-notation to reference it – e.g. workspace.name or workspace.id

There are endless possibilities using this solution, from easy interactive querying to historically persisting the state of your Power BI objects in your data lake!

Obviously, there are still some things that could be improved. It would be much more elegant to have a Table Valued Function instead of the scalar function that returns an array which needs to be exploded afterwards. However, this is not yet possible in Fabric but will hopefully come soon.

This technique can also be applied to any other APIs that expose data. The most challenging part is usually the authentication but Fabric’s mssparkutils.credentials make it pretty easy for us to do this.

Using Power BI Field Parameters to translate Data and Values

Posted on 2022-06-01 by Gerhard Brueckl — 20 Comments ↓

When building an enterprise reporting solution with Power BI, a question that always comes up is how to handle translations. Large enterprises operate in various countries where people also speak different languages. So a report should be available in all frequently used languages. Ideally, you just create a report once and then a user can decide (or it is decided for him) in which language the report is displayed.

Power BI only partially supports this scenario and the closest we could get *before field parameters* were introduced is already very well described by Chris Webb’s blog post on Implementing Data (As Well As Metadata) Translations In Power BI – a must-read if you need to deal with translations in Power BI. Another good read on the topic is the blog post Multilingual Reports in Power BI from PBI Guy.

As you will quickly realize, the translation of metadata is already pretty easy as it is baked into the engine. Unfortunately this is not the case when you need to translate actual data values (e.g. product names, …). In the multidimensional version of Analysis Services this just worked like a charm as it was also a native feature but this feature never made it to Analysis Services Tabular Models, Azure Analysis Services or Power BI.

The current approaches when it comes to data and value translations are more workarounds than actual solutions. They probably work fine for small data models and very specific use-cases but usually fall short in performance, usability or maintainability when implemented on a larger scale enterprise models.

The recently introduced Field Parameters in Power BI give us a bit more flexibility here and another potential solution to implement data and value translations in Power BI.

Here is what we want to achieve:

create a single report only
support for multiple languages – metadata and column data
only minor changes to the existing data model

How can Field Parameters help here?

Field Parameters allow you to select the columns you want to display in your report/visual on-the-fly. Based on the selection, the reporting engine decides which physical column(s) it needs to use in the query it generates and sends to the data model.
So we can create a Field Parameter for the different columns that hold the translated data values and already easily switch the language by changing the selection of our Field Parameter. This is how our Filed Parameter would be defined:

Translated ProductName = {
    ("product name", NAMEOF('DimProduct'[EnglishProductName]), 0, "en-US"),
    ("nom du produit", NAMEOF('DimProduct'[FrenchProductName]), 1, "fr-FR"),
    ("nombre de producto", NAMEOF('DimProduct'[SpanishProductName]), 2, "es-SP")
}

I did this for all the fields for which translated values are actually provided. Usually this is just a very small subset of all the available columns!

Translated MonthOfYear = {
    ("MonthName", NAMEOF('DimDate'[EnglishMonthName]), 0, "en-US"),
    ("mois de l'année", NAMEOF('DimDate'[FrenchMonthName]), 1, "fr-FR"),
    ("mes del año", NAMEOF('DimDate'[SpanishMonthName]), 2, "es-SP")
}

Translated DayOfWeek = {
    ("Day Of Week", NAMEOF('DimDate'[EnglishDayNameOfWeek]), 0, "en-US"),
    ("jour de la semaine", NAMEOF('DimDate'[FrenchDayNameOfWeek]), 1, "fr-FR"),
    ("día de la semana", NAMEOF('DimDate'[SpanishDayNameOfWeek]), 2, "es-SP")
}

As you can see, Field Parameters allow you to translate the metadata (first value) and also to define the column to use for the data values (second value, using NAMEOF() function).

To change all field parameters at once I introduced an additional 4th column that holds the culture/language of the current row which is then linked to another static DAX table that is defined as follows:

Language = DATATABLE("Culture", STRING, {{"en-US"}, {"fr-FR"}, {"es-SP"}})

Then relationships are set up between these tables:

In your report you can now simply use the column from the field parameters and add a slicer for the Language table to control which language is displayed. Note: this must be a single-select slicer as otherwise Power BI will build a hierarchy of the different languages!

Here is the final result:

(please use Full Screen mode from bottom right corner)

As you can see, we just created a single report that supports multiple languages for both, metadata and data values, allows you to easily switch between them and provides similar performance as if you would have built the report for a single language only!

There are still some open questions when it comes to translating all the labels used on the whole report which is already partially covered in the other blog posts referenced above but this approach brings us another step further to a fully translatable report.

Another nice feature of this approach is that you can also put security on top of the Language/Culture table so a user only sees exactly one row – the one with the language/culture of his choice or country. So a user would not even need to select the language but it would be selected for him automatically!
Ideally you could even use the USERCULTURE() DAX function but unfortunately this is currently not supported in the PBI service. There is already an open idea for which you can vote if this is important to you.
USERCULTUER() DAX function is now finally general available also in the service: https://powerbi.microsoft.com/en-us/blog/userculture-dax-function-now-supported-in-power-bi-premium/

The .pbix file can be downloaded here: PBI_Translations.pbix

Automating the Extraction of BIM metadata from PBIX Files using CI/CD pipelines

Posted on 2022-02-01 by Gerhard Brueckl — 26 Comments ↓

The latest updates can always be found in the

PowerBI.CICD repository

In the past I have been working on a lot of different Power BI projects and it has always been (and still is) a pain when it comes to the deployment of changes across multiple tiers(e.g. Dev/Test/Prod). The main problem here being that a file generated in Power BI desktop (.pbix) is basically a binary file and the metadata of the actual data model (BIM) cannot be easily extracted or derived. This causes a lot of problems upstream when you want to automate the deployment using CI/CD pipelines. Here are some common approaches to tackle these issues:

Use of Power BI deployment pipelines
The most native solution, however quite inflexible when it comes to custom and conditional deployments to multiple stages
Creation a Power BI Template (.pbit) in addition to your .pbix file and check in both
This works because the .pbit file basically contains the BIM file but its creation is also a manual step
Extraction of the BIM file while PBI desktop is still running (e.g. using Tabular Editor)
With the support of external tools this is quite easy, but is still a manual step and requires a 3rd party tool
Development outside of PBI desktop (e.g. using Tabular Editor)
Probably the best solution but unfortunately not really suited for business users and for the data model only but not for the Power Queries

As you can see, there are indeed some options, but none of them is really ideal, especially not for a regular business user (not talking about IT pros). So I made up my mind and came up with the following list of things that I would want to see for proper CI/CD with Power BI files:

Users should be able to work with their tool of choice (usually PBI desktop, optional with Tabular Editor or any other 3rd party tool)
Automatically extracting the metadata whenever the data model changes
Persisting the metadata (BIM) in git to allow easy tracking of changes
Using the persisted BIM file for further automation (CD)

Solution

The core idea of the solution is to use CI/CD pipelines that automatically extracts the metadata of a .pbix file as soon as it is pushed to the Git repository. To do this, the .pbix file is automatically uploaded to a Power BI Premium workspace using the Power BI REST API and the free version of Tabular Editor 2 then extracts the BIM file via the XMLA endpoint and push it back to the repository.

I packaged this logic into ready-to-use YAML pipelines for Github Actions and Azure DevOps Pipelines being the two most common choices to use with Power BI. You can just copy the YAML files from the PowerBI.CICD repository to your own repo. Then simply provide the necessary information to authentication against the Power BI service and that’s it. As soon as everything is set up correctly. the pipeline will automatically create a .database.json for every PBIX file that you upload (assuming it contains a data model) and track it in your git repository!

All further details can be found directly in the repository which is also updated frequently!

Doing relative-time Slicers properly in Power BI

Posted on 2021-09-01 by Gerhard Brueckl — 6 Comments ↓

A very common requirement for a Power BI report that I stumble across at almost all of my customers is to automatically show data for the current day/month/year when a report is opened. At first sight this seems like a very trivial problem but once you dig into the problem, you will realize that all of the common solutions out there have some disadvantages and only solve the problem partially.

So here is what we want to achieve:

Show the Current Month (or Day, or Year) by default
Works [in combination] with all other columns in the date table.
A single, easy to use slicer/filter to control the time selection and change from Current Month/Day/Year to any other value
Works with built-in time intelligence functions
Works with existing DAX measures
Works with any datamodel/report

Solutions like Relative Time Filter/Slicer, DAX or relative flags in the date table address only some points of the above list but definitely not all of them which is why I thought we need a better solution to this:

(please use full-screen mode)

We actually created a new table in our data model that is linked to the original date table. The reason why we cannot use the same table here is that the new table does not have unique date values as all dates/rows referring to our current calculations are duplicated. It has to be a many-to-one relationship with cross-filter direction set to both (even though we will only use the new table ‘Calendar_with_current’ to filter the existing table ‘Calendar’):

And that’s it basically. You can now exchange the original Calendar table with the new one to get the new “Current” values in your report. If you have time intelligence functions in place, you further need to extend them and add ALL('Calendar_with_current ') as a filter to make them work also with the new table. The old table can also be hidden now if you do not want to confuse the end users. To make a seamless switch you can further rename the tables.

I added an additional column to the table called Type that allows you to select which values you want to show – the original values (e.g. “September”), the values with “Current X” (e.g. “Current Month”), or both.Please see the second page/tab of the embedded report above.

So this raises the question how this new table can be created? To simplify this I have created a Power Query function that takes 3 parameters:

The current date table
A list of definitions of your current-values
The name of the unique date-column in your current date table (parameter 1)

The first and the third parameter should be clear, but what are the “CurrentDefinitions”?

It is basically a table which defines the relative time calculations that you want to extend your existing date table with. Here is an example:

The column Column refers to the column in which you want to create the relative date definition. The column NewValue specifies the value that you want to set for rows that match the third column Filter. The column Filter either takes a static filter expression like [RelativeMonth] = 1 (as in lines 5-8) but can also use existing M-functions and reference the existing Date-column using the placeholder <<DateColumn>> as you can see in lines 1-4.

The table can be maintained using “Enter Data” and can contain any number of rows/definitions!

For most of my scenarios this works pretty well and addresses all major problems highlighted above.

The latest Power Query function can be downloaded from my github repository: fn_DateTableWithCurrentCalculations.pq
Power BI desktop files can be downloaded from here: PBIX PBIT

Custom functions and complex return types in Power Query

Posted on 2021-07-26 by Gerhard Brueckl — 2 Comments ↓

When working with Power Query, you have probably already realized that every expression you write returns a value of a specific type. Usually this will be a primitive type like text, number, or date. (A full list of types available in Power Query can be found here: https://docs.microsoft.com/en-us/powerquery-m/m-spec-types). If for some reason the type of an expression cannot be defined, the special type *any* will be used. For sure you already encountered this when using Table.AddColumn which, by default, results in the new column being of type *any*.

To avoid this, you can use the optional fourth parameter and specify the resulting type of the expression. This can be very handy and saves you the Change Type step that usually comes afterwards.

This fourth parameter not only works for primitive types but also for complex types. If you do not specify it, the column type is again *any* even though the actual values are records:

Once you click the Expand-Button of the new MyRecord column in the table header, you will realize a short delay until the available fields are displayed. This indicates that PQ first has to evaluate the expression before it can provide you the list of fields within the record. For complex scenarios, this can take a long time and can also be avoided by explicitly specifying the type in the fourth parameter as shown below:

As you can see, PQ can now immediately display the available columns without having to evaluate the function!

This also works the very same way if you call a custom function as expression of your Table.AddColumn. But there is the caveat: If you have a function that returns a complex type, let’s say a record, you will usually want to specify the type as part of the function or within the definition of the function and not re-type it again each time you call the function.

Fortunately, there is a solution to this problem: the function Type.FunctionReturn. In combination with Value.Type you can derive the return type from the function dynamically!

So, if you have the following function:

let
    myTextFunction = () as text => "Called fn_myTextFunction!"
in
    myTextFunction

You can derive the resulting type of the function by using a combination of Type.FunctionReturn and Value.Type as shown below:

Ok, so this is already pretty cool – but what happens if your function returns a complex type like a record or a table?

You will realize that you can simply replace “as text” with “as record” and the function would now return a record – at least logically, you also need to change the actual expression:

let
    myFunction = () as record => [MyText="Called fn_MyTextFunction!"]
in
    myFunction

and then call it as before:

You will realize that now again it takes some time until the available fields are displayed indicating the function must be evaluated first. Another indicator for this to happen is the warning at the bottom and the link to “Load more”. If you think of it, this makes sense – Power Query knows that the function now returns a record, but does not know which fields the record contains and thus has to evaluate it. So how can we combine custom function that return complex types and the ability to specify the resulting type as part of the function?

The first thing that would come to your mind is to simply strong-type the return type of the function specifying each field individually, but this will result in an error:

Currently it is not supported to specify a complex type as the return type of a function – it only works with primitive types. But as you can guess, I did not start this blog post for no reason. There is a way to achieve this, even though it may not be as nice as it could and should be.

The solution here is the Type.ForFunction function which allows you to create a more specific definition of your function including the return type. This definition/type can then be applied to your original function using Value.ReplaceType:

let
    resultType = type [TextValue=text, NumValue=number],
    myFunction = (myText as text, myNum as number) => [TextValue=myText, NumValue=myNum],

    parameters = Type.FunctionParameters(Value.Type(myFunction)),
    requiredParameters = Type.FunctionRequiredParameters(Value.Type(myFunction)),

    newFunctionType = Type.ForFunction([ReturnType = resultType, Parameters = parameters], requiredParameters),

    newFunction = Value.ReplaceType(myFunction, newFunctionType)
in
    newFunction

You basically first define the final return type of the function and the function itself (lines 2 and 3). The other lines (5 to 10) take care of applying the return type to the function which can then be used in combination with the approach above to dynamically derive the return type when calling the function (using ype.FunctionReturn and Value.Type). This now allows you to specify everything that is related to the function in one place!

This is especially handy if you have a function that returns a record or a table which is re-used multiple times and the fields/columns may change over time. Using this approach allows you to only change the function and everything else is derived automatically.

Sample workbook for download: PQ Custom Function return types.pbix

Reading Delta Lake Tables natively in PowerBI

Posted on 2021-01-29 by Gerhard Brueckl — 42 Comments ↓

I also contributed the connector described in this post to the official delta.io Connectors page and repo (link). You will find the most recent updates in my personal repo which are then merged to the official repo once it has been tested thoroughly!

Working with analytical data platforms and big data on a daily basis, I was quite happy when Microsoft finally announced a connector for Parquet files back in November 2020. The Parquet file format is developed by the Apache foundation as an open-source project and has become a fundamental part of most data lake systems nowadays.

“Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem, regardless of the choice of data processing framework, data model or programming language.”

However, Parquet is just a file format and does not really support you when it comes to data management. Common data manipulation operations (DML) like updates and deletes still need to be handled manually by the data pipeline. This was one of the reasons why Delta Lake (delta.io) was developed besides a lot of other features like ACID transactions, proper meta data handling and a lot more. If you are interested in the details, please follow the link above.

So what is a Delta Lake table and how is it related to Parquet? Basically a Delta Lake table is a folder in your Data Lake (or wherever you store your data) and consists of two parts:

Delta log files (in the sub-folder _delta_log)
Data files (Parquet files in the root folder or sub-folders if partitioning is used)

The Delta log persists all transactions that modified the data or meta data in the table. For example, if you execute an INSERT statement, a new transaction is created in the Delta log and a new file is added to the data files which is referenced by the Delta log. If a DELETE statement is executed, a particular set of data files is (logically) removed from the Delta log but the data file still resides in the folder for a certain time. So we cannot just simply read all Parquet files in the root folder but need to process the Delta log first so we know which Parquet files are valid for the latest state of the table.

These logs are usually stored as JSON files (actually JSONL files to be more precise). After 10 transactions, a so-called checkpoint-file is created which is in Parquet format and stores all transactions up to that point in time. The relevant logs for the final table are then the combination of the last checkpoint-file and the JSON files that were created afterwards. If you are interested in all the details on how the Delta Log works, here is the full Delta Log protocol specification.

From those logs we get the information which Parquet files in the main folder must be processed to obtain the final table. The content of those Parquet files can then simply be combined and loaded into PowerBI.

I encapsulated all this logic into a custom Power Query function which takes the folder listing of the Delta table folder as input and returns the content of the Delta table. The folder listing can either come from an Azure Data Lake Store, a local folder, or an Azure Blob Storage. The mandatory fields/columns are [Content], [Name] and [Folder Path]. There is also an optional parameter which allows you the specify further options for reading the Delta table like the Version if you want to use time-travel. However, this is still experimental and if you want to get the latest state of the table, you can simply omit it.

The most current M-code for the function can be found in my Github repository for PowerBI: fn_ReadDeltaTable.pq and will also be constantly updated there if I find any improvement.
The repository also contains an PowerBI desktop file (.pbix) where you can see the single steps that make up for the final function.

Once you have added the function to your PowerBI / Power Query environment you can call it like this:

= fn_ReadDeltaTable(
    AzureStorage.DataLake(
        "https://myadls.dfs.core.windows.net/public/data/MyDeltaTable.delta", 
        [HierarchicalNavigation = false]), 
    [Version = 12])

I would further recommend to nest your queries and separate the access to the storage (e.g. Azure Data Lake Store) and the reading of the table (execution of the function). If you are reading for an ADLS, it is mandatory to also specify [HierarchicalNavigation = false] !
~~If you are reading from a blob storage, the standard folder listing is slightly different and needs to be changed~~.

Right now the connector/function is still experimental and performance is not yet optimal. But I hope to get this fixed in the near future to have a native way to read and finally visualize Delta lake tables in PowerBI.

After some thorough testing the connector/function finally reached a state where it can be used without any major blocking issues, however there are still some known limitations:

Partitioned tables
- ~~currently columns used for partitioning will always have the value NULL~~ FIXED!
- ~~values for partitioning columns are not stored as part of the parquet file but need to be derived from the folder path~~ FIXED!
Performance
- ~~is currently not great but this is mainly related to the Parquet connector as it seems~~
- very much depends on your data – please test on your own!
Time Travel
- ~~currently only supports “VERSION AS OF”~~
- ~~need to add “TIMESTAMP AS OF”~~
Predicate Pushdown / Partition Elimination
- ~~currently not supported – it always reads the whole table~~ FIXED!

Any feedback is welcome!

Special thanks also goes to Imke Feldmann (@TheBIccountant, blog) and Chris Webb (@cwebb_bi, blog) who helped me writing and tuning the PQ function!

Downloads: fn_ReadDeltaTable.pq (M-code)

Fabric API Notebooks

Share this:

Integration with External Tools in Power BI Desktop

Show Memory Statistics for Power BI Datasets

Other features

Share this:

Share this:

Share this:

Share this:

Share this:

Solution

Share this:

Share this:

Share this:

Share this: