Calculating the Events-In-Progress is a very common requirement and many of my fellow bloggers like Chris Webb, Alberto Ferrari and Jason Thomas already blogged about it and came up with some really nice solutions. Alberto also wrote a white-paper summing up all their findings which is a must-read for every DAX and Tabular/PowerPivot developer.

However, I recently had a slightly different requirement where I needed to calculate the Events-In-Progress for Time Periods – e.g. the Open Orders in a given month – and not only for a single day. The calculations shown in the white-paper only work for a single day so I had to come up with my own calculation to deal with this particular problem.

Before we can start we need to identify which orders we actually want to count if a Time Period is selected. Basically we have to differentiate between 6 types of Orders for our calculation and which of them we want to filter or not:

Order | Definition |

Order1 (O1) | Starts before the Time Period and ends after it |

Order2 (O2) | Starts before the Time Period and ends in it |

Order3 (O3) | Starts in the Time Period and ends after it |

Order4 (O4) | Starts and ends in the Time Period |

Order5 (O5) | Starts and ends after the Time Period |

Order6 (O6) | Starts and ends before the Time Period |

For my customer an order was considered as “open” if it was open within the selected Time Period, so in our case we need to count only Orders O1, O2, O3 and O4. The first calculation you would usually come up with may look like this:

- [MyOpenOrders_FILTER] :=
- CALCULATE (
- DISTINCTCOUNT ( 'Internet Sales'[Sales Order Number] ),
- FILTER (
- 'Internet Sales',
- 'Internet Sales'[Order Date]
- <= CALCULATE ( MAX ( 'Date'[Date] ) )
- ),
- FILTER (
- 'Internet Sales',
- 'Internet Sales'[Ship Date]
- >= CALCULATE ( MIN ( 'Date'[Date] ) )
- )
- )

We apply custom filters here to get all orders that were __ordered on or before the last day__ and were also __shipped on or after the first day__ of the selected Time Period. This is pretty straight forward and works just fine from a business point of view. However, performance could be much better as you probably already guessed if you read Alberto’s white-paper.

So I integrate his logic into my calculation and came up with this formula (Note that I could not use the final Yoda-Solution as I am using a DISTINCTCOUNT here):

- [MyOpenOrders_TimePeriod] :=
- CALCULATE (
- DISTINCTCOUNT ( 'Internet Sales'[Sales Order Number] ),
- GENERATE (
- VALUES ( 'Date'[Date] ),
- FILTER (
- 'Internet Sales',
- CONTAINS (
- DATESBETWEEN (
- 'Date'[Date],
- 'Internet Sales'[Order Date],
- 'Internet Sales'[Ship Date]
- ),
- [Date], 'Date'[Date]
- )
- )
- )
- )

To better understand the calculation you may want to rephrase the original requirement to this: “An open order is an order that was open on at least one day in the selected Time Period”.

I am not going to explain the calculations in detail again as the approach was already very well explained by Alberto and the concepts are the very same.

An alternative calculation would also be this one which of course produces the same results but performs “different”:

- [MyOpenOrders_TimePeriod2] :=
- CALCULATE (
- DISTINCTCOUNT ( 'Internet Sales'[Sales Order Number] ),
- FILTER (
- GENERATE (
- SUMMARIZE (
- 'Internet Sales',
- 'Internet Sales'[Order Date],
- 'Internet Sales'[Ship Date]
- ),
- DATESBETWEEN (
- 'Date'[Date],
- 'Internet Sales'[Order Date],
- 'Internet Sales'[Ship Date]
- )
- ),
- CONTAINS ( VALUES ( 'Date'[Date] ), [Date], 'Date'[Date] )
- )
- )

I said it performs “different” as for all DAX calculations, performance also depends on your model, the data and the distribution and granularity of the data. So you should test which calculation performs best in your scenario. I did a simple comparison in terms of query performance for AdventureWorks and also my customer’s model and results are slightly different:

Calculation (Results in ms) | AdventureWorks | Customer’s Model |

[MyOpenOrders_FILTER] | 58.0 | 1,094.0 |

[MyOpenOrders_TimePeriod] | 40.0 | 390.8 |

[MyOpenOrders_TimePeriod2] | 35.5 | 448.3 |

As you can see, the original FILTER-calculation performs worst on both models. The last calculation performs better on the small AdventureWorks-Model whereas on my customer’s model (16 Mio rows) the calculation in the middle performs best. So it’s up to you (and your model) which calculation you should prefer.

The neat thing is that all three calculations can be used with any existing hierarchy or column in your Date-table and of course also on the Date-Level as the original calculation.

nice piece of work, Gerhard! Keep up the great work and contribution to the community! Happy Holidays!

Thanks for sharing!

I found that in the real world (hard to see in Adventure Works) you can achieve a good optimization by applying a filter on a low-cardinality column in the fact table. For example, if you are filtering a range of dates and the maximum distance between Order Date and Ship Date is 60 days, you might exclude all the “other” years. Depending on data distribution, this type of optimization can be very effective (you should remove at least 80-90% of rows in order to see the improvement).

Hey Marco,

I am aware that it is very beneficial in terms of performance if the distance between the two used dates (Order Date and Ship Date) is low but what exactly do you mean by “excluding all the ‘other’ years”?

I guess what you wanted to point out is that if e.g. August 2014 is selected and the maximum distance is 60 days the calculation only needs to consider June 2014 until October 2014 which would improve performance significantly

-gerhard

Yes exactly that!

Hey Gerhard,

this is an awesome piece of work, but it seems I have some trouble to use.

In my scenario I use an date table that is not related to the facttable, using all of your 3 solutions provide the same results (gladly).

This is something like an aggregated distinctcount over time …

Unfortunately this does not work with PowerView due to the lack of the relationship. As soon as I create the relationship I no longer have my (aggregated distinct count over time). Do you have an idea how I can solve my problem?

Kind regards and once again thanks for sharing

Tom

Hi Tom,

what exactly do you mean by “it does not work with PowerView”?

I guess you are referring to dynamic filtering? e.g. clicking on a bar-chart?

As soon as you have an active relationship between the Fact-table and the Date-table you would need to adopt the calculations. Otherwise you would have two relationships, the active one and the one that is created on the fly using FILTER/CONTAINS

-gerhard

Hey Gerhard,

in my first solution there was no relationship between the date table and the fact table this worked smoothly using the generate/contains approach.

Until PowerView reminded me that it needs a relationship, this was the reason why I wrote “it does not work in PowerView” 🙂

Now I have two date dimensions one is related (the normal date table) and one that is not related (even not inactive), I call it date_calc.

I adjusted the Generate/Contains approach a little and voila. I’m glad that there are some days off ahead, so that there is some time to optimize my solution.

Now I’m able to track the development (evolution) of things within a certain timeframe using a status, where the things reach a status (hopefully) or stay at their first status (the do not evolve).

Once again, thanks for your post! It saved me a lot of thinking 🙂

Hey Gerhard,

Thank you for this great post. I have a similar but slightly different question:

What would I do If i need to count the number of sales orders that have been ordered but not delivered on a single day?

Thank you!

Wesley

Hi Wesley,

I think this is very similar to what Chris Webb discussed in his post: http://blog.crossjoin.co.uk/2013/06/13/a-new-events-in-progress-dax-pattern/

and what is also used as a starting point for this blog post.

If your request is simply to find orders which were ordered but not delivered on the very same day, you could imply use this formula:

SoldButNotShippedSameDay:=CALCULATE(

DISTINCTCOUNT(‘Facts'[OrderID]),

FILTER(‘Facts’,

‘Facts'[OrderDate] <> ‘Facts'[ShipDate])

)

hope that helps to solve your issue

-gherard

Hi Gherard,

I have a similar problem to Wesley. The problem I am encountering is that I am receiving a time period from the report and would like to evaluate if an order was shipped but not received on the first day of that date range.

Thanks!

Nishka

Hi Nishka,

do you mean that if you select July, you want to find orders that have been shipped on July 1st but not received on July 1st?

or shipped on July 1st but not received in the whole time-period of the report?

-gerhard

Yes I mean the first scenario, if the selection is June I want to check if on June 1st the order was shipped but not received

sorry for the delay but I was on vacation.

so for orders that have been shipped on June 1st but not received on

a) June 1st

b) until June 31st (last day of selection)

c) until now (not received at all)

which one is correct for you?

Hello,

I have utilized the dax code behind this idea in an effort to breakdown open case age by month. Had to make a slight modification in terms of Min/Max and the inequality signs to pull the total open cases at the end of the month:

Open Cases = CALCULATE( DISTINCTCOUNT(DocWebTouchPoints_dw_Incident[IncidentID]),

FILTER( DocWebTouchPoints_dw_Incident, DocWebTouchPoints_dw_Incident[ArriveDateTime] CALCULATE(MAX(‘Date Table'[Date Key]))))

However, I am still getting total cases that arrived prior to end of month AND closed within the last month. (There is no relationship between fields used.)

Would greatly appreciate your assistance or a redirect to a webpage resource as this has been boggling my mind for a few days now.

Woops, this is the code:

Open Cases = CALCULATE( DISTINCTCOUNT(DocWebTouchPoints_dw_Incident[IncidentID]),

FILTER( DocWebTouchPoints_dw_Incident, DocWebTouchPoints_dw_Incident[ArriveDateTime] CALCULATE(MAX(‘Date Table'[Date Key]))))

Hi Vanessa,

it seems that the operator (“=”) got lost in your comment – assuming that it was an equal sign as in your other comment. then the calculation would simply return all distinct incidents which ARRIVED on the last day of the selected time period.

you would need to filter on both, Arrival-Date and Close-Date.

You should first check out the articles that i linked to at the beginning of the blog and make your calculation work for a single day – then you can adopt it accordingly as described here

kind regards,

-gerhard

Open Cases SR = CALCULATE ( DISTINCTCOUNT(DocWebTouchPoints_dw_Incident[IncidentID]),

FILTER(DocWebTouchPoints_dw_Incident, DocWebTouchPoints_dw_Incident[ArriveDateTime] = CALCULATE(MAX(‘Date Table'[Date Key])) && DocWebTouchPoints_dw_Incident[Type] = “SERVICE REQUEST”))

Hi Gerhard,

I have a similar situtation where user wants to select a date and the report has to find active ppl that point of time. I have only one table which has the start date and end date. And I have a time dimension. I don’t know which column should I join the date dimension to (if i should). As until I join ssas tabular won’t find a unique column.

Kindly share your expert suggestions.

Best Regards

GS

Hi GS,

the links at the very beginning of the post should cover your problem, this post is just an advanced version which also allows you to select a time range (e.g. months or weeks)

In general, you will not have an (active) relationship between your tables as relationships always require you to link two columns directly but do not allow “between”-joins. Thats why we need to use FILTER() and other functions to create this join on the fly.

but there also seems to be an issue with your Date-Table as it should contain unique dates and you should be able to link it to your fact-table – but again, for all Events-In-Progress-Calculations you must not have any active relationships between your tables

hth,

-gerhard

Dear Gerhard

I am new to this blog, my apologies, if I am too direct.

I have a problem for which I am unable to find a solution. I was wondering if you could give some solution, would be really appreciative.

I am working in excel powerpivot, have slight knowledge of DAX.

I have a table in which I have userIDs, and their level play details. The details include level no, start time and end time of each level. I want to calculate the session count and approx duration for the for each session.

The sessions will be ofcourse for each individual user ID.

A Session will/can include multiple levels, I am taking all levels that start within 5 minutes of completion of previous level as one sesssion.

Waiting for any and all guidance.

Best Regards/Aamer

Hi Aamer,

you can probably use the very same formula as I used in my example. You dont even need to take care of your +/- 5 mintues in between levels as you can do a distinct count on UserID and it does not matter whether he played 7 levels or just 1 level.

The average session duration will probably be more tricky. You would need a session-id or something, calculating this on-the-fly might not be feasible in DAX. The only thing I could think of is to create an artificial parent-child hierarchy between the different levels of the user and use the first part as the session-identifier

kind regards,

-gerhard

ok, so I created a little sample-file where I ceate an artificial SessionID calculated column based on start/end-time and the +/-5 minutes rule

you can then use this to drive all your calculations

the file can be found here

http://files.gbrueckl.at/blog/Events-In-Progress_9F14/Sessions.xlsx

let me know if you have any further questions!

-gerhard

Thanks Gerhard

Much appreciated, solved my problem

Best regards

Aamer

Gerhard,

Thanks for this write-up. This is a very common business problem, could be orders or subscriptions like I am working on now. All customers has subscriptions with FromDate and ToDate on them, one row for each subscription, with an ID for the subscription which in addition to the customerid is the row identifier :

CustomerID SubscriptionID FromDate ToDate

1000000 1 05/01/2016 16/03/2016

1000000 2 16/03/2016 NULL

1000001 1 27/05/2013 NULL

………..

I have a couple of million of these in my fact table.

So, counting customers/subscriptions has already been solved thanks to Alberto’s and your article here. However, I would also like to use this table to calculate Churn/Retention etc. for our customers. The new and returning customers pattern may be applicable but that is based on sales transactions and in this case that is not good enough, since customers are billed monthly/quarterly etc. So, to get Churn I need to know which customers did I have last period (month if that is selected, year if that is selected) that I don’t have any longer.

Any thoughts on how to proceed here?

Hi Stian,

thats actually a very interesting question!

So you want to know customers which existed in the past but do not exist anymore in the current period. As we cannot count something that does not exist anymore in the current period, we need to do the calculation the other way round – find active customers in the previous period and check if they still have a valid subscription in the current one.

Unfortunately I do not have too much time at the moment so I can just give you a littel snippet from which you can start working:

Churn:=COUNTROWS(),

FILTER(

ADDCOLUMNS(

CALCULATETABLE(

SUMMARIZE(

Subscriptions,

Subscriptions[CustomerID],

Subscriptions[SubscriptionID]),

“HasNextSubscription”, CALCULATE(COUNTROWS(Subscriptions), Subscriptions[SubscriptionID] > EARLIER(Subscriptions[SubscriptionID])) > 0)

[HasNextSubscription] = TRUE())

There are probably some typos in there but you should get an idea:

Find all customers with their SubscriptionIDs for the previous period

Check if they have any newer SubscriptionsIDs (which I assume is an ascending number?)

if they dont have any newer SubscriptionIDs, they churned

you probably need to add an extra check on the subscription validity and if it includes TODAY to avoid that all todays customers are counted as churn 😉

hope that helps

would be great if you can keep me updated on this!

-gerhard

If you are using SSAS 2016 or PowerBI, you might also take a look at the new set functions like EXCEPT in combination with DAX Variables.

You could calculate a list of customers of the previous period and use EXCEPT to remove the cutsomers of the current period so only the lost customers are left. then you can do a simple COUNTROWS and would also get your results

would also be interesting in terms of performance

another approach would be to flag the last SubscriptionID of each customer in a calculated column as = partitioned by customer

having this flag you can simply extend your existing calculation with [Flag] = TRUE() to get only churned customers

Hi,

I have a very similar case with retail networ where each location of stores across the country has a unique ID. Location are taken over from one company to another (eg. retail chain acquires small shop run by a private owner). Start date and end date of store operations are available. I need to calculate how many time location was taken over during in a selected time period. It would be grate to apply the solution of churning subscribers to my case as well.

I hope Gerhard will help for Stian to go forward with the churn pattern!

Paweł