When developing notebooks in Microsoft Fabric, we have two different methods to call one notebook from another: %run and MSSparkUtils.notebook.run.
We need to decide which method to use in each situation. The best way to decide is to compare the two methods and understand their differences in detail:
| %run | MSSparkUtils.notebook.run | Notes |
|---|---|---|
| Must be alone in a notebook cell | Can be combined with regular Spark code | The ability to combine the call with other PySpark structures makes it very flexible |
| Variables are shared between the caller and the called notebook | The called notebook runs in the same pool, but the notebooks don't share variables | Sharing variables makes parameter exchange easier. MSSparkUtils, in this situation, supports only a limited set of parameter types |
| Because variables are shared, every variable created is accessible to the caller | The return value must be sent using MSSparkUtils.notebook.exit() | On one hand, managing return values can be easier when variables are shared. On the other hand, this resembles the use of global variables, which was never considered good practice because of the coupling it creates |
| The called notebook's default lakehouse is automatically replaced by the caller's default lakehouse | If the default lakehouses are different, an error occurs unless you specify the 'useRootDefaultLakehouse' parameter | Overriding the default lakehouse is a real benefit, instead of merely raising an error |
| Before execution, the called notebook is inserted as an additional cell in the caller notebook | The notebooks remain independent | This creates a difference during the debugging process |
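To make the two calling conventions and the exit-value exchange concrete, here is a sketch. The Fabric syntax is shown in comments (the notebook name "Child_Notebook" and the parameter "table_name" are hypothetical); since mssparkutils only exists inside a Fabric runtime, the runnable part below simulates the round trip with stand-in functions: the child sends a value back with notebook.exit(), and the caller receives it as the return value of notebook.run().

```python
# In Microsoft Fabric, the two call styles look roughly like this
# (illustrative; notebook name and parameters are hypothetical):
#
#   Cell containing ONLY the magic:
#       %run Child_Notebook
#
#   Anywhere inside regular PySpark code:
#       result = mssparkutils.notebook.run("Child_Notebook", 90,
#                                          {"table_name": "sales"})
#
# The simulation below mimics that second pattern outside Fabric.

def child_notebook(args: dict) -> str:
    """Stand-in for the child notebook. In Fabric, its last statement
    would be mssparkutils.notebook.exit(row_count)."""
    row_count = 42 if args.get("table_name") == "sales" else 0
    # notebook.exit() hands the value back to the caller as a string
    return str(row_count)

def simulated_run(name: str, timeout_seconds: int, args: dict) -> str:
    """Stand-in for mssparkutils.notebook.run(name, timeout, arguments)."""
    return child_notebook(args)

result = simulated_run("Child_Notebook", 90, {"table_name": "sales"})
print(result)  # the string "42", as sent by the simulated exit()
```

Note that because the exchange goes through run()/exit(), only simple serializable values travel between the notebooks, which is exactly the limitation the table mentions for parameter types.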
Conclusion
Many of the differences seem to point to an advantage for %run, but the ability to use MSSparkUtils.notebook.run in the middle of other PySpark structures is a considerable one, and it excludes %run from many scenarios.
We need to keep these differences in mind when analysing which option to use in each scenario.
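The flexibility argument can be illustrated with a sketch: because MSSparkUtils.notebook.run is an ordinary function call, it can sit inside loops and conditionals, which %run cannot. The Fabric call is shown in a comment (the notebook name "Process_Table" and the parameters are hypothetical); the runnable part uses a stand-in function to show the shape of the control flow.

```python
# In Fabric, this hypothetical loop would run one child notebook per table,
# something %run cannot do because the magic must be alone in a cell:
#
#   for table in ["sales", "customers", "orders"]:
#       mssparkutils.notebook.run("Process_Table", 300, {"table_name": table})
#
# Stand-in for mssparkutils.notebook.run, so the sketch runs outside Fabric:
def fake_notebook_run(name: str, timeout_seconds: int, arguments: dict) -> str:
    return f"processed {arguments['table_name']}"

# The call participates in normal PySpark/Python control flow:
results = [fake_notebook_run("Process_Table", 300, {"table_name": t})
           for t in ["sales", "customers", "orders"]]
print(results)
```

This is the kind of scenario where %run is excluded outright, regardless of its other advantages.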