In this age of data lakes and petabyte-scale databases, it is shocking how frequently I still receive data in the form of CSV, text, and Excel files. While modern-day analytics focuses on cutting-edge advances in machine learning algorithms, the day-to-day drudgery of data analysis is still a manual process of finding, compiling, and wrangling disparate data types.
For the financial analyst, data often arrives as an Excel spreadsheet, but just as often, it is a data dump into a CSV or a query into a SQL database. Sometimes, the data is arranged in a confusing layout or does not have all the requisite components for analysis. Time spent scrubbing this data is valuable time wasted for the analyst, yet at times this task is accepted as a necessary evil to be tolerated.
A solution to this common problem is actually quite accessible: Excel and Power BI have an entire set of data transformation tools that few users are aware of, named Get & Transform (formerly known as Power Query). Using its embedded extract, transform, and load (ETL) functionality enables financial analysts to seamlessly link to their data sources and get to insights quicker.
As we tee up data to load into Excel or Power BI, we usually have to perform some transformations to the data. Some examples of data manipulation would include:
In the diagram below, we see that Get & Transform performs this tedious role of pre-processing the data before it is loaded.
Why is it worthwhile learning how to use Get & Transform? Well, when I look at what I have personally used this functionality for, it has offered me a malleable set of tools for:
Generally, when I receive new data, I will explore it using Get & Transform before loading it into Power Pivot. This allows me to see what transformations might be necessary and quickly perform some pivots and groupings on the data to formulate a framework for analysis. In many cases, at this stage, I will find that I need more data, or that there are data issues. By using an Excel-based platform, I can quickly iterate with my data source to find these data anomalies.
Ultimately, the decision to stay in Excel or move the data analysis to another platform will depend upon the audience and the repeatability and distribution of the analysis. If my clients only use Excel, then I will almost always use Get & Transform to load the data, Power Pivot to perform the analysis, and Excel to product the PivotTables and charts. To the client, this will feel seamless since it is all housed within Excel.
However, if my client:
Then I will use Get & Transform solely for the initial data exploration, and then move the heavy lifting into R.
In previous versions of Excel, Power Query was an add-in that could be installed to help with ETL functions. However, in Excel 2016 and Power BI, these tools are more tightly integrated. In Excel 2016, they can be accessed through the Data tab, and then the Get & Transform Data section.
In Power BI, the functionality exists on the Home tab, in the External Data section.
In this article, my examples take place in Power BI, but the interface is almost identical to Excel’s. I will point out the differences when they arise so the tutorial should make sense to both types of users.
To assist this tutorial, I have created a few examples of sales data for a fictional retailer that sells outdoor gear and clothing. In each of these examples, the data will be produced in different ways to demonstrate realistic methods of data dumps.
As an initial example, we will see the data presented as a large data dump into a CSV file. The complicating factor is that the data is presented with multiple columns representing various stores. We ideally would like to import and transform the data into a more usable layout.
Below is a screenshot of what the raw CSV looks like:
Why would we want to change this? To take advantage of the relationship capabilities that are possible in these applications. We will see this play out further on in the discussion.
For now, let’s assume that we need to see the data as a “narrower and taller” structure, rather than a “wider and shorter” one. The first step is to load the CSV; then, we will start to “unpivot” the data.
As you can see, the final structure of the data is narrower than the initial data, and a lot longer. Another point is that, as we are clicking on different actions, the tool on the right-hand side is generating a list of applied steps used to build the query. It is important to understand that this is going on in the background, as it will be revisited later.
Get & Transform looks and behaves similarly between Power BI and Excel for the most part. However, in Excel, after clicking Close and Load, there is one additional prompt. In the figure below, we can toggle between whether we wish to load the data into:
In addition, we are also given the option of whether or not to Add this data to the Data Model. Checking this box loads the data into a Power Pivot table. If we are going to analyze the data in Power Pivot, I advise choosing Only Create a Connection and then making sure that the Add this data to the Data Model option is selected. If the data is within the Excel row limit, and we prefer to conduct our analysis in Excel, then just choose Table.
In the next clip, we will see that the reason why we formatted the data to be long and skinny is so that we are able to analyze the sales not only by store but also by region and state. To accomplish this task, we will import a table that maps each store to a region and state. We will see below that we can quickly create reports that show sales by these different groupings.
You can imagine how this type of capability for data transformation in Excel, or Power BI, can be powerfully applied to any case where we have dynamic groupings of data, such as:
While this article addresses CSV and other Excel files, Get & Transform tackles a wide range of data types. Once a query is created, it can be refreshed over time as the data changes.
In order to demonstrate Get & Transform’s ability to manipulate strings, I created another dataset that mimics a text file showing accounting transactions from a firm’s general ledger (GL).
Notice how the account number and name appear in the same string? In Power BI, we can effortlessly parse the account number and name into separate fields.
In this video, you can see that after I split the column, the tool guessed that the new left-hand side of the Account field should be a number, and it creates a “Changed Type1” step. Since we ultimately want this field as a string, we can go ahead and delete the step manually under the applied steps.
Next, we take the same data and create a chart of accounts with mappings to account categories.
Why would we go through all those steps to map a few account numbers? A real general ledger can be hundreds or even thousands of accounts. This quick mapping query, as we have shown, would scale to that level with no additional work.
Get & Transform supports many different data sources. While not an exhaustive list, below are some examples:
Personally, I have only tried about half of the connections on the above list. Each of the connectors I have used has been fairly robust; I have gotten from raw data to insights without a burdensome amount of work. Equally importantly, it serves as a validator between disparate sources of data, ensuring that the end outputs have a normalized level of quality control.
In the background, Get & Transform is generating code each time we click a button in the tool or make a selection. Below is an example of how you would access the code for the account-mapping query we created:
The code uses a functional language named M, which auto-generates for basic use cases. However, for more complicated data wrangling, we can edit and write our own code. For most cases, I will only ever make minor modifications to this code. In more complicated transformations, I may write most of the code from scratch to stage temporary tables, or for performing more complicated joins.
Excel tends to reach its limits when you try exporting more than a million rows. In the cases where I have transformed millions of rows with Get & Transform, the only way to ship out ungrouped rows is via tedious hacks or workarounds. I have also found that Get & Transform queries can be unstable to deploy to multiple users, especially if you use multiple data sources and joins. In those cases, I will always use R to deploy the duplicable data wrangling. Finally, Excel is not built for more advanced data modeling. You can perform linear regressions pretty quickly, but beyond that, you will need to use a more rigorous platform.
Having said all that, I find that Excel is what most of my clients are most comfortable with. Excel is still the most important tool in a financial analyst’s arsenal. By incorporating the Get & Transform functionality, Excel and Power BI become even more powerful through the range of data sources that they can accept.
New Posts
As a venture investor, Natasha Ketabchi spends most of her day evaluating tech startups as potential investments. In early-stage investments, Natasha...
Related posts
Burn rate is one of the simplest, yet most fundamental metrics that investors and startups focus on. It pertains to the total cash spend of the...
How do you value a startup or a scaleup? This is a question that plagues both founders and investors. Arriving at a figure all parties can agree upon...
In industries seeing stagnant growth or a negative impact from uncontrollable, outside forces, many companies are turning to corporate venture capital...
Tesla shares are up 60% over the past 12 months—performing relatively well even with the impacts of the coronavirus on the broader stock market (S&P...
Grocery delivery startup Instacart has recently been valued at $4.2 billion. In this article, Finance Expert Patrick Gaffney works backwards and uses...
To unlock growth in a company, you must first find the KPI that is core towards increasing company value. More effective knowledge of growth comes...
With venture funding having grown more than 120% in the US in the last five years, startup founders and investors alike have grown increasingly...
Revenue-based financing allows an entrepreneur to repay an investment over time, to retake ownership of their business, and give a capped return to...
The key to a successful business plan is a coherent story built around financial projections and external context.
With $936 billion of uninvested private equity capital inching down market, why do 46% to 80% of lower middle market sell-side transactions fail to...
Xem thêm - nhà cái VNQ8