With the help of Power Query, you can work with multiple files and sheets and be able to combine the data efficiently. When working in Excel, you might be organizing your data in different files, and in each file you might have different sheets. Whether your data is in a list or table but the structure is identical across all sheets and all files, or, you might have discrepancies in the data structure. In all cases, to analyse your data all together, you need to combine it first (even if we have differences).
In this article I show you 4 different scenarios for combining files varying according to data structure even if we have many inconsistencies.
You can download the exercise files and follow along by clicking on the button herebelow
Combining files of the same structure in a single folder
This method is fast and simple. It works fine if your data is in a list or in a table with the same name in the different files. However, if the data is in tables with different table names it will return an error and requires a more powerful way for combining them
In this example I use the folder: “01-CombineSheets” having sales data for 5 years.
Note: not only data structure is the same but also the sheet name in all source files is “Sheet1”.
In a blank Excel file, click on the Data Tab.
Get Data ► From File ► From Folder ►Navigate to the folder ► Select Combine and load (from the lower right side).
The Combine Files dialog box opens, showing a sample file.
Select Sheet1 ► Hit OK
All the files in that folder are combined together in a new worksheet. A bunch of queries are created automatically. A new column “Source.Name” is added and currency formatting is lost.
If your data is in a Table (not a list) this technique still works if the tables have the same name.
If you want to learn about the functionalities of lists and Tables, watch my tutorial: “secrets of Excel Lists and Tables You Never Heard about” by clicking on the link: https://www.youtube.com/watch?v=Kli9ph_-kTk&t=48s or by scanning the QR Code
Note: If you rename sheets differently and repeat (or if you are using Tables of different names across the files)► You get an error message ► This requires a more advanced technique.
Combining files of the same structure with Different Sheets & Table names
This method enables you to combine multiple files in a folder where data is in a list or table and even if the sheet names are different and table names are different
For this example, I will use “02-Combine Tables”.
In a blank Excel file, click on the Data Tab.
Get Data ► From File ► From Folder ►Navigate to the folder ► Transform Data
► Opens the Query Editor
Select the [Content] & [Name] columns while pressing CTRL ►Right click
► Remove Other columns
On the Ribbon click on the Add Column tab ► Custom Column ► Type a PQ function (Case sensitive)
With the above function, column headers from the different sheets will be repeated after appending the data. This requires an extra step with another Power Query function ►Table.PromoteHeaders([Data]). To avoid this extra step,we can simply add an extra argument (lowercase) to the first function that will do the job:
A new Custom column is created showing Table. Right click and remove the [Content] column.
Click on the double side pointing arrow to expand the contents of each Table (These are the Excel Objects from the source files)
Uncheck “Use Original Name as prefix” and hit OK.
Filter the [Kind] column to select the type of object whether Sheet or Table. Note PQ filter is also case sensitive.
Select the Name & Data Columns (while pressing CTRL) ► Right Click and “Remove Other Columns”.
Click on the double Side pointing arrow of the Data column to expand it ► Ok
Finally Rename columns as needed, change the Data Types then, Close and Load.
Combine files where data is not of the same structure
In a real work situation, you may have different number of sheets and different sheet names or even more or less columns in some files. In such case we use a more robust technique to deal with files, sheets, and data inconsistency.
For this exercise, I’ll be using 3 Excel files available in the folder “04- CombineDifferentColumns” in which we have sales data for different years.
The files have different sheet names and different number of columns, and we need to append them all together. Let’s have a look at the different files:
Our goal is to combine all the columns, from all the sheets, from all the files available in one folder.
In a new blank Excel file, click on the Data Tab of the ribbon then,
Get Data ► From File ► From Folder ► Navigate to the exercise folder ► Open
A window opens showing a preview of the folder contents ► Click on “Transform Data”.
On the right side, Rename the query “DifferentColumns”. That does not add a step to the applied steps.
Right click the Extension column ► Transform ►lowercase.
Filter the Extension column ► Text Filters► equals: .xlsx ►OK
Select the “Content” column and the “Name” column ► Right click ► Remove other columns.
Click on the Add Column tab ► Click on Custom Column.
I want to convert the Binary Files into Tables since Power Query cannot read the data from the binary file directly) ► Custom column
- Power Query functions are case sensitive
- The lower case “true” is for promoting headers to avoid duplicate headers in combined sources. If you do not use this argument you will need to add another step for a power Query function that does the same thing: Table.PromoteHeaders([Data])
A new column showing “Table” is added. Right click the “Content” column ► Remove
Click on the double side pointing arrow of the Extract column ► Uncheck “Use Column name as prefix” ► hit Ok
Filter the “Kind” column ► text Filters ► equals: Sheet ► OK
Select the left 3 columns (by pressing Shift) ► Right click ► Remove other columns.
- The leftmost column is the File name
- The middle column is the Sheet name
Click on the double side pointing arrow of the “Data” column to expand it ► OK
The data is now combined from all sheets in all files but only for the columns in common. The “Ranking” column and the “Profit %” column are not there.
If you look at the formula bar and read the Power Query function generated ► it hardcoded the column names.
This step is important as we plan to replace this list of column names by a complete dynamic list.
Here are the applied steps. Hit F2 and rename the last step say “Original” and the one above it “Before”
Our goal is to recall a previous step and use a Power Query function to extract a list of column names.
With the last step selected under applied steps ► Click on the fx icon to add a new step ►in the formula bar refer to “Before” by typing =Before followed by hitting enter.
A new step appears under applied steps “Custom 1” that refers to the step “Before”. You may keep the name of this new step unchanged.
The next step is to add a custom column that extracts all column names from the source data as a list. We will use the Power Query function:
On the Add column Tab► click on Custom Column and name the new column: “ColumnNames”
A new column appears showing “List”. You may preview it by clicking to the right side of the word list.
Click on the double side pointing arrow of the new column and select ► Extract to new Rows.
We get a full list of all the column names in all sheets.
Right click the “ColumnNames” column and select “Remove Duplicates”
After removing duplicates, we want to convert this column into a list of unique column names.
Right Click the “ColumnNames” column (the header) a second time and select “Drill Down” ► we get one single column named List showing all the column names from all files. Under Applied Steps, it gets the name of the custom function “ColumnNames”
Now we need to insert a new step that refers to the step named “Before”.
With the last step under applied steps selected, click on the fx icon and type
=Before ► followed by hitting enter.
A new step named “custom2” is added to the Applied Steps.
Expand the Data column by clicking on the double side pointing arrows ► hit OK. We get the same result as in the step named “Original”.
Look at the power Query function in the formula bar where the column names are hard coded, incomplete and in curly brackets.
Delete everything from the opening to the closing curly bracket, type the name of the step “ColumnNames” and hit enter ► Now we see all the columns from all the sheets from all the files.
Although it’s still a bit early to load the query, but let’s check the result at this point by going to the Home Tab ► Click on close and load
Things look good but we need to do the regular housekeeping:
In Excel click on the Data tab ► click on Queries and connections ► Opens a pane on the right side ►Right click the query name and select “Edit” to open the query Editor.
- Rename the left 2 columns: FileName & SheetName
- Review and Change the Data Types for all the columns
- Sort the SheetName column
- Close and load the consolidated files and check the result.
I noticed that currency formatting is lost but, I can easily reapply it in Excel (CTRL
+ SHIFT + 4)
Combining sheets that are totally non-identical
In this example I show you how to combine any number of Worksheets where data is totally different in Size, location and lots of undesirable values.
I will clean, transform a sample sheet as a model. Then convert the query into a function that dynamically finds the desired values. By invoking the custom function we will combine any number of worksheets then visualize the data with pivot Tables, Chart and slicer.
Watch me doing it by clicking on the link:
In this article we covered many situations for combining files and sheets, whether they are in lists or tables, and, whether they have the same structure or they are totally unidentical.
You can watch many more examples on my YouTube channel YouTube.com/OfficeInstructor.