Combining Files with Inconsistent Headers- No Mapping Table
We have multiple files in a folder. Each file corresponds to a continent. There is a single sheet in each file having a list of countries in that continent, the capital and other data. The headers are totally inconsistent across the multiple files and I do not want to create a mapping table.
Concept
- Get Data from Folder ► Combine and Transform
- Output Query ► Delete Steps starting from Expansion.
- Create a Bookmark
- Drill Down
- Edit the M Code
- Merge Columns
File Download
You can Download the Exercise Files an follow along by clicking on the button
Steps
In a New Blank Excel File ►Data Tab
Get Data ► From File ► From Folder
Navigate to the source folder ►Ok
Click on Combine ► Combine & Transform.
The Combine File Dialog Box opens showing a Sample File from the folder. This Sample file is the First one in the folder (Africa) Clicking on the Sheet shows a preview of the data. Hit OK
The Query editor opens.
In the Queries pane, a bunch of queries and transformations are automatically created as well as the output query where the data is improperly combined. Since the transformation steps occur on the first file, only the column headers in the first file appear in the output query.
Rename the Query “Combined”
Delete the last two steps to go back to the step just before expanding.
Click in the blank space to the right side of each table to preview the data in the lower part of the editor.
I want to save this step to get back to it later. So, I will hit F2 to rename the step “Bookmark”.
Right click the “Transform File” Column and select “Drill Down”.
We get a single column named “List” ► this is a list of all the Tables storing the source data. You can preview the data by clicking in the blank.
We need to extract the column headers from all the Tables.
In the Formula bar wrap the existing step in a function that can do that:
= List.Transform(Bookmark[Transform File],each Table.ColumnNames(_))
We get a list of Lists.
By previewing one of the Lists ► it shows all the column headers from one source.
Since we have 3 files in the source folder, we need to get a List of all the headers from all the files.
To do that, we wrap the previous function in a List.Union function.
= List.Union(List.Transform(Bookmark[Transform File],each Table.ColumnNames(_)))
Under applied Steps, select the last Step ► Hit F2 to rename it “AllHeaders”.
We now need to go back to the Bookmark Step. Click on the icon to add a step. Delete “AllHeaders” and Type Bookmark and hit enter.
We get 2 columns: Source.Name & Transform File.
Click on the double side-pointing arrow of the “Transform File” column to expand. Uncheck “Use original column name as prefix” then hit OK.
The column expands and we see all the File names. However, there are many missing columns.
Replace the text between the curly brackets (including the curly brackets) by the step created earlier: “AllHeaders”. When you hit enter we get all the columns from all the tables.
Since many columns store similar items, we are going to merge them:
Select Nation (press CTRL) then select Country ►On the Transform tab ► Click on Merge Columns ► Name it “Countries”
Select Population (press CTRL) then select Residents ►On the Transform tab ► Click on Merge Columns ► Name it “Populations”
Select Main City (press CTRL) then select Capital ►On the Transform tab ► Click on Merge Columns ► Name it “Capitals”
Note the order of selection rearranges the merged column which takes the position of the last selected column.
Move the Area column by dragging to the right.
Extract the Continent name by clicking on the Source.Name column ►Transform Tab ► Extract ► Text Before Delimiter (to get rid of the file extension)► period.
Rename the column “Continent”
Hit CTRL + A to select all the columns.
On the Transform Tab ► Click on “Detect Data Type”
On the Home Tab ► Click on Close and Load
The combined data is loaded to a new worksheet. We have 58 records.
You Can watch the video Tutorial by clicking on the picture.