In this blog, we’ll explore how to perform text data cleaning in Power BI. With simple techniques and practical tips, you’ll ensure your data is accurate, consistent, and ready for analysis. It contains following things such as
- Convert Text to lowercase
- Removing URLs from the text
- Remove User Mentions from the text
- Remove Emoji from the text
- Extract hashtags and make the bar plot in order to find out the insights from the data.
- Replace contraction word with their full forms
Dataset description
The dataset contains the information about the tweets such as ID and the tweet. You can download the dataset from here.
Steps to perform Text data cleaning in PowerBI
Step 1. Open Power BI Desktop
Step 2. Click on Get Data to select Text/CSV in Home tab
Step 3. Browse csv file and hit Open
Step 4. A window will open to Transform Data and Load, click on Load
Now, let’s dive into different Number functions:
1- Convert Text to lowercase:
Step 1. Go to Home tab
Step 2. Click on Transform data and select Transform data
Step 3. Select Tweet Column
Step 4. Go to Transform tab, click on Format and select lowercase
Step 5. Go to Home tab and click on Close & Apply
2- Remove URLs:
Step 1. Go to Home tab
Step 2. Click on Transform data and select Transform data
Step 3. Go to Add Column tab
Step 4. Select Custom Column and fill the following:
Step5. Click on OK
Step 6. Go to Home tab and click on Close & Apply
3- Remove User Mentions:
Step 1. Go to Home tab
Step 2. Click on Transform data and select Transform data
Step 3. Go to Add Column tab
Step 4. Select Custom Column and fill the following:
Step5. Click on OK
Step 6. Go to Home tab and click on Close & Apply
4- Remove Emoji:
Step 1. Go to Home tab
Step 2. Click on Transform data and select Transform data
Step 3. Go to Home tab, Select Advanced Editor
Step 4. Type the following in Editor :
Step 5. Click on Done
5- Extract hashtags and make bar plot:
Step 1. Go to Home tab
Step 2. Click on Transform data and select Transform data
Step 3. Go to Add Column tab
Step 4. Select Custom Column and fill the following:
Step5. Click on OK
Step 6. Go to Home tab and click on Close & Apply
Step 7. Select Stacked bar chart and fill the fields with following:
6- Replace contraction word with their full forms:
Step 1. Go to Home tab
Step 2. Click on Transform data and select Transform data
Step 3. Go to Add Column tab
Step 4. Select Custom Column and fill the following:
let
ReplaceShortForms = (text as text, replacements as list) as text =>
let
ReplaceFunction = (currentText, replacement) =>
Text.Replace(currentText, replacement[ShortForm], replacement[LongForm]),
Result = List.Accumulate(replacements, text, ReplaceFunction)
in
Result,
ReplacementList = {
[ShortForm = "IDK", LongForm = "I do not know"],
[ShortForm = "don't", LongForm = "do not"],
[ShortForm = "isn't", LongForm = "is not"]
}
in
ReplaceShortForms([Tweet], ReplacementList)
Step5. Click on OK
Step 6. Go to Home tab and click on Close & Apply.
Conclusion
In this blog we have learned about how to perform various text data cleaning steps in PowerBI such as converting text to lowercase, removing emojis and URLs from the text, extracting hashtags analyzing it and replace the contraction words in the test with their full forms.
If you like the article and would like to support me, make sure to:
- 👏 Like for this article and subscribe to our newsletter
- 📰 View more content on my DataSpoof website
- 🔔 Follow Me: LinkedIn| Youtube | Instagram | Twitter