You are currently viewing Text data cleaning in PowerBI

Text data cleaning in PowerBI

Loading

In this blog, we’ll explore how to perform text data cleaning in Power BI. With simple techniques and practical tips, you’ll ensure your data is accurate, consistent, and ready for analysis. It contains following things such as

  • Convert Text to lowercase
  • Removing URLs from the text
  • Remove User Mentions from the text
  • Remove Emoji from the text
  • Extract hashtags and make the bar plot in order to find out the insights from the data.
  • Replace contraction word with their full forms

Dataset description

The dataset contains the information about the tweets such as ID and the tweet. You can download the dataset from here.

Steps to perform Text data cleaning in PowerBI

Step 1. Open Power BI Desktop

Step 2. Click on Get Data to select Text/CSV in Home tab

Step 3. Browse csv file and hit Open

Step 4. A window will open to Transform Data and Load, click on Load

5001.PNG

Now, let’s dive into different Number functions:

1- Convert Text to lowercase:

Step 1. Go to Home tab

Step 2. Click on Transform data and select Transform data

7002.png

Step 3. Select Tweet Column

7003.PNG

Step 4. Go to Transform tab, click on Format and select lowercase

7004.PNG

Step 5. Go to Home tab and click on Close & Apply

7005.PNG

2- Remove URLs:

Step 1. Go to Home tab

Step 2. Click on Transform data and select Transform data

7002.png

Step 3. Go to Add Column tab

Step 4. Select Custom Column and fill the following:

7006.PNG

7007.PNG

Step5. Click on OK

Step 6. Go to Home tab and click on Close & Apply

7005.PNG

3- Remove User Mentions:

Step 1. Go to Home tab

Step 2. Click on Transform data and select Transform data

7002.png

Step 3. Go to Add Column tab

Step 4. Select Custom Column and fill the following:

7006.PNG

7007.PNG

Step5. Click on OK

Step 6. Go to Home tab and click on Close & Apply

7005.PNG

4- Remove Emoji:

Step 1. Go to Home tab

Step 2. Click on Transform data and select Transform data

7002.png

Step 3. Go to Home tab, Select Advanced Editor

Step 4. Type the following in Editor :

7012.PNG

Step 5. Click on Done

5- Extract hashtags and make bar plot:

Step 1. Go to Home tab

Step 2. Click on Transform data and select Transform data

7002.png

Step 3. Go to Add Column tab

Step 4. Select Custom Column and fill the following:

7006.PNG

7007.PNG

Step5. Click on OK

Step 6. Go to Home tab and click on Close & Apply

7005.PNG

Step 7. Select Stacked bar chart and fill the fields with following:

7014.PNG

7015.PNG

7016.PNG

6- Replace contraction word with their full forms:

Step 1. Go to Home tab

Step 2. Click on Transform data and select Transform data

Step 3. Go to Add Column tab

Step 4. Select Custom Column and fill the following:

let

    ReplaceShortForms = (text as text, replacements as list) as text =>

        let

            ReplaceFunction = (currentText, replacement) =>

                Text.Replace(currentText, replacement[ShortForm], replacement[LongForm]),

            Result = List.Accumulate(replacements, text, ReplaceFunction)

        in

            Result,

        ReplacementList = {

            [ShortForm = "IDK", LongForm = "I do not know"],

            [ShortForm = "don't", LongForm = "do not"],

            [ShortForm = "isn't", LongForm = "is not"]

        }

in    

    ReplaceShortForms([Tweet], ReplacementList)

Step5. Click on OK

Step 6. Go to Home tab and click on Close & Apply.

Conclusion

In this blog we have learned about how to perform various text data cleaning steps in PowerBI such as converting text to lowercase, removing emojis and URLs from the text, extracting hashtags analyzing it and replace the contraction words in the test with their full forms.

If you like the article and would like to support me, make sure to: