How to Perform Fast Twitter Sentiment Analysis Explained in the Topic of:

Khulood Nasher

Aug 3, 2020

ICE New Rules on International Students During Fall 2020 Semester

Source of the News: here

In this post, I’m going to explain how to perform a fast Twitter sentiment analysis from scratch, which is especially useful if you have been asked to practice NLP or to choose a project in sentiment analysis.

Twitter is a great place to start your search.

First, you need to find a trend from which to collect your tweets. A good way to find trends is through https://www.trendsmap.com/ or https://getdaytrends.com/united-states/, where you can search for trends in the United States or in any other country. You can also search by topic and find all the trends on that topic. If you choose the United States, for example, you can pick a state, and a map will pop up showing the latest trends there. You can also look up the hashtags trending right now or over the last 24 hours.

I wrote this article on July 6th, 2020. One of the most active trends that day was the F1 student visa, so I used this hashtag to explain NLP sentiment analysis techniques.

After you choose your trend and decide which topic to base your research on, you can scrape tweets from the Twitter API with Tweepy.

For fast scraping, you can use https://www.vicinitas.io/, which lets you collect one week of tweets and also does some useful analysis. You can download the tweets as an Excel sheet and then convert it to a CSV file.

Now I’m going to walk through my fast, basic sentiment analysis of a student visa hashtag that was trending on July 6th, 2020.

For the whole code, you can see my GitHub. We will accomplish this by completing the following tasks:

Task #1: Understand the Problem Statement

Task #2: Import libraries and datasets

Task #3: Perform Exploratory Data Analysis

Task #4: Plot the word cloud

Task #5: Create a pipeline, perform data cleaning (removing punctuation and stopwords), and perform tokenization and vectorization

Task #6: Model with Bayes Classifier and assess the model

Task #1: Understand the Problem Statement

Due to the COVID-19 pandemic, many schools, including colleges, will move to online learning in the upcoming Fall 2020 semester. Because of this shift, the U.S. Department of Homeland Security announced new rules that require international students on F-1 and M-1 visas to leave the United States if their schools will hold classes online; otherwise, they have to transfer to a school with in-person classes. In addition, visas will not be issued to enrolled international students outside the United States whose student visas have expired if their schools remain online during the Fall semester (Source of News). In this tutorial, I perform a fast sentiment analysis exploring public feeling toward the new ICE rules on international student visas.

Task #2: Import libraries and datasets

Obtain Data
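As a minimal sketch of this step, the snippet below reads tweets from a CSV export using only the standard library. The column names `text` and `sentiment`, the sample rows, and the `load_tweets` helper are assumptions made for illustration; a real export will have its own schema.

```python
import csv
import io

# Hypothetical stand-in for an exported CSV file. The column names
# "text" and "sentiment" are assumptions, not a real export schema.
SAMPLE_CSV = """text,sentiment
"International students deserve better than this rule",negative
"ICE announced new guidance for F-1 students",neutral
"Glad my university is offering in-person classes",positive
"""


def load_tweets(handle):
    """Read tweets from an open CSV file handle into a list of dicts."""
    return list(csv.DictReader(handle))


# With a real file you would pass something like:
#   open("tweets.csv", newline="", encoding="utf-8")
tweets = load_tweets(io.StringIO(SAMPLE_CSV))
print(len(tweets), "tweets loaded")
print(tweets[0]["text"], "->", tweets[0]["sentiment"])
```

Libraries such as pandas offer a one-line equivalent, but the idea is the same: one row per tweet, with a text column and a label column.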

Task #3: Perform Exploratory Data Analysis

First, visualize the distribution of tweets in each class:

We can check the number of tweets tagged neutral, negative, and positive as follows:
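One simple way to count tweets per class is sketched below. The `labels` list is made-up sample data, not the article's dataset.

```python
from collections import Counter

# Hypothetical labels; in practice these come from the label column.
labels = ["negative", "neutral", "negative", "positive", "neutral", "negative"]

counts = Counter(labels)
total = sum(counts.values())

# Print each class with its count and share, most frequent first.
for label, n in counts.most_common():
    print(f"{label:<8} {n:>3}  {n / total:.0%}")
```

A quick look at these shares shows whether the classes are imbalanced, which matters later when reading the accuracy.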

To get the length of every tweet:

And to plot the length of each tweet, measured in characters:
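The sketch below computes the character length of each tweet and groups the lengths into bins, which is the data a histogram would display. The sample tweets and the bin width of 20 are assumptions; a plotting library would draw the same bins as bars.

```python
from collections import Counter

# Hypothetical sample tweets.
tweets = [
    "Unfair.",
    "Students should not be forced to leave.",
    "International students contribute so much to our universities and communities.",
]

# Length of every tweet in characters.
lengths = [len(t) for t in tweets]

# Group lengths into fixed-width bins (0-19, 20-39, ...).
BIN_WIDTH = 20
bins = Counter((n // BIN_WIDTH) * BIN_WIDTH for n in lengths)

# Text histogram: one '#' per tweet in each bin.
for start in sorted(bins):
    print(f"{start:>3}-{start + BIN_WIDTH - 1:<3} {'#' * bins[start]}")
```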

We can collect summary statistics about the data:
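A minimal sketch of such a summary, using the standard `statistics` module on a made-up list of tweet lengths:

```python
import statistics

# Hypothetical tweet lengths in characters.
lengths = [7, 39, 58, 120, 280]

summary = {
    "count": len(lengths),
    "mean": statistics.mean(lengths),
    "std": statistics.stdev(lengths),  # sample standard deviation
    "min": min(lengths),
    "median": statistics.median(lengths),
    "max": max(lengths),
}

for name, value in summary.items():
    print(f"{name:<6} {value}")
```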

Now, let’s take a look at the longest tweet:

And let’s take a look at the shortest tweet as well:
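Both lookups can be sketched with `max` and `min` keyed on length. The tweets here are invented examples.

```python
# Hypothetical sample tweets.
tweets = [
    "Unfair.",
    "Students should not be forced to leave.",
    "International students contribute so much to our universities and communities.",
]

# Pick the tweets with the most and the fewest characters.
longest = max(tweets, key=len)
shortest = min(tweets, key=len)

print("Longest: ", longest)
print("Shortest:", shortest)
```

Reading the extremes is a cheap sanity check: a very short tweet may be just a link or a mention, which tells you something about the cleaning you will need.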

We can plot the word count distribution for both neutral and negative sentiments as follows:

From the graph above, most tweets fall between 25 and 40 words, and nearly all of them fall between 1 and 45 words. This is no surprise, since Twitter limits a message to 280 characters. In all, tweets of 1–40 words appear to cover more than 90% of the data, which makes this dataset a good training candidate.
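A coverage claim like this can be checked directly rather than read off a plot. The sketch below counts words per tweet and computes the share of tweets within a word limit; the tweets are synthetic and the `coverage` helper is a hypothetical name.

```python
# Synthetic tweets with known word counts (5, 12, 28, 33, 39, 44).
tweets = [" ".join(["word"] * n) for n in (5, 12, 28, 33, 39, 44)]

# Number of whitespace-separated words in each tweet.
word_counts = [len(t.split()) for t in tweets]


def coverage(word_counts, max_words):
    """Share of tweets that have between 1 and max_words words."""
    within = sum(1 for n in word_counts if 1 <= n <= max_words)
    return within / len(word_counts)


print(f"1-40 words: {coverage(word_counts, 40):.0%}")
print(f"1-45 words: {coverage(word_counts, 45):.0%}")
```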

Task #4: Plot the word cloud

The word cloud is a visualization of the tweet texts. We can see international students, student ban, kicking, terrible, unfair, cruel, etc., which in general reflect a negative sentiment. We still see a useless token like ’https’, which means I need to add more customized cleaning to my tweets.
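A word cloud is driven by word frequencies, so one possible fix for the stray ’https’ token is to strip links before counting. The sketch below shows that idea with a simple regular expression; the tweets, the example link, and the `strip_urls` helper are assumptions for illustration.

```python
import re
from collections import Counter

# Matches http:// and https:// links up to the next whitespace.
URL_PATTERN = re.compile(r"https?://\S+")


def strip_urls(text):
    """Remove links so tokens like 'https' never reach the word counts."""
    return URL_PATTERN.sub("", text)


# Hypothetical tweets; the t.co link is a made-up example.
tweets = [
    "Kicking out international students is cruel https://t.co/abc123",
    "This student ban is unfair to international students",
]

words = []
for tweet in tweets:
    words.extend(strip_urls(tweet).lower().split())

# These frequencies are what a word cloud would size its words by.
frequencies = Counter(words)
print(frequencies.most_common(3))
```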

Task #5: Create a pipeline, perform data cleaning (removing punctuation and stopwords), and perform tokenization and vectorization

Now let’s define a pipeline to clean up all the messages.

The pipeline performs the following: (1) remove punctuation, (2) remove stopwords.

Let’s test the newly added function:
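A minimal sketch of such a cleaning function and a quick test of it follows. The function name `clean_tweet` and the tiny stopword set are assumptions; a fuller stopword list, for example from an NLP library, would normally be used.

```python
import string

# Small illustrative stopword set (an assumption, not a complete list).
STOPWORDS = {
    "a", "an", "the", "is", "are", "to", "of",
    "and", "in", "for", "this", "that", "on",
}


def clean_tweet(message):
    """Remove ASCII punctuation, then drop stopwords; returns a token list."""
    # (1) Remove punctuation character by character.
    no_punct = "".join(ch for ch in message if ch not in string.punctuation)
    # (2) Tokenize on whitespace and remove stopwords (case-insensitive).
    return [w for w in no_punct.split() if w.lower() not in STOPWORDS]


print(clean_tweet("This rule is unfair, cruel, and terrible for students!"))
# ['rule', 'unfair', 'cruel', 'terrible', 'students']
```

Note that `string.punctuation` covers ASCII punctuation only, so curly quotes and emoji would need extra handling.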

Vectorization

Word vectorization converts text into vectors of numbers that machine learning models can use to make predictions. Here is how we get the tweets vectorized.

(1652, 3805)

This means I have 1652 rows, one per tweet, and 3805 unique words, which are my features.

(1652,)

The number of labels corresponds to the number of tweets.
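To make those shapes concrete, here is a dependency-free sketch of a bag-of-words count vectorizer on three toy tweets. The helper names `build_vocabulary` and `vectorize` are hypothetical; scikit-learn's `CountVectorizer` provides a production version of the same idea.

```python
def build_vocabulary(token_lists):
    """Map each unique token to a column index, in sorted order."""
    vocab = sorted({tok for tokens in token_lists for tok in tokens})
    return {tok: i for i, tok in enumerate(vocab)}


def vectorize(token_lists, vocabulary):
    """Turn each token list into a row of per-word counts."""
    matrix = []
    for tokens in token_lists:
        row = [0] * len(vocabulary)
        for tok in tokens:
            if tok in vocabulary:
                row[vocabulary[tok]] += 1
        matrix.append(row)
    return matrix


# Hypothetical cleaned tweets and their labels.
cleaned = [
    ["student", "ban", "unfair"],
    ["international", "student", "visa"],
    ["unfair", "unfair", "rule"],
]
labels = ["negative", "neutral", "negative"]

vocabulary = build_vocabulary(cleaned)
X = vectorize(cleaned, vocabulary)

shape = (len(X), len(vocabulary))
print(shape)        # (number of tweets, number of unique words)
print(len(labels))  # one label per tweet
```

The first shape entry always equals the number of tweets, and the second grows with the vocabulary, which is why the real dataset ends up with thousands of columns.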

Task #6: Model with Bayes Classifier and assess the model
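To show what a Bayes classifier does with word counts, here is a small from-scratch sketch of multinomial Naive Bayes with add-one (Laplace) smoothing. The class name, the training tweets, and the labels are all invented for illustration, and this is not necessarily the implementation used for the results in this post; scikit-learn's `MultinomialNB` is a common ready-made choice.

```python
import math
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Multinomial Naive Bayes over token lists, with add-one smoothing."""

    def fit(self, token_lists, labels):
        self.class_counts = Counter(labels)      # tweets per class
        self.word_counts = defaultdict(Counter)  # word counts per class
        for tokens, label in zip(token_lists, labels):
            self.word_counts[label].update(tokens)
        self.vocabulary = {t for c in self.word_counts.values() for t in c}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for tok in tokens:
                if tok not in self.vocabulary:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data (already cleaned and tokenized).
train_tokens = [
    ["unfair", "cruel", "ban"],
    ["terrible", "unfair", "rule"],
    ["great", "support", "students"],
    ["happy", "support", "universities"],
]
train_labels = ["negative", "negative", "positive", "positive"]

model = TinyNaiveBayes().fit(train_tokens, train_labels)
print(model.predict(["unfair", "ban"]))
print(model.predict(["support", "students"]))
```

Working in log space avoids underflow when many small probabilities are multiplied, and the smoothing keeps a single unseen word from zeroing out a class.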

# Classification Report:

# Plotting the Confusion Matrix:

Interpreting the Confusion Matrix and Classification Report:

The model correctly classified around 290 tweets as true negatives and 13 tweets as true positives, i.e. 303 tweets out of my test sample of 331, for an accuracy of 303/331 = 91.5%. It misclassified only 3 tweets as false negatives and 27 as false positives, i.e. 30/331 = 9% of the test sample. The classification report shows weighted-average scores of 94%, 87%, and 90%. This is good accuracy. To improve it further, I would suggest balancing the data with SMOTE and trying other models such as a neural network, XGBoost, and logistic regression.
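The arithmetic behind these figures can be sketched from the four confusion-matrix cells. The counts below are the approximate ones quoted in the text ("around 290" true negatives); as quoted they sum to 333 rather than 331, so the sketch derives the total from the counts instead of hard-coding it. Treating the smaller class as the positive one is an assumption.

```python
# Confusion-matrix cells as quoted in the text (approximate values).
tn, fp, fn, tp = 290, 27, 3, 13

total = tn + fp + fn + tp

# Share of all tweets that were classified correctly.
accuracy = (tn + tp) / total

# Of the tweets predicted positive, how many really were positive.
precision = tp / (tp + fp)

# Of the truly positive tweets, how many the model found.
recall = tp / (tp + fn)

# Harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy  {accuracy:.3f}")
print(f"precision {precision:.3f}")
print(f"recall    {recall:.3f}")
print(f"f1        {f1:.3f}")
```

With so few positive tweets, accuracy is dominated by the majority class, so per-class precision and recall give a fuller picture than the single accuracy number.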
