Twitter Sentiment Analysis using Data Integration Platform
Introduction
This article explains how to perform sentiment analysis on tweets from Twitter and move the processed data into SQL Server.
Steps Involved
Using the Syncfusion Data Integration Platform:
- Bring in real-time tweets with the hashtags #syncfusion and #dashboardcloud.
- Clean and process the JSON data into the required flat schema.
- Perform sentiment analysis on the tweeted text using Stanford CoreNLP.
- Move the final processed data along with the sentiment score into a SQL Database.
Using the Syncfusion Dashboard
- Create a dashboard to showcase the real-time Twitter sentiment analysis.
For steps 1 through 4, we will define a data flow in the Data Integration Platform as shown in the following image.
Step 1: Use the GetTwitter component to bring in real-time tweets with the hashtags #syncfusion and #dashboardcloud.
Ensure that you obtain the consumer key, consumer secret, access token, and access token secret from the Twitter developer site by referring to this guide. Before creating the data flow, review the following configuration, where you provide your hashtags under the Terms to Filter On property.
Step 2: Prepare the data by cleaning and processing the JSON into the required fields (attributes) using the Evaluate JSON Path and Update Attribute processors.
Evaluate JSON Path is used to extract fields such as user details, tweeted text, created date, retweet details, language, friends count, followers count, favorites count, and location.
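As an illustration of what this extraction step produces, the following standalone Python sketch pulls the same attributes from a trimmed, hypothetical tweet payload. The sample values are invented for demonstration; a real Twitter JSON document carries many more fields.

```python
import json

# A hypothetical, trimmed tweet payload (real Twitter JSON is much larger).
sample = json.loads("""
{
  "created_at": "Mon Oct 15 10:20:30 +0000 2018",
  "text": "Loving the new release! #syncfusion",
  "lang": "en",
  "user": {
    "id": 12345,
    "name": "Jane Doe",
    "screen_name": "janedoe",
    "friends_count": 150,
    "followers_count": 320,
    "location": "Seattle"
  },
  "retweet_count": 2,
  "favorite_count": 5
}
""")

# Pull out the same attributes the Evaluate JSON Path processor extracts.
row = {
    "userid": sample["user"]["id"],
    "username": sample["user"]["name"],
    "screenname": sample["user"]["screen_name"],
    "tweetedtext": sample["text"],
    "created_at": sample["created_at"],
    "language": sample["lang"],
    "friends_count": sample["user"]["friends_count"],
    "followers_count": sample["user"]["followers_count"],
    "favourite_count": sample["favorite_count"],
    "retweet_count": sample["retweet_count"],
    "location": sample["user"]["location"],
}
print(row["screenname"])  # janedoe
```

In the data flow itself, each of these values becomes a flow-file attribute rather than a dictionary entry, but the JSON paths involved are the same.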
These attributes and property names will be used as column names in the SQL table.
In the Update Attribute processor
- We use a conditional expression to fetch the exact tweeted text based on the following conditions, and then add the hashtags as an additional attribute. To learn more about tweet payloads, refer to the Twitter documentation.
- We also cleanse the data to extract created dates in a format suitable for building the dashboard.
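The conditional selection of the tweet text can be sketched in plain Python. This assumes the standard streaming-API payload shape, where a retweet nests the original tweet under retweeted_status and a tweet longer than 140 characters carries its full text under extended_tweet.full_text; the sample payload below is invented for illustration.

```python
def tweet_text(t):
    # For retweets, read the text from the original tweet.
    if "retweeted_status" in t:
        t = t["retweeted_status"]
    # Long tweets put the untruncated text under extended_tweet.full_text.
    if "extended_tweet" in t:
        return t["extended_tweet"]["full_text"]
    return t.get("text", "")

def hashtags(t):
    # Collect hashtags from the entities section into one attribute value.
    tags = t.get("entities", {}).get("hashtags", [])
    return ",".join("#" + h["text"] for h in tags)

tweet = {
    "text": "Truncated text...",
    "extended_tweet": {"full_text": "The complete tweet text #syncfusion"},
    "entities": {"hashtags": [{"text": "syncfusion"}]},
}
print(tweet_text(tweet))  # The complete tweet text #syncfusion
print(hashtags(tweet))    # #syncfusion
```

In the Update Attribute processor, the same decisions are written as NiFi expression-language conditions over the attributes extracted in the previous step.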
Step 3: Perform sentiment analysis on the processed tweet using Stanford CoreNLP.
We use the Execute Stream Command processor to run a Python script against the Stanford CoreNLP service to process the tweeted text and evaluate its sentiment (mood and score).
To configure the environment to run Python sentiment analysis script within DIP, follow these steps:
- Install the Stanford NLP package from this location.
- Start the server using the following command.
C:\<installed location>\stanford-corenlp-full-2018-10-05> java -mx5g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -timeout 10000
Here, java refers to the Java executable on the JAVA_PATH. If JAVA_PATH is already set, run the command as shown; otherwise, replace java in the command with the full path to the Java executable.
- Install the pycorenlp Python package using the following command.
pip install pycorenlp
- We use the following Python script to perform sentiment analysis. It is a basic script that uses CoreNLP's default settings to gauge sentiment.
from pycorenlp import StanfordCoreNLP
import sys
import re

nlp = StanfordCoreNLP('http://localhost:9000')

# Handle invalid characters in the input data.
text = re.sub(r'[^a-zA-Z0-9 \n.]', '', sys.argv[1])

res = nlp.annotate(text, properties={
    'annotators': 'sentiment',
    'outputFormat': 'json',
    'timeout': 1000,
})

for s in res["sentences"]:
    print("'%s', %s, %s" % (
        " ".join(t["word"] for t in s["tokens"]),
        s["sentimentValue"],
        s["sentiment"],
    ))
Save the previous code in a file named sentiment.py and provide this file's location as a command-line argument for the Execute Stream Command processor, as depicted in the following screenshot.
Command Arguments: <python script file location> <tweeted text>
Command Path: Python exe (installed location)
Step 4: Move the processed data along with sentiment score into a SQL table.
We use the following processors in this step:
- Extract Text — Extract the sentiment results from the Python script into an attribute.
- Update Attribute — Update sentiment and sentiment score attributes from the sentiment results.
- Attributes to JSON — Create JSON out of the attributes (fields) we want to track in the dashboard.
- ConvertJSONToSQL — Convert the JSON string into SQL INSERT statements.
- PutSQL — Execute the generated INSERT statements.
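Outside of DIP, the last three processors roughly amount to turning an attribute dictionary into a parameterized INSERT statement. The following Python sketch shows the idea; the column names follow the tweetssentiment table created in this step, the sample values are invented, and the pyodbc call in the comment is only one example of how the statement could be executed.

```python
# Sample attributes, standing in for the flow-file attributes in DIP.
attributes = {
    "userid": 12345,
    "screenname": "janedoe",
    "tweetedtext": "Loving the new release! #syncfusion",
    "sentiment": "Positive",
}

# Build a parameterized INSERT so values are bound, not string-concatenated.
columns = ", ".join("[%s]" % c for c in attributes)
placeholders = ", ".join("?" for _ in attributes)
sql = "INSERT INTO [dbo].[tweetssentiment] (%s) VALUES (%s)" % (
    columns, placeholders)
params = tuple(attributes.values())

print(sql)
# With pyodbc, for example: cursor.execute(sql, params)
```

In the data flow, ConvertJSONToSQL produces an equivalent parameterized statement automatically, and PutSQL executes it against the controller service's connection.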
Make sure you have created the SQL table (tweetssentiment) using the following query, and create a controller service for it in the Data Integration Platform. For more details on controller service settings, refer to our documentation.
CREATE TABLE [dbo].[tweetssentiment](
    [tweet id] [bigint] NULL,
    [userid] [bigint] NULL,
    [username] [varchar](500) NULL,
    [screenname] [varchar](500) NULL,
    [tweetedtext] [varchar](500) NULL,
    [language] [varchar](500) NULL,
    [location] [varchar](500) NULL,
    [created_at] [varchar](500) NULL,
    [hashtag] [varchar](500) NULL,
    [retweet_count] [int] NULL,
    [favourite_count] [int] NULL,
    [friends_count] [int] NULL,
    [followers_count] [int] NULL,
    [sentiment] [varchar](500) NULL,
    [sentiment score] [varchar](500) NULL
) ON [PRIMARY]
GO
The data integration workflow can run continuously by setting its schedule to "0 sec", so that it constantly looks for incoming tweets, or it can be scheduled to run at intervals.
Step 5: The final step is to create a Twitter sentiment analysis dashboard like the following in the Syncfusion Dashboard Cloud.
To learn the basics of creating a dashboard, refer to these links:
- Create business dashboards online
- Creating a sales dashboard with SQL Server and Syncfusion dashboards
You can follow the steps covered in these links to create your dashboards easily.