apache spark - Features for sentiment analysis of Twitter data related to music


I need guidance on sentiment analysis of music-related tweets on Spark.

I am trying to perform sentiment analysis on Twitter data, specifically tweets related to music. After a lot of searching around the net, I have understood how to fetch tweets using the 'tweepy' Python API, and realized that I can use a 'naive Bayes classifier' to classify the tweets. I am confused about how to define the features for classification, as I am supposed to define at least 500 features. Here are my questions. I do not want to use an available API such as 'textblob' to find the sentiment of a tweet.

1) Can you give examples of features I can use for classifying music-related tweets? [Can I use tweets with a happy smiley as the positive training set? If so, would the words in those tweets be the features for the classifier?]

2) How do I generate the training set for the classifier?

3) If I want to filter the tweets down to music-related tweets, can I use a Bloom filter to achieve this?

4) What size of data can I get through the tweepy API?

Please correct me if there is anything wrong in my understanding.

Since sentiment analysis is a supervised task, you should have a training (and test) set. On the training set, you need labels (in the case of sentiment analysis: positive, negative) given by humans (often called specialists). There is no magic number of instances for the training set (I have worked with 1.5k records). In case you need scientific evidence, you should analyze the mean squared error (MSE) of the model as a function of the size of the training set.
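To make the "error as a function of training set size" idea concrete, here is a minimal sketch in plain Python. It is only an illustration under assumptions: the tiny multinomial naive Bayes, the function names (`train_nb`, `predict_nb`, `learning_curve`) and the six toy tweets are all made up for this example, and labels are encoded as 1 (positive) and 0 (negative). With 0/1 labels the MSE is the same as the misclassification rate.

```python
import math
from collections import Counter


def train_nb(examples):
    """Fit a tiny multinomial naive Bayes on (tokens, label) pairs."""
    label_counts = Counter()
    word_counts = {}
    vocab = set()
    for tokens, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab


def predict_nb(model, tokens):
    """Return the label with the highest log-posterior (Laplace smoothing)."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        n_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total)
        for tok in tokens:
            if tok in vocab:  # words never seen in training are ignored
                score += math.log(
                    (word_counts[label][tok] + 1) / (n_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def mse(model, test):
    """Mean squared error between predicted and true 0/1 labels."""
    return sum((predict_nb(model, t) - y) ** 2 for t, y in test) / len(test)


def learning_curve(train, test, sizes):
    """Train on the first n examples for each n and report the test MSE."""
    return [(n, mse(train_nb(train[:n]), test)) for n in sizes]


# Hypothetical toy data: 1 = positive, 0 = negative.
train = [
    ("love this album".split(), 1),
    ("hate this song".split(), 0),
    ("great concert tonight".split(), 1),
    ("boring concert tonight".split(), 0),
    ("this track is amazing".split(), 1),
    ("this track is awful".split(), 0),
]
test = [("love this track".split(), 1), ("awful boring song".split(), 0)]

curve = learning_curve(train, test, [2, 4, 6])
for n, err in curve:
    print(n, err)
```

On real data you would plot the error against the training set size and stop labelling more tweets once the curve flattens.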

1) A common approach is TF-IDF. It weights each term by how informative it is, so you can rank the best features (including smileys and other symbols). You need to set the number of features. Again, there is no best number; you should run tests to tune the model.

2) You need a training set with a label (positive or negative) for each tweet. Generally, it is obtained from a human annotator.

3) I've never used a Bloom filter.

4) Generally, the Twitter API gives about 1-2% of tweets. I guess tweepy cannot give more than that.
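As a rough illustration of point 1, the following plain-Python sketch scores terms with TF-IDF and keeps the top `k` as features. Several things here are assumptions made for the example and not part of the answer above: the tokenizer pattern (which keeps simple emoticons such as `:)` as tokens), the smoothed IDF formula, ranking terms by their summed TF-IDF over the corpus, and the four toy tweets. In practice `k` would be the number of features you decide on (for example the 500 mentioned in the question).

```python
import math
import re
from collections import Counter

# Text is lowercased first, so emoticons like ":D" are matched as ":d".
TOKEN_RE = re.compile(r"[:;]-?[()dp]|[a-z']+")


def tokenize(text):
    """Split a tweet into lowercase words and simple emoticons."""
    return TOKEN_RE.findall(text.lower())


def tfidf_scores(docs):
    """Return a Counter mapping each term to its TF-IDF summed over all docs."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many tweets does each term appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = Counter()
    for tokens in tokenized:
        if not tokens:
            continue
        tf = Counter(tokens)
        for term, count in tf.items():
            # Smoothed IDF (an assumption; other variants exist).
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1
            scores[term] += (count / len(tokens)) * idf
    return scores


def top_features(docs, k):
    """Return the k terms with the highest summed TF-IDF score."""
    return [term for term, _ in tfidf_scores(docs).most_common(k)]


# Hypothetical toy tweets.
tweets = [
    "love this new album :)",
    "worst concert ever :(",
    "this song is on repeat :)",
    "the new single is boring",
]

features = top_features(tweets, 5)
print(features)
```

Each tweet can then be turned into a vector over the selected features and passed to the classifier; trying several values of `k` and comparing the test error is one way to tune the number of features.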

I hope this can help you.

