pandas - Combine Sklearn TFIDF with Additional Data


I am trying to prepare data for supervised learning. I have TF-IDF data generated from a column in a DataFrame called "merged":

    from sklearn.feature_extraction.text import TfidfVectorizer

    vect = TfidfVectorizer(stop_words='english', use_idf=True, min_df=50, ngram_range=(1, 2))
    x = vect.fit_transform(merged['kws_name_desc'])
    print(x.shape)
    print(type(x))

which prints:

    (57629, 11947)
    <class 'scipy.sparse.csr.csr_matrix'>
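The key property of this output is that row i of the sparse matrix corresponds to document i of the input column, which is why any extra features must be appended in the same row order. A tiny sketch on a hypothetical three-document corpus (the real `min_df=50` is dropped here because it would remove every term from such a small corpus):

```python
from scipy import sparse
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus standing in for merged['kws_name_desc'].
docs = ["red apple pie", "green apple tart", "red berry pie"]

# Default min_df=1 so that terms survive in a 3-document corpus.
vect = TfidfVectorizer(stop_words="english", ngram_range=(1, 2))
x = vect.fit_transform(docs)

# One row per document, one column per unigram/bigram in the vocabulary.
print(x.shape[0])          # 3
print(sparse.issparse(x))  # True
```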

But I need to add additional columns to the matrix. For each document in the TF-IDF matrix, I have a list of additional numeric features. Each list has length 40 and is made up of floats.

So to clarify, I have 57,629 lists of length 40 that I'd like to append to the TF-IDF result.
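Writing X for the TF-IDF matrix and N for the matrix of extra features (notation introduced here only for illustration), both have one row per document, so appending column-wise should keep the 57,629 rows and add 40 columns to the 11,947 TF-IDF columns:

$$
[\,X \mid N\,] \in \mathbb{R}^{57629 \times (11947 + 40)} = \mathbb{R}^{57629 \times 11987}
$$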

Currently, I have these lists in a DataFrame column, merged["other_data"]. Below is an example row of merged["other_data"]:

0.4329597715,0.3637511039,0.4893141843,0.35840...    

How can I append the 57,629 rows of this DataFrame column to the TF-IDF matrix? I don't know where to begin and would appreciate any pointers or guidance.

I figured it out:

First, iterate over the pandas column and create a list of lists:

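    for_np = []
    for s in merged['other_data']:  # `s`, not `x`, so the TF-IDF matrix x is not overwritten
        row = s.split(",")
        row2 = list(map(float, row))  # list() is needed on Python 3, where map() is lazy
        for_np.append(row2)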

Then create a NumPy array:

    import numpy as np

    n = np.array(for_np)  # shape (57629, 40): one row per document
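The loop works, but the two steps above can also be collapsed into one vectorized pandas call. A minimal sketch, assuming every row of `other_data` holds the same number of comma-separated values; the toy DataFrame is hypothetical and uses 3 values per row instead of 40:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for merged["other_data"]: comma-separated floats per cell.
merged = pd.DataFrame({"other_data": ["0.1,0.2,0.3", "0.4,0.5,0.6"]})

# Split each string into separate columns, cast to float, and take the
# underlying 2-D array: one row per document, one column per feature.
n = merged["other_data"].str.split(",", expand=True).astype(float).to_numpy()

print(n.shape)  # (2, 3)
```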

Then use scipy.sparse.hstack on x (my original TF-IDF sparse matrix) and the new matrix. I'll end up reweighting these 40-d vectors if they do not improve the classification results, but the approach worked!

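    import scipy.sparse

    # Convert n to sparse so both blocks are sparse; format="csr" returns a CSR matrix
    x = scipy.sparse.hstack([x, scipy.sparse.csr_matrix(n)], format="csr")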
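An end-to-end sketch on hypothetical toy data, from the DataFrame to a fitted classifier, including one simple form of the reweighting mentioned above: multiplying the extra block by a scalar before stacking. The names `weight` and `labels`, the toy rows, and the choice of `LogisticRegression` are all assumptions for illustration, not the original poster's setup:

```python
import numpy as np
import pandas as pd
import scipy.sparse
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy stand-in for `merged`: 2 extra features per row instead of 40.
merged = pd.DataFrame({
    "kws_name_desc": ["red apple pie", "green apple tart", "red berry pie", "green berry tart"],
    "other_data": ["0.1,0.2", "0.3,0.4", "0.5,0.6", "0.7,0.8"],
})
labels = [0, 1, 0, 1]  # hypothetical class labels

vect = TfidfVectorizer(stop_words="english")
x = vect.fit_transform(merged["kws_name_desc"])

# Parse each comma-separated string into floats: shape (n_docs, n_extra).
n = np.array([[float(v) for v in s.split(",")] for s in merged["other_data"]])

# Hypothetical scalar weight for the extra block; 1.0 means no reweighting.
weight = 0.5

# Row i of x and row i of n describe the same document, so stack side by side.
combined = scipy.sparse.hstack([x, scipy.sparse.csr_matrix(n * weight)], format="csr")

# LogisticRegression accepts sparse CSR input directly.
clf = LogisticRegression()
clf.fit(combined, labels)
preds = clf.predict(combined)

print(combined.shape == (4, x.shape[1] + 2))  # True
print(len(preds))                             # 4
```

With the default `norm='l2'`, each TF-IDF row has unit length, so extra features on a larger scale can dominate; the scalar weight, or scikit-learn's `StandardScaler(with_mean=False)` (which accepts sparse input), is one knob to try if the extra features do not help.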
