pandas - Combine Sklearn TFIDF with Additional Data

- June 15, 2011

I am trying to prepare data for supervised learning. I have my TF-IDF data, which was generated from a column in my dataframe called "merged".

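from sklearn.feature_extraction.text import TfidfVectorizer

# Build unigram and bigram TF-IDF features from the text column
vect = TfidfVectorizer(stop_words='english', use_idf=True, min_df=50, ngram_range=(1, 2))
x = vect.fit_transform(merged['kws_name_desc'])
print(x.shape)
print(type(x))

(57629, 11947)
<class 'scipy.sparse.csr.csr_matrix'>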

But I need to add additional columns to this matrix. For each document in the TF-IDF matrix, I have a list of additional numeric features. Each list has length 40 and is composed of floats.

So to clarify, I have 57,629 lists of length 40 that I'd like to append to my TF-IDF result.

Currently, I have this in a dataframe column, for example merged["other_data"]. Below is an example row from merged["other_data"]:

0.4329597715,0.3637511039,0.4893141843,0.35840...

How can I append the 57,629 rows of this dataframe column to the TF-IDF matrix? I don't know where to begin and would appreciate any pointers or guidance.

I figured it out:

First, iterate over the pandas column and create a list of lists:

for_np = []
for cell in merged['other_data']:
    # Each cell is one comma-separated string of floats
    row = cell.split(",")
    row2 = list(map(float, row))
    for_np.append(row2)

Then create a NumPy array:

import numpy as np
n = np.array(for_np)
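As an alternative to the explicit loop, the same array can probably be built with pandas string methods. This is only a sketch: the two-row `merged` frame below is a hypothetical stand-in for the real data, and it assumes every cell holds the same number of comma-separated floats.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for merged["other_data"]: comma-separated floats stored as strings.
merged = pd.DataFrame({"other_data": ["0.1,0.2,0.3", "0.4,0.5,0.6"]})

# Split each string on commas into separate columns, cast to float,
# and convert the result to a 2-D NumPy array (one row per document).
n = merged["other_data"].str.split(",", expand=True).astype(float).to_numpy()

print(n.shape)
```

If some rows had a different number of values, the split would leave missing entries, so checking the resulting shape against the expected 40 columns seems prudent.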

Then use scipy.sparse.hstack on x (my original TF-IDF sparse matrix) and n (my new matrix). I may end up reweighting these 40-d vectors if they do not improve the classification results, but this approach worked!

import scipy.sparse
x = scipy.sparse.hstack([x, n])
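For reference, here is a small end-to-end sketch of the stacking step. The toy corpus `docs`, the two-column feature array `n`, and the name `combined` are all assumptions for illustration, not the original data. It also passes `format="csr"`, since the format returned by `hstack` can otherwise be COO, which does not support row slicing.

```python
import numpy as np
import scipy.sparse
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus and dense features standing in for the real data.
docs = ["red apple pie", "green apple tart", "red berry pie"]
n = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])

vect = TfidfVectorizer()
x = vect.fit_transform(docs)  # sparse TF-IDF matrix, one row per document

# Wrap the dense block as a sparse matrix and stack it column-wise next to
# the TF-IDF columns; format="csr" keeps the result row-sliceable.
combined = scipy.sparse.hstack([x, scipy.sparse.csr_matrix(n)], format="csr")

print(combined.shape)
```

The extra columns land after the TF-IDF columns, so `combined[:, -n.shape[1]:]` recovers the appended features. Both inputs must have the same number of rows and be in the same document order.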
