ruby - How do I remove duplicate rows in my CSV?
I have a CSV that has data like this:
a.a.b. direct http://www.aabdirect.com 348 willis ave mineola ny 11501 (800) 382-1002 no email
abeam consulting inc http://abeam.com 245 park ave new york ny 10167 (212) 372-8783 no email
abeam consulting inc http://abeam.com 245 park ave new york ny 10167 (212) 372-8783 no email
alvarez & marsal http://www.alvarezandmarsal.com 600 madison ave new york ny 10022 (212) 759-4433 no email
alvarez & marsal http://www.alvarezandmarsal.com 600 lexington ave ste 6 new york ny 10022 (212) 759-4433 no email
Sometimes all the columns in both rows match (like abeam consulting inc), but that's not always the case. Sometimes only the websites match, or the phone number, or the name.
The key thing is the website. If 2 rows have the same website, I want to keep just one.
How do I de-dupe this list in a non-N+1 way?
Preferably with a native Ruby method like .uniq or something of the sort.
Just read the strings (which I've simplified to avoid the need for horizontal scrolling) into an array:
arr = [
  "a.a.b. direct http://www.aabdirect.com (800) 382-1002",
  "abeam consulting inc http://abeam.com (212) 372-8783",
  "abeam consulting inc http://abeam.com (212) 372-8783",
  "alvarez & marsal http://www.alvarezandmarsal.com (212) 759-4433",
  "alvarez & marsal http://www.alvarezandmarsal.com 10022 (212) 759-4433"
]
and, as you suggest, use Array#uniq with a block:
arr.uniq { |line| line[/\shttp:\S+/] }
  #=> ["a.a.b. direct http://www.aabdirect.com (800) 382-1002",
  #    "abeam consulting inc http://abeam.com (212) 372-8783",
  #    "alvarez & marsal http://www.alvarezandmarsal.com (212) 759-4433"]
See Array#uniq. The block returns the website portion of each line, and uniq keeps only the first line for each distinct value. The regex /\shttp:\S+/ reads, "match a whitespace character followed by the string "http:", followed by 1 or more characters other than whitespace (greedily)".
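If the data is parsed as real CSV rather than read as raw strings, the same idea can be keyed on the website field directly instead of a regex. The following is only a sketch: the comma delimiter, the column layout (name, website, street, phone), the sample rows and the names data and website_column are all assumptions made for illustration, not taken from the original file.

```ruby
require "csv"

# Hypothetical sample data. The comma delimiter and the column order
# (name, website, street, phone) are assumptions for this sketch.
data = <<~ROWS
  a.a.b. direct,http://www.aabdirect.com,348 willis ave,(800) 382-1002
  abeam consulting inc,http://abeam.com,245 park ave,(212) 372-8783
  abeam consulting inc,http://abeam.com,245 park ave,(212) 372-8783
  alvarez & marsal,http://www.alvarezandmarsal.com,600 madison ave,(212) 759-4433
  alvarez & marsal,http://www.alvarezandmarsal.com,600 lexington ave ste 6,(212) 759-4433
ROWS

# Assumed zero-based index of the website field.
website_column = 1

# Parse into an array of arrays, one inner array per row.
rows = CSV.parse(data)

# Keep the first row seen for each distinct website value.
unique_rows = rows.uniq { |row| row[website_column] }

unique_rows.each { |row| puts row.join(", ") }
```

With this sample, the two alvarez & marsal rows differ in their street address but share a website, so only the first of them (600 madison ave) is kept. To read from a file instead, CSV.read could replace CSV.parse, given the path to the file.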