Saturday, April 28, 2018

yogi_From Sentences In Column A List Each Unique Word (case sensitive) And Its Frequency

Google Spreadsheet   Post  #2435

Yogi Anand, D.Eng, P.E.      ANAND Enterprises LLC -- Rochester Hills MI     Apr-28-2018
I have a very large column (A) of sentences with varying punctuation, capitalization, etc. It is around 20,000 rows and 150,000 words long, so a manual solution is impractical. I am basically trying to produce a list of these words. I'm not sure whether this is possible in Sheets at all, but if it is, I would like either a cell formula or a script.

I would like to create a list of every word used in this column and put it in column B, so B:B would contain words such as "the", "and", "hello", "world", etc. Each word should appear only once in column B. In column C, I would like a tally of how many times each word is used (I already have a formula for counting word usage; I just need something to produce the list of words).
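One way to build the unique list for column B and the tally for column C in a single pass is a counting dictionary that also remembers the order in which each word was first seen. Below is a minimal sketch in TypeScript (close to the JavaScript that Google Apps Script runs). It assumes the sentences have already been split into a flat array of words. `uniqueWithCounts` is a hypothetical name, not part of Sheets or Apps Script. Because it compares strings exactly, the tally is case sensitive.

```typescript
// Hypothetical helper (not from the original post): collapses a flat list of
// words into one [word, count] row per unique word, in order of first
// appearance. That is the two-column shape needed for B:C.
function uniqueWithCounts(words: string[]): [string, number][] {
  // A prototype-less object, so words such as "constructor" or "__proto__"
  // are counted like any other key.
  const counts: { [word: string]: number } = Object.create(null);
  const order: string[] = []; // unique words, in first-seen order
  words.forEach(w => {
    if (w in counts) {
      counts[w] += 1;
    } else {
      counts[w] = 1;
      order.push(w);
    }
  });
  return order.map((w): [string, number] => [w, counts[w]]);
}

const rows = uniqueWithCounts(["the", "cat", "the", "hello", "cat", "the"]);
// rows is [["the", 3], ["cat", 2], ["hello", 1]]
```

The result is a two-dimensional array, which is the form Apps Script's `Range.setValues()` accepts. Reading column A and writing the rows back to the sheet is not shown here.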

For my purposes, a word is any string of characters separated by spaces, punctuation, or the beginning/end of a sentence. So the words in " apple ", " wqehotasjclfkadsnqjlgea ", and " hello;" are "apple", "wqehotasjclfkadsnqjlgea", and "hello". A case-insensitive formula would be preferable, but is not required.
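The word definition above can be expressed as a split-and-filter step. Below is a minimal sketch in TypeScript (close to the JavaScript that Google Apps Script runs). It assumes a separator is any run of characters that are not ASCII letters or digits. `tokenize` and its `caseSensitive` flag are hypothetical names. The flag covers both the case-sensitive reading in the title and the case-insensitive preference stated here.

```typescript
// Hypothetical helper (not from the original post): splits one sentence into
// words. Assumption: any run of characters other than ASCII letters and digits
// is a separator, so accented letters, apostrophes and hyphens also split words.
function tokenize(sentence: string, caseSensitive: boolean): string[] {
  // Lower-case first when the tally should ignore case ("The" and "the" merge).
  const text = caseSensitive ? sentence : sentence.toLowerCase();
  // split() yields empty strings when the text starts or ends with a
  // separator, so drop them.
  return text.split(/[^A-Za-z0-9]+/).filter(w => w.length > 0);
}

const sample = tokenize("Hello world; hello again!", false);
// sample is ["hello", "world", "hello", "again"]
```

Applying `tokenize` to every value in column A and concatenating the results gives the flat word list that then needs de-duplicating and counting. A different notion of "word" only requires changing the regular expression.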