
How can I ignore punctuation and symbols attached to a word, so that all variants are treated as the same word when counting?

I am writing a program that counts the occurrences of each word in an arbitrary text file.

The contents of the file are NOT known beforehand.

Desired output:

For example, [book] [book!] [book-] [book?] [book,] [book's] and the like should all be treated as the same word when counting.

Current output:

book=2, book.=1, book--=1, book?=5, book's=3, book!=1

What I am actually looking for is book=13.

try (Stream<String> fileContents = Files.lines(filePath)) {
    Function<String, Stream<String>> splitIntoWords = line -> Pattern.compile(" ").splitAsStream(line);
    Map<String, Long> wordFrequency = fileContents.flatMap(splitIntoWords)
            .filter(word -> word.trim().length() > 4) // Consider only words with length greater than 4
            .map(String::toLowerCase)
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    System.out.println(wordFrequency);
}

I do not wish to hard-code specific symbols and...
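
One possible direction, sketched here under assumptions rather than offered as a definitive answer: instead of listing symbols, split on anything that is not a letter, digit, or apostrophe using Unicode character classes, then strip a trailing possessive. The class name `WordCountSketch`, the helper `normalize`, and the `minLength` parameter are hypothetical names introduced only for this illustration. Treating `'s` as a possessive to drop is also an assumption, and it will merge contractions such as "it's" into "it".

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class WordCountSketch {

    // Any run of characters that are not letters, digits, or apostrophes
    // (straight ' or curly \u2019) separates two tokens. No symbol list is needed.
    private static final Pattern SEPARATOR =
            Pattern.compile("[^\\p{L}\\p{N}'\\u2019]+");

    // Matches a trailing possessive ('s) or any remaining apostrophe in a token.
    private static final Pattern POSSESSIVE_OR_APOSTROPHE =
            Pattern.compile("['\\u2019]s$|['\\u2019]");

    // Hypothetical helper: lower-cases a token and removes possessives/apostrophes.
    static String normalize(String token) {
        String lower = token.toLowerCase(Locale.ROOT);
        return POSSESSIVE_OR_APOSTROPHE.matcher(lower).replaceAll("");
    }

    // Counts normalized words that are strictly longer than minLength characters.
    static Map<String, Long> countWords(Stream<String> lines, int minLength) {
        return lines
                .flatMap(SEPARATOR::splitAsStream)
                .map(WordCountSketch::normalize)
                .filter(word -> word.length() > minLength)
                .collect(Collectors.groupingBy(
                        Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        Stream<String> sample = Stream.of(
                "Book book! book-- book? book, book's",
                "BOOKS. notebook");
        // prints {book=6, books=1, notebook=1}
        System.out.println(countWords(sample, 3));
    }
}
```

With a real file, `Files.lines(filePath)` can be passed in place of the sample stream inside a try-with-resources block. The length threshold is a parameter here because "book" has exactly four characters, so a filter of `length() > 4` would drop it before counting.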


By: StackOverFlow - 5 days ago
