Question Software for finding difficult words in text

Feb 20, 2019
Hello.
I'm searching for software that can find, highlight, and give synonyms for difficult English words in a given text. The text in my case will be movie subtitles: I recently began watching movies with English subs (English is not my native language). Although I understand the movie and most of the lines, I would like to brief myself before watching with the 30-40 most difficult words. I need no translation; synonyms and definitions will do just fine.
So far the best I have found for my case is this: https://www.visualthesaurus.com/vocabgrabber/
It analyzes the subtitles and presents words and their definitions.
Is there anything else?

Thanks in advance! :)
 

Ralston18

Titan
Moderator
Consider Google Translate:

https://play.google.com/store/apps/details?id=com.google.android.apps.translate&hl=en_US

Another way is to just search for "Google Translate". That should take you to a box where you can type in all the words of interest to get translations.

No need for complete sentences - just list the words and phrases in the box.

Then use the thesaurus if and as necessary to find synonyms to help further your understanding.
 
Everybody has a different definition for "difficult words", so unless you build a list of these words, nobody will help you.

What you can actually do is download a list of the most popular 100, 1000, or 5000 words and build a script that checks whether each word in your text is on that list. I doubt you'll find a ready-made solution, so brushing up your skills in, e.g., Excel will help.
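A minimal sketch of such a script in Python, offered as an illustration rather than a finished tool. The function name `uncommon_words` and the tiny `common` set are assumptions made up for this example; a real run would load one of the downloaded "most popular" lists instead.

```python
import re

def uncommon_words(text, common_words):
    """Return the distinct words in `text` that are not in `common_words`,
    in order of first appearance."""
    seen = set()
    result = []
    # Lowercase the text and pull out alphabetic words (keeps "don't" whole).
    for word in re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower()):
        if word not in common_words and word not in seen:
            seen.add(word)
            result.append(word)
    return result

# Tiny stand-in for a real "most popular N words" list (assumption).
common = {"the", "a", "is", "was", "he", "very", "and", "of"}

print(uncommon_words("He was very obstinate and the plan was a fiasco.", common))
# prints ['obstinate', 'plan', 'fiasco']
```

With a list this small, ordinary words such as "plan" slip through; a longer list (1000 or 5000 words) would filter them out.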
 
Feb 20, 2019
Yes, but I'm asking about software that extracts 20-30 words. Reading the whole subtitle file and translating it word by word is not an option.

That's why there are different levels in TOEFL, for example. :) I need software that can sort the words by level, say Beginner, Intermediate, Advanced, etc.
 

Ralston18

Titan
Moderator
Will second Alabalcho's comment regarding "difficult". Very subjective.

TOEFL = "Test of English as a Foreign Language" - just noted for the record and anyone reading this thread.

First you will need to find the subtitles.

According to the following link, the subtitles are likely to be found in a file with the extension ".srt" or perhaps ".sub".

http://www.transformativeworks.org/vidding-index/subtitles-and-translations/

More about .srt files:

https://www.lifewire.com/srt-file-4135479
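For anyone scripting this: an .srt file is plain text made of numbered blocks, each with a timestamp line followed by the dialogue. A rough Python sketch for pulling out only the dialogue lines is below. The function name `srt_dialogue` and the sample text are assumptions for illustration, and files in the wild may need extra handling (encodings, unusual formatting tags).

```python
import re

# Matches timestamp lines such as "00:00:01,000 --> 00:00:03,500".
TIMESTAMP = re.compile(r"\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}")
# Matches simple formatting tags such as <i>...</i> or {\an8}.
TAG = re.compile(r"<[^>]+>|\{[^}]+\}")

def srt_dialogue(srt_text):
    """Return the dialogue lines of an .srt file, without counters,
    timestamps, or formatting tags."""
    lines = []
    for line in srt_text.splitlines():
        line = line.strip()
        # Skip blank lines, block counters, and timestamp lines.
        # Caveat: a dialogue line consisting only of digits is skipped too.
        if not line or line.isdigit() or TIMESTAMP.match(line):
            continue
        lines.append(TAG.sub("", line))
    return lines

sample = """1
00:00:01,000 --> 00:00:03,500
<i>It was an auspicious start.</i>

2
00:00:04,000 --> 00:00:06,000
Don't be so obstinate.
"""

print(srt_dialogue(sample))
# prints ['It was an auspicious start.', "Don't be so obstinate."]
```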

What rules or criteria does TOEFL use to classify words: letter count, derivations, frequency of use...? And what do you consider to be "difficult words"?

Frequency of use is likely the applicable criterion.

https://www.englishwithexperts.com/blog/posts/files/5000MostCommonWords.pdf

So the necessary software would need to compare each word in the subtitle file to the words found in the 5000MostCommonWords.pdf file.

Rank each subtitle word by frequency of use, sort by rank, and then list the 20-30 least common words as desired. Words that do not appear in the list at all would count as the rarest.

Not aware of any such software....

Excel, Access, and Python are all capable of such manipulations.
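In Python, the compare-rank-sort steps described above might look roughly like this. It is only a sketch under assumptions: `hardest_words` is a made-up name, `ranked` is a five-word stand-in for a real frequency list (most common word first), and the word list would first have to be exported from the PDF to plain text.

```python
import re

def hardest_words(text, ranked_common, top_n=30):
    """Return up to `top_n` distinct words from `text`, rarest first.

    `ranked_common` is a list of words ordered from most to least common.
    """
    rank = {word: i for i, word in enumerate(ranked_common)}
    unknown = len(ranked_common)  # words missing from the list count as rarest
    words = set(re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower()))
    # Sort by descending rank (rarest first); ties are broken alphabetically.
    ordered = sorted(words, key=lambda w: (-rank.get(w, unknown), w))
    return ordered[:top_n]

# Stand-in for a real frequency list (assumption), most common first.
ranked = ["the", "be", "and", "plan", "start"]
text = "The plan and the start: be obstinate, be auspicious."

print(hardest_words(text, ranked, top_n=3))
# prints ['auspicious', 'obstinate', 'start']
```

Note that character names and inflected forms (for example "walked" when the list only has "walk") will look rare with this simple approach, so the resulting list may need a quick manual cleanup.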

As for the end list: cut and paste the word(s) into Vocabgrabber as necessary.

Note:

You might be able to make use of video transcoding software such as HandBrake (free) to capture subtitle words.

https://handbrake.fr/