
A quick tool to find co-mentions of keywords within webpages


lorenzoromani1983/co-mentioned


Co-mentioned is a very simple yet effective tool that helps find co-mentions of keywords in web pages.

While researching entities of interest (companies, people, and so on), you may need to check whether two or more of them are mentioned on the same web page.

This tool comes in handy if you have multiple keywords/entities and need to quickly assess whether there are interesting co-mentions.

You will first need to install some Python modules:

pip install inscriptis numpy unidecode

Next, you will need a ScaleSerp API token: co-mentioned searches Google through this third-party API. Free tokens are available from ScaleSerp, and paid ones are fairly cheap too.

Save your token in the 'token.txt' file.

Now type your target keywords/entities in the 'entities.txt' file, one entity per line, without quotes:

alaeddin senguler

trend gyo

hamas
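As a rough illustration of the file format above, the sketch below turns the contents of entities.txt into a list of entities. The function name parse_entities is hypothetical, and skipping blank lines and trimming whitespace are assumptions, not necessarily what co-mentioned.py does.

```python
def parse_entities(text):
    """Return one entity per non-empty line, with surrounding whitespace removed.

    Assumption: blank lines are ignored and entities are not quoted.
    """
    return [line.strip() for line in text.splitlines() if line.strip()]


# Hypothetical usage against the real file:
# with open("entities.txt", encoding="utf-8") as fh:
#     entities = parse_entities(fh.read())
sample = "alaeddin senguler\n\ntrend gyo\n\nhamas\n"
print(parse_entities(sample))
```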

Now you're set and can run the co-mentioned.py file:

python3 co-mentioned.py

The tool first combines the entities you saved in the entities.txt file into every possible keyword pair. It then searches Google for each pair in order to retrieve URLs. In our case, it will look up on Google:

"alaeddin senguler" "trend gyo"

"alaeddin senguler" "hamas"

"hamas" "trend gyo"

Quotes are added automatically, so you won't need to enclose keywords in quotes in the entities.txt file.
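The pairing and quoting steps can be sketched with itertools.combinations. This is a minimal illustration rather than the tool's actual code: build_queries is a hypothetical name, and the order of the pairs (and of the keywords inside each pair) may differ from what co-mentioned.py produces.

```python
from itertools import combinations


def build_queries(entities):
    """Return one query string per unordered pair, each keyword wrapped in quotes."""
    return [f'"{a}" "{b}"' for a, b in combinations(entities, 2)]


entities = ["alaeddin senguler", "trend gyo", "hamas"]
for query in build_queries(entities):
    print(query)
```

Three entities give three queries. Each one is an exact-phrase search for both keywords at once.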

You can use non-ASCII characters (è, é, à, ò, etc.). They may improve the quality of the Google search results, but they won't affect the way co-mentioned works, as all text is unidecoded for consistency.
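To show why accented and unaccented spellings end up matching, here is a small sketch of accent stripping. It deliberately uses the standard library's unicodedata module as a stand-in for the unidecode package, so it only covers characters that decompose into a base letter plus accent marks. unidecode transliterates a much wider range of characters. The name strip_accents is hypothetical.

```python
import unicodedata


def strip_accents(text):
    """Approximate ASCII folding: decompose characters and drop combining marks.

    Stand-in for unidecode, used here only to keep the example dependency-free.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("è é à ò"))
print(strip_accents("Şengüler") == strip_accents("Senguler"))
```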

In the previous case, only three keyword pairs can be created. The more entities you type in entities.txt, the more pairs you will get: n entities yield n(n-1)/2 pairs, which may quickly exhaust your ScaleSerp account.
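To get a feel for how fast the number of pairs grows, the sketch below counts the pairs for a few list sizes. It assumes one search per pair. The real number of API calls may differ, for example if several result pages are requested for each pair.

```python
from math import comb


def pair_count(n_entities):
    """Number of unordered keyword pairs for n entities: n * (n - 1) / 2."""
    return comb(n_entities, 2)


for n in (3, 10, 50):
    print(f"{n} entities -> {pair_count(n)} pairs")
```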

If your ScaleSerp token has enough API calls left to complete the search job, co-mentioned will search Google for URLs. Next, it will open each URL in order to:

a) detect if at least 2 keywords are on the web page

b) compute the minimum distance between the keywords

If more than one keyword is on the same page, co-mentioned computes the minimum distance between the keywords and saves the result in the REPORT.csv file.

Note that co-mentioned searches Google for keyword pairs, but on each page it looks for all the keywords that you typed in the entities.txt file.
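The detection and distance steps could look roughly like the sketch below. Several things here are assumptions, because the README does not define them: the distance is measured in characters between the start offsets of two occurrences, matching is case-insensitive, and the names find_positions and closest_pair are hypothetical. The tool itself may measure distance differently.

```python
import re
from itertools import combinations


def find_positions(text, keyword):
    """Return the start offset of every case-insensitive occurrence of keyword."""
    return [m.start() for m in re.finditer(re.escape(keyword), text, re.IGNORECASE)]


def closest_pair(text, entities):
    """Return (found_entities, min_distance).

    min_distance is the smallest character gap between occurrences of any two
    different entities, or None when fewer than two entities are on the page.
    """
    positions = {e: find_positions(text, e) for e in entities}
    found = [e for e in entities if positions[e]]
    best = None
    for a, b in combinations(found, 2):
        d = min(abs(i - j) for i in positions[a] for j in positions[b])
        if best is None or d < best:
            best = d
    return found, best


page = "Hamas is named early on. Much later, Trend GYO and Alaeddin Senguler appear together."
print(closest_pair(page, ["alaeddin senguler", "trend gyo", "hamas"]))
```

All entities are checked on every page, not only the pair that was searched, so a page can report more than two matches.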

The REPORT.csv file is created automatically in the local directory and contains three columns separated by '|':

URL|Entities|Distance
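A report in this layout can be produced with Python's csv module by setting the delimiter to '|'. The sketch below is illustrative only: write_report is a hypothetical name, the example row is made up, and joining the matched entities with ", " is an assumption about how the Entities column is formatted.

```python
import csv
import io


def write_report(rows, fh):
    """Write (url, entities, distance) rows as pipe-separated values with a header."""
    writer = csv.writer(fh, delimiter="|", lineterminator="\n")
    writer.writerow(["URL", "Entities", "Distance"])
    for url, entities, distance in rows:
        # Assumption: matched entities are joined into a single column.
        writer.writerow([url, ", ".join(entities), distance])


# An in-memory buffer is used here. A real run would open REPORT.csv instead.
buf = io.StringIO()
write_report([("https://example.com/page", ["hamas", "trend gyo"], 42)], buf)
print(buf.getvalue())
```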

Finally, when three or more entities are matched on the same web page, the "Distance" value refers to the closest pair of entities.
