RedditMiner

RedditMiner is a tool for crawling and analysing comments from Reddit subreddits. It uses the Reddit API to fetch comments, stores them in an SQLite database, and provides a web interface for querying and analysing the data using natural language processing (NLP).

Features

Crawl comments from any specified subreddit
Store comments in an SQLite database
Perform NLP on comments for similarity search
Simple web interface for querying and displaying results
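
As a rough illustration of the storage feature, the sketch below writes comments to SQLite using only the standard library. The table name, column names, and both helper functions are assumptions made for this example and are not taken from the RedditMiner source. The search here is a plain substring match, standing in for the NLP similarity search that the project provides.

```python
import sqlite3

# Hypothetical schema: table and column names are assumptions for illustration.
SCHEMA = """
CREATE TABLE IF NOT EXISTS comments (
    id TEXT PRIMARY KEY,
    subreddit TEXT NOT NULL,
    body TEXT NOT NULL
)
"""


def store_comments(conn, subreddit, comments):
    """Insert (comment_id, body) pairs, skipping IDs that are already stored."""
    conn.execute(SCHEMA)
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO comments (id, subreddit, body) VALUES (?, ?, ?)",
            [(cid, subreddit, body) for cid, body in comments],
        )


def search_comments(conn, term):
    """Return comment bodies containing term (substring match, not NLP similarity)."""
    rows = conn.execute(
        "SELECT body FROM comments WHERE body LIKE ? ORDER BY id",
        (f"%{term}%",),
    )
    return [body for (body,) in rows]


conn = sqlite3.connect(":memory:")  # in-memory database, used only for this demo
store_comments(conn, "python", [("c1", "SQLite is handy"), ("c2", "I prefer Postgres")])
store_comments(conn, "python", [("c1", "SQLite is handy")])  # duplicate ID is ignored
print(search_comments(conn, "sqlite"))
```

Keying the table on the comment ID means that crawling the same subreddit twice would not create duplicate rows.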

Installation

Clone the repository:

```
git clone https://github.com/LeviSamuelEvans/RedditMiner
cd RedditMiner
```

Create and activate a virtual environment:

```
python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`
```

Install the dependencies:
```
pip install -r requirements.txt
```

Download the spaCy model:
```
python -m spacy download en_core_web_sm
```

Configuration

Create a config.py file in the root directory of the project with the following content:

```
# Configuration variables

REDDIT_CLIENT_ID = "your_client_id"
REDDIT_CLIENT_SECRET = "your_client_secret"
REDDIT_USER_AGENT = "your_user_agent"
DATABASE = "reddit.db"
```

Replace your_client_id, your_client_secret, and your_user_agent with your Reddit API credentials.
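
A quick way to confirm that the placeholders were replaced is sketched below. The `missing_credentials` helper is hypothetical and not part of RedditMiner. It only compares the settings against the template values shown above.

```python
# Placeholder values copied from the config.py template above.
PLACEHOLDERS = {
    "REDDIT_CLIENT_ID": "your_client_id",
    "REDDIT_CLIENT_SECRET": "your_client_secret",
    "REDDIT_USER_AGENT": "your_user_agent",
}


def missing_credentials(settings):
    """Return names of settings that are empty or still hold the template placeholder."""
    return [
        name
        for name, placeholder in PLACEHOLDERS.items()
        if not settings.get(name) or settings[name] == placeholder
    ]


# Example settings: only the client ID has been filled in.
example = {
    "REDDIT_CLIENT_ID": "abc123",
    "REDDIT_CLIENT_SECRET": "your_client_secret",
    "REDDIT_USER_AGENT": "",
}
print(missing_credentials(example))
```

Against the real file, the same check could be run as `missing_credentials(vars(config))` after `import config`.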

Usage

Crawling comments

To crawl comments from a specified subreddit and store them in the database, run:

```
python main.py subreddit_name --limit 100
```

Replace subreddit_name with the name of the subreddit you want to crawl, and set --limit to the number of comments to fetch (the default is 100).
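
The command-line interface described above could be parsed along the following lines. This is a minimal sketch using `argparse` from the standard library. The contents of main.py are not shown in this README, so the `build_parser` function and the help strings are assumptions. Only the positional subreddit argument and the `--limit` option with its default of 100 come from the usage text.

```python
import argparse


def build_parser():
    """Build a parser matching the documented usage: main.py subreddit_name --limit N."""
    parser = argparse.ArgumentParser(description="Crawl comments from a subreddit.")
    parser.add_argument("subreddit", help="name of the subreddit to crawl")
    parser.add_argument(
        "--limit",
        type=int,
        default=100,  # documented default
        help="number of comments to fetch",
    )
    return parser


# Parse an explicit argument list so the example runs without a real command line.
args = build_parser().parse_args(["learnpython", "--limit", "25"])
print(args.subreddit, args.limit)
```

Declaring `--limit` with `type=int` lets `argparse` reject non-numeric values before any crawling starts.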

Running the web interface

To start the web interface, run:

```
python web/app.py
```

Open your browser and navigate to http://127.0.0.1:5000/ to use the interface.
