
SmartQuake

SmartQuake is a research project aimed at predicting earthquakes on a global scale using modern machine learning methods. It compiles 14 global earthquake catalogs into a single consolidated dataset that can be used for future earthquake-prediction research.

Figure: SmartQuake Data Pipeline

The pipeline runs in three stages, each documented in its own folder:

  1. Data scraping: see the data_scraping/ folder
  2. Data processing: see the data_processing/ folder
  3. Data merging: see the data_merging/ folder

Dataset Checkpoints

The dataset can be accessed at various stages of acquisition through the Google Drive links referenced in the steps below.


Table of Contents

  1. Data Scraping
  2. Data Processing
  3. Data Merging
  4. Data Source

Data Scraping

Overview

This step involves scraping earthquake data from various sources, including text files, web pages, and PDFs. It utilizes BeautifulSoup for web scraping, Pandas for data manipulation, and Tabula for PDF data extraction.
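
For a sense of what that extraction looks like, here is a minimal sketch (not the repository's Scraper class; the file name, URL, and column layout are hypothetical) that pulls tables out of a PDF with tabula-py and rows out of a web page's body with BeautifulSoup:

    import pandas as pd
    import requests
    import tabula
    from bs4 import BeautifulSoup

    # PDF source: tabula-py returns one DataFrame per table it detects.
    # 'catalog.pdf' is a hypothetical input file.
    tables = tabula.read_pdf('catalog.pdf', pages='all')
    pdf_df = pd.concat(tables, ignore_index=True)

    # Web source: take the page body and keep whitespace-separated rows
    # that match the expected column count (URL and columns hypothetical).
    page = requests.get('http://example.com/quakes')
    body = BeautifulSoup(page.text, 'html.parser').find('body')
    rows = [ln.split() for ln in body.get_text().splitlines() if len(ln.split()) == 3]
    web_df = pd.DataFrame(rows, columns=['Date', 'Magnitude', 'Location'])

    pdf_df.to_csv('pdf_clean.csv', index=False)
    web_df.to_csv('web_clean.csv', index=False)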

Installation

  1. From Google Drive, download the raw datasets and place them under the dataset/data_scraping/.../raw folder.

  2. Install the required dependencies:

    pip install -r requirements.txt
  3. Run the scraping script:

    python dataset/main.py

    The scraped datasets will be saved under dataset/data_scraping/.../clean.

Usage

  1. Initialization: Create an instance of the Scraper class with parameters:

    • input_path: Path to the input file (for text and PDF sources).
    • output_path: Path to save the output CSV.
    • url: Webpage URL to scrape (for web sources).
    • start_time and end_time: Date range for filtering data.
    • header: Column names for the output CSV.
    • separator: Character for separating data in text files (default is space).
  2. Scraping:

    • find_quakes_txt(num_skips=0): For text files. num_skips skips initial lines.
    • find_quakes_web(): For web pages. Scrapes data based on the body tag and predefined header.
  3. Example:

    scraper = Scraper(
        input_path='input.txt',    # text/PDF source
        output_path='output.csv',  # where the CSV is written
        url='http://example.com',  # web source
        header=['Date', 'Magnitude', 'Location']
    )
    scraper.find_quakes_txt(num_skips=1)  # skip the first line of the text file
    scraper.find_quakes_web()             # scrape the page at `url` using the same header

Data Processing

Overview

Data processing is the second step in the SmartQuake data pipeline. After scraping, the datasets are converted into a standardized format in which every dataset shares the same columns.

Data Standardization

Processed earthquake CSV files will contain the following columns:

  1. Timestamp: Stored as a pd.Timestamp string in UTC (format: YYYY-MM-DD HH:MM:SS.millisecond+00:00).
  2. Magnitude: Moment magnitude (Mw).
  3. Latitude: Range within [-90, 90].
  4. Longitude: Range within [-180, 180].
  5. Depth: Depth in kilometers (optional, may be None for older records).

All datasets are sorted chronologically and contain no duplicates.
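
A minimal sketch of what this standardization implies, assuming the column names above (illustrative only, with made-up rows; not the repository's DataProcessor):

    import pandas as pd

    # Hypothetical raw rows, including one duplicate.
    raw = pd.DataFrame({
        'Timestamp': ['1923-09-01 02:58:35.000', '1923-09-01 02:58:35.000'],
        'Magnitude': [7.9, 7.9],
        'Latitude':  [35.3, 35.3],
        'Longitude': [139.1, 139.1],
        'Depth':     [None, None],  # Depth may be missing for older records
    })

    df = raw.copy()
    # Store timestamps as timezone-aware pd.Timestamp values in UTC.
    df['Timestamp'] = pd.to_datetime(df['Timestamp'], utc=True)
    # Keep only physically valid coordinates.
    df = df[df['Latitude'].between(-90, 90) & df['Longitude'].between(-180, 180)]
    # Chronological order, no duplicates.
    df = df.sort_values('Timestamp').drop_duplicates().reset_index(drop=True)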

File Organization

  • data_processor.py: Contains the DataProcessor class for standardized processing.
  • run_processor.py: Runs the DataProcessor on all scraped datasets.
  • processed/: Folder containing the processed output datasets (CSVs).

Running Data Processing

  1. Ensure that all clean datasets exist in the dataset/data_scraping/.../clean folder.

  2. Verify that the processed/ folder exists in data_processing/.

  3. Run run_processor.py:

    python data_processing/run_processor.py
  4. After completion, check for the processed CSVs in the processed/ folder before proceeding to the merging step.


Data Merging

The merging process combines all processed datasets into a single file for machine learning model input. This step preserves the same columns and ensures chronological order without duplicates.
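
In pandas terms, the merge amounts to a concatenation over the shared schema followed by a chronological sort and de-duplication. A hedged sketch (paths and output name illustrative; not the repository's merge.py):

    import glob
    import pandas as pd

    # Read every processed catalog produced by the previous step.
    frames = [pd.read_csv(path, parse_dates=['Timestamp'])
              for path in glob.glob('data_processing/processed/*.csv')]

    merged = (pd.concat(frames, ignore_index=True)
                .sort_values('Timestamp')
                .drop_duplicates()
                .reset_index(drop=True))
    merged.to_csv('merged.csv', index=False)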

File Organization

  • helper.py: Contains helper functions for merging.
  • merge.py: Merges non-USGS/SAGE datasets into Various-Catalogs.csv.
  • usgs_pre_1950/: Folder containing scripts for USGS data processing and merging.
  • final/: Folder containing usgs_sage_various_merge.py, which merges all datasets into Completed-Merge.csv.

Running Merge

  1. Compile Processed Datasets: Ensure all processed datasets are in data_processing/processed/ (excluding USGS/SAGE datasets).
  2. First Merge: Run merge.py to create Various-Catalogs.csv, then move it to data_merging/final/.
  3. USGS Data Processing: Download USGS_SAGE_Merged.csv directly from the Google Drive and store it in data_merging/final/ for the next step.
  4. Final Merge: Run usgs_sage_various_merge.py to merge all datasets into Completed-Merge.csv.
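
In spirit, the final merge does the same concatenate-sort-deduplicate pass over the two intermediate files named above (a hedged sketch, not the actual usgs_sage_various_merge.py):

    import pandas as pd

    various = pd.read_csv('data_merging/final/Various-Catalogs.csv', parse_dates=['Timestamp'])
    usgs = pd.read_csv('data_merging/final/USGS_SAGE_Merged.csv', parse_dates=['Timestamp'])

    final = (pd.concat([various, usgs], ignore_index=True)
               .sort_values('Timestamp')
               .drop_duplicates()
               .reset_index(drop=True))
    final.to_csv('data_merging/final/Completed-Merge.csv', index=False)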

Data Source

| Dataset      | Status     | Link | Additional Comments                                         |
|--------------|------------|------|-------------------------------------------------------------|
| Argentina    | good       | Link | Downloaded manually                                         |
| Canada       | good       | Link | Downloaded manually                                         |
| Japan        | good       | Link | Downloaded manually                                         |
| GHEA         | good       | Link | Downloaded manually                                         |
| NOAA         | good       | Link | Downloaded manually                                         |
| SoCal        | good       | Link | Downloaded manually                                         |
| Turkey       | good       | Link | Downloaded manually                                         |
| World Tremor | good       | Link | Downloaded manually                                         |
| East Africa  | good       | Link | Downloaded manually                                         |
| Intensity    | good       | Link | Downloaded manually                                         |
| PNW Tremor   | good       | Link | Downloaded manually                                         |
| South Asia   | good       | Link | Downloaded manually                                         |
| Texas        | good       | Link | Downloaded manually                                         |
| USGS         | good       | Link | Downloaded via the Python scraper; takes a long time to run |
| SAGE         | deprecated | Link | The official webpage advises using USGS instead             |
