AI-Generated Text Detection using BERT

Introduction

This project detects AI-generated text segments in a given dataset. It uses BERT (Bidirectional Encoder Representations from Transformers) to classify each segment as human-authored or machine-generated, with the aim of supporting the integrity and security of digital communications.

Workflow

The project follows a structured workflow:

  • Data Preprocessing: Cleaning the text with BERT-preprocess to remove noise, stop words, punctuation, and non-alphabetic characters.

  • Additional Datasets: Collecting datasets from several competitions and concatenating them to enlarge the training set, giving the model more examples from which to learn distinguishing features and patterns.

  • Model Training: Training a BERT-based sequence classification model to distinguish human-written from AI-generated text segments.

  • Predictions: Generating predictions on the test data to flag segments that are likely AI-generated.

  • Result Analysis: Saving the predictions to a CSV file for submission and further analysis.
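
The workflow steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the stop-word list, the `bert-base-uncased` checkpoint, the `id` and `generated` column names, and every function name are assumptions made for the example. The BERT scorer relies on the Hugging Face `transformers` and `torch` packages and is imported lazily, so the cleaning, merging, and CSV helpers run without them.

```python
import csv
import re
from typing import Callable, Iterable, List, Sequence, Tuple

# Tiny illustrative stop-word list (assumption: the README does not name one).
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "it"}


def clean_text(text: str) -> str:
    """Lowercase, replace non-alphabetic characters, and drop stop words."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())  # punctuation, digits -> space
    return " ".join(tok for tok in text.split() if tok not in STOP_WORDS)


def concat_datasets(
    datasets: Iterable[Iterable[Tuple[str, int]]]
) -> List[Tuple[str, int]]:
    """Merge several (text, label) datasets, skipping exact duplicate texts."""
    seen, merged = set(), []
    for rows in datasets:
        for text, label in rows:
            if text not in seen:
                seen.add(text)
                merged.append((text, label))
    return merged


def load_bert_scorer(
    model_name: str = "bert-base-uncased",  # hypothetical checkpoint choice
) -> Callable[[Sequence[str]], List[float]]:
    """Return a function mapping texts to P(label == 1), i.e. "AI-generated".

    The classification head of a freshly loaded checkpoint is untrained, so
    the scores are only meaningful after fine-tuning on labeled data.
    """
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name, num_labels=2
    )
    model.eval()

    def score(texts: Sequence[str]) -> List[float]:
        batch = tokenizer(
            list(texts), padding=True, truncation=True,
            max_length=512, return_tensors="pt",
        )
        with torch.no_grad():
            logits = model(**batch).logits
        return torch.softmax(logits, dim=-1)[:, 1].tolist()

    return score


def write_submission(ids: Sequence, scores: Sequence[float], out) -> None:
    """Write predictions as CSV; the column names are assumed, not sourced."""
    writer = csv.writer(out)
    writer.writerow(["id", "generated"])
    for row_id, s in zip(ids, scores):
        writer.writerow([row_id, f"{s:.4f}"])
```

In use, the cleaned test texts would be passed to the function returned by `load_bert_scorer()`, and the resulting scores handed to `write_submission` together with the row ids and an open file object.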

Comprehensive Explanation on How BERT Detects AI-Generated Texts

The project includes an in-depth analysis of how BERT detects AI-generated text. It examines features such as semantic differences, vocabulary usage, statistical distributions, and sentiment measures, and it covers black-box detection algorithms to explain the mechanisms that separate human-written from AI-generated content.
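
To make the vocabulary-usage and statistical-distribution features more concrete, here is a small sketch that computes a few surface statistics for a text. It is only an illustration of the kind of signal such an analysis might look at. It does not describe what BERT computes internally, and the function name and the choice of features are assumptions.

```python
import re
import statistics
from typing import Dict

WORD_RE = re.compile(r"[A-Za-z']+")


def stylometric_features(text: str) -> Dict[str, float]:
    """Compute simple vocabulary and sentence-length statistics for a text."""
    words = WORD_RE.findall(text.lower())
    if not words:
        return {
            "type_token_ratio": 0.0,
            "mean_sentence_len": 0.0,
            "sentence_len_stdev": 0.0,
        }
    # Split on sentence-ending punctuation and keep segments containing words.
    sentences = [s for s in re.split(r"[.!?]+", text) if WORD_RE.search(s)]
    sent_lens = [len(WORD_RE.findall(s)) for s in sentences]
    return {
        # Share of distinct words: a rough proxy for vocabulary richness.
        "type_token_ratio": len(set(words)) / len(words),
        # Average words per sentence.
        "mean_sentence_len": statistics.mean(sent_lens),
        # Spread of sentence lengths (population standard deviation).
        "sentence_len_stdev": statistics.pstdev(sent_lens),
    }
```

Comparing these values across the human-written and AI-generated subsets of a labeled dataset is one simple way to check whether such surface differences exist in the data at hand.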

Edge Cases

The project addresses edge cases and potential anomalies in AI-generated text detection. It explains each case in detail and proposes possible solutions, with the goal of improving the model's robustness and accuracy.

Notable Points

The project highlights notable findings, including observed differences between human-authored and AI-generated content. Insights drawn from research papers and from the project's own analysis help explain the challenges in AI text detection and how they can be addressed.

Result Summary

The project's results are summarized, including model performance, leaderboard (LB) scores, and recommendations for further analysis. The summary also compares the effectiveness of the different models and techniques that were tried.
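
As an illustration of how model performance on a labeled validation set might be scored, the sketch below computes ROC AUC in plain Python. The README does not state which metric the leaderboard uses, so ROC AUC is an assumption made purely for this example, as is the function name.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a positive outranks a negative.

    Labels are 1 for AI-generated and 0 for human-written (assumed
    convention). Tied scores count as half a win. This pairwise version is
    quadratic in the number of examples, so it suits small validation sets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect separation of the two classes.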

References for Further Analysis

The project references research papers and other resources on AI text detection for readers who want to explore the topic further or continue this line of work.

Author

Kairvee Vaswani

[email protected]