soaper
luslucifer committed Nov 29, 2024
1 parent fab62c0 commit 3a5dce6
Showing 3 changed files with 218 additions and 0 deletions.
82 changes: 82 additions & 0 deletions README.md
@@ -0,0 +1,82 @@
# Advanced Scraping Tutorial

Welcome to this guide! Whether you’re a total beginner or someone with experience navigating the web, this tutorial is here to help you dive into advanced scraping techniques. Don’t worry if you’re just starting out! A great place to build your foundation is Blatzar's [scraping tutorial](https://github.com/Blatzar/scraping-tutorial/tree/master). Once you’ve got the basics, you’ll be all set to tackle what’s in this guide!

While not mandatory, having some familiarity with **JavaScript/TypeScript** can make things a bit easier as you progress.

----------

## What You’ll Need

Here’s your checklist to get started:

- **Node.js** installed on your system.
- A basic understanding of **cryptography concepts** (just the essentials).
- **Curiosity and determination** (don’t worry if you’re not a pro—persistence is key!).

----------

## What is Obfuscation?

Let’s break down the concept of obfuscation.

**Obfuscation (noun):**
_"The process of making something confusing or difficult to understand, often on purpose."_

In programming, obfuscation involves making code challenging for humans to read while keeping it functional for machines. Think of it as turning clear instructions into a puzzle.

This technique is widely used to protect code and data, despite criticism from security researchers. It’s everywhere—from desktop apps to web applications. Unfortunately, it also makes tasks like debugging, privacy analysis, or simply understanding how your device communicates with a website much harder.

But don’t worry—that’s exactly what we’re here to tackle.

----------

## Why Learn This?

The goal of this guide is to help you **understand and bypass obfuscation techniques** so you can scrape data effectively. We’ll work through examples categorized by difficulty, focusing on real-world scenarios.

### Easy Targets

Perfect for practice and quick wins:

- **Soaper**
- **Nepu**
- **Catflix**
- **Vidlink**
- **Frembed**
- **Warezcdn**
- **Gogoanime**

### Medium Challenges

These sites need a bit more effort but are manageable:

- **Faselhd**
- **M4ufree**
- **Vidsrc**
- **Doodplay**
- **Streamflix**
- **VidBing**

### Advanced Hunts

Test your skills on these tougher targets:

- **9Anime**
- **Hianime**
- **FlixHQ (and its sister sites)**

For every difficulty level, we’ll dive into step-by-step practical guides, teaching you how to scrape effectively and document your progress.

----------

## Stay Connected

Got questions? Stuck on something? Want to share your progress?

Join our community:

- **[Discord](https://discord.gg/aAPmfsRD)**
- **[Telegram](https://t.me/vidjoy)**

We’re here to help you every step of the way. Let’s make scraping exciting, educational, and super rewarding! 🚀
42 changes: 42 additions & 0 deletions basics/gettingStarted.md
@@ -0,0 +1,42 @@
# Configuring Your Environment

To get started with reverse engineering (RE) websites, you’ll often need to bypass detection mechanisms that websites use to spot scrapers or automated browsers. One way to tackle this is by using an undetected browser.

In this guide, we’ll configure **[Librewolf](https://librewolf.net/)**, a privacy-focused browser that already ships with useful patches, and adjust a few settings so that sites have a much harder time detecting it during scraping tasks.

----------

## Step 1: Installing Librewolf

Head over to [librewolf.net](https://librewolf.net/) and download the version compatible with your operating system. Follow the installation instructions to set it up.

----------

## Step 2: Modifying `about:config`

Once Librewolf is installed, open the browser and type `about:config` into the address bar. You’ll see a page with advanced configuration settings.

Change the following settings to enhance stealth:

- **`librewolf.console.logging_disabled`**: `true`
  _(Disables console logging to prevent detection.)_
- **`librewolf.debugger.force_detach`**: `true`
  _(Ensures debugging tools are detached, avoiding detection.)_
- **`webgl.disabled`**: `false`
  _(Enables WebGL for rendering compatibility.)_
- **`privacy.resistFingerprinting`**: `false`
  _(Disables anti-fingerprinting measures that can raise suspicion.)_
- **`devtools.toolbox.host`**: `window`
  _(Changes devtools host to avoid triggering detection flags.)_
- **`devtools.source-map.client-service.enabled`**: `false`
  _(Disables source map service to prevent devtools-based checks.)_
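
If you would rather not click through `about:config` by hand, Firefox reads a `user.js` file from the profile folder at startup. The Python sketch below only renders the settings listed above as `user_pref(...)` lines. Whether Librewolf picks the file up exactly as Firefox does is an assumption, so check Librewolf's own documentation before relying on it. The names `STEALTH_PREFS` and `render_user_js` are made up for this example.

```python
import json

# Values copied from the list above; nothing else is changed.
STEALTH_PREFS = {
    "librewolf.console.logging_disabled": True,
    "librewolf.debugger.force_detach": True,
    "webgl.disabled": False,
    "privacy.resistFingerprinting": False,
    "devtools.toolbox.host": "window",
    "devtools.source-map.client-service.enabled": False,
}


def render_user_js(prefs: dict) -> str:
    """Render each preference as one user_pref(...) line."""
    lines = []
    for name, value in prefs.items():
        # json.dumps yields JavaScript-compatible literals: true, false, "window".
        lines.append(f"user_pref({json.dumps(name)}, {json.dumps(value)});")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Redirect this output into a user.js file inside your browser profile folder.
    print(render_user_js(STEALTH_PREFS), end="")
```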

----------

## Step 3: Congratulations!

You now have a browser configured to bypass many common detection methods. This setup allows you to navigate websites without arbitrary restrictions.

Want to test your new browser? Visit this [DevTools Detector Demo](https://blog.aepkill.com/demos/devtools-detector/) to confirm it’s working undetected.

Happy scraping! 🚀
94 changes: 94 additions & 0 deletions scrapers/Soaper.md
@@ -0,0 +1,94 @@
# Soaper Scraping Guide

Scraping Soaper is as straightforward as it gets—it almost feels like it's inviting us to do it! Let me walk you through how to scrape Soaper step-by-step. 🎯

Our target: **[Soaper](https://soaper.live/)**

----------

## Scraping Steps

### Step 1: Pick Your Page

Open any movie or TV series page. For this example, we’re scraping the following page:

```text
https://soaper.live/movie_1rGplMrG87.html
```

----------

### Step 2: Open DevTools

Fire up **Librewolf** (or any browser) and open **DevTools** (Ctrl+Shift+I, or right-click > Inspect).
Go to the **Network** tab. If no requests are visible, refresh the page. Your Network tab should look like this:

![Open DevTools](https://raw.githubusercontent.com/luslucifer/Web-reversing/main/images/openDevTools.png)

----------

### Step 3: Inspect Responses

Check the response of each request listed under the **Network** tab. You’ll eventually find one that returns an `.m3u8` file, and that is our treasure! 🗺️

![Look for URLs](https://raw.githubusercontent.com/luslucifer/Web-reversing/main/images/lookUrls.png)
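
Clicking through responses by hand gets tedious, so here is a rough Python sketch that scans saved response text for `.m3u8` links. It assumes the link appears as a plain, unescaped URL in the body. The function name, the `val` key, and the `cdn.example.com` address are placeholders, not Soaper's real response format.

```python
import re

# Matches http(s) URLs whose path ends in .m3u8, with an optional query string.
M3U8_PATTERN = re.compile(r"""https?://[^\s"'<>\\]+?\.m3u8(?:\?[^\s"'<>\\]*)?""")


def find_m3u8_urls(text: str) -> list[str]:
    """Return every distinct .m3u8 URL in text, in order of first appearance."""
    seen = []
    for url in M3U8_PATTERN.findall(text):
        if url not in seen:
            seen.append(url)
    return seen


if __name__ == "__main__":
    # Hypothetical response body, for demonstration only.
    sample = '{"val": "https://cdn.example.com/hls/master.m3u8?t=1", "subs": []}'
    print(find_m3u8_urls(sample))
```

Paste a response body into `sample` (or read it from a file) to check quickly whether a request is worth a closer look.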

----------

### Step 4: Copy Headers

Once you find the request, right-click and copy **all headers**.

![Copy Headers](https://raw.githubusercontent.com/luslucifer/Web-reversing/main/images/copyAllHeaders.png)
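
The copied headers arrive as one block of raw text. If you want them in a form a script can use, a minimal Python sketch like this one turns `Name: value` lines into a dictionary. It assumes one header per line and simply skips anything else, such as the request line. `parse_raw_headers` is a made-up helper name.

```python
def parse_raw_headers(raw: str) -> dict[str, str]:
    """Turn copied 'Name: value' lines into a dict, skipping other lines."""
    headers = {}
    for line in raw.splitlines():
        name, sep, value = line.partition(":")
        name = name.strip()
        # Skip blank lines, HTTP/2 pseudo-headers (empty name), and the
        # request line, which has no colon or a space before the first colon.
        if not sep or not name or " " in name:
            continue
        headers[name] = value.strip()
    return headers


if __name__ == "__main__":
    # Placeholder text; paste your own copied headers here.
    copied = (
        "POST /some/path HTTP/1.1\n"
        "Host: example.com\n"
        "Referer: https://example.com/page.html\n"
        "Accept-Encoding: gzip, deflate, br\n"
    )
    for key, val in parse_raw_headers(copied).items():
        print(f"{key} = {val}")
```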

----------

### Step 5: Extract the Payload

Since it’s a POST request, you’ll also need the payload. Grab it from the **Request section**.

![Copy Payload](https://raw.githubusercontent.com/luslucifer/Web-reversing/main/images/copyPayload.png)
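
POST payloads are often plain form data (`key=value&key=value`). Assuming that is what you copied, this short Python sketch splits it into fields and shows that it can be rebuilt unchanged. The field names `pass`, `param`, and `extra` are invented; use whatever your own Network tab shows.

```python
from urllib.parse import parse_qsl, urlencode


def parse_form_payload(raw: str) -> dict[str, str]:
    """Split a URL-encoded form body into a dict, keeping empty fields."""
    return dict(parse_qsl(raw, keep_blank_values=True))


if __name__ == "__main__":
    # Hypothetical payload; replace it with the one from the Request section.
    raw_payload = "pass=abc123&param=&extra=1"
    fields = parse_form_payload(raw_payload)
    print(fields)
    # Re-encoding the fields gives back the same body for this payload.
    print(urlencode(fields) == raw_payload)
```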

----------

### Step 6: Use ChatGPT to Mimic the Request

Now, paste the headers, URL, and payload into ChatGPT and ask it to mimic the request in `curl`.

![Paste into ChatGPT](https://raw.githubusercontent.com/luslucifer/Web-reversing/main/images/pasteInChatgpt.png)
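
If you want to double-check what ChatGPT hands back, or skip it entirely, the command is easy to assemble yourself. The Python sketch below is one way to do it, assuming a form-encoded POST. The URL, header, and field used in the demo are placeholders, and `build_curl` is a made-up helper name.

```python
import shlex
from urllib.parse import urlencode


def build_curl(url: str, headers: dict[str, str], payload: dict[str, str]) -> str:
    """Assemble a POST curl command from a URL, headers, and form fields."""
    parts = ["curl", "-X", "POST", shlex.quote(url)]
    for name, value in headers.items():
        # shlex.quote keeps spaces and special characters safe for the shell.
        parts += ["-H", shlex.quote(f"{name}: {value}")]
    # --data sends the body as application/x-www-form-urlencoded.
    parts += ["--data", shlex.quote(urlencode(payload))]
    return " ".join(parts)


if __name__ == "__main__":
    print(
        build_curl(
            "https://example.com/endpoint",  # placeholder URL
            {"Referer": "https://example.com/"},
            {"id": "123"},
        )
    )
```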

----------

### Step 7: Test the `curl`

Copy the `curl` command ChatGPT provides and import it into **Postman** or **Thunder Client**. When you send the request, the response body will likely look like unreadable bytes. Don’t worry: it’s compressed (the request’s `Accept-Encoding` header invited the server to do that), not encrypted.

![Request with Accept-Encoding](https://raw.githubusercontent.com/luslucifer/Web-reversing/main/images/requesWithAcceptEncodeing.png)
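
The difference between compressed and encrypted is easy to see in a few lines of Python. This sketch assumes gzip, which the standard library handles. A server may pick another scheme such as Brotli, which needs a third-party package. The JSON body is a made-up stand-in for a real response.

```python
import gzip

# Stand-in for a response body; the "val" key and URL are hypothetical.
original = b'{"val": "https://cdn.example.com/hls/master.m3u8"}'

# Roughly what a server does when the request advertises "Accept-Encoding: gzip".
compressed = gzip.compress(original)

# Compressed bytes look like noise, but no key is needed to reverse them.
restored = gzip.decompress(compressed)

print(compressed[:2])  # gzip data always starts with the magic bytes 1f 8b
print(restored.decode("utf-8"))
```

If a saved response starts with those two bytes, you are looking at gzip and can decompress it directly instead of resending the request.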

----------

### Step 8: Disable `Accept-Encoding`

Remove the `Accept-Encoding` header from the request and resend it. This time, you’ll get clean, uncompressed data.

![Disable Accept-Encoding](https://raw.githubusercontent.com/luslucifer/Web-reversing/main/images/diableIngAcceptEncoding.png)
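
The same step can be scripted. Here is a minimal Python sketch that drops `Accept-Encoding` (in any casing) and builds the POST request without sending it. The endpoint, header, and field in the demo are placeholders, and both helper names are made up for this example.

```python
import urllib.request
from urllib.parse import urlencode


def without_accept_encoding(headers: dict[str, str]) -> dict[str, str]:
    """Return a copy of headers with Accept-Encoding removed (any casing)."""
    return {k: v for k, v in headers.items() if k.lower() != "accept-encoding"}


def build_request(
    url: str, headers: dict[str, str], payload: dict[str, str]
) -> urllib.request.Request:
    """Build (but do not send) a form-encoded POST without Accept-Encoding."""
    body = urlencode(payload).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers=without_accept_encoding(headers), method="POST"
    )


if __name__ == "__main__":
    req = build_request(
        "https://example.com/endpoint",  # placeholder URL
        {"Accept-Encoding": "gzip, deflate, br", "Referer": "https://example.com/"},
        {"id": "123"},
    )
    print(req.get_method(), req.full_url)
    print(req.header_items())
```

To actually send it, pass `req` to `urllib.request.urlopen` with the URL, headers, and payload you copied in the earlier steps.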

----------

### Step 9: Winner Winner Chicken Dinner! 🎉

Congratulations, you’ve successfully scraped Soaper! Enjoy your decoded response.

![Winner Winner Chicken Dinner](https://raw.githubusercontent.com/luslucifer/Web-reversing/main/images/winer_winer_chicken_dinner.png)

----------

## Stay Connected

Got questions or want to share your progress? We’ve got you covered:

- **[Join us on Discord](https://discord.gg/aAPmfsRD)**
- **[Chat on Telegram](https://t.me/vidjoy)**

Let’s keep scraping fun, educational, and super rewarding! 🚀
