Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
1 parent fab62c0, commit 3a5dce6
Showing 3 changed files with 218 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,82 @@ | ||
# Advanced Scraping Tutorial | ||
|
||
Welcome to this guide! Whether you’re a total beginner or someone with experience navigating the web, this tutorial is here to help you dive into advanced scraping techniques. Don’t worry if you’re just starting out! A great place to build your foundation is Blatzar's [scraping tutorial](https://github.com/Blatzar/scraping-tutorial/tree/master). Once you’ve got the basics, you’ll be all set to tackle what’s in this guide! | ||
|
||
While not mandatory, having some familiarity with **JavaScript/TypeScript** can make things a bit easier as you progress. | ||
|
||
---------- | ||
|
||
## What You’ll Need | ||
|
||
Here’s your checklist to get started: | ||
|
||
- **Node.js** installed on your system. | ||
- A basic understanding of **cryptography concepts** (just the essentials). | ||
- **Curiosity and determination** (don’t worry if you’re not a pro—persistence is key!). | ||
|
||
---------- | ||
|
||
## What is Obfuscation? | ||
|
||
Let’s break down the concept of obfuscation. | ||
|
||
**Obfuscation (noun):** | ||
_"The process of making something confusing or difficult to understand, often on purpose."_ | ||
|
||
In programming, obfuscation involves making code challenging for humans to read while keeping it functional for machines. Think of it as turning clear instructions into a puzzle. | ||
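
To make that concrete, here is a small hand-written sketch in TypeScript. It is not taken from any real site: `buildApiUrl`, the `_0x…` names, and the URL shape are all made up for illustration. Both functions return the same string, but the second hides its strings as character codes and uses meaningless names, which is roughly what automated obfuscators do at a much larger scale.

```typescript
// Readable version: the intent is obvious at a glance.
function buildApiUrl(host: string, id: string): string {
  return `https://${host}/api/${id}.json`;
}

// Obfuscated version of the same logic (hand-written, hypothetical):
// every string literal is stored as a list of character codes.
const _0x1: number[][] = [
  [104, 116, 116, 112, 115, 58, 47, 47], // "https://"
  [47, 97, 112, 105, 47],                // "/api/"
  [46, 106, 115, 111, 110],              // ".json"
];

// Turns one list of character codes back into a string at runtime.
const _0x2 = (i: number): string => String.fromCharCode(..._0x1[i]);

function _0x3(a: string, b: string): string {
  return _0x2(0) + a + _0x2(1) + b + _0x2(2);
}

// Both versions produce the same URL.
console.log(buildApiUrl("example.com", "42") === _0x3("example.com", "42"));
```

Reversing it is mostly a matter of evaluating the small helpers (`_0x2` here) and renaming things until the code reads like the first version again.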
|
||
This technique is widely used to protect code and data, despite criticism from security researchers. It’s everywhere—from desktop apps to web applications. Unfortunately, it also makes tasks like debugging, privacy analysis, or simply understanding how your device communicates with a website much harder. | ||
|
||
But don’t worry—that’s exactly what we’re here to tackle. | ||
|
||
---------- | ||
|
||
## Why Learn This? | ||
|
||
The goal of this guide is to help you **understand and bypass obfuscation techniques** so you can scrape data effectively. We’ll work through examples categorized by difficulty, focusing on real-world scenarios. | ||
|
||
### Easy Targets | ||
|
||
Perfect for practice and quick wins: | ||
|
||
- **Soaper** | ||
- **Nepu** | ||
- **Catflix** | ||
- **Vidlink** | ||
- **Frembed** | ||
- **Warezcdn** | ||
- **Gogoanime** | ||
|
||
### Medium Challenges | ||
|
||
These sites need a bit more effort but are manageable: | ||
|
||
- **Faselhd** | ||
- **M4ufree** | ||
- **Vidsrc** | ||
- **Doodplay** | ||
- **Streamflix** | ||
- **VidBing** | ||
|
||
### Advanced Hunts | ||
|
||
Test your skills on these tougher targets: | ||
|
||
- **9Anime** | ||
- **Hianime** | ||
- **FlixHQ (and its sister sites)** | ||
|
||
For every difficulty level, we’ll dive into step-by-step practical guides, teaching you how to scrape effectively and document your progress. | ||
|
||
---------- | ||
|
||
## Stay Connected | ||
|
||
Got questions? Stuck on something? Want to share your progress? | ||
|
||
Join our community: | ||
|
||
- **[Discord](https://discord.gg/aAPmfsRD)** | ||
- **[Telegram](https://t.me/vidjoy)** | ||
|
||
We’re here to help you every step of the way. Let’s make scraping exciting, educational, and super rewarding! 🚀 |
@@ -0,0 +1,42 @@ | ||
# Configuring Your Environment | ||
|
||
To get started with reverse engineering (RE) websites, you’ll often need to bypass detection mechanisms that websites use to spot scrapers or automated browsers. One way to tackle this is by using an undetected browser. | ||
|
||
In this guide, we’ll configure **[Librewolf](https://librewolf.net/)**, a privacy-focused browser that already ships with useful patches, and make a few tweaks so it is much harder to detect during scraping tasks. | ||
|
||
---------- | ||
|
||
## Step 1: Installing Librewolf | ||
|
||
Head over to [librewolf.net](https://librewolf.net/) and download the version compatible with your operating system. Follow the installation instructions to set it up. | ||
|
||
---------- | ||
|
||
## Step 2: Modifying `about:config` | ||
|
||
Once Librewolf is installed, open the browser and type `about:config` into the address bar. You’ll see a page with advanced configuration settings. | ||
|
||
Change the following settings to enhance stealth: | ||
|
||
- **`librewolf.console.logging_disabled`** → `true` | ||
_(Disables console logging to prevent detection.)_ | ||
- **`librewolf.debugger.force_detach`** → `true` | ||
_(Ensures debugging tools are detached, avoiding detection.)_ | ||
- **`webgl.disabled`** → `false` | ||
_(Enables WebGL for rendering compatibility.)_ | ||
- **`privacy.resistFingerprinting`** → `false` | ||
_(Disables anti-fingerprinting measures that can raise suspicion.)_ | ||
- **`devtools.toolbox.host`** → `window` | ||
_(Changes devtools host to avoid triggering detection flags.)_ | ||
- **`devtools.source-map.client-service.enabled`** → `false` | ||
_(Disables source map service to prevent devtools-based checks.)_ | ||
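
Clicking through `about:config` works, but the same values can also be kept in a file. The sketch below, written in TypeScript for Node.js, renders the six preferences above as `user_pref(...)` lines, the format Firefox-based browsers read from a `user.js` file in the profile directory. That Librewolf picks up `user.js` the same way is an assumption here, and `stealthPrefs` and `renderUserJs` are names made up for this example.

```typescript
type PrefValue = boolean | number | string;

// The six preferences listed above, with the values this guide recommends.
const stealthPrefs: Record<string, PrefValue> = {
  "librewolf.console.logging_disabled": true,
  "librewolf.debugger.force_detach": true,
  "webgl.disabled": false,
  "privacy.resistFingerprinting": false,
  "devtools.toolbox.host": "window",
  "devtools.source-map.client-service.enabled": false,
};

// Render one user_pref(...) line per preference. JSON.stringify quotes
// strings and leaves booleans and numbers bare, which matches user.js syntax.
function renderUserJs(prefs: Record<string, PrefValue>): string {
  return Object.keys(prefs)
    .map((name) => `user_pref(${JSON.stringify(name)}, ${JSON.stringify(prefs[name])});`)
    .join("\n");
}

console.log(renderUserJs(stealthPrefs));
```

Save the printed lines as `user.js` in the browser profile folder and restart the browser, then confirm in `about:config` that the values took effect.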
|
||
---------- | ||
|
||
## Step 3: Congratulations! | ||
|
||
You now have a browser configured to bypass many common detection methods. This setup allows you to navigate websites without arbitrary restrictions. | ||
|
||
Want to test your new browser? Visit this [DevTools Detector Demo](https://blog.aepkill.com/demos/devtools-detector/) to confirm it’s working undetected. | ||
|
||
Happy scraping! 🚀 |
@@ -0,0 +1,94 @@ | ||
# Soaper Scraping Guide | ||
|
||
Scraping Soaper is as straightforward as it gets—it almost feels like it's inviting us to do it! Let me walk you through how to scrape Soaper step-by-step. 🎯 | ||
|
||
Our target: **[Soaper](https://soaper.live/)** | ||
|
||
---------- | ||
|
||
## Scraping Steps | ||
|
||
### Step 1: Pick Your Page | ||
|
||
Open any movie or TV series page. For this example, we’re scraping the following page: | ||
|
||
```bash | ||
https://soaper.live/movie_1rGplMrG87.html | ||
``` | ||
|
||
---------- | ||
|
||
### Step 2: Open DevTools | ||
|
||
Fire up **Librewolf** (or any browser) and open the **DevTools** (Ctrl+Shift+I or right-click > Inspect). | ||
Go to the **Network tab**. If no requests are visible, refresh the page. Your Network tab should look like this: | ||
|
||
_(Screenshot: Open DevTools)_ | ||
|
||
---------- | ||
|
||
### Step 3: Inspect Responses | ||
|
||
Click through the requests in the **Network tab** and check each response. You’ll eventually find one returning an `.m3u8` file, our treasure! 🗺️ | ||
|
||
_(Screenshot: Look for URLs)_ | ||
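
If you save a response body and are not sure whether it is a playlist, a quick check helps. The TypeScript sketch below assumes the body is itself an HLS playlist (plain text starting with `#EXTM3U`) and not, say, JSON that merely contains a link to one. `looksLikeM3u8`, `listPlaylistUris`, and the sample playlist are made up for illustration.

```typescript
// An HLS playlist is plain text whose first line is the #EXTM3U tag.
function looksLikeM3u8(body: string): boolean {
  return body.replace(/^\s+/, "").startsWith("#EXTM3U");
}

// Lines that are not blank and do not start with "#" are URIs
// (variant playlists or media segments).
function listPlaylistUris(body: string): string[] {
  return body
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0 && !line.startsWith("#"));
}

// Hypothetical sample playlist with two quality variants.
const samplePlaylist = [
  "#EXTM3U",
  "#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360",
  "low/index.m3u8",
  "#EXT-X-STREAM-INF:BANDWIDTH=2400000,RESOLUTION=1280x720",
  "high/index.m3u8",
].join("\n");

console.log(looksLikeM3u8(samplePlaylist));
console.log(listPlaylistUris(samplePlaylist));
```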
|
||
---------- | ||
|
||
### Step 4: Copy Headers | ||
|
||
Once you find the request, right-click and copy **all headers**. | ||
|
||
_(Screenshot: Copy Headers)_ | ||
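
The copied headers arrive as one block of raw text. If you want them as a key/value object for later steps, a small parser does the job. This TypeScript sketch assumes one `Name: value` pair per line. `parseRawHeaders` and the sample headers are hypothetical, not what Soaper actually sends.

```typescript
// Turn raw "Name: value" lines into an object.
function parseRawHeaders(raw: string): Record<string, string> {
  const headers: Record<string, string> = {};
  for (const line of raw.split(/\r?\n/)) {
    const idx = line.indexOf(":");
    // Skip blank lines, lines without a colon (such as "POST /path HTTP/1.1"),
    // and HTTP/2 pseudo-headers such as ":method".
    if (idx <= 0) continue;
    const name = line.slice(0, idx).trim();
    // A header name never contains whitespace; this drops request lines
    // whose path happens to contain a colon.
    if (/\s/.test(name)) continue;
    headers[name] = line.slice(idx + 1).trim();
  }
  return headers;
}

// Hypothetical copied text, for illustration only.
const copiedHeaders = [
  "POST /some/endpoint HTTP/1.1",
  "Host: example.com",
  "Accept: application/json, text/javascript, */*; q=0.01",
  "Content-Type: application/x-www-form-urlencoded; charset=UTF-8",
  "X-Requested-With: XMLHttpRequest",
].join("\n");

console.log(parseRawHeaders(copiedHeaders));
```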
|
||
---------- | ||
|
||
### Step 5: Extract the Payload | ||
|
||
Since it’s a POST request, you’ll also need the payload. Grab it from the **Request section**. | ||
|
||
_(Screenshot: Copy Payload)_ | ||
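
POST payloads are often sent as URL-encoded form data (`key=value&key=value`). Assuming that is the format you see in the Request section, this TypeScript sketch converts between the raw string and an object so the fields are easier to read and edit. `encodeForm`, `decodeForm`, and the field names are hypothetical.

```typescript
// Build a URL-encoded form body from key/value pairs.
function encodeForm(fields: Record<string, string>): string {
  return Object.keys(fields)
    .map((key) => `${encodeURIComponent(key)}=${encodeURIComponent(fields[key])}`)
    .join("&");
}

// Parse a URL-encoded form body back into key/value pairs.
function decodeForm(body: string): Record<string, string> {
  // In form encoding, "+" stands for a space.
  const decode = (s: string): string => decodeURIComponent(s.replace(/\+/g, " "));
  const fields: Record<string, string> = {};
  for (const pair of body.split("&")) {
    if (pair === "") continue;
    const eq = pair.indexOf("=");
    const rawKey = eq === -1 ? pair : pair.slice(0, eq);
    const rawValue = eq === -1 ? "" : pair.slice(eq + 1);
    fields[decode(rawKey)] = decode(rawValue);
  }
  return fields;
}

// Hypothetical field names, for illustration only.
console.log(encodeForm({ id: "abc 123", type: "movie" }));
console.log(decodeForm("id=abc+123&type=movie"));
```

If the payload you copied is JSON instead, `JSON.parse` and `JSON.stringify` cover the same need.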
|
||
---------- | ||
|
||
### Step 6: Use ChatGPT to Mimic the Request | ||
|
||
Now, paste the headers, URL, and payload into ChatGPT and ask it to mimic the request in `curl`. | ||
|
||
_(Screenshot: Paste into ChatGPT)_ | ||
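
If you would rather build the command yourself, the same result takes only a few lines. The TypeScript sketch below is one possible way to do it, assuming a POSIX shell. `shellQuote`, `toCurl`, and the example request are made up for illustration. Browsers also offer a "Copy as cURL" entry in the Network tab context menu, which is worth comparing against.

```typescript
// Quote a value for a POSIX shell: wrap it in single quotes and
// escape any embedded single quote as '\''.
function shellQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

// Assemble a curl command from the URL, headers, and optional body.
function toCurl(url: string, headers: Record<string, string>, body?: string): string {
  const parts: string[] = ["curl", shellQuote(url)];
  for (const name of Object.keys(headers)) {
    parts.push("-H", shellQuote(`${name}: ${headers[name]}`));
  }
  if (body !== undefined) {
    // --data-raw sends the body as-is and makes curl use POST.
    parts.push("--data-raw", shellQuote(body));
  }
  return parts.join(" ");
}

// Hypothetical request, for illustration only.
console.log(
  toCurl(
    "https://example.com/api",
    { "Content-Type": "application/x-www-form-urlencoded" },
    "id=1",
  ),
);
```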
|
||
---------- | ||
|
||
### Step 7: Test the `curl` | ||
|
||
Copy the `curl` command ChatGPT provides and use it in **Postman** or **Thunder Client**. When you send the request, you’ll likely get a compressed response that looks like unreadable binary. Don’t worry, it’s compressed, not encrypted. | ||
|
||
_(Screenshot: Request with Accept-Encoding)_ | ||
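
To tell compression apart from encryption, look at the `Content-Encoding` response header first, then at the first bytes of the body. The TypeScript sketch below checks for the gzip and Zstandard magic numbers. Brotli has no magic number, so it can only be identified from the header. `detectCompression` is a hypothetical helper name.

```typescript
// Guess how a response body is compressed.
function detectCompression(bytes: Uint8Array, contentEncoding?: string): string {
  // The server normally states the encoding explicitly.
  if (contentEncoding) return contentEncoding.trim().toLowerCase();
  // gzip streams start with the bytes 1f 8b.
  if (bytes.length >= 2 && bytes[0] === 0x1f && bytes[1] === 0x8b) return "gzip";
  // Zstandard frames start with the bytes 28 b5 2f fd.
  if (
    bytes.length >= 4 &&
    bytes[0] === 0x28 && bytes[1] === 0xb5 && bytes[2] === 0x2f && bytes[3] === 0xfd
  ) {
    return "zstd";
  }
  return "unknown";
}

console.log(detectCompression(new Uint8Array([0x1f, 0x8b, 0x08, 0x00])));
console.log(detectCompression(new Uint8Array([0x00]), "br"));
```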
|
||
---------- | ||
|
||
### Step 8: Disable `Accept-Encoding` | ||
|
||
Remove the `Accept-Encoding` header from the request and resend it. Without that header the server typically stops compressing the body, so this time you’ll get clean, uncompressed data. | ||
|
||
_(Screenshot: Disable Accept-Encoding)_ | ||
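
When you replay the request from a script, drop the header in code instead of by hand. Header names are case-insensitive, so the TypeScript sketch below compares them in lower case. `withoutHeader` and the sample headers are made up for illustration.

```typescript
// Return a copy of the headers without the named header (case-insensitive match).
function withoutHeader(headers: Record<string, string>, name: string): Record<string, string> {
  const target = name.toLowerCase();
  const result: Record<string, string> = {};
  for (const key of Object.keys(headers)) {
    if (key.toLowerCase() !== target) result[key] = headers[key];
  }
  return result;
}

// Hypothetical headers, for illustration only.
const replayHeaders = {
  "Accept": "*/*",
  "accept-encoding": "gzip, deflate, br",
  "X-Requested-With": "XMLHttpRequest",
};

console.log(withoutHeader(replayHeaders, "Accept-Encoding"));
```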
|
||
---------- | ||
|
||
### Step 9: Winner Winner Chicken Dinner! 🎉 | ||
|
||
Congratulations, you’ve successfully scraped Soaper! Enjoy your decoded response. | ||
|
||
_(Screenshot: Winner Winner Chicken Dinner)_ | ||
|
||
---------- | ||
|
||
## Stay Connected | ||
|
||
Got questions or want to share your progress? We’ve got you covered: | ||
|
||
- **[Join us on Discord](https://discord.gg/aAPmfsRD)** | ||
- **[Chat on Telegram](https://t.me/vidjoy)** | ||
|
||
Let’s keep scraping fun, educational, and super rewarding! 🚀 |