Privacy Sql Tracking Detection Using Easylist Adservers (#3730)
* Add GA4 fields to match documentation (#3679)

* Add standard GA4 web-vital fields

* Add value

* Update Timestamps (#3680)

Co-authored-by: tunetheweb <[email protected]>

* Bump web-vitals from 4.1.0 to 4.1.1 in /src (#3681)

Bumps [web-vitals](https://github.com/GoogleChrome/web-vitals) from 4.1.0 to 4.1.1.
- [Changelog](https://github.com/GoogleChrome/web-vitals/blob/main/CHANGELOG.md)
- [Commits](GoogleChrome/web-vitals@v4.1.0...v4.1.1)

---
updated-dependencies:
- dependency-name: web-vitals
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump puppeteer from 22.10.0 to 22.10.1 in /src (#3682)

Bumps [puppeteer](https://github.com/puppeteer/puppeteer) from 22.10.0 to 22.10.1.
- [Release notes](https://github.com/puppeteer/puppeteer/releases)
- [Changelog](https://github.com/puppeteer/puppeteer/blob/main/release-please-config.json)
- [Commits](puppeteer/puppeteer@puppeteer-v22.10.0...puppeteer-v22.10.1)

---
updated-dependencies:
- dependency-name: puppeteer
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump prettier from 3.3.1 to 3.3.2 in /src (#3683)

Bumps [prettier](https://github.com/prettier/prettier) from 3.3.1 to 3.3.2.
- [Release notes](https://github.com/prettier/prettier/releases)
- [Changelog](https://github.com/prettier/prettier/blob/main/CHANGELOG.md)
- [Commits](prettier/prettier@3.3.1...3.3.2)

---
updated-dependencies:
- dependency-name: prettier
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump puppeteer from 22.10.1 to 22.11.0 in /src (#3684)

Bumps [puppeteer](https://github.com/puppeteer/puppeteer) from 22.10.1 to 22.11.0.
- [Release notes](https://github.com/puppeteer/puppeteer/releases)
- [Changelog](https://github.com/puppeteer/puppeteer/blob/main/release-please-config.json)
- [Commits](puppeteer/puppeteer@puppeteer-v22.10.1...puppeteer-v22.11.0)

---
updated-dependencies:
- dependency-name: puppeteer
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Translation of security chapter to Japanese (#3685)

* Bump puppeteer from 22.11.0 to 22.11.2 in /src (#3688)

Bumps [puppeteer](https://github.com/puppeteer/puppeteer) from 22.11.0 to 22.11.2.
- [Release notes](https://github.com/puppeteer/puppeteer/releases)
- [Changelog](https://github.com/puppeteer/puppeteer/blob/main/release-please-config.json)
- [Commits](puppeteer/puppeteer@puppeteer-v22.11.0...puppeteer-v22.11.2)

---
updated-dependencies:
- dependency-name: puppeteer
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump web-vitals from 4.1.1 to 4.2.0 in /src (#3690)

Bumps [web-vitals](https://github.com/GoogleChrome/web-vitals) from 4.1.1 to 4.2.0.
- [Changelog](https://github.com/GoogleChrome/web-vitals/blob/main/CHANGELOG.md)
- [Commits](GoogleChrome/web-vitals@v4.1.1...v4.2.0)

---
updated-dependencies:
- dependency-name: web-vitals
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump puppeteer from 22.11.2 to 22.12.0 in /src (#3689)

Bumps [puppeteer](https://github.com/puppeteer/puppeteer) from 22.11.2 to 22.12.0.
- [Release notes](https://github.com/puppeteer/puppeteer/releases)
- [Changelog](https://github.com/puppeteer/puppeteer/blob/main/release-please-config.json)
- [Commits](puppeteer/puppeteer@puppeteer-v22.11.2...puppeteer-v22.12.0)

---
updated-dependencies:
- dependency-name: puppeteer
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Update Timestamps (#3691)

Co-authored-by: tunetheweb <[email protected]>

* Remove deploy.zip step of deployment (#3692)

* Remove deploy.zip

* Remove from ignore files

* Bump puppeteer from 22.12.0 to 22.12.1 in /src (#3694)

Bumps [puppeteer](https://github.com/puppeteer/puppeteer) from 22.12.0 to 22.12.1.
- [Release notes](https://github.com/puppeteer/puppeteer/releases)
- [Changelog](https://github.com/puppeteer/puppeteer/blob/main/release-please-config.json)
- [Commits](puppeteer/puppeteer@puppeteer-v22.12.0...puppeteer-v22.12.1)

---
updated-dependencies:
- dependency-name: puppeteer
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump treosh/lighthouse-ci-action from 11.4.0 to 12.1.0 (#3693)

* Bump treosh/lighthouse-ci-action from 11.4.0 to 12.1.0

Bumps [treosh/lighthouse-ci-action](https://github.com/treosh/lighthouse-ci-action) from 11.4.0 to 12.1.0.
- [Release notes](https://github.com/treosh/lighthouse-ci-action/releases)
- [Commits](treosh/lighthouse-ci-action@11.4.0...12.1.0)

---
updated-dependencies:
- dependency-name: treosh/lighthouse-ci-action
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <[email protected]>

* Upgrade to Node 20

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Barry Pollard <[email protected]>

* Bump web-vitals from 4.2.0 to 4.2.1 in /src (#3695)

Bumps [web-vitals](https://github.com/GoogleChrome/web-vitals) from 4.2.0 to 4.2.1.
- [Changelog](https://github.com/GoogleChrome/web-vitals/blob/main/CHANGELOG.md)
- [Commits](GoogleChrome/web-vitals@v4.2.0...v4.2.1)

---
updated-dependencies:
- dependency-name: web-vitals
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump actions/setup-python from 5.1.0 to 5.1.1 (#3699)

Bumps [actions/setup-python](https://github.com/actions/setup-python) from 5.1.0 to 5.1.1.
- [Release notes](https://github.com/actions/setup-python/releases)
- [Commits](actions/setup-python@v5.1.0...v5.1.1)

---
updated-dependencies:
- dependency-name: actions/setup-python
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump puppeteer from 22.12.1 to 22.13.0 in /src (#3698)

Bumps [puppeteer](https://github.com/puppeteer/puppeteer) from 22.12.1 to 22.13.0.
- [Release notes](https://github.com/puppeteer/puppeteer/releases)
- [Changelog](https://github.com/puppeteer/puppeteer/blob/main/release-please-config.json)
- [Commits](puppeteer/puppeteer@puppeteer-v22.12.1...puppeteer-v22.13.0)

---
updated-dependencies:
- dependency-name: puppeteer
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Translation of mobile-web chapter to Japanese (#3700)

* Bump puppeteer from 22.13.0 to 22.15.0 in /src (#3711)

Bumps [puppeteer](https://github.com/puppeteer/puppeteer) from 22.13.0 to 22.15.0.
- [Release notes](https://github.com/puppeteer/puppeteer/releases)
- [Changelog](https://github.com/puppeteer/puppeteer/blob/main/release-please-config.json)
- [Commits](puppeteer/puppeteer@puppeteer-v22.13.0...puppeteer-v22.15.0)

---
updated-dependencies:
- dependency-name: puppeteer
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump jsdom from 24.1.0 to 24.1.1 in /src (#3707)

Bumps [jsdom](https://github.com/jsdom/jsdom) from 24.1.0 to 24.1.1.
- [Release notes](https://github.com/jsdom/jsdom/releases)
- [Changelog](https://github.com/jsdom/jsdom/blob/main/Changelog.md)
- [Commits](jsdom/jsdom@24.1.0...24.1.1)

---
updated-dependencies:
- dependency-name: jsdom
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump web-vitals from 4.2.1 to 4.2.2 in /src (#3706)

Bumps [web-vitals](https://github.com/GoogleChrome/web-vitals) from 4.2.1 to 4.2.2.
- [Changelog](https://github.com/GoogleChrome/web-vitals/blob/main/CHANGELOG.md)
- [Commits](GoogleChrome/web-vitals@v4.2.1...v4.2.2)

---
updated-dependencies:
- dependency-name: web-vitals
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump prettier from 3.3.2 to 3.3.3 in /src (#3702)

Bumps [prettier](https://github.com/prettier/prettier) from 3.3.2 to 3.3.3.
- [Release notes](https://github.com/prettier/prettier/releases)
- [Changelog](https://github.com/prettier/prettier/blob/main/CHANGELOG.md)
- [Commits](prettier/prettier@3.3.2...3.3.3)

---
updated-dependencies:
- dependency-name: prettier
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump web-vitals from 4.2.2 to 4.2.3 in /src (#3715)

Bumps [web-vitals](https://github.com/GoogleChrome/web-vitals) from 4.2.2 to 4.2.3.
- [Changelog](https://github.com/GoogleChrome/web-vitals/blob/main/CHANGELOG.md)
- [Commits](GoogleChrome/web-vitals@v4.2.2...v4.2.3)

---
updated-dependencies:
- dependency-name: web-vitals
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Update Timestamps (#3716)

Co-authored-by: rviscomi <[email protected]>

* tracking detection using easylist adservers

* easylist_adserver tracking detection and query

* 2022 cdn portuguese (#3725)

* add file to translation

* done translation cdn.md

Makes progress on #505

* Bump puppeteer from 22.15.0 to 23.0.2 in /src (#3719)

Bumps [puppeteer](https://github.com/puppeteer/puppeteer) from 22.15.0 to 23.0.2.
- [Release notes](https://github.com/puppeteer/puppeteer/releases)
- [Changelog](https://github.com/puppeteer/puppeteer/blob/main/release-please-config.json)
- [Commits](puppeteer/puppeteer@puppeteer-v22.15.0...puppeteer-v23.0.2)

---
updated-dependencies:
- dependency-name: puppeteer
  dependency-type: direct:development
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Update Timestamps (#3726)

Co-authored-by: tunetheweb <[email protected]>

* Replace `<object>` with `<iframe>` for embedded SVG (#3727)

* Replace object with iframe for embedded SVG

* Translations

* auto upload easylist data to table

* Fix the build to ignore 2024 chapters (for now) (#3728)

* Fix the build to ignore 2024 chapters (for now)

* Remove test line

* Update Timestamps (#3729)

Co-authored-by: tunetheweb <[email protected]>

* linting

* linting

* linting

* linting

* linting

* linting

* fixes of Simplified Chinese translation for 2020 Performance (#3734)

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Barry Pollard <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: tunetheweb <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Sakae Kotaro <[email protected]>
Co-authored-by: rviscomi <[email protected]>
Co-authored-by: Hadi Amjad <[email protected]>
Co-authored-by: William Constantinov <[email protected]>
Co-authored-by: Zuckjet <[email protected]>
Co-authored-by: Max Ostapenko <[email protected]>
11 people authored Aug 16, 2024
1 parent a239c25 commit baf490d
Showing 2 changed files with 114 additions and 0 deletions.
38 changes: 38 additions & 0 deletions sql/2024/privacy/tracking-detection/easylist-tracker-detection.sql
@@ -0,0 +1,38 @@
CREATE TEMP FUNCTION
CheckDomainInURL(url STRING, domain STRING)
RETURNS INT64
LANGUAGE js AS """
  return url.includes(domain) ? 1 : 0;
""";

-- Populate `httparchive.almanac.easylist_adservers` from `easylist_adservers.csv` first; it holds the list of domains to block
-- Source list: https://github.com/easylist/easylist/blob/master/easylist/easylist_adservers.txt
WITH easylist_data AS (
  SELECT string_field_0
  FROM `httparchive.almanac.easylist_adservers`
),
requests_data AS (
  SELECT url
  FROM `httparchive.all.requests`
  WHERE
    date = '2024-06-01' AND
    is_root_page = TRUE
),
block_status AS (
  SELECT
    r.url,
    MAX(
      CASE
        WHEN CheckDomainInURL(r.url, e.string_field_0) = 1 THEN 1
        ELSE 0
      END
    ) AS should_block
  FROM requests_data r
  LEFT JOIN easylist_data e
  ON CheckDomainInURL(r.url, e.string_field_0) = 1
  GROUP BY r.url
)
SELECT
  COUNT(0) AS blocked_url_count
FROM block_status
WHERE should_block = 1;
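
Note on the matching rule: `CheckDomainInURL` is a plain substring test, so a list entry counts as a match wherever it appears in the URL, including the path, the query string, or the middle of a longer hostname. The sketch below is not part of this commit. It mirrors that test in Python and contrasts it with a stricter hostname-suffix check. The function names and the `ads.example` domain are hypothetical and used only for illustration.

```python
from urllib.parse import urlsplit


def substring_match(url: str, domain: str) -> bool:
    """Mirror of CheckDomainInURL: true if the domain appears anywhere in the URL."""
    return domain in url


def host_suffix_match(url: str, domain: str) -> bool:
    """Hypothetical stricter check: the hostname is the domain or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    domain = domain.lower()
    return host == domain or host.endswith("." + domain)


if __name__ == "__main__":
    samples = [
        "https://ads.example/pixel.gif",                 # exact host
        "https://cdn.ads.example/a.js",                  # subdomain
        "https://news.example/article?ref=ads.example",  # domain only in query string
        "https://notads.example/x",                      # domain inside a longer host
    ]
    for sample in samples:
        print(sample, substring_match(sample, "ads.example"),
              host_suffix_match(sample, "ads.example"))
```

Under these assumptions the two checks agree on the first two URLs and differ on the last two, which the substring test counts as blocked. Whether that difference matters depends on how the resulting count is interpreted.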
76 changes: 76 additions & 0 deletions sql/util/populate_easylist_adserver.py
@@ -0,0 +1,76 @@
# pylint: disable=import-error
import requests
import pandas as pd
from google.cloud import bigquery


def extract_domains_from_file(file_path):
    domains = []
    try:
        with open(file_path, "r") as file:
            for line in file:
                # Remove the '||' prefix and '^' suffix
                domain = line.strip().lstrip("||").rstrip("^")
                if domain:  # Ensure the line is not empty
                    domains.append(domain)
    except FileNotFoundError:
        print(f"Error: The file {file_path} does not exist.")
    except Exception as e:
        print(f"An error occurred: {e}")
    return domains


def save_domains_to_csv(domains, csv_file_path):
    try:
        # Create a DataFrame from the list of domains
        df = pd.DataFrame(domains, columns=["Domain"])
        # Save the DataFrame to a CSV file
        df.to_csv(csv_file_path, index=False)
    except Exception as e:
        print(f"An error occurred while writing to CSV: {e}")


def upload_csv_to_bigquery(csv_file_path):
    # this needs the GOOGLE_APPLICATION_CREDENTIALS env variable to be set
    client = bigquery.Client()

    # Configure the job
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,  # Adjust if your CSV doesn't have a header row
        autodetect=True,  # Automatically infer schema
    )

    # Load data from the CSV file
    with open(csv_file_path, "rb") as source_file:
        load_job = client.load_table_from_file(
            source_file, "httparchive.almanac.easylist_adservers",
            job_config=job_config
        )

    # Wait for the job to complete
    load_job.result()


# URL to the text file containing the EasyList ad server filter rules
url = "https://raw.githubusercontent.com/easylist/easylist/master/" \
    "easylist/easylist_adservers.txt"
file_path = "easylist_adservers.txt"
# Path to the output CSV file
csv_file_path = "easylist_adservers.csv"

# Download the file and save it locally
response = requests.get(url)
with open(file_path, "wb") as file:
    file.write(response.content)

# Extract domains
domains = extract_domains_from_file(file_path)

# Save domains to CSV
save_domains_to_csv(domains, csv_file_path)

# upload domains to BQ
upload_csv_to_bigquery(csv_file_path)

print(f"Domains have been saved to {csv_file_path}")
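
Note on rule parsing: `extract_domains_from_file` only trims leading `|` and trailing `^` characters, so any line that is not a bare `||domain^` rule is stored as it stands. A comment line or a rule with a `$option` suffix would end up in the table unchanged. The sketch below is not part of this commit. It shows one possible stricter parser, assuming the list follows the usual Adblock Plus filter syntax (`!` for comments, `[...]` for the header, `||` as the domain anchor, `^` as the separator, `$` before options). The function name and the sample rules are hypothetical.

```python
from typing import Optional


def parse_adserver_rule(line: str) -> Optional[str]:
    """Return the domain from a '||domain^' style rule, or None for any other line."""
    rule = line.strip()
    # Skip blank lines, '!' comments and '[...]' header lines.
    if not rule or rule.startswith(("!", "[")):
        return None
    # Only domain-anchored rules start with '||'.
    if not rule.startswith("||"):
        return None
    rule = rule[2:]
    # Drop a trailing '$option' part, then cut at the '^' separator.
    rule = rule.split("$", 1)[0]
    domain = rule.split("^", 1)[0]
    return domain or None


if __name__ == "__main__":
    for sample in ["||ads.example^", "||ads.example^$third-party", "! a comment", ""]:
        print(repr(sample), "->", parse_adserver_rule(sample))
```

Returning `None` for non-rule lines lets the caller filter them out before writing the CSV, which keeps comment text out of the substring match performed by the query.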
