This repository has been archived by the owner on Jul 2, 2024. It is now read-only.

Add a script to search for and add GitHub repos for a specific language #181

Merged (4 commits), Nov 29, 2023
31 changes: 31 additions & 0 deletions search-and-add-github-repos.sh
@@ -0,0 +1,31 @@
#!/bin/bash
set -ex

# This script is exclusively developed for parser development purposes. It serves the specific use case of ingesting GitHub repositories internally for SAAS.

# Create the out folder if it does not exist
mkdir -p out

# Search GitHub for repositories by language and add them.

# Generate `out/gh.json`, which contains each repo's full name and default branch name.
# Adjust the language and the result limit below to suit your needs.
# Optional filter: --updated="> 2023-11-21"
gh search repos --language javascript --visibility public --limit 1000 --json fullName,defaultBranch > out/gh.json
Contributor

Might as well make javascript a script parameter. We could also document that it can be useful to set --sort stars, and/or --updated "> $(date -v-1m "+%Y-%m-%d")" to get repos updated in the last month, since users are likely more inclined to run recipes against popular repos.

Contributor Author

Yeah, I thought about making both the language and the count script parameters. But since this will be used by us internally and we might want other customizations, I left them hard-coded; we need to type the language somewhere anyway. We can certainly update this if the need arises.
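
The parameterization discussed in this thread could look roughly like the sketch below. It is not part of the merged script: the function name `build_search_args` and its three positional parameters (language, limit, sort) are hypothetical, and the defaults simply mirror the hard-coded values in the diff. The sketch only prints the `gh` arguments, so it runs without the GitHub CLI installed.

```shell
#!/bin/bash
# Sketch only: build_search_args and its parameters are hypothetical,
# not part of the merged script.
set -eu

# Build the argument list for `gh search repos`, falling back to the
# values hard-coded in the PR when a parameter is omitted.
build_search_args() {
  local language="${1:-javascript}"
  local limit="${2:-1000}"
  local sort="${3:-}"   # e.g. "stars"; empty keeps gh's default ordering
  local args=(search repos --language "$language" --visibility public
              --limit "$limit" --json fullName,defaultBranch)
  if [ -n "$sort" ]; then
    args+=(--sort "$sort")
  fi
  # One argument per line, so callers can read them back into an array.
  printf '%s\n' "${args[@]}"
}

# Print the resulting command line instead of running it.
echo "gh $(build_search_args "$@" | paste -sd' ' -)"
```

To run the search for real, the arguments could be read back into an array (for example with `mapfile -t args < <(build_search_args java 500 stars)`) and passed as `gh "${args[@]}" > out/gh.json`. Keeping the defaults identical to the hard-coded values means a call with no parameters behaves like the current script.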


# generate `new.csv`, which will be merged to `repos.csv`
cat out/gh.json | jq -r '.[] | ",\(.fullName),\(.defaultBranch),,,,,,,"' > out/new.csv

# generate `repos-content.json`, whose entries will be used to update `ownership.json`
cat out/gh.json | jq -r '.[] | " {
\"origin\": \"github.com\",
\"path\": \"" + .fullName + "\",
\"branch\": \"" + .defaultBranch + "\"
},"' | sed '$s/,$//' > out/repos-content.json

# Merge `new.csv` to `repos.csv`
cd parser/
./gradlew build && java -cp build/libs/parser-1.0-SNAPSHOT.jar io.moderne.jenkins.ingest.Merger ../repos.csv ../out/new.csv
kunli2 marked this conversation as resolved.

# Quick analysis of largest organizations
cut --delimiter='/' --fields=1 ../repos.csv | sort | uniq -c | sort -h | tail -n 20
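
The quick analysis above splits each line on `/`, so with rows shaped like those in `new.csv` (an empty first column, then `owner/name`) the organization names come out with a leading comma. A minimal sketch of a variant that reads the second CSV column instead is shown below. The function name `top_orgs` and the sample rows are hypothetical, and the sketch assumes `repos.csv` uses the same column layout as `new.csv`; it does not skip a header row if one exists.

```shell
#!/bin/bash
# Sketch only: top_orgs and the sample data are hypothetical.
set -eu

# Count repositories per organization and list the largest ones last.
# Assumes column 2 of the CSV holds "owner/name", as in new.csv.
top_orgs() {
  local csv="$1"
  local count="${2:-20}"
  awk -F',' '{ split($2, parts, "/"); print parts[1] }' "$csv" \
    | sort | uniq -c | sort -n | tail -n "$count"
}

# Illustrative sample rows in the new.csv layout.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
,acme/one,main,,,,,,,
,acme/two,main,,,,,,,
,globex/app,master,,,,,,,
EOF

top_orgs "$sample" 20
rm -f "$sample"
```

Against the real data this would be called as `top_orgs ../repos.csv` from inside `parser/`, or `top_orgs repos.csv` from the repository root.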