
RING-44448 - export_s3_keys.sh improvements #49

Open · wants to merge 5 commits into base: master
Conversation

TrevorBenson (Member)
RING-44448 - keep raw files option
RING-44448 - determine the IP to query for bucketd
RING-44448 - basic error logs for each RID and bucket

I'm opening this as a draft PR for us to discuss and for some initial testing.

  1. It asks about raw files up front and keeps them all or none.
  2. Uses ss to determine whether port 9000 is bound to a specific IP or not.
    • Matches [::]:9000 when ipv6 is enabled, or disabled via sysctl.
    • Matches 0.0.0.0:9000 when ipv6 is disabled via grub and [::] will not appear in ss output.
  3. There is a very basic log output for INFO, WARNING and ERROR.
    • A basic RAW_COUNT vs. PROCESSED_COUNT comparison is performed.
    • Empty buckets are logged as WARNINGs.

Currently the script will report any failure that occurs into the RID_${RID}.log as well as the ${bucket}.log file, then exit with a specific RC code as defined in the script. This "fail early" logic stops the processing of the RAFT session and reports back which buckets had not yet been processed at the time of the ERROR, before exiting the script.
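The ss-based detection in item 2 could be sketched roughly as below. `parse_bucketd_ip` is an illustrative helper, not the script's actual code; it only assumes the two bind patterns described above:

```shell
# Hypothetical helper (illustrative, not the script's actual code): given the
# bucketd port and the local-address column values from `ss -lnt`, print the
# IP to query bucketd on.
parse_bucketd_ip() {
  local port="$1" addr
  shift
  for addr in "$@"; do
    case "$addr" in
      "[::]:${port}" | "0.0.0.0:${port}")
        # Wildcard bind (IPv6 enabled, or disabled via sysctl/grub):
        # loopback always works.
        echo "127.0.0.1"
        return 0
        ;;
      *:"${port}")
        # Bound to a specific IP: strip the trailing :port.
        echo "${addr%:*}"
        return 0
        ;;
    esac
  done
  return 1
}

# Intended use (local-address column is field 4 of `ss -lnt` output):
#   parse_bucketd_ip 9000 $(ss -lnt "sport = :9000" | awk 'NR>1 {print $4}')
```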

RING-44448 - keep raw files option
RING-44448 - determine the IP to query for bucketd
RING-44448 - basic error logs for each RID and bucket
@TrevorBenson TrevorBenson force-pushed the improvement/RING-44448/export-s3-keys branch from a992cfc to a3f857b on October 2, 2023 19:38
@TrevorBenson TrevorBenson changed the title RING-44448 export_s3_keys.sh improvements RING-44448 - export_s3_keys.sh improvements on Oct 2, 2023

TrevorBenson commented Oct 2, 2023

Currently, the script has a hardcoded s3utils version which has to be bumped when the offline archive bundles a newer version.

Options:

  1. Manage the script by a GitHub action. When the s3utils version is bumped, export_s3_keys.sh would get the new version hardcoded.
  2. Manage export_s3_keys.sh as a Jinja template. The script would be deployed/updated by Ansible, potentially leaving an older version in place if Ansible is not used.
  3. Check the container host's local registry for all s3utils versions, determine the highest one, and use it.

Something like:

```bash
# Compare two dotted version strings.
# Returns 0 if equal, 1 if $1 > $2, 2 if $1 < $2.
# Assumes purely numeric components (a tag like "latest" would break the
# base-10 arithmetic below).
vercomp() {
	if [[ $1 == "$2" ]]; then
		return 0
	fi
	local IFS=.
	local i
	local ver1=($1)
	local ver2=($2)
	# Pad ver1 with zeros so both arrays have the same length.
	for ((i = ${#ver1[@]}; i < ${#ver2[@]}; i++)); do
		ver1[i]=0
	done
	for ((i = 0; i < ${#ver1[@]}; i++)); do
		if [[ -z ${ver2[i]} ]]; then
			ver2[i]=0
		fi
		if ((10#${ver1[i]} > 10#${ver2[i]})); then
			return 1
		fi
		if ((10#${ver1[i]} < 10#${ver2[i]})); then
			return 2
		fi
	done
	return 0
}

# Collect every local s3utils tag, then repeatedly discard the smaller of each
# compared pair until only the highest version remains.
mapfile -t S3UTILS_VERSION < <(docker images registry.scality.com/s3utils/s3utils --format '{{ .Tag }}')

while [[ ${#S3UTILS_VERSION[@]} -gt 1 ]]; do
	element1=${S3UTILS_VERSION[0]}
	for ((i = 1; i < ${#S3UTILS_VERSION[@]}; i++)); do
		element2=${S3UTILS_VERSION[i]}
		vercomp "${element1}" "${element2}"
		case $? in
		1)
			# element1 wins: drop element i.
			S3UTILS_VERSION=("${S3UTILS_VERSION[@]:0:i}" "${S3UTILS_VERSION[@]:i+1}")
			i=$((i - 1))
			;;
		2)
			# element2 wins: drop the first element and restart the scan.
			# (Restarting matters: continuing with the stale element1 can
			# discard the real maximum, e.g. for tags 1, 5, 2.)
			S3UTILS_VERSION=("${S3UTILS_VERSION[@]:1}")
			break
			;;
		esac
	done
done
```
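An alternative sketch, assuming GNU `sort` is available on the container host and the tags are plain version strings (no "latest"): let `sort -V` pick the highest tag instead of pairwise comparison. `highest_tag` is an illustrative name, not part of the script:

```shell
# Print the highest version string read from stdin, using GNU sort's
# version comparison (-V). Assumes all input lines are numeric versions.
highest_tag() {
  sort -V | tail -n 1
}

# Intended use against the local registry (same image path as above):
#   docker images registry.scality.com/s3utils/s3utils --format '{{ .Tag }}' \
#     | highest_tag
```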

@TrevorBenson TrevorBenson marked this pull request as ready for review October 10, 2023 15:30
@TrevorBenson (Member, Author)

@scality-fno @fra-scality any additional suggestions before we widen the pool of reviewers?

@scality-fno (Contributor) left a comment


haven't been able to test this one... do you have a lab handy with everything in place from this "dev" environment?
That said, gave a few suggestions (pointless, essentially).
Question: is the double tee pipeline used for logging somewhat redundant?

@scality-fno (Contributor)

> @scality-fno @fra-scality any additional suggestions before we widen the pool of reviewers?

absolutely widen it as soon as possible — I'm no reliable proofreader right now.


TrevorBenson commented Oct 10, 2023

> haven't been able to test this one... do you have a lab handy with everything in place from this "dev" environment? That said, gave a few suggestions (pointless, essentially). Question: is the double tee pipeline used for logging somewhat redundant?

Sort of, but not exactly. There are items that are only in the RID logs because they are not specific to a single bucket. The RID log, however, does contain every log entry from each bucket. Below is the breakdown of log messages for a bucket log, and then the additional log messages only found in the RID log.

TL;DR

  • The <bucket>.log contains

    • Everything for its own bucket, including:
      • INFO about Starting/Completing export of bucket
      • INFO (or) WARNING about verifyBucketSproxydKeys.js scan completion
      • ERROR about failed to export keys or process keys
      • INFO/WARNING about raw and processed count comparisons
      • INFO about keep/remove of raw files
  • The RID_<RID>.log contains

    • Everything inside each <bucket>.log
    • Only present in the RID logs:
      • INFO/ERROR about finding bucketd instance (valid IP etc.)
      • INFO about the list of buckets per Raft ID
      • ERROR about Raft ID empty (ENOENT / no buckets found)
      • ERROR list of all unprocessed buckets whenever dumping the entire RID does not end successfully
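A minimal sketch of that split (illustrative names and messages, not the script's exact code): bucket-scoped messages are tee'd into both logs, while RID-scoped messages go only to the RID log.

```shell
bucket="example-bucket"
RID="1"

# Bucket-scoped entry: lands in <bucket>.log AND RID_<RID>.log.
log_bucket() {
  printf '%s\n' "$*" | tee -a "${bucket}.log" >> "RID_${RID}.log"
}

# RID-scoped entry: lands only in RID_<RID>.log.
log_rid() {
  printf '%s\n' "$*" >> "RID_${RID}.log"
}

log_bucket "INFO: Starting export of bucket ${bucket}"
log_rid "INFO: list of buckets for Raft ID ${RID}: ${bucket}"
```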

@TrevorBenson TrevorBenson force-pushed the improvement/RING-44448/export-s3-keys branch from ffb1dcc to e246e65 on October 27, 2023 22:39
@TrevorBenson TrevorBenson force-pushed the improvement/RING-44448/export-s3-keys branch from e246e65 to c033ac9 on October 27, 2023 23:07
RING-44448 - Autodetect the Bucketd IP address
RING-44448 - Optional -i parameter to define Bucketd IP address (disables autodetect)
RING-44448 - Optional -p parameter to define Bucketd Port (required for IP autodetect if changed from 9000)
RING-44448 - Required -r parameter for defining RAFT ID
RING-44448 - Optional -s parameter for setting S3 Utils image name
RING-44448 - Autodetect the S3 Utils version and define minimum version
RING-44448 - Optional -v parameter for setting S3 Utils image version (disables autodetect)
RING-44448 - Optional -w parameter for setting Working Directory
@TrevorBenson (Member, Author)

Opening this up to a wider audience.

@scality-fno (Contributor)

I haven't been able to sync up with @fra-scality (w.r.t. s3utils improvements) or Cédrick (w.r.t. the latest Spark toolkit use case scenario with EDF)...
And even if this PR makes the TSKB about Spark usage obsolete, I don't care. These improvements are a must. My only concern is that some of them may have already been addressed "offline" by Francesco or Cédrick.
