Adjust how converted records are copied to GCS (#2023)
The previous copying technique optimised for getting the records to GCS
as fast as possible for opportunistic ingestion by combine-to-osv.

The problem with this approach is that when an improvement to the
conversion code stops it from generating a record that should never have
been generated, the most recent copy of that record lingers in GCS
indefinitely. This is causing situations like the one described in #1961.

So collect all of the records generated and copy them to GCS at the end
of the complete run, deleting anything in GCS that wasn't just copied.
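
The "copy everything, then delete what wasn't just copied" behaviour can be
illustrated locally. The following is only a sketch, not the script from this
commit: mirror_stage is a hypothetical helper, the temporary directories stand
in for the staging directory and the GCS bucket, and the CVE IDs are made up.

```shell
set -eu

# Hypothetical helper: make DEST hold exactly the *.json files found in STAGE.
# A local stand-in for mirroring a staging directory to a bucket with deletion
# of unmatched destination objects.
mirror_stage() {
  stage="$1"
  dest="$2"
  mkdir -p "${dest}"
  # Copy every staged record, overwriting any older copy.
  for f in "${stage}"/*.json; do
    [ -e "${f}" ] || continue   # the glob matched nothing
    cp "${f}" "${dest}/"
  done
  # Delete anything in DEST that this run did not produce.
  for f in "${dest}"/*.json; do
    [ -e "${f}" ] || continue
    [ -e "${stage}/$(basename "${f}")" ] || rm "${f}"
  done
}

# Demo with throwaway directories and made-up record names.
demo_root="$(mktemp -d)"
mkdir -p "${demo_root}/stage" "${demo_root}/bucket"
echo '{}' > "${demo_root}/bucket/CVE-0000-0001.json"   # stale record from an earlier run
echo '{}' > "${demo_root}/stage/CVE-0000-0002.json"    # record produced by the current run
mirror_stage "${demo_root}/stage" "${demo_root}/bucket"
ls "${demo_root}/bucket"
```

After the call, the stale CVE-0000-0001.json is gone and only the record from
the current run remains, which is the property the per-year copy could not
provide.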
andrewpollock authored Mar 4, 2024
1 parent 0d89b3b commit de0024a
Showing 1 changed file with 11 additions and 3 deletions.
14 changes: 11 additions & 3 deletions vulnfeeds/cmd/nvd-cve-osv/run_cve_to_osv_generation.sh
@@ -30,6 +30,10 @@ gcloud --no-user-output-enabled storage -q cp "${NVD_GCS_PATH}/*-????.json" "${W
echo "Downloading latest CPE Git repository map"
gcloud --no-user-output-enabled storage -q cp "${CPEREPO_GCS_PATH}" "${WORK_DIR}"

+mkdir -p "${WORK_DIR}/nvd2osv/gcs_stage"
+
+# Convert NVD CVE records to OSV.
+
for (( YEAR = $(date +%Y) ; YEAR >= ${FIRST_INSCOPE_YEAR} ; YEAR-- )); do
# Run OSV record generation.
echo "Converting NVD CVE records from ${YEAR} to OSV"
@@ -39,10 +43,10 @@ for (( YEAR = $(date +%Y) ; YEAR >= ${FIRST_INSCOPE_YEAR} ; YEAR-- )); do
--out_dir "${WORK_DIR}/nvd2osv/${YEAR}" \
--out_format PackageInfo

-# Copy results to GCS bucket.
-echo "Copying NVD CVE records from ${YEAR} successfully converted to OSV to GCS"
+# Copy results to staging area.
+echo "Copying NVD CVE records from ${YEAR} successfully converted to OSV to aggregated staging"
find "${WORK_DIR}/nvd2osv/${YEAR}" -type f -name \*.json \
-| gcloud --no-user-output-enabled storage -q cp -I "${OSV_OUTPUT_GCS_PATH}"
+| xargs cp -t "${WORK_DIR}/nvd2osv/gcs_stage"

# Copy conversion summary to GCS bucket.
DURABLE_OUTCOMES_CSV="${OSV_OUTPUT_GCS_PATH}/nvd-conversion-outcomes-${YEAR}-$(date -Iminutes).csv"
@@ -51,4 +55,8 @@ for (( YEAR = $(date +%Y) ; YEAR >= ${FIRST_INSCOPE_YEAR} ; YEAR-- )); do
echo "Results summary available at $DURABLE_OUTCOMES_CSV"
done

+# Copy results to GCS bucket.
+echo "Copying NVD CVE records successfully converted to GCS bucket"
+gcloud --no-user-output-enabled storage rsync "${WORK_DIR}/nvd2osv/gcs_stage" "${OSV_OUTPUT_GCS_PATH}" --checksums-only -c --delete-unmatched-destination-objects -q
+
echo "Conversion run complete"
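
A note on the staging copy: with GNU xargs, an empty input still runs the
command once, so a year that produces no records would invoke cp with no
source files. The sketch below shows one possible hardening, assuming GNU
find, xargs and cp (for -print0, -0/-r and -t). stage_records is a
hypothetical helper and is not part of this commit.

```shell
set -eu

# Hypothetical helper: copy every *.json under SRC_DIR into STAGE_DIR.
# -print0 / -0 keep unusual file names intact; -r skips cp when nothing matched.
stage_records() {
  src_dir="$1"
  stage_dir="$2"
  mkdir -p "${stage_dir}"
  find "${src_dir}" -type f -name '*.json' -print0 \
    | xargs -0 -r cp -t "${stage_dir}"
}

# Demo with throwaway directories and made-up record names.
work_dir="$(mktemp -d)"
mkdir -p "${work_dir}/2024" "${work_dir}/2023"
echo '{}' > "${work_dir}/2024/CVE-0000-0003.json"
echo '{}' > "${work_dir}/2024/odd name.json"               # whitespace in the name
stage_records "${work_dir}/2024" "${work_dir}/gcs_stage"
stage_records "${work_dir}/2023" "${work_dir}/gcs_stage"   # empty year: no cp invocation
ls "${work_dir}/gcs_stage"
```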
