-
Notifications
You must be signed in to change notification settings - Fork 166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Downloads stats are off #3697
Comments
If someone else with access wants to have a look: https://console.cloud.google.com/run/detail/us-central1/processlogs/metrics?project=nodejs-org |
|
How can I have access to look into it? |
I formalized an access request: #3705 |
I currently don't have time to look into this either, but for a quick glance at GCP some things that probably warrant looking at:
|
I'm still planning to look into it, but things got deferred. |
Unfortunately, I did not have time to look into this, but it looks like this is a serious issue as the last day we had the data is the 2024/5/2. Maybe somebody else can step in? I don't think I would not be able to look into it in the coming weeks/months. |
@trivikr maybe could you take a look? |
Direct link for viewing April stats: https://storage.googleapis.com/access-logs-summaries-nodejs/?marker=nodejs.org-access.log.20240331.json Verified that the access logs are missing for the dates, as mentioned in the description #3697 (comment) above. |
As per the source code, it looks like the contents of the filename are not a valid JSON build/ansible/roles/metrics/files/process-cloudflare/process-cloudflare.js Lines 142 to 164 in 437f7e3
|
Example log which throws error BUCKET cloudflare-logs-nodejs
FILENAME 20240618/20240618T225603Z_20240618T225716Z_da79c9d1.log.gz
PROCESSEDFILENAME 20240618/20240618T225603Z_20240618T225716Z
INSIDE CREATE PIPELINE
String length: 21570831
Array Length: 44744
SyntaxError: Unexpected end of JSON input Direct link to file: 20240618/20240618T225603Z_20240618T225716Z_da79c9d1.log.gz Sample code which processes this file by reading file locally from import { readFile } from "fs/promises";
const data = await readFile("test.json", "utf8");
for (const line of data.split("\n")) {
try {
JSON.parse(line);
} catch (error) {
const { message } = error;
console.log({ message, line });
}
} Example output { message: 'Unexpected end of JSON input', line: '' } An easy fix would be to skip reading lines which don't have any data // ...
for (const line of data.split("\n")) {
if (line.length === 0) {
continue;
}
// ... |
The SyntaxError is just logged in console but the processing continues. build/ansible/roles/metrics/files/process-cloudflare/process-cloudflare.js Lines 162 to 168 in 437f7e3
I posted a fix at #3765 to reduce the number of errors. |
Posted another PR to log line with length if parsing fails #3766 |
I'll resume my deep dive once fixes for SyntaxError are reviewed, merged and deployed. |
I've created some bug reports based on logs. However, the access logs are not created after The next steps would be to deep dive on #3771, and add more logs in build/ansible/roles/metrics/files/summaries/summaries.js Lines 76 to 87 in e3e4b4a
I can take it up after getting approvals for existing open PRs. |
Updates:
|
No action needed here, although increasing the timeout is recommended in #3771 (comment) The summaries are being produced since 20240629. I couldn't find the exact reason, the removal of
A fix for this is pending merge in #3801 |
On examining the source code of build/ansible/roles/metrics/files/summaries/summaries.js Lines 70 to 72 in aa33990
However, I couldn't find a clear correlation between the missing summaries and number of files processed Test code: import { Storage } from "@google-cloud/storage";
const projectId = "nodejs-org";
const storage = new Storage({ projectId });
for (let i = 1; i < 31; i++) {
const date = `202404${i.toString().padStart(2, "0")}`;
const filePrefix = `${date}/`;
const [files] = await storage.bucket("processed-logs-nodejs").getFiles({ prefix: filePrefix });
const fileName = `nodejs.org-access.log.${date}.json`;
const [exists] = await storage.bucket("access-logs-summaries-nodejs").file(fileName).exists();
console.log(date, files.length, exists);
} Output 20240401 3611 false
20240402 3757 false
20240403 3774 false
20240404 3725 false
20240405 3895 false
20240406 3671 true
20240407 3402 true
20240408 3708 false
20240409 3781 false
20240410 3776 false
20240411 4204 false
20240412 4883 false
20240413 4659 false
20240414 4694 false
20240415 4452 false
20240416 2700 false
20240417 2674 false
20240418 2654 true
20240419 2643 true
20240420 2609 true
20240421 2631 true
20240422 2689 false
20240423 2663 true
20240424 2660 true
20240425 2652 true
20240426 2640 true
20240427 2618 true
20240428 2636 true
20240429 2754 true
20240430 2664 false It could happen because of the content of the files, or some other reason. |
The summaries are created for months of April to June, with the fix in #3809 Test code to call summaries import { GoogleAuth } from "google-auth-library";
import { Storage } from "@google-cloud/storage";
import { Agent } from "undici";
const projectId = "nodejs-org";
const storage = new Storage({ projectId });
const cloudRunUrl = "https://produce-summaries-kdtacnjogq-uc.a.run.app";
const auth = new GoogleAuth();
const client = await auth.getIdTokenClient(cloudRunUrl);
globalThis[Symbol.for("undici.globalDispatcher.1")] = new Agent({
headersTimeout: 30 * 60 * 1000,
});
const start = new Date("04/01/2024");
const end = new Date("06/30/2024");
for (let date = start; date <= end; date.setDate(date.getDate() + 1)) {
const dateString = date.toISOString().split("T")[0].replace(/-/g, "");
const fileName = `nodejs.org-access.log.${dateString}.json`;
const [exists] = await storage.bucket("access-logs-summaries-nodejs").file(fileName).exists();
if (!exists) {
const startTime = Date.now();
console.log(`Fetching for ${dateString}`);
// Make a call to the API to populate the file
const url = `${cloudRunUrl}/date/${dateString}`;
try {
const token = await client.idTokenProvider.fetchIdToken(cloudRunUrl);
const response = await fetch(url, {
method: "POST",
headers: { Authorization: `Bearer ${token}` },
});
if (!response.ok) {
throw new Error(`HTTP error! status: ${response.status}`);
}
console.log(`Successful in ${Date.now() - startTime}ms for ${dateString}`);
} catch (error) {
console.error(`Error in ${Date.now() - startTime}ms for ${dateString}`, error);
}
}
} The script was run in parallel for different dates. This issue can be closed. |
This comment was marked as off-topic.
This comment was marked as off-topic.
thanks @trivikr!!! |
I took a look at https://nodedownloads.nodeland.dev/ and it seems downloads stats for April are off.
Quite a bit of data is missing, as those are the only files available for April:
nodejs.org-access.log.20240406.json
nodejs.org-access.log.20240407.json
nodejs.org-access.log.20240418.json
nodejs.org-access.log.20240419.json
nodejs.org-access.log.20240420.json
nodejs.org-access.log.20240421.json
nodejs.org-access.log.20240423.json
nodejs.org-access.log.20240424.json
nodejs.org-access.log.20240425.json
nodejs.org-access.log.20240426.json
nodejs.org-access.log.20240427.json
nodejs.org-access.log.20240428.json
nodejs.org-access.log.20240429.json
Can somebody with access retrieve the missing data?
The text was updated successfully, but these errors were encountered: