Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow the validator to accept a "time of validation" argument #1292

Closed
atvaccaro opened this issue Nov 14, 2022 · 10 comments · Fixed by #1628 or #1658
Closed

Allow the validator to accept a "time of validation" argument #1292

atvaccaro opened this issue Nov 14, 2022 · 10 comments · Fixed by #1628 or #1658
Labels
enhancement New feature request or improvement on an existing feature status: Work in progress A PR that would close this issue has been opened.

Comments

@atvaccaro
Copy link

atvaccaro commented Nov 14, 2022

Describe the problem

As part of the Cal-ITP data pipeline, we collect and validate GTFS and GTFS-RT data over time, which includes executing the Schedule and RT validators against hundreds of feeds daily. Sometimes we need to re-process old data for a variety of purposes such as changing metadata or updating validator versions; usually the best way is to just run a full historical re-processing starting from the raw data. While this is fine for most situations, it does mean that sometimes we are validating data much later than it was originally collected, which affects validation checks that rely on the validator execution time. (The same situation can occur simply due to retries or latency in our batch processing data pipeline.)

I believe currently there are two affected checks: feed_expiration_date_7_days and feed_expiration_date_30_days.

Proposed solution

The validator accepts a timestamp argument to use when determining whether a feed has expired, etc. This would allow data pipelines to be more idempotent and less reliant on actual validator execution time.

Alternatives you've considered

No response

Additional context

It looks like we have an open PR to add additional checks that would be affected by this.

Deferring to @themightychris and @nkkl but I think we (Jarvus) can look at helping with this after the MVP web validator is deployed.

@atvaccaro atvaccaro added enhancement New feature request or improvement on an existing feature status: Needs triage Applied to all new issues labels Nov 14, 2022
@nkkl
Copy link

nkkl commented Nov 15, 2022

The Jarvus team needs to prioritize getting the validator online and deployed so that agencies can access it, but once that's done we likely have bandwidth to pick this up! 👍

@derhuerst
Copy link

Would it be possible to just use the input GTFS dataset's time modification time?

@wklumpen
Copy link

wklumpen commented Jul 6, 2023

In a similar vein, I'm now getting errors in 4.1.0 that I wasn't in 4.0.0 via ExpiredCalendarValidator due to me wanting to validate an expired feed (for research purposes)

Seems like this shouldn't be a system error? Do I need to open a new issue for this?

@bradyhunsaker
Copy link
Contributor

It looks like the date is set just once (in a CurrentDateTime object) in ValidationRunner.java. That makes things easier.
This could easily be changed to use an optional date value added to ValidationContext.

The main question I see is whether setting the date would be supported in all three interfaces (web, app, cli). It would be straightforward to add it to the cli, for example, as another argument. Adding it to the other interfaces would require some UI decisions.

Is it reasonable to add this only in the cli?
Does the project have a strong desire or policy for feature parity across the interfaces?
Would it satisfy the users asking for the feature to have it only in the cli?

@emmambd emmambd added status: Ready An issue that is ready to be worked on. status: Needs triage Applied to all new issues and removed status: Needs triage Applied to all new issues status: Ready An issue that is ready to be worked on. labels Jul 10, 2023
@emmambd
Copy link
Contributor

emmambd commented Jul 13, 2023

Does the project have a strong desire or policy for feature parity across the interfaces?

@bradyhunsaker - we intend for everything in the Web UI to be available in the cli, but parity the other way around isn't necessary. It seems like having a cli argument is a sufficient first step given @atvaccaro was talking from a data pipeline perspective, but would definitely like to hear others answer your questions and see if adding it to the web UI is critical!
@derhuerst @atvaccaro @lauriemerrell @wklumpen.

@wklumpen
Copy link

At the moment I just use the CLI exclusively - my main requirement here is to not have the validator crash when running on an "out of date" GTFS package, which the new version currently does.

@emmambd
Copy link
Contributor

emmambd commented Jul 13, 2023

@wklumpen Since the validator is crashing, please open a new bug issue with links to the feeds you're using.

@bradyhunsaker
Copy link
Contributor

Thanks, Emma.

I'll try to create a PR in the next couple of weeks that adds a command-line flag to the cli. (If someone is reading this and is excited to create a PR sooner, go ahead!)

@bradyhunsaker
Copy link
Contributor

Apologies for not updating sooner. I did not get to this in the summer as I expected. I plan to make an initial attempt in the next month.

@bradyhunsaker
Copy link
Contributor

I just created a PR that allows specifying the date to use in the cli.

@emmambd emmambd modified the milestones: Next, Now Jan 9, 2024
@emmambd emmambd added status: Work in progress A PR that would close this issue has been opened. and removed status: Needs triage Applied to all new issues labels Jan 9, 2024
@jcpitre jcpitre linked a pull request Jan 24, 2024 that will close this issue
5 tasks
@emmambd emmambd removed this from the Now milestone Mar 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature request or improvement on an existing feature status: Work in progress A PR that would close this issue has been opened.
Projects
None yet
6 participants