Allow the validator to accept a "time of validation" argument #1292
Comments
The Jarvus team needs to prioritize getting the validator online and deployed so that agencies can access it, but once that's done we likely have bandwidth to pick this up! 👍
Would it be possible to just use the input GTFS dataset's modification time?
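For illustration, a pipeline could derive a validation date from a dataset file's modification time along these lines. This is a minimal sketch, not validator code: the helper name `validation_date_from_mtime` is hypothetical, and it assumes the feed is available as a local file. Note that a file's modification time can change when the file is copied or re-downloaded, so it may not match the original collection time.

```python
import os
from datetime import date, datetime, timezone


def validation_date_from_mtime(path: str) -> date:
    """Return the UTC calendar date on which `path` was last modified.

    Hypothetical helper: the modification time is only a proxy for the
    time the feed was collected.
    """
    mtime = os.path.getmtime(path)  # seconds since the epoch
    return datetime.fromtimestamp(mtime, tz=timezone.utc).date()
```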
In a similar vein, I'm now getting errors in 4.1.0 that I wasn't in 4.0.0 via Seems like this shouldn't be a system error? Do I need to open a new issue for this?
It looks like the date is set just once (in a CurrentDateTime object) in ValidationRunner.java. That makes things easier. The main question I see is whether setting the date would be supported in all three interfaces (web, app, cli). It would be straightforward to add it to the cli, for example, as another argument. Adding it to the other interfaces would require some UI decisions. Is it reasonable to add this only in the cli?
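The pattern described above (a date set once at startup, with an optional command-line override) can be sketched roughly as follows. This is a language-neutral illustration written in Python, not the validator's Java code: the `CurrentDate` class, the `--input` and `--date` flag names, and the `build_current_date` helper are all assumptions made for the example.

```python
import argparse
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CurrentDate:
    """Immutable holder for the date every check should treat as 'today'."""
    value: date


def parse_args(argv):
    # Hypothetical CLI; the flag names are illustrative only.
    parser = argparse.ArgumentParser(description="Hypothetical validator CLI")
    parser.add_argument("--input", required=True, help="Path to the GTFS dataset")
    parser.add_argument(
        "--date",
        type=date.fromisoformat,  # expects YYYY-MM-DD
        default=None,
        help="Date to treat as the time of validation (defaults to today)",
    )
    return parser.parse_args(argv)


def build_current_date(args, today=date.today):
    """Create the single CurrentDate, preferring the CLI override if given."""
    return CurrentDate(args.date if args.date is not None else today())
```

Because the date is resolved in one place, checks that receive the `CurrentDate` object do not need to know whether it came from the system clock or from the command line.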
@bradyhunsaker - we intend for everything in the Web UI to be available in the cli, but parity the other way around isn't necessary. It seems like having a cli argument is a sufficient first step given @atvaccaro was talking from a data pipeline perspective, but would definitely like to hear others answer your questions and see if adding it to the web UI is critical!
At the moment I just use the CLI exclusively - my main requirement here is to not have the validator crash when running on an "out of date" GTFS package, which the new version currently does.
@wklumpen Since the validator is crashing, please open a new bug issue with links to the feeds you're using.
Thanks, Emma. I'll try to create a PR in the next couple of weeks that adds a command-line flag to the cli. (If someone is reading this and is excited to create a PR sooner, go ahead!)
Apologies for not updating sooner. I did not get to this in the summer as I expected. I plan to make an initial attempt in the next month.
I just created a PR that allows specifying the date to use in the cli.
Describe the problem
As part of the Cal-ITP data pipeline, we collect and validate GTFS and GTFS-RT data over time, which includes executing the Schedule and RT validators against hundreds of feeds daily. Sometimes we need to re-process old data for a variety of purposes such as changing metadata or updating validator versions; usually the best way is to just run a full historical re-processing starting from the raw data. While this is fine for most situations, it does mean that sometimes we are validating data much later than it was originally collected, which affects validation checks that rely on the validator execution time. (The same situation can occur simply due to retries or latency in our batch processing data pipeline.)
I believe currently there are two affected checks: feed_expiration_date_7_days and feed_expiration_date_30_days.
Proposed solution
The validator accepts a timestamp argument to use when determining whether a feed has expired, etc. This would allow data pipelines to be more idempotent and less reliant on actual validator execution time.
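To make the dependence on execution time concrete, here is a simplified sketch of an expiration check that takes the validation date as an explicit argument. The function name, the exact comparisons, and the way the 7-day and 30-day thresholds are applied are assumptions for illustration. Only the two notice names come from this issue, and the validator's real rules may differ.

```python
from datetime import date, timedelta
from typing import Optional


def feed_expiration_notice(feed_end_date: date, validation_date: date) -> Optional[str]:
    """Return the name of the expiration notice that would fire, if any.

    Simplified, hypothetical logic: the result depends only on the two
    arguments, never on the system clock.
    """
    if feed_end_date <= validation_date + timedelta(days=7):
        return "feed_expiration_date_7_days"
    if feed_end_date <= validation_date + timedelta(days=30):
        return "feed_expiration_date_30_days"
    return None


feed_end = date(2023, 3, 31)
# Validated as of the original collection date: no notice.
print(feed_expiration_notice(feed_end, date(2023, 1, 15)))  # None
# Same feed re-processed later using the wall-clock date: a notice appears.
print(feed_expiration_notice(feed_end, date(2023, 3, 28)))  # feed_expiration_date_7_days
```

Passing the original collection date on every run gives the same result no matter when the re-processing happens, which is the idempotency this proposal is after.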
Alternatives you've considered
No response
Additional context
It looks like we have an open PR to add additional checks that would be affected by this.
Deferring to @themightychris and @nkkl but I think we (Jarvus) can look at helping with this after the MVP web validator is deployed.