feat: Add shapes.txt validator #2123

Open
cswilson252 wants to merge 48 commits into MobilityData:master from cswilson252:noShapesNoDrt

cswilson252 commented Mar 4, 2026

Contributor

Summary:

Add a validator that checks whether shapes.txt or a fixed-stop or zone-based DRT service is present (as per #1792)

DRT functionality added as per this reference

Fixes #1792

Expected behavior:

The validator emits a WARNING when neither shapes.txt nor any sign of a DRT feature is found.
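
The expected behavior can be sketched as a single boolean rule. This is a minimal illustration, not the validator's actual code: the class `ShapesWarningRule`, the method `shouldWarn`, and its three parameters are hypothetical names, and it assumes the only DRT indicators are the two named in the summary (fixed-stop and zone-based).

```java
/** Hypothetical sketch of the warning rule; not the validator's actual code. */
public final class ShapesWarningRule {
  private ShapesWarningRule() {}

  /**
   * Returns true when a WARNING should be emitted: shapes.txt is missing
   * and the feed shows no sign of fixed-stop or zone-based DRT.
   * All three parameter names are assumptions made for this example.
   */
  public static boolean shouldWarn(
      boolean shapesMissing, boolean hasFixedStopsDrt, boolean hasZoneBasedDrt) {
    boolean hasDrt = hasFixedStopsDrt || hasZoneBasedDrt;
    return shapesMissing && !hasDrt;
  }

  public static void main(String[] args) {
    System.out.println(shouldWarn(true, false, false));  // true: no shapes, no DRT
    System.out.println(shouldWarn(true, true, false));   // false: fixed-stop DRT present
    System.out.println(shouldWarn(true, false, true));   // false: zone-based DRT present
    System.out.println(shouldWarn(false, false, false)); // false: shapes.txt present
  }
}
```

Only the first case, where shapes are missing and no DRT indicator is set, produces a warning.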

Please make sure these boxes are checked before submitting your pull request - thanks!

  • Run the unit tests with gradle test to make sure you didn't break anything
  • Format the title like "feat: [new feature short description]". Title must follow the Conventional Commit Specification (https://www.conventionalcommits.org/en/v1.0.0/).
  • Linked all relevant issues

cswilson252 marked this pull request as ready for review March 9, 2026 00:28

Copilot AI left a comment

Pull request overview

Adds a new validator intended to warn when shapes.txt is missing unless the feed appears to use zone-based or fixed-stop DRT (per #1792).

Changes:

  • Introduces MissingShapesFileValidator and a corresponding test class.
  • Adds unit tests covering shapes present + DRT present scenarios (and a no-shapes/no-DRT scenario).
  • Adds an extra import in ShapeUsageValidator.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| main/src/main/java/org/mobilitydata/gtfsvalidator/validator/MissingShapesFileValidator.java | New multi-file validator for warning on missing shapes.txt unless DRT indicators are present. |
| main/src/test/java/org/mobilitydata/gtfsvalidator/validator/MissingShapesFileValidatorTest.java | New unit tests for the validator behavior. |
| main/src/main/java/org/mobilitydata/gtfsvalidator/validator/ShapeUsageValidator.java | Adds an import for a nested notice type. |

@cswilson252

Contributor Author

i'm a bit stuck as to why the test failure is happening

cswilson252 commented Jun 20, 2026

Contributor Author

yippee! ready for review, thanks again! :)

cswilson252 requested a review from davidgamez June 20, 2026 02:55

@davidgamez

Member

📝 Acceptance Test Report

📋 Summary

❌ The rule acceptance test has failed for commit ca15e80
Download the full acceptance test report here (report will disappear after 90 days).

📊 Notices Comparison

New Errors (0 out of 1001 datasets, ~0%) ✅

No changes were detected due to the code change.

Dropped Errors (0 out of 1001 datasets, ~0%) ✅

No changes were detected due to the code change.

New Warnings (48 out of 1001 datasets, ~5%) ❌

Details of new warnings due to the code change, which exceed the provided threshold of 1%.

Dataset Notice Code
mdb-1009 missing_recommended_file
mdb-1010 missing_recommended_file
mdb-1012 missing_recommended_file
mdb-1013 missing_recommended_file
mdb-1031 missing_recommended_file
mdb-1089 missing_recommended_file
mdb-1090 missing_recommended_file
mdb-1091 missing_recommended_file
mdb-1139 missing_recommended_file
mdb-1150 missing_recommended_file
mdb-1151 missing_recommended_file
mdb-1175 missing_recommended_file
mdb-1176 missing_recommended_file
mdb-1205 missing_recommended_file
mdb-1259 missing_recommended_file
mdb-1264 missing_recommended_file
mdb-1271 missing_recommended_file
mdb-1782 missing_recommended_file
mdb-1783 missing_recommended_file
mdb-1859 missing_recommended_file
mdb-1871 missing_recommended_file
mdb-1970 missing_recommended_file
mdb-1984 missing_recommended_file
mdb-2055 missing_recommended_file
mdb-2077 missing_recommended_file
mdb-2085 missing_recommended_file
mdb-2134 missing_recommended_file
mdb-2151 missing_recommended_file
mdb-2231 missing_recommended_file
mdb-2237 missing_recommended_file
mdb-2597 missing_recommended_file
mdb-2615 missing_recommended_file
mdb-2661 missing_recommended_file
mdb-2668 missing_recommended_file
mdb-2690 missing_recommended_file
mdb-2770 missing_recommended_file
mdb-2772 missing_recommended_file
mdb-2800 missing_recommended_file
mdb-2875 missing_recommended_file
mdb-2898 missing_recommended_file
mdb-2902 missing_recommended_file
mdb-2918 missing_recommended_file
mdb-550 missing_recommended_file
mdb-686 missing_recommended_file
mdb-768 missing_recommended_file
mdb-781 missing_recommended_file
mdb-855 missing_recommended_file
mdb-979 missing_recommended_file
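
The failure above comes down to simple arithmetic: 48 of 1001 datasets is about 4.8%, which is above the 1% threshold. A minimal sketch of that comparison follows. The class and method names are hypothetical, and the strict "greater than" comparison is an assumption; the real acceptance-test logic may differ.

```java
import java.util.Locale;

/** Hypothetical sketch of a threshold check; the real acceptance-test logic may differ. */
public final class WarningThresholdCheck {
  private WarningThresholdCheck() {}

  /** Fraction of datasets that gained a notice, e.g. 48 / 1001. */
  public static double affectedRatio(int affectedDatasets, int totalDatasets) {
    if (totalDatasets <= 0) {
      throw new IllegalArgumentException("totalDatasets must be positive");
    }
    return (double) affectedDatasets / totalDatasets;
  }

  /** Assumes a strict "greater than" comparison against the threshold. */
  public static boolean exceedsThreshold(
      int affectedDatasets, int totalDatasets, double threshold) {
    return affectedRatio(affectedDatasets, totalDatasets) > threshold;
  }

  public static void main(String[] args) {
    double ratio = affectedRatio(48, 1001);
    System.out.println(String.format(Locale.ROOT, "%.2f%%", ratio * 100)); // 4.80%
    System.out.println(exceedsThreshold(48, 1001, 0.01)); // true
    System.out.println(exceedsThreshold(0, 1001, 0.01));  // false
  }
}
```
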
Dropped Warnings (0 out of 1001 datasets, ~0%) ✅

No changes were detected due to the code change.

New Info Notices (0 out of 1001 datasets, ~0%) ✅

No changes were detected due to the code change.

Dropped Info Notices (0 out of 1001 datasets, ~0%) ✅

No changes were detected due to the code change.

🛡️ Corruption Check

2 out of 1003 sources (~0 %) are corrupted.
Dataset Ref Report Exists Ref Report Readable Latest Report Exists Latest Report Readable
mdb-1114
mdb-1123
🔍 System errors for mdb-1114 (reference)
[
  {
    "code": "i_o_error",
    "severity": "ERROR",
    "totalNotices": 1,
    "sampleNotices": [
      {
        "exception": "java.util.zip.ZipException",
        "message": "Archive is not a ZIP archive"
      }
    ]
  }
]
🔍 System errors for mdb-1114 (latest)
[
  {
    "code": "i_o_error",
    "severity": "ERROR",
    "totalNotices": 1,
    "sampleNotices": [
      {
        "exception": "java.util.zip.ZipException",
        "message": "Archive is not a ZIP archive"
      }
    ]
  }
]
🔍 System errors for mdb-1123 (reference)
[
  {
    "code": "i_o_error",
    "severity": "ERROR",
    "totalNotices": 1,
    "sampleNotices": [
      {
        "exception": "java.util.zip.ZipException",
        "message": "Archive is not a ZIP archive"
      }
    ]
  }
]
🔍 System errors for mdb-1123 (latest)
[
  {
    "code": "i_o_error",
    "severity": "ERROR",
    "totalNotices": 1,
    "sampleNotices": [
      {
        "exception": "java.util.zip.ZipException",
        "message": "Archive is not a ZIP archive"
      }
    ]
  }
]

💾 Out of Memory Check

No datasets experienced an OutOfMemoryError.

⏱️ Performance Assessment

📈 Validation Time

Assess the performance in terms of seconds taken for the validation process.

| Time Metric | Dataset ID | Reference (s) | Latest (s) | Difference (s) |
| --- | --- | --- | --- | --- |
| Average | -- | 5.57 | 5.92 | ⬆️+0.35 |
| Median | -- | 1.64 | 1.86 | ⬆️+0.22 |
| Standard Deviation | -- | 21.11 | 22.35 | ⬆️+1.24 |
| Minimum in Reference Reports | mdb-518 | 0.43 | 0.52 | ⬆️+0.10 |
| Maximum in Reference Reports | mdb-2014 | 559.76 | 608.69 | ⬆️+48.93 |
| Minimum in Latest Reports | mdb-2018 | 0.44 | 0.45 | ⬆️+0.01 |
| Maximum in Latest Reports | mdb-2014 | 559.76 | 608.69 | ⬆️+48.93 |

📜 Memory Consumption

| Metric | Dataset ID | Reference | Latest | Difference |
| --- | --- | --- | --- | --- |
| Average | -- | 582.06 MiB | 557.69 MiB | ⬇️-24.38 MiB |
| Median | -- | 323.93 MiB | 325.07 MiB | ⬆️+1.15 MiB |
| Standard Deviation | -- | 1.04 GiB | 960.58 MiB | ⬇️-102.28 MiB |
| Minimum in Reference Reports | mdb-107 | 40.99 MiB | 53.45 MiB | ⬆️+12.46 MiB |
| Maximum in Reference Reports | mdb-2393 | 10.39 GiB | 10.37 GiB | ⬇️-20.97 MiB |
| Minimum in Latest Reports | mdb-1812 | 411.93 MiB | 41.58 MiB | ⬇️-370.34 MiB |
| Maximum in Latest Reports | mdb-2393 | 10.39 GiB | 10.37 GiB | ⬇️-20.97 MiB |

davidgamez left a comment

Member

LGTM!

emmambd commented Jun 22, 2026

Contributor

@skalexch Tagging you here to review the acceptance tests

davidgamez requested a review from skalexch June 22, 2026 14:16

skalexch left a comment

Contributor

@emmambd @davidgamez I ran a query on the flagged feeds. Only one "seemed" to have Fixed Stops DRT. I double-checked it and it has no trips whatsoever, so it's a false flag.

A second query looked at which feeds from the subset should have been flagged but were not. For some reason, more than 120 expected warnings are missing.

emmambd commented Jun 22, 2026

Contributor

@skalexch mdb-2237 doesn't have fixed stops demand responsive transit on the Mobility Database, so I'm wondering if that's an acceptance test issue rather than an issue with the PR logic?

As for false negatives, after a few spot checks I suspect it's due to the shapes.txt file check. I see several cases where shapes.txt exists but only the column headers are populated and nothing else, as well as a few cases where the file does not exist at all. cc @davidgamez @cswilson252

skalexch commented Jun 23, 2026

Contributor

@emmambd for mdb-2237 the acceptance tests and the MobilityDatabase are both accurate in saying that it does not have any of the 3 features. My query shows it associated with Fixed-Stops DRT so that may be a previous validation before we refined the logic of feature detection, so that's not an issue.

But overall, the validation logic needs to be refined so that all affected feeds are detected. @cswilson252 I don't think a condition is missing from the code; my assumption is that one of the boolean variables may be returning a false positive. In the validator code, I have not seen an expression like shapeTable == null. It may be better to try Boolean missingShapes = shapeTable.isMissingFile() || shapeTable.isEmpty();
Otherwise, I don't yet see another issue in the code.

@davidgamez

Member

> @emmambd for mdb-2237 the acceptance tests and the MobilityDatabase are both accurate in saying that it does not have any of the 3 features. My query shows it associated with Fixed-Stops DRT so that may be a previous validation before we refined the logic of feature detection, so that's not an issue.
>
> But overall, the validation logic needs to be refined so that all affected feeds are detected. @cswilson252 I don't think a condition is missing from the code; my assumption is that one of the boolean variables may be returning a false positive. In the validator code, I have not seen an expression like shapeTable == null. It may be better to try Boolean missingShapes = shapeTable.isMissingFile() || shapeTable.isEmpty(); Otherwise, I don't yet see another issue in the code.

I would keep shapeTable == null in the expression to avoid null pointer exceptions at runtime, and add shapeTable.isEmpty() as you suggested.
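
A minimal sketch of the combined check, under stated assumptions: `ShapeTable` is a hypothetical stand-in interface that exposes only the two methods named in this thread, and `MissingShapesCheck`, `isShapesMissing`, and `table` are illustrative names, not the validator's real API. Because `||` short-circuits, the null check has to come first so the method calls after it are never reached for a null table.

```java
/** Hypothetical sketch; ShapeTable stands in for the real table container. */
public final class MissingShapesCheck {
  private MissingShapesCheck() {}

  /** Minimal stand-in exposing only the two methods named in this thread. */
  public interface ShapeTable {
    boolean isMissingFile();

    boolean isEmpty();
  }

  /** Null check first, so the later calls cannot throw a NullPointerException. */
  public static boolean isShapesMissing(ShapeTable shapeTable) {
    return shapeTable == null || shapeTable.isMissingFile() || shapeTable.isEmpty();
  }

  /** Helper that builds a stand-in table with fixed answers. */
  public static ShapeTable table(boolean missingFile, boolean empty) {
    return new ShapeTable() {
      @Override
      public boolean isMissingFile() {
        return missingFile;
      }

      @Override
      public boolean isEmpty() {
        return empty;
      }
    };
  }

  public static void main(String[] args) {
    System.out.println(isShapesMissing(null));                // true: no table at all
    System.out.println(isShapesMissing(table(true, true)));   // true: file absent
    System.out.println(isShapesMissing(table(false, true)));  // true: header-only file
    System.out.println(isShapesMissing(table(false, false))); // false: shapes present
  }
}
```

The third case corresponds to the feeds where shapes.txt exists but contains only column headers.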

@cswilson252

Contributor Author

hi all, I will add shapeTable.isEmpty() to the check and see if that fixes the flaky false negatives

davidgamez requested a review from skalexch June 25, 2026 17:59

Development

Successfully merging this pull request may close these issues.

Add shapes.txt as recommended file

5 participants