KRT Checker
Validate and analyze Key Resources Tables for scientific compliance
Validation Methodology
Overview
The validation process assesses a Key Resources Table (KRT) against the ASAP (Aligning Science Across Parkinson's) standards in three stages: automated analysis of the data structure, column mapping, and content validation. Together, these stages identify deviations from the required format specifications.
File Processing and Data Ingestion
The validation process begins with file format detection and encoding analysis. CSV and Excel (.xlsx) formats are supported. Encodings are tried in the following order: UTF-8, Latin-1, CP1252, ISO-8859-1. For files containing mixed format structures, header detection examines the first two rows to find the row where the data start. Rows containing only null values or whitespace are removed during preprocessing.
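The encoding fallback and blank-row removal described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the tool's actual code; the function names `decode_with_fallback` and `drop_blank_rows` are hypothetical.

```python
# Encoding order taken from the text above; everything else is an assumption.
CANDIDATE_ENCODINGS = ["utf-8", "latin-1", "cp1252", "iso-8859-1"]


def decode_with_fallback(raw, encodings=CANDIDATE_ENCODINGS):
    """Return (text, encoding) for the first encoding that decodes cleanly."""
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # try the next candidate
    raise ValueError("none of the candidate encodings could decode the file")


def drop_blank_rows(rows):
    """Remove rows whose cells are all None or whitespace-only."""
    return [
        row for row in rows
        if any(cell is not None and str(cell).strip() for cell in row)
    ]
```

In Python, `latin-1` and `iso-8859-1` are aliases for the same codec, and that codec accepts every byte value, so in this sketch the entries listed after `latin-1` are never reached. A real implementation may use a different detection strategy.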
Column Mapping
Column recognition uses case-insensitive string matching to map user-defined column names to the six required ASAP columns: RESOURCE TYPE, RESOURCE NAME, SOURCE, IDENTIFIER, NEW/REUSE, ADDITIONAL INFORMATION. The mapping algorithm handles common variations including underscore substitutions, hyphen variations, and abbreviated forms. Columns beyond the required set are identified and reported.
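As a rough sketch of this kind of mapping, the example below normalizes header names before looking them up. The six required column names come from the text above; the `ALIASES` table, the `normalize` and `map_columns` names, and the specific abbreviations are assumptions made for illustration.

```python
import re

REQUIRED_COLUMNS = [
    "RESOURCE TYPE", "RESOURCE NAME", "SOURCE",
    "IDENTIFIER", "NEW/REUSE", "ADDITIONAL INFORMATION",
]

# Hypothetical abbreviated forms, keyed by their normalized spelling.
ALIASES = {
    "TYPE": "RESOURCE TYPE",
    "NAME": "RESOURCE NAME",
    "ID": "IDENTIFIER",
    "NEW REUSE": "NEW/REUSE",
    "ADDITIONAL INFO": "ADDITIONAL INFORMATION",
}


def normalize(name):
    """Uppercase, treat underscores and hyphens as spaces, collapse whitespace."""
    cleaned = re.sub(r"[_\-]+", " ", str(name).upper())
    return re.sub(r"\s+", " ", cleaned).strip()


def map_columns(headers):
    """Return (mapping, missing, extra) for a list of user column names."""
    lookup = {normalize(column): column for column in REQUIRED_COLUMNS}
    lookup.update(ALIASES)
    mapping, extra = {}, []
    for header in headers:
        target = lookup.get(normalize(header))
        if target is None:
            extra.append(header)      # column outside the required set
        else:
            mapping[header] = target  # user name -> required ASAP name
    missing = [c for c in REQUIRED_COLUMNS if c not in mapping.values()]
    return mapping, missing, extra
```

For example, `map_columns(["resource_type", "Resource-Name", "Notes"])` maps the first two headers to their required names and reports `Notes` as an extra column.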
Resource Type Validation
Resource type validation compares each entry against the ASAP-approved list of 14 standardized categories using case-insensitive exact matching. Non-conforming entries are flagged, and alternative matches are suggested using fuzzy string matching based on character similarity and keyword analysis. The algorithm accounts for pluralization variants, abbreviated forms, and commonly used alternative terminology.
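One way to approximate the exact-match-then-suggest behavior is with `difflib` from the Python standard library. The sketch below covers character similarity only, not the keyword analysis mentioned above. The `APPROVED_TYPES` list is an illustrative subset rather than the full ASAP list, and `check_resource_type` is a hypothetical name.

```python
import difflib

# Illustrative subset only; the ASAP list contains 14 categories.
APPROVED_TYPES = ["Antibody", "Dataset", "Protocol", "Software/code"]


def check_resource_type(value, approved=APPROVED_TYPES):
    """Return (is_valid, canonical_or_suggested_type)."""
    cleaned = str(value).strip().lower()
    by_lower = {name.lower(): name for name in approved}
    if cleaned in by_lower:
        return True, by_lower[cleaned]  # case-insensitive exact match
    # Fall back to character similarity; 0.6 is difflib's default cutoff.
    close = difflib.get_close_matches(cleaned, list(by_lower), n=1, cutoff=0.6)
    return False, (by_lower[close[0]] if close else None)
```

With this subset, `check_resource_type("antibodies")` is flagged as invalid and suggests `Antibody`, while an entry with no close match returns `None` as the suggestion.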
Required Field Completeness Analysis
Completeness is enforced for four mandatory fields: RESOURCE TYPE, RESOURCE NAME, IDENTIFIER, and NEW/REUSE. The analysis recognizes several forms of missing data, including null values, "N/A" entries, whitespace-only cells, and common placeholder strings. For the IDENTIFIER field, when no formal identifier is available, the system accepts the specific string "No identifier exists" so that the completeness requirement is still satisfied.
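A completeness check along these lines might look like the sketch below. The mandatory fields and the "No identifier exists" string come from the text above; the contents of `PLACEHOLDERS` (beyond "N/A") and the function names are assumptions.

```python
MANDATORY_FIELDS = ["RESOURCE TYPE", "RESOURCE NAME", "IDENTIFIER", "NEW/REUSE"]

# Hypothetical placeholder strings; only "N/A" is named in the text.
PLACEHOLDERS = {"n/a", "na", "nan", "none", "null", "-", "tbd"}


def is_missing(value):
    """True for None, empty or whitespace-only cells, and placeholder strings."""
    if value is None:
        return True
    text = str(value).strip()
    return text == "" or text.lower() in PLACEHOLDERS


def find_missing(rows):
    """rows is a list of dicts; return (row_number, field) for each gap."""
    problems = []
    for number, row in enumerate(rows, start=1):
        for field in MANDATORY_FIELDS:
            if is_missing(row.get(field)):
                problems.append((number, field))
    return problems
```

Because "No identifier exists" is not in the placeholder set, an IDENTIFIER cell containing that string counts as complete.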
Data Availability Assessment
The system analyzes resource entries to identify new datasets and software/code resources based on RESOURCE TYPE and NEW/REUSE field combinations. When new datasets are absent, a warning is generated with template language for Data Availability Statements. Similarly, when new software/code resources are not present, appropriate template language is provided for Code Availability Statements in accordance with ASAP reproducibility requirements.
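The check described above can be sketched as a scan for rows that combine a given resource type with a "new" status. The sketch assumes the relevant cell values are "Dataset", "Software/code", and "New", and it returns short keys instead of the actual template language, which is not reproduced here. The function name and keys are hypothetical.

```python
def availability_warnings(rows):
    """Return keys for the availability statements that need template text."""

    def has_new(resource_type):
        # True if any row has this resource type and is marked as new.
        return any(
            str(row.get("RESOURCE TYPE", "")).strip().lower() == resource_type
            and str(row.get("NEW/REUSE", "")).strip().lower() == "new"
            for row in rows
        )

    warnings = []
    if not has_new("dataset"):
        warnings.append("data_availability")  # no new datasets found
    if not has_new("software/code"):
        warnings.append("code_availability")  # no new software/code found
    return warnings
```

A table with a new dataset but only reused software would yield `["code_availability"]`; the caller would then attach the corresponding template text.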
Output and Reporting
Validation results are categorized into three types: errors (compliance failures), warnings (recommended improvements), and successes (conforming elements). Error messages include row-specific references and descriptions of the non-compliance issue. The validation process generates a complete log of all processing steps and decisions made during analysis.
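For illustration, the three result types and the row-specific references could be represented with a small record type. The `Finding` class, its field names, and the message format below are assumptions, not the tool's actual output format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Finding:
    level: str                 # "error", "warning", or "success"
    message: str
    row: Optional[int] = None  # row reference, when one applies


def format_finding(finding):
    """Render one finding as a single log line."""
    location = f"row {finding.row}: " if finding.row is not None else ""
    return f"[{finding.level.upper()}] {location}{finding.message}"


def summarize(findings):
    """Count findings per result type."""
    counts = {"error": 0, "warning": 0, "success": 0}
    for finding in findings:
        counts[finding.level] += 1
    return counts
```

Keeping the row number as a separate field makes it straightforward to sort or filter errors by location before rendering them.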