Issues found in the input files are categorized under “counters” in report.json
and summary_report.html. Counters aggregate the issues and provide a
high-level overview of what went wrong.
Counters with “Suffix Description” have an additional suffix that details the
subject of the error. For example, Resolution_UnresolvedExternalId_ will be
suffixed with the ID property (like isoCode) that could not be resolved. The
“Suffix Description” field describes the nature of the suffix for these counters.
These counters are logged from processing the corresponding types of files.
A CSV row had a different number of columns from the rest of the file.
Malformed CSV value for dcid property; must be a text or reference.
A reference in the form of dcid:<entity> was detected, but <entity> was empty.
Suggested User Actions:
Search report.json to find a line that looks like:
<property>: dcid:
Column referred to in TMCF is missing from CSV header.
Column references are TMCF values that look like C:<table>-><CSVColumnName>.
Please ensure that “CSVColumnName” exists in the input CSV.
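As a rough illustration, this check can be approximated outside the tool with a short script. This is a sketch under stated assumptions, not the import tool's implementation: the regular expression for column references, the helper name missing_columns, and the sample TMCF and CSV text are all made up for this example.

```python
import csv
import io
import re

# Assumed shape of a TMCF column reference: C:<table>-><CSVColumnName>
COLUMN_REF = re.compile(r"C:\s*(\w+)\s*->\s*([^\s,]+)")


def missing_columns(tmcf_text: str, csv_text: str) -> set[str]:
    """Return column names referenced in the TMCF but absent from the CSV header."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    referenced = {col for _table, col in COLUMN_REF.findall(tmcf_text)}
    return referenced - {name.strip() for name in header}


# Hypothetical sample inputs.
tmcf = """Node: E:Table->E0
typeOf: dcs:StatVarObservation
observationDate: C:Table->Date
value: C:Table->Count
"""
csv_data = "Date,Value\n2020,10\n"

print(missing_columns(tmcf, csv_data))  # {'Count'}
```

Here the TMCF refers to a Count column, but the CSV header only has Date and Value, so Count is reported.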
Found a row in input CSV with fewer columns than expected.
The number of columns to expect is the number of columns that exist in the header.
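To locate such rows before running the tool, a check along the following lines may help. It is a minimal sketch using Python's standard csv module; the function name rows_with_wrong_width and the sample data are assumptions for this example, and the tool's own parsing may differ in details such as quoting.

```python
import csv
import io


def rows_with_wrong_width(csv_text: str) -> list[tuple[int, int]]:
    """Return (row_number, column_count) for rows whose width differs from the header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:
        return []
    expected = len(header)
    # Row numbers are 1-based and count the header as row 1.
    return [
        (row_number, len(row))
        for row_number, row in enumerate(reader, start=2)
        if len(row) != expected
    ]


# Hypothetical sample: the third row is missing its last column.
sample = "geo,year,count\ngeoId/06,2020,10\ngeoId/07,2021\n"
print(rows_with_wrong_width(sample))  # [(3, 2)]
```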
Suggested User Actions:
Unable to parse header from CSV file.
Suggested User Actions:
There was a fatal sanity error in TMCF.
Suggested User Actions:
Complex value was not enclosed in brackets [] in the MCF.
Suggested User Actions:
Complex value had fewer than 2 or more than 3 parts.
Suggested User Actions:
In a complex value with 2 parts, the part that was expected to be a number was not a number.
Invalid latitude part in complex value; latitude must be decimal degrees with an optional N/S suffix.
Invalid longitude part in complex value; longitude must be decimal degrees with an optional E/W suffix.
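The latitude and longitude rules above can be sketched as a small validator. This is an illustration only: the function name parse_coordinate, the range limits (90 and 180 degrees), and the convention that an S or W suffix negates the value are assumptions, not behavior confirmed for the import tool.

```python
import re

# Decimal degrees, optionally signed, with an optional single-letter suffix.
_COORD = re.compile(r"\s*([+-]?[0-9]+(?:\.[0-9]+)?)\s*([NSEWnsew])?\s*")


def parse_coordinate(text: str, kind: str) -> float:
    """Parse decimal degrees with an optional hemisphere suffix.

    kind is "lat" (suffix N/S) or "lng" (suffix E/W).
    Raises ValueError if the text does not fit the expected form.
    """
    suffixes, limit = (("N", "S"), 90.0) if kind == "lat" else (("E", "W"), 180.0)
    match = _COORD.fullmatch(text)
    if match is None:
        raise ValueError(f"not decimal degrees: {text!r}")
    value = float(match.group(1))
    suffix = (match.group(2) or "").upper()
    if suffix and suffix not in suffixes:
        raise ValueError(f"unexpected suffix {suffix!r} for {kind}")
    if suffix in ("S", "W"):
        value = -value  # assumed convention: south and west are negative
    if abs(value) > limit:  # assumed range check
        raise ValueError(f"{kind} out of range: {text!r}")
    return value


print(parse_coordinate("37.42N", "lat"))   # 37.42
print(parse_coordinate("122.08W", "lng"))  # -122.08
```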
An unexpected part was found in the complex value in MCF; the error message will specify the type of issue.
Suggested User Actions:
Check the error message in report.json.
MCF line was missing a colon to separate the property and the value.
Suggested User Actions:
Ensure the line has the form <property>: <value>.
Value of Node property either included a comma or started with a quote.
Suggested User Actions:
A regular <property>: <value> line was found without a preceding Node line to associate with.
Either:
Node property was surrounded by quotes (must be non-quoted), or
Node property included a comma (must be a unary value).
Found malformed complex value without a closing bracket (]).
Found an internal l: reference in resolved entity value.
When processing the first (Node: <value>) line of a node in TMCF, the value did
not have the required E: prefix to be an entity name.
TMCF had a malformed entity/column; the value must have a -> delimiter, which was missing.
A TMCF property referencing a CSV column was found. This is not supported yet.
In TMCF, value of DCID was an E: entity. However, this must instead be a C: column or a constant value.
Expected value to be a TMCF column that starts with C:, but it was not.
These counters are logged when there are errors assigning DCIDs to each node in the graph.
External ID reference could not be resolved.
Suffix Description: Property for which the ID could not be resolved.
Suggested User Actions:
External IDs resolved to different DCIDs; however, they must all map to the same DCID.
Suffix Description: The properties that were found, separated by an underscore (_).
For example, a counter named Resolution_DivergingDcidsForExternalIds_isoCode_wikidataId
means that the isoCode and wikidataId properties were both external IDs, but
they resolved to different DCIDs (which is not permitted).
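When post-processing counters, the suffix can be split back into its properties. The sketch below assumes that property names themselves contain no underscores and that the base counter name is known in advance; the helper name split_counter_suffix is hypothetical.

```python
def split_counter_suffix(name: str, base: str) -> list[str]:
    """Return the underscore-separated suffix parts of a counter name.

    Assumes the individual suffix parts contain no underscores.
    """
    prefix = base + "_"
    if not name.startswith(prefix):
        raise ValueError(f"{name!r} does not start with {prefix!r}")
    return name[len(prefix):].split("_")


props = split_counter_suffix(
    "Resolution_DivergingDcidsForExternalIds_isoCode_wikidataId",
    "Resolution_DivergingDcidsForExternalIds",
)
print(props)  # ['isoCode', 'wikidataId']
```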
Suggested User Actions:
Unable to replace a local reference.
This is likely a cycle of local references, which the import tool is not able to resolve.
Suggested User Actions:
Unable to assign DCID due to an unresolved local reference.
Suggested User Actions:
The node could not be assigned a DCID based on the data available.
The tool can generate a DCID for:
nodes of type Population,
nodes of an Observation type that is not StatVarObservation;
or if there is an external ID resolver provided.
Suffix Description: The typeOf value of the node (first value, if multiple).
Suggested User Actions:
The reference was resolved, but to a failed node; therefore, this node is also marked as a failure.
Suffix Description: The property this reference was found in.
Suggested User Actions:
These counters log issues raised from sanity checks of nodes against a simple set of assumptions expected of DC nodes.
Found different values provided for the same StatVarObservation.
Suggested User Actions:
The same curated DCID was found for different StatVars.
Suggested User Actions:
Found different curated IDs for the same StatVar.
Suggested User Actions:
A node was referenced using an entity (E:) reference in TMCF, but this node was not found in the processed graph.
Expected value to be a TMCF column that starts with C:, but did not find such a value.
Column referred to in TMCF is missing from CSV header.
Suggested User Actions:
Check that the column names following C: match the names of the columns in the header line in your CSV.
Found an unknown statType value.
StatType values either:
{value, estimate, stderror, samplesize, growthrate}, or
percentile, or
{marginoferror, measurementResult}.
Found a non-ISO8601 compliant date value.
Suggested User Actions:
Found a StatVarObservation node with a value that was not a number.
StatVarObservation node is missing the required value property.
An empty property (property with no text) was found.
Suggested User Actions:
Look for a line that has no property name before the colon (:).
Found property name that does not start with a lower-case letter. All property names must start with a lower-case letter.
The value of the dcid property had more than one value.
Value of the dcid property was an E: reference in TMCF, which is invalid.
Found a DCID that was too long. In the current configuration, the maximum allowed length of DCID is 256 characters.
Found non-ASCII characters in a value which was not a text.
A text value is a value surrounded by quotes.
Found text/numeric value in a property where the value is expected to be a reference.
DCID reference included invalid characters.
Suffix Description: The property whose value included invalid chars.
A property was found that was not expected for the type of the Node.
Suffix Description: The type of the node.
Found empty value for a property.
Schema node has property values with non-ASCII characters.
The name and the DCID of Schema nodes must match, but this node did not satisfy this requirement.
Found a missing or empty property value.
Suffix Description: The required property that was missing from this node.
Found multiple values for single-value property.
Suffix Description: The property with the multiple values.
Found a class reference that does not start with an upper-case letter.
Suffix Description: The property, and optionally, the type of the node separated with an underscore (_) from the property.
Found a property reference that does not start with a lower-case letter.
Suffix Description: The property, and optionally, the type of the node separated with an underscore (_) from the property.
Found a SVObs whose value was not a number. If you are importing a dataset
where this is expected (for example, statType is measurementResult and
therefore the SVObs values are references), set --allow-non-numeric-obs-values=true
in the command line invocation.
Existence counters are logged for issues relating to the existence check of references against Data Commons.
Network request to DataCommons API failed.
External reference existence check with the DataCommons API returned no results.
External triple existence check with the DataCommons API returned no results.
Could not find the statType of a StatVar, but --check-measurement-result was
set to true and we need the statTypes of the StatVars to determine if we should
perform existence checks for SVOs measuring this SV.
These counters represent potential issues found in the statistical analysis of the input data for pitfalls such as extreme outliers, holes in dates that the data is available for, etc.
Two different types (for example, numbers and strings) were found for the observations of the same StatVar.
Two different values were found for the observations of the same StatVar.
A datapoint with a value more than 3 standard deviations (sigma) from the mean of the series was found.
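The idea behind this check can be sketched as follows. This is an illustration, not the tool's code: the function name sigma_outliers is hypothetical, and the use of the population standard deviation (rather than the sample standard deviation) is an assumption.

```python
from statistics import mean, pstdev


def sigma_outliers(values: list[float], threshold: float = 3.0) -> list[float]:
    """Return the values that lie more than `threshold` standard deviations from the mean."""
    if len(values) < 2:
        return []
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        return []  # all values are identical, so nothing can be an outlier
    return [v for v in values if abs(v - mu) > threshold * sigma]


# Hypothetical series: nineteen ordinary points and one extreme value.
series = [10.0] * 19 + [1000.0]
print(sigma_outliers(series))  # [1000.0]
```

Note that in a short series no point can be 3 sigma from the mean: with the population standard deviation, a series needs at least 11 points before a single value can exceed that threshold.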
These two stat counters look at adjacent datapoints in each timeseries and log an entry if any two adjacent values differ by more than 100% (or 500%).
Note that only the largest difference in each bucket will be logged.
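A minimal sketch of such a check follows. How the tool computes the percent difference is not stated here, so this example assumes it is measured relative to the earlier of the two values; the function name largest_adjacent_change and the sample series are hypothetical.

```python
def largest_adjacent_change(series: list[tuple[str, float]]):
    """Return (percent_change, earlier_point, later_point) for the largest jump, or None.

    Each point is a (date, value) pair.
    """
    ordered = sorted(series)  # ISO 8601 dates of equal length sort chronologically
    largest = None
    for earlier, later in zip(ordered, ordered[1:]):
        if earlier[1] == 0:
            continue  # percent change relative to zero is undefined
        # Assumption: change is measured relative to the earlier value.
        change = abs(later[1] - earlier[1]) / abs(earlier[1]) * 100
        if largest is None or change > largest[0]:
            largest = (change, earlier, later)
    return largest


points = [("2020", 100.0), ("2021", 150.0), ("2022", 600.0)]
print(largest_adjacent_change(points))  # (300.0, ('2021', 150.0), ('2022', 600.0))
```

In this series the largest jump is 300%, which exceeds the 100% threshold but not the 500% one.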
This counter will be logged if the date could not be parsed as an ISO8601 string.
Please check that your dates are formatted according to the ISO 8601 standard.
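Dates can be pre-checked with a short script like the one below. It is a sketch that covers only three common ISO 8601 forms (yyyy, yyyy-MM, and yyyy-MM-dd); the set of forms the tool itself accepts may be wider, and the function name is_iso_date is an assumption.

```python
import re
from datetime import datetime

# (shape, strptime format) pairs for the subset of ISO 8601 handled here.
_SHAPES = [
    (re.compile(r"[0-9]{4}"), "%Y"),
    (re.compile(r"[0-9]{4}-[0-9]{2}"), "%Y-%m"),
    (re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}"), "%Y-%m-%d"),
]


def is_iso_date(text: str) -> bool:
    """Return True if text is a valid yyyy, yyyy-MM, or yyyy-MM-dd date."""
    for shape, fmt in _SHAPES:
        if shape.fullmatch(text):
            try:
                datetime.strptime(text, fmt)  # rejects impossible dates such as 2020-02-30
            except ValueError:
                return False
            return True
    return False


for candidate in ["2020", "2020-02", "2020-02-30", "02/03/2020"]:
    print(candidate, is_iso_date(candidate))
```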
This stats check logs a counter when the timeseries have datapoints with varying date lengths. For example, if 9 points in a timeseries are monthly (in the form yyyy-MM), but another point is a day (yyyy-MM-dd), this counter will be logged.
The problematic datapoints that will be logged in report.json are those with the less common date length.
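The example above can be reproduced with a few lines of Python. This is a sketch of the idea rather than the tool's implementation; the function name uncommon_length_dates is hypothetical.

```python
from collections import Counter


def uncommon_length_dates(dates: list[str]) -> list[str]:
    """Return the dates whose string length differs from the most common length."""
    lengths = Counter(len(d) for d in dates)
    if len(lengths) < 2:
        return []  # all dates share one length
    common_length, _count = lengths.most_common(1)[0]
    return [d for d in dates if len(d) != common_length]


# Nine monthly points and one daily point.
dates = [f"2020-{month:02d}" for month in range(1, 10)] + ["2020-10-15"]
print(uncommon_length_dates(dates))  # ['2020-10-15']
```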
This stats check considers the gaps between adjacent datapoint dates. If any two adjacent datapoints have a different gap than the rest of the dataset, this flag is raised.
Currently, the tool only checks for inconsistent gaps in the unit of months.
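A sketch of a month-based gap check is shown below. It assumes dates are in yyyy-MM or yyyy-MM-dd form and treats the most common gap as the expected one; both the function names and that interpretation of "the rest of the dataset" are assumptions.

```python
from collections import Counter


def month_index(date: str) -> int:
    """Convert a yyyy-MM or yyyy-MM-dd date to a count of months."""
    year, month = date.split("-")[:2]
    return int(year) * 12 + int(month)


def inconsistent_gaps(dates: list[str]) -> list[tuple[str, str, int]]:
    """Return (earlier, later, gap_in_months) for gaps that differ from the most common gap."""
    ordered = sorted(dates)
    gaps = [
        (a, b, month_index(b) - month_index(a))
        for a, b in zip(ordered, ordered[1:])
    ]
    if not gaps:
        return []
    usual = Counter(gap for _a, _b, gap in gaps).most_common(1)[0][0]
    return [(a, b, gap) for a, b, gap in gaps if gap != usual]


dates = ["2020-01", "2020-02", "2020-03", "2020-06", "2020-07"]
print(inconsistent_gaps(dates))  # [('2020-03', '2020-06', 3)]
```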
Mutation is a step of MCF processing where, for example, complex values are expanded.
MCF node missing required typeOf property.
Observation value must be either a number or text.