Data quality checklist
Use this checklist to review biodiversity datasets. It is particularly suited to checking occurrence and sampling event datasets.
The checklist helps ensure that the data is complete, meaning it contains valid answers to the five Ws about each event: what happened, who acted in it, when and where it took place, and why it happened.
Examples of events include a species observation, a physical specimen being collected, or a biological sampling event.
Additionally, the checklist ensures that the Dataset Metadata also contains answers to the five Ws in order to facilitate reuse of the data.
Instructions
If the dataset has been registered with GBIF, give yourself a running start by reviewing the dataset’s 'Stats' page, which lists the issues that GBIF discovered while interpreting the dataset.
Next, read the dataset metadata to get a better understanding about the data.
Next, load the data into OpenRefine. This will allow faceted browsing to get the big picture of the data.
There are various ways each of the five Ws can be answered. Each 'check' relates to one or more Darwin Core fields. Therefore, try to perform as many checks as possible based on the Darwin Core fields present in the dataset.
Compile a list of all checks that fail and report them back to the data publisher, referring to each check by its 'Check-ID'. This makes providing feedback less time-consuming and less verbose.
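The reporting step could be scripted along the following lines. This is a minimal sketch, not part of the checklist itself: the `FailedCheck` class, the `format_report` function, and the sample notes are hypothetical, and only the Check-ID labels come from the tables below.

```python
from dataclasses import dataclass


@dataclass
class FailedCheck:
    check_id: str  # Check-ID from this checklist, e.g. "when 1"
    note: str      # short reviewer-written explanation (hypothetical)


def format_report(failures):
    """Return one line per failed check, sorted in five-Ws order."""
    order = ["what", "who", "when", "where", "why"]

    def sort_key(failure):
        group, number = failure.check_id.split()
        return (order.index(group), int(number))

    ordered = sorted(failures, key=sort_key)
    return "\n".join(f"{f.check_id}: {f.note}" for f in ordered)


report = format_report([
    FailedCheck("where 1", "coordinates missing for some records"),
    FailedCheck("what 6", "authorship missing from scientific names"),
])
print(report)
```

Sorting by Check-ID keeps the feedback in the same order as the checklist, so the publisher can work through it section by section.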
Quality checks
What event happened?
What type of event was it?
Check-ID | Fields | Requirements |
---|---|---|
what 1 | | The species observation event uniquely identified by |
what 2 | | The specimen preservation event uniquely identified by |
what 3 | | The physical result of a sampling event uniquely identified by both |
what 4 | | The actual sampling event uniquely identified by |
If it was a species occurrence related event - how many species were there?
Check-ID | Fields | Requirements |
---|---|---|
what 5 | | The species abundance must be filled in using |
If it was a species occurrence related event - what species was it?
Check-ID | Fields | Requirements |
---|---|---|
what 6 | | The full scientific name with authorship and date information if known must be entered in |
what 7 | | The identifier for the Taxon assigned to the subject. If the Taxon is defined according to a well known source, it is recommended filling in |
Case 1: Species observation from a camera trap
Field | Value | Constraint |
---|---|---|
| | "HAMAARAG:T0_L_049:6199" | Must be a GUID or an identifier that is near globally unique. Integer identifiers are not allowed. |
| | "MachineObservation" | Must match Darwin Core Type Vocabulary |
| | 1 | Must be an integer, 0 or greater |
| | 1 | Must pair with |
| | "individuals" | Must match GBIF Quantity Type Vocabulary |
| | "present" | Must match GBIF Occurrence Status Vocabulary |
| | "Canis aureus Linnaeus, 1758" | Must be the full scientific name, with authorship and date information if known. |
| | "species" | Must match GBIF Taxon Rank Vocabulary |
| | "Animalia" | Must be the full scientific name of the kingdom in which the taxon is classified. |
| | "Chordata" | Must be the full scientific name of the phylum or division in which the taxon is classified. |
| | "Mammalia" | Must be the full scientific name of the class in which the taxon is classified. |
| | "Carnivora" | Must be the full scientific name of the order in which the taxon is classified. |
| | "Canidae" | Must be the full scientific name of the family in which the taxon is classified. |
| | "Canis Linnaeus, 1758" | Must be the full scientific name of the genus in which the taxon is classified. |
| | | Must be a GUID or an identifier related to the source |
| | "GBIF Backbone Taxonomy, May 2016" | Must be a reference including a date |
| | "http://www.gbif.org/dataset/d7dddbf4-2cf0-4f39-9b2a-bb099caae36c" | Must be a GUID or an identifier for the source |
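Two of the constraints in this case are easy to test mechanically: identifiers must not be bare integers, and counts must be integers of 0 or greater. The sketch below illustrates this under stated assumptions: the record is a plain dictionary of strings, and the keys `occurrenceID` and `individualCount` are assumed to be the Darwin Core terms behind the first and third rows, since the field column is blank here. The `check_record` helper is hypothetical.

```python
import re

# ASCII-only integer pattern, with an optional sign.
INTEGER = re.compile(r"[+-]?[0-9]+")


def check_record(record):
    """Return a list of problems found in one record (empty if none)."""
    problems = []

    # Assumed field name: occurrenceID (not shown in the table above).
    identifier = record.get("occurrenceID", "").strip()
    if not identifier:
        problems.append("identifier is missing")
    elif INTEGER.fullmatch(identifier):
        problems.append("identifier is a bare integer")

    # Assumed field name: individualCount (not shown in the table above).
    count = record.get("individualCount", "").strip()
    if count and not (INTEGER.fullmatch(count) and int(count) >= 0):
        problems.append("count is not an integer of 0 or greater")

    return problems


good = check_record({"occurrenceID": "HAMAARAG:T0_L_049:6199",
                     "individualCount": "1"})
bad = check_record({"occurrenceID": "12345", "individualCount": "-1"})
print(good, bad)
```

Vocabulary checks (for example, against the Darwin Core Type Vocabulary) would need the actual term lists and are left out of this sketch.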
Who acted in the event?
Check-ID | Fields | Requirements |
---|---|---|
who 1 | | The full names of each person acting in the event (e.g. collecting, observing, etc) should be entered in |
who 2 | | A name or acronym of the institution acting in the event may be entered in |
who 3 | | The full names of each person, group, or organization responsible for assigning the Taxon to the subject should be entered in |
Case 1: Two different people collecting and identifying a specimen
Field | Value | Constraint |
---|---|---|
| | "Ole Karsholt" | Must be one or more persons' names |
| | "ZMUC" | Must be an acronym or name of an institution |
| | "ZMUC" | Must be an acronym or name of an institution |
| | "Jan Pedersen" | Must be names of one or more persons, groups or organizations |
When did the event take place?
Check-ID | Fields | Requirements |
---|---|---|
when 1 | | The date, date-time, date range, or date-time range during which the Event occurred should be entered in |
when 2 | | If the original value has to be converted into ISO 8601 |
when 3 | | Although it appears redundant, it is recommended trying to fill in |
when 4 | | Although it appears redundant, it is recommended trying to fill in |
when 5 | | If no |
Case 1: Single date
Field | Value | Constraint |
---|---|---|
| | 2007-03-20 | Must be in ISO 8601 format |
| | 2007 | Must be four-digit year |
| | 3 | Must be between 1-12 |
| | 20 | Must be between 1-31 |
| | 79 | Must be between 1-366 |
| | "Mar 20, 07" | Original date or date description |
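The year, month, day, and day-of-year values in this case can all be derived from the ISO 8601 date, which gives a simple consistency check. The sketch below uses only the Python standard library. The `date_parts` function and its dictionary keys are hypothetical names chosen for illustration.

```python
from datetime import date


def date_parts(iso_date):
    """Split an ISO 8601 calendar date into its redundant parts."""
    parsed = date.fromisoformat(iso_date)  # raises ValueError if malformed
    return {
        "year": parsed.year,
        "month": parsed.month,
        "day": parsed.day,
        "dayOfYear": parsed.timetuple().tm_yday,
    }


parts = date_parts("2007-03-20")
print(parts)  # year 2007, month 3, day 20, day of year 79
```

Comparing these derived values with the values supplied in the dataset would flag records where the redundant fields disagree with the date.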
Case 2: Date-time range spanning days
Field | Value |
---|---|
| | 2007-03-20T00:00:00Z/2007-03-27T06:00:00Z |
| | 00:00:00Z/06:00:00Z |
| | 2007 |
| | 3 |
| | |
| | 79 |
| | 86 |
| | "The third week in March 07, for 6 hours starting at midnight." |
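A range like the one in this case can be split at the `/` separator and checked the same way. The sketch below is one reading of the table, not a rule stated by the checklist: it assumes year, month, and day are kept only when both ends of the range share them, which is why the day comes out empty here. The `range_parts` function and the `FORMAT` constant are hypothetical, and the parser accepts only the exact UTC layout shown in the table.

```python
from datetime import datetime

# Matches values such as 2007-03-20T00:00:00Z (UTC, second precision only).
FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def range_parts(event_date):
    """Derive the redundant date parts from a start/end date-time range."""
    start_text, end_text = event_date.split("/")
    start = datetime.strptime(start_text, FORMAT)
    end = datetime.strptime(end_text, FORMAT)
    if end < start:
        raise ValueError("range ends before it starts")

    same_year = start.year == end.year
    same_month = same_year and start.month == end.month
    same_day = start.date() == end.date()
    return {
        "year": start.year if same_year else None,
        "month": start.month if same_month else None,
        "day": start.day if same_day else None,
        "startDayOfYear": start.timetuple().tm_yday,
        "endDayOfYear": end.timetuple().tm_yday,
    }


parts = range_parts("2007-03-20T00:00:00Z/2007-03-27T06:00:00Z")
print(parts)
```

For the range above, this yields year 2007, month 3, no single day, and day-of-year values 79 and 86, in line with the table.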
Where did the event take place?
Check-ID | Fields | Requirements |
---|---|---|
where 1 | | The point location coordinates should be entered in decimal degrees in |
where 2 | | To provide a specific shape location, enter a Well-Known Text (WKT) representation of the shape in |
where 3 | | |
where 4 | | If the original point location coordinates had to be converted from another coordinate system such as 'degrees minutes seconds' |
where 5 | | If actions were taken to make the point location less specific than in its original form or the coordinateUncertaintyInMeters is very high, an explanation should be provided in |
where 6 | | If the point location should exist, but has not been entered, an explanation should be provided in |
where 7 | | If the point location does not exist, or the point location is calculated from the center of a grid cell (rather than from a GPS reading), an explanation should be provided in |
where 8 | | As much supplementary information as possible about the location should also be provided. If no |
Case 1: Point location converted from degrees minutes seconds to decimal degrees
Field | Value | Constraint |
---|---|---|
| | 42.4566 | Must be between -90 and 90, inclusive |
| | -76.45442 | Must be between -180 and 180, inclusive |
| | "EPSG:4326" | Ideally an EPSG code or a value from a controlled vocabulary, otherwise "unknown" |
| | 500 | Zero is NOT a valid value |
| | 42° 27' 23.76", -76° 27' 15.91" | |
| | 42° 27' 23.76" | |
| | -76° 27' 15.91" | |
| | "degrees minutes seconds" | |
| | "North America" | Must be preferred English name according to Getty Thesaurus of Geographic Names |
| | "United States" | Must be preferred English name according to Getty Thesaurus of Geographic Names |
| | "US" | Must be ISO 3166-1-alpha-2 country code |
| | "New York" | |
| | "Tompkins County" | |
| | "Ithaca, Forest Home, CU Rifle Range" | Must be a specific description of the place |
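The conversion in this case follows the usual arithmetic: decimal degrees = degrees + minutes/60 + seconds/3600, with the sign applied afterwards. The sketch below reproduces the table's values and applies the range constraints listed above. The `dms_to_decimal` and `coordinates_ok` helpers are hypothetical, and the parser only handles the exact `D° M' S"` layout used in this table.

```python
import re

# Matches strings such as: -76° 27' 15.91"
DMS = re.compile(r"""^\s*(-?)(\d+)°\s*(\d+)'\s*(\d+(?:\.\d+)?)"\s*$""")


def dms_to_decimal(text):
    """Convert a degrees-minutes-seconds string to decimal degrees."""
    match = DMS.match(text)
    if match is None:
        raise ValueError(f"unrecognised coordinate: {text!r}")
    sign, degrees, minutes, seconds = match.groups()
    value = int(degrees) + int(minutes) / 60 + float(seconds) / 3600
    return -value if sign else value


def coordinates_ok(latitude, longitude, uncertainty_m):
    """Apply the range constraints from the table above."""
    return (-90 <= latitude <= 90
            and -180 <= longitude <= 180
            and uncertainty_m > 0)  # zero is not a valid uncertainty


lat = round(dms_to_decimal("42° 27' 23.76\""), 5)
lon = round(dms_to_decimal("-76° 27' 15.91\""), 5)
print(lat, lon, coordinates_ok(lat, lon, 500))
```

Rounding to five decimal places reproduces the decimal values given in the table.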
Case 2: Point location that was generalized
Field | Value |
---|---|
| | 42.44 |
| | -76.33 |
| | "EPSG:4326" |
| | 5000 |
| | "Point location obscured by a factor of 5000m" |
Why did the event happen?
Check-ID | Fields | Requirements |
---|---|---|
why 1 | | The name of the method or sampling protocol used to create the event should be entered in |
Case 1: Because of a butterfly monitoring scheme
Field | Value | Constraint |
---|---|---|
| | "Pollard walks" | Must be a short name or URL referencing a method or sampling protocol |
| | 250 | Must pair with |
| | "square_metre" | Must match Unit of Measurement Vocabulary |
| | "Average of 30 Minutes walk along transect" | Can be a free-text description |
| | "No occurrences of Lepidoptera recorded for entire transect" | Can be a free-text description |
Dataset Metadata
The dataset metadata should contain enough information to facilitate reuse of the data while preventing misinterpretation. Publishers should also provide evidence of the rigour that went into producing the data while acknowledging its various contributors and funders. Ultimately this may lead to new sources of collaboration and funding.
Field | Requirements | Examples |
---|---|---|
| | is a concise name that describes the contents of the dataset and that distinguishes it from others | "Reef Life Survey: Global reef fish dataset", "Insects from light trap (1992–2009), rooftop Zoological Museum, Copenhagen" |
| | is a short paragraph (abstract) describing the content of the dataset. | "This dataset contains records of bony fishes and elasmobranchs collected by Reef Life Survey (RLS) divers along 50 m transects on shallow rocky and coral reefs, worldwide. Abundance information is available for all records found within quantitative survey limits (50 x 5 m swathes during a single swim either side of the transect line, each distinguished as a Block), and out-of-survey records are identified as presence-only (Method 0)." |
| | the organization responsible for publishing (producing, releasing, holding) this resource. | "Reef Life Survey" |
| | must be one of three machine-readable options (CC0 1.0, CC-BY 4.0 or CC-BY-NC 4.0), which provide a standardized way to define appropriate uses of the dataset. | "This work is licensed under a Creative Commons Attribution (CC-BY) 4.0 License." |
| | the people and organizations who created the dataset, in priority order. Use of a personnel identifier such as an ORCID or ResearcherID is highly recommended. | "John Smith, jsmith@gbif.org, http://orcid.org/0000-0002-1825-0097" |
| | the people and organizations who wrote the dataset metadata, in priority order. Use of a personnel identifier such as an ORCID or ResearcherID is highly recommended. | "John Smith, jsmith@gbif.org, http://orcid.org/0000-0002-1825-0097" |
| | the people and organizations who should be contacted for more information about the resource or to whom putative problems with the dataset should be addressed. Use of a personnel identifier such as an ORCID or ResearcherID is highly recommended. | "John Smith, jsmith@gbif.org, http://orcid.org/0000-0002-1825-0097" |
| | is a GUID or other identifier that is near globally unique. Note this is required for BID projects. | "BID-AF2015-0134-REG" |
| | information about the sampling methodology used in creating the dataset, similar to the methods section of a journal article. Note this is required for sampling event datasets. | See here |
| | how the dataset should be cited. Use of the IPT Citation Format (which is based on DataCite’s preferred citation format and satisfies the Joint Declaration of Data Citation Principles) is highly recommended. | "Edgar G J, Stuart-Smith R D (2014): Reef Life Survey: Global reef fish dataset. v2.0. Reef Life Survey. Dataset/Sampling event. http://doi.org/10.15468/qjgwba" |
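The license requirement above lends itself to a quick automated check. The sketch below is a rough heuristic under stated assumptions: it looks for one of the three option labels named in the table inside a free-text license statement. The `classify_license` function and its patterns are hypothetical and will miss statements that spell the license out without the short label.

```python
import re

# The three options named in the table, most specific pattern first.
PATTERNS = [
    ("CC-BY-NC 4.0", r"CC[- ]BY[- ]NC\)?\s+4\.0"),
    ("CC-BY 4.0", r"CC[- ]BY\)?\s+4\.0"),
    ("CC0 1.0", r"CC0\)?\s+1\.0"),
]


def classify_license(statement):
    """Return the matching option label, or None if none is recognised."""
    text = statement.upper()
    for label, pattern in PATTERNS:
        if re.search(pattern, text):
            return label
    return None


example = ("This work is licensed under a Creative Commons "
           "Attribution (CC-BY) 4.0 License.")
print(classify_license(example))
```

A result of `None` does not prove the license is invalid. It only means the statement needs a manual look.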