Best Practices in Publishing Sampling-event data - planned additions and notes for revision
Version 2.0
Additional information that could also be included or was previously included in 發布調查活動資料的最佳實務導引.
什麼是調查活動(sampling-event)資料?
調查活動資料
TODO: Provide recommendations on how to work around limitations of DwC-A star schema such as not being able to relate measurements and facts to both events and occurrences in the same dataset. The current work around requires publishers to publish separate datasets. Note OBIS is prototyping an Extended Measurement or Facts Extension that could also help overcome this limitation. Discussion on this prototype extension is taking place in GitHub here. However, issues raised that this prototype extension does not explicitly make it clear if the measurement or fact relates to an occurrence or an event. One alternative is to add resourceID (and perhaps resourceType?) instead of adding eventID (and occurrenceID) as attribute to the measurement or fact extension as is explored by the OBIS extension.
Sample size
TODO: Provide recommendations on how to represent the sampling area by choosing the appropriate WKT shape or simple latitude/longitude point location. Done correctly, the direction sampling was carried out can also be derived. For example, an ocean trawl line represented using a WKT shape LINESTRING allows the direction of the trawl to be determined based on the standard notation for writing the start and end points.
How to uniquely identify sampling events
TODO: Better guide users on how to fill in dwc:eventID and dwc:parentEventID using persistent globally unique identifiers:
-
dwc:eventID should be a persistent globally unique identifier. Remember to reuse existing stable identifiers. Do not create a new identifier for the event when one already is declared.
-
In the absence of a GUID, and as a last resort, reuse the original fieldNumber.
How to capture hierarchy of events
TODO: Better guide users how to publish a hierarchy of events (recursive data type) with the proper use of dwc:parentEventID
Publishing sampling-event data
Using GUIDs for identifiers
TODO: Advise publishers to use GUIDs, coupled with guidance on how to create GUIDs for applicable fields such as dwc:occurrenceID, dwc:eventID, dwc:organismID and dwc:locationID. For example, it is possible to use http://www.geonames.org/ to find (or even generate new) identifiers for dwc:locationID, e.g. http://sws.geonames.org/10793757/ is a GUID for a lake in Greenland.
Filling in the required and recommended terms
TODO: Guide users how to obfuscate the location of sensitive species, such as by: - Simply removing these species from the dataset - Publishing the species identifications at Genus level only - Publishing the sensitive/protected species in a separate dataset - Publish obfuscated sensitive data points in the main dataset and publish non-obfuscated details in an access-limited separate dataset, both datasets including all data records
Preserving verbatim data
TODO: Guide users how to enter verbatim descriptions. For example, the ID or code given to the original event should be entered into dwc:fieldNumber; the ID or code given to the original occurrence observation should be entered into dwc:recordNumber.
Publishing project data as a single dataset
TODO: Provide a recommendation on how to publish data produced from large projects. The current recommendation is to publish a single dataset, because dividing it into multiple datasets results in more duplication of effort entering metadata. Publishers insisting on publishing multiple datasets should link them using Project.ID in EML.
Republishing occurrence data as sampling-event data
TODO: Provide rationale and guidance for migrating existing occurrence datasets to sampling event format. The following questions need to be answered: - Should the sampling event version replace the existing occurrence version, or should both versions be kept online at the same time? - If replacing, should the new sampling event version be assigned a brand new DOI? - What are the benefits of producing the sampling event version?
Modelling continuous monitoring of live individuals
TODO: Provide a recommendation on how to model continuous monitoring of live individuals, such as bird tracking data by using dwc:organismID to store the ID of the individual being tracked and by using a single event for representing each individual being tracked (with associated occurrences where it was recorded).
Managing issues related to the dataset
TODO: Provide a recommendation on how to manage issues related to the dataset using GitHub’s issue management system, just like INBO does for example.
Sharing scripts and programs used to produce or clean the dataset
TODO: Provide a recommendation on how to make custom scripts and programs (e.g. for transforming cross table data) publicly available using GitHub, for the benefit of other publishers, just like INBO does for example. The recommendation should encourage users to include a detailed set of instructions on how to run the scripts to make them more usable.
Describing sampling-event data in dataset metadata
TODO: Advise publishers to document as much as possible about the sampling event, especially the sampling methodologies, before attempting to try and standardize it into DwC.
Linking related datasets
TODO: Advise publishers on how to link related datasets that come out of the same research context so they can be easily retrieved by the users. Publishers may have to publish separate datasets in order to work around the limitations of the DwC-A star schema. Publishers may also choose to publish separate occurrence datasets derived from the same sampling events. The current recommendation is to link them using Project.ID.
範例
Macrophyte survey
TODO: Update example based on Dutch Vegetation Database (LVD) version republished as sampling-event dataset. The Relevé extension underwent significant changes following the publication of the primer. For more information about LVD and the data model for vegetation sampling-event data see: https://gbif.blogspot.com/2016/07/probably-turbovegs-best-kept-secret.html