Part 4 – Metadata and Databasing

This section defines the metadata fields used to describe spectral reflectance measurements of herbarium specimens. This standardization supports interoperability, cross-project data aggregation, and future integration into biodiversity informatics platforms.

Projects can download the IHerbSpec Metadata Spreadsheet (available at https://iherbspec.github.io/protocol) as a foundation for organizing their metadata. This spreadsheet includes all required fields and all optional but recommended fields defined in the tables in Section 4.1. The controlled vocabularies necessary for enumerating certain fields are provided in Section 4.2. Finally, Section 4.3 provides general guidelines for data and metadata storage and dissemination.

While metadata are expected to be disseminated in a flat file format, such as the IHerbSpec Metadata Spreadsheet, fields are presented here in logical groups (session, specimen, and tissue) to support conceptual understanding and integration with local databases.


4.1 Metadata Tables

Table 4.1: Session Metadata

Session metadata are those metadata usually associated globally with a continuous digitization project. These metadata can usually be captured once per project and automatically populated for each measurement instance.

Metadata Field Filename Code (Table 3.2) Status Field Description Data Type
projectId PI Required Unique identifier for the spectral measurement project. Example: HUHERYspec1. TEXT
sessionId SN Required Unique identifier for a measurement session, generated as YYYYMMDDHHMMSS. Example: 20240617132251. TEXT
instrumentModel - Required Spectroradiometer model name. Example: SVC HR-1024i. TEXT
opticalSetupDescription - Required Description of optical probe setup. Example: LC-RP contact probe with leaf clip removed. TEXT
measurementSettings - Required Instrument settings for measurements. Example: 2 seconds, high light setting. TEXT
whiteReferenceDescription - Required Material of the white reference. Example: Spectralon SL Standard 99%. TEXT
operator - Optional Name(s) of person(s) conducting measurements. TEXT
lightSourceType - Optional Light source for optical setup. Example: tungsten halogen. TEXT
distanceTargetToSensor - Optional Distance (mm) between target tissue and sensor face. Example: 12. NUMERIC
lensFieldOfView - Optional Angle (degrees) of sensor field of view. Example: 22.5. NUMERIC
angleLightToSensor - Optional Angle (degrees) of light source to sensor. Example: 10. NUMERIC
measurementAreaDiameter - Optional Diameter (mm) of illuminated tissue area. Example: 6. NUMERIC

Fig. 4.1. Schematic of Session Metadata fields related to optical setup (see Table 4.1): distanceTargetToSensor, lensFieldOfView, angleLightToSensor, and measurementAreaDiameter.
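Because session metadata can be captured once and then populated automatically for each measurement, a short script can generate the sessionId and confirm that the required fields of Table 4.1 are filled in. The following is a minimal sketch, not part of the protocol: the function names are hypothetical, and the sessionId layout assumes the 14-digit form of the Table 4.1 example (date followed by hours, minutes, and seconds).

```python
from datetime import datetime

# Required Session Metadata fields, as listed in Table 4.1.
REQUIRED_SESSION_FIELDS = (
    "projectId",
    "sessionId",
    "instrumentModel",
    "opticalSetupDescription",
    "measurementSettings",
    "whiteReferenceDescription",
)


def make_session_id(start: datetime) -> str:
    """Format a session start time as a 14-digit string (assumed layout)."""
    return start.strftime("%Y%m%d%H%M%S")


def missing_session_fields(session: dict) -> list:
    """Return the required fields that are absent or blank in `session`."""
    missing = []
    for name in REQUIRED_SESSION_FIELDS:
        value = session.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


# Example values are taken from Table 4.1; one field is left blank on purpose.
session = {
    "projectId": "HUHERYspec1",
    "sessionId": make_session_id(datetime(2024, 6, 17, 13, 22, 51)),
    "instrumentModel": "SVC HR-1024i",
    "opticalSetupDescription": "LC-RP contact probe with leaf clip removed",
    "measurementSettings": "2 seconds, high light setting",
    "whiteReferenceDescription": "",
}

print(session["sessionId"])             # 20240617132251
print(missing_session_fields(session))  # ['whiteReferenceDescription']
```

The session dictionary can then be copied onto every measurement row recorded during that session.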

Table 4.2: Specimen Metadata

Specimen metadata include identifiers and information about the physical specimen, with priority given to required fields needed to link spectral measurements to existing digital records (e.g., specimenId). Optional fields related to taxonomic determination and specimen storage environment are included to support integrative research, quality control, and downstream analysis.

Users should avoid duplicating metadata that are already digitized, maintained, and available in herbarium or institutional platforms, as these sources are better suited for future updates. Instead, users are encouraged to reference those records and supplement only missing required or recommended fields. Due to variation in the metadata recorded on institutional platforms, users should apply caution in the interpretation of presence or absence of determination information or any other recommended but optional fields.

Metadata Field Filename Code (Table 3.2) Status Field Description Data Type
herbariumCode HC Required Acronym for herbarium or collection (Index Herbariorum code). Examples: GH, P, US-Botany. TEXT
specimenId SI Required Identifier for specimen or record (catalog no., barcode, GUID, or collector+number). Examples: 00238762, Thorne24070a. TEXT
scientificName - Optional Full scientific name at lowest confident rank. Examples: Quercus bicolor, Erythroxylum coca ipadu. TEXT
identificationQualifier - Optional Uncertainty in taxonomic ID (Darwin Core). Examples: cf., aff. TEXT
identifiedBy - Optional Person/group who made the identification. Example: T. Plowman. TEXT
dateIdentified - Optional Date of identification. Examples: 1999, 2004-12-30. TEXT
isTempControlled - Optional Whether storage has active temperature control. BOOLEAN
annualTempMin - Optional Minimum annual storage temperature (°C). Example: 18. TEXT
annualTempMax - Optional Maximum annual storage temperature (°C). Example: 26. TEXT
isHumidityControlled - Optional Whether storage has active humidity control. BOOLEAN
annualHumidityMin - Optional Minimum annual storage relative humidity (%). Example: 20. TEXT
annualHumidityMax - Optional Maximum annual storage relative humidity (%). Example: 60. TEXT
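The specimen fields in Table 4.2 lend themselves to simple consistency checks before data are shared. The sketch below is illustrative only and rests on several assumptions: the function names are hypothetical, dateIdentified is assumed to follow the shapes of the table examples (YYYY or YYYY-MM-DD, with YYYY-MM also accepted here), and the storage ranges are read from text because the table types them as TEXT.

```python
import re

# Assumed date shapes: YYYY, YYYY-MM, or YYYY-MM-DD.
DATE_PATTERN = re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$")

# Pairs of (minimum, maximum) storage fields from Table 4.2.
RANGE_PAIRS = (
    ("annualTempMin", "annualTempMax"),
    ("annualHumidityMin", "annualHumidityMax"),
)


def text_of(record: dict, name: str) -> str:
    """Return a field as stripped text, treating missing values as empty."""
    value = record.get(name)
    return "" if value is None else str(value).strip()


def check_specimen(record: dict) -> list:
    """Return a list of human-readable problems found in one specimen record."""
    problems = []
    for name in ("herbariumCode", "specimenId"):
        if not text_of(record, name):
            problems.append(f"{name} is required")
    date = text_of(record, "dateIdentified")
    if date and not DATE_PATTERN.match(date):
        problems.append("dateIdentified is not YYYY, YYYY-MM, or YYYY-MM-DD")
    for low, high in RANGE_PAIRS:
        low_text, high_text = text_of(record, low), text_of(record, high)
        if not (low_text and high_text):
            continue  # optional fields: nothing to compare
        try:
            if float(low_text) > float(high_text):
                problems.append(f"{low} exceeds {high}")
        except ValueError:
            problems.append(f"{low} or {high} is not numeric")
    return problems


record = {
    "herbariumCode": "GH",
    "specimenId": "00238762",
    "dateIdentified": "2004-12-30",
    "annualTempMin": "26",  # swapped with the maximum on purpose
    "annualTempMax": "18",
}
print(check_specimen(record))  # ['annualTempMin exceeds annualTempMax']
```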

Table 4.3: Tissue Metadata

Tissue metadata describe the type, condition, and position of the tissue measured. They include required and recommended fields for linking spectral measurements to individual tissue units. Note that timestamps for tissue measurement files are often captured within the file.

Metadata Field Filename Code (Table 3.2) Status Field Description Data Type
backgroundClass BG Required Enumerated code from Background Class Codes (Table 4.7) describing the type of background used behind target tissue. Both abbreviated codes and descriptive codes are accepted. Examples: BGW, BGB, BGP. ENUM (Table 4.7)
hasLowReflectanceBackground - Required True or false value indicating whether the background (black or paper) has low reflectance, defined as less than 4% reflectance across the spectral range of the instrument. A paper background would be scored false. BOOLEAN
backgroundDescription - Cond. Req. Description of the black or other background material, including manufacturer and product information when available. Required when the tissue has a black or other (non-paper) background; not required for paper backgrounds. Example: Musou IR Flock. TEXT
targetClass TC Required Free text or enumerated code from Target Class Codes (Table 4.5) describing type of tissue or background being measured. Both abbreviated codes and full codes can be used. Examples: AD, perigynium. TEXT or ENUM (Table 4.5)
targetTissueId TN Optional Character index tracking the measured tissue units when multiple tissues are measured from a single specimen (e.g., loose1, 1, 2). For compound or more complex structures, projects are encouraged to develop their own consistent naming convention. Examples: loose1, leaflet1, petal1. TEXT
tissueDevelopmentalStage - Required Tissue developmental stage as coded in Developmental Stage Class Codes (Table 4.4). Examples: Mature, Uncertain. ENUM (Table 4.4)
hasBackgroundInMeasurement - Required True or false values indicating that the target tissue does not cover the full measurement area and the background is part of the measurement. BOOLEAN
percentBackgroundInMeasurement - Optional Numeric estimate for the percentage of the measurement area that is not covered by the target tissue and is background material (black background or herbarium paper). It is recommended to describe the estimation method in the comment field. Example: 25. INTEGER
hasGlue - Required True/False/Uncertain: glue present in measurement area. ENUM (true/false/uncertain)
hasNonGlueContamination - Required True, false, or uncertain values indicating a contaminant other than glue is present in the measurement area. This includes foreign biotic or abiotic agents on the target tissue, such as fungus or preservatives. ENUM (true/false/uncertain)
measurementFlags - Optional Standardized categorical descriptors of the condition of the tissue within the measurement area. Values are selected from the predefined Tissue Descriptor Codes (Table 4.6). Multiple descriptors should be separated with a pipe character (|). Example: GoodPreservation|PathogenPresent. ENUM (Table 4.6)
tissueNotes - Optional Free-text field used to record additional observations on the condition of the specimen that may aid interpretation of spectral data. It can be used to clarify or elaborate on descriptors already included in measurementFlags, such as the conditions evidencing the quality of preservation (e.g., a MediumPreservation flag could be explained with the note, ‘measurement area discolored and wrinkled’). Notes should specify whether the information applies to the measurement area, the tissue unit, or the specimen as a whole. Examples: mold in measurement area, formaldehyde preserved. TEXT
tissueLocation - Optional The location of the target tissue on the herbarium sheet. For mounted tissues, record as an X,Y coordinate in centimeters, with 0,0 at the top-left corner of the sheet (e.g., 17,29; see Fig. 4.2). If the sheet has non-square angles, align it flush with the left-side ruler. For unmounted tissues, provide a descriptive note indicating location. Examples: 17,29, envelope TCAD_TN1. TEXT (coordinates preferred)
comment - Optional Free-text field for recording any additional notes relevant to the measurement, including observations about the instrument, session, specimen, tissue, or data quality that are not captured elsewhere in the metadata. Example: Amazing specimen. TEXT
measurementIndex IDX Required The measurement number index appended to the base filename (Part 3, Table 3.2: IDX) to properly associate each row of metadata with its single, corresponding measurement file. Example: 0001. TEXT

Fig. 4.2. Diagram of coordinate system for scoring tissueLocation in Tissue Metadata. Herbarium sheet is placed on top of the benchtop black background with centimeter rulers at top and left sides for ‘x,y’ notation of measurement area (white dashed circles) in centimeters with 0,0 at the top left. Black background cards are placed under unglued portions of leaves. From left to right, tissue TCAB_TN3 has the location 10,26, tissue TCAD_TN2 has location 17,16, and tissue TCAD_TN1 is stored with a label in a glassine envelope inside the packet with location envelope TCAD_TN1. For reference, specimen NEBC_00651639 metadata fields are proposed in Appendix II. Specimen courtesy of the New England Botanical Society.
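Since tissueLocation holds either an X,Y coordinate in centimeters or a descriptive note, downstream tools need to tell the two apart. The sketch below shows one way to do so; the function name is hypothetical, and treating any value that is not exactly two comma-separated numbers as a descriptive note is an assumption rather than a rule of the protocol.

```python
from typing import Optional, Tuple

Coordinates = Tuple[float, float]


def parse_tissue_location(value: str) -> Tuple[Optional[Coordinates], Optional[str]]:
    """Split a tissueLocation value into (coordinates_cm, descriptive_note).

    Exactly one of the two results is set for a non-empty value; both are
    None when the field is empty.
    """
    text = value.strip()
    parts = [part.strip() for part in text.split(",")]
    if len(parts) == 2:
        try:
            # X,Y in centimeters, with 0,0 at the top-left corner of the sheet.
            return (float(parts[0]), float(parts[1])), None
        except ValueError:
            pass  # not numeric, so fall through and keep it as a note
    return None, (text or None)


print(parse_tissue_location("17,29"))              # ((17.0, 29.0), None)
print(parse_tissue_location("envelope TCAD_TN1"))  # (None, 'envelope TCAD_TN1')
```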

4.2 Controlled Vocabularies

Table 4.4: Developmental Stage Class Codes

Available codes for enumerating the required tissueDevelopmentalStage metadata (Table 4.3). Codes follow ‘CamelCase’ format with capitalized initial letters.

Code Description
Young Actively developing tissue that is not yet fully expanded; may appear thinner, lighter in color, or more pliable than mature tissue.
Mature Fully developed and expanded tissue showing typical structural and color characteristics for the taxon; not visibly senescent.
Old Senescent tissue showing visible signs of aging or decline, such as yellowing, darkening, curling, or drying.
Uncertain The developmental stage has been assessed but cannot be confidently determined due to intermediate features, damage, or insufficient visual cues.
NotScored Developmental stage was not assessed or recorded for this tissue.

Table 4.5: Target Class Codes

Available codes for enumerating the required targetClass metadata (Table 4.3). Either the abbreviated code or the full CamelCase-formatted code (with an initial capital letter) may be used.

Code Full code Description
W WhiteReference White reference.
WC WhiteCalibratedReference White calibrated reflectance standard (see Section 5.3).
B BlackBackground Black background material, recorded when used as background for other target tissue measurements.
BC BlackCalibratedReference Black calibrated reflectance standard (see Section 5.3).
P Paper Herbarium sheet paper, recorded when used as background for other target tissue measurements.
AB LeafAbaxial Abaxial leaf surface.
AD LeafAdaxial Adaxial leaf surface.
LF Leaf Leaf surface. Applied when the abaxial and adaxial side cannot be differentiated or when leaves are terete or otherwise not bifacial (e.g. Senecio rowleyanus).
PT Petal Petal.
IF Inflorescence Inflorescence.
BR Bract Bract.
FR Fruit Fruit. The specific tissue (e.g., exocarp, mesocarp) can be described in tissueNotes.
PSS Photosynthetic-SucculentStem Photosynthetic stem as in succulents like Cactus.
OB OuterBark Outer bark, rhytidome. Woody branch outer bark as in Hadlich et al. (2018).
IB InnerBark Phloem. Woody branch inner bark as in Hadlich et al. (2018).
HS HerbaceousStem Herbaceous stem; can be photosynthetic as in PSS.
WD Wood Wood.
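Because targetClass accepts an abbreviated code, a full code, or free text, aggregating data across projects is easier if values are first mapped to a single form. The following sketch maps abbreviated codes to the full codes of Table 4.5 and passes free text through unchanged. The function name is hypothetical, full codes are copied as printed in the table, and case-sensitive matching is an assumption.

```python
# Abbreviated code -> full code, copied from Table 4.5 as printed.
TARGET_CLASS_CODES = {
    "W": "WhiteReference",
    "WC": "WhiteCalibratedReference",
    "B": "BlackBackground",
    "BC": "BlackCalibratedReference",
    "P": "Paper",
    "AB": "LeafAbaxial",
    "AD": "LeafAdaxial",
    "LF": "Leaf",
    "PT": "Petal",
    "IF": "Inflorescence",
    "BR": "Bract",
    "FR": "Fruit",
    "PSS": "Photosynthetic-SucculentStem",
    "OB": "OuterBark",
    "IB": "InnerBark",
    "HS": "HerbaceousStem",
    "WD": "Wood",
}
FULL_TARGET_CODES = frozenset(TARGET_CLASS_CODES.values())


def normalize_target_class(value: str) -> tuple:
    """Return (normalized_value, is_controlled_code).

    Abbreviated codes become full codes, full codes are kept, and anything
    else is treated as free text and returned unchanged.
    """
    text = value.strip()
    if text in TARGET_CLASS_CODES:
        return TARGET_CLASS_CODES[text], True
    if text in FULL_TARGET_CODES:
        return text, True
    return text, False


print(normalize_target_class("AD"))          # ('LeafAdaxial', True)
print(normalize_target_class("perigynium"))  # ('perigynium', False)
```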

Table 4.6: Tissue Descriptor Codes

Code Description
GoodPreservation Tissue in the measurement area appears well preserved in color, structure, and texture (including original features from disease or herbivory) and shows minimal signs of degradation or breakage from pressing, drying, or storage. Tissues that are simply discolored may still be considered well preserved if other aspects of integrity are maintained.
MediumPreservation Tissue in the measurement area shows moderate degradation, such as partial discoloration, wilting, or deformation. Some structural loss may be present, though not severe. Evidence of degradation from other parts of the specimen (e.g., mold elsewhere on the tissue) may support assigning this level of preservation, but such observations should be recorded in tissueNotes if not present within the measurement area. This is expected to be the most common preservation condition for herbarium specimens used in spectral measurement.
PoorPreservation Tissue in the measurement area shows clear signs of degradation, including severe discoloration, wrinkling, deformation, or breakage. Mold, insect damage, or other signs of poor preservation may also be present. As with other flags, evidence of degradation outside the measurement area (e.g., mold elsewhere on the tissue) may support the assigned flag but should be recorded in tissueNotes if not directly observed in the measurement area. Note that natural discoloration tendencies of certain taxa should be considered when applying this flag (see Appendix II).
MidveinPresent Target measurement area contains midvein or similarly prominent secondary venation.
OrganismPresent Indicates that a visible organism (e.g., bryophyte, lichen, fungal structure) is present on or within the measurement area. This includes epiphyllous, endophytic, or other leaf-associated organisms, regardless of their ecological role (e.g., mutualistic, parasitic, or incidental). This flag serves as a general indicator and can be used in conjunction with more specific organism flags below.
BryophytePresent Indicates that a visible bryophyte (e.g., moss, liverwort, or hornwort) is present on or within the measurement area.
LichenPresent Indicates that a visible lichen thallus or fragment is present on or within the measurement area.
FungusPresent Indicates that fungal structures are visible in the measurement area (e.g., hyphae, mycelium, fruiting body) and are presumed to be pre-mortem associates, such as endophytes or pathogens active while the plant was alive. This flag can overlap with PathogenPresent or MoldPresent.
PathogenPresent Target measurement area contains necrotic tissue or other signs of pathogenic infection. Can be used with FungusPresent when fungal pathogens are suspected or known. This flag is based on visible tissue symptoms, not molecular confirmation.
MoldPresent Indicates that the measurement area shows signs of post-mortem fungal growth (e.g., surface mold, fuzz, bloom), likely resulting from poor drying or storage conditions. In practice, mold may be difficult to distinguish from other fungal growth without microscopic or culture analysis. Use judgment and note uncertainty in tissueNotes if needed.
HerbivoryPresent Target measurement area contains herbivory.
AlcoholPresent Target measurement area was preserved with ethanol or other alcohol.
PreservativePresent Target measurement area contains chemical preservative contamination excluding alcohol preservatives (e.g., diatoms, formaldehyde).
BurnPresent Target measurement area contains burned tissue.
DebrisPresent Target measurement area contains non-specific material not described in other codes (e.g., dust, soil particles, insect parts, fibers, etc.) that may interfere with clean measurements. Can be elaborated in the tissueNotes field.
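The measurementFlags field stores several descriptors in one cell, separated by a pipe character, so it must be split before the individual flags can be checked against Table 4.6. A minimal sketch of that step follows; the function name is hypothetical, and reporting unrecognized flags rather than rejecting the whole row is a design choice made here for illustration.

```python
# Tissue Descriptor Codes, copied from Table 4.6.
TISSUE_DESCRIPTOR_CODES = frozenset({
    "GoodPreservation",
    "MediumPreservation",
    "PoorPreservation",
    "MidveinPresent",
    "OrganismPresent",
    "BryophytePresent",
    "LichenPresent",
    "FungusPresent",
    "PathogenPresent",
    "MoldPresent",
    "HerbivoryPresent",
    "AlcoholPresent",
    "PreservativePresent",
    "BurnPresent",
    "DebrisPresent",
})


def split_measurement_flags(value: str) -> tuple:
    """Split a pipe-separated measurementFlags value.

    Returns (flags, unknown), where `unknown` lists any flags that are not
    in Table 4.6. Empty segments (e.g., from a trailing pipe) are ignored.
    """
    flags = [flag.strip() for flag in value.split("|") if flag.strip()]
    unknown = [flag for flag in flags if flag not in TISSUE_DESCRIPTOR_CODES]
    return flags, unknown


print(split_measurement_flags("GoodPreservation|PathogenPresent"))
# (['GoodPreservation', 'PathogenPresent'], [])
print(split_measurement_flags("GoodPreservation|Moldy"))
# (['GoodPreservation', 'Moldy'], ['Moldy'])
```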

Table 4.7: Background and White Reference Class Codes

Available codes for enumerating the required backgroundClass metadata (Table 4.3). Either the abbreviated code or the full CamelCase-formatted code (with an initial capital letter) may be used.

Code Full code Description
W WhiteReference White reference.
B BlackBackground Black background (<4% reflectance).
P PaperBackground Herbarium sheet paper.
O OtherBackground Other background material (must be described in metadata).
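The background fields in Table 4.3 depend on one another: backgroundDescription is required for black or other backgrounds, and a paper background is scored false for hasLowReflectanceBackground. The sketch below encodes those two rules. It is a hedged illustration with hypothetical function names. It assumes that the BG-prefixed examples in Table 4.3 (e.g., BGB) are the Table 4.7 codes with a BG prefix, and that hasLowReflectanceBackground has already been converted to a Python boolean.

```python
# Abbreviated code -> full code, copied from Table 4.7.
BACKGROUND_CLASS_CODES = {
    "W": "WhiteReference",
    "B": "BlackBackground",
    "P": "PaperBackground",
    "O": "OtherBackground",
}


def normalize_background_class(value: str) -> str:
    """Return the full Table 4.7 code for a backgroundClass value."""
    text = value.strip()
    # Assumption: "BGW", "BGB", "BGP" are Table 4.7 codes with a "BG" prefix.
    if text.startswith("BG") and text[2:] in BACKGROUND_CLASS_CODES:
        text = text[2:]
    if text in BACKGROUND_CLASS_CODES:
        return BACKGROUND_CLASS_CODES[text]
    if text in BACKGROUND_CLASS_CODES.values():
        return text
    raise ValueError(f"unknown backgroundClass: {value!r}")


def check_background(row: dict) -> list:
    """Return problems with the background fields of one tissue row."""
    problems = []
    full_code = normalize_background_class(row["backgroundClass"])
    description = str(row.get("backgroundDescription") or "").strip()
    if full_code in ("BlackBackground", "OtherBackground") and not description:
        problems.append(
            "backgroundDescription is required for black or other backgrounds"
        )
    if full_code == "PaperBackground" and row.get("hasLowReflectanceBackground") is True:
        problems.append(
            "hasLowReflectanceBackground should be false for paper backgrounds"
        )
    return problems


row = {
    "backgroundClass": "BGB",
    "hasLowReflectanceBackground": True,
    "backgroundDescription": "",  # missing on purpose
}
print(check_background(row))
# ['backgroundDescription is required for black or other backgrounds']
```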

4.3 Guidelines for Data Archiving and Sharing

  • For each specimen, a copy of all unprocessed spectral files and metadata should be archived in the digital repository of the herbarium or institution that owns the specimen, or otherwise in accordance with their data storage practices. This ensures that the data remain co-located with the physical specimen and integrated with institutional capabilities for managing specimen metadata.

  • In addition, projects should upload unprocessed, original format data files and metadata to a persistent, open-access repository that issues DOIs and supports versioning, such as Dryad, Zenodo, or Harvard Dataverse. These platforms are preferred over tools like EcoSIS, which have shown reduced reliability and uncertain long-term support.

  • To facilitate reuse, users may also choose to share tabular data files (e.g., .csv) with samples in rows and columns for metadata fields and spectral band values. If processed spectra are included (e.g., interpolated, splice/jump-corrected, or continuum-removed), the applied processing steps should be clearly documented.

  • See example of data and metadata sharing here: https://doi.org/10.7910/DVN/LXPHBC
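The tabular layout described in the guidelines above (samples in rows, with columns for metadata fields followed by spectral band values) can be sketched with the Python standard library. Everything specific in this example is an assumption for illustration: the function name, the choice of metadata columns, the use of wavelength in nanometers as band column headers, and the reflectance values, which are invented placeholders rather than real measurements.

```python
import csv
import io

# Hypothetical subset of metadata columns and band centers (nm).
metadata_fields = [
    "projectId", "sessionId", "herbariumCode",
    "specimenId", "targetClass", "measurementIndex",
]
wavelengths_nm = [450, 550, 650]

# One sample; reflectance values are invented for illustration only.
samples = [
    {
        "metadata": {
            "projectId": "HUHERYspec1",
            "sessionId": "20240617132251",
            "herbariumCode": "GH",
            "specimenId": "00238762",   # kept as text to preserve leading zeros
            "targetClass": "AD",
            "measurementIndex": "0001",
        },
        "reflectance": [0.05, 0.12, 0.08],
    },
]


def write_flat_table(samples, metadata_fields, wavelengths_nm, handle):
    """Write one row per sample: metadata columns, then one column per band."""
    writer = csv.writer(handle)
    writer.writerow(metadata_fields + [str(w) for w in wavelengths_nm])
    for sample in samples:
        if len(sample["reflectance"]) != len(wavelengths_nm):
            raise ValueError("one reflectance value is needed per band")
        meta = [sample["metadata"].get(name, "") for name in metadata_fields]
        writer.writerow(meta + list(sample["reflectance"]))


buffer = io.StringIO()
write_flat_table(samples, metadata_fields, wavelengths_nm, buffer)
print(buffer.getvalue())
```

Identifiers such as specimenId and measurementIndex are written as text so that leading zeros survive. Spreadsheet software may still strip them on import unless the columns are read as text.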


CC BY 4.0 — IHerbSpec Protocol. DOI: 10.5281/zenodo.15849668