Who should set standards for dataset annotation in scientific AI?

Scientific AI depends on trustworthy labeled data, so the question of who should set standards for dataset annotation is fundamentally about accountability, expertise, and public interest. Standards cannot be imposed by a single company or discipline without risking narrow priorities and unaddressed harms. Instead, responsibility should be shared among technical standard-setters, domain experts, professional societies, funders, journals, and the communities represented in data.

Technical and standards bodies

Organizations such as the National Institute of Standards and Technology (NIST) provide neutral processes and evaluation frameworks that help make annotation practices reproducible and comparable. Patrick Grother of NIST has led evaluations showing how inconsistent test data undermines reliable system assessment, underscoring the role of formal standards in benchmarking. Formal bodies can publish interoperable formats, provenance requirements, and quality metrics that reduce ambiguity about what an annotated dataset actually represents.
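
To make "quality metrics" concrete, one widely used measure of annotation quality is inter-annotator agreement. The sketch below computes Cohen's kappa, which corrects the raw agreement rate between two annotators for agreement expected by chance; the labels and annotators are invented for illustration:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa between two annotators over the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
    agreement rate and p_e is the agreement expected by chance
    given each annotator's label frequencies. (A sketch: it does
    not guard against p_e == 1, i.e. a single constant label.)
    """
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items labeled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: sum over labels of the product of marginal frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical example: two annotators labeling the same ten images.
ann_1 = ["cat", "dog", "cat", "cat", "dog", "cat", "dog", "dog", "cat", "dog"]
ann_2 = ["cat", "dog", "dog", "cat", "dog", "cat", "cat", "dog", "cat", "dog"]
print(f"kappa = {cohens_kappa(ann_1, ann_2):.2f}")  # kappa = 0.60
```

By common rules of thumb, kappa values above roughly 0.6 are treated as substantial agreement, though acceptable thresholds vary by discipline and task difficulty, which is exactly why a formal body publishing the metric and its interpretation reduces ambiguity.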

Scientific communities and domain experts

Domain scientists must define the semantics of labels. Medical, ecological, and social science datasets require standards grounded in disciplinary knowledge so that labels capture clinically or ecologically meaningful distinctions. The National Science Foundation already conditions funding on data management plans, a precedent suggesting that funders and journals could likewise require annotation transparency as part of reproducibility. Standards should therefore be co-produced with the researchers who understand the phenomena being labeled.

Ethics advocates and affected communities

Advocates for dataset documentation bring ethical and social perspectives into technical debates. Timnit Gebru of the Distributed Artificial Intelligence Research Institute has argued for detailed dataset documentation that reveals collection contexts and limitations, helping to prevent misuse and misinterpretation. Community representatives, including groups advocating for Indigenous data sovereignty, must help set the rules where cultural, territorial, or historical sensitivities affect labeling choices.
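
As a minimal sketch of what such documentation could look like in machine-readable form, the record below is loosely inspired by the "Datasheets for Datasets" proposal that Gebru co-authored; the field names and the example dataset are illustrative assumptions, not a published schema:

```python
from dataclasses import dataclass, field

@dataclass
class DatasetDatasheet:
    """Illustrative documentation record for an annotated dataset.

    Hypothetical fields: the Datasheets for Datasets proposal poses
    narrative questions rather than prescribing a fixed schema.
    """
    name: str
    collection_context: str   # How, where, and why the data were gathered.
    annotation_protocol: str  # Instructions given to annotators.
    annotator_background: str # Who labeled the data, with what expertise.
    known_limitations: list = field(default_factory=list)
    prohibited_uses: list = field(default_factory=list)

# Hypothetical dataset used purely to show the record filled in.
sheet = DatasetDatasheet(
    name="coral-reef-survey-2023",
    collection_context="Diver photographs from three Pacific reef sites.",
    annotation_protocol="Annotators labeled coral genus per image.",
    annotator_background="Four graduate-level marine ecologists.",
    known_limitations=["Sites skew toward shallow, accessible reefs."],
    prohibited_uses=["Conservation rulings without ground-truth surveys."],
)
```

Keeping such a record alongside the data gives journals and funders something they can check mechanically: that collection context, annotation protocol, and known limitations were actually disclosed.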

Consequences and governance

When standards are absent or weak, consequences include scientific irreproducibility, amplified biases, and harms to marginalized groups whose data are misrepresented. In environmental science, inconsistent annotation can produce flawed models that misguide conservation decisions and resource allocation. Robust standards increase trust, reduce wasted compute and effort, and enable cross-study synthesis.

A practical governance model combines ISO-style consensus processes, peer review by journals, funding conditionality from national science agencies, and participatory mechanisms that include affected communities alongside domain experts. This multi-stakeholder approach balances technical rigor, ethical oversight, and cultural and territorial sensitivity, producing annotation standards that serve both scientific integrity and social responsibility.