Standardized data in art research and digital catalogues raisonnés
The enormous volume of art historical research creates opportunities for innovative approaches to organizing, sharing, and analyzing data. Standardized data — information organized according to agreed-upon conventions — supports innovative research methodologies and collaboration. This article introduces standardized data, explaining its foundational principles, practical applications, and the benefits it delivers to digital art historical research.
Defining standardized data and structured data
It’s helpful to take a quick step back to talk about structured data before exploring standardized data. Structured data is information organized in a consistent, machine-readable format that makes it comprehensible and usable across platforms. It’s the format that organizes data. Standardized data defines the content and terminology within that format. Together, they enable the integration and analysis of information, which is particularly valuable in collaborative and interdisciplinary art historical research.
Controlled vocabularies are excellent examples of standardized data. They provide a consistent way to describe and categorize information by using predefined, authorized terms. For example, using the Getty Art & Architecture Thesaurus (AAT) to describe object types or materials means everyone uses the same terms — like "oil paint" instead of variations like "oil," "oil on canvas," or "oils." This standardization ensures that everyone using the data refers to concepts in the same way, which improves searchability, interoperability, and data integration across systems.
Structured data
Definition: Information organized in a consistent, machine-readable format.
Focus: How data is formatted, categorized, and stored.
Purpose: Enables efficient storage, retrieval, and processing of information.
Example: A database where each entry has fields for “artist,” “title,” and “medium.”
Key benefits: Makes data accessible and machine-readable.
Works with: Databases, spreadsheets, and structured metadata formats like XML or JSON.
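To make the distinction concrete, the database entry described above can be sketched as a small record with consistent, machine-readable fields. The field names and values here are illustrative, not drawn from any particular collection system.

```python
# A minimal structured record: the same fields appear in every entry,
# so software can process records uniformly (values are hypothetical).
record = {
    "artist": "Claude Monet",
    "title": "Impression, Sunrise",
    "medium": "oil on canvas",
}

def titles(records):
    """Collect the title of every record; works because the field
    name is identical across all entries."""
    return [r["title"] for r in records]
```

Because the structure is predictable, a query like "list every title" is a one-line operation rather than a manual search.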
Standardized data
Definition: Data that follows established rules and controlled vocabularies to ensure consistency and clarity.
Focus: Ensuring uniform terminology and meaning across different sources.
Purpose: Supports accurate, interoperable, and reliable data across institutions and platforms.
Example: Using the Getty Art & Architecture Thesaurus (AAT) to ensure “oil paint” is always recorded the same way.
Key benefits: Helps researchers and institutions communicate clearly by using consistent terms.
Works with: Controlled vocabularies, authority files, and metadata frameworks like LIDO or CIDOC CRM.
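A sketch of how a controlled vocabulary might be applied in practice: free-text variants are mapped to a single authorized term, as in the "oil paint" example above. The mapping and helper function below are illustrative, not part of any Getty tool.

```python
# Hypothetical mapping from free-text variants to the authorized term.
AUTHORIZED_MEDIUM = {
    "oil": "oil paint",
    "oils": "oil paint",
    "oil on canvas": "oil paint",
    "oil paint": "oil paint",
}

def standardize_medium(raw):
    """Return the authorized term for a free-text medium entry,
    or None if the vocabulary does not cover it."""
    return AUTHORIZED_MEDIUM.get(raw.strip().lower())
```

Entries the vocabulary does not cover return None rather than a guess, which surfaces gaps for a cataloguer to review instead of silently recording inconsistent terms.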
Established data standards in art research
A variety of data standards exist to help art researchers standardize their data. Below is an overview of some of the most prominent standards, each playing a pivotal role in ensuring that data can be reliably shared, understood, and utilized in art historical research.
1. Dublin Core is a simple, widely-used metadata schema for describing resources such as artworks, books, and digital objects, using elements like title, creator, and date.
2. Encoded Archival Description (EAD) is a standard for encoding archival finding aids, enabling structured and detailed descriptions of collections.
3. Exhibition Object Data Exchange Model (EODEM) facilitates the exchange of standardized information about loaned objects for exhibition planning, streamlining communication between institutions.
4. Lightweight Information Describing Objects (LIDO) is a metadata schema designed for cohesive publishing of metadata online, focusing on the full range of descriptive information about museum objects.
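Dublin Core's simplicity can be illustrated with a flat key-value record. The element names (title, creator, date, format, type) are genuine Dublin Core elements; the values are invented for the example.

```python
# A minimal Dublin Core-style description of an artwork.
# Element names follow the Dublin Core element set; values are hypothetical.
dc_record = {
    "dc:title": "Impression, Sunrise",
    "dc:creator": "Monet, Claude",
    "dc:date": "1872",
    "dc:format": "oil on canvas",
    "dc:type": "painting",
}

# Any system that understands Dublin Core can read this record
# without institution-specific documentation.
```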
Benefits of standardized data in art history
Standardization may sound like a technical chore, but its advantages for art historians are far-reaching.
Enhanced searchability for researchers and readers across institutions
When datasets adhere to consistent standards, they become easier to search, interpret, and share. Art historians working across institutions or disciplines can access and understand data without decoding idiosyncratic formats. For instance, standardized metadata ensures that researchers can readily locate and compare relevant records, regardless of the museum housing them.
Standardized data is a cornerstone of collaborative projects such as the Europeana database, which aggregates cultural heritage information from libraries, museums, and archives across Europe. Researchers can explore relationships between objects housed in different countries because the data adheres to shared standards.
DE-BIAS, for example, is developing an AI-powered tool to analyze more than 4.5 million records in five European languages currently published on the Europeana website and to automatically detect and flag item descriptions that contain derogatory language. This work would be nearly impossible if the data weren’t standardized.
Mapping artworks and trends
Spatial and temporal analysis is a growing field in art history, offering insights into how artistic movements spread and evolved. Standardized data allows researchers to plot artworks on maps or timelines with precision. For instance:
Geographic coordinates in standard formats enable accurate mapping of where artworks were created, moved to, or what they depict.
Uniform date formats help identify chronological trends, such as the diffusion of Impressionism across Europe.
Standardized information on transactions and buyers allows researchers to analyze collecting behaviors (a standard for provenance data has yet to be established; CMOA published a first draft in 2016).
Color data extracted from images can be used to explore how an artist used specific hues throughout their oeuvre.
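As a sketch of how standardized place names and uniform ISO dates make these spatial and temporal analyses straightforward (the records below are invented, with TGN-style place names):

```python
from collections import defaultdict

# Hypothetical records with standardized place names and ISO 8601 dates.
works = [
    {"title": "Work A", "place": "Le Havre", "date": "1872-11-13"},
    {"title": "Work B", "place": "Paris", "date": "1874-04-15"},
    {"title": "Work C", "place": "Le Havre", "date": "1873-01-01"},
]

def works_by_place(records):
    """Group works by creation place, chronologically within each place.
    ISO 8601 dates sort correctly as plain strings."""
    grouped = defaultdict(list)
    for r in sorted(records, key=lambda r: r["date"]):
        grouped[r["place"]].append(r["title"])
    return dict(grouped)
```

Because "Le Havre" is recorded identically everywhere, the grouping is exact; with variant spellings, works would scatter across phantom locations.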
Researchers can uncover patterns that might otherwise remain hidden by combining standardized data with tools like Geographic Information Systems (GIS). A study published in Science examined a rich yet deceptively simple dataset (birth and death dates and places of prominent artists, scholars, and writers) to reveal patterns in Europe’s cultural history. Berlin, Moscow, and London emerge distinctly as “death attractors.” These cities consistently drew cultural figures away from their birthplaces, becoming hubs where they lived, worked, and eventually died. While art historians have long recognized these trends, the study by Schich et al. introduces a usable dataset on the phenomenon.
Improved accuracy and consistency
Standardized data minimizes errors arising from inconsistencies or ambiguities. In a database where dates are recorded uniformly, a machine-readable format enables further analysis and processing, such as searching, filtering, and sorting entries by date. Similarly, controlled vocabularies for terms like "oil paint" or "tempera" ensure that materials are identified accurately and consistently.
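A sketch of why uniform date recording matters: legacy records often mix formats, and normalizing them to ISO 8601 makes dates comparable and sortable. The formats and helper below are illustrative assumptions about what a legacy dataset might contain.

```python
from datetime import datetime

def to_iso(raw):
    """Normalize a date string from several hypothetical legacy
    formats to ISO 8601 (YYYY-MM-DD)."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y", "%B %d, %Y"):
        try:
            return datetime.strptime(raw, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {raw!r}")
```

Once every date is in ISO form, plain string comparison sorts records chronologically, with no special date logic needed downstream.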
The Getty vocabularies — such as the Art & Architecture Thesaurus (AAT), Getty Thesaurus of Geographic Names (TGN), and Union List of Artist Names (ULAN) — are authoritative, structured resources that provide standardized terminology for the art, cultural heritage, and information management sectors. These standardized terms help people, machines, and institutions communicate on a semantic level. Suppose a catalogue raisonné publishes that a work was done in “oil paint (AAT ID 300015050).” With the AAT ID number attached, others know exactly what “oil paint” means. And while it isn’t likely that a reader would interpret “oil paint” to mean something far from the intended term, that isn’t the case for all terms. This clarity is critical for systems sharing data.
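The AAT identifier in the example above also resolves to a Linked Open Data URI under Getty's published vocab.getty.edu pattern. The helper below simply assembles that URI; the function name is ours, not part of any Getty library.

```python
def aat_uri(aat_id):
    """Build the Linked Open Data URI for a Getty AAT concept ID,
    following the vocab.getty.edu/aat/{id} pattern."""
    return f"http://vocab.getty.edu/aat/{aat_id}"

# "oil paint" carries AAT ID 300015050, so a record can point to
# a single shared, machine-resolvable concept rather than a bare string.
```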
These standards can be directly integrated into digital catalogues raisonnés. The Wildenstein Plattner Institute’s Claude Monet: The Revised Catalogue Raisonné uses the Getty Thesaurus of Geographic Names for cataloguing creation places, allowing readers to easily search the catalogue raisonné by location. By using the TGN, readers don’t need to worry about missing a work made at Le Havre because a researcher entered its location as “Havre” or “LeHavre,” for example. “Le Havre” is the only option for researchers and readers.
New insights through big data and advanced computational analysis
Standardization unlocks the potential of big data in art history. By aggregating and analyzing structured datasets with digital tools, researchers can uncover macro-level trends, such as shifts in artistic production over centuries or the global movement of artworks. Digital humanities methods such as network analysis, machine learning, and text mining rely heavily on standardized data.
Imagine a piece of software designed to analyze artist networks during the Renaissance. The model's accuracy diminishes if datasets use inconsistent formats for artist names or workshop locations. With standardized data, algorithms can recognize relationships, such as which artists studied under the same master; text mining tools can identify recurring themes in digitized exhibition catalogs or artist letters.
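A sketch of the Renaissance workshop scenario above: once artist and master names are standardized, co-apprentice relationships fall out of a simple grouping. The training records below are illustrative.

```python
from collections import defaultdict
from itertools import combinations

# Illustrative (artist, master) training records with standardized names.
training = [
    ("Leonardo da Vinci", "Andrea del Verrocchio"),
    ("Pietro Perugino", "Andrea del Verrocchio"),
    ("Raphael", "Pietro Perugino"),
]

def co_apprentices(records):
    """Return pairs of artists who studied under the same master.
    Only works reliably when names are recorded consistently."""
    by_master = defaultdict(list)
    for artist, master in records:
        by_master[master].append(artist)
    pairs = set()
    for pupils in by_master.values():
        pairs.update(combinations(sorted(pupils), 2))
    return pairs
```

Had one record spelled the master "Verrocchio, Andrea del", the grouping would silently split his workshop in two, which is exactly the failure mode standardized data prevents.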
Preparedness for growth and technological developments
As datasets grow, maintaining coherence becomes increasingly important. Controlled vocabularies ensure that large-scale projects remain manageable and meaningful. Researchers can scale their analyses without being hindered by incompatible formats or terminologies. Digital technologies also evolve rapidly, and data formats that seem cutting-edge today may become obsolete tomorrow. Standardized data helps future-proof research by adhering to widely recognized conventions that are more likely to endure.
Digital catalogues raisonnés built on standardized data allow art researchers to grow their publications and adapt them as technology changes. This is especially true for projects that begin on one software platform and end on another: standardized data eases the migration of large databases and helps ensure no data is lost in the process. And because standardized data improves findability, updating established fields of information as new research arises is quick to accomplish.
Overcoming challenges in standardized data
Despite its many benefits, implementing standardized data in art history is not without challenges. Researchers and institutions may encounter obstacles such as:
Limitations: Researchers may find standardized data limiting if the applied vocabularies don’t cover their use cases.
Incomplete vocabularies: Although most established vocabularies are quite robust, they may still be incomplete.
Time and resource constraints: Standardizing legacy datasets requires effort and expertise, which may strain budgets or staff capacity.
Complexity of standards: Navigating and applying the right standards — especially when multiple frameworks exist — can be daunting.
Collaboration and education can be key to overcoming these hurdles. Institutions can provide training on data standards and foster partnerships with digital humanities experts.
NFDI4Culture, a German consortium that is part of the broader National Research Data Infrastructure (NFDI) initiative, exemplifies ongoing efforts to develop, improve, and implement data standards through a collaborative process. It provides guidance and tools for using controlled vocabularies, ontologies, and metadata standards (including LIDO, CIDOC CRM, and Dublin Core). It also supports researchers with best-practice documentation, services for long-term data preservation, and tools for semantic enrichment and linked data — all of which depend on standardized data structures.
The future of standardized data and art history
Standardized data is more than a technical system or tool; it strongly supports creative art historical research. Organizing information according to consistent rules and formats transforms how scholars access, analyze, and share knowledge. From creating digital catalogues raisonnés to conducting interdisciplinary studies, the applications are expansive. Embracing standardized data opens new avenues of inquiry and collaboration.