Creating an Import Mapping: Overview

Introduction

The purpose of an import mapping is to define specifically how and where source data will be imported into CollectiveAccess. Data must be imported into CollectiveAccess using an import mapping.

An import mapping is a spreadsheet (XLSX or Google Sheets) that defines how data is imported into CollectiveAccess. This spreadsheet acts as a crosswalk, detailing where data is coming from outside of CollectiveAccess, and where that same data will go in CollectiveAccess. For a tutorial using the Sample Import Mapping Spreadsheet and Sample Import Data, please see Tutorial: Import Mapping Spreadsheet.

Each import mapping spreadsheet contains a set of intrinsic columns, each of which is described in detail below. Each import mapping also contains specific, customized Settings. Tables list the possible options and explain the function of each part of the spreadsheet in more detail.

Here is a column-by-column explanation of each component of the import mapping spreadsheet. To follow along with this tutorial, download the following two files:

Sample Import Mapping Spreadsheet: The import mapping spreadsheet; a schema crosswalk. For every data source, a target “destination” in CollectiveAccess is defined. This file is in the supported file format XLSX; therefore, columns and rows are numbered using 1, 2, 3, and so on.

Sample Import Data (Source Data): The sample source data. This sample data includes three records (one row = one record), with 10 examples of possible metadata fields (one column = one field). The sample data is in the supported file format XLSX; therefore, columns and rows are numbered using 1, 2, 3, and so on.

Settings

Every import mapping requires general settings. Settings include the importer name, data format of the source data (for a comprehensive list of supported file formats, please see Supported File Formats), the selected CollectiveAccess table, and more. This section can be placed at the top or bottom of a mapping spreadsheet. Although the Settings are integrated into the spreadsheet, they do function separately from the main column-defined body of the import mapping.

Note

In Settings rows, the rule type in Column 1 must be set to “Setting.”

Setting

Description

Parameter notes

Example

name

Give your mapping a name.

Arbitrary text

My Sample mapping

code

Give your mapping an alphanumeric code.

Arbitrary text, with no special characters or spaces

my_sample_mapping

inputFormats

Sets type of source (input) data that can be handled by this import mapping. Values are format codes defined by the various DataReader plugins.

file type

XLSX

table

Sets the table for the imported data. If you are importing Objects, set the table to ca_objects. If you are importing Collections, set this to ca_collections, and so on.

Corresponds to the CollectiveAccess basic tables

ca_objects

type

Sets the type assigned to all imported records. If you are importing Objects, what type are they: Photographs, Artifacts, Paintings, etc.? This value needs to correspond to an existing value in the types list. For objects, the list is object_types. If the import includes a mapping to type_id, that mapping takes precedence and the type setting is ignored.

CollectiveAccess list item code

image

numInitialRowsToSkip

The number of rows at the top of the data set to skip. Use this setting to skip over column headers in spreadsheets and similar data.

numeric value

1

existingRecordPolicy

Determines how existing records in the CollectiveAccess system are checked for and handled for the mapping. Also determines how records created by the mapping are merged with other instances (idno and/or preferred label) in the data source.

(In CollectiveAccess, the primary ID field is “idno” and the title/name field of each record is “preferred_label”.)

From version 1.8, options to skip, merge, or overwrite on internal CollectiveAccess record ids are also supported via the *_on_id options. These options can be useful when re-importing data previously exported from a CollectiveAccess instance.

none
skip_on_idno
merge_on_idno
overwrite_on_idno
skip_on_preferred_labels
merge_on_preferred_labels
overwrite_on_preferred_labels
skip_on_idno_and_preferred_labels
merge_on_idno_and_preferred_labels
overwrite_on_idno_and_preferred_labels
merge_on_idno_with_replace
merge_on_preferred_labels_with_replace
merge_on_idno_and_preferred_labels_with_replace
skip_on_id
merge_on_id
merge_on_id_with_replace
overwrite_on_id

none

ignoreTypeForExistingRecordPolicy

If set, record type will be ignored when looking for existing records as specified by the existing records policy.

0 or 1

0

omitPreferredLabelFieldsForExistingRecordPolicy

Comma or semicolon-delimited list of preferred label fields to omit when matching for existing records using preferred labels. Typically used with entity labels to remove specific subfields such as display name from consideration.

Typically used for entity labels to restrict fields used for matching.

displayname;middlename

mergeOnly

If set, data will only be merged with existing records using the existing records policy, and no new records will be created. Available from version 1.8.

0 or 1

0

dontDoImport

If set, the mapping will be evaluated but no rows will actually be imported. This can be useful when you want to run a refinery over the rows of a data set but not actually perform the primary import.

0 or 1

0

basePath

For XML data formats, an XPath expression selecting nodes to be treated as individual records. If left blank, each XML document will be treated as a single record.

Must be a valid XPath expression

/export

locale

Sets the locale used for all imported data. Leave empty or omit to use the system default locale. Otherwise set it to a valid locale code (Ex. en_US, es_MX, fr_CA).

Must be a valid ISO locale code.

en_US

errorPolicy

Determines how errors are handled for the import. Options are to ignore the error, to stop the import when an error is encountered, or to receive a prompt when an error is encountered. The default is to ignore.

ignore
stop

ignore
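To make the Settings table more concrete, the following Python sketch models a few Settings rows as plain lists and checks the existingRecordPolicy value against the options listed above. The three-cell row layout (rule type, setting name, value) and the helper names collect_settings and check_settings are assumptions made for illustration; they are not part of CollectiveAccess.

```python
# Allowed existingRecordPolicy values, copied from the table above.
EXISTING_RECORD_POLICIES = {
    "none",
    "skip_on_idno", "merge_on_idno", "overwrite_on_idno",
    "skip_on_preferred_labels", "merge_on_preferred_labels",
    "overwrite_on_preferred_labels",
    "skip_on_idno_and_preferred_labels", "merge_on_idno_and_preferred_labels",
    "overwrite_on_idno_and_preferred_labels",
    "merge_on_idno_with_replace", "merge_on_preferred_labels_with_replace",
    "merge_on_idno_and_preferred_labels_with_replace",
    "skip_on_id", "merge_on_id", "merge_on_id_with_replace", "overwrite_on_id",
}

# Assumed layout for illustration: rule type, setting name, setting value.
settings_rows = [
    ["Setting", "name", "My Sample mapping"],
    ["Setting", "code", "my_sample_mapping"],
    ["Setting", "inputFormats", "XLSX"],
    ["Setting", "table", "ca_objects"],
    ["Setting", "type", "image"],
    ["Setting", "numInitialRowsToSkip", "1"],
    ["Setting", "existingRecordPolicy", "none"],
]


def collect_settings(rows):
    """Return a dict of setting name -> value for rows whose rule type is Setting."""
    return {name: value for rule, name, value in rows if rule == "Setting"}


def check_settings(settings):
    """Return a list of human-readable problems found in the settings dict."""
    problems = []
    policy = settings.get("existingRecordPolicy")
    if policy is not None and policy not in EXISTING_RECORD_POLICIES:
        problems.append(f"unknown existingRecordPolicy: {policy}")
    rows_to_skip = settings.get("numInitialRowsToSkip")
    if rows_to_skip is not None and not rows_to_skip.isdigit():
        problems.append("numInitialRowsToSkip must be a whole number")
    return problems


settings = collect_settings(settings_rows)
problems = check_settings(settings)
```

A check like this can catch a mistyped policy name before an import is attempted; it does not replace the validation that CollectiveAccess itself performs.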

Columns

Column 1: Rule Types

For each row in the import mapping spreadsheet, a Rule Type must be set. These rules determine how the importer will treat the row. In other words, the rules define how each row will be imported. The available rule types are described below:

Rule type

Description

Mapping

Maps a data source (such as a column in an Excel spreadsheet or a specific tag in XML) to a CollectiveAccess metadata element. Set this rule to ensure that the row will be imported.

Skip

Use Skip to ignore a data source; it will not be included in the import when this rule is set.

Constant

Sets an arbitrary constant value. Add the value to the source column and the value will be set in the corresponding metadata element for every record that is imported. This is used when mapping to Containers.

Setting

Sets general preferences for the mapping itself. Rows with this rule type hold the Settings described above.

Column 2: Source

As mentioned above, the purpose of an import mapping spreadsheet is to define specifically how and where source data will be imported into CollectiveAccess. The Source column defines where data is coming from outside CollectiveAccess; this is the first part of the crosswalk.

How values are entered in the Source column depends on the file format of the source data that is being imported. CollectiveAccess supports a variety of file formats, and each format has a unique, corresponding Source column value. A few of these are described below:

File Format

Source Format

Excel (XLSX), other spreadsheets

Column letters must be converted to numbers. Once this is done, Source values will be input as numbers: 1, 2, 3, and so on.

XML

Source values will be input as the verbatim name of the XML tag, preceded with a forward slash, /.

XPath

MARC

FMPXML/RESULT

A full description of the supported import formats and how they may be referenced in an import mapping is available in the Supported File Formats page.
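For spreadsheet sources, the conversion from column letters to Source numbers can be sketched in Python as follows. The function name column_letter_to_number is hypothetical; the arithmetic is the usual base-26 reading of spreadsheet column labels.

```python
def column_letter_to_number(letters: str) -> int:
    """Convert a spreadsheet column label such as "A" or "AB" to a 1-based number."""
    label = letters.strip().upper()
    if not label:
        raise ValueError("empty column label")
    number = 0
    for char in label:
        if not "A" <= char <= "Z":
            raise ValueError(f"not a column letter: {char!r}")
        # Each position is one base-26 digit, where A=1 and Z=26.
        number = number * 26 + (ord(char) - ord("A") + 1)
    return number


# Column J of the source spreadsheet would be entered as 10 in the Source column.
source_value = column_letter_to_number("J")
```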

Note

In the example we’re using for this tutorial, the sample data is in Excel. However, you may need to import data that is in an XML format. XML sources are cited in XPath, which is the standard syntax for retrieving data encoded in XML. Documentation regarding XPath can be found here.

Source data columns may also be referenced elsewhere in the import mapping (generally in the Options or Refinery columns described below) by prefixing the column number with a caret “^” (for example “^10”), which indicates to the mapping that the value from column 10 should be inserted.

This allows multiple columns to be combined using the Options settings and is frequently used within Refineries to create detailed related entities, collections, etc.
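The caret notation can be pictured as simple placeholder substitution. The Python sketch below illustrates the idea only and is not the importer's actual implementation; the function fill_placeholders and the sample row are hypothetical.

```python
import re


def fill_placeholders(template: str, row: list) -> str:
    """Replace each ^N in template with the value of source column N (1-based)."""
    def lookup(match):
        index = int(match.group(1))
        if index < 1 or index > len(row):
            raise ValueError(f"source column {index} does not exist in this row")
        return str(row[index - 1])

    return re.sub(r"\^(\d+)", lookup, template)


# Hypothetical source row: identifier, title, surname, forename.
row = ["2024.1", "Portrait of a woman", "Smith", "Jane"]

# Combine columns 4 and 3 into a single value.
display_name = fill_placeholders("^4 ^3", row)
```

Here display_name becomes "Jane Smith", built from the fourth and third columns of the row.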

Column 3: CA table.element

As a crosswalk, the import mapping spreadsheet determines where data comes from outside of CollectiveAccess (source data), but it also determines where that data will go in CollectiveAccess. Similarly to how Column 2 defines the source data, Column 3 determines where that source data goes in CollectiveAccess, using various ca_table.element_codes.

This column declares the bundle code or metadata element in CollectiveAccess that the source data will be mapped to. It is possible to view what metadata elements are available and their formatting directly in CollectiveAccess. To do so, navigate to Manage > My Preferences > Developer > Show Bundle Codes, and select a preference. Navigate back to any record’s page, and these codes will be displayed for each field; these then can go directly into Column 3. To copy a bundle code, simply select it, and paste into the import mapping spreadsheet.

When you are importing to simple free text, DateRange, Numeric, Currency, or other kinds of datatypes, a ca_table.element code is all that is needed.

Note

When creating Lot records in an import mapping, set the ca_table.element_code to ca_objects.lot_id.

However, there are a few cases where some additional steps are involved. For more, see Containers and Using Bundle Codes in an Import Mapping.

Column 4: Group

In many cases, data will map into corresponding metadata elements bundled together in a container. Declaring a Group in Column 4 of an import mapping is a simple way to ensure that all of your mappings to a Container actually end up in the same place. Group names are arbitrary; CollectiveAccess will recognize a group of any name for any number of metadata elements, as long as the name is consistent.

To create a group, assign the arbitrary group name to a line in the Group column. This will direct the mapping to place rows of data into a single container. For example, to make sure both a date and its date type end up in the same instance of the Date container, assign them to the same group in the fourth mapping column.

The name you assign the group is arbitrary, but it should be something that is recognizable to you.
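As a rough picture of how a shared group name ties mapping lines together, the Python sketch below collects the target elements of each group. The element codes, the group name date_group, and the four-field row layout are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical mapping rows: (rule type, source column, CA table.element, group).
mapping_rows = [
    ("Mapping", "2", "ca_objects.preferred_labels", ""),
    ("Mapping", "5", "ca_objects.date.dates_value", "date_group"),
    ("Mapping", "6", "ca_objects.date.dates_type", "date_group"),
]


def rows_by_group(rows):
    """Return a dict of group name -> list of target elements sharing that group."""
    groups = defaultdict(list)
    for _rule, _source, element, group in rows:
        if group:  # rows without a group name are not part of any group
            groups[group].append(element)
    return dict(groups)


grouped = rows_by_group(mapping_rows)
```

Both date lines land under the same key because their group names match exactly; a typo in one of them would split them into two groups.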

Column 5: Options

Options can be used in an import mapping to set a variety of formatting choices and set conditions on the import itself. Options can also help process data that needs a clean-up, or can format data with a variety of templates. Some Options are designed to set parameters on the import mapping behavior, such as preventing the import of certain fields.

Options are written in code, as shown in the examples below. Each Option is a specific term that changes how the source data is handled on import. Current Options for import mappings are listed and described below:

Type of Option

Description

Parameter notes

Example for “Options” column of mapping

skipIfEmpty

If the data value corresponding to this mapping is empty, skip the mapping line.

set to a non-zero value

{"skipIfEmpty": 1}

delimiter

Delimiter to split repeating values on.

delimiter value

{"delimiter": ";"}

In the sample mapping, note the delimiter option set on our mapping to ca_objects.subject. Now refer to the second record in our sample data. You’ll notice that there are multiple subject values in the same cell that are separated by semi-colons. By setting the delimiter option in the mapping, you are ensuring that these subject values get parsed and imported to discrete instances of the Subject field. Without the delimiter option, the entire string would end up in a single instance of the Subject field.
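The effect of the delimiter option can be approximated in Python as shown below. This is a sketch of the behavior described above, not the importer's code: the subject values are invented, and trimming whitespace around each value is an assumption.

```python
import json

# The Options cell, written with plain straight quotes.
options = json.loads('{"delimiter": ";"}')

# Hypothetical cell content with several subjects in one cell.
cell = "portraits; women; studio photography"


def split_values(cell_text, options):
    """Split a cell into discrete values when a delimiter option is set."""
    delimiter = options.get("delimiter")
    if not delimiter:
        # Without a delimiter, the whole string stays a single value.
        return [cell_text]
    # Assumption: surrounding whitespace is trimmed and empty parts are dropped.
    return [part.strip() for part in cell_text.split(delimiter) if part.strip()]


subjects = split_values(cell, options)
```

With the delimiter set, subjects holds three separate values; with an empty options object, the original string comes back as one value.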

Column 6: Refinery

A refinery is designed to take a specific data format and transform it via a specific behavior as it is imported into CollectiveAccess. Refineries allow for greater complexity in data representation, and can be used to create separate but related records from the import spreadsheet. For more on Refineries, their definitions, types, and how to use them, see the Refineries page.

If a data import requires related records, then refineries must be used.

While you can get really complex with refinery parameters, at its most basic, a refinery simply creates a record, or matches on an existing record, and creates a relationship between it and the record you are importing directly from the source data.

The objectLotSplitter requires a few extra settings, all of which are cited in our example mapping and detailed in Mapping Object Lot Records in an Import Mapping Spreadsheet.

Lastly, Splitters aren’t the only type of Refinery - they’re just the most common. A complete list of Refineries and Splitters can be seen here.

Column 7: Refinery parameters

Refinery parameters define the conditions for the refinery being used in the import mapping. Where a Refinery declares what data is being manipulated, the refinery parameter dictates how the data will be changed.

Refinery parameters are written in code, and require valid code to function properly in the import mapping.

Type of Refinery parameter

Parameter notes

Example for “Refinery Parameter” column of mapping

relationshipType

Accepts a constant type code for the relationship type or a reference to the location in the data source where the type can be found

{"relationshipType": "^10"} or {"relationshipType": "author"}

entityType

Accepts a constant list item idno from the list entity_types or a reference to the location in the data source where the type can be found

{"entityType": "individual"}

attributes

Sets or maps metadata for the entity record by referencing the metadataElement code and the location in the data source where the data values can be found

{
"attributes": {
   "biography":"^23",
   "address": {
      "address1": "^24",
      "address2": "^25",
      "city": "^26",
      "stateprovince": "^27",
      "postalcode": "^28",
      "country": "^29"
   }
}
}
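Because the parameter examples above are written as JSON, it can help to check a cell's content before running an import. The Python sketch below assumes the Refinery parameters cell holds a single JSON object; the helper name parse_parameters is hypothetical. A common cause of failure is typographic (curly) quotes introduced by a word processor or spreadsheet autocorrect.

```python
import json


def parse_parameters(cell_text):
    """Return (parameters, error); exactly one of the two is None."""
    try:
        value = json.loads(cell_text)
    except json.JSONDecodeError as exc:
        return None, f"invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"
    if not isinstance(value, dict):
        return None, "expected a JSON object"
    return value, None


# Straight quotes: parses cleanly.
good = '{"relationshipType": "author", "entityType": "individual"}'
# Curly quotes: rejected by the JSON parser.
bad = '{“relationshipType”: “author”}'

good_params, good_error = parse_parameters(good)
bad_params, bad_error = parse_parameters(bad)
```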

Columns 8 and 9: Original Values/Replacement Values

An import mapping can find values within source data and replace them with new values upon import. This is a necessary step for data that does not match the list item code for corresponding values in CollectiveAccess. Values for the source data will be input in Column 8, while the values replacing those will be input in Column 9. Multiple values may be added to a single cell in an import mapping, so long as the replacement value matches the original value line by line.

In our sample data, there is a list element called “Reproduction” with values for reproduction, original, and unknown. In our source data, however, you’ll notice that these values are abbreviated (e.g., “orig”, “repro”, and “dontknow”). By using original and replacement values, our mapping transforms “orig” to “original” and “repro” to “reproduction” so that they can match on the list item code for the corresponding values in CollectiveAccess.
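The line-by-line pairing of Original Values and Replacement Values can be sketched in Python as a lookup table. The helper names are hypothetical, and exact, case-sensitive matching is an assumption made for this sketch rather than documented importer behavior.

```python
def build_replacements(original_cell, replacement_cell):
    """Pair each line of the original cell with the same line of the replacement cell."""
    originals = [line.strip() for line in original_cell.splitlines()]
    replacements = [line.strip() for line in replacement_cell.splitlines()]
    if len(originals) != len(replacements):
        raise ValueError("original and replacement values must match line by line")
    return dict(zip(originals, replacements))


def replace_value(value, table):
    """Return the replacement for value, or value unchanged if it is not listed."""
    return table.get(value, value)


# Cell contents for Columns 8 and 9, one value per line.
table = build_replacements("orig\nrepro", "original\nreproduction")

converted = [replace_value(v, table) for v in ["orig", "repro", "dontknow"]]
```

In this sketch a value with no listed replacement, such as "dontknow", passes through unchanged.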

Note

Original Values and Replacement Values are ideal for smaller replacements. For large transformation dictionaries, use the Option transformValuesUsingWorksheet.

For an example of when to use these columns and how, please see Using Original and Replacement Values in an Import Mapping.

Columns 10 and 11: Source Description and Notes

Source Description and Notes are the final two columns included in an import mapping spreadsheet, and are optional. Used to clarify the source data and purpose of each line in the import mapping itself, these columns can be useful for keeping track of where exactly data in the import mapping is coming from. The Notes column provides a space to explain how and why a certain line is mapped in the manner that it is. Both columns allow for easy reference, and are particularly useful when multiple users are creating an import mapping.

These columns can be useful for future reference if a mapping is intended to be used repeatedly. They also help ensure that the mapping matches the source data.