Arches Data

Physical Data model

Note

The physical data model section is most useful for those who wish to issue SQL queries directly against the Arches back end PostgreSQL database.

The Arches physical data model consists of tables and referential integrity constraints implemented within a PostgreSQL database. It is driven by a requirement that Arches be able to support data management for almost any cultural heritage application in the world without the need to modify the underlying table structure.

To accomplish this, Arches stores both metadata - defining the set of resource types and attributes available to store - and business data - inventorying and describing the cultural heritage resources themselves.

The physical data model is best understood when broken into three distinct parts: Ontology Data, Reference Data, and Resources Data. The entity_types table plays a role in all three of these parts of the data model.

../_images/Arches_ERD.jpg

Ontology Data

The ontology portion of the data model stores the metadata sourced from the resource graphs. These tables therefore define the valid set of semantic “types” of data (by default, Arches uses CIDOC CRM [http://www.cidoc-crm.org/] to define these things) and the valid types of relationships between them.

Reference Data

This is where Arches stores the hierarchical controlled vocabularies that power drop-down lists in Arches applications. Items in drop-down lists are called “concepts”. Metadata about concepts, such as their labels and scope notes, is stored in the related values table, and relationships between concepts are managed in the relations table.

Resources Data

The Resources Data section is where Arches stores the actual cultural heritage resource information - the business data. The key concepts to understand in this portion of the database are entities, relations, and “business tables”.

Entities within an Arches database represent one of three things:

  1. the existence of a cultural heritage resource
  2. an attribute of a cultural heritage resource, or
  3. a record required to maintain ontological consistency between attributes

Relations represent relationships between entities within an Arches database. Records in this table indicate a relationship between two distinct entities.

Business tables contain strings, numbers, dates, or geometries that capture actual attributes of resources. These work by linking entities (by entityid) to respective values.

One special business table is the domains table. It is unique in that it only stores a valueid (as a foreign key from the values table within the Reference Data) in order to relate entities that are constrained by a controlled vocabulary to their appropriate Reference Data records.

Entity Types

The entity_types table really belongs in all three categories because entity types:

  • classify entities into an “attribute” that an implementor defines in the course of creating resource graphs.
  • specify the ontological category (CRM classification) that an entity falls into.
  • identify which attributes must be powered by drop-down lists from the reference data manager.

A very simple example is this: a resource that happens to be a person with a name will, at a minimum, contain two entities and one relation. That is, an entity that represents the existence of the resource, another entity that contains the person’s name, and a relation that links the two together.
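
The person-with-a-name example can be sketched in a few lines of Python. This is only an illustration of the idea, not the Arches schema: the class names, the field names (entityid_from, entityid_to, crm_property) and the entity type labels PERSON.E1 and NAME.E1 are assumptions made up for this sketch.

```python
import uuid
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Entity:
    entityid: str
    entitytypeid: str            # hypothetical label, e.g. "NAME.E1"
    value: Optional[str] = None  # business value; None for the resource itself


@dataclass
class Relation:
    entityid_from: str           # hypothetical field name
    entityid_to: str             # hypothetical field name
    crm_property: str = "P1"     # simplest valid property per the Ontologies section


def person_with_name(name: str) -> Tuple[List[Entity], List[Relation]]:
    """Build the minimal set of records for a person who has one name."""
    resource = Entity(str(uuid.uuid4()), "PERSON.E1")
    name_entity = Entity(str(uuid.uuid4()), "NAME.E1", value=name)
    link = Relation(resource.entityid, name_entity.entityid)
    return [resource, name_entity], [link]


entities, relations = person_with_name("Mark Twain")
print(len(entities), "entities,", len(relations), "relation")
```

The point of the sketch is that the resource itself carries no value; the name lives in its own entity and only the relation ties the two together.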

Hint

Each of these three sections of the Arches database is contained in a separate Postgres schema. Each schema, in turn, contains two predefined views called vw_nodes and vw_edges. The Arches application does not use these views, but they could be useful in developing graph-based outputs of resource instances, resource graphs, and RDM graphs.

Resource Graphs

Resource graphs are the logical framework that is used by Arches to define the set of resources and attributes of resources to be managed in an Arches Application.

Arches applications must provide a resource graph for each resource type being inventoried.

One example of a fully featured resource graph from the Arches HIP application is HERITAGE_RESOURCE.E18.

../_images/HERITAGE_RESOURCE_graph.png

The graph’s data is populated into Arches from two specially formatted CSV files named {resource type name}_nodes.csv and {resource type name}_edges.csv.

Note

As of the writing of this document, the Arches developers’ primary tool to develop and visualize the resource graphs is Gephi (http://www.gephi.org/). However, it is possible that a future version of Arches may include tools that support resource graph development and visualization.

Regardless of how the nodes and edges CSV files are created, they must include the columns defined below.

Nodes:

  • Id: a unique and arbitrary ID for each record within the file
  • Label: the name of the entity type that is stored in a given node. This name must be concatenated with a dot (“.”) and the CRM class associated with that entity type (for example, HERITAGE_RESOURCE.E18).
  • mergenode: defines the upstream node that occurs one time (and only one time) within a given resource instance. In most cases, that node is the one that represents the resource itself.
  • businesstable: identifies the data type (strings, numbers, dates, geometries, domains) in which values associated with instances of the node will be stored. This value should be left null for nodes that never store business data.

Note

There is an excellent thread on the Arches forum discussing the purpose and implementation of mergenode here: https://groups.google.com/forum/?hl=en#!topic/archesproject/xRt1FPQXrLg

Edges:

  • Source: the ID of the source node
  • Target: the ID of the target node
  • Label: the semantic CRM “property” that allows them to connect to each other.
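
The column requirements above can be checked before a graph is loaded. The following is a minimal sketch, not part of Arches: the function names are invented here, and it assumes that mergenode refers to another node by its Label (as it does in the example data load later in this document).

```python
import csv
import io

NODE_COLUMNS = {"Id", "Label", "mergenode", "businesstable"}
EDGE_COLUMNS = {"Source", "Target", "Label"}


def read_csv(text):
    """Parse CSV text; return (set of column names, list of row dicts)."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    return set(reader.fieldnames or []), rows


def check_graph_files(nodes_text, edges_text):
    """Return a list of human-readable problems (empty if none were found)."""
    node_cols, nodes = read_csv(nodes_text)
    edge_cols, edges = read_csv(edges_text)
    problems = ["nodes file is missing column %s" % c
                for c in sorted(NODE_COLUMNS - node_cols)]
    problems += ["edges file is missing column %s" % c
                 for c in sorted(EDGE_COLUMNS - edge_cols)]
    if problems:
        return problems

    ids = [n["Id"] for n in nodes]
    labels = {n["Label"] for n in nodes}
    if len(ids) != len(set(ids)):
        problems.append("node Ids are not unique")
    for n in nodes:
        if "." not in n["Label"]:
            problems.append("%s has no .CRM-class suffix" % n["Label"])
        # Assumption: mergenode names another node by its Label.
        if n["mergenode"] not in labels:
            problems.append("%s has unknown mergenode %s" % (n["Label"], n["mergenode"]))
    for e in edges:
        for end in ("Source", "Target"):
            if e[end] not in ids:
                problems.append("edge %s refers to unknown node Id %s" % (end, e[end]))
    return problems


NODES = "\n".join([
    "Id,Label,mergenode,businesstable",
    "69,EXAMPLE_RESOURCE.E1,EXAMPLE_RESOURCE.E1,",
    "17,NAME.E1,EXAMPLE_RESOURCE.E1,strings",
    "16,NAME_TYPE.E1,EXAMPLE_RESOURCE.E1,domains",
])
EDGES = "\n".join([
    "Source,Target,Label",
    "69,17,P1",
    "17,16,P1",
])
print(check_graph_files(NODES, EDGES))
```

A check like this only covers the structure of the two files; it says nothing about whether the chosen CRM classes and properties are semantically sensible.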

Ontologies

In addition to defining resources and their attributes, resource graphs also support associating those resources and attributes to semantic classes and properties defined in CIDOC CRM. Details on how this is done are provided in the section above on Resource Graphs.

Note

One should refer to CIDOC CRM (http://www.cidoc-crm.org/) to make an informed decision about what class or property to assign to nodes and edges respectively.

For those who do not care about CRM, ontologies or semantic interoperability, the simplest way to make a valid graph is to assign all nodes with class “E1” and all edges with property “P1”.

Reference Data Graph

The reference data graph stores information necessary to use thesauri as input to pick lists (sometimes called “controlled vocabularies”) within resource data entry forms. The functional requirements for reference data in the cultural heritage field are quite robust, and best defined in SKOS documentation (http://www.w3.org/2004/02/skos/). For the purpose of this document, we will outline a few of the most important and visible requirements:

  • Ability to associate sets of reference data entries as valid selections in drop-down lists for “domain” nodes in the resource graph
  • Ability to associate arbitrary attributes (such as preferred labels, alternate labels, scope notes, sort order) to entries within the reference data
  • Ability to store words associated to given reference data entries in multiple languages (to support internationalization)
  • Ability to relate entries to each other in semantically rich hierarchies

Like the resource data graph, reference data is also stored in a graph structure. Reference data includes concepts and relations.

Concepts are the core component of the reference data graph. A concept indicates an idea that must be wrapped in metadata (such as a primary label) to communicate meaning. Concepts, in turn, are related to each other using relations.

The “types” of available relations are categorized as semantic relations and collections. The meaning of a given concept is derived not only from its own individual description (or scope note), but also from its position within a hierarchy of concepts. Semantic relations are the connective tissue of hierarchical relationships that infuse meaning into concepts. Collections, on the other hand, are used for the purpose of defining which concepts belong in a given entity type’s drop-down list. Collections group concepts independently of their semantic hierarchy. In short, semantic relationships infuse meaning and collections define the content of drop-down lists.

Loading Reference Data

Arches supports two formats for migrating concepts into the Reference Data graph. The first approach, using CSV files (also called Authority Files), is designed to support loading of legacy reference data to Arches. As such, source data are formatted and then uploaded as part of the installation of a specific Arches Application.

The second approach uses SKOS files. With SKOS files, the assumption is that an Arches application is already up and running, and that a person will use commands in the RDM’s user interface to pull in new thesauri from SKOS-formatted files.

From .csv

The following steps outline how to create your own CSV-formatted authority documents for use in a system implementation. They can be followed using Excel as your text editor but the final file must be saved as a UTF-8 formatted CSV file.

Data is required for all columns in the authority file with the exception of AltLabels.

  1. Create a text file and paste the following text to the first line. These are the column headings for each required field of data:

    conceptid,PrefLabel,AltLabels,ParentConceptid,ConceptType,Provider
    
  2. On the second line we will begin creating our concept. Start by creating a conceptid that will be unique both within this file and across all authority document files to be loaded. A good pattern to follow is to use the name of the authority document itself (with ‘_’ instead of spaces and minus the AUTHORITY_DOCUMENT.csv) with the number of the particular concept within the authority document appended. As an example, if your authority document is named ‘EXAMPLE_AUTHORITY_DOCUMENT.csv’ the first concept within the authority document would have the conceptid ‘EXAMPLE_1’:

    //EXAMPLE_AUTHORITY_DOCUMENT.csv
    
    conceptid,PrefLabel,AltLabels,ParentConceptid,ConceptType,Provider
    EXAMPLE_1,
    EXAMPLE_2,
    
  3. Next, add your preferred label for the concept. In most cases this will be the value you have in mind or from your data that you would like to use for the domain list this authority document will be related to:

    //EXAMPLE_AUTHORITY_DOCUMENT.csv
    
    conceptid,PrefLabel,AltLabels,ParentConceptid,ConceptType,Provider
    EXAMPLE_1,Example concept label,
    EXAMPLE_2,Second example concept,
    
  4. Now add an alternate label for your concept. This may be left blank if you have no alternate label for your concept:

    //EXAMPLE_AUTHORITY_DOCUMENT.csv
    
    conceptid,PrefLabel,AltLabels,ParentConceptid,ConceptType,Provider
    EXAMPLE_1,Example concept label,Alternate concept label,
    EXAMPLE_2,Second example concept,Second alternate concept label,
    
  5. Then add a Parent ConceptID. If the concept is not hierarchically related to any other concept in the authority document, then the Parent ConceptID will be the full name of the authority document itself. If the concept is hierarchically related to another concept in the authority file, ensure the parent concept is defined above the child concept and input the parent concept’s conceptid here. In our example, the second concept references the first concept as its parent:

    //EXAMPLE_AUTHORITY_DOCUMENT.csv
    
    conceptid,PrefLabel,AltLabels,ParentConceptid,ConceptType,Provider
    EXAMPLE_1,Example concept label,Alternate concept label,EXAMPLE_AUTHORITY_DOCUMENT.csv,
    EXAMPLE_2,Second example concept,Second alternate concept label,EXAMPLE_1,
    
  6. Next, add a Concept Type. The only two valid values for ConceptType are Index and Collector. For more information on the difference between these two concept types, refer to the general Arches documentation (http://arches3.readthedocs.org/en/latest/arches-data/#loading-reference-data):

    //EXAMPLE_AUTHORITY_DOCUMENT.csv
    
    conceptid,PrefLabel,AltLabels,ParentConceptid,ConceptType,Provider
    EXAMPLE_1,Example concept label,Alternate concept label,EXAMPLE_AUTHORITY_DOCUMENT.csv,Collector,
    EXAMPLE_2,Second example concept,Second alternate concept label,EXAMPLE_1,Index,
    
  7. The Provider identifies where a given concept came from:

    //EXAMPLE_AUTHORITY_DOCUMENT.csv
    
    conceptid,PrefLabel,AltLabels,ParentConceptid,ConceptType,Provider
    EXAMPLE_1,Example concept label,Alternate concept label,EXAMPLE_AUTHORITY_DOCUMENT.csv,Collector,Organization_1
    EXAMPLE_2,Second example concept,Second alternate concept label,EXAMPLE_1,Index,Organization_2
    
  8. Save your document.

  9. IMPORTANT: Finally, we have to make sure our authority file maps to a ‘domain node’ in a resource graph. To do this, either create or open the existing ENTITY_TYPE_X_ADOC.csv file that should reside in the same directory as your authority files (this file must be named ENTITY_TYPE_X_ADOC.csv). If you are modifying an existing ENTITY_TYPE_X_ADOC.csv file, skip to step 2 below.

    1. Copy and paste the following text into the file on the first line:

      entitytype,authoritydoc,authoritydocconceptschemename
      
    2. Add a comma-separated entry with the following information: the entity type with which you would like your authority document associated (exactly as it is spelled in the resource graph, including the .E## suffix), the exact name of the authority document, and the concept scheme into which you would like to load this authority document:

      //ENTITY_TYPE_X_ADOC.csv
      
      entitytype,authoritydoc,authoritydocconceptschemename
      EXAMPLE_NODE.E55,EXAMPLE_AUTHORITY_DOCUMENT.csv,Example Concept Scheme Name
      
  10. You can optionally define a separate but related [EXAMPLE_AUTHORITY_DOCUMENT].values.csv file to add additional information about concepts in your initial concepts file. This file uses a key-value pair approach to associate additional attributes with a given concept. The structure of the values file is as follows (the conceptid is the linking key to the concept within the initial concept file):

    //EXAMPLE_AUTHORITY_DOCUMENT.values.csv
    
    conceptid,Value,ValueType,Provider
    EXAMPLE_1,This is an example scope note,scopeNote,GCI
    EXAMPLE_1,2[filepath to an image],image,GCI
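
Several of the rules in the steps above (required columns, unique conceptids, parents defined before children, valid ConceptType values) are easy to check mechanically before loading. The following is a minimal sketch under those stated rules; check_authority_file is a hypothetical helper written for this document, not an Arches function, and it only checks uniqueness within a single file.

```python
import csv
import io

VALID_CONCEPT_TYPES = {"Index", "Collector"}
# AltLabels is the only column that may be left empty.
REQUIRED = ["conceptid", "PrefLabel", "ParentConceptid", "ConceptType", "Provider"]


def check_authority_file(filename, text):
    """Return a list of problems found in one authority file's CSV text."""
    problems = []
    seen = set()
    reader = csv.DictReader(io.StringIO(text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for col in REQUIRED:
            if not (row.get(col) or "").strip():
                problems.append("line %d: %s is empty" % (line_no, col))
        cid = row.get("conceptid") or ""
        if cid in seen:
            problems.append("line %d: duplicate conceptid %s" % (line_no, cid))
        ctype = row.get("ConceptType") or ""
        if ctype and ctype not in VALID_CONCEPT_TYPES:
            problems.append("line %d: invalid ConceptType %s" % (line_no, ctype))
        parent = row.get("ParentConceptid") or ""
        # A parent is either the authority document itself or an earlier concept.
        if parent and parent != filename and parent not in seen:
            problems.append("line %d: parent %s is not defined above" % (line_no, parent))
        seen.add(cid)
    return problems


EXAMPLE = "\n".join([
    "conceptid,PrefLabel,AltLabels,ParentConceptid,ConceptType,Provider",
    "EXAMPLE_1,Example concept label,Alternate concept label,"
    "EXAMPLE_AUTHORITY_DOCUMENT.csv,Collector,Organization_1",
    "EXAMPLE_2,Second example concept,Second alternate concept label,"
    "EXAMPLE_1,Index,Organization_2",
])
print(check_authority_file("EXAMPLE_AUTHORITY_DOCUMENT.csv", EXAMPLE))
```

When reading a real file, open it with encoding="utf-8" so the check sees the same bytes the loader expects.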
    

Note

Arches comes loaded with a set of ValueTypes necessary to maintain compliance with SKOS and Dublin Core standards. The set of preloaded value types is documented in the DML file located in the code repository here: arches\db\dml\db_data.sql. However, an implementor has the freedom to add additional ValueTypes by simply adding them within the *.values.csv file. To be clear, these additional ValueTypes will not fall into the fold of SKOS standards, but they could be useful for your individual implementation.

One example where this is particularly relevant is the use of sortorder as a ValueType. By default, Arches orders drop-down lists alphabetically. Sometimes users prefer that their drop-down lists be presented in a specific order. In the case of time periods, for example, users usually prefer some semblance of chronological order.

When the sortorder ValueType is specified, corresponding numeric values can be used to ensure that the concept labels appear in the drop-down in a specific order.
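
As an illustration, the rows of a *.values.csv file that assign a sortorder can be generated from a list of conceptids that is already in the desired order. This is a sketch only: the PERIOD_n conceptids and the provider name are hypothetical, and numbering from 1 is an assumption (the text above only says the values are numeric).

```python
import csv
import io


def sortorder_rows(concept_ids, provider):
    """One sortorder row per concept, numbered in the order given."""
    return [[cid, str(position), "sortorder", provider]
            for position, cid in enumerate(concept_ids, start=1)]


def values_csv(rows):
    """Render rows as the text of a *.values.csv file."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["conceptid", "Value", "ValueType", "Provider"])
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical conceptids listed in chronological (not alphabetical) order.
periods = ["PERIOD_3", "PERIOD_1", "PERIOD_2"]
print(values_csv(sortorder_rows(periods, "Organization_1")))
```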

Properly structured Authority files can be loaded into Arches using the following command from inside the application root directory:

python manage.py packages -o load_concept_scheme --source '{PATH TO AUTHORITY FILES}'

Loading Business Data

Resources and their attributes can be loaded to Arches resource graphs using one of two formats: shapefiles or a specially formatted text file with a .arches extension.

Shapefiles have the significant limitation that they cannot define explicit relationships between resources and they can only hold one attribute value per entity type.

.arches files, on the other hand, take a key-value pair approach to populating resource graphs with resources and their attributes. Therefore, they are able to add as many values of a given attribute as exist in the data. The downside of the .arches format is that it can take considerable data processing or reformatting effort to get a large amount of source data ready for loading into Arches.

From .arches

The .arches format is intended to support upload of Arches data containing rich content and complex relationships while remaining straightforward to produce with common software like MS Excel or OpenOffice. The .arches format actually requires two files: one for loading resources (with a .arches extension) and another for loading relationships between resources (with a .relations extension).

The .arches file is really just a list of business entities to be loaded to Arches. “Business entities” are those entities that actually store a business value. In this way, the .arches format is insulated from the details of the resource graph that it is being loaded to. The only real requirement is that the entity types referenced in the ATTRIBUTENAME field exist within the resource graph that the data are being loaded into.

The format takes a key-value pair approach to storage, where the ATTRIBUTENAME defines the key (the entity type as defined in the resource graph) and the ATTRIBUTEVALUE defines the business value of the entity.

The .arches file is a pipe (|) delimited text file containing the following column headers:

  • RESOURCEID is a user-generated unique ID for each individual resource. Since any given resource will likely have many attributes, it is expected that a given RESOURCEID value will repeat on many lines. As a point of reference, Arches will create a separate unique ID for any resource that is loaded and will save the RESOURCEID provided within the .arches file as an external reference of type “Legacy ID”.
  • RESOURCETYPE specifies which resource type graph a given attribute is being loaded to. In the Arches HIP Application, the available resource types include: ACTIVITY.E7, ACTOR.E39, HERITAGE_RESOURCE.E18, HERITAGE_RESOURCE_GROUP.E27, HISTORICAL_EVENT.E5, and INFORMATION_RESOURCE.E73.
  • ATTRIBUTENAME specifies the node from the appropriate resource graph that the supplied value (in the ATTRIBUTEVALUE column) will load to. Note that ATTRIBUTENAME is synonymous with entitytypeid. Essentially, you are specifying the entitytypeid of the (business) entity that you are loading to the system.
  • ATTRIBUTEVALUE stores the actual business value of the entity. ATTRIBUTEVALUE values must conform to the data type specified by the business table associated with the entity type referenced by the ATTRIBUTENAME. See the PostgreSQL documentation for appropriate formatting for strings, numbers, and dates. See the notes below for formatting details on Geometries and Domains.

Note

Geometry: Any value that is to be loaded as a geometry must be formatted using Well Known Text (or “WKT” for short) with coordinates in EPSG 4326, that is, latitude/longitude WGS84 (http://spatialreference.org/ref/epsg/4326/). WKT is a standard format for storing vector geometry as human-readable text. Details of the standard can be found here: https://portal.opengeospatial.org/files/?artifact_id=54797, and Wikipedia has a much more readable aggregation of the relevant information here: http://en.wikipedia.org/wiki/Well-known_text.
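
For simple geometries, WKT strings can be produced with plain string formatting. The helpers below are a sketch written for this document, not part of Arches. They assume the x coordinate is longitude and the y coordinate is latitude, which is the order PostGIS uses for EPSG:4326; confirm this against your own data before loading.

```python
def point_wkt(lon, lat):
    """Format a WGS84 point as WKT (assumes x = longitude, y = latitude)."""
    if not (-180 <= lon <= 180 and -90 <= lat <= 90):
        raise ValueError("coordinates are outside the WGS84 range")
    return "POINT(%s %s)" % (lon, lat)


def polygon_wkt(ring):
    """Format one outer ring of (lon, lat) pairs as a WKT polygon."""
    points = list(ring)
    if points and points[0] != points[-1]:
        points.append(points[0])  # WKT rings must end where they start
    if len(points) < 4:
        raise ValueError("a polygon ring needs at least three distinct points")
    return "POLYGON((%s))" % ", ".join("%s %s" % (x, y) for x, y in points)


print(point_wkt(0, 0))
print(polygon_wkt([(0, 0), (1, 0), (1, 1)]))
```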

Note

Domains: Another special case here is values associated with any ATTRIBUTENAME that links to the domains businesstable. In those cases, the value stored in ATTRIBUTEVALUE must be the conceptid fed in from CSV-formatted “authority files.” More on authority files is available in the “Loading Reference Data” section.

GROUPID, the fifth column of the .arches file, is intended to support cases where business values stored in separate nodes within the resource graph must be associated with each other. To illustrate the need, the classic example is an ACTOR resource graph that contains both FIRST_NAME.E1 and LAST_NAME.E1 as separate nodes. In that case, an ACTOR that has two separate names would need to know which first name and which last name go together. (An example might be Mark Twain and Samuel Clemens.) To group these appropriately, the two rows containing “Mark” and “Twain” should share a common GROUPID value and the two rows containing “Samuel” and “Clemens” should have their own GROUPID value.

Hint

The code that loads data from .arches files assumes that the records within the file are sorted first on RESOURCEID and then on GROUPID. If the rows are sorted incorrectly, it could cause the load to fail, or (worse) load the data into the database incorrectly. Luckily, the data loading code has built into it a number of validations which search for conditions that could cause problems. Those validations are outlined in the next section.
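
The Mark Twain / Samuel Clemens case can be written out as a small script that sorts rows on RESOURCEID and then GROUPID before emitting pipe-delimited lines. This is a sketch: the RESOURCEID “ACTOR_1” and the GROUPID values “NAME-0” and “NAME-1” are hypothetical, and rejecting values that contain a pipe is a conservative assumption, since this document does not describe any escaping rule.

```python
HEADER = ["RESOURCEID", "RESOURCETYPE", "ATTRIBUTENAME", "ATTRIBUTEVALUE", "GROUPID"]


def arches_lines(rows):
    """Return .arches file lines, sorted on RESOURCEID and then GROUPID."""
    ordered = sorted(rows, key=lambda r: (r[0], r[4]))
    for row in ordered:
        if any("|" in value for value in row):
            raise ValueError("values must not contain the | delimiter: %r" % (row,))
    return ["|".join(HEADER)] + ["|".join(row) for row in ordered]


rows = [
    ("ACTOR_1", "ACTOR.E39", "FIRST_NAME.E1", "Samuel", "NAME-1"),
    ("ACTOR_1", "ACTOR.E39", "LAST_NAME.E1", "Twain", "NAME-0"),
    ("ACTOR_1", "ACTOR.E39", "FIRST_NAME.E1", "Mark", "NAME-0"),
    ("ACTOR_1", "ACTOR.E39", "LAST_NAME.E1", "Clemens", "NAME-1"),
]
for line in arches_lines(rows):
    print(line)
```

Because the two NAME-0 rows end up adjacent, “Mark” and “Twain” are kept together, and likewise “Samuel” and “Clemens” under NAME-1.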

The .relations file is a pipe (|) delimited text file containing the following column headers:

  • RESOURCEID_FROM indicates one of the two resources to be related. The value here must also be present as a RESOURCEID in the .arches file.
  • RESOURCEID_TO indicates the other of the two resources to be related. The value here must also be present as a RESOURCEID in the .arches file.
  • START_DATE is a non-required field intended to indicate the date at which a given relationship between resources began.
  • END_DATE is a non-required field intended to indicate the date at which a given relationship between resources ended.
  • RELATION_TYPE is a domain value driven by an authority document file called ARCHES RESOURCE CROSS-REFERENCE RELATIONSHIP TYPES.E32.csv. Any value that you put into this column must be a conceptid from that authority document. If you wish to customize the set of available relationship types, simply modify the authority document.
  • NOTES is a non-required field into which additional text about the relationship can be captured.

For example, you may want to relate a house to a person (both of which can be resources) using RELATION_TYPE “lived in” and specify the date at which the person moved into the house as the START_DATE, and the date the person moved out as the END_DATE.

.arches file validations

Before any data are loaded from the .arches format into the Arches database, our data loading code runs a series of validations to ensure adherence to the rules outlined above. Below is a quick enumeration of the validity checks that are executed:

  1. checks the syntax of the .arches and .relations files (including proper headings and column delimiters)
  2. verifies that the ATTRIBUTENAME listed in the .arches files exists in the resource graph for the specified resource type
  3. checks that the ATTRIBUTEVALUE is valid for the business table of its related entity type
  4. checks that rows are ordered by RESOURCEID then GROUPID
  5. checks that all RESOURCEID values listed in the .relations file exist in the .arches file
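
Checks 4 and 5 can be approximated before handing files to the loader. The sketch below is not the Arches validation code; the function names are invented here. It interprets “ordered” loosely as “rows for the same RESOURCEID, and for the same GROUPID within a resource, are contiguous”, which is an assumption about what the loader needs.

```python
import csv
import io


def parse_pipe(text):
    """Parse pipe-delimited text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="|"))


def is_contiguous(keys):
    """True if equal keys always sit next to each other."""
    seen = set()
    previous = None
    for key in keys:
        if key != previous:
            if key in seen:
                return False  # this key appeared earlier in a separate run
            seen.add(key)
            previous = key
    return True


def rows_are_grouped(arches_rows):
    """Approximation of check 4: resources and groups are not interleaved."""
    resources = [r["RESOURCEID"] for r in arches_rows]
    groups = [(r["RESOURCEID"], r["GROUPID"]) for r in arches_rows]
    return is_contiguous(resources) and is_contiguous(groups)


def missing_relation_ids(arches_rows, relation_rows):
    """Check 5: RESOURCEIDs used in .relations but absent from .arches."""
    known = {r["RESOURCEID"] for r in arches_rows}
    used = set()
    for rel in relation_rows:
        used.update([rel["RESOURCEID_FROM"], rel["RESOURCEID_TO"]])
    return sorted(used - known)


ARCHES = "\n".join([
    "RESOURCEID|RESOURCETYPE|ATTRIBUTENAME|ATTRIBUTEVALUE|GROUPID",
    "EXAMPLE RESOURCE_1|EXAMPLE_RESOURCE.E1|NAME.E1|The Taj Mahal|NAME.E41-0",
    "EXAMPLE RESOURCE_1|EXAMPLE_RESOURCE.E1|NAME_TYPE.E1|NAME_TYPE_1|NAME.E41-0",
    "EXAMPLE RESOURCE_2|EXAMPLE_RESOURCE.E1|NAME.E1|The Little Taj Mahal|NAME.E41-0",
])
RELATIONS = "\n".join([
    "RESOURCEID_FROM|RESOURCEID_TO|START_DATE|END_DATE|RELATION_TYPE|NOTES",
    "EXAMPLE RESOURCE_1|EXAMPLE RESOURCE_2|[null]|[null]|RELATIONSHIP_TYPE:1|[null]",
])
arches_rows = parse_pipe(ARCHES)
print(rows_are_grouped(arches_rows))
print(missing_relation_ids(arches_rows, parse_pipe(RELATIONS)))
```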

From Shapefile

Shapefile upload is one of the nice, new features of Arches 3. Though a little configuration is needed to load your shapefiles into Arches, the process is very simple. There are just a few limitations to loading data using shapefiles:

  • Your shapefile can contain only one resource type per shapefile. For example, you would need one shapefile to load Heritage Resources and a second shapefile for Activities.
  • Your shapefile projection must be defined as WGS84 Lat/Lon (EPSG:4326).
  • Currently the shapefile loader does not load the .relations file. If you need to add relationships between resources, you should use the .arches format with a .relations file rather than a shapefile, or create relationships between resources using the web application after loading your shapefile.

Loading from shapefile requires the creation of a configuration file to tell Arches how the fields in your shapefile correspond to nodes in a resource graph. The configuration file should have the same base name as your shapefile with the extension .config. For example, if your shapefile is called heritage_resources.shp, you will need a heritage_resources.config file along with your heritage_resources.dbf, etc.

The .config file is formatted in JSON as you can see in this example:

{
        "RESOURCE_TYPE": "EXAMPLE_RESOURCE.E11",
        "GEOM_TYPE": "GEOMETRY.E1",
        "FIELD_MAP": [
                ["place","PLACE.E1"],
                ["location","DESCRIPTION_OF_LOCATION.E1"],
                ["name","NAME.E1"],
                ["period","PERIOD.E1"],
                ["descrip","RESOURCE_DESCRIPTION.E1"],
                ["heri_type","HERITAGE_RESOURCE_TYPE.E1"]
        ]
}

Notice that it contains the following properties:

  • RESOURCE_TYPE is the resource type of your shapefile.
  • GEOM_TYPE is simply the name of the entity type that Arches HIP uses to manage geometry.
  • FIELD_MAP is where you list your shapefile fields and their corresponding entity types. Each field must be enclosed in brackets, with the shapefile column name in the first position and the entitytypeid in the second.

Note

The curly braces at the beginning and end of the file are important! Also, remember that the projection of your data must be EPSG:4326 (WGS84 Lat/Lon).
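
Generating the .config file with a JSON library avoids hand-typed quoting and brace mistakes. The following is a minimal sketch; build_shapefile_config and config_path_for are hypothetical helper names, and the field names and entity types are taken from the examples in this document.

```python
import json
import os


def build_shapefile_config(resource_type, geom_type, field_map):
    """Assemble the three properties described above into one dictionary."""
    return {
        "RESOURCE_TYPE": resource_type,
        "GEOM_TYPE": geom_type,
        # Each entry pairs a shapefile column name with an entitytypeid.
        "FIELD_MAP": [[column, entitytypeid] for column, entitytypeid in field_map],
    }


def config_path_for(shapefile_path):
    """Same base name as the shapefile, with a .config extension."""
    return os.path.splitext(shapefile_path)[0] + ".config"


config = build_shapefile_config(
    "EXAMPLE_RESOURCE.E1",
    "GEOMETRY.E1",
    [("name", "NAME.E1"), ("period", "PERIOD.E1")],
)
config_text = json.dumps(config, indent=4)
print(config_path_for("heritage_resources.shp"))
print(config_text)
```

Writing config_text to the path returned by config_path_for places the configuration next to the shapefile, as the text above requires.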

The data types of your shapefile columns need to match the data types of the entity types to which they are mapped. For example, if you want to load a name column in your shapefile to NAME.E41 (which the HIP defines as a string data type), you must ensure that your shapefile treats name as a string.

If you want to load a field in your shapefile to an Arches controlled vocabulary, you’ll need to make sure that the values in the shapefile match a preferred label in the Arches Reference Data Manager (RDM). For example, if you want to load values from your shapefile into the HERITAGE_RESOURCE_TYPE.E55 node, the values in your shapefile column must match a preferred label in the RDM scheme (by default Arches uses the HERITAGE_RESOURCE_TYPE_AUTHORITY_DOCUMENT.csv scheme).

Now you can run the load_resources command. The console commands for loading a shapefile are the same as they are for loading a .arches file (shown in the example data load below). Once your authority files are loaded and your virtual environment is activated, just navigate to the directory with your project’s manage.py file and run:

$ python manage.py packages -o load_resources -s /path/to/my/mydata.shp

An Example Data Load

Below is a very simple resource graph, set of authority documents, and .arches file. The intent is to quickly illustrate the construction of a logical model and a simple set of source data that can be loaded into the model.

Resource Graph

../_images/example_resource_graph2.jpg

Note

In the examples below, the character string “[null]” is used as a substitute for no value.

EXAMPLE_RESOURCE.E1_nodes.csv:

Id,Label,mergenode,businesstable
69,EXAMPLE_RESOURCE.E1,EXAMPLE_RESOURCE.E1,[null]
66,PERIOD.E1,EXAMPLE_RESOURCE.E1,strings
8,DESCRIPTION_OF_LOCATION.E1,PLACE.E1,strings
4,PHASE_TYPE_ASSIGNMENT.E1,EXAMPLE_RESOURCE.E1,[null]
42,HERITAGE_RESOURCE_TYPE.E1,EXAMPLE_RESOURCE.E1,domains
172,RESOURCE_DESCRIPTION.E1,EXAMPLE_RESOURCE.E1,strings
16,NAME_TYPE.E1,EXAMPLE_RESOURCE.E1,domains
71,PLACE.E1,EXAMPLE_RESOURCE.E1,[null]
5,GEOMETRY.E1,PLACE.E1,geometries
17,NAME.E1,EXAMPLE_RESOURCE.E1,strings

EXAMPLE_RESOURCE.E1_edges.csv:

Source,Target,Id,Label
69,4,1,P1
69,17,2,P1
69,71,3,P1
69,172,4,P1
4,42,5,P1
4,66,6,P1
71,5,7,P1
71,8,8,P1
17,16,9,P1

Reference Data

NAME_TYPE.E1.csv:

conceptid,PrefLabel,AltLabels,ParentConceptid,ConceptType,Provider
NAME_TYPE_1,Primary Name,,NAME_TYPE.E1.csv,Index,Organization_1
NAME_TYPE_2,Alias Name,,NAME_TYPE.E1.csv,Index,Organization_1

HERITAGE_RESOURCE_TYPE.E1.csv:

conceptid,PrefLabel,AltLabels,ParentConceptid,ConceptType,Provider
HERITAGE_RESOURCE_TYPE_1,Architecture,[null],HERITAGE_RESOURCE_TYPE.E1.csv,Collector,Organization_1
HERITAGE_RESOURCE_TYPE_2,Spanish Colonial,[null],HERITAGE_RESOURCE_TYPE_1,Index,Organization_1
HERITAGE_RESOURCE_TYPE_3,Postmodern,[null],HERITAGE_RESOURCE_TYPE_1,Index,Organization_1
HERITAGE_RESOURCE_TYPE_4,Archaeology,[null],HERITAGE_RESOURCE_TYPE.E1.csv,Collector,Organization_1
HERITAGE_RESOURCE_TYPE_5,Pottery Sherd,[null],HERITAGE_RESOURCE_TYPE_4,Index,Organization_1
HERITAGE_RESOURCE_TYPE_6,Arrowhead,[null],HERITAGE_RESOURCE_TYPE_4,Index,Organization_1

Business Data

resource.arches:

RESOURCEID|RESOURCETYPE|ATTRIBUTENAME|ATTRIBUTEVALUE|GROUPID
EXAMPLE RESOURCE_1|EXAMPLE_RESOURCE.E1|RESOURCE_DESCRIPTION.E1|long description of the resource|DESCRIPTION.E1-0
EXAMPLE RESOURCE_1|EXAMPLE_RESOURCE.E1|NAME.E1|The Taj Mahal|NAME.E41-0
EXAMPLE RESOURCE_1|EXAMPLE_RESOURCE.E1|NAME_TYPE.E1|NAME_TYPE_1|NAME.E41-0
EXAMPLE RESOURCE_1|EXAMPLE_RESOURCE.E1|GEOMETRY.E1|POINT(0 0)|GEOMETRY.E1-0
EXAMPLE RESOURCE_1|EXAMPLE_RESOURCE.E1|DESCRIPTION_OF_LOCATION.E1|In the Atlantic Ocean off the coast of Africa|DESCRIPTION_OF_LOCATION.E1
EXAMPLE RESOURCE_1|EXAMPLE_RESOURCE.E1|HERITAGE_RESOURCE_TYPE.E1|HERITAGE_RESOURCE_TYPE_3|PHASE_TYPE_ASSIGNMENT.E1-0
EXAMPLE RESOURCE_1|EXAMPLE_RESOURCE.E1|PERIOD.E1|1950 to 1980|PHASE_TYPE_ASSIGNMENT.E1-0
EXAMPLE RESOURCE_1|EXAMPLE_RESOURCE.E1|HERITAGE_RESOURCE_TYPE.E1|HERITAGE_RESOURCE_TYPE_5|PHASE_TYPE_ASSIGNMENT.E1-1
EXAMPLE RESOURCE_1|EXAMPLE_RESOURCE.E1|PERIOD.E1|Prehistoric|PHASE_TYPE_ASSIGNMENT.E1-1
EXAMPLE RESOURCE_2|EXAMPLE_RESOURCE.E1|NAME.E1|The Little Taj Mahal|NAME.E41-0
EXAMPLE RESOURCE_2|EXAMPLE_RESOURCE.E1|NAME_TYPE.E1|NAME_TYPE_1|NAME.E41-0
EXAMPLE RESOURCE_2|EXAMPLE_RESOURCE.E1|GEOMETRY.E1|POINT(1 1)|GEOMETRY.E1-0

Note

EXAMPLE RESOURCE_1 has two distinct “groups” of phase-type assignments. That means that the resource was different things at different times.

Note

In the case of NAME_TYPE.E1 and HERITAGE_RESOURCE_TYPE.E1, the ATTRIBUTEVALUE is actually the ID of the concept as defined in the authority file.

resource.relations:

RESOURCEID_FROM|RESOURCEID_TO|START_DATE|END_DATE|RELATION_TYPE|NOTES
EXAMPLE RESOURCE_1|EXAMPLE RESOURCE_2|[null]|[null]|RELATIONSHIP_TYPE:1|[null]

Before you can load data, you must have ElasticSearch running so that your data is indexed during the load. To start ElasticSearch open a console and run:

$ path\to\my_hip_app\my_hip_app\elasticsearch\elasticsearch-1.4.1\bin\elasticsearch

To load your resource graphs, make sure that your virtual environment is activated. If it’s not, run the following in a new terminal.

On Linux and Unix systems:

$ source path/to/ENV/bin/activate

On Windows:

> path\to\ENV\Scripts\activate

With your virtual environment activated, navigate to the root directory of your project and load your authority files with the following command, where -s is a directory containing your authority files:

$ python manage.py packages -o load_concept_scheme -s /path/to/authority_files/directory

Now you can load your .arches file with the following command:

$ python manage.py packages -o load_resources -s /path/to/my/mydata.arches

Note

When loading with a .arches file, relationships defined in your .relations file will be loaded automatically if that file is in the same directory as your .arches file.

Exporting Data

By default, Arches will export the id, primary name and resource type to KML, shapefile and CSV formats. However, if you need to export more information for a resource, or customize the column names of your export file, an export mapping file is required.

Resource export mappings

The resource export mappings file is a JSON file that maps resources in your search results to the schema of an export format: KML, CSV, or SHP. If you need to create this file, assign its path to the EXPORT_CONFIG variable in your application’s settings.py file. For example:

EXPORT_CONFIG = os.path.normpath(os.path.join(PACKAGE_ROOT, 'source_data', 'business_data', 'resource_export_mappings.json'))

The top level members of the JSON object are the format extensions (e.g. “csv”, “shp”, “kml”). Each format extension has the following properties:

  • NAME: base name of the export file
  • SCHEMA: array of objects defining each field to be added to your exported file. Each object should have the following properties:
    • field_name: name of the exported field (required)
    • source: The location of the data in the search results. If the value is “field_map”, the application will look up the value using the field map for the resource. A value of “alternatename” concatenates every name for the resource that is not primary into a list of alternate names. A value of “resource_name” will look up the resource name from the RESOURCE_TYPE_CONFIGS in the application’s settings.py file.
    • data_type: “str”, “datetype”, or “float” (required only for shapefiles)
    • data_length: an integer in quotes, e.g. “128” (required only for shapefiles)
  • RESOURCE_TYPES: an object containing an object for each resource type that will be exported. The key for each resource type object is the resource’s entitytypeid. The value for each type contains the following properties:
    • FIELD_MAP: a list of objects each containing information needed to map child entity data to an export file field. Properties for the field map include:
      • field_name: Must match the field_name of a field in the SCHEMA (required)
      • entitytypeid: The entitytypeid of the child entity (required)
      • value_type: A conceptid of the value used to define the value’s type. For example, an address may have different types such as primary or postal. If you want to export only the primary address for this column, you can add the concept type for ‘primary address’ here. Concept ids can be found in your application’s Reference Data Manager (RDM). (optional)
      • alternate_entitytypeid: Alternate entity type to use if no value is available for the entitytypeid (optional)
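
The rules in the list above lend themselves to a quick consistency check before an export is attempted. The following Python 3 sketch is not part of Arches. It checks only the constraints stated above: every FIELD_MAP field_name matches a SCHEMA field, the required keys are present, and shapefile fields carry data_type and data_length. The function name check_export_mappings and the deliberately flawed sample data are hypothetical.

```python
def check_export_mappings(mappings):
    """Return a list of human-readable problems found in a mappings dict."""
    problems = []
    for fmt, config in mappings.items():
        schema_fields = set()
        for field in config.get("SCHEMA", []):
            name = field.get("field_name")
            if not name:
                problems.append("%s: SCHEMA entry without field_name" % fmt)
                continue
            schema_fields.add(name)
            # data_type and data_length are required only for shapefiles.
            if fmt == "shp":
                for key in ("data_type", "data_length"):
                    if key not in field:
                        problems.append("%s: field %r lacks %s" % (fmt, name, key))
        for type_id, type_config in config.get("RESOURCE_TYPES", {}).items():
            for entry in type_config.get("FIELD_MAP", []):
                if "entitytypeid" not in entry:
                    problems.append("%s/%s: FIELD_MAP entry lacks entitytypeid"
                                    % (fmt, type_id))
                if entry.get("field_name") not in schema_fields:
                    problems.append("%s/%s: field_name %r is not in SCHEMA"
                                    % (fmt, type_id, entry.get("field_name")))
    return problems


# Hypothetical mapping with two deliberate mistakes: a missing data_length
# and a misspelled field_name ("adress").
example = {
    "shp": {
        "NAME": "export",
        "SCHEMA": [
            {"field_name": "type", "source": "field_map",
             "data_type": "str", "data_length": "128"},
            {"field_name": "address", "source": "field_map", "data_type": "str"},
        ],
        "RESOURCE_TYPES": {
            "ACTOR.E39": {"FIELD_MAP": [
                {"field_name": "type", "entitytypeid": "ACTOR_TYPE.E55"},
                {"field_name": "adress", "entitytypeid": "PLACE_ADDRESS.E45"},
            ]}
        },
    }
}
for problem in check_export_mappings(example):
    print(problem)
```

For a real file, load it with json.load and pass the resulting dict to the function.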

Example Resource Export Mappings

{
    "csv": {
        "NAME": "HistoricPlacesLA_Search_Results_Export",
        "SCHEMA": [
            {"field_name": "PRIMARY NAME","source": "primaryname"},
            {"field_name": "OTHER NAMES","source": "alternatename"},
            {"field_name": "ARCHES ID","source": "entityid"},
            {"field_name": "ARCHES RESOURCE TYPE","source": "resource_name"},
            {"field_name": "TYPE","source": "field_map"},
            {"field_name": "PRIMARY ADDRESS/LOCATION DESCRIPTION","source": "field_map"},
            {"field_name": "DESIGNATIONS","source": "field_map"}
        ],
        "RESOURCE_TYPES": {
            "ACTOR.E39": {
                "FIELD_MAP": [
                    {
                        "field_name": "TYPE",
                        "entitytypeid": "ACTOR_TYPE.E55"
                    },
                    {
                        "field_name": "PRIMARY ADDRESS/LOCATION DESCRIPTION",
                        "entitytypeid": "PLACE_ADDRESS.E45",
                        "value_type": "e4f5bd2f-56b7-4b8d-ac48-7e6d90e530ae",
                        "alternate_entitytypeid": "DESCRIPTION_OF_LOCATION.E62"
                    }
                ]
            },
            "HERITAGE_RESOURCE.E18": {
                "FIELD_MAP": [
                    {
                        "field_name": "TYPE",
                        "entitytypeid": "HERITAGE_RESOURCE_TYPE.E55"
                    },
                    {
                        "field_name": "PRIMARY ADDRESS/LOCATION DESCRIPTION",
                        "entitytypeid": "PLACE_ADDRESS.E45",
                        "value_type": "e4f5bd2f-56b7-4b8d-ac48-7e6d90e530ae",
                        "alternate_entitytypeid": "DESCRIPTION_OF_LOCATION.E62"
                    },
                    {
                        "field_name": "DESIGNATIONS",
                        "entitytypeid": "TYPE_OF_DESIGNATION_OR_PROTECTION.E55"
                    }
                ]
            }
        }
    },
    "kml": {
        "NAME": "HistoricPlacesLA_Search_Results_Export",
        "SCHEMA": [
            {"field_name": "primary_name","source": "primaryname"},
            {"field_name": "other_names","source": "alternatename"},
            {"field_name": "arches_id","source": "entityid"},
            {"field_name": "arches_resource_type","source": "resource_name"},
            {"field_name": "geometry","source": "geometries"},
            {"field_name": "type","source": "field_map"},
            {"field_name": "primary_address_or_description","source": "field_map"},
            {"field_name": "designations","source": "field_map"}
        ],
        "RESOURCE_TYPES": {
            "ACTOR.E39": {
                "FIELD_MAP": [
                    {
                        "field_name": "type",
                        "entitytypeid": "ACTOR_TYPE.E55"
                    },
                    {
                        "field_name": "primary_address_or_description",
                        "entitytypeid": "PLACE_ADDRESS.E45",
                        "value_type": "e4f5bd2f-56b7-4b8d-ac48-7e6d90e530ae",
                        "alternate_entitytypeid": "DESCRIPTION_OF_LOCATION.E62"
                    }
                ]
            },
            "HERITAGE_RESOURCE.E18": {
                "FIELD_MAP": [
                    {
                        "field_name": "type",
                        "entitytypeid": "HERITAGE_RESOURCE_TYPE.E55"
                    },
                    {
                        "field_name": "primary_address_or_description",
                        "entitytypeid": "PLACE_ADDRESS.E45",
                        "value_type": "e4f5bd2f-56b7-4b8d-ac48-7e6d90e530ae",
                        "alternate_entitytypeid": "DESCRIPTION_OF_LOCATION.E62"
                    },
                    {
                        "field_name": "designations",
                        "entitytypeid": "TYPE_OF_DESIGNATION_OR_PROTECTION.E55"
                    }
                ]
            }
        }
    },
    "shp": {
        "NAME": "HistoricPlacesLA_Search_Results_Export",
        "SCHEMA": [
            {"field_name": "prime_name","source": "primaryname","data_type": "str","data_length": "128"},
            {"field_name": "arches_id","source": "entityid","data_type": "str","data_length": "128"},
            {"field_name": "resource","source": "resource_name","data_type": "str","data_length": "128"},
            {"field_name": "othernames","source": "alternatename","data_type": "str","data_length": "128"},
            {"field_name": "type","source": "field_map","data_type": "str","data_length": "128"},
            {"field_name": "address","source": "field_map","data_type": "str","data_length": "128"},
            {"field_name": "designatns","source": "field_map","data_type": "str","data_length": "128"}
        ],
        "RESOURCE_TYPES": {
            "ACTOR.E39": {
                "FIELD_MAP": [
                    {
                        "field_name": "type",
                        "entitytypeid": "ACTOR_TYPE.E55"
                    },
                    {
                        "field_name": "address",
                        "entitytypeid": "PLACE_ADDRESS.E45",
                        "value_type": "e4f5bd2f-56b7-4b8d-ac48-7e6d90e530ae",
                        "alternate_entitytypeid": "DESCRIPTION_OF_LOCATION.E62"
                    }
                ]
            },
            "HERITAGE_RESOURCE.E18": {
                "FIELD_MAP": [
                    {
                        "field_name": "type",
                        "entitytypeid": "HERITAGE_RESOURCE_TYPE.E55"
                    },
                    {
                        "field_name": "address",
                        "entitytypeid": "PLACE_ADDRESS.E45",
                        "value_type": "e4f5bd2f-56b7-4b8d-ac48-7e6d90e530ae",
                        "alternate_entitytypeid": "DESCRIPTION_OF_LOCATION.E62"
                    },
                    {
                        "field_name": "designatns",
                        "entitytypeid": "TYPE_OF_DESIGNATION_OR_PROTECTION.E55"
                    }
                ]
            }
        }
    }
}
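
To illustrate how entitytypeid, value_type and alternate_entitytypeid in a FIELD_MAP entry interact, here is a minimal Python 3 sketch of one possible reading of the descriptions above. It is not the Arches implementation. The shape of child_entities (a list of dicts with 'entitytypeid', 'value' and an optional 'value_type') is a hypothetical stand-in for a resource's child entity data, and the sketch assumes that value_type filters only the primary entity type, not the alternate.

```python
def resolve_field(entry, child_entities):
    """Pick the export values for one FIELD_MAP entry.

    child_entities is a hypothetical list of dicts with the keys
    'entitytypeid', 'value' and (optionally) 'value_type'.
    """
    def values_for(entitytypeid, value_type=None):
        return [
            child["value"] for child in child_entities
            if child["entitytypeid"] == entitytypeid
            and (value_type is None or child.get("value_type") == value_type)
        ]

    values = values_for(entry["entitytypeid"], entry.get("value_type"))
    if not values and "alternate_entitytypeid" in entry:
        # Assumption: fall back to the alternate entity type, without a
        # value_type filter, when the primary entity type yields nothing.
        values = values_for(entry["alternate_entitytypeid"])
    return values


entry = {
    "field_name": "address",
    "entitytypeid": "PLACE_ADDRESS.E45",
    "value_type": "e4f5bd2f-56b7-4b8d-ac48-7e6d90e530ae",
    "alternate_entitytypeid": "DESCRIPTION_OF_LOCATION.E62",
}
# Hypothetical child data: the only address has a different value_type,
# so the location description is used instead.
children = [
    {"entitytypeid": "PLACE_ADDRESS.E45", "value": "PO Box 1",
     "value_type": "some-other-concept-id"},
    {"entitytypeid": "DESCRIPTION_OF_LOCATION.E62",
     "value": "North bank of the river"},
]
print(resolve_field(entry, children))
```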