Cinema Data - a Linked Open Data initiative
Topic and Scope:
Cinemadata’s topic is access to cinemas by urban and rural populations in the early 20th century. The information is obtained from trade journals, directories, images, and first-person testimony. The project will test linked open data tools for exposing the data for public use.
The initial scope is The Theaters Database, content which begins in the form of three text files.
The Theaters Database was created by Karan Sheldon for a research project that gathered information on the places in northern New England where movies were shown. It was originally structured as ProCite flat-file databases, one each for the states of Maine, New Hampshire, and Vermont, plus an extra one for Maine drive-ins. The information was organized by town and cinema.
For example, the town of Bingham, Maine, had four entries: one each for cinemas named Colby, Kennebec Hall Theatre, Paramount, and Robinson. Within Bingham's Kennebec Hall Theatre entry were fields to capture the building's address (unknown), the building's status at the time of the research (demolished), and the owner(s), along with a Notes field recording the capacity of the hall, the population reported by the particular directory, the management, and the type of shows. Other fields are Sources, Date opened, and Date closed. A Photo field was added when still images had been located.
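The record layout described above can be sketched as a small data structure. This is only an illustration in Python: the class name, the attribute names, and the `by_town` helper are assumptions made for this example, not the actual ProCite field labels, and any value not stated in the text is left empty.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TheaterRecord:
    """One cinema entry. Attribute names are illustrative assumptions."""
    state: str
    town: str
    name: str
    address: Optional[str] = None      # None when the address is unknown
    status: Optional[str] = None       # building status at the time of research
    owners: List[str] = field(default_factory=list)
    notes: str = ""                    # capacity, reported population, management, type of shows
    sources: List[str] = field(default_factory=list)
    date_opened: Optional[str] = None
    date_closed: Optional[str] = None
    photo: Optional[str] = None        # set only when a still image was located


def by_town(records):
    """Group records by (state, town), mirroring the town/cinema organization."""
    grouped = {}
    for rec in records:
        grouped.setdefault((rec.state, rec.town), []).append(rec)
    return grouped


# The four Bingham, Maine entries named in the text. Fields whose values
# are not given in the text are left at their empty defaults.
bingham = [
    TheaterRecord("Maine", "Bingham", "Colby"),
    TheaterRecord("Maine", "Bingham", "Kennebec Hall Theatre",
                  address=None, status="demolished"),
    TheaterRecord("Maine", "Bingham", "Paramount"),
    TheaterRecord("Maine", "Bingham", "Robinson"),
]

towns = by_town(bingham)
print(len(towns[("Maine", "Bingham")]))  # number of cinema entries for the town
```

Keeping unknown values as `None` rather than as the string "unknown" makes it easy to leave those fields out when the records are later serialized.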
The data was derived from national theatrical directories, trade periodicals, local newspapers, business records, state directories, and information from a survey workbook administered by Sheldon with colleagues and students. The research project informed “Going to the Movies: A Social History of Motion Pictures in Maine Communities” (1990) and a subsequent NEH-funded traveling exhibition, “Going to the Movies: A Century of Motion Picture Audiences in Northern New England” (1996).
Once the data from The Theaters Database has been released as Open Data in the preliminary project, cinemadata.org will be expanded to a Linked Open Data Project through provision of incoming and outgoing links to the data set, enrichment of the data with digitization and cataloging of associated resources, and crowdsourcing of the newly expanded data set.
Several background resources:
Ian Davis, SlideShare: "30 Minute Guide to RDF and Linked Data"
Preliminary Project, intent and tasks:
intent: Convert The Theaters Database data in text files to RDF and publish as Open Data.
tasks:
-
a rough, incomplete outline of the tasks involved, with links to draft → complete tasks
-
understand content
-
clean data
-
determine metadata standards for project: metadata standards for cinemadata.org
-
...
-
structure URIs for objects and concepts
-
ontologies/vocabularies: taxonomy and thesaurus as SKOS-RDF
-
develop concept scheme(s):
SKOS: “Motion Picture Theaters” concept scheme
This is a first-round draft thesaurus. More detailed connections need to be added as I move the document closer to a final draft. Additional restructuring after this preliminary portion of the project will bring together the “motion picture theaters” terms as a SKOS “collection”.
-
VRA/CCO field and sub-field values need to capture several concepts and distinctions key to the “theaters” as enterprises. For example, the Owner(s) field records the name of the cinema owner, proprietor, or manager, as distinct from the Affiliated with field, which records the corporate entity with which the cinema is associated, usually a circuit or cinema chain company. Examples include New England Theatres, Inc., and E.M. Loew's Theaters, Inc.
Additional locally defined fields will augment the project Data Value Standards in order to record “Alternate Theater Name(s)”.
-
configure server to provide RDF content in the cinemadata.org “hash namespace”
examples:
• .rdf — RDF browser requests and SQL-like queries will result in this file/content being served
• .html — the human-readable, “dereferenced” form of the triples
-
express all cinema data fields and values as HTML5 and RDF/XML
scripts:
i)
a Perl script (version 4) to convert the data for each theater in the .csv file to an HTML document
the directory listing by theater name for Maine theaters
the directory listing by theater name for Vermont theaters
sample files:
Several changes will be made to the script and these files before they will be considered final copies.
ii)
Perl script to convert each of the .csv files to RDF [pending VRA/CCO metadata decisions]
sample files:
-
Ensure that sources and variant forms of source titles are clearly cited [see spreadsheet of bibliography as comma separated values] and publish the project bibliography as RDF in The Bibliographic Ontology bibo namespace: http://purl.org/ontology/bibo/
-
Because it has become difficult to install several applications on the hosted account, an Apache server build for Collective Access, ARC2, and Graphite (an open source PHP library) is now online. Eventually 4store, which is also installed, will be the triplestore for this project.
-
add dataset to the Data Hub (note: about CKAN)
-
...
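The URI-structuring, hash-namespace, and CSV-to-RDF tasks above can be sketched together. This is a rough Python stand-in, not the project's Perl script. The document URI `http://cinemadata.org/theaters`, the `vocab#` namespace, the CSV column names, and the choice of predicates are all assumptions made for illustration, since the real predicates depend on the pending VRA/CCO metadata decisions. Output is N-Triples, chosen only because it can be written without an RDF library.

```python
import csv
import io
import re

BASE = "http://cinemadata.org/theaters"   # hypothetical document URI
VOCAB = "http://cinemadata.org/vocab#"    # hypothetical vocabulary namespace
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def slug(text):
    """Lowercase the text and collapse non-alphanumeric runs to hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def hash_uri(state, town, name):
    """Mint a hash URI: one document URI plus a per-theater fragment."""
    return f"{BASE}#{slug(state)}-{slug(town)}-{slug(name)}"


def literal(value):
    """Escape a string as an N-Triples plain literal."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def rows_to_ntriples(csv_text):
    """Yield a label triple per row, plus one triple per non-empty extra column."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = hash_uri(row["state"], row["town"], row["name"])
        yield f"<{subject}> <{RDFS_LABEL}> {literal(row['name'])} ."
        for column in ("town", "status"):   # hypothetical column names
            if row.get(column):
                yield f"<{subject}> <{VOCAB}{column}> {literal(row[column])} ."


# A one-row sample built from the Bingham example; the header is assumed.
sample = "state,town,name,status\nMaine,Bingham,Kennebec Hall Theatre,demolished\n"
triples = list(rows_to_ntriples(sample))
for line in triples:
    print(line)
```

With hash URIs the fragment after `#` is never sent to the server, so every theater in a file resolves to the same document, and the server only has to choose between the .rdf and .html representations of that document.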
Full Project, intent and tasks:
intent: Provide context for data from The Theaters Database that has been published as open data.
tasks:
-
refine “motion picture theaters” vocabulary: restructure terms as a SKOS “collection”
-
compile list of additional resources
-
catalog additional resources
-
develop & implement plan to digitize additional resources, catalog digital surrogates, and publish all new data as open data
-
determine other data sets to link to
-
design and implement crowdsourcing by making an open call for community contributions that will further “contextualize” the “things” we've published as RDF.
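As a sketch of the vocabulary task in the list above, the following Python emits N-Triples for a SKOS concept scheme whose concepts are also gathered into a SKOS collection. The `vocab#` namespace, the identifiers, and the two sample terms are placeholders assumed for this example and are not taken from the draft thesaurus. The SKOS class and property names (`ConceptScheme`, `Concept`, `Collection`, `prefLabel`, `inScheme`, `member`) are the standard ones.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
VOCAB = "http://cinemadata.org/vocab#"   # hypothetical namespace


def uri(term):
    """Wrap a URI in angle brackets for N-Triples."""
    return f"<{term}>"


def label(text, lang="en"):
    """Language-tagged literal; assumes the text has no quotes or backslashes."""
    return f'"{text}"@{lang}'


def skos_triples(scheme_id, scheme_label, collection_id, collection_label, concepts):
    """Yield (subject, predicate, object) terms for a scheme and a collection."""
    scheme = VOCAB + scheme_id
    collection = VOCAB + collection_id
    yield uri(scheme), uri(RDF_TYPE), uri(SKOS + "ConceptScheme")
    yield uri(scheme), uri(SKOS + "prefLabel"), label(scheme_label)
    yield uri(collection), uri(RDF_TYPE), uri(SKOS + "Collection")
    yield uri(collection), uri(SKOS + "prefLabel"), label(collection_label)
    for concept_id, concept_label in concepts:
        concept = VOCAB + concept_id
        yield uri(concept), uri(RDF_TYPE), uri(SKOS + "Concept")
        yield uri(concept), uri(SKOS + "prefLabel"), label(concept_label)
        yield uri(concept), uri(SKOS + "inScheme"), uri(scheme)
        # Collection membership groups the concept without adding hierarchy.
        yield uri(collection), uri(SKOS + "member"), uri(concept)


# Placeholder terms for illustration only.
terms = [
    ("motion-picture-theaters", "Motion picture theaters"),
    ("drive-in-theaters", "Drive-in theaters"),
]
lines = [
    f"{s} {p} {o} ."
    for s, p, o in skos_triples(
        "theaters-scheme", "Motion Picture Theaters",
        "theaters-collection", "Motion picture theaters terms",
        terms,
    )
]
print("\n".join(lines))
```

A collection only groups concepts that already exist, so the terms can stay in the concept scheme and be added to the collection later without changing their URIs.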