Collections Assessment EDA

Notebooks for exploratory analysis of archival collections assessment data using Pandas.

Explanation

Assessments can hold valuable data about collections that can be leveraged in different contexts (strategic planning, surfacing hidden collections, etc.). This notebook uses basic Pandas functionality to clean and do an exploratory analysis of assessment data from an ArchivesSpace Assessment record list report, and to answer a few example questions about a fictional repository:

  • How are the collection survey sizes distributed?
  • How much material needs to be reformatted?
  • What kind of condition are these collections in?
  • What are the major at-risk formats in the assessed collections?
  • Which collections contain specific formats?

While the example analysis focuses on using the assessment data to inform digitization/reformatting decisions, it could just as easily focus on using the data to inform other areas of practice, such as processing to improve discoverability and preservation needs assessments. This notebook only scratches the surface of how Pandas can be used in scalable, reproducible solutions to advance data-informed decision-making and produce new knowledge from collections (meta)data.
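As a rough illustration of how the example questions listed above map onto basic Pandas operations, here is a minimal sketch. The DataFrame, its column names, the rating scale, and the set of at-risk formats are all hypothetical stand-ins, not the actual headers of the ArchivesSpace report or the notebook's own code.

```python
import pandas as pd

# Hypothetical stand-in for an assessment report; column names and values
# are illustrative only and do not mirror the real report headers.
df = pd.DataFrame({
    "collection": ["A", "B", "C", "D"],
    "survey_extent_lf": [12.5, 3.0, 40.0, 7.5],   # linear feet surveyed
    "physical_condition": [2, 4, 1, 3],           # assumed scale: 1 (poor) to 5 (good)
    "reformatting_needed": [True, False, True, False],
    "formats": ["audio; film", "photographs", "audio; video", "film"],
})

# How are the collection survey sizes distributed?
size_summary = df["survey_extent_lf"].describe()

# How much material needs to be reformatted?
reformat_extent = df.loc[df["reformatting_needed"], "survey_extent_lf"].sum()

# What kind of condition are these collections in?
condition_counts = df["physical_condition"].value_counts().sort_index()

# What are the major at-risk formats? Split the multi-valued cell into one
# row per format, then count only formats in an assumed at-risk set.
at_risk = {"audio", "film", "video"}
all_formats = df["formats"].str.split("; ").explode()
at_risk_counts = all_formats[all_formats.isin(at_risk)].value_counts()

# Which collections contain a specific format?
has_film = df.loc[df["formats"].str.contains("film", regex=False), "collection"].tolist()

print(size_summary, reformat_extent, condition_counts, at_risk_counts, has_film, sep="\n\n")
```

With real data, the same pattern applies once the column names are swapped for those in your own report.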

Data Source

The record list report used in this notebook is populated with fake data. It is modeled on a vanilla assessment import template from the ArchivesSpace Assessment module, so it should be fairly straightforward to repurpose for reports from real repositories. It can also be adapted to explore assessment data in ANY spreadsheet, not just those from ASpace: just modify the code to suit your column headers and data types.
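Adapting the code to a different spreadsheet mostly comes down to renaming columns and coercing data types. The sketch below assumes a small inline CSV with made-up headers (`Coll ID`, `Extent (LF)`, and so on); with a real file you would pass its path to `pd.read_csv` instead, and the mapping would reflect your own headers.

```python
import io
import pandas as pd

# Inline stand-in for a spreadsheet export; headers and values are hypothetical.
raw = io.StringIO(
    "Coll ID,Extent (LF),Condition,Surveyed On\n"
    "MS-001,12.5,3,2021-04-02\n"
    "MS-002,unknown,5,2021-04-09\n"
)
df = pd.read_csv(raw)

# Map the spreadsheet's headers onto the names the analysis code expects.
column_map = {
    "Coll ID": "collection",
    "Extent (LF)": "extent_lf",
    "Condition": "condition",
    "Surveyed On": "surveyed_on",
}
df = df.rename(columns=column_map)

# Coerce data types; free-text entries such as "unknown" become NaN
# instead of raising an error.
df["extent_lf"] = pd.to_numeric(df["extent_lf"], errors="coerce")
df["condition"] = df["condition"].astype("Int64")
df["surveyed_on"] = pd.to_datetime(df["surveyed_on"])

print(df.dtypes)
```

Keeping the header mapping in one dictionary means the rest of the analysis does not need to change when the source spreadsheet does.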

Requirements

  • Pandas
  • NumPy

For an introduction to Pandas, the official documentation is a good place to start.

Can't I just do this in Excel and/or OpenRefine?

If all you need to do is clean up a few free-text fields and apply some filters, those tools are great! But programmatic cleaning and analysis with Pandas is much easier to replicate when new assessments are performed and when assessment records are updated, allowing for more efficient tracking of changes in collection conditions and progress toward strategic goals. This means you can use Pandas to do more with less time and effort.
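To make the replication point concrete, here is one possible sketch of comparing two assessment rounds to see which collections have changed condition. The two DataFrames, the column names, and the rating scale are hypothetical; in practice each would be loaded from a separate report export.

```python
import pandas as pd

# Hypothetical snapshots from two assessment rounds (assumed scale: higher is better).
previous = pd.DataFrame({"collection": ["A", "B", "C"], "condition": [2, 4, 3]})
current = pd.DataFrame({"collection": ["A", "B", "C", "D"], "condition": [3, 4, 2, 5]})

# Left-join on the collection identifier so newly assessed collections are kept.
merged = current.merge(previous, on="collection", how="left", suffixes=("_curr", "_prev"))

# A negative change means the rating dropped between rounds.
merged["change"] = merged["condition_curr"] - merged["condition_prev"]

declined = merged.loc[merged["change"] < 0, "collection"].tolist()
newly_assessed = merged.loc[merged["condition_prev"].isna(), "collection"].tolist()

print("Declined:", declined)
print("Newly assessed:", newly_assessed)
```

Because the comparison is code rather than a sequence of manual spreadsheet steps, it can be rerun unchanged each time a new report is exported.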

Contributors

adgray987