Measuring Judge Ideology

Education Data Science Practicum Spring 2019

Project Started: 1/30/2019
Updated: 3/23/2019

Download and run the "Midterm_Sample" folder to reproduce the analysis on a small sample on your local machine.

Accomplished so far:

  1. Data collection:
    • collected all federal appellate court opinions
  2. Pre-Analysis:
    • confirmed that vector representations of the opinions are feasible (works for the First Circuit)
  3. Text Cleaning (lots of regex nightmares!):
    • build metadata for each opinion (still in progress):
      • authoring judge
      • year
      • court
    • identify documents that contain dissents/concurrences from other judges
    • identify documents that are per curiam
  4. Modeling:
    • average document vectors for each judge to get judge vectors
    • PCA on judge vectors to look for clusters (1st Circuit); see the sketch after this list
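
A minimal sketch of the modeling step in base R, assuming a numeric matrix `doc_vecs` (one row per opinion) and a data frame `meta` with an `author` column from the text-cleaning step; both names are hypothetical placeholders.

```r
# Average opinion vectors within each judge to get one vector per judge,
# then run PCA on the judge vectors and plot the first two components.
# doc_vecs and meta$author are placeholder objects from earlier steps.
judge_vecs <- do.call(rbind, lapply(split(seq_len(nrow(doc_vecs)), meta$author),
                                    function(idx) colMeans(doc_vecs[idx, , drop = FALSE])))

pca <- prcomp(judge_vecs, center = TRUE, scale. = FALSE)
plot(pca$x[, 1], pca$x[, 2], xlab = "PC1", ylab = "PC2",
     main = "Judge vectors: first two principal components")
text(pca$x[, 1], pca$x[, 2], labels = rownames(judge_vecs), pos = 3, cex = 0.6)
```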

Notes:

  1. Part (3) above is incomplete. The "year" field in the raw JSON is sparsely populated, so I need to extract the year from the opinion text itself.
  2. Once Part (3) is complete, de-mean documents by court and by year.
  3. How to de-mean by topic? I need topic labels. Ash and Chen won't share metadata with topic labels :(

Purpose:

There are many models for estimating the ideal points of judges (e.g., Poole-Rosenthal, Martin-Quinn, Clinton et al.). These models rely mostly on judges' voting records. The purpose of this study is to attempt ideal-point estimation using the text of opinions instead of voting behavior. The subjects of the study are federal judges on the United States Courts of Appeals (e.g., the 1st Circuit Court of Appeals).

Data:

Analysis:

Part one:

Some things to watch out for include era effects, topic effects, and circuit effects.

  1. Start with one circuit court
  2. Learn to implement doc2vec-style embeddings using the "fastTextR" and "textTinyR" R packages (a sketch follows this list)
  3. Then apply this to all federal appellate court opinions
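
A minimal sketch of what the document-embedding step could look like, assuming pre-trained fastText word vectors saved in a hypothetical "wordvecs.vec" file and a character vector `opinions` of raw opinion texts. fastTextR and textTinyR wrap most of this, but the underlying averaging is roughly:

```r
# Represent each opinion as the average of its tokens' fastText vectors.
# "wordvecs.vec" and `opinions` are placeholder names.

# Read a .vec file: first line is "n_words n_dims", then "word v1 v2 ...".
lines  <- readLines("wordvecs.vec")
fields <- strsplit(lines[-1], " ", fixed = TRUE)
dims   <- length(fields[[1]]) - 1
wv     <- t(vapply(fields, function(f) as.numeric(f[-1]), numeric(dims)))
rownames(wv) <- vapply(fields, `[`, character(1), 1)

# Crude tokenizer: lowercase, keep letters only.
tokenize <- function(txt) {
  toks <- strsplit(gsub("[^a-z ]", " ", tolower(txt)), "\\s+")[[1]]
  toks[toks != ""]
}

# Average the vectors of tokens found in the vocabulary.
doc_vector <- function(txt) {
  toks <- tokenize(txt)
  toks <- toks[toks %in% rownames(wv)]
  if (length(toks) == 0) return(rep(NA_real_, dims))
  colMeans(wv[toks, , drop = FALSE])
}

doc_vecs <- t(vapply(opinions, doc_vector, numeric(dims)))
```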

Problem:

  • I need to extract the "year" from the opinion text (a regex sketch follows). Where do I get topic labels? Should I model topics instead?
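
A hedged sketch of one way to pull the year out, assuming most opinions contain a "Decided ... 1998"-style phrase or at least a plausible four-digit year; `opinion_text` is a placeholder for a single opinion's raw text.

```r
# Pull a decision year out of raw opinion text.
extract_year <- function(opinion_text) {
  # Prefer a year that appears shortly after the word "Decided".
  hit <- regmatches(opinion_text,
                    regexpr("Decided.{0,40}?(19|20)[0-9]{2}", opinion_text, perl = TRUE))
  if (length(hit) == 1) {
    return(as.integer(regmatches(hit, regexpr("(19|20)[0-9]{2}", hit))))
  }
  # Otherwise fall back to the first plausible four-digit year in the text.
  any_year <- regmatches(opinion_text, regexpr("(19|20)[0-9]{2}", opinion_text))
  if (length(any_year) == 1) return(as.integer(any_year))
  NA_integer_
}
```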

Part two:

Now that I know I can implement vector representations of these documents, I need to clean the text.

  1. Identify the judge who wrote the opinion (metadata is not available for all federal appellate court cases)
  2. Identify dissents, concurrences, and per curiam opinions
  3. Extract the year from the decisions
  4. De-mean the document vectors by year and circuit, then average them for each judge (see the sketch after this list)
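
A hedged sketch of steps 2 and 4, assuming the `doc_vecs` matrix from Part one and a hypothetical `meta` data frame with `text`, `author`, `year`, and `circuit` columns; the pattern matches are crude first passes, not validated rules.

```r
# Step 2 (rough first pass): flag per curiam opinions and opinions that
# contain dissents/concurrences via simple pattern matches on the raw text.
meta$per_curiam  <- grepl("per curiam", meta$text, ignore.case = TRUE)
meta$has_dissent <- grepl("dissent(ing|s)?", meta$text, ignore.case = TRUE)
meta$has_concur  <- grepl("concurr(ing|ence)", meta$text, ignore.case = TRUE)

# Step 4: subtract each year-by-circuit cell mean so judge vectors are not
# driven by era or circuit composition, then average per judge.
cell       <- interaction(meta$year, meta$circuit, drop = TRUE)
cell_means <- apply(doc_vecs, 2, function(col) ave(col, cell, FUN = mean))
demeaned   <- doc_vecs - cell_means

judge_vecs <- do.call(rbind, lapply(split(seq_len(nrow(demeaned)), meta$author),
                                    function(idx) colMeans(demeaned[idx, , drop = FALSE])))
```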
