person-name-annotator-example's Introduction

nlpsandbox

Home repository

person-name-annotator-example's People

Contributors

dependabot[bot], github-actions[bot], gkowalski, thomasyu888, tschaffter

Forkers

mcw-bmi

person-name-annotator-example's Issues

Add s6

See date annotator example

Improve annotation speeds

I ran a quick test comparing two ways of matching first names against a sample text:

import pandas as pd
import re

text = """Getting Started with Synapse Tom
This guide is for new users who are interested in learning about Synapse. You will learn fundamental Synapse features by performing some common tasks:

Create your own Project and add content to Synapse
Provide a Project description alongside your materials via the Synapse Wiki tools
Share your work with other Synapse users, Teams of users, or the public
What is Synapse?
Synapse is a collaborative research platform that allows individuals and teams to share, track, and discuss their data and analysis in projects. Synapse is built to work on the web. We provide access to Synapse features and services for programmers through a REST API, Python client, command line client, and R client.

Synapse hosts many research projects and resources. It also hosts crowdsourced competitions, including DREAM Challenges. Sage Bionetworks provides Synapse services free of charge to the scientific community through generous support from various funding sources.

Create Your Account
Anyone can browse public content on the Synapse web site. To download and create content, you will need to register for an account using an email address. You will receive an email message for verification to complete the registration process.
Tom
Getting Certified
Synapse is a data sharing platform approved for storing data from human subjects research. This requires special care and thought. To upload files, Sage Bionetworks requires you demonstrate awareness of privacy and security issues.

You can complete this by taking a Certification Quiz.

Making and Managing Projects in Synapse
Synapse Projects are online workspaces where researchers can collaborate and organize their work. Synapse supports all kinds of working groups: individuals, small teams, and large consortia.
Tom
To create a new Project:

Navigate to the User Menu and click on Projects.
Click the Create a New Project button. Tom
Decide on a unique name for your Project and click Save.
Your Projects dashboard stores your collection of Projects.

Read about Projects in the User Guide.

Synapse IDs
Synapse Projects are assigned a Synapse ID, a globally unique identifier used for reference with the format syn12345678. Often abbreviated to “synID”, the ID of an object never changes, even if the name does. The Synapse ID is always accessible in the URL and visible on the webpage.

Organizing Content in Files and Folders
Projects contain Files, which can be organized into Folders. Folders and Files also have their own unique Synapse IDs and can be moved within or between Projects. Uploaded files are stored within Synapse storage.

Use the Tools Menu to upload a file:

Navigate to the Files tab.
Use the Files Tools menu to select Add New Folder.
Decide on a Folder name and click Save.
Navigate into your new Folder and use the Folder Tools menu to select Upload or Link to a File.
Use the Browse button to select the file, or drag and drop it to upload, and click Save.
To explore other features available for Files and Folders, read about annotating Files, assigning DOIs, versioning, Provenance, and sharing settings.
"""

firstnames = pd.read_csv("firstnames.csv")

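def test_re(text):
    """Run one word-boundary regex search per name over the full text."""
    annotations = []
    for name in firstnames['firstname']:
        # Escape the name so regex metacharacters in it are matched literally.
        matches = re.finditer(
            r'\b({})\b'.format(re.escape(str(name))), text, re.IGNORECASE
        )
        for match in matches:
            annotations.append(dict(
                start=match.start(),
                length=len(match[0]),
                text=match[0],
                confidence=95))
    return annotations


def test_re_in(text):
    """Same as test_re, but skip names that do not occur in the text at all."""
    annotations = []
    for name in firstnames['firstname']:
        # Cheap substring check first; only run the regex when it can match.
        if name.lower() in text.lower():
            matches = re.finditer(
                r'\b({})\b'.format(re.escape(str(name))), text, re.IGNORECASE
            )
            for match in matches:
                annotations.append(dict(
                    start=match.start(),
                    length=len(match[0]),
                    text=match[0],
                    confidence=95))
    return annotations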

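The timings:

%timeit test_re(text)
23.2 s ± 473 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit test_re_in(text)
3.07 s ± 53 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Checking for the substring before running the regex is about 7.5 times faster on this text (3.07 s vs. 23.2 s).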
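Another option, not benchmarked here, would be to stop looping over the names entirely: tokenize the text once and look each token up in a set of lowercased names. The sketch below is a rough illustration under the assumption that every name is a single run of word characters (no spaces or hyphens), in which case it should find the same spans as the `\b...\b` regex. The function name `annotate_tokens` and its `names` parameter are hypothetical and not part of the repository.

```python
import re


def annotate_tokens(text, names):
    """Annotate single-word names with one pass over the text."""
    # Lowercase the names once so each lookup is a constant-time set check.
    name_set = {str(n).lower() for n in names}
    annotations = []
    # \w+ yields each maximal run of word characters, so every token is
    # bounded by non-word characters (or the string edges) on both sides.
    for match in re.finditer(r"\w+", text):
        if match[0].lower() in name_set:
            annotations.append(dict(
                start=match.start(),
                length=len(match[0]),
                text=match[0],
                confidence=95))
    return annotations
```

With the data from the test above, this could be called as `annotate_tokens(text, firstnames['firstname'])`. The cost then depends on the number of tokens in the text rather than on the number of names.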

Update the names in data folder

While annotating a note, I found that the lists contain many first and last names that are also common words or single letters, which will likely produce false positives. For instance:

"me", "the", "on", "so", "per", "he", "D", "M", "G", "I", "weeks"

The most problematic are probably "the", "on", and "he".
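One possible cleanup, sketched here only as a starting point, is to filter the name lists before they are used: drop entries shorter than a minimum length and entries that appear in a blocklist of common words. The helper `filter_names`, the `AMBIGUOUS` set (seeded with the examples above), and the default `min_length=2` are all assumptions for illustration. The right thresholds would need to be checked against real notes, since some of these entries can be genuine names.

```python
# Hypothetical blocklist seeded with the examples from this issue.
AMBIGUOUS = {"me", "the", "on", "so", "per", "he", "weeks"}


def filter_names(names, min_length=2, blocklist=AMBIGUOUS):
    """Return the names that are long enough and not common words."""
    kept = []
    for name in names:
        name = str(name).strip()
        # Single letters such as "D", "M", "G", "I" fall below min_length.
        if len(name) < min_length:
            continue
        # Compare case-insensitively so "He" and "he" are both dropped.
        if name.lower() in blocklist:
            continue
        kept.append(name)
    return kept
```

For example, `filter_names(["Tom", "the", "D", "He", "Ann"])` keeps only `"Tom"` and `"Ann"`.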
