harmonydata / harmony

The Harmony Python library: a research tool for psychologists to harmonise data and questionnaire items. Open source.

Home Page: https://harmonydata.ac.uk

License: MIT License

Python 58.12% Jupyter Notebook 41.88%
anxiety data-harmonisation data-harmonization data-science depression harmonisation harmonization harmony mental-health-catalogue natural-language-processing

harmony's People

Contributors

0x48piraj, evewcheng, olikelly00, ollylucl, shahid-0, sourface94, woodthom2, zaironjacobs

harmony's Issues

Give a similarity function between questionnaires

Description

Eve Cheng has done some experiments with the Word Mover's Distance (WMD) algorithm, which gives the distance between two sequences of sentence embeddings.

Can Harmony use this to say that the GAD-7 is e.g. 60% similar to the PHQ-9?

See Colab notebook:
https://github.com/harmonydata/experiments/blob/main/harmony_wmd_experiment.ipynb

We also have a demo of Harmony integrated with external data sources: https://harmonycataloguelookup.azurewebsites.net/

Source code is at: https://github.com/harmonydata/harmony_catalogue_lookup_dash

See mockup

https://github.com/harmonydata/hackathon/blob/main/instrument_level.png

Rationale

The use case would be:

As a research psychologist, I’ve got one small study here and one small study there. Individually they don’t have enough statistical power, but do they together? Can we combine Study A and Study B to get enough statistical power for my research question?

Word Mover's Distance is a candidate, but it's not necessarily how we will solve this. It might be too slow, for example.
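If full WMD turns out to be too slow, one cheaper option is the relaxed WMD, which skips solving the optimal-transport problem and gives a lower bound on the full distance. The sketch below is a swapped-in technique, not what the Colab notebook does; it assumes each instrument is a non-empty list of sentence-embedding vectors (one per item) and uses Euclidean distance between embeddings. The function names are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def euclidean(u: Vector, v: Vector) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def one_sided_relaxed_wmd(source: Sequence[Vector], target: Sequence[Vector]) -> float:
    """Move each source item (uniform weight) entirely to its nearest target item."""
    return sum(min(euclidean(s, t) for t in target) for s in source) / len(source)


def relaxed_wmd(a: Sequence[Vector], b: Sequence[Vector]) -> float:
    """Relaxed WMD: the larger (tighter) of the two one-sided bounds.

    Both one-sided values drop one of the transport constraints, so each is a
    lower bound on the full Word Mover's Distance; taking the max keeps the
    tighter bound. Both instruments must contain at least one item.
    """
    return max(one_sided_relaxed_wmd(a, b), one_sided_relaxed_wmd(b, a))


# Toy 2-dimensional "embeddings", purely for illustration.
instrument_a = [[1.0, 0.0], [0.0, 1.0]]
instrument_b = [[1.0, 0.0], [0.6, 0.8]]
print(relaxed_wmd(instrument_a, instrument_b))
```

Note that a distance like this is not yet a percentage; turning it into a statement such as "60% similar" would need a calibration step that is still open.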

Maybe a simple solution is just to set a threshold and report the number of questions in Instrument A that match questions in Instrument B at that threshold.
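A minimal sketch of the threshold idea, assuming an item-by-item similarity matrix has already been computed (rows are items of Instrument A, columns are items of Instrument B). The matrix layout, the function names and the default threshold of 0.7 are illustrative assumptions, not Harmony's API.

```python
from typing import Dict, List, Sequence


def count_matched_rows(similarity: Sequence[Sequence[float]], threshold: float) -> int:
    """Count rows whose best similarity score reaches the threshold."""
    return sum(1 for row in similarity if row and max(row) >= threshold)


def instrument_overlap(similarity: Sequence[Sequence[float]], threshold: float = 0.7) -> Dict[str, int]:
    """Report how many items of A match B, and vice versa, at a threshold.

    similarity[i][j] is the (assumed precomputed) similarity between item i of
    Instrument A and item j of Instrument B.
    """
    # Transpose so the same counting logic works from Instrument B's side.
    columns: List[List[float]] = [list(col) for col in zip(*similarity)]
    return {
        "items_a": len(similarity),
        "matched_a": count_matched_rows(similarity, threshold),
        "items_b": len(columns),
        "matched_b": count_matched_rows(columns, threshold),
    }


# Hypothetical scores for a 2-item and a 3-item instrument.
scores = [
    [0.92, 0.10, 0.30],
    [0.40, 0.75, 0.20],
]
report = instrument_overlap(scores, threshold=0.7)
print(f"{report['matched_a']}/{report['items_a']} items of A have a match in B")
print(f"{report['matched_b']}/{report['items_b']} items of B have a match in A")
```

Counting from both sides matters because the relation is not symmetric when the instruments differ in length. If the similarity scores can be negative (for example, for items phrased in opposite directions), consider whether to compare absolute values instead.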

Load instrument from URL

Description

A user finds an instrument on the internet in PDF, HTML or another format. Can we allow them to paste the URL into Harmony?

Rationale

A convenient way to ingest instruments into the tool.
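One possible shape for this, sketched with the standard library only: download the URL to a temporary file whose extension reflects the detected format, so that it can then go through the same path as an uploaded file. The extension table, the function names and the decision to reject unknown types are assumptions for illustration; the hand-off to Harmony (the issues mention `load_instruments_from_local_file`) is left as a comment because its exact signature is not shown here.

```python
import os
import tempfile
import urllib.request
from typing import Optional
from urllib.parse import urlparse

# Assumed set of formats worth accepting; adjust to what Harmony can parse.
KNOWN_EXTENSIONS = {".pdf", ".html", ".htm", ".docx", ".txt", ".csv", ".xlsx"}
CONTENT_TYPE_EXTENSIONS = {
    "application/pdf": ".pdf",
    "text/html": ".html",
    "text/plain": ".txt",
    "text/csv": ".csv",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document": ".docx",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": ".xlsx",
}


def guess_extension(url: str, content_type: Optional[str]) -> Optional[str]:
    """Prefer the URL's own extension, then the Content-Type header; None if unknown."""
    path_ext = os.path.splitext(urlparse(url).path)[1].lower()
    if path_ext in KNOWN_EXTENSIONS:
        return path_ext
    if content_type:
        mime = content_type.split(";")[0].strip().lower()
        return CONTENT_TYPE_EXTENSIONS.get(mime)
    return None


def download_instrument(url: str, timeout: float = 30.0) -> str:
    """Download url to a temporary file and return its path."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        content_type = response.headers.get("Content-Type")
        data = response.read()
    suffix = guess_extension(url, content_type)
    if suffix is None:
        raise ValueError(f"Unsupported or unknown file type for {url!r}")
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as handle:
        handle.write(data)
    # The caller would now pass handle.name to Harmony's local-file loading.
    return handle.name
```

If this runs on a server rather than on the user's machine, fetching arbitrary user-supplied URLs also needs the usual safeguards (size limits, blocking internal addresses).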

Required Python version bounds are not specified

Description

When I tried to install the library, I ran into #24. I resolved that issue, but it led me to this one: the required (minimum and maximum) Python version is not specified in setup.py/pyproject.toml.
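A small sketch of how such bounds could be kept in one place and checked. The bounds below are placeholders: the issue does not say which Python versions Harmony actually supports. The generated string is the form expected by the `requires-python` field in pyproject.toml (or `python_requires` in setup.py); the helper names are hypothetical.

```python
import sys
from typing import Sequence, Tuple

# Placeholder bounds for illustration only; not Harmony's real support range.
MIN_PYTHON: Tuple[int, int] = (3, 8)
MAX_PYTHON_EXCLUSIVE: Tuple[int, int] = (3, 12)


def is_supported(version: Sequence[int],
                 minimum: Tuple[int, int] = MIN_PYTHON,
                 maximum_exclusive: Tuple[int, int] = MAX_PYTHON_EXCLUSIVE) -> bool:
    """True if the (major, minor) part of version lies in [minimum, maximum_exclusive)."""
    return minimum <= tuple(version[:2]) < maximum_exclusive


def requires_python_specifier(minimum: Tuple[int, int], maximum_exclusive: Tuple[int, int]) -> str:
    """Build a specifier string such as '>=3.8,<3.12'."""
    return f">={minimum[0]}.{minimum[1]},<{maximum_exclusive[0]}.{maximum_exclusive[1]}"


if not is_supported(sys.version_info):
    print(f"Warning: Python {sys.version_info[0]}.{sys.version_info[1]} is outside "
          f"{requires_python_specifier(MIN_PYTHON, MAX_PYTHON_EXCLUSIVE)}")
```

Declaring the range in packaging metadata lets pip refuse an unsupported interpreter up front, instead of failing later while building a dependency.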

Allow user to process data

Description

At the end of the Harmony user journey, when the user exports results as xlsx (https://youtu.be/CqAsrY74zNM), can we generate either some Python code or a Colab or Jupyter notebook allowing them to analyse their datasets?

Rationale

This would be a nice feature to streamline the whole data harmonisation process.

We could offer the option to do it in the web UI, but by helping the user complete it on their own machine we bypass some confidentiality issues (e.g. if the user is not allowed to upload raw data to the internet).

An open question: how can the user do a statistical analysis and incorporate the Harmony scores? Do we make a new variable which is e.g. 0.65 × Instr1Ques1 + 0.33 × Instr2Ques5 etc., and then do statistical tests on it?
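To make the proposed variable concrete, here is a sketch that computes exactly the weighted sum written above for one respondent. It only mirrors the formula in the question and does not settle whether such a composite is statistically sound. The column names and weights are the example values from the question; the function name and the rule of returning None when any weighted item is missing are assumptions.

```python
from typing import Mapping, Optional


def weighted_composite(responses: Mapping[str, Optional[float]],
                       weights: Mapping[str, float]) -> Optional[float]:
    """Weighted sum of one respondent's item scores.

    Returns None if any weighted item is missing, rather than silently
    treating the missing answer as zero.
    """
    total = 0.0
    for item, weight in weights.items():
        value = responses.get(item)
        if value is None:
            return None
        total += weight * value
    return total


weights = {"Instr1Ques1": 0.65, "Instr2Ques5": 0.33}
print(weighted_composite({"Instr1Ques1": 2, "Instr2Ques5": 3}, weights))
```

In practice a respondent from Study A will often have answered only one of the two instruments, so this form of composite is defined only where both items were collected.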

Allow loading HTML file format

Description

Allow uploading a .html file with load_instruments_from_local_file. It should then remove all HTML tags etc. and create the instruments.

Rationale

Sometimes people have an instrument in HTML format.
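A minimal sketch of the tag-stripping step using only the standard library's `html.parser`. It assumes each questionnaire item sits in its own block-level element (list item, paragraph, table cell, etc.), so a line break is started at those tags; the tag sets and function name are illustrative choices, and turning the resulting lines into Harmony instruments is not shown.

```python
from html.parser import HTMLParser
from typing import List

# Tags after which a new line of text starts (an assumption about typical layouts).
BLOCK_TAGS = {"p", "div", "li", "tr", "td", "th", "br", "h1", "h2", "h3", "h4",
              "h5", "h6", "label", "option", "title", "ol", "ul", "table"}
# Tags whose contents are never visible text.
SKIP_TAGS = {"script", "style", "noscript"}


class _TextExtractor(HTMLParser):
    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True decodes entities like &amp;
        self.lines: List[str] = []
        self._current: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        if not self._skip_depth:
            self._current.append(data)

    def flush(self) -> None:
        """Emit the collected text as one line, collapsing whitespace."""
        text = " ".join("".join(self._current).split())
        if text:
            self.lines.append(text)
        self._current = []


def html_to_lines(html: str) -> List[str]:
    """Return the visible text of an HTML document, one block per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    parser.flush()
    return parser.lines


page = ("<html><head><style>p{color:red}</style></head><body><h1>GAD-7</h1>"
        "<ol><li>Feeling nervous, anxious &amp; on edge</li>"
        "<li>Not being able to stop worrying</li></ol>"
        "<script>var x = 1;</script></body></html>")
print(html_to_lines(page))
```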

Doesn't extract data from Word file

Description

When I upload a Word document with survey items, not all of them are extracted by Harmony. In the attached file, Harmony doesn't read "legal marital status".
MCS all items english.docx

Environment

Provide details regarding the operating system, toolchain, and environment.

How to Reproduce

Expected Behavior

Harmony reads all items on the list
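One way to check whether a missing item's text is reachable in the file at all is to read the .docx XML directly. The sketch below walks every paragraph in `word/document.xml`, including paragraphs nested inside table cells. That the missing items sit in tables is only a hypothesis, not something the issue confirms, and this is not Harmony's extraction code; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
import zipfile
from typing import List

W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(path_or_file) -> List[str]:
    """Return the non-empty text of every w:p paragraph in a .docx file.

    root.iter() walks the whole tree, so paragraphs inside tables
    (w:tbl/w:tr/w:tc/w:p) are included, not only top-level body paragraphs.
    Text inside text boxes nested in a paragraph may appear twice.
    """
    with zipfile.ZipFile(path_or_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    lines = []
    for paragraph in root.iter(f"{W_NS}p"):
        # A paragraph's text is split across runs (w:r/w:t); join them back up.
        text = "".join(node.text or "" for node in paragraph.iter(f"{W_NS}t"))
        if text.strip():
            lines.append(text.strip())
    return lines
```

Running this on the attached file and searching the output for "legal marital status" would show whether the text is lost at extraction or later, during item detection.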

Integrate new non-spaCy PDF parsing into main Harmony

Description

We have a draft improvement to the PDF parsing logic. This will enable us to eliminate spaCy as a dependency.

The training code is here:
https://github.com/harmonydata/pdf-text-models-amol

The API modification is in the nospacy branch of:
https://github.com/harmonydata/harmonyapi

The modification to the main Python library is in this branch:

git clone -b updated_files_for_forntend https://github.com/Notysoty/harmony.git 

Please quality-control this branch, then merge it into main in all repositories and remove spaCy from all requirements.txt and .toml files.

Rationale

PDF extraction needs improvement.

“marital status” and “mother status” not detected

Hi Thomas,

I just noticed an issue. When I upload the attached files, Harmony doesn’t detect the two items on “marital status” and “mother status”. If there is a quick fix before the pitch tomorrow, it would be great to do it; if not, I’ll think of something to hide it.

files are:
MCS items english.zip
MCS items english.docx

https://mail.google.com/mail/u/0/#inbox/FMfcgzGwHLfLhGlSCslRsbpVFHGcKCxD

Thank you.

Bettina

Harmony should remove digits if every question starts with a digit

Description

If I upload a CSV file like this, Harmony keeps the digits at the start of each question:

1 I feel nervous
2 I feel afraid

Environment

Web Harmony

How to Reproduce

Create a file harmony.csv with the content:

1 I feel nervous
2 I feel afraid

Upload it to the web UI.

You will see digits at the start of all questions


Expected Behavior

Digits should be removed
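A sketch of the rule in the issue title: strip leading numbers only when every non-empty question starts with one, so that a single item which legitimately begins with a number is left alone. The regular expression (digits, an optional `.`, `)` or `:`, then whitespace) and the function name are assumptions about how numbering usually looks, not Harmony's implementation.

```python
import re
from typing import List

# Digits, optionally followed by ".", ")" or ":", then at least one space.
LEADING_NUMBER = re.compile(r"^\s*\d+[.):]?\s+")


def strip_item_numbers(questions: List[str]) -> List[str]:
    """Remove leading item numbers, but only if every non-empty question has one."""
    non_empty = [q for q in questions if q.strip()]
    if non_empty and all(LEADING_NUMBER.match(q) for q in non_empty):
        return [LEADING_NUMBER.sub("", q, count=1) for q in questions]
    return list(questions)


print(strip_item_numbers(["1 I feel nervous", "2 I feel afraid"]))
```

Requiring every item to be numbered is the conservative choice: an item such as "3 or more drinks in a day" inside an otherwise unnumbered list is kept intact.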

Remove empty items from MHC

From John: Thomas, I think we might need a clean-up of the MHC data. There shouldn’t really be a question in there with no text?
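A minimal clean-up sketch, assuming the MHC items can be loaded as a list of dictionaries with the question text under a key such as `question_text`. That record layout and the function name are hypothetical; the real catalogue format may differ.

```python
from typing import Any, Dict, List


def remove_empty_items(items: List[Dict[str, Any]],
                       text_key: str = "question_text") -> List[Dict[str, Any]]:
    """Drop items whose text is missing, None, or whitespace only."""
    return [item for item in items if str(item.get(text_key) or "").strip()]


catalogue = [
    {"question_text": "I feel nervous"},
    {"question_text": ""},
    {"question_text": "   "},
    {"question_text": None},
    {},
]
print(remove_empty_items(catalogue))
```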

Don't match to MHC items if similarity is too low

From Bettina:

I just bumped into the new Harmony feature and wanted to flag the below:

They are definitely different sentences (lost my key, found my car). Is it because they’re not mental-health related? I think that with the new feature we need to rethink the linking to the catalogue, as the link to eating disorders doesn’t make sense. Is there a way to activate/deactivate it when there is no overlap between the uploaded items and the catalogue?
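One way to "deactivate" the catalogue link when there is no real overlap is to show only catalogue instruments whose similarity clears a minimum score, and to show nothing otherwise. The sketch assumes one aggregated similarity score per catalogue instrument is already available; the threshold of 0.5, the top-k cut-off and the names are illustrative placeholders, not tuned values.

```python
from typing import List, Sequence, Tuple


def catalogue_links(scores: Sequence[float], names: Sequence[str],
                    min_similarity: float = 0.5, top_k: int = 3) -> List[Tuple[str, float]]:
    """Return up to top_k (name, score) pairs at or above min_similarity.

    An empty list means "no meaningful overlap", in which case the UI
    would simply not show a catalogue link.
    """
    ranked = sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)
    return [(name, score) for name, score in ranked[:top_k] if score >= min_similarity]


names = ["Anxiety scale", "Depression scale", "Eating disorder scale"]
print(catalogue_links([0.81, 0.62, 0.12], names))
print(catalogue_links([0.20, 0.10, 0.30], names))
```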

ERROR: Could not build wheels for thinc, which is required to install pyproject.toml-based projects

Description

After cloning the repo, I created the Python environment. When I then tried to install the libraries using pip install -r requirements.txt and pip install . I got the error below:

error: command '/usr/bin/gcc' failed with exit code 1
[end of output]
  
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for thinc
Failed to build thinc
ERROR: Could not build wheels for thinc, which is required to install pyproject.toml-based projects
[end of output]

Environment

OS: Ubuntu 23.04
Python: 3.12.2
