harmonydata / harmony
The Harmony Python library: a research tool for psychologists to harmonise data and questionnaire items. Open source.
Home Page: https://harmonydata.ac.uk
License: MIT License
Eve Cheng has experimented with the Word Mover's Distance algorithm, which gives the distance between two sequences of sentence embeddings.
Can Harmony use this to say that the GAD-7 is e.g. 60% similar to the PHQ-9?
See Colab notebook:
https://github.com/harmonydata/experiments/blob/main/harmony_wmd_experiment.ipynb
We also have a demo of Harmony integrated with external data sources: https://harmonycataloguelookup.azurewebsites.net/
Source code is at: https://github.com/harmonydata/harmony_catalogue_lookup_dash
See mockup
https://github.com/harmonydata/hackathon/blob/main/instrument_level.png
The use case would be:
As a research psychologist, I’ve got one small study here and one small study there. Individually they don’t give enough statistical power, but might they together? Can we combine Study A and Study B to get enough statistical power for my research question?
Word Mover's Distance is a candidate, but it's not necessarily how we solve this. It might be too slow, for example.
Maybe a simple solution is just to have a threshold and we report the number of questions in Instrument A matching questions in Instrument B at that threshold.
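The threshold idea could be sketched roughly as follows. This is only an illustration, not Harmony's implementation: it assumes a question-by-question similarity matrix is already available (rows for Instrument A, columns for Instrument B), and the function names count_matches_at_threshold and instrument_overlap are hypothetical.

```python
from typing import Sequence


def count_matches_at_threshold(
    similarity: Sequence[Sequence[float]], threshold: float
) -> int:
    """Count questions in Instrument A that have at least one question in
    Instrument B with similarity >= threshold.

    `similarity[i][j]` is assumed to be the similarity between question i of
    Instrument A and question j of Instrument B (hypothetical input format).
    """
    return sum(
        1 for row in similarity if any(score >= threshold for score in row)
    )


def instrument_overlap(
    similarity: Sequence[Sequence[float]], threshold: float
) -> float:
    """Fraction of Instrument A's questions matched in Instrument B."""
    if not similarity:
        return 0.0
    return count_matches_at_threshold(similarity, threshold) / len(similarity)


# Toy matrix: 3 questions in Instrument A, 2 in Instrument B.
toy_similarity = [
    [0.90, 0.20],
    [0.30, 0.40],
    [0.10, 0.75],
]
print(count_matches_at_threshold(toy_similarity, 0.7))
print(instrument_overlap(toy_similarity, 0.7))
```

Note that this measure is asymmetric: the overlap of A with B can differ from the overlap of B with A, so a report might need to show both directions.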
User finds an instrument on the internet in PDF, HTML or other format. Can we allow them to paste the URL into Harmony?
Convenient way to ingest instruments into the tool
When I tried to install the library I ran into #24. I resolved that issue, but it led me to this one: we haven't specified the required (minimum and maximum) Python version in setup.py/pyproject.toml.
Very minor point, but I wonder if there should be some sort of undefined output at the API level for this sort of thing? Two null strings are being classified as 100% similar. I mean, they are, but it doesn't feel like the right response. https://harmonydata.ac.uk/app/#/model/OkY4K8AEetIBHboCPDWB
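One way to picture an "undefined" result is a thin wrapper that returns None when either text is empty, instead of passing it to the similarity model. This is a sketch under assumptions: safe_similarity and the toy exact_match scorer are hypothetical names, and nothing here reflects how the Harmony API actually computes scores.

```python
from typing import Callable, Optional


def safe_similarity(
    text_a: Optional[str],
    text_b: Optional[str],
    similarity_fn: Callable[[str, str], float],
) -> Optional[float]:
    """Return None (undefined) if either text is missing or blank;
    otherwise delegate to the supplied similarity function."""
    if not text_a or not text_a.strip() or not text_b or not text_b.strip():
        return None
    return similarity_fn(text_a, text_b)


def exact_match(a: str, b: str) -> float:
    """Toy stand-in for a real similarity model (assumption for the demo)."""
    return 1.0 if a.strip().lower() == b.strip().lower() else 0.0


print(safe_similarity("", "", exact_match))
print(safe_similarity("I feel nervous", "i feel nervous", exact_match))
```

A caller (for example the web UI) could then render None as "n/a" rather than as 100%.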
As an addition to PDF and Excel import. Requested by one user.
Many NGOs are gathering data in tools such as Google Forms
CSVs not handled on API side
The XLSX parser takes the first line, even though the downloadable template includes headers on the first line. Given that the example template uses headers, I think it would make sense for the parser to discard the first line.
https://harmonydata.ac.uk/app/#/model/kG4Lo2T91ADYwtG7zUgE
At the end of the Harmony user journey when the user exports results as xlsx ( https://youtu.be/CqAsrY74zNM ), can we generate either some Python code or a Colab or Jupyter notebook allowing them to analyse their datasets?
This would be a nice feature to streamline the whole data harmonisation process.
We could offer the option to do it in the Web UI but by helping the user complete it on their machine we bypass some confidentiality issues (if user is not allowed to upload raw data to internet)
An open question: how can the user do a statistical analysis that incorporates the Harmony scores? Do we make a new variable, e.g. 0.65 × Instr1Ques1 + 0.33 × Instr2Ques5 etc., and then do statistical tests on it?
Allow uploading a .html file with load_instruments_from_local_file. It should then remove all HTML tags etc. and create the instruments.
Sometimes people have an instrument in HTML format.
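The tag-stripping step could look something like the sketch below, which uses only Python's standard-library html.parser. It is an assumption about one possible approach, not the library's code: html_to_lines and _TextExtractor are hypothetical names, and wiring the result into load_instruments_from_local_file is left out.

```python
from html.parser import HTMLParser
from typing import List


class _TextExtractor(HTMLParser):
    """Collect visible text chunks, skipping <script> and <style> content."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text outside script/style blocks.
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self.chunks.append(text)


def html_to_lines(html: str) -> List[str]:
    """Return the text chunks found in an HTML string, one per text node."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


sample = (
    "<html><head><style>p {color: red}</style></head><body>"
    "<ol><li>I feel nervous</li><li>I feel afraid</li></ol>"
    "<script>var x = 1;</script></body></html>"
)
print(html_to_lines(sample))
```

One limitation: text split by inline tags (such as bold inside a list item) comes out as separate chunks, so a real importer would probably need to rejoin chunks per block-level element before treating them as questions.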
When I upload a Word document with survey items, not all of them are extracted by Harmony. In the attached file, Harmony doesn't read "legal marital status".
MCS all items english.docx
Provide details regarding the operating system, toolchain, and environment.
Harmony reads all items on the list
The questions do not seem to refresh when a different PDF is uploaded. A full page refresh seems to fix this issue.
We have a draft improvement to the PDF parsing logic. This will enable us to eliminate spaCy as a dependency.
The training code is here:
https://github.com/harmonydata/pdf-text-models-amol
The API modification is here
https://github.com/harmonydata/harmonyapi branch nospacy
The modification to the main python library is in
git clone -b updated_files_for_forntend https://github.com/Notysoty/harmony.git
Please quality control this branch, then merge it into main in all repositories and remove spaCy from all requirements.txt and toml files.
PDF extraction needs improvement.
Hi Thomas,
I just noticed an issue. When I try to upload the attached files, Harmony doesn’t detect the two items on “marital status” and “mother status”. If there is a quick fix before the pitch tomorrow it would be great to do it; if not, I’ll think of something to hide it.
files are:
MCS items english.zip
MCS items english.docx
https://mail.google.com/mail/u/0/#inbox/FMfcgzGwHLfLhGlSCslRsbpVFHGcKCxD
Thank you.
Bettina
Attached are two identical lists of questions, with the Spanish one translated from English.
If I upload a CSV file like this, Harmony puts digits at the start of each question
1 I feel nervous
2 I feel afraid
Web Harmony
Make file harmony.csv
with content
1 I feel nervous
2 I feel afraid
Upload to web UI
You will see digits at the start of all questions
Digits should be removed
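A minimal sketch of the expected behaviour: split a leading item number from the question text so the text no longer starts with a digit. The function name split_question_number and the regular expression are assumptions for illustration; they are not taken from Harmony's parser.

```python
import re
from typing import Optional, Tuple

# A leading integer, optionally followed by ".", ")" or ":", then whitespace.
_LEADING_NUMBER = re.compile(r"^\s*(\d+)[.):]?\s+(.*)$")


def split_question_number(line: str) -> Tuple[Optional[str], str]:
    """Return (question_number, question_text) for one line of input.

    If the line does not start with a number, the number is None and the
    text is returned stripped but otherwise unchanged.
    """
    match = _LEADING_NUMBER.match(line)
    if match is None:
        return None, line.strip()
    return match.group(1), match.group(2).strip()


for raw in ["1 I feel nervous", "2 I feel afraid"]:
    print(split_question_number(raw))
```

A caveat of this pattern is that a question which legitimately begins with a number would also have that number split off, so a real fix might only apply it when every line in the file starts with a number.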
From John: Thomas, I think we might need a clean-up of the MHC data. There shouldn’t really be a question in there with no text?
See training data in https://github.com/harmonydata/pdf-questionnaire-extraction
Example CSV file from the UKLLC data: https://fastdatascience.z33.web.core.windows.net/ALSPAC1.csv
Harmony is not parsing this CSV correctly.
It has lost lots of the questions and it has broken on the items containing colons or slashes.
From Bettina:
I just bumped into the new Harmony feature and wanted to flag the below:
They are definitely different sentences… (lost my key, found my car). Is it because it's not mental health related? I think with the new feature we need to rethink the linking to the catalogue, as the link to eating disorders doesn’t make sense. Is there a way to activate/deactivate it when there is no overlap between uploaded items and the catalogue?
After cloning the repo I created the Python environment, and then tried to install the libraries using pip install -r requirements.txt
and pip install .
I got the error below:
error: command '/usr/bin/gcc' failed with exit code 1
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for thinc
Failed to build thinc
ERROR: Could not build wheels for thinc, which is required to install pyproject.toml-based projects
[end of output]
OS: Ubuntu 23.04
Python: 3.12.2