
Data Together

Data Together empowers people to create a decentralized civic layer for the web, leveraging community, trust, and shared interest to steward data they care about.

Find out about who we are, what we do, and how to get involved at https://datatogether.org/!

Organizational structure

We maintain pretty light governance but commit to an annual in-person meeting and quarterly calls:

Quarterly Calls

Quarterly calls are open to everyone, but especially give Data Together partners a chance to sync up on ongoing projects, what is going on in their organizations, and more.

📅 Once per quarter
▶️ Call Playlist: youtube.com/playlist?list=PLtsP3g9LafVul1gCctMYGm9sz5FUWr5bu

Working Openly

We have developed guidelines for working as an open project; these are all contained in this repo:

License

Data Together Documentation Materials are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

People

Contributors

allenpg, b5, dcwalk, frijol, jschell42, machawk1, mhucka, patcon


Issues

Is this the place for other decentralized projects?

Hey there, thank you for submitting an issue!

We are trying to keep issues for feature requests and bug reports. Please
complete the following checklist before creating a new one:

  • feature request

I'd love to bring over some of the projects listed in datatogether/datatogether#3:

- indie web
- network commons (e.g. mesh nets, netCommons, commons-based licensing https://wiki.p2pfoundation.net/Network_Commons_License)
- p2p foundation
- digital justice / data justice / design justice 
    - https://civicquarterly.com/article/two-way-streets/
    - https://datajustice.github.io/report/
    - http://detroitdjc.org/
- community technology
    - http://detroitcommunitytech.org/learning-materials

Adding non-Archiver web-scraping links

While looking up code syntax I found the following blog post and the GitHub repo it references. I wondered whether links such as the examples below should be tracked as non-Archiver web-scraping links under research/web_scraping.

I'm not quite sure what the best format is for folks to add links, comment, and edit, and I don't have a sense of how frequently such a resource would be updated.

I'm interested in people's thoughts on 1) whether this belongs in research/web_scraping or somewhere else, and 2) how to go about a useful PR on the topic, including the preferred tracking format and document organization.
cc @b5 @jeffreyliu @weatherpattern @mhucka

Example links:
http://blog.danwin.com/examples-of-web-scraping-in-python-3-x-for-data-journalists/
https://github.com/stanfordjournalism/search-script-scrape
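For context, here is a minimal sketch of the kind of link extraction these scraping resources cover, using only the Python standard library (the HTML snippet and URLs are made up for illustration):

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags as a page is parsed."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# A tiny inline document stands in for a fetched page;
# a real scraper would first download it with urllib or requests.
sample_html = """
<html><body>
  <a href="http://example.com/dataset1">Dataset 1</a>
  <a href="http://example.com/dataset2">Dataset 2</a>
</body></html>
"""

parser = LinkExtractor()
parser.feed(sample_html)
print(parser.links)
```

The linked resources above cover much richer patterns (pagination, sessions, JavaScript-heavy pages); this just shows the baseline technique.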

Pre-processing coverage data for Data Visualizations

@mhucka has been exploring ways to facilitate visually drilling down into the coverage data (i.e., the public record of all the data held by participating orgs). Discussion of dataviz options here: https://github.com/datatogether/research/tree/master/data_visualization

This will inevitably require pre-processing of the data, partly because you often end up with tens of thousands of items (i.e., URLs) at a given layer of the navigation tree. In addition to pre-processing based on simple analysis of the content, such as running files through FITS to extract content types, there is clearly a need for deeper machine analysis. At the very least you could use entity extraction to identify patterns/topics within a corpus.
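As one concrete (and entirely hypothetical) example of such pre-processing, URLs can be rolled up by hostname and leading path segments, so a visualization shows aggregate counts per branch instead of tens of thousands of leaves. A standard-library sketch:

```python
from collections import Counter
from urllib.parse import urlparse

def rollup(urls, depth=1):
    """Count URLs by hostname plus the first `depth` path segments,
    so each layer of a navigation tree stays a manageable size."""
    counts = Counter()
    for url in urls:
        parts = urlparse(url)
        segments = [s for s in parts.path.split("/") if s][:depth]
        counts[(parts.netloc, *segments)] += 1
    return counts

# Invented example URLs standing in for coverage data.
urls = [
    "http://agency.gov/data/a.csv",
    "http://agency.gov/data/b.csv",
    "http://agency.gov/docs/readme.txt",
]
print(rollup(urls))
```

A real pipeline would combine this with content-based facets (e.g., the FITS-derived content types mentioned above) rather than path structure alone.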

@mhucka has already been working on some of this. Let's rope in a few more people. @chrpr and @mejackreed come to mind.

The ETL pattern seems pretty applicable, and opens opportunities for experimenting with incorporating distributed data and distributed tools into machine analysis pipelines:

  1. aggregate the essential info into a workable dataset (currently tracking info in a SQL database, eventually to be distributed)
  2. analyze that dataset
  3. write the analyzed/reformatted result (e.g., to IPFS)
  4. pass around a reference to the updated/processed/extended dataset (e.g., an IPFS hash)
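A toy sketch of those four steps, with a plain SHA-256 digest standing in for an IPFS content address (the records and field names are invented for illustration):

```python
import hashlib
import json

def aggregate(records):
    # 1. Pull the essential fields into a workable dataset.
    return [{"url": r["url"], "size": r.get("size", 0)} for r in records]

def analyze(dataset):
    # 2. Derive summary statistics from the dataset.
    return {"count": len(dataset),
            "total_size": sum(d["size"] for d in dataset)}

def write_result(result):
    # 3. Serialize the result; a real pipeline would add the blob to IPFS.
    blob = json.dumps(result, sort_keys=True).encode()
    # 4. The digest is what gets passed around in place of the data itself.
    return hashlib.sha256(blob).hexdigest()

records = [{"url": "http://agency.gov/x", "size": 10},
           {"url": "http://agency.gov/y"}]
ref = write_result(analyze(aggregate(records)))
print(ref)
```

The appeal of step 4 is that any downstream consumer can verify it received exactly the analyzed dataset, since the reference is derived from the content.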

Address handling of paywalled articles

In recent additions to this research, I added PDFs of articles that may in some cases be paywalled, even though I managed to find them on the internet. We need to decide on a policy for including such PDFs. Some initial options that come to mind:

  1. Don't worry about it
  2. Remove the PDFs and link to the article websites and let readers sort out access
  3. Remove the PDFs and link to whatever Google Scholar links to

Add README and Templates

Make sure this repo has the following files:

  • README -- README.md
    • Repo badges for: GitHub Project, Slack, License
    • 1-3 sentence description of repository contents
    • Getting Involved section
  • License -- LICENSE
  • Contributing guidelines (minimal, pointing to org-wide) -- .github/CONTRIBUTING.md
  • Issue template -- .github/ISSUE_TEMPLATE.md
  • GitHub description from the 1-3 sentence README blurb

This issue forms part of a project-wide meta-issue

Decide how to construct a test suite

A test suite of archiving cases would be useful. The idea would be to collect a set of example websites to crawl, with different features and levels of complexity, to test crawler/archiving software tools. The cases would range from easy to hard. Test suites like this are well known and are employed in other efforts to demonstrate software compliance. One can also build a lot of tooling around test cases, including drivers and even controlled vocabularies to describe the different features being tested by different cases. (Cf. this test suite in an unrelated domain.)
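To make that concrete, one possible (entirely hypothetical) shape for such cases is a small machine-readable descriptor per site, tagging the features exercised and a difficulty level, so that drivers and a controlled vocabulary can be layered on top:

```python
# Hypothetical test-case descriptors for a crawler/archiver test suite.
# Field names and feature vocabulary are invented for illustration.
CASES = [
    {"id": "static-001", "url": "http://example.test/plain.html",
     "features": ["static-html"], "difficulty": "easy"},
    {"id": "js-003", "url": "http://example.test/spa",
     "features": ["javascript-rendering", "infinite-scroll"],
     "difficulty": "hard"},
]

def select(cases, feature):
    """A driver might pick out the cases exercising one feature
    from the controlled vocabulary."""
    return [c["id"] for c in cases if feature in c["features"]]

print(select(CASES, "javascript-rendering"))
```

A driver built on descriptors like these could run a candidate archiver against each case and report which features it handles, independent of any particular tool.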

Test suites for archivers are something other groups have built to some extent, so an important question to address is how this effort would be situated in the broader space and how it would interact with other people's efforts.

Submit README pull requests for adding comparison research to 3rd-party projects

Slack context: https://archivers.slack.com/archives/C3ZNNHPT7/p1497412282080572

Example: xtuhcy/gecco#33

The idea would be to create a new team with access to the relevant repo, along with documentation of the spreadsheet and an explanation. Co-maintainers would get write access through that team, and also full write access on the spreadsheet (either because it's world-writable or by invite).

Each co-maintainer should ideally be able to further give access to others, if need be, without going through us. (Does this sound alright?)

To Do

  • split the resource into its own repo (like the awesome list, but purely a wrapper for the spreadsheet)? cc @mhucka
  • improve wording per @mhucka's suggestion
  • flesh out a full list of projects to submit to (with PR links)
