
Data Together

Data Together empowers people to create a decentralized civic layer for the web, leveraging community, trust, and shared interest to steward data they care about.

Find out about who we are, what we do, and how to get involved at https://datatogether.org/!

Organizational structure

We maintain pretty light governance but commit to an annual in-person meeting and quarterly calls:

Quarterly Calls

Quarterly calls are open to everyone, but especially give Data Together partners a chance to sync up on ongoing projects, what is going on in their organizations, and more.

📅 Once per quarter
▶️ Call Playlist: youtube.com/playlist?list=PLtsP3g9LafVul1gCctMYGm9sz5FUWr5bu

Working Openly

We have developed guidelines for working as an open project; these are all contained in this repo:

License

Data Together Documentation Materials are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

People

Contributors

allenpg, b5, dcwalk, frijol, jschell42, machawk1, mhucka, patcon


Issues

Is this the place for other decentralized projects?

Hey there, thank you for submitting an issue!

We are trying to keep issues for feature requests and bug reports. Please
complete the following checklist before creating a new one:

  • feature request

I'd love to bring over some of the projects listed in datatogether/datatogether#3:

- indie web
- network commons (e.g. mesh nets, netCommons, commons-based licensing https://wiki.p2pfoundation.net/Network_Commons_License)
- p2p foundation
- digital justice / data justice / design justice 
    - https://civicquarterly.com/article/two-way-streets/
    - https://datajustice.github.io/report/
    - http://detroitdjc.org/
- community technology
    - http://detroitcommunitytech.org/learning-materials

Adding non-Archiver web-scraping links

While looking up code syntax I found the following blog post and the GitHub repo it references. I wondered whether links such as the examples below should be tracked as non-Archiver web-scraping links under research/web_scraping.

I'm not quite sure what the best format is for folks to add links, comment, and edit, and I don't have a sense of how frequently such a resource would be updated.

I'm interested in people's thoughts on 1) whether this belongs in research/web_scraping or somewhere else, and 2) how to go about a useful PR on the topic, including the preferred tracking format and document organization.
cc @b5 @jeffreyliu @weatherpattern @mhucka

Example links:
http://blog.danwin.com/examples-of-web-scraping-in-python-3-x-for-data-journalists/
https://github.com/stanfordjournalism/search-script-scrape
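For context, here is a minimal sketch of the kind of link extraction these scraping resources cover, using only the Python standard library (the HTML snippet and URLs are made up for illustration):

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags as a page is parsed."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# A tiny inline document stands in for a fetched page;
# a real scraper would first download it with urllib or requests.
sample_html = """
<html><body>
  <a href="http://example.com/dataset1">Dataset 1</a>
  <a href="http://example.com/dataset2">Dataset 2</a>
</body></html>
"""

parser = LinkExtractor()
parser.feed(sample_html)
print(parser.links)
```

The linked resources above cover much richer patterns (pagination, sessions, JavaScript-heavy pages); this just shows the baseline technique.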

Pre-processing coverage data for Data Visualizations

@mhucka has been exploring ways to facilitate visually drilling down into the coverage data (i.e., the public record of all the data held by participating orgs). Discussion of dataviz options here: https://github.com/datatogether/research/tree/master/data_visualization

This will inevitably require pre-processing of the data, partly because you often end up with tens of thousands of items (i.e., URLs) at a given layer of the navigation tree. In addition to pre-processing based on simple analysis of the content, such as running files through FITS to extract content types, there is clearly a need for deeper machine analysis. At the very least you could use entity extraction to identify patterns/topics within a corpus.
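As one concrete (and entirely hypothetical) example of such pre-processing, URLs can be rolled up by hostname and leading path segments, so a visualization shows aggregate counts per branch instead of tens of thousands of leaves. A standard-library sketch:

```python
from collections import Counter
from urllib.parse import urlparse

def rollup(urls, depth=1):
    """Count URLs by hostname plus the first `depth` path segments,
    so each layer of a navigation tree stays a manageable size."""
    counts = Counter()
    for url in urls:
        parts = urlparse(url)
        segments = [s for s in parts.path.split("/") if s][:depth]
        counts[(parts.netloc, *segments)] += 1
    return counts

# Invented example URLs standing in for coverage data.
urls = [
    "http://agency.gov/data/a.csv",
    "http://agency.gov/data/b.csv",
    "http://agency.gov/docs/readme.txt",
]
print(rollup(urls))
```

A real pipeline would combine this with content-based facets (e.g., the FITS-derived content types mentioned above) rather than path structure alone.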

@mhucka has already been working on some of this. Let's rope in a few more people. @chrpr and @mejackreed come to mind.

The ETL pattern seems pretty applicable, and opens opportunities for experimenting with incorporating distributed data and distributed tools into machine analysis pipelines:

  1. aggregate the essential info into a workable dataset (currently tracking info in a SQL database, eventually to be distributed)
  2. analyze that dataset
  3. write the analyzed/reformatted result (e.g., to IPFS)
  4. pass around a reference to the updated/processed/extended dataset (e.g., an IPFS hash)
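A toy sketch of those four steps, with a plain SHA-256 digest standing in for an IPFS content address (the records and field names are invented for illustration):

```python
import hashlib
import json

def aggregate(records):
    # 1. Pull the essential fields into a workable dataset.
    return [{"url": r["url"], "size": r.get("size", 0)} for r in records]

def analyze(dataset):
    # 2. Derive summary statistics from the dataset.
    return {"count": len(dataset),
            "total_size": sum(d["size"] for d in dataset)}

def write_result(result):
    # 3. Serialize the result; a real pipeline would add the blob to IPFS.
    blob = json.dumps(result, sort_keys=True).encode()
    # 4. The digest is what gets passed around in place of the data itself.
    return hashlib.sha256(blob).hexdigest()

records = [{"url": "http://agency.gov/x", "size": 10},
           {"url": "http://agency.gov/y"}]
ref = write_result(analyze(aggregate(records)))
print(ref)
```

The appeal of step 4 is that any downstream consumer can verify it received exactly the analyzed dataset, since the reference is derived from the content.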

Address handling of paywalled articles

In recent additions to this research, I added PDFs of articles that may in some cases be paywalled, even though I managed to find them on the internet. We need to decide on a policy for including such PDFs. Some initial options that come to mind:

  1. Don't worry about it
  2. Remove the PDFs and link to the article websites and let readers sort out access
  3. Remove the PDFs and link to whatever Google Scholar links to

Add README and Templates

Make sure this repo has the following files:

  • README -- README.md
    • Repo badges for: GitHub Project, Slack, License
    • 1-3 sentence description of repository contents
    • Getting Involved section
  • License -- LICENSE
  • Contributing guidelines (minimal, pointing to org-wide) -- .github/CONTRIBUTING.md
  • Issue template -- .github/ISSUE_TEMPLATE.md
  • GitHub description from the 1-3 sentence README blurb

This issue forms part of a project-wide meta-issue

Decide how to construct a test suite

A test suite of archiving cases would be useful. The idea would be to collect a set of example websites to crawl, with different features and levels of complexity, to test crawler/archiving software tools. The cases would range from easy to hard. Test suites like this are well known and are employed in other efforts to demonstrate software compliance. One can also build a lot of tooling around test cases, including drivers and even controlled vocabularies to describe the different features being tested by different cases. (Cf. this test suite in an unrelated domain.)
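To make that concrete, one possible (entirely hypothetical) shape for such cases is a small machine-readable descriptor per site, tagging the features exercised and a difficulty level, so that drivers and a controlled vocabulary can be layered on top:

```python
# Hypothetical test-case descriptors for a crawler/archiver test suite.
# Field names and feature vocabulary are invented for illustration.
CASES = [
    {"id": "static-001", "url": "http://example.test/plain.html",
     "features": ["static-html"], "difficulty": "easy"},
    {"id": "js-003", "url": "http://example.test/spa",
     "features": ["javascript-rendering", "infinite-scroll"],
     "difficulty": "hard"},
]

def select(cases, feature):
    """A driver might pick out the cases exercising one feature
    from the controlled vocabulary."""
    return [c["id"] for c in cases if feature in c["features"]]

print(select(CASES, "javascript-rendering"))
```

A driver built on descriptors like these could run a candidate archiver against each case and report which features it handles, independent of any particular tool.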

Test suites for archivers are something other groups have built to some extent, so an important question to address is how this effort would be situated in the broader space and how it would interact with other people's efforts.

Submit README pull requests for adding comparison research to 3rd-party projects

Slack context: https://archivers.slack.com/archives/C3ZNNHPT7/p1497412282080572

Example: xtuhcy/gecco#33

The idea would be to create a new team with access to the relevant repo, along with documentation of the spreadsheet and an explanation. Co-maintainers would get write access through that team, and also full write access on the spreadsheet (either because it's world-writable or by invite).

Each co-maintainer should ideally be able to further give access to others, if need be, without going through us. (Does this sound alright?)

To Do

  • split the resource into its own repo (like the awesome list, but purely a wrapper for the spreadsheet)? cc @mhucka
  • improve wording per @mhucka's suggestion
  • flesh out a full list of projects to submit to (with PR links)
