
Comments (3)

endomorphosis commented on June 26, 2024

https://github.com/endomorphosis/opendatahack
https://www.encode.club/open-data-hack

from devgrants.

endomorphosis commented on June 26, 2024

Updates
After the hackathon, a number of people reached out to me wanting to take on the task of building a KNN database on Filecoin / IPFS. I am currently working on the entire stack of a consumer-facing legal chat avatar system, so I cannot also dedicate my full time to building an entirely novel KNN database. However, my brother thought it was a good idea and wants to learn to code, and I told him that going to community college is a waste of time. So I proposed to the people who contacted me that I would be happy to oversee the project and have them oversee my brother. He and I will take no compensation; the others have posted their compensation expectations here.

I have told them that I expect each of them to purchase a GPU workstation or server to serve as their local development environment, IPFS node, and GPU inference node, with 8 cores, 128 GB of RAM, an Nvidia 3090, a 2 TB OS SSD, a 2 TB NVMe cache, and 16 TB of mirrored spinning disk.

I estimate that it will take one full year to complete this proposal, with milestones placed quarterly. I therefore propose a total budget of $200k, with $50k paid quarterly.

Persons:

Team Lead:
Benjamin Jay Barber (Endomorphosis)
Responsibilities: Algorithm design, supervision, and emergency coding needs.
Salary expectations: $0, unless I have to take on a production software engineering role

Programmers:

J.G Wentworth:
Responsibilities: Documentation, HTML/CSS landing pages
Salary expectations: $0 (novice student, family member of Benjamin Barber)

GrandZero:
Responsibilities: IPFS, geo replication / routing for query endpoints.
Salary expectations: $4k / mo

Hansel:
Responsibilities: Implement KNN and ANN algorithms; OpenAI / Hugging Face API integrations
Salary expectations: $4k / mo

Twinstar:
Responsibilities: Database design
Salary expectations: $4k / mo

Swimmer:
Responsibilities: Integration, ANN algorithms, embeddings
Salary expectations: $4k / mo

Consultants (as needed):

Mwni (Blockchain expert, runs a DEX) (partner of endomorphosis in Hallucinate LLC)
Salary expectations: $150 / hour

Danukeru (ML expert, runs Colocation and ML services business ‘Social Grep’)
Salary expectations: $150 / hour
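
As a rough check of the figures above against the proposed $200k budget, the arithmetic can be sketched as follows. This assumes that only the four paid programmers draw a monthly salary for the full 12 months and that whatever remains goes entirely to consultant hours; hardware purchases and any other costs are not modelled, so treat the result as an illustration rather than a committed breakdown.

```python
# Figures taken from the proposal text; the split of the remainder is an assumption.
MONTHS = 12
PROGRAMMER_MONTHLY_USD = 4_000
PAID_PROGRAMMERS = ["GrandZero", "Hansel", "Twinstar", "Swimmer"]
TOTAL_BUDGET_USD = 200_000
CONSULTANT_HOURLY_USD = 150

# Total programmer salaries over one year.
salaries = len(PAID_PROGRAMMERS) * PROGRAMMER_MONTHLY_USD * MONTHS

# Assumption: everything left over is available for consultants.
remainder = TOTAL_BUDGET_USD - salaries
consultant_hours = remainder / CONSULTANT_HOURLY_USD

print(salaries)                    # 192000
print(remainder)                   # 8000
print(round(consultant_hours, 1))  # 53.3
```

Under these assumptions, salaries take $192k of the $200k, leaving about 53 consultant hours across the year.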

CITATIONS:
#923 (comment)

#1662

https://discuss.ipfs.tech/t/programmatically-set-cids/15220/9

https://github.com/endomorphosis/opendatahack

https://youtu.be/Xq-VEPJ4Rhg?si=J3qTKZ90TH8hI3Hq&t=5369

https://www.blog.encode.club/open-data-hack-powered-by-filecoin-prizewinners-and-summary-dcdf52059867

DESIGN
Phase 1:
The database will start from a NoSQL-style structure. We will begin by implementing an IPFS bridge to Hugging Face datasets. Once we can import and export data between IPFS and the Hugging Face datasets library, we will focus on indexing each dataset with an embedding library, chosen according to the datatypes in the dataset, to generate vector representations. A digest of the vector representations and their associated dataset members will be generated; it can be exported to IPFS and imported again on load. Finally, there will be a query method for searching a specific Hugging Face dataset with an arbitrary input type (image, sound, etc.): the input will be converted to text, and that text will be rewritten through prompt engineering before the search embeddings are generated.
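
To make the Phase 1 flow concrete, here is a minimal sketch of the index, digest, and query steps using only the Python standard library. Everything in it is a stand-in: `toy_embed` is a hypothetical hashed bag-of-words function rather than a real embedding model, the rows are made-up strings rather than a Hugging Face dataset, and `content_id` is a plain SHA-256 hash used as a placeholder for an IPFS CID (it is not a valid CID). The search is a brute-force KNN over cosine similarity, not an ANN index.

```python
import hashlib
import json
import math
import re

DIM = 256  # size of the toy embedding vectors (arbitrary choice)

def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for an embedding model: normalized hashed bag of words."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        bucket = int(hashlib.sha256(token.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec

def build_digest(rows: list[str]) -> str:
    """Serialize the rows and their vectors into one JSON document."""
    entries = [{"text": r, "vector": toy_embed(r)} for r in rows]
    return json.dumps({"dim": DIM, "entries": entries}, sort_keys=True)

def content_id(digest: str) -> str:
    """SHA-256 of the digest; a placeholder for an IPFS CID, not a valid one."""
    return hashlib.sha256(digest.encode()).hexdigest()

def knn_query(digest: str, query: str, k: int = 2) -> list[str]:
    """Load a digest and return the k rows most similar to the query text."""
    entries = json.loads(digest)["entries"]
    q = toy_embed(query)
    # Vectors are unit length, so the dot product is the cosine similarity.
    scored = [(sum(a * b for a, b in zip(q, e["vector"])), e["text"]) for e in entries]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]

rows = [
    "court opinion on contract law",
    "recipe for apple pie",
    "statute about contract disputes",
]
digest = build_digest(rows)          # this string is what would be exported
print(content_id(digest)[:16])       # placeholder identifier for the digest
print(knn_query(digest, "contract law", k=2))
```

Because the digest is a single self-describing document, it can be re-imported on load and queried without recomputing embeddings, which is the property the design above relies on.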

Phase 2:
A complete web service that automates the caching, indexing, and querying of any arbitrary dataset from Hugging Face on IPFS, given that some amount has been sent to the address, handled by some sort of smart contract.

Phase 3:
Attempt full ACID compliance using locks, journals, indexes, and other methods, so that there is never any inconsistency within the database and every query is full, up to date, and complete, by doing something like this. We also ought to be able to join a traditional SQL query with a KNN sort.
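
One way to picture joining a traditional SQL query with a KNN sort is the sketch below, which uses SQLite from the Python standard library purely as an illustration; it is not the proposed database. The table names, the sample rows, and the `l2_distance` helper are all hypothetical. Vectors are stored as JSON text, a user-defined SQL function computes Euclidean distance, and the inserts run inside a single transaction so both tables commit together or not at all.

```python
import json
import math
import sqlite3

def l2_distance(stored_json: str, query_json: str) -> float:
    """Euclidean distance between two vectors stored as JSON text."""
    a, b = json.loads(stored_json), json.loads(query_json)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

conn = sqlite3.connect(":memory:")
# Expose the Python function to SQL so it can be used in ORDER BY.
conn.create_function("l2_distance", 2, l2_distance)

conn.executescript("""
    CREATE TABLE docs (id INTEGER PRIMARY KEY, court TEXT, title TEXT);
    CREATE TABLE vectors (doc_id INTEGER REFERENCES docs(id), embedding TEXT);
""")

# Hypothetical sample data: (id, court, title, embedding).
docs = [
    (1, "ninth", "Opinion A", [0.0, 0.0]),
    (2, "ninth", "Opinion B", [1.0, 1.0]),
    (3, "fifth", "Opinion C", [0.1, 0.1]),
    (4, "ninth", "Opinion D", [0.2, 0.0]),
]
with conn:  # one transaction: commits on success, rolls back on error
    for doc_id, court, title, emb in docs:
        conn.execute("INSERT INTO docs VALUES (?, ?, ?)", (doc_id, court, title))
        conn.execute("INSERT INTO vectors VALUES (?, ?)", (doc_id, json.dumps(emb)))

def knn_in_court(court: str, query_vec: list[float], k: int) -> list[str]:
    """Relational filter and join first, then sort the survivors by distance."""
    rows = conn.execute(
        """
        SELECT d.title
        FROM docs AS d
        JOIN vectors AS v ON v.doc_id = d.id
        WHERE d.court = ?
        ORDER BY l2_distance(v.embedding, ?)
        LIMIT ?
        """,
        (court, json.dumps(query_vec), k),
    ).fetchall()
    return [title for (title,) in rows]

print(knn_in_court("ninth", [0.0, 0.0], k=2))  # ['Opinion A', 'Opinion D']
```

Note that Opinion C is closer to the query vector than Opinion D but is excluded by the `WHERE` clause, which is the behaviour a combined SQL and KNN query should have. This brute-force sort scans every matching row; a real implementation would need an ANN index to scale.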

Phase 4:
We will host the entirety of the Free.law dataset on IPFS and create a query service for searching that database, using either Cloudflare's embeddings API service or Filecoin, if they get a good GPU inference network going in time.


ErinOCon commented on June 26, 2024

Hi @endomorphosis, this issue has been closed. If you have an interest in submitting a project for review in the future, be sure to use one of our applicable templates. This will ensure the proposal follows the correct review pipeline.

If you have outstanding questions for our team, please contact us at [email protected].

