Comments (3)
https://github.com/endomorphosis/opendatahack
https://www.encode.club/open-data-hack
from devgrants.
Updates
After the hackathon, a number of people reached out to me wanting to take on the task of building a KNN database on Filecoin / IPFS. I am currently working on the entire stack of a consumer-facing legal chat avatar system, so I cannot dedicate my full time to building an entirely novel KNN database as well. However, my brother wants to learn to code, and I told him that going to community college is a waste of time. So I proposed to the people who contacted me that I would be happy to oversee the project and have them oversee my brother. He and I will take no compensation; the others have posted their expected compensation below.
I have told them that I expect each of them to purchase a GPU workstation or server to serve as their local development environment, IPFS node, and GPU cluster inference node, with 8 cores, 128GB RAM, an Nvidia 3090, a 2TB OS SSD, a 2TB NVMe cache, and 16TB of mirrored spinning disk. I estimate that the proposal will take one full year to complete, with milestones set quarterly. I therefore propose a total budget of $200k, paid as $50k per quarter.
Persons:
Team Lead:
Benjamin Jay Barber (Endomorphosis)
Responsibilities: Algorithm design, supervision, and emergency coding needs.
Salary expectations (0), unless I have to take on a production software engineering role
Programmers:
J.G Wentworth:
Responsibilities: Documentation, HTML/CSS landing pages
Salary expectations (0): novice student, family member of Benjamin Barber
GrandZero:
Responsibilities: IPFS, geo replication / routing for query endpoints.
Salary expectations $4k / mo
Hansel:
Responsibilities: Implement KNN and ANN algorithms; OpenAI / Hugging Face API integrations
Salary expectations $4k / mo
Twinstar:
Responsibilities: Database design
Salary expectations $4k / mo
Swimmer:
Responsibilities: Integration, ANN algorithms, embeddings
Salary expectations: $4k / mo
Consultants (as needed):
Mwni (Blockchain expert, runs a DEX) (partner of endomorphosis in Hallucinate LLC)
Salary expectations: $150 hourly
Danukeru (ML expert, runs Colocation and ML services business ‘Social Grep’)
Salary expectations: $150 hourly
CITATIONS:
#923 (comment)
https://discuss.ipfs.tech/t/programmatically-set-cids/15220/9
https://github.com/endomorphosis/opendatahack
https://youtu.be/Xq-VEPJ4Rhg?si=J3qTKZ90TH8hI3Hq&t=5369
DESIGN
Phase 1:
The database will start from a NoSQL-style structure. We will begin by implementing an IPFS bridge to Hugging Face datasets. Once we can import and export data between IPFS and the Hugging Face datasets library, we will focus on indexing each dataset with an embedding library, chosen according to the data types in the dataset, to generate vector representations. A digest of the vector representations and their associated dataset members will be generated; it can be exported to IPFS and imported again on load. Finally, there will be a query method for a given Hugging Face dataset that accepts an arbitrary input type (image, sound, etc.), converts the input to text, and rewrites that text through prompt engineering before generating the search embeddings.
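The index, digest, and query steps of Phase 1 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's implementation: `toy_embed` is a hypothetical hashed bag-of-words stand-in for a real embedding model, the SHA-256 hex digest stands in for an IPFS content identifier (a real CID is encoded differently), and all function names are invented for this example. The Hugging Face and IPFS bridges are left out entirely.

```python
import hashlib
import json
import math

DIM = 64  # hypothetical embedding width for this toy example


def toy_embed(text):
    """Stand-in for a real embedding model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        bucket = int(hashlib.sha256(token.encode("utf-8")).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def build_index(records):
    """Pair every dataset member with its vector representation."""
    return [{"id": i, "text": t, "vector": toy_embed(t)} for i, t in enumerate(records)]


def export_index(index):
    """Serialize the index deterministically and return (payload, digest)."""
    payload = json.dumps(index, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return payload, digest


def import_index(payload, expected_digest):
    """Reload an exported index, refusing payloads whose digest does not match."""
    actual = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    if actual != expected_digest:
        raise ValueError("index payload does not match its digest")
    return json.loads(payload)


def knn_query(index, query_text, k=3):
    """Brute-force k-nearest-neighbour search by cosine similarity.

    Vectors are already unit length, so the dot product is the cosine.
    Returns (score, id, text) tuples, best match first.
    """
    q = toy_embed(query_text)
    scored = [
        (sum(a * b for a, b in zip(q, entry["vector"])), entry["id"], entry["text"])
        for entry in index
    ]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return scored[:k]
```

A caller would run `build_index` over the dataset rows, publish the payload from `export_index` under its digest, and later call `import_index` followed by `knn_query`. The brute-force scan is linear in dataset size; an ANN structure would replace it for large datasets.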
Phase 2:
A complete web service that will automate the caching, indexing, and querying of any arbitrary dataset from Hugging Face on IPFS, once some amount has been sent to the address through some sort of smart contract.
Phase 3:
Attempt full ACID compliance, using locks, journals, indexes, and other methods, so that there is never any inconsistency within the database and every query result is full, up to date, and complete, doing something like this. Regardless, we ought to be able to join a traditional SQL query with a KNN sort.
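One way to picture joining a traditional SQL query with a KNN sort is the sketch below, which uses Python's built-in `sqlite3` module. It is an assumption-laden illustration rather than the proposed design: the `docs` table, its `court` and `year` columns, the JSON-encoded `embedding` column, and the function names are all hypothetical. A Python distance function is registered with SQLite so that the `WHERE` clause filters rows and `ORDER BY` ranks the survivors by distance to the query vector. Inserts run inside a transaction, so a failed batch leaves no partial rows behind.

```python
import json
import math
import sqlite3


def cosine_distance(vec_json, query_json):
    """SQL-callable cosine distance between two JSON-encoded vectors."""
    a, b = json.loads(vec_json), json.loads(query_json)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def open_db():
    """Create an in-memory database with the distance function registered."""
    conn = sqlite3.connect(":memory:")
    conn.create_function("cosine_distance", 2, cosine_distance)
    conn.execute(
        "CREATE TABLE docs ("
        "id INTEGER PRIMARY KEY, court TEXT NOT NULL, "
        "year INTEGER NOT NULL, embedding TEXT NOT NULL)"
    )
    return conn


def insert_docs(conn, rows):
    """Insert rows atomically: commit if all succeed, roll back if any fails."""
    with conn:
        conn.executemany(
            "INSERT INTO docs (id, court, year, embedding) VALUES (?, ?, ?, ?)",
            [(i, court, year, json.dumps(vec)) for i, court, year, vec in rows],
        )


def filtered_knn(conn, query_vec, court, min_year, k):
    """Relational filter first, then nearest-neighbour ordering, in one query."""
    sql = (
        "SELECT id, cosine_distance(embedding, ?) AS dist FROM docs "
        "WHERE court = ? AND year >= ? "
        "ORDER BY dist ASC, id ASC LIMIT ?"
    )
    return conn.execute(sql, (json.dumps(query_vec), court, min_year, k)).fetchall()
```

Computing the distance row by row inside the query is only practical for small filtered sets. A real system would more likely push the ranking into a vector index and intersect it with the SQL filter.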
Phase 4:
We host the entirety of the Free.law dataset on IPFS and create a query service for searching that database, using either the Cloudflare embeddings API service or Filecoin, if they get a good GPU inference network going in time.
Hi @endomorphosis, this issue has been closed. If you have an interest in submitting a project for review in the future, be sure to use one of our applicable templates. This will ensure the proposal follows the correct review pipeline.
If you have outstanding questions for our team, please contact us at [email protected].