Comments (2)
The goal for bulk is to remain a lightweight tool that just takes care of the "bulk annotation" bit. If you want to cluster the data yourself upfront and use those clusters as colors then you can totally already do that, just as long as the resulting `.csv` file has an `x` and a `y` column. One challenge with this approach is that bulk will only ever allow two dimensions to be drawn, so you'll also need to think about dimensionality reduction.
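A rough sketch of what that upfront preparation could look like in a notebook. Everything here is an assumption for illustration: the random embeddings stand in for real ones, PCA is swapped in as a simple stand-in for whichever dimensionality reduction you prefer, and the helper name `prepare_bulk_csv`, the file name `ready.csv`, and the `text` and `color` column names are hypothetical, so check the bulk docs for the columns it actually reads.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA


def prepare_bulk_csv(embeddings, texts, n_clusters=3, path="ready.csv", seed=0):
    """Reduce embeddings to 2D, cluster them, and write a csv with x/y columns.

    Hypothetical helper: the `text` and `color` column names are assumptions.
    """
    # Two dimensions only, because that is all that can be drawn.
    coords = PCA(n_components=2, random_state=seed).fit_transform(embeddings)

    # Cluster on the full embeddings, not on the 2D projection.
    clusters = KMeans(
        n_clusters=n_clusters, n_init=10, random_state=seed
    ).fit_predict(embeddings)

    df = pd.DataFrame(
        {
            "text": texts,
            "x": coords[:, 0],
            "y": coords[:, 1],
            "color": clusters.astype(str),  # cluster id reused as a color key
        }
    )
    df.to_csv(path, index=False)
    return df


# Placeholder data standing in for real embeddings.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(60, 16))
texts = [f"example {i}" for i in range(60)]

df = prepare_bulk_csv(embeddings, texts)
```

The clustering runs on the original embeddings and only the plotting coordinates are reduced, so the colors reflect structure in the full space and not just in the projection.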
In my experience clustering is a hard problem to get right because it's difficult to "know" whether you've done it well. There's no metric like "accuracy" that you can use in hindsight to help you compare approaches.
Instead, I prefer to just lower the dimensionality of an embedded dataset and eyeball whether clusters appear. If they do, and I can confirm by inspecting, then I attach a label. From here it's a classification problem, which allows me to circumvent the need for clustering.
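To make the "from here it's a classification problem" step concrete, here is a minimal sketch. It assumes a handful of points have already been labelled by hand after inspecting a selection; the synthetic embeddings, the label names `topic_a` and `topic_b`, and the choice of logistic regression are all illustrative assumptions and not something bulk prescribes.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic embeddings: two well-separated groups of 50 points each.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=-2.0, size=(50, 8))
group_b = rng.normal(loc=2.0, size=(50, 8))
embeddings = np.vstack([group_a, group_b])

# Pretend a few points from each visible blob were selected and labelled by hand.
labels = np.array(["unlabeled"] * 100, dtype=object)
labels[:10] = "topic_a"
labels[50:60] = "topic_b"

# Train on the labelled subset only.
mask = labels != "unlabeled"
clf = LogisticRegression(max_iter=1000)
clf.fit(embeddings[mask], labels[mask].astype(str))

# Predict the remaining points; accuracy is now a usable metric
# once some of these predictions are checked against held-out labels.
predicted = clf.predict(embeddings[~mask])
```

Unlike a clustering result, these predictions can be checked against a held-out set of labelled examples, which is the comparison that clustering lacks.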
Feel free to tell me if I misinterpreted the request, but as-is I don't think this library needs to concern itself with clustering. That's something you're free to do upfront in a notebook as you prepare a csv file for this tool.