breck7 / research
I moved this folder. Keeping this repo up for archival purposes only.
Home Page: https://github.com/breck7/breckyunits.com/tree/main/research
Imagine Ohayo actually gets good at some point and it'd be fun to use it to explore all the world's data. Cloud latency would be bad, so buying an external SSD or SD card preloaded with information might be neat. What would you want on there?
Some ideas:
All of sci-hub 55TB
All of Wikipedia English Text + Media + Edits 10TB
All of Wikipedia Media 30GB
Can someone make a tree document containing all the Unicode characters, with info for each? Probably want to use TreeBase?
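A minimal sketch of generating such a document for a range of code points, assuming a made-up layout of one node per code point. It records only the hex code point, the character itself, and its General Category, which JavaScript exposes through Unicode property escapes. Control, format, and separator characters would break the line-based layout, so their `char` line is skipped. Character names are not available from the language and would have to come from the Unicode Character Database (UnicodeData.txt). This sketch does not use TreeBase.

```typescript
// Two-letter Unicode General Category values; each works as \p{..}
// in a JavaScript regular expression with the "u" flag.
const GENERAL_CATEGORIES = [
  "Lu", "Ll", "Lt", "Lm", "Lo", "Mn", "Mc", "Me", "Nd", "Nl", "No",
  "Pc", "Pd", "Ps", "Pe", "Pi", "Pf", "Po", "Sm", "Sc", "Sk", "So",
  "Zs", "Zl", "Zp", "Cc", "Cf", "Cs", "Co", "Cn",
];
const CATEGORY_TESTS = GENERAL_CATEGORIES.map(
  (cat) => [cat, new RegExp(`^\\p{${cat}}$`, "u")] as const
);

function generalCategory(ch: string): string {
  const hit = CATEGORY_TESTS.find(([, re]) => re.test(ch));
  return hit ? hit[0] : "unknown";
}

// Hypothetical layout, one top-level node per code point:
// U+0041
//  char A
//  category Lu
function unicodeTree(start: number, end: number): string {
  const lines: string[] = [];
  for (let cp = start; cp <= end; cp++) {
    const glyph = String.fromCodePoint(cp);
    const category = generalCategory(glyph);
    lines.push("U+" + cp.toString(16).toUpperCase().padStart(4, "0"));
    // Skip the raw character for C* (control, format, surrogate, private-use,
    // unassigned) and Z* (separator) categories so every line stays printable.
    if (!/^[CZ]/.test(category)) lines.push(" char " + glyph);
    lines.push(" category " + category);
  }
  return lines.join("\n");
}

console.log(unicodeTree(0x41, 0x42));
```

Covering every code point (0 through 0x10FFFF, about 1.1 million nodes) would probably call for one file per Unicode block or plane rather than a single document.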
Continue the exploration of Tree based dimension reducers
Feedback from Garth.
Add a "Visualize" button to Designer. Let's pivot away from orbs and double down on lego blocks.
Vis button should make a lego-blocks-style visualization. Could perhaps shrink the font size to fit a word in a cell. Could use the 3rd dimension (z axis) to show which cells are extending.
For the 3rd dimension, perhaps Mahjong would be a good analogy.
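As a rough sketch of the lego-block idea, each word of a tree document could become one block: its column is its indentation plus its word index, its row is the line number, and its z value is the indentation depth, so nested lines visibly stack on their parents. The font size shrinks so the word fits in a fixed-width cell. The 48px cell, the 16px maximum, the 0.6 character-width ratio, and all names are assumptions, not Designer's actual code.

```typescript
interface Block {
  x: number; // grid column: indentation + word index
  y: number; // line number
  z: number; // indentation depth, used as height in the 3rd dimension
  word: string;
  fontSize: number; // px, shrunk so the word fits in one cell
}

// Assumption: an average glyph is roughly 0.6em wide.
const CHAR_WIDTH_RATIO = 0.6;

function fitFontSize(word: string, cellWidthPx: number, maxFontSizePx: number): number {
  if (word.length === 0) return maxFontSizePx;
  return Math.min(maxFontSizePx, cellWidthPx / (word.length * CHAR_WIDTH_RATIO));
}

function toBlocks(source: string, cellWidthPx = 48, maxFontSizePx = 16): Block[] {
  const blocks: Block[] = [];
  source.split("\n").forEach((line, y) => {
    const depth = line.search(/\S/); // one space per indentation level
    if (depth < 0) return; // skip blank lines
    line.slice(depth).split(" ").forEach((word, i) => {
      blocks.push({
        x: depth + i,
        y,
        z: depth,
        word,
        fontSize: fitFontSize(word, cellWidthPx, maxFontSizePx),
      });
    });
  });
  return blocks;
}
```

A renderer could then draw each block as a box at `(x, y)` raised by `z`, which is where the Mahjong-style stacking would show up.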
A Tree Language that compiles to d3-format specifiers (or implements its own logic) would be nice: https://github.com/d3/d3-format
It would have the advantage that it would be self-documenting.
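A minimal sketch of the compile-to-d3-format direction. d3-format documents the general form of a specifier as `[[fill]align][sign][symbol][0][width][,][.precision][~][type]`, so a Tree Language can name each part on its own line and emit the parts in that order, which is what keeps the source self-documenting. The keywords (`fill`, `align`, `comma`, `precision`, and so on) are hypothetical, not an existing grammar.

```typescript
// Order of the parts in a d3-format specifier:
// [[fill]align][sign][symbol][0][width][,][.precision][~][type]
const PART_ORDER = ["fill", "align", "sign", "symbol", "zero", "width", "comma", "precision", "trim", "type"] as const;
type Part = (typeof PART_ORDER)[number];

// Hypothetical keywords; "zero", "comma", and "trim" are flags with no value.
function compileFormat(source: string): string {
  const parts = new Map<Part, string>();
  for (const line of source.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed) continue;
    const spaceIndex = trimmed.indexOf(" ");
    const keyword = spaceIndex === -1 ? trimmed : trimmed.slice(0, spaceIndex);
    const value = spaceIndex === -1 ? "" : trimmed.slice(spaceIndex + 1);
    if (!(PART_ORDER as readonly string[]).includes(keyword)) throw new Error(`Unknown keyword: ${keyword}`);
    parts.set(keyword as Part, value);
  }
  if (parts.has("fill") && !parts.has("align")) throw new Error("fill requires align");
  return PART_ORDER.map((part) => {
    if (!parts.has(part)) return "";
    const value = parts.get(part)!;
    switch (part) {
      case "zero": return "0";
      case "comma": return ",";
      case "trim": return "~";
      case "precision": return "." + value;
      default: return value; // fill, align, sign, symbol, width, type
    }
  }).join("");
}
```

For example, the lines `comma`, `precision 2`, `type f` compile to `",.2f"`, and d3's `format(",.2f")(1234.5)` gives `"1,234.50"`.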
Could make a 3D visualization of a TCF app's changes over time, for replaying history, etc.
From conversations with X and F folks, it seems like perhaps trees are not so essential to what we are doing here, and perhaps Tree Notation is just a subset of a bigger field of Spatial Notations: Grid Notation, Cube Notation, etc.
That hunch has come up a lot, but some recent experiments seem to be hinting at it more strongly...
Let's have a simple numeric DSL that supports prefix/infix/postfix all at the same time.
Potentially could use blank lines to separate calculations.
Should perform calculations on vectors and/or matrices.
Perhaps if there's an operator on a line, then the line is treated as a vector.
If a line is just numbers it's treated as a matrix perhaps, and the search is on for an adjacent or parent/child operator for performing an op on the matrix (should have operators for common matrix operations)
a rough example:
```
3 32 2 3 +
2
2
32
23
+
20 21 23
32 12 32
23 32 1
```
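One way to get prefix, infix, and postfix at the same time is to ignore where the operator sits: pull the operator tokens out of a line, check that they all agree, and fold the remaining numbers with that operator. The sketch below only covers scalar `+ - * /` on a single line (no vectors, matrices, or blank-line grouping yet), and the rule that all operators on a line must match is an assumption.

```typescript
type Op = "+" | "-" | "*" | "/";

const OPS: Record<Op, (a: number, b: number) => number> = {
  "+": (a, b) => a + b,
  "-": (a, b) => a - b,
  "*": (a, b) => a * b,
  "/": (a, b) => a / b,
};

function isOp(token: string): token is Op {
  return Object.prototype.hasOwnProperty.call(OPS, token);
}

// Evaluates one line whose operator may come before, between, or after the numbers.
function evaluateLine(line: string): number {
  const tokens = line.trim().split(/\s+/).filter((t) => t.length > 0);
  const ops = tokens.filter(isOp);
  const numbers = tokens.filter((t) => !isOp(t)).map(Number);
  if (numbers.length === 0) throw new Error(`No numbers in: ${line}`);
  if (numbers.some(Number.isNaN)) throw new Error(`Not a number in: ${line}`);
  if (ops.length === 0) throw new Error(`No operator in: ${line}`);
  if (ops.some((op) => op !== ops[0])) throw new Error(`Mixed operators in: ${line}`);
  // Left fold, so "- 20 5 3", "20 - 5 - 3", and "20 5 3 -" all mean (20 - 5) - 3.
  return numbers.reduce(OPS[ops[0]]);
}
```

With this rule the first line of the example, `3 32 2 3 +`, evaluates to 40, the same as `+ 3 32 2 3` or `3 + 32 + 2 + 3`.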
Paraphrasing from a discussion:
When the conversation hits a certain depth, enforce a new grammar: perhaps only numbers and/or links are allowed. The reason is that cutting off flame wars misses out on great antifragile opportunities. Sometimes deep in a flame war is when someone posts the most valuable statistic or link. Perhaps the language could be called "chill". It could allow what used to be flame wars to continue as a more level-headed, scientific discussion.
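A rough sketch of how "chill" might be enforced: shallow replies are unrestricted, and past a chosen reply depth every whitespace-separated token must be a number or a link. The default depth of 5, the number pattern (which also accepts a trailing `%`), and treating only http(s) URLs as links are all assumptions.

```typescript
// Plain, decimal, exponent, or percentage numbers, e.g. 42, -3.5, 1e6, 12%.
const NUMBER_PATTERN = /^[-+]?(\d+(\.\d*)?|\.\d+)(e[-+]?\d+)?%?$/i;
// Assumption: only http(s) links count.
const LINK_PATTERN = /^https?:\/\/\S+$/i;

// Returns true if a reply at `depth` is allowed under the "chill" grammar.
function isChillReply(text: string, depth: number, chillDepth = 5): boolean {
  if (depth < chillDepth) return true; // shallow replies are unrestricted
  const tokens = text.split(/\s+/).filter((t) => t.length > 0);
  return tokens.length > 0 && tokens.every((t) => NUMBER_PATTERN.test(t) || LINK_PATTERN.test(t));
}
```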
For converting OHDSI CDM, let's just dump the SQL schema and write a method to generate a grammar file from it.
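A minimal sketch of the "dump the schema, generate a grammar" approach. It handles only plain `CREATE TABLE name ( ... );` statements, tolerates a dotted schema prefix (including an `@`-style placeholder), drops length or precision suffixes from types, and skips table-level constraints. The output layout (`tableNode`, then one ` column typeCell` line per column) is a hypothetical placeholder, not actual Grammar-language syntax.

```typescript
interface Column { columnName: string; sqlType: string }
interface SqlTable { tableName: string; columns: Column[] }

// Split a column list on commas that are not inside parentheses, e.g. NUMERIC(10,2).
function splitTopLevel(body: string): string[] {
  const parts: string[] = [];
  let depth = 0;
  let current = "";
  for (const ch of body) {
    if (ch === "(") depth++;
    if (ch === ")") depth--;
    if (ch === "," && depth === 0) {
      parts.push(current.trim());
      current = "";
    } else {
      current += ch;
    }
  }
  if (current.trim()) parts.push(current.trim());
  return parts;
}

const CONSTRAINT_KEYWORDS = /^(PRIMARY|FOREIGN|CONSTRAINT|UNIQUE|CHECK)$/i;

function parseCreateTables(sql: string): SqlTable[] {
  const tables: SqlTable[] = [];
  const re = /CREATE\s+TABLE\s+([@\w.]+)\s*\(([\s\S]*?)\)\s*;/gi;
  let match: RegExpExecArray | null;
  while ((match = re.exec(sql)) !== null) {
    const columns = splitTopLevel(match[2])
      .map((def) => def.split(/\s+/))
      .filter((words) => words.length >= 2 && !CONSTRAINT_KEYWORDS.test(words[0]))
      .map((words) => ({
        columnName: words[0],
        sqlType: words[1].replace(/\(.*$/, "").toLowerCase(), // varchar(50) -> varchar
      }));
    // Keep only the last dotted segment, dropping any schema prefix.
    tables.push({ tableName: match[1].split(".").pop() as string, columns });
  }
  return tables;
}

// Hypothetical output layout; adapt it to the real Grammar language.
function toGrammarSketch(tables: SqlTable[]): string {
  return tables
    .map((t) => [`${t.tableName}Node`, ...t.columns.map((c) => ` ${c.columnName} ${c.sqlType}Cell`)].join("\n"))
    .join("\n");
}
```

For example, `CREATE TABLE cost (cost_id integer, cost NUMERIC(10,2), PRIMARY KEY (cost_id));` becomes `costNode` followed by ` cost_id integerCell` and ` cost numericCell`.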
We need to look beyond the 1 letter, overloaded, ill-defined, unrigorous mathematics we currently use and teach.
You could build a Tree Language that defines all of mathematics in a robust way, in a single (long) document.
Let's make a Tree Language that compiles to ".ts"/".d.ts" files. The compiled output should be type information only, not any JavaScript code, so the TypeScript should probably compile to a blank file.
Then we can write this file in this new Tree Language:
https://github.com/treenotation/jtree/blob/master/core/jTreeTypes.ts
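A minimal sketch of the type-only compiler. The source syntax here (`interface Name` with one `member type` line per indented child, a trailing `?` on optional members, and `type Name definition` for aliases) is made up for illustration. Since the generated file contains only interface and type declarations, TypeScript erases all of it at compile time.

```typescript
// Compiles a tiny, hypothetical Tree Language into TypeScript declarations.
function compileToDts(source: string): string {
  const out: string[] = [];
  let openInterface = false;
  const closeInterface = () => {
    if (openInterface) out.push("}");
    openInterface = false;
  };
  for (const line of source.split("\n")) {
    if (!line.trim()) continue;
    const indented = line.startsWith(" ");
    const [keyword, identifier, ...rest] = line.trim().split(" ");
    if (indented) {
      if (!openInterface) throw new Error(`Member outside an interface: ${line}`);
      // For members, the first word is the member name and the rest is its type.
      out.push(`  ${keyword}: ${[identifier, ...rest].join(" ")};`);
      continue;
    }
    closeInterface();
    if (keyword === "interface") {
      out.push(`export interface ${identifier} {`);
      openInterface = true;
    } else if (keyword === "type") {
      out.push(`export type ${identifier} = ${rest.join(" ")};`);
    } else {
      throw new Error(`Unknown keyword: ${keyword}`);
    }
  }
  closeInterface();
  return out.join("\n");
}
```

For example, `interface Point` with children `x number` and `y? number` compiles to an `export interface Point` with `x: number;` and `y?: number;` members.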
While working on https://github.com/treenotation/dauscore ran into this idea.
How about a checklist-like language? You would define a grammar like the one below. Then you could just copy/paste the grammar into a new file and edit that file directly. It would also compile to a web form. And as you update the grammar, updating the programs wouldn't be a total pain.
```
Basics
 This [synthesized|real] dataset is titled [string]
 The data is [synthesized|real]
 The project link is [url]
Specifications
 The specLink is [url]
 There are [integer] types/tables/kinds of entities
 There are [integer] columns
 There are [integer] columnTypes
 The column domains [are|are not] appropriately reduced
Data
 The downloadLink is [url]
 The fileSize is [number] [byteUnit]
 There are [integer] rows
 The estimated number of potential rows in the domain is listed as [integer]
Tests
 It [passes|fails] the copy/paste test
 It [passes|fails] the goToDefinition test
 It [passes|fails] the 1 click synth test
 It [passes|fails] the cellCheck test
 It [passes|fails] the download test
 It [passes|fails] basic sanity tests
 It [passes|fails] the reproduce from rawData test
Timeliness
 The measurements span [number] [months|days|years|hours|minutes|seconds|other]
Future Work
 The project lists [integer] missing columns that would be nice to have named [string[]]
Accuracy
 The dataset contains [integer] missing cells
 The dataset contains [integer] cellType errors
Provenance
 The data collection steps [are|are not] listed
 The machines used to gather the dataset [are|are not] listed
 The batches and potential batch effects [are|are not] listed
 Potential bias in the dataset [is|is not] explained
 The rawData [is|is not] available
 The specifications and code needed to process the raw data [are|are not] available
Auditing
 The dataset [is|is not] version controlled
 All rows and cells [are|are not] blameable
 All authors and editors [are|are not] listed
Accessibility
 The price to download the data and specs is [number]
 A login [is|is not] required to download the data
 The data [is|is not] released to the public domain
 A direct link to the data [is|is not] available
Formats
 The dataset [is|is not] available in CSV
 The dataset [is|is not] available in TSV
 The dataset [is|is not] available in SQL
 The dataset [is|is not] available in JSON
 The dataset [is|is not] available in ApacheArrow
 The dataset [is|is not] available in XML
Schema Design
 Format noise [has|has not] been minimized
 The schema [is|is not] isomorphic
Normalization
 The dataset contains [integer] duplicated rows
 The dataset contains [integer] duplicated cells
Joinability
 Common specified grammars and standards [are|are not] used
 External grammars [are|are not] listed and specified
 Data [is|is not] easy to join on orthogonal datasets
```
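A rough sketch of how one template line could double as a validator and a form definition. Each bracketed slot becomes a capture group: `[a|b]` slots become a choice between literal options, `[integer]` and `[number]` get numeric patterns, and every other slot type (such as `[url]`, `[byteUnit]`, or `[string[]]`) falls back to free text without validation. The function names and the HTML output are hypothetical.

```typescript
interface Slot {
  kind: "choice" | "integer" | "number" | "text";
  label: string; // raw text inside the brackets
  options: string[]; // only used for choices
}

// Matches a bracketed slot, allowing "[]" inside it as in [string[]].
const SLOT_PATTERN = /\[((?:[^\[\]]|\[\])+)\]/g;

const escapeRegExp = (s: string) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

function parseTemplate(template: string): { regex: RegExp; slots: Slot[] } {
  const slots: Slot[] = [];
  let pattern = "";
  let lastIndex = 0;
  let m: RegExpExecArray | null;
  SLOT_PATTERN.lastIndex = 0;
  while ((m = SLOT_PATTERN.exec(template)) !== null) {
    pattern += escapeRegExp(template.slice(lastIndex, m.index));
    const label = m[1];
    if (label.includes("|")) {
      const options = label.split("|");
      slots.push({ kind: "choice", label, options });
      pattern += "(" + options.map(escapeRegExp).join("|") + ")";
    } else if (label === "integer") {
      slots.push({ kind: "integer", label, options: [] });
      pattern += "(-?\\d+)";
    } else if (label === "number") {
      slots.push({ kind: "number", label, options: [] });
      pattern += "(-?\\d+(?:\\.\\d+)?)";
    } else {
      slots.push({ kind: "text", label, options: [] });
      pattern += "(.+?)";
    }
    lastIndex = m.index + m[0].length;
  }
  pattern += escapeRegExp(template.slice(lastIndex));
  return { regex: new RegExp("^" + pattern + "$"), slots };
}

// Returns the filled-in values, or null if the line does not fit the template.
function readFilledLine(template: string, filled: string): string[] | null {
  const match = parseTemplate(template).regex.exec(filled.trim());
  return match ? match.slice(1) : null;
}

// Compiles a template line to bare HTML form controls (hypothetical markup).
function toFormHtml(template: string): string {
  return parseTemplate(template).slots
    .map((slot) =>
      slot.kind === "choice"
        ? `<select>${slot.options.map((o) => `<option>${o}</option>`).join("")}</select>`
        : `<input type="${slot.kind === "text" ? "text" : "number"}">`
    )
    .join(" ");
}
```

Because the filled-in file keeps the template's wording, a line that no longer matches after the grammar changes shows up as `null`, which is one way to find the programs that need updating.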
If a test fails in CI, wouldn't it be cool if you could just click a link and a webpage popped open, streaming a browser with dev tools open and a breakpoint or debugger statement right at the point of failure? Right now the pre-debug setup steps often seem to take ~5 minutes or so.
What if you just clicked a link and went straight from step 1 to step 5?
It seems like https://pernos.co is on track to do that.
One strand: Tree source code - Other strand: Grammars/cell type info bound to source code.
Add some type of normalized mini-language to facilitate more general conversion to and from SQL tables.
For now, let's have the ability to generate multiple CSVs from one tree: one CSV table for each relevant table.
Basically turning a graph into a DAG and vice versa.
Lots of ways this can be done. Some introduce new syntax and some just introduce some conventions.
One idea is below. There is a root table, and everything must be reachable from it. Table names cannot contain spaces. The table names come first, like in a prefix Tree Language. The uniqueId is the cell after the table name, joined to the end of any preceding uniqueIds. Columns cannot contain newlines. Not sure if there would be ambiguities in determining what should be a new table vs. what should just be an attribute column. Most of the time it should be easy to tell, but sometimes something maybe should be a new table (not just an attribute), and that is hard to tell from the small given subset. Not sure if that's a realistic problem.
```
test normalized
 arrange
  require ../index.js jtree.TreeNode
  constructWithParagraph
   person joesmith
    firstName joe
    lastName smith
    contactInfo office
     type office
     phone 123-456
    contactInfo mobile
     type mobile
     phone 222-222
    friendship janeDoe
    friendship somedude
   person somedude
   person janeDoe
    job 1
     title boss
    contactInfo home
     phone 321-231
    friendship joesmith
 toNormalizedDelimited
  assertParagraphIs
   person
    id,firstName,lastName
    joesmith,joe,smith
    somedude,,
    janeDoe,,
   contactInfo
    id,type,phone
    joesmith office,office,123-456
    joesmith mobile,mobile,222-222
    janeDoe home,,321-231
   friendship
    id
    joesmith janeDoe
    joesmith somedude
    janeDoe joesmith
   job
    id,title
    janeDoe 1,boss
```
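A minimal sketch of the convention above, assuming one-space indentation. The rule for telling tables from attribute columns is a guess at the open question: a keyword is treated as a table if it appears at the root, ever has children, or ever repeats among siblings (which is what makes `friendship` a table even though it has no children). Under that rule it reproduces the expected output of the test above, but it is not jtree's actual `toNormalizedDelimited`.

```typescript
interface TreeNode { words: string[]; children: TreeNode[] }
interface DelimitedTable { columns: string[]; rows: Record<string, string>[] }

// Parses one-space-indented Tree Notation into nodes.
function parseTree(source: string): TreeNode[] {
  const root: TreeNode = { words: [], children: [] };
  const stack: { node: TreeNode; depth: number }[] = [{ node: root, depth: -1 }];
  for (const line of source.split("\n")) {
    const depth = line.search(/\S/);
    if (depth === -1) continue; // skip blank lines
    const node: TreeNode = { words: line.slice(depth).split(" "), children: [] };
    while (stack[stack.length - 1].depth >= depth) stack.pop();
    stack[stack.length - 1].node.children.push(node);
    stack.push({ node, depth });
  }
  return root.children;
}

// Guessed rule: root keywords, keywords with children, and repeated sibling keywords are tables.
function findTableKeywords(nodes: TreeNode[], tables: Set<string>, isRoot: boolean): void {
  const seen = new Set<string>();
  for (const node of nodes) {
    const keyword = node.words[0];
    if (isRoot || node.children.length > 0 || seen.has(keyword)) tables.add(keyword);
    seen.add(keyword);
    findTableKeywords(node.children, tables, false);
  }
}

function collectRows(nodes: TreeNode[], parentId: string, tableKeywords: Set<string>, tables: Map<string, DelimitedTable>): void {
  for (const node of nodes) {
    const [keyword, ...rest] = node.words;
    if (!tableKeywords.has(keyword)) continue; // attributes are read by their parent row
    // uniqueId: the cell after the table name, appended to the parent's uniqueId.
    const id = parentId ? `${parentId} ${rest.join(" ")}` : rest.join(" ");
    let table = tables.get(keyword);
    if (!table) {
      table = { columns: ["id"], rows: [] };
      tables.set(keyword, table);
    }
    const row: Record<string, string> = { id };
    for (const child of node.children) {
      const [column, ...value] = child.words;
      if (tableKeywords.has(column)) continue;
      if (!table.columns.includes(column)) table.columns.push(column);
      row[column] = value.join(" ");
    }
    table.rows.push(row);
    collectRows(node.children, id, tableKeywords, tables);
  }
}

function toNormalizedDelimited(source: string): string {
  const roots = parseTree(source);
  const tableKeywords = new Set<string>();
  findTableKeywords(roots, tableKeywords, true);
  const tables = new Map<string, DelimitedTable>();
  collectRows(roots, "", tableKeywords, tables);
  const lines: string[] = [];
  // No CSV quoting: values containing commas would still need escaping.
  tables.forEach((table, tableName) => {
    lines.push(tableName, " " + table.columns.join(","));
    for (const row of table.rows) lines.push(" " + table.columns.map((c) => row[c] ?? "").join(","));
  });
  return lines.join("\n");
}
```

Tables come out in order of first appearance during a depth-first walk, which is why `person`, `contactInfo`, `friendship`, and `job` appear in the same order as in the expected paragraph.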
Tree Designer already works reasonably well on iPad and iPhone today, but you could probably go a step further with cells (a more spreadsheet-like UI) to make a great programming experience on the iPad.
A glaring hole in the demonstration languages so far is a complete Lisp.
Let's make a good Tree Language that someone could use instead of Bel (or Scheme, Clojure, Common Lisp, Arc, etc.).
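One possible starting point, sketched under the assumption that each line is a list whose words come first and whose indented children are further nested lists: the tree then maps one-to-one onto S-expressions, so an existing Lisp could run the output while a native evaluator is built. The mapping rule and function names are illustrative, not an existing Tree Language.

```typescript
interface LispNode { words: string[]; children: LispNode[] }

// Parses one-space-indented lines into nested nodes.
function parseLines(source: string): LispNode[] {
  const root: LispNode = { words: [], children: [] };
  const stack: { node: LispNode; depth: number }[] = [{ node: root, depth: -1 }];
  for (const line of source.split("\n")) {
    const depth = line.search(/\S/);
    if (depth === -1) continue; // skip blank lines
    const node: LispNode = { words: line.slice(depth).split(" "), children: [] };
    while (stack[stack.length - 1].depth >= depth) stack.pop();
    stack[stack.length - 1].node.children.push(node);
    stack.push({ node, depth });
  }
  return root.children;
}

// Each line becomes a list: its words first, then one nested list per child line.
function toSExpression(node: LispNode): string {
  const items = [...node.words, ...node.children.map(toSExpression)];
  return "(" + items.join(" ") + ")";
}

function treeToLisp(source: string): string {
  return parseLines(source).map(toSExpression).join("\n");
}
```

For example, `define` with children `square x` and `* x x` becomes `(define (square x) (* x x))`. Under this rule a child line is always a list, so a bare atom on its own line would be wrapped as a call; a real design would need a convention for that.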
toTilesMethod(windowCharWidth, windowCharHeight)
=> split a tree document into tiles
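A minimal sketch of this method, assuming tiles form a grid over the document's character layout: tile `[row][col]` holds the lines from `row * windowCharHeight` up to the next band, each cut to the characters from `col * windowCharWidth` up to the next tile. The name `toTiles` and the returned grid shape are assumptions; tabs and double-width characters are not handled.

```typescript
// Splits a document into a grid of tiles, each at most windowCharWidth
// characters wide and windowCharHeight lines tall. Returns tiles[row][col].
function toTiles(doc: string, windowCharWidth: number, windowCharHeight: number): string[][] {
  if (windowCharWidth < 1 || windowCharHeight < 1) throw new Error("Window must be at least 1x1");
  const lines = doc.split("\n");
  const widest = lines.reduce((max, l) => Math.max(max, l.length), 0);
  const tileCols = Math.max(1, Math.ceil(widest / windowCharWidth));
  const tiles: string[][] = [];
  for (let top = 0; top < lines.length; top += windowCharHeight) {
    const band = lines.slice(top, top + windowCharHeight);
    const tileRow: string[] = [];
    for (let col = 0; col < tileCols; col++) {
      const left = col * windowCharWidth;
      tileRow.push(band.map((l) => l.slice(left, left + windowCharWidth)).join("\n"));
    }
    tiles.push(tileRow);
  }
  return tiles;
}
```

Short lines leave some tiles empty, which a viewer could skip when rendering.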
[] Make it even easier to make such vis.
Perhaps have a demo of all 12 of the items listed:
http://grammarware.github.io/parsing/
https://github.com/marak/Faker.js might be able to make program synthesis/worldwidetypes better.
Tree Object Notation.
Dug is a demo Tree Language showing one approach to a JSON isomorphism (https://jtree.treenotation.org/designer/#standard%20dug).
Another approach would be to create a keyword + suffix based language where the type comes from the suffix. For example:
```
resultMap
 nameString jtree
 versionString 30.0.0
 descriptionBlockString
  Simplify your code with Tree Notation. This jtree package
  includes a Tree Notation parser, compiler-compiler, and virtual
  machine for Tree Languages, as well as sample languages, implemented
  in TypeScript.
 prettierMap
  useTabsBoolean false
  tabWidthInt 2
  semisBoolean false
  printWidthInt 160
```
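A rough sketch of decoding the suffix convention into JSON, assuming one-space indentation and only the suffixes used in the example (`Map`, `String`, `BlockString`, `Boolean`, `Int`). The JSON key is the keyword with its suffix removed. `BlockString` must be checked before `String`, since one suffix ends with the other. Block text lines are assumed not to be indented further, and blank lines inside a block are dropped.

```typescript
type JsonValue = string | number | boolean | { [key: string]: JsonValue };
interface RawNode { text: string; children: RawNode[] }

function parseIndented(source: string): RawNode[] {
  const root: RawNode = { text: "", children: [] };
  const stack: { node: RawNode; depth: number }[] = [{ node: root, depth: -1 }];
  for (const line of source.split("\n")) {
    const depth = line.search(/\S/);
    if (depth === -1) continue; // skip blank lines
    const node: RawNode = { text: line.slice(depth), children: [] };
    while (stack[stack.length - 1].depth >= depth) stack.pop();
    stack[stack.length - 1].node.children.push(node);
    stack.push({ node, depth });
  }
  return root.children;
}

// Checked in this order: BlockString before String.
const SUFFIXES = ["BlockString", "String", "Boolean", "Int", "Map"] as const;

function decodeNode(node: RawNode): [string, JsonValue] {
  const spaceIndex = node.text.indexOf(" ");
  const keyword = spaceIndex === -1 ? node.text : node.text.slice(0, spaceIndex);
  const rest = spaceIndex === -1 ? "" : node.text.slice(spaceIndex + 1);
  const suffix = SUFFIXES.find((s) => keyword.length > s.length && keyword.endsWith(s));
  if (suffix === undefined) throw new Error(`No type suffix on: ${keyword}`);
  const key = keyword.slice(0, keyword.length - suffix.length);
  if (suffix === "Map") return [key, decodeMap(node.children)];
  if (suffix === "BlockString") return [key, node.children.map((c) => c.text).join("\n")];
  if (suffix === "String") return [key, rest];
  if (suffix === "Boolean") {
    if (rest !== "true" && rest !== "false") throw new Error(`Bad boolean: ${rest}`);
    return [key, rest === "true"];
  }
  const n = Number(rest); // remaining suffix is "Int"
  if (rest === "" || !Number.isInteger(n)) throw new Error(`Bad int: ${rest}`);
  return [key, n];
}

function decodeMap(nodes: RawNode[]): { [key: string]: JsonValue } {
  const result: { [key: string]: JsonValue } = {};
  for (const node of nodes) {
    const [key, value] = decodeNode(node);
    result[key] = value;
  }
  return result;
}

function suffixTreeToJson(source: string): { [key: string]: JsonValue } {
  return decodeMap(parseIndented(source));
}
```

Applied to the example above, `tabWidthInt 2` becomes `"tabWidth": 2` and `useTabsBoolean false` becomes `"useTabs": false` inside a `"prettier"` object under `"result"`.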
It might be a fun experiment to compare the complexity of different implementations via a standardized AST form. Something like: convert source code in a language to some IR (Tree-sitter, LLVM, etc.) with a representation in a Tree Language. Then you can do simple complexity comparisons.
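A minimal sketch of the comparison step, assuming each implementation's AST has already been exported as one-space-indented Tree Notation with the node kind as the first word (the export from Tree-sitter, LLVM, or anything else is not shown). Node count, maximum depth, and number of distinct node kinds are just one crude choice of complexity metrics.

```typescript
interface AstStats { nodeCount: number; maxDepth: number; distinctKinds: number }

function measureAst(treeSource: string): AstStats {
  const kinds = new Set<string>();
  let nodeCount = 0;
  let maxDepth = 0;
  for (const line of treeSource.split("\n")) {
    const depth = line.search(/\S/); // one space of indentation per level
    if (depth === -1) continue; // skip blank lines
    nodeCount++;
    maxDepth = Math.max(maxDepth, depth + 1);
    kinds.add(line.slice(depth).split(" ")[0]); // node kind is the first word
  }
  return { nodeCount, maxDepth, distinctKinds: kinds.size };
}
```

Running it on two implementations of the same program then gives directly comparable numbers, as long as both ASTs use the same node vocabulary.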