
yorkie's People

Contributors

blurfx, chacha912, chromato99, computerphilosopher, cozitive, daclouds, dc7303, dependabot[bot], devleejb, dongjins, eithea, emplam27, gmlwo530, hackerwins, highcloud100, jongwooo, joohojang, joonhyukchoi, justicehui, krapie, loganstone, moongyu1, ppeeou, raararaara, sejongk, tedkimdev, umi0410, wonjerry, yaminyam, yoonkijin

yorkie's Issues

Securing Yorkie

Background

When we introduce Yorkie to a service, server and clients can be placed on untrusted networks such as the Internet.

Solution

  1. We should provide an API to put an authentication token in the client.
  2. The server should provide an authentication hook or callback.
  3. We should provide TLS communication.

Add more metrics related to PushPull API

What would you like to be added:

Yorkie uses Prometheus for metrics reporting. The simplest way to see the available metrics is to cURL the metrics endpoint localhost:11102/metrics. The format is described here.

We need to add more metrics related to PushPull API.

| Name | Description | Type |
| pushpull_response_seconds | Response time of PushPull API. | Histogram |
| pushpull_received_changes | The number of changes included in a request pack in PushPull API. | Counter |
| pushpull_sent_changes | The number of changes included in a response pack in PushPull API. | Counter |
| pushpull_snapshot_duration_seconds | The creation time of snapshot in PushPull API. | Histogram |
| pushpull_snapshot_bytes | The number of bytes of Snapshot. | Counter |

Why is this needed:

The metrics can be used for real-time monitoring and debugging.

Got 'panic: offset should be less than or equal to length' when inserting emoji

What happened:
I got a panic error in the agent when trying to insert emoji in the Quill example.

What you expected to happen:
No error occurred.

How to reproduce it (as minimally and precisely as possible):
Insert or copy-paste an emoji, then repeat once more.

Anything else we need to know?:
Below is the log snippet:

Attaching to docker_yorkie_1
yorkie_1  | 2020-12-01T13:22:53.728Z	INFO	connected, URI: mongodb://mongo:27017, DB: yorkie-meta
yorkie_1  | 2020-12-01T13:22:53.729Z	INFO	serving API on 9090
yorkie_1  | 2020-12-01T13:23:07.182Z	INFO	stream "/api.Yorkie/WatchDocuments" => ok
yorkie_1  | 2020-12-01T13:23:07.347Z	INFO	RPC : "/api.Yorkie/ActivateClient" 3.6611ms
yorkie_1  | 2020-12-01T13:23:07.385Z	INFO	RPC : "/api.Yorkie/AttachDocument" 13.0884ms
yorkie_1  | 2020-12-01T13:23:07.436Z	INFO	PUSH: '5fc643bb3e80833e393fe425' pushes 1 changes into 'examples$quill', rejected 0 changes, serverSeq: 0 -> 1, cp: serverSeq=1, clientSeq=1
yorkie_1  | 2020-12-01T13:23:07.442Z	INFO	RPC : "/api.Yorkie/PushPull" 12.6078ms
yorkie_1  | 2020-12-01T13:23:07.449Z	INFO	SNAP: 'examples$quill', serverSeq:1 6.3741ms
yorkie_1  | 2020-12-01T13:23:34.218Z	INFO	PUSH: '5fc643bb3e80833e393fe425' pushes 1 changes into 'examples$quill', rejected 0 changes, serverSeq: 1 -> 2, cp: serverSeq=2, clientSeq=2
yorkie_1  | 2020-12-01T13:23:34.229Z	INFO	RPC : "/api.Yorkie/PushPull" 15.6009ms
yorkie_1  | 2020-12-01T13:23:34.241Z	INFO	SNAP: 'examples$quill', serverSeq:2 11.547ms
yorkie_1  | 2020-12-01T13:23:36.263Z	INFO	PUSH: '5fc643bb3e80833e393fe425' pushes 1 changes into 'examples$quill', rejected 0 changes, serverSeq: 2 -> 3, cp: serverSeq=3, clientSeq=3
yorkie_1  | 2020-12-01T13:23:36.271Z	INFO	RPC : "/api.Yorkie/PushPull" 11.3318ms
yorkie_1  | 2020-12-01T13:23:36.277Z	INFO	SNAP: 'examples$quill', serverSeq:3 5.9041ms
yorkie_1  | 2020-12-01T13:23:37.293Z	INFO	PUSH: '5fc643bb3e80833e393fe425' pushes 1 changes into 'examples$quill', rejected 0 changes, serverSeq: 3 -> 4, cp: serverSeq=4, clientSeq=4
yorkie_1  | 2020-12-01T13:23:37.300Z	INFO	RPC : "/api.Yorkie/PushPull" 11.0282ms
yorkie_1  | 2020-12-01T13:23:37.305Z	ERROR	[0:0:00:0 {} ""][3:1:25:0 {} "🎁"][1:1:25:0 {} "
yorkie_1  | "]
yorkie_1  | github.com/yorkie-team/yorkie/pkg/document/json.(*RGATreeSplit).splitNode
yorkie_1  | 	/app/pkg/document/json/rga_tree_split.go:299
yorkie_1  | github.com/yorkie-team/yorkie/pkg/document/json.(*RGATreeSplit).findNodeWithSplit
yorkie_1  | 	/app/pkg/document/json/rga_tree_split.go:270
yorkie_1  | github.com/yorkie-team/yorkie/pkg/document/json.(*RGATreeSplit).edit
yorkie_1  | 	/app/pkg/document/json/rga_tree_split.go:371
yorkie_1  | github.com/yorkie-team/yorkie/pkg/document/json.(*RichText).Edit
yorkie_1  | 	/app/pkg/document/json/rich_text.go:191
yorkie_1  | github.com/yorkie-team/yorkie/pkg/document/operation.(*RichEdit).Execute
yorkie_1  | 	/app/pkg/document/operation/rich_edit.go:59
yorkie_1  | github.com/yorkie-team/yorkie/pkg/document/change.(*Change).Execute
yorkie_1  | 	/app/pkg/document/change/change.go:48
yorkie_1  | github.com/yorkie-team/yorkie/pkg/document.(*InternalDocument).applyChanges
yorkie_1  | 	/app/pkg/document/internal_document.go:204
yorkie_1  | github.com/yorkie-team/yorkie/pkg/document.(*InternalDocument).ApplyChangePack
yorkie_1  | 	/app/pkg/document/internal_document.go:108
yorkie_1  | github.com/yorkie-team/yorkie/yorkie/packs.storeSnapshot
yorkie_1  | 	/app/yorkie/packs/pack_service.go:367
yorkie_1  | github.com/yorkie-team/yorkie/yorkie/packs.PushPull.func1
yorkie_1  | 	/app/yorkie/packs/pack_service.go:119
yorkie_1  | github.com/yorkie-team/yorkie/yorkie/backend.(*Backend).AttachGoroutine.func1
yorkie_1  | 	/app/yorkie/backend/backend.go:101
yorkie_1  | panic: offset should be less than or equal to length
yorkie_1  | 
yorkie_1  | goroutine 56 [running]:
yorkie_1  | github.com/yorkie-team/yorkie/pkg/document/json.(*RGATreeSplit).splitNode(0xc000442240, 0xc00042eaa0, 0x2, 0x20)
yorkie_1  | 	/app/pkg/document/json/rga_tree_split.go:300 +0x265
yorkie_1  | github.com/yorkie-team/yorkie/pkg/document/json.(*RGATreeSplit).findNodeWithSplit(0xc000442240, 0xc000424c10, 0xc0004421c0, 0x40f1b0, 0x140e800)
yorkie_1  | 	/app/pkg/document/json/rga_tree_split.go:270 +0xc5
yorkie_1  | github.com/yorkie-team/yorkie/pkg/document/json.(*RGATreeSplit).edit(0xc000442240, 0xc000424bf0, 0xc000424c10, 0xc000423d70, 0xf3d9a0, 0xc0004423e0, 0xc0004421c0, 0xc00008ac30, 0xc000400700)
yorkie_1  | 	/app/pkg/document/json/rga_tree_split.go:371 +0x5a
yorkie_1  | github.com/yorkie-team/yorkie/pkg/document/json.(*RichText).Edit(0xc0004440c0, 0xc000424bf0, 0xc000424c10, 0xc000423d70, 0xc00041988c, 0x4, 0x0, 0xc0004421c0, 0x13bef28, 0xc6fa20)
yorkie_1  | 	/app/pkg/document/json/rich_text.go:191 +0x205
yorkie_1  | github.com/yorkie-team/yorkie/pkg/document/operation.(*RichEdit).Execute(0xc0004306c0, 0xc0004423a0, 0xc0004440f0, 0xc00041f380)
yorkie_1  | 	/app/pkg/document/operation/rich_edit.go:59 +0xd8
yorkie_1  | github.com/yorkie-team/yorkie/pkg/document/change.(*Change).Execute(0xc000430700, 0xc0004423a0, 0xcee780, 0xb7decc)
yorkie_1  | 	/app/pkg/document/change/change.go:48 +0x70
yorkie_1  | github.com/yorkie-team/yorkie/pkg/document.(*InternalDocument).applyChanges(0xc000430940, 0xc000010bd8, 0x1, 0x1, 0x7f130c5fba30, 0x0)
yorkie_1  | 	/app/pkg/document/internal_document.go:204 +0x66
yorkie_1  | github.com/yorkie-team/yorkie/pkg/document.(*InternalDocument).ApplyChangePack(0xc000430940, 0xc00026fdf0, 0xc000418ec9, 0x5)
yorkie_1  | 	/app/pkg/document/internal_document.go:108 +0x325
yorkie_1  | github.com/yorkie-team/yorkie/yorkie/packs.storeSnapshot(0xf3b420, 0xc00003a018, 0xc0002a48c0, 0xc0000d5280, 0x0, 0xc000378500)
yorkie_1  | 	/app/yorkie/packs/pack_service.go:367 +0x32e
yorkie_1  | github.com/yorkie-team/yorkie/yorkie/packs.PushPull.func1()
yorkie_1  | 	/app/yorkie/packs/pack_service.go:119 +0x3d1
yorkie_1  | github.com/yorkie-team/yorkie/yorkie/backend.(*Backend).AttachGoroutine.func1(0xc0002a48c0, 0xc0004239e0)
yorkie_1  | 	/app/yorkie/backend/backend.go:101 +0x55
yorkie_1  | created by github.com/yorkie-team/yorkie/yorkie/backend.(*Backend).AttachGoroutine
yorkie_1  | 	/app/yorkie/backend/backend.go:99 +0x157

Environment:

  • Operating system: macOS
  • Browser and version: Google Chrome 87.0.4280.67
  • Yorkie version (use yorkie version): master branch
  • Yorkie JS SDK version: master branch
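
The report does not identify the cause. One plausible explanation, offered here only as an assumption, is that the JS SDK and the Go agent measure text offsets in different units. The sketch below is not Yorkie code; it just shows that a single emoji has a different length depending on whether we count UTF-8 bytes, code points, or UTF-16 code units (the unit JavaScript's String.length uses). The helper name `lengths` is made up for illustration.

```go
package main

import (
	"fmt"
	"unicode/utf16"
	"unicode/utf8"
)

// lengths reports the size of s in three units that are easy to mix up:
// UTF-8 bytes (Go's len), Unicode code points (runes), and UTF-16 code
// units (what JavaScript's String.length counts).
func lengths(s string) (bytes, runes, utf16Units int) {
	return len(s), utf8.RuneCountInString(s), len(utf16.Encode([]rune(s)))
}

func main() {
	b, r, u := lengths("🎁")
	fmt.Printf("bytes=%d runes=%d utf16=%d\n", b, r, u)
}
```

If one side sends an offset of 2 for an emoji and the other side stores that emoji as one unit, a check like "offset should be less than or equal to length" would fail. Whether this is what happens in splitNode would need to be confirmed in the code.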

Update to version Go 1.16

Description:

Go 1.16 has recently been released. We need to review the changes from 1.13 to 1.16 and apply them to Yorkie.

Why:

We can take advantage of the latest Go improvements and keep the code up to date.

Add MoveOperation

Recently we added a kanban board example to the demo page, and we plan to implement the drag-and-drop feature with Array.insertAfter.

https://yorkie.dev/demo

Array.insertAfter adds a new item after the given position, so when the users drag-and-drop at the same time, the items may be duplicated. This problem is explained in figure 2 of the paper below.

Screen Shot 2020-04-08 at 15 21 25

From https://martin.kleppmann.com/papers/list-move-papoc20.pdf

As described in the paper, I think we can solve this problem by creating a MoveOperation that changes the position of an existing item with a last-writer-wins (LWW) policy.
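
A minimal sketch of the LWW idea, assuming each item keeps a single "position" register stamped with a logical time. The types `Stamp` and `Item` and the field names are hypothetical, not Yorkie's API. A move is applied only if its stamp is newer than the last applied move, so replicas that receive the same concurrent moves in different orders end with the same position and no duplicated item.

```go
package main

import "fmt"

// Stamp is a hypothetical logical timestamp: a Lamport counter with the
// actor ID as a tie-breaker, which gives a total order over moves.
type Stamp struct {
	Lamport uint64
	Actor   string
}

// After reports whether s is later than o in that total order.
func (s Stamp) After(o Stamp) bool {
	if s.Lamport != o.Lamport {
		return s.Lamport > o.Lamport
	}
	return s.Actor > o.Actor
}

// Item keeps one position register instead of being deleted and re-inserted.
type Item struct {
	ID      string
	PrevID  string // the item this one is placed after
	MovedAt Stamp
}

// Move applies the move only if it is newer than the last applied one.
func (it *Item) Move(prevID string, at Stamp) bool {
	if !at.After(it.MovedAt) {
		return false
	}
	it.PrevID, it.MovedAt = prevID, at
	return true
}

func main() {
	// Two replicas receive the same concurrent moves in opposite orders.
	m1 := Stamp{Lamport: 5, Actor: "A"}
	m2 := Stamp{Lamport: 5, Actor: "B"}

	r1 := &Item{ID: "card"}
	r1.Move("todo-2", m1)
	r1.Move("done-1", m2)

	r2 := &Item{ID: "card"}
	r2.Move("done-1", m2)
	r2.Move("todo-2", m1)

	fmt.Println(r1.PrevID, r2.PrevID) // both replicas keep the later move
}
```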

Support Multi-User Undo/Redo

Description:

Written by @hyemmie

About multi-user undo/redo

When specifying Undo/Redo functionality, the following criteria could be considered.

  1. How is the action to cancel selected? ⇒ linear vs selective
    • linear: Users can't choose which actions to undo. They undo and redo in reverse order, starting with the most recent action. To cancel a specific action that is not the most recent, a user must cancel every action from the most recent one back to that action.
    • selective: Users can pick any single action among all actions and undo or redo only that action.
  2. Does a single user have permission to undo/redo all actions in a collaborative editing environment? ⇒ local vs global
    • local: Users can only undo and redo their own actions.
    • global: Users can undo and redo the actions of all participants in a collaborative edit.

| | Local | Global |
| Linear | Undo one's own actions in reverse chronological order, from newest to oldest. (ex. Automerge, Google Docs, Figma) | Undo all users' actions in reverse chronological order, from newest to oldest. (ex. CodePair) |
| Selective | Select and undo any of one's own actions. (ex. Azurite) | Select and undo any of all users' actions. |

For Yorkie, we don't need selective functionality at this point, but rather the ability for users to undo/redo only their own work.
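
As a rough illustration of the "local-linear" behavior described above, here is a sketch that keeps one undo stack and one redo stack per user. `UndoManager` and its methods are hypothetical names. The sketch only tracks which action would be undone; it does not compute inverse operations or interact with a CRDT, which is the hard part in practice.

```go
package main

import "fmt"

// UndoManager keeps one undo stack and one redo stack per user, so a user
// can only undo and redo their own actions, newest first.
type UndoManager struct {
	undo map[string][]string
	redo map[string][]string
}

func NewUndoManager() *UndoManager {
	return &UndoManager{undo: map[string][]string{}, redo: map[string][]string{}}
}

// Do records an action for user and clears that user's redo stack.
func (m *UndoManager) Do(user, action string) {
	m.undo[user] = append(m.undo[user], action)
	m.redo[user] = nil
}

// Undo pops the user's most recent action; other users are untouched.
func (m *UndoManager) Undo(user string) (string, bool) {
	stack := m.undo[user]
	if len(stack) == 0 {
		return "", false
	}
	action := stack[len(stack)-1]
	m.undo[user] = stack[:len(stack)-1]
	m.redo[user] = append(m.redo[user], action)
	return action, true
}

// Redo re-applies the user's most recently undone action.
func (m *UndoManager) Redo(user string) (string, bool) {
	stack := m.redo[user]
	if len(stack) == 0 {
		return "", false
	}
	action := stack[len(stack)-1]
	m.redo[user] = stack[:len(stack)-1]
	m.undo[user] = append(m.undo[user], action)
	return action, true
}

func main() {
	m := NewUndoManager()
	m.Do("alice", "insert A")
	m.Do("bob", "insert B")
	m.Do("alice", "delete A")

	action, _ := m.Undo("alice")
	fmt.Println(action) // alice's latest action, even though bob also edited
}
```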

Discussion

To provide multi-user ("local-linear") undo/redo, we can start with the following tasks.

  1. Implement a data structure for multi-user ("local-linear") undo/redo.
    This task could be done in two ways.
    • Implement an individual data structure for a PoC and integrate it with Yorkie's RGATreeSplit.
    • Add an undo/redo feature to the current RGATreeSplit.
  2. Design the overall architecture.
    We'll need to combine ideas on how to manage and store tasks separately for each user, how to integrate with the current database, and more into a single architecture.

Here are some reference implementations you might want to check out.
Reference:

Why:

So that users do not end up canceling another user's operation, Yorkie needs to support user-specific undo/redo.

duplicate key error collection: yorkie-meta.changes

The following error occurred during the demonstration of the codemirror example yesterday.

bulk write error: [{[{E11000 duplicate key error c…149cedbfd3e3c03'), server_seq: 835 }}]}, {<nil>}]

Cause & Background

PushPull saves the changes sent by the client and then returns the changes on the server that the client has not yet received. After processing PushPull, the server saves the changes, the document, and the client to the DB, in that order. After saving the changes, it stores the final seq of those changes in the document. We didn't use transactions here because we intended to overwrite the changes even if the server failed to update the seq in the document after saving them.

Solution

We should use upsert instead of Insert in the logic below for overwriting changes.

https://github.com/hackerwins/yorkie/blob/master/yorkie/backend/mongo/client.go#L289
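
To make the difference concrete, here is an in-memory stand-in for a collection with a unique key, assumed here to be (document ID, server_seq) based on the error message. It is not the MongoDB code linked above. Re-saving a change with the same key fails with Insert but succeeds with Upsert, which matches the intended "overwrite on retry" behavior.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrDuplicateKey mimics a unique-index violation.
var ErrDuplicateKey = errors.New("duplicate key")

// changeKey is the assumed unique key of a stored change.
type changeKey struct {
	DocID     string
	ServerSeq uint64
}

// ChangeStore is a hypothetical in-memory replacement for the changes collection.
type ChangeStore struct {
	rows map[changeKey]string
}

func NewChangeStore() *ChangeStore {
	return &ChangeStore{rows: map[changeKey]string{}}
}

// Insert fails if a change with the same key already exists.
func (s *ChangeStore) Insert(docID string, seq uint64, payload string) error {
	k := changeKey{docID, seq}
	if _, ok := s.rows[k]; ok {
		return ErrDuplicateKey
	}
	s.rows[k] = payload
	return nil
}

// Upsert inserts the change or overwrites the existing one with the same key.
func (s *ChangeStore) Upsert(docID string, seq uint64, payload string) {
	s.rows[changeKey{docID, seq}] = payload
}

func main() {
	s := NewChangeStore()
	_ = s.Insert("doc1", 835, "first attempt")

	// The document's seq was not updated, so the server reuses seq 835.
	fmt.Println(s.Insert("doc1", 835, "retry")) // duplicate key
	s.Upsert("doc1", 835, "retry")              // overwrites instead of failing
	fmt.Println(s.rows[changeKey{"doc1", 835}])
}
```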

Provide Authorization Webhook

What would you like to be added:

Provide authorization WebHook.

Webhook is an HTTP POST that is called when something happens. When specified, Authorization Webhook causes Agent to query an outside REST service when determining user privileges.

We can specify the webhook by the --authorization-webhook=SOME_URL flag:

./bin/yorkie agent --authorization-webhook=http://localhost:3000/auth-hook

And we can pass tokens when creating a client.

yorkie.createClient('http://api.yorkie.dev', {
  token: SOME_TOKEN,
});

The client passes the token to the Agent with every request. When the Agent receives the token, it calls the webhook before processing the request.

We need to define the payload of the webhook:

// request
{
  token: '',
  documentAttributes: [{
    key: 'col1$doc1',
    verb: 'r' // 'r' or 'rw'
  }, {
    key: 'col1$doc2',
    verb: 'rw'
  }]
}

// response
{
  allowed: false,
  reason: 'user does not have read access to the documents'
}
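
For reference, a sketch of what the remote service behind the webhook could look like in Go, using the payload proposed above. The struct names and the policy in `authorize` (a token named "reader" may only read) are invented for illustration; only the JSON field names come from this proposal.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// DocumentAttribute, AuthRequest and AuthResponse mirror the proposed payload.
type DocumentAttribute struct {
	Key  string `json:"key"`
	Verb string `json:"verb"` // "r" or "rw"
}

type AuthRequest struct {
	Token              string              `json:"token"`
	DocumentAttributes []DocumentAttribute `json:"documentAttributes"`
}

type AuthResponse struct {
	Allowed bool   `json:"allowed"`
	Reason  string `json:"reason,omitempty"`
}

// authorize is a made-up policy: the token "reader" may read but not write.
func authorize(req AuthRequest) AuthResponse {
	if req.Token == "" {
		return AuthResponse{Reason: "missing token"}
	}
	for _, attr := range req.DocumentAttributes {
		if attr.Verb == "rw" && req.Token == "reader" {
			return AuthResponse{Reason: "user does not have write access to " + attr.Key}
		}
	}
	return AuthResponse{Allowed: true}
}

// authHook decodes the request payload and answers with the decision.
func authHook(w http.ResponseWriter, r *http.Request) {
	var req AuthRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, "invalid payload", http.StatusBadRequest)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	_ = json.NewEncoder(w).Encode(authorize(req))
}

func main() {
	body := `{"token":"reader","documentAttributes":[{"key":"col1$doc1","verb":"rw"}]}`
	rec := httptest.NewRecorder()
	authHook(rec, httptest.NewRequest(http.MethodPost, "/auth-hook", strings.NewReader(body)))
	fmt.Print(rec.Body.String())
}
```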

AuthorizationWebhook

| flags | description | default |
| authorization-webhook | URL of remote service to query. | null |
| authorization-methods | Target methods to check authorization e.g: AttachDocument,DetachDocument,WatchDocuments,PushPull | all |
| authorization-cache-ttl | The duration to cache 'authorized' responses from the webhook authorizer. | 0s |

References

Why is this needed:

  • We can determine user privileges for documents.

Introduce Snapshot

Problem

Yorkie allows each client to exchange operations with the server. This causes a performance problem when a client attaches a document that has many accumulated operations.

Operations vs Snapshot in counter example:

  • operation: (+1), (+1), (+1), (+1), (+1), (+1), (+1)
  • snapshot: 7

Solution

The server stores a snapshot at specific intervals and then sends the snapshot when a client attaches the document.
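
The counter example can be sketched as follows. `Counter` is a toy type, not a Yorkie structure: it keeps every operation, but after TakeSnapshot a reader only has to replay the operations that came after the snapshot.

```go
package main

import "fmt"

// Counter is a toy document that stores every operation plus an optional snapshot.
type Counter struct {
	ops         []int // all operations ever applied, in order
	snapshot    int   // value of the counter at snapshotSeq
	snapshotSeq int   // number of operations covered by the snapshot
}

// Apply appends one operation such as (+1).
func (c *Counter) Apply(delta int) { c.ops = append(c.ops, delta) }

// Value rebuilds the current value from the snapshot and the later operations.
func (c *Counter) Value() int {
	v := c.snapshot
	for _, d := range c.ops[c.snapshotSeq:] {
		v += d
	}
	return v
}

// ReplayCost is the number of operations a newly attached client must replay.
func (c *Counter) ReplayCost() int { return len(c.ops) - c.snapshotSeq }

// TakeSnapshot folds all operations so far into the snapshot.
func (c *Counter) TakeSnapshot() {
	v := c.Value()
	c.snapshot, c.snapshotSeq = v, len(c.ops)
}

func main() {
	c := &Counter{}
	for i := 0; i < 7; i++ {
		c.Apply(1)
	}
	fmt.Println(c.Value(), c.ReplayCost()) // value 7, replaying 7 operations

	c.TakeSnapshot()
	fmt.Println(c.Value(), c.ReplayCost()) // value 7, nothing to replay
}
```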

Support time travel

Yorkie internally saves a history of all changes in documents. This enables time travel which is looking at the document state at past points in time.

The current state of the Yorkie system is as follows:

Past to Present
Since we have not introduced inverse operations yet, we can only apply operations from the past to the present.

Screen Shot 2021-06-13 at 3 24 56 PM

Changes and Snapshots

Currently, Client only has a snapshot of the current state and changes that have not been sent to Agent. Since client data is not enough, we need to use Snapshots and Changes stored on Agent for time travel.

For example, if Client A wants to go back to S7, it needs S5, C6, and C7.

Screen Shot 2021-06-13 at 3 40 50 PM
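
The S5 + C6 + C7 example can be sketched like this, again using a counter as a stand-in for the document. `Store` and `StateAt` are hypothetical; the point is only that the state at a past serverSeq is rebuilt from the closest earlier snapshot and the changes after it, moving from past to present.

```go
package main

import "fmt"

// Store is a hypothetical server-side history of a counter document.
type Store struct {
	snapshots map[int]int // serverSeq -> document state at that seq
	changes   map[int]int // serverSeq -> delta applied by that change
}

// StateAt rebuilds the state at seq from the closest earlier snapshot plus
// the changes after it. It also returns the snapshot seq it started from
// (0 means no snapshot was usable and replay started from the beginning).
func (s *Store) StateAt(seq int) (state, base int) {
	for snapSeq, snap := range s.snapshots {
		if snapSeq <= seq && snapSeq > base {
			base, state = snapSeq, snap
		}
	}
	for i := base + 1; i <= seq; i++ {
		state += s.changes[i]
	}
	return state, base
}

func main() {
	s := &Store{snapshots: map[int]int{5: 5}, changes: map[int]int{}}
	for seq := 1; seq <= 8; seq++ {
		s.changes[seq] = 1 // every change is (+1)
	}

	state, base := s.StateAt(7)
	fmt.Println(state, base) // S7 is built from S5 plus C6 and C7
}
```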

Proposal

We can think of a document as a pointer to a specific location in a Change. Using this, we can create two features, fork and peek.

Screen Shot 2021-06-15 at 3 20 12 AM

const doc1 = client.createDocument('docs', 'doc1');

...

// serverSeq is the position in the order of the last change stored in Agent.
const serverSeq = doc1.getCheckpoint().getServerSeq(); // for example: 4

// peek returns a snapshot of the change at a specific location from Agent.
const snapshot = await client.peek(doc1, serverSeq.divide(2));

// Create a new document that forked from a specific location in doc1.
const doc2 = await client.fork('docs', 'doc2', doc1, serverSeq.divide(2)); // for example: 2
doc2.update((root) => {
  root.hello = 'world';
});

Modify to use default values when there is no metrics field in config

We recently added a feature to serve Prometheus metrics. If there is no metrics field in the config after this commit, we are getting the error below.

{
  "RPC": {
    "Port": 9090,
    "CertFile": "",
    "KeyFile": ""
  },
  "Mongo": {
    "ConnectionTimeoutSec": 5,
    "ConnectionURI": "mongodb://localhost:27017",
    "YorkieDatabase": "yorkie-meta",
    "PingTimeoutSec": 5
  },
  "Backend": {
    "SnapshotThreshold": 500
  }
}

image

If the user doesn't specify the metrics field, we can handle it in one of two ways to avoid errors.

  • Run the metrics handler with the default port value
  • Do not execute metrics handler
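
A minimal sketch of the first option, assuming the config is loaded from JSON and the metrics section is a pointer field so that "missing" can be detected. The field names `Metrics` and `Port` are guesses; the default port 11102 is the one used by the metrics endpoint mentioned in the PushPull metrics issue.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DefaultMetricsPort is used when the config has no metrics section.
const DefaultMetricsPort = 11102

// MetricsConfig and Config are simplified, hypothetical config types.
type MetricsConfig struct {
	Port int `json:"Port"`
}

type Config struct {
	Metrics *MetricsConfig `json:"Metrics"`
}

// Load parses the config and fills in defaults for missing metrics fields.
func Load(data []byte) (*Config, error) {
	var c Config
	if err := json.Unmarshal(data, &c); err != nil {
		return nil, err
	}
	if c.Metrics == nil {
		c.Metrics = &MetricsConfig{}
	}
	if c.Metrics.Port == 0 {
		c.Metrics.Port = DefaultMetricsPort
	}
	return &c, nil
}

func main() {
	c, err := Load([]byte(`{"RPC": {"Port": 9090}}`))
	if err != nil {
		fmt.Println("load failed:", err)
		return
	}
	fmt.Println(c.Metrics.Port) // falls back to the default port
}
```

For the second option, Load could instead leave Metrics as nil and the caller would skip starting the metrics handler.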

Primitive's DeepCopy() method does not actually make a deep copy

While analyzing the code, I noticed that the DeepCopy() method of the Primitive structure does not actually make a deep copy.
Is this intentional because it causes no problems?
Or can problems arise?

If this code needs to be changed, may I open a pull request for it?

Here I attach the test code that I ran temporarily.

package json_test

import (
	"fmt"
	"testing"

	"github.com/stretchr/testify/assert"
	"github.com/yorkie-team/yorkie/pkg/document/json"
	"github.com/yorkie-team/yorkie/pkg/document/time"
)

func TestPrimitive(t *testing.T) {
	t.Run("DeepCopy test", func(t *testing.T) {
		origin := json.NewPrimitive(10, time.NewTicket(0, 0, time.InitialActorID))
		copied := origin.DeepCopy()

		originAddr := fmt.Sprintf("%p", origin)
		copiedAddr := fmt.Sprintf("%p", copied)
		assert.NotEqual(t, originAddr, copiedAddr)
	})
}
--- FAIL: TestPrimitive (0.00s)
    --- FAIL: TestPrimitive/DeepCopy_test (0.00s)
        primitive_test.go:20: 
                Error Trace:    primitive_test.go:20
                Error:          Should not be: "0xc00010db00"
                Test:           TestPrimitive/DeepCopy_test
FAIL
FAIL    github.com/yorkie-team/yorkie/pkg/document/json 0.401s
FAIL
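
For comparison, here is a sketch of the two behaviors with simplified stand-in types (these are not Yorkie's Primitive and Ticket). Returning the receiver is safe only as long as the value is never mutated after creation; whether that holds for Primitive is exactly the open question above.

```go
package main

import "fmt"

// Ticket and Primitive are simplified stand-ins, not Yorkie's actual types.
type Ticket struct{ Lamport uint64 }

type Primitive struct {
	value     interface{}
	createdAt *Ticket // assumed to be non-nil in this sketch
}

// SharedCopy returns the receiver itself, so both names point at one object.
func (p *Primitive) SharedCopy() *Primitive { return p }

// DeepCopy allocates a new Primitive and copies everything reachable from it.
func (p *Primitive) DeepCopy() *Primitive {
	value := p.value
	if b, ok := value.([]byte); ok {
		value = append([]byte(nil), b...) // byte slices are mutable, so clone them
	}
	ticket := *p.createdAt
	return &Primitive{value: value, createdAt: &ticket}
}

func main() {
	origin := &Primitive{value: 10, createdAt: &Ticket{Lamport: 1}}
	fmt.Println(origin.SharedCopy() == origin) // same address
	fmt.Println(origin.DeepCopy() == origin)   // new allocation
}
```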

Build binaries for environments when releasing a new version

What would you like to be added:

Build binaries for environments when releasing a new version.

  • yorkie-v0.1.4-darwin-amd64.zip
  • yorkie-v0.1.4-linux-amd64.tar.gz
  • yorkie-v0.1.4-linux-arm64.tar.gz
  • yorkie-v0.1.4-linux-ppc64le.tar.gz
  • yorkie-v0.1.4-windows-amd64.zip

etcd seems to build with this script.

Why is this needed:

  • Users can download and use Yorkie without building for their environment.

SDKs for various environments

Description:

We provide Go Client and JS SDK. It would be good to implement the SDK in the following order.

  • yorkie: Go client
  • yorkie-js-sdk
  • yorkie-rust-sdk
  • yorkie-ios-sdk
  • yorkie-android-sdk

Why:

Users can use Yorkie in a variety of environments.

distributed pubSub

Hi.
I am starting with gRPC, and while looking at how to implement subscriptions, I stumbled upon your package. It is a great resource, and thank you for making it open source.

As a next step, I also want to know how to keep track of subscriptions and their associated streams when there is more than one node. I see this comment here

// TODO: Temporary PubSub.
// - We will need to replace this with distributed pubSub.

Can you give me some pointers on how you would approach this?
I plan to run my app on Kubernetes, and I am not sure how pods can communicate to find out where a subscription lives. Should there also be a central registry?

Thank you.

Improve Sync method of Client

While analyzing the code, I noticed that the Sync method of Client works synchronously inside a loop.
I thought that if the Sync method worked asynchronously, it would be much faster.

So I tested it through a prototype. (Link)
In this test, the asynchronous version was more than 3 times faster.
(Test Code)

## Deliberately manipulated the test failure to view the log.
## It is not affecting the current test cases.

--- FAIL: TestClientAndDocument (8.87s)
    --- FAIL: TestClientAndDocument/Synchronous_and_asynchronous_performance_comparison_of_Sync_method_test (0.47s)
        client_test.go:913: Async : 83606000 nanosecond
        client_test.go:921: Sync : 286075000 nanosecond
        client_test.go:922: 
                Error Trace:    client_test.go:922
                Error:          Not equal: 
                                expected: true
                                actual  : false

However, I am not sure if it is safe to operate asynchronously.
What do you think about this? I need advice.
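
For discussion, a generic sketch of the asynchronous variant: one goroutine per document, joined with a WaitGroup, with errors collected under a mutex. `syncAll` and its callback are hypothetical and not the prototype linked above. It is only safe if syncing one document does not touch state shared with another document's sync, which is the part that needs review.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrUnreachable is a placeholder error used by the demo below.
var ErrUnreachable = errors.New("agent unreachable")

// syncAll runs syncOne for every key concurrently, waits for all of them,
// and returns the errors of the keys that failed.
func syncAll(keys []string, syncOne func(key string) error) map[string]error {
	var (
		mu   sync.Mutex
		wg   sync.WaitGroup
		errs = make(map[string]error)
	)
	for _, key := range keys {
		wg.Add(1)
		go func(key string) {
			defer wg.Done()
			if err := syncOne(key); err != nil {
				mu.Lock()
				errs[key] = err
				mu.Unlock()
			}
		}(key)
	}
	wg.Wait()
	return errs
}

func main() {
	errs := syncAll([]string{"doc1", "doc2", "doc3"}, func(key string) error {
		if key == "doc2" {
			return ErrUnreachable
		}
		return nil
	})
	fmt.Println(len(errs), errs["doc2"])
}
```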

The events exposed when subscribing a client are different from `yorkie-js-sdk`

@habibrosyad informed us that the events exposed in #108 PR discussion are different. 👍
I tested the actual Watch() event, and the contents are indeed different.

# js:
Object{name: 'documents-watching-peer-changed', value: Object{test-col$Can watch documents-1609554095519: [...]}}

# go:
{EventType:documents-watched Keys:[0xc0002a45e0] Err:<nil>}

It might be a good idea to solve this problem before implementing the peer awareness feature.

Improve Peer Awareness's metadata to be updatable

What would you like to be added:

Realtime collaborative editors usually need to propagate volatile data, such as cursor position to other peers. This type of data needs to be propagated, but not stored.

For now, we implemented Peer Awareness to propagate the metadata of the client, but updates are not possible. We can cover this scenario by improving the Peer Awareness's metadata so that it can be updated.

Why is this needed:

Storing data is expensive, so it is more economical to propagate this type of data without storing it.

Remove panic from server code

Currently, the server code calls panic when an error occurs. The server goes down, which makes the failure easy to notice.

However, in a production environment, the code should return an error instead of panicking so that the server does not stop.
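
A small sketch of the pattern, using a made-up `split` helper rather than actual server code. The error message is borrowed from the panic reported in the emoji issue. Instead of panicking on an invalid offset, the function returns a wrapped sentinel error that the caller can log and turn into an RPC error.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrOffsetOutOfRange replaces the panic with a value callers can check.
var ErrOffsetOutOfRange = errors.New("offset should be less than or equal to length")

// split cuts s at offset (counted in runes). It returns an error instead of
// panicking when the offset is out of range.
func split(s string, offset int) (left, right string, err error) {
	r := []rune(s)
	if offset < 0 || offset > len(r) {
		return "", "", fmt.Errorf("split %q at %d: %w", s, offset, ErrOffsetOutOfRange)
	}
	return string(r[:offset]), string(r[offset:]), nil
}

func main() {
	if _, _, err := split("🎁", 2); err != nil {
		// The caller decides what to do; the process keeps running.
		fmt.Println("rejected:", err)
		fmt.Println(errors.Is(err, ErrOffsetOutOfRange))
	}
}
```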

Garbage collection

A CRDT does not physically delete an element; it only marks it with a tombstone flag so that concurrent edits do not break. A deleted element therefore still takes up memory, so at some point we have to purge the elements marked with tombstones.

If all clients attached to the document have received all the changes up to a particular checkpoint, we can purge the elements that were deleted before that checkpoint.

The DB stores checkpoints to keep track of which changes each client has received.
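
The rule above can be sketched as follows, with checkpoints reduced to a single server sequence per client. `Element`, `minSyncedSeq`, and `collect` are illustrative names only. A tombstone is purged when its removal is at or before the smallest checkpoint among attached clients; with no attached clients this sketch conservatively purges nothing.

```go
package main

import "fmt"

// Element is a hypothetical CRDT element. RemovedAt == 0 means it is alive;
// otherwise it is a tombstone removed at that server sequence.
type Element struct {
	Value     string
	RemovedAt uint64
}

// minSyncedSeq returns the smallest checkpoint among attached clients,
// or 0 (purge nothing) when no client is attached.
func minSyncedSeq(checkpoints map[string]uint64) uint64 {
	var lowest uint64
	first := true
	for _, seq := range checkpoints {
		if first || seq < lowest {
			lowest, first = seq, false
		}
	}
	return lowest
}

// collect drops the tombstones that every attached client has already seen.
func collect(elems []Element, checkpoints map[string]uint64) (kept []Element, purged int) {
	lowest := minSyncedSeq(checkpoints)
	for _, e := range elems {
		if e.RemovedAt != 0 && e.RemovedAt <= lowest {
			purged++
			continue
		}
		kept = append(kept, e)
	}
	return kept, purged
}

func main() {
	elems := []Element{{"a", 0}, {"b", 3}, {"c", 8}}
	checkpoints := map[string]uint64{"client1": 5, "client2": 10}

	kept, purged := collect(elems, checkpoints)
	fmt.Println(len(kept), purged) // "b" is purged; "c" waits for client1
}
```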

Adjust location of unit tests to suit the scope of responsibility

Currently, Yorkie's test coverage is relatively low at 56%. This is due to how go cover measures coverage: by default, each test analyzes only the package being tested (https://golang.org/cmd/go/#hdr-Testing_flags; using -coverpkg applies coverage analysis in each test to packages matching the given patterns). Some packages are tested from outside their own package, so that coverage is not counted.

So @dc7303 suggested increasing coverage by adjusting the location of unit tests to suit the scope of responsibility of the package.

Delete a node from a heap by in-place way

PriorityQueue.Release was intended to delete the given node from the heap. It was difficult to find and delete the target node from the heap, so we added all nodes to the new heap excluding that node.

@mojosoeun then introduced a way to delete in place in the JS SDK. It turns out that deleting a node from a heap in place is easier than we thought.

I would like to introduce this method in Go as well.

For more about deleting a node from a heap:
http://www.mathcs.emory.edu/~cheung/Courses/171/Syllabus/9-BinTree/heap-delete.html
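
In Go, the standard library already supports in-place removal: heap.Remove(h, i) removes the element at index i in O(log n). The sketch below is one way to use it, assuming each node tracks its own index; the `node`, `nodeHeap`, and `release` names are illustrative and not the existing PriorityQueue code.

```go
package main

import (
	"container/heap"
	"fmt"
)

// node remembers its own position so it can be removed without a search.
type node struct {
	key      string
	priority int
	index    int // position in the heap, maintained by Swap/Push/Pop
}

type nodeHeap []*node

func (h nodeHeap) Len() int           { return len(h) }
func (h nodeHeap) Less(i, j int) bool { return h[i].priority < h[j].priority }
func (h nodeHeap) Swap(i, j int) {
	h[i], h[j] = h[j], h[i]
	h[i].index, h[j].index = i, j
}
func (h *nodeHeap) Push(x interface{}) {
	n := x.(*node)
	n.index = len(*h)
	*h = append(*h, n)
}
func (h *nodeHeap) Pop() interface{} {
	old := *h
	n := old[len(old)-1]
	old[len(old)-1] = nil
	*h = old[:len(old)-1]
	n.index = -1 // mark as no longer in the heap
	return n
}

// push adds a node and returns it so the caller can release it later.
func push(h *nodeHeap, key string, priority int) *node {
	n := &node{key: key, priority: priority}
	heap.Push(h, n)
	return n
}

// popMin removes and returns the node with the smallest priority.
func popMin(h *nodeHeap) *node { return heap.Pop(h).(*node) }

// release removes n in place instead of rebuilding the heap without it.
func release(h *nodeHeap, n *node) {
	if n.index >= 0 && n.index < h.Len() && (*h)[n.index] == n {
		heap.Remove(h, n.index)
	}
}

func main() {
	h := &nodeHeap{}
	push(h, "a", 3)
	b := push(h, "b", 1)
	push(h, "c", 2)

	release(h, b)
	fmt.Println(popMin(h).key, popMin(h).key) // remaining nodes in priority order
}
```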

Remove obsolete clients

Obsolete clients are clients that will not be used again. If a client's last access time is older than a certain threshold, we can consider it obsolete. The threshold may vary by application or service, so it should be configurable by the user.

The reasons for deleting obsolete clients are:

  • We can prevent the clients collection from growing
  • If a document is attached by an obsolete client, garbage collection cannot purge removed elements that the obsolete client has not pulled

Relevant ticket: #3

Introducing limits

What would you like to be added:

Yorkie has no limits yet. It would be nice to add limits as below.

| Key | Name | Description |
| max-document-bytes | Maximum document size | Yorkie saves snapshots for quick document rebuilds. If the snapshot of the document gets bigger, it may exceed the limit of DB. The size of one field must not exceed the maximum size of a document. |

Why is this needed:

With limits that match the characteristics of the service, the service can be provided stably.

Support HA servers in active-active manner

Currently Yorkie only provides a single server. Since a server can go down at any time, we need to set up HA servers. Unlike other servers, Yorkie will receive many edit operations, so we will configure them in an active-active manner.

If we handle the following tasks, it seems we can set up HA servers.

  • Introduce distributed lock with lease
  • Introduce distributed PubSub

Where is updatedAt used?

While implementing Increase operation(#12), I became curious about the updatedAt property of data type in json package.

This is what I expected from this property. For example,

err := doc.Update(func(root *proxy.ObjectProxy) error {
	root.SetNewText("k1").Edit(0, 0, "ABCD")
	text := root.GetText("k1")
	u := text.UpdatedAt()
	assert.NotNil(t, u)
	return nil
})

As in this example, I thought that updatedAt would be set when a modification occurs, but updatedAt is nil.
I also looked for where the UpdatedAt() and SetUpdatedAt() methods are used, but I couldn't find any callers.

Is this missing implementation? Or am I wrong about updatedAt?

It seems that GC of Array and Object is not working properly.

I am currently implementing GC of Text and RichText.
However, there was a problem with the implementation.

The problem is that container objects such as root and element are copied and used inside Yorkie.
So the object used in 'operation.Execute' and the object used by the client are different objects.
(Here, the object used by the client means the object returned by root.GetXXX() in the callback of 'document.Update'.)

And for this reason, it is difficult to track the garbage nodes to be collected.

In this process, I wondered if the GC previously implemented in Array and Object is working properly.
So I analyzed it.

For the modifications below, refer to the changes marked with ####. And you can test with the modified code.

The changes below were made to keep a history.
You can check it more quickly by referring to my work branch. This branch may be removed later.

#### Add doc.Update in last line of 'garbage collection test'
+++ b/pkg/document/document_test.go
@@ -126,6 +126,14 @@ func TestDocument(t *testing.T) {
 
 		assert.Equal(t, 4, doc.GarbageCollect(time.MaxTicket))
 		assert.Equal(t, 0, doc.GarbageLen())
+		fmt.Println("-------- delete array and garbage collect ---------")
+
+		err = doc.Update(func(root *proxy.ObjectProxy) error {
+			arr := root.GetArray("2")
+			fmt.Printf("[Test.arr] %p | %s \n", arr, arr.Marshal())
+			root.Delete("1")
+			return nil
+		}, "deletes 2")
 	})
 
 	t.Run("garbage collection test 2", func(t *testing.T) {
 	
 	


#### Node.isRemoved() check logic comment processing in '(rht *RHTPriorityQueueMap) Get()' function
#### Fix not to check isRemove() in '(rht *RHTPriorityQueueMap) Elements()'
+++ b/pkg/document/json/rht_pq_map.go
@@ -81,9 +81,9 @@ func (rht *RHTPriorityQueueMap) Get(key string) Element {
 	}
 
 	node := queue.Peek().(*RHTPQMapNode)
-	if node.isRemoved() {
-		return nil
-	}
+	//if node.isRemoved() {
+	//	return nil
+	//}
 	return node.elem
 }
 
@@ -141,9 +141,11 @@ func (rht *RHTPriorityQueueMap) Elements() map[string]Element {
 		if queue.Len() == 0 {
 			continue
 		}
-		if node := queue.Peek().(*RHTPQMapNode); !node.isRemoved() {
-			members[node.key] = node.elem
-		}
+		//if node := queue.Peek().(*RHTPQMapNode); !node.isRemoved() {
+		//	members[node.key] = node.elem
+		//}
+		node := queue.Peek().(*RHTPQMapNode)
+		members[node.key] = node.elem
 	}
 
 	return members




#### Add code to check address and status of removedElementPair elements in 'GarbageCollect()'
+++ b/pkg/document/json/root.go
@@ -17,6 +17,8 @@
 package json
 
 import (
+	"fmt"
+
 	"github.com/yorkie-team/yorkie/pkg/document/time"
 )
 
@@ -96,6 +98,8 @@ func (r *Root) GarbageCollect(ticket *time.Ticket) int {
 
 	for _, pair := range r.removedElementPairMapByCreatedAt {
 		if pair.elem.RemovedAt() != nil && ticket.Compare(pair.elem.RemovedAt()) >= 0 {
+			fmt.Printf("[root.GarbageCollect.parent] %p | %s\n", pair.parent, pair.parent.Marshal())
+			fmt.Printf("[root.GarbageCollect.elem] %p | %s\n", pair.elem, pair.elem.Marshal())
 			pair.parent.Purge(pair.elem)
 			count += r.garbageCollect(pair.elem)
 		}




#### Add output code to check parent and element address and status when Remove operation is executed
+++ b/pkg/document/operation/remove.go
@@ -17,6 +17,8 @@
 package operation
 
 import (
+	"fmt"
+
 	"github.com/yorkie-team/yorkie/pkg/document/json"
 	"github.com/yorkie-team/yorkie/pkg/document/time"
 )
@@ -46,6 +48,8 @@ func (o *Remove) Execute(root *json.Root) error {
 	case *json.Object:
 		elem := parent.DeleteByCreatedAt(o.createdAt, o.executedAt)
 		root.RegisterRemovedElementPair(parent, elem)
+		fmt.Printf("[remove.Execute.parent] %p | %s\n", parent, parent.Marshal())
+		fmt.Printf("[remove.Execute.elem] %p | %s\n", elem, elem.Marshal())
 	case *json.Array:
 		elem := parent.DeleteByCreatedAt(o.createdAt, o.executedAt)
 		root.RegisterRemovedElementPair(parent, elem)
 		
 		
 		
 		
#### After calling 'GarbageCollect()', it is necessary to check the state of the object used by the client,
#### so add the code to the 'GetArray()' function
+++ b/pkg/document/proxy/object_proxy.go
@@ -17,6 +17,7 @@
 package proxy
 
 import (
+	"fmt"
 	time2 "time"
 
 	"github.com/yorkie-team/yorkie/pkg/document/change"
@@ -174,6 +175,7 @@ func (p *ObjectProxy) GetObject(k string) *ObjectProxy {
 }
 
 func (p *ObjectProxy) GetArray(k string) *ArrayProxy {
+	fmt.Printf("[object_proxy.GetArray.Object] %p | %s\n", p.Object, p.Object.Marshal())
 	elem := p.Object.Get(k)
 	if elem == nil {
 		return nil

If you run the 'garbage collection test' test of 'document_test.go' after modification, the following log is displayed.

[remove.Execute.parent] 0xc00000fb20 | {"1":1,"2":[1,2,3],"3":3}
[remove.Execute.elem] 0xc00000fec0 | [1,2,3]
[root.GarbageCollect.parent] 0xc00000fb20 | {"1":1,"2":[1,2,3],"3":3}
[root.GarbageCollect.elem] 0xc00000fec0 | [1,2,3]
-------- delete array and garbage collect ---------
[object_proxy.GetArray.Object] 0xc00000fba0 | {"1":1,"2":[1,2,3],"3":3}
[Test.arr] 0xc0002d4910 | [1,2,3] 
[remove.Execute.parent] 0xc00000fb20 | {"1":1,"3":3}
[remove.Execute.elem] 0xc000223530 | 1
....

The important part of this log is what comes after 'delete array and garbage collect' is printed.
object_proxy.GetArray.Object and Test.arr are objects used by the client.
The object used by the client still has a value for key "2".
However, in the object used by the remove operation, you can see that the value for key "2" has been removed by the garbage collector.

I don't know if it's intended to behave like this.
However, it doesn't seem like a complete garbage collection.
I'm curious about your opinion.
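
The differing addresses in the log (0xc00000fb20 for the remove operation's parent, 0xc00000fba0 for the object seen through GetArray) suggest that the client reads from a separate instance. The toy model below is not Yorkie code; it only illustrates, under that assumption, why purging an entry from one instance leaves an independent copy untouched. The names `purge` and `cloneObject` are hypothetical.

```go
package main

import "fmt"

// purge deletes the entry for key from obj. It stands in for the
// garbage collector dropping a removed element (hypothetical helper).
func purge(obj map[string]string, key string) {
	delete(obj, key)
}

// cloneObject returns an independent copy of obj (hypothetical helper).
func cloneObject(obj map[string]string) map[string]string {
	c := make(map[string]string, len(obj))
	for k, v := range obj {
		c[k] = v
	}
	return c
}

func main() {
	root := map[string]string{"1": "1", "2": "[1,2,3]", "3": "3"}
	clientView := cloneObject(root)

	// Only the root instance is purged; the copy is never told about it.
	purge(root, "2")

	_, inRoot := root["2"]
	_, inClient := clientView["2"]
	fmt.Println("root has key 2:", inRoot)
	fmt.Println("client view has key 2:", inClient)
}
```

If this reading is right, a complete collection would have to visit every instance that can still reach the removed element.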

Change Document.GarbageCollect to private method

func (d *Document) GarbageCollect(ticket *time.Ticket) int {

Currently, Document.GarbageCollect() is exposed publicly. This method is not meant to be called by users, so it should be hidden.

Through issue #118, we are moving the responsibility for testing GarbageCollect to the JSON package.
I think Document.GarbageCollect() can be hidden once document_test.go is cleaned up. What do you think?

Add flags that can set all configurations of Agent

What would you like to be added:

Currently, users can only set the configuration by passing the config file flag (-c). We need to add flags to the agent command so that every configuration value of the agent can be set individually.

Why is this needed:

It would be convenient if users could directly set the agent's configuration as flags in a deployment environment such as Docker Compose or Kubernetes.
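
A minimal sketch of the idea, assuming Go's standard `flag` package rather than whatever CLI library the agent command actually uses. The `agentConfig` struct, its fields, and the flag names are hypothetical. Values already loaded (for example from a config file) act as defaults, and each flag overrides one field.

```go
package main

import (
	"flag"
	"fmt"
)

// agentConfig is a hypothetical, trimmed-down configuration struct.
type agentConfig struct {
	RPCPort  int
	MongoURI string
}

// parseAgentFlags registers one flag per configuration field, using the
// values already in conf as defaults, and returns the merged result.
func parseAgentFlags(conf agentConfig, args []string) (agentConfig, error) {
	fs := flag.NewFlagSet("agent", flag.ContinueOnError)
	fs.IntVar(&conf.RPCPort, "rpc-port", conf.RPCPort, "RPC port (hypothetical flag)")
	fs.StringVar(&conf.MongoURI, "mongo-connection-uri", conf.MongoURI, "MongoDB URI (hypothetical flag)")
	if err := fs.Parse(args); err != nil {
		return agentConfig{}, err
	}
	return conf, nil
}

func main() {
	// Defaults as they might come from a config file.
	defaults := agentConfig{RPCPort: 8080, MongoURI: "mongodb://localhost:27017"}

	conf, err := parseAgentFlags(defaults, []string{"--rpc-port", "9000"})
	if err != nil {
		fmt.Println("flag error:", err)
		return
	}
	fmt.Printf("%+v\n", conf)
}
```

With this shape, a Docker Compose or Kubernetes manifest can pass only the values it wants to change and leave the rest to the defaults.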

Add increase operation for number types to use as a counter

The Increase operation can be used to implement features such as a Facebook Like button. It can also serve as a Decrease operation if it accepts negative values.

The following is an example of the API provided to users.

err = doc.Update(func(root *proxy.ObjectProxy) error {
	root.SetInteger("age", 36)
	root.GetInteger("age").Increase(1)
	return nil
}, "The day after your birthday")

The operation is relatively easy to implement because Increase operations created by different clients do not conflict with each other. Implementing it is also a good way to understand Yorkie's overall data flow.

It would be helpful to refer to the existing operations when implementing it:
https://github.com/yorkie-team/yorkie/tree/master/pkg/document/operation
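
One way to see why concurrent Increase operations do not conflict: each operation carries only a delta, and addition is commutative, so replicas that apply the same operations in different orders reach the same value. The sketch below is a toy model, not Yorkie's operation code; the `increase` type and `apply` function are hypothetical.

```go
package main

import "fmt"

// increase is a hypothetical stand-in for an Increase operation.
// It carries only the delta to add; a negative delta acts as a decrease.
type increase struct {
	delta int64
}

// apply folds the operations into the initial value in the given order.
func apply(initial int64, ops []increase) int64 {
	v := initial
	for _, op := range ops {
		v += op.delta
	}
	return v
}

func main() {
	fromClientA := increase{delta: 1}
	fromClientB := increase{delta: -3}

	// Two replicas receive the same operations in different orders.
	replica1 := apply(36, []increase{fromClientA, fromClientB})
	replica2 := apply(36, []increase{fromClientB, fromClientA})

	fmt.Println(replica1, replica2)
}
```

Both replicas end at 34 here, regardless of delivery order.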

Change `XXXTimeoutSec` flags in CLI to duration

What would you like to be added:

Change the XXXTimeoutSec flags in the CLI to durations so that users can enter values such as 5s or 5m.

yorkie/pkg/cli/agent.go

Lines 50 to 51 in 1200755

conf.Mongo.ConnectionTimeoutSec = time.Duration(mongoConnectionTimeoutSec)
conf.Mongo.PingTimeoutSec = time.Duration(mongoPingTimeoutSec)

# AS-IS
yorkie agent --mongo-connection-timeout-sec 60

# TO-BE
yorkie agent --mongo-connection-timeout 1m

The flag values can be converted easily with Go's time.ParseDuration.

Why is this needed:

  • Users can set durations in flags more conveniently.

request: write contribution.md

There is no contribution guideline yet, so we should write one.

Example: When committing new code, include its test code.

Replace ActorID type with UUID or Long instead of String to reduce the size of CRDT meta

The sizes of the types:

| Name | Binary Size | String Size | Features |
| --- | --- | --- | --- |
| UUID | 16 bytes | 36 chars | configuration free, not sortable |
| shortuuid | 16 bytes | 22 chars | configuration free, not sortable |
| Snowflake | 8 bytes | up to 20 chars | needs machine/DC configuration, needs central server, sortable |
| ObjectID (currently used) | 12 bytes | 24 chars | configuration free, sortable |
| xid | 12 bytes | 20 chars | configuration free, sortable |
| uint64 | 8 bytes | N/A | configuration free, needs central server, sortable |
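
Some of the string sizes above follow directly from the encoding. The sketch below, which uses only the Go standard library, checks three of them: a 12-byte ID in hex, a 16-byte UUID in hex plus four hyphens, and the longest decimal form of a uint64. The helper names are hypothetical.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"math"
	"strconv"
)

// hexLen returns the number of characters needed to hex-encode n raw bytes.
func hexLen(n int) int {
	return len(hex.EncodeToString(make([]byte, n)))
}

// maxUint64Digits returns the decimal length of the largest uint64.
func maxUint64Digits() int {
	return len(strconv.FormatUint(math.MaxUint64, 10))
}

func main() {
	fmt.Println("12-byte ID as hex:", hexLen(12), "chars")
	fmt.Println("16-byte UUID as hex plus 4 hyphens:", hexLen(16)+4, "chars")
	fmt.Println("largest uint64 in decimal:", maxUint64Digits(), "digits")
}
```

Hex encoding doubles the byte count, which is why storing the raw 12 bytes instead of the 24-character string would already halve the size of each ID.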
