marceloboeira / bojack

🐴 The unreliable key-value store

Home Page: http://medium.com/@marceloboeira/why-you-should-build-your-own-nosql-database-9bbba42039f5

License: MIT License

Crystal 98.45% Makefile 1.55%

bojack crystal database redis nosql ruby data-storage storage data-store store

bojack's Issues

[Build Error] console.cr

I'm having trouble compiling bojack for my system:
Linux lawliet 4.8.0-51-generic #54~16.04.1-Ubuntu SMP Wed Apr 26 16:00:28 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Error in src/bojack/bootstrap.cr:3: instantiating 'BoJack::CLI:Class#run(Array(String))'

BoJack::CLI.run(ARGV)
            ^~~

in src/bojack/cli.cr:10: instantiating 'Commander::Command:Class#new()'

      cli = Commander::Command.new do |command|
                               ^~~

in src/bojack/cli.cr:10: instantiating 'Commander::Command:Class#new()'

      cli = Commander::Command.new do |command|
                               ^~~

in src/bojack/cli.cr:71: instantiating 'Commander::Commands#add()'

        command.commands.add do |command|
                         ^~~

in src/bojack/cli.cr:71: instantiating 'Commander::Commands#add()'

        command.commands.add do |command|
                         ^~~

in src/bojack/cli.cr:91: instantiating 'BoJack::Console:Class#new(String, (Int32 | Int64))'

            BoJack::Console.new(options.string["hostname"], options.int["port"]).start
                            ^~~

instance variable '@client' of BoJack::Console must be BoJack::Client, not Nil

Error: instance variable '@client' is initialized inside a begin-rescue, so it can potentially be left uninitialized if an exception is raised and rescued
Makefile:6: recipe for target 'build' failed
make: *** [build] Error 1

The compiler reports a problem with this line in src/bojack/console.cr:
@client = BoJack::Client.new(@hostname, @port)

Because the variable is initialized inside a begin block, the compiler assumes it might be used without being initialized; it effectively ignores the exit -1 in the rescue.

Just to get past the problem, I changed the line @client = BoJack::Client.new(@hostname, @port) to a bare BoJack::Client.new(@hostname, @port), which only tests the connection. After the rescue, assuming the connection succeeded, I assign the instance variable with the same parameters used in the test: @client = BoJack::Client.new(@hostname, @port)

For example:

  def initialize(@hostname : String = "127.0.0.1", @port : Int8 | Int16 | Int32 | Int64 = 5000)
    begin
      BoJack::Client.new(@hostname, @port)
    rescue exception
      puts exception.message
      exit -1
    end

    @client = BoJack::Client.new(@hostname, @port)
  end

This solved my problem, although I know it isn't the best solution.
So, anyway, I hope this helps in some way.
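A possible alternative to connecting twice is to assign the result of the whole begin/rescue expression. This is only a sketch: FakeClient is a hypothetical stand-in for BoJack::Client so the example compiles on its own, and the port type is simplified to Int32. It relies on exit being typed as NoReturn, so the rescue branch does not make the expression nilable.

```crystal
# Hypothetical stand-in for BoJack::Client so the sketch is self-contained.
class FakeClient
  getter hostname : String
  getter port : Int32

  def initialize(@hostname : String, @port : Int32)
    # Simulate a refused connection for an obviously invalid port.
    raise "Connection refused" if @port <= 0
  end
end

class Console
  getter client : FakeClient

  def initialize(@hostname : String = "127.0.0.1", @port : Int32 = 5000)
    # The assignment sits outside the begin/rescue. The rescue branch ends
    # in `exit` (NoReturn), so the expression as a whole is a FakeClient.
    @client = begin
      FakeClient.new(@hostname, @port)
    rescue ex
      STDERR.puts ex.message
      exit 1
    end
  end
end

console = Console.new("127.0.0.1", 5000)
puts console.client.port
```

With this shape the client is created once, and the failure path still prints the message and exits.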

[Server] Unhandled exception (Missing key)

The set command needs two params, key and value. Any attempt to execute it without both results in the error below.
e.g.: set a

Server log

Unhandled exception:
Missing hash key: :value (KeyError)
[4516704689] *raise<KeyError>:NoReturn +81
[4516754682] *Hash(Symbol, Array(String) | String)@Hash(K, V)#[]<Symbol>:(Array(String) | String) +3818
[4516734926] ~procProc(Nil)@./src/bojack/server.cr:29 +9566
[4516567462] *Fiber#run:(Int64 | Nil) +54

Already working on it

refactor: Command factory algorithm O(1) instead of O(n)

Hey @marceloboeira

I understand what you were trying to achieve in commit 555a0e4. You wanted to let each command identify itself and hold its own keyword, avoiding the long switch condition. I think it makes things a little magic, and you may forget to add the keyword when creating a new command (it has already happened :P a8d3403).

The common implementation of the Factory pattern generally uses a Map to hold all the classes. I think it makes the algorithm cleaner and better, since a lookup costs O(1) instead of O(n). It also keeps the whole logic inside the factory, so if you need to know which keyword calls a specific command, it is easy to find in one place.

So I propose the following implementation:

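COMMANDS = {
  "get"    => Bojack::Commands::Get,
  "set"    => Bojack::Commands::Set,
  "delete" => Bojack::Commands::Delete,
  "size"   => Bojack::Commands::Size,
}

# Returns a new command for the keyword, or nil when the keyword is unknown.
def self.from(keyword) : Bojack::Commands::Command?
  # Hash#[]? returns nil for a missing key, so no separate existence check is needed.
  COMMANDS[keyword]?.try &.new
end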

What do you think @mauricioabreu, @hugoabonizio ?

[Refactor] Singleton Logger

Create a BoJack::Logger singleton class that accepts the same options as the standard Logger but holds a single Logger instance, which can be accessed from every part of the project without injecting the dependency into every component that needs it.

This class would probably know about the formatter and the file outputs implemented in #22, #23, and #26.

This way all of this code goes into a single class, instead of the workarounds we currently have. 👍
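A minimal sketch of what such a singleton could look like. It does not wrap the standard library logger; SharedLogger, setup, and the message format are hypothetical names chosen for illustration, and the class only writes to an IO.

```crystal
class SharedLogger
  @@instance : SharedLogger?

  # Lazily builds the shared instance on first access (defaults to STDOUT).
  def self.instance : SharedLogger
    @@instance ||= new(STDOUT)
  end

  # Lets the CLI replace the output once, before the server starts.
  def self.setup(io : IO) : SharedLogger
    @@instance = new(io)
  end

  def initialize(@io : IO)
  end

  def info(message : String) : Nil
    @io.puts "[bojack][INFO] #{message}"
  end
end

# Any component can log without receiving a logger as a dependency.
SharedLogger.instance.info("server started")
```

The formatter and file output options would live in this one class, which is the point of the proposal.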

Unhandled "Connection reset by peer" in runtime

Error reading file: Connection reset by peer (Errno)
0x103ae3250: *raise<Errno>:NoReturn at ??
0x103b321c6: *TCPSocket+@IO::FileDescriptor#unbuffered_read<Slice(UInt8)>:Int64 at ??
0x103b3133d: *TCPSocket+@IO#gets:(String | Nil) at ??
0x103b3101f: ~procProc(Nil)@src/bojack/event_loop/message.cr:10 at ??
0x103ae5dcd: *Fiber#run:(IO::FileDescriptor | Nil) at ??

Use waffle.io to manage issues/tasks/roadmap/releases

I am 100% for using waffle, but would like your suggestion/input on that.

Please: @cristianoliveira / @mauricioabreu / @hugoabonizio / @joaocv3

If you are not familiar with it, please see: http://waffle.io

Handle Exceptions level (Runtime, Fatal)

In order to orchestrate the flow and prevent any unhandled error, it is important to create levels of possible errors.

BoJack::Exceptions::Runtime -> A runtime exception should return an error to the client, but must not affect any other client or request. Examples: invalid or missing parameters.

BoJack::Exceptions::Fatal -> Unexpected low-level exceptions and close/kill signals. This one closes the connection with the current socket and reports the error.
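The two levels could be sketched roughly as follows. The Sketch module, the set helper, and the reply strings are hypothetical; only the Runtime/Fatal split comes from the proposal above. The handler returns the reply together with a flag saying whether the connection should stay open.

```crystal
module Sketch
  module Exceptions
    class Runtime < Exception; end
    class Fatal < Exception; end
  end

  # Hypothetical "set" handler: raises Runtime when a param is missing.
  def self.set(args : Array(String)) : String
    raise Exceptions::Runtime.new("set needs a key and a value") if args.size < 2
    args[1]
  end

  # Returns {reply, keep_connection_open}.
  def self.handle(args : Array(String)) : {String, Bool}
    {set(args), true}
  rescue ex : Exceptions::Runtime
    # Report to this client only; other clients are unaffected.
    {"error: #{ex.message}", true}
  rescue ex : Exceptions::Fatal
    # Report and signal that this socket should be closed.
    {"fatal: #{ex.message}", false}
  end
end

puts Sketch.handle(["a"])
puts Sketch.handle(["a", "1"])
```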

Request timeouts

Feature

Use a global request timeout setting.

That would prevent a connection from locking up because a Request is stuck.

Implementation

Currently:

Main Loop
  -> Message
     -> Channel(Request)

Channel Loop
   -> Request
     -> Command
        -> Response

After:

Main Loop
  -> Message
     -> Channel(Request)

Channel Loop
   -> Timeout Handler
     -> Request
       -> Command
         -> Response

Then the loop on the channel waits for the Request to finish, or raises an exception if it runs out of time.
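One way the Timeout Handler step might look, sketched with a channel and Crystal's select/timeout. The with_timeout helper and its error message are hypothetical, and a real handler would presumably raise a dedicated exception class rather than a plain one.

```crystal
# Runs the block in its own fiber and waits for the result, up to `limit`.
def with_timeout(limit : Time::Span, &block : -> String) : String
  # Buffered, so a worker that finishes late never blocks on send.
  done = Channel(String).new(1)

  spawn do
    done.send(block.call)
  end

  select
  when result = done.receive
    return result
  when timeout(limit)
    raise "request timed out after #{limit.total_milliseconds.to_i}ms"
  end
end

puts with_timeout(1.second) { "pong" }

begin
  with_timeout(50.milliseconds) do
    sleep 1.second
    "too late"
  end
rescue ex
  puts ex.message
end
```

Note that the stuck worker fiber is not cancelled here; the caller simply stops waiting for it.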

Changes

Initially, the timeout setup will be defined only on server startup and will be shared among all connections:
bojack server [params] --timeout <value>

At some point it might be interesting to have a specific timeout per client, making it more flexible.
bojack console --timeout <value>

Or, as a command:
timeout <value>

@hugoabonizio @mauricioabreu @joaocv3 any thoughts?

BUG: 100% CPU usage

Hi! First of all, congratulations on this project, I love the BoJack Horseman show too!

I was writing a client and I think I found a bug on the server when a connection isn't properly closed. The server keeps trying to read from the socket inside the loop and CPU usage goes to 100%. Breaking out of the loop when the request is nil will probably fix this problem.
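The suggested fix can be illustrated with a read loop that stops when gets returns nil, which is what IO#gets does at end of stream. This is a sketch, not the server's actual loop: read_requests is a hypothetical helper and an IO::Memory stands in for the TCP socket.

```crystal
# Reads requests until the peer closes the connection.
# `IO#gets` returns nil at end of stream, which ends the `while` loop
# instead of spinning on a dead socket.
def read_requests(io : IO) : Array(String)
  requests = [] of String
  while line = io.gets
    requests << line
  end
  requests
end

# IO::Memory stands in for a socket whose client sent two commands and left.
closed_connection = IO::Memory.new("ping\nget a\n")
puts read_requests(closed_connection)
```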

[Bug] File `logo` not found

Once compiled, BoJack looks for the logo file in the current folder and raises an exception if it is not there.

Solution: Don't read the logo from a file; embed it in the source.

[Logger] New format

Currently the logger format is not very nice. We should make it more human readable.
[bojack][host:port][%DATE%][%LEVEL%] Message

This is only a suggestion, so I ask the contributors to share their thoughts on the default logger message pattern.

[Logger] Set log level (severity)

Defining the severity is already a feature of Crystal's Logger, but we must provide a public API to set the severity of the log output.

e.g.:
bojack server --log-level debug || bojack server --log-level 0
bojack server --log-level info || bojack server --log-level 1
bojack server --log-level warn || bojack server --log-level 2
bojack server --log-level error || bojack server --log-level 3
bojack server --log-level fatal || bojack server --log-level 4

Reference: https://crystal-lang.org/api/0.18.7/Logger/Severity.html
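A sketch of how the --log-level value could be parsed, accepting either the name or the number. LogLevel and parse_log_level are hypothetical; the enum is defined locally so that its numeric values match the 0 to 4 mapping listed above rather than depending on a particular standard library version.

```crystal
# Local enum whose values (0..4) mirror the mapping in the examples above.
enum LogLevel
  Debug
  Info
  Warn
  Error
  Fatal
end

# Accepts "debug".."fatal" (case-insensitive) or "0".."4".
# Returns nil for anything else so the CLI can print a usage error.
def parse_log_level(raw : String) : LogLevel?
  if number = raw.to_i?
    LogLevel.from_value?(number)
  else
    LogLevel.parse?(raw)
  end
end

puts parse_log_level("warn")
puts parse_log_level("3")
```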

Add increment command

Add a safe way to increment counters with BoJack.

Our "users" should not have to deal with concurrency when trying to create a safe counter with BoJack.

e.g.:

counter = client.get("counter")

if counter
  counter = counter.to_i
  counter += 1
  client.set("counter", counter)
end

That is completely unsafe under concurrent access; imagine this in a multi-threaded web server.

For this I propose the command increment $key, so users can rely on us to increment the key's value by 1.

Important:

Only valid keys can be incremented. To be valid, the key MUST already exist and its value MUST be castable to Integer, meaning that it cannot be an Array or a non-numerical String.
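A minimal sketch of a server-side increment that follows the rules above, with the read and write guarded by a Mutex. CounterStore is a hypothetical in-memory store, not BoJack::Memory, and values are kept as strings purely for illustration.

```crystal
class CounterStore
  def initialize
    @data = {} of String => String
    @lock = Mutex.new
  end

  def set(key : String, value : String) : Nil
    @lock.synchronize { @data[key] = value }
  end

  def get(key : String) : String?
    @lock.synchronize { @data[key]? }
  end

  # Returns the new value, or nil when the key is missing or not an integer.
  def increment(key : String) : Int64?
    @lock.synchronize do
      current = @data[key]?.try(&.to_i64?)
      next nil unless current
      @data[key] = (current + 1).to_s
      current + 1
    end
  end
end

store = CounterStore.new
store.set("a", "1")
done = Channel(Nil).new

1000.times do
  spawn do
    store.increment("a")
    done.send(nil)
  end
end

# Wait for every fiber before reading the final value.
1000.times { done.receive }
puts store.get("a")
```

Because the read-modify-write happens inside one synchronized block on the server, clients never need the get-then-set sequence shown earlier.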

[CLI] Handle Connection refused

When the server is not available we raise an error; we could handle it and show only a message instead of the stack trace.

feat: configurable server port

It would be great if we could set the port using a settings file.

Concurrency management

As discussed in #17

It seems that we are not correctly implementing the "secure" routine for the TCP connection/request handler.

https://crystal-lang.org/docs/guides/concurrency.html

Currently we only spawn new connections and new requests, but we are not entirely sure whether they run, when they run, or even how they run. If one of them raises an error, we will probably just ignore it (before, we crashed everything on any sort of error; now we just ignore it as an Unhandled exception). Also, if the program finishes or crashes, we don't make sure to handle or close incoming requests.

The next step should be implementing something with Fibers and Channels, to achieve a safe way of handling the concurrency.

One or more fibers will handle BoJack::Memory and the new connections, and a channel will handle the requests themselves, since we don't need to share memory between them but only access the memory instance.

Crystal's guide already has a very nice example, with TCP Sockets:

require "socket"

channel = Channel(String).new
# Host and port are assumptions here; the original snippet left them out.
server = TCPServer.new("127.0.0.1", 5000)

spawn do
  socket = server.accept

  while request = socket.gets
    channel.send(request)
  end
end

loop do
  request = channel.receive
  # handle the request
  # ...
end

I believe this way we can achieve something more reliable, because we will create something similar to a queue to handle the requests. Even though we may hold a connection open for a long time under very high concurrent access, at least we ensure its execution.

Refactor commands lookup/params

The way we look up commands and handle params today is a bit messy and not scalable.

This is mostly because we created a contract for commands, establishing that every command needs the memory, a key, and a value. That was valid at some point, but not anymore.

That causes several problems. Among them, we have some 'lost' commands here, the ones that don't match the signature:

  params = Bojack::Params.from(request)
  bjcommand = Bojack::Command.from(params.command)

  if bjcommand
    response = bjcommand.execute(memory, params.key, params.value)
    socket.puts(response)
  elsif params.command == "ping"
    socket.puts("pong")
  elsif params.command == "close"
    socket.puts("closing...")
    socket.close
    break
  else
    socket.puts("error: '#{params.command}' is not a valid command")
  end

For further development I believe that handling this is important. New commands, for instance delete * or time, will also not match the pattern, which in the future will lead to a huge effort to make this scalable.

My suggestion is to create an open structure to transport params from the server's 'network layer' to any given command. The command itself should declare its dependencies (params) internally and validate at runtime for missing or invalid params. This way we can manage commands in a more elegant way.

[Logger] Define output

Currently the logger is hardcoded to STDOUT, which may not be a good option for every user. What about defining the output in the CLI?

e.g.: bojack server --log ./shared/logs/bojack.log

BoJack would have to create a new file, with the given prefix and add a timestamp:

e.g.: ./shared/logs/bojack_201609201010.log
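Deriving the timestamped file name from the --log path could be sketched like this. The timestamped_log_path helper is hypothetical; the format string simply mirrors the example file name above.

```crystal
# Inserts a YYYYMMDDHHMM timestamp between the file name and its extension.
def timestamped_log_path(path : String, at : Time) : String
  ext = File.extname(path)        # e.g. ".log"
  name = File.basename(path, ext) # e.g. "bojack"
  stamp = at.to_s("%Y%m%d%H%M")
  File.join(File.dirname(path), "#{name}_#{stamp}#{ext}")
end

puts timestamped_log_path("./shared/logs/bojack.log", Time.utc(2016, 9, 20, 10, 10, 0))
```

Passing the time in as an argument keeps the helper easy to test; the server would call it with the current time at startup.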

Strange behavior when reusing connection

I'm getting a strange result when benchmarking BoJack Client: performance when I reuse the same TCPSocket is much worse than when I open a new one for each command. The code is here.

             user     system      total        real
shared   0.000000   0.030000   0.030000 (  7.918069)
new      0.000000   0.010000   0.010000 (  0.010263)

I noticed that @marceloboeira is making some tests on branch benchmarks, are you having the same problem?

BoJack Client 🎉

BoJack Client is going to be the default client for accessing BoJack servers. It will be a built-in client, shipped in the same binary as the server.

Usage examples:
bojack client <server:127.0.0.1> <port:5000>

It should have a flow very much like telnet, but should be an extensible wrapper, so we can evolve it with syntax, autocomplete, colors and such in the near future.

References:

https://github.com/greyblake/crystal-icr

@hugoabonizio is going to handle the main development, however any pull-request is very much welcome.

This is going to be a blocker for the first release of BoJack.

Add TTL to keys

I think we should implement a Time To Live to keys. Adding a timeout to a key is an important feature to use BoJack as a caching backend, for example.
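One possible shape for this, sketched with lazy expiration (a key is dropped when it is read after its deadline). ExpiringStore and its method signatures are hypothetical, and the current time is passed in as a parameter only to keep the example deterministic.

```crystal
class ExpiringStore
  # A value plus an optional expiry time (nil means "never expires").
  record Entry, value : String, expires_at : Time?

  def initialize
    @data = {} of String => Entry
  end

  def set(key : String, value : String, ttl : Time::Span? = nil, now : Time = Time.utc) : Nil
    expires_at = ttl ? now + ttl : nil
    @data[key] = Entry.new(value, expires_at)
  end

  def get(key : String, now : Time = Time.utc) : String?
    entry = @data[key]?
    return nil unless entry

    expires_at = entry.expires_at
    if expires_at && now >= expires_at
      @data.delete(key) # lazy expiration: drop the key on first stale read
      return nil
    end

    entry.value
  end
end

cache = ExpiringStore.new
start = Time.utc(2020, 1, 1)
cache.set("session", "abc", 10.seconds, start)
puts cache.get("session", start + 5.seconds).inspect
puts cache.get("session", start + 10.seconds).inspect
```

A background sweep would be needed as well if keys that are never read again should free memory.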

Increment isn't thread-safe

The increment command isn't yet safe across multiple connections. Since Crystal doesn't let users manage threads for IO operations, the more accurate term may be concurrent-safe, but the truth is that there is no mutex guarding writes yet. Running this sample:

require "bojack-client"

client = BoJack::Client.new
client.set "a", 1
1000.times do
  spawn do
    client.increment "a"
  end
end
puts client.get "a"

The result that I get is something between 320 ~ 360. Maybe this is related to #19.

[CLI] Hostname / Help conflict flag

Both the help and hostname flags use the same short flag, -h.

Usually the priority is for the help command.

We can either change the param name, or the short name for the flag.

RESP support

Hi!

Following the idea in #39, I was playing with RESP protocol parsing and I think that it would be pretty easy to add a Redis compatible mode to BoJack!

We can start the server with a bojack server --resp flag to start a server that understands Redis protocol. I've made some work here: https://github.com/hugoabonizio/resp.cr
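For context, a Redis client sends each command as a RESP array of bulk strings, which is what a --resp mode would need to read. The sketch below only shows the encoding side of that wire format; resp_encode is a hypothetical helper and is not taken from resp.cr.

```crystal
# Encodes a command as a RESP array of bulk strings:
#   *<number of parts>\r\n, then for each part $<byte length>\r\n<part>\r\n
def resp_encode(parts : Array(String)) : String
  String.build do |io|
    io << '*' << parts.size << "\r\n"
    parts.each do |part|
      io << '$' << part.bytesize << "\r\n" << part << "\r\n"
    end
  end
end

# inspect makes the \r\n separators visible.
puts resp_encode(["SET", "a", "1"]).inspect
```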

WDYT @marceloboeira?

Format logs more atomically

Use something more like 2018-01-24 22:08:09.382 (utc always) instead of 2018-01-24 22:08:09 +01:00
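The proposed format could be produced with a format string along these lines. log_timestamp is a hypothetical helper; the sample time is the one from the request above, built with a fixed +01:00 offset and then converted to UTC.

```crystal
# Formats a time as UTC with millisecond precision and no offset suffix.
def log_timestamp(at : Time) : String
  at.to_utc.to_s("%Y-%m-%d %H:%M:%S.%L")
end

plus_one = Time::Location.fixed(3600)
sample = Time.local(2018, 1, 24, 22, 8, 9, nanosecond: 382_000_000, location: plus_one)
puts log_timestamp(sample)
```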

Installer/Packages

This is a relevant thing we need for the first release.

Raw install
Brew package
Linux/Ubuntu?

Client always connect with 127.0.0.1:5000

The client code sets 127.0.0.1:5000 as default connection address instead of the given params from CLI.

Inconsistent error message for pop command

set a 1
1
append a 2
["1", "2"]
pop a
2
pop a
1
pop a
error: 'a' is not a valid key
get a
[]

When a has no elements, pop reports that a is an invalid key. However, get with the same key returns an empty array, proving that the key IS valid, just empty.

The error message should be different when the list is empty.
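A sketch of how pop could distinguish the two cases. The pop helper, the plain Hash used as storage, and the wording of the "is empty" message are all hypothetical.

```crystal
# Pops the last element of the list stored at `key`, with a distinct
# message for a missing key versus an existing but empty list.
def pop(store : Hash(String, Array(String)), key : String) : String
  list = store[key]?
  return "error: '#{key}' is not a valid key" unless list

  # Array#pop? returns nil instead of raising when the list is empty.
  list.pop? || "error: '#{key}' is empty"
end

store = {"a" => ["1", "2"]}
puts pop(store, "a")
puts pop(store, "a")
puts pop(store, "a")
puts pop(store, "b")
```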

Next step: a GUI?

Hello @marceloboeira :)

If you plan to make bojack production ready, you may also want to support RESP, so you get a fully compatible GUI like Redsmin, which would give bojack deployments free, out-of-the-box, real-time monitoring and alerting (on top of administrative tasks, depending on your support of Redis commands) :)

For instance tile38 will be doing this to get a free monitoring system from day one :)

BoJack client shard

We need a special repository for the BoJack client shard, so that Crystal developers can install only the client, as we already have the Python client.

Client only, the console should remain under this repository.

@hugoabonizio If you want to be in charge of this, you could create the bojack.cr repository as a client for BoJack, move the content from this repository to yours, and also use the shard here to point to the Client class.

If you do so, I would also like to ask you to rename the CLI command from client to console, to avoid any future misunderstandings.

I would like to ask both @mauricioabreu and @hugoabonizio that the clients follow the release number of the server, so it is easy to identify compatible versions.