
971-robot-code's Issues

OOM killer leaks messages

If the OOM killer kills a process, all of its handles get leaked. This is because the OOM killer doesn't run the robust futex cleanup to free the messages.

It actually looks like /proc/pid/stat has a start time for the thread. When opening the queue, we could look for any PIDs which are nonzero, see if they exist, and then check that the start time matches. If it does, it's pretty much guaranteed to be the same process, and is actually running.
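The start-time check described above could be sketched roughly like this (not AOS code, just an illustration of parsing field 22, starttime, out of a /proc/<pid>/stat line; the comm field can contain spaces and parentheses, so we scan back from the last ')'):

```cpp
#include <cstdint>
#include <string>

// Extract the process start time (field 22 of /proc/<pid>/stat, in clock
// ticks since boot) from one stat line. Assumes a well-formed line.
uint64_t StartTimeFromStatLine(const std::string &line) {
  // Field 2 (comm) may itself contain spaces and ')', so locate the last ')'
  // and count fields from there. Fields 3 onward start after ") ".
  const size_t comm_end = line.rfind(')');
  size_t pos = comm_end + 2;  // start of field 3
  for (int field = 3; field < 22; ++field) {
    pos = line.find(' ', pos) + 1;
  }
  return std::stoull(line.substr(pos));  // stops at the next space
}
```

A recovery pass over the queue could then read /proc/<pid>/stat for each nonzero PID, compare the recorded start time, and reclaim handles whose owner is gone or has been recycled.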

Better timing report consumption

Create one or both of:

  • Command line tool to filter timing reports based on application name and display them (with various possibilities for how it is displayed--at a minimum just dumping the straight JSON).
  • Plot that breaks out timing report information in a useful manner.

Timestamp shared memory messages on observation/receipt

Currently, messages sent on shared memory channels are timestamped prior to actually being "sent" (https://github.com/frc971/971-Robot-Code/blob/master/aos/ipc_lib/lockless_queue.cc#L1028). While we should have the guarantees in place to ensure that messages do not get sent with out-of-order timestamps on a given channel, the current ordering does mean that a process listening on multiple channels could plausibly observe messages across channels out-of-order (which isn't supposed to happen).

If we instead figured out an appropriate way to have the first observer of a message timestamp it (this "observer" would be the first of any fetchers or watchers, as well as something that would run immediately after the send actually went through), then because of how the event processing happens on the listening side, it should no longer be possible to observe out of order events.

Add config validator rule

Add a rule that creates a test to confirm that, for a given config:

  • Remote timestamp channels are all specified and specified correctly.
  • Logs for any given node are fully self-consistent (i.e., won't require --skip_missing_forwarding_entries)--note that this should be configurable, because you may only care about this for a few nodes.

Support configuring watchers

Sometimes it is helpful to only process the latest message with a watcher, or to disable "die when you get behind" behavior. Add configuration support to watchers to enable all this.

JSON to flatbuffer parsing error messages are hard to see

People consistently seem to struggle with identifying failures associated with the JSON->flatbuffers code. Some of this may be that the messages tend to not stand out much in the program output. Part of it is also that failures of the code tend to show up as segfaults rather than some sort of more coherent error message.

Ran out of signals

We have a watcher which blocks forever (different bug). This makes it so the event loop isn't able to process signals.

RT signals queue up. RLIMIT_SIGPENDING is the limit per process, which defaults to somewhere around 7k of them.

Once this happens, we are unable to wake up any process, and everything gets triggered off timers. We need a way to not accumulate signals forever.
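For reference, the limit in question can be queried directly (a small Linux-specific sketch, not AOS code; sigqueue() fails with EAGAIN once this many signals are pending, which is how we "run out"):

```cpp
#include <sys/resource.h>
#include <cstdint>

// Return the soft per-process cap on queued signals (RLIMIT_SIGPENDING),
// or -1 if the query fails. Linux-specific.
int64_t QueuedSignalLimit() {
  struct rlimit limit;
  if (getrlimit(RLIMIT_SIGPENDING, &limit) != 0) {
    return -1;
  }
  return static_cast<int64_t>(limit.rlim_cur);
}
```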

JSON config should not accept timestamp_logger_nodes that differ from source_node

Guided by Austin-- I was looking at the following code out of y2022_roborio.json:

{
  "name": "/drivetrain",
  "type": "frc971.control_loops.drivetrain.Output",
  "source_node": "roborio",
  "frequency": 400,
  "max_size": 80,
  "num_senders": 2,
  "logger": "LOCAL_AND_REMOTE_LOGGER",
  "logger_nodes": ["imu"],
  "destination_nodes": [
    {
      "name": "imu",
      "priority": 5,
      "timestamp_logger": "LOCAL_AND_REMOTE_LOGGER",
      "timestamp_logger_nodes": ["imu"],
      "time_to_live": 5000000
    }
  ]
},

The timestamp_logger_nodes should be the source_node, "roborio", rather than "imu" as it is currently. Our JSON creation/merging shouldn't accept any choice other than the source_node.
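Based on the description above, the corrected destination_nodes entry would look like this (only timestamp_logger_nodes changes):

```json
{
  "name": "imu",
  "priority": 5,
  "timestamp_logger": "LOCAL_AND_REMOTE_LOGGER",
  "timestamp_logger_nodes": ["roborio"],
  "time_to_live": 5000000
}
```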

Build breaks when libbz2-dev is installed

cargo_raze__pcre fails to build. Digging in, the key lines in bazel-out/k8-opt/bin/external/cargo_raze__pcre/pcre_foreign_cc/CMake.log are:

-- Found BZip2: /usr/lib/x86_64-linux-gnu/libbz2.so (found version "1.0.8") 
-- Looking for BZ2_bzCompressInit
-- Looking for BZ2_bzCompressInit - not found
-- Found ZLIB: /dev/shm/bazel-sandbox.f856700afb979694adb03c9da2b7c0b9dacd7824b0655f8fe6ce9ad53a8e0492/linux-sandbox/1656/execroot/org_frc971/bazel-out/k8-opt/bin/external/cargo_raze__pcre/pcre.ext_build_deps/lib/libzlib.a (found version "1.2.11") 
-- Found Readline: /usr/include
-- Could not find OPTIONAL package Editline

We then explode with:

[ 83%] Building C object CMakeFiles/pcregrep.dir/pcregrep.c.o
/dev/shm/bazel-sandbox.f856700afb979694adb03c9da2b7c0b9dacd7824b0655f8fe6ce9ad53a8e0492/linux-sandbox/1656/execroot/org_frc971/external/cargo_raze__pcre/pcregrep.c:69:10: fatal error: 'bzlib.h' file not found
#include <bzlib.h>
         ^~~~~~~~~
1 error generated.
make[2]: *** [CMakeFiles/pcregrep.dir/build.make:76: CMakeFiles/pcregrep.dir/pcregrep.c.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:116: CMakeFiles/pcregrep.dir/all] Error 2
make: *** [Makefile:136: all] Error 2

Somehow, it looks like cmake is leaking outside the sandbox and finding things on the host when being built. Hmmm.

AOS <-> ROS bridge

Create some sort of bridge between ROS nodes and AOS, to accommodate people who want to experiment with both.

Exactly how this will look is not entirely clear, but the goal should be for it to be relatively easy to get working, rather than to support every single possible feature.

Faster DARE solver

I thought you might be interested that I contributed a new DARE solver to WPILib that's 2.8x faster than SLICOT (the solver you use now), based on roboRIO benchmarking of a 5-state, 2-input differential drive LQR problem.

The gist of the algorithm from DOI 10.1080/00207170410001714988 is:

#include <Eigen/Cholesky>
#include <Eigen/Core>
#include <Eigen/LU>

template <int States, int Inputs>
Eigen::Matrix<double, States, States> DARE(
    const Eigen::Matrix<double, States, States>& A,
    const Eigen::Matrix<double, States, Inputs>& B,
    const Eigen::Matrix<double, States, States>& Q,
    const Eigen::Matrix<double, Inputs, Inputs>& R) {
  // [1] E. K.-W. Chu, H.-Y. Fan, W.-W. Lin & C.-S. Wang
  //     "Structure-Preserving Algorithms for Periodic Discrete-Time
  //     Algebraic Riccati Equations",
  //     International Journal of Control, 77:8, 767-788, 2004.
  //     DOI: 10.1080/00207170410001714988
  //
  // Implements SDA algorithm on p. 5 of [1] (initial A, G, H are from (4)).
  using StateMatrix = Eigen::Matrix<double, States, States>;

  StateMatrix A_k = A;
  StateMatrix G_k = B * R.llt().solve(B.transpose());
  StateMatrix H_k;
  StateMatrix H_k1 = Q;

  do {
    H_k = H_k1;

    StateMatrix W = StateMatrix::Identity() + G_k * H_k;
    auto W_solver = W.lu();

    StateMatrix V_1 = W_solver.solve(A_k);

    // Solve V₂Wᵀ = Gₖ for V₂
    //
    // V₂Wᵀ = Gₖ
    // (V₂Wᵀ)ᵀ = Gₖᵀ
    // WV₂ᵀ = Gₖᵀ
    // V₂ᵀ = W.solve(Gₖᵀ)
    // V₂ = W.solve(Gₖᵀ)ᵀ
    StateMatrix V_2 = W_solver.solve(G_k.transpose()).transpose();

    G_k += A_k * V_2 * A_k.transpose();
    H_k1 = H_k + V_1.transpose() * H_k * A_k;
    A_k *= V_1;
  } while ((H_k1 - H_k).norm() > 1e-10 * H_k1.norm());

  return H_k1;
}

The preconditions necessary for convergence are:

  1. Q is symmetric positive semidefinite
  2. R is symmetric positive definite
  3. The (A, B) pair is stabilizable
  4. The (A, C) pair where Q = CᵀC is detectable
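For reference (this equation isn't stated in the original report, but is standard), the discrete-time algebraic Riccati equation being solved, with X the fixed point that the H iterate converges to above, is:

```latex
X = A^\mathsf{T} X A
  - A^\mathsf{T} X B \left(B^\mathsf{T} X B + R\right)^{-1} B^\mathsf{T} X A
  + Q
```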

The paper proves convergence under weaker conditions, but it seems to involve solving a generalized eigenvalue problem with the QZ algorithm. SLICOT and Drake use that to solve the whole problem, so it seemed too expensive to bother attempting.

The precondition checks turned out to be 50-60% of the total algorithm runtime, so WPILib exposed a function that skips them if the user knows they'll be satisfied. This would be a good candidate for your Kalman filter error covariance init code, since a comment in there mentioned it didn't use the DARE solver because it had unnecessary checks.

Here's WPILib's impl, which supports static sizing (for performance) and dynamic sizing (for JNI) and throws exceptions on precondition violations.
https://github.com/wpilibsuite/allwpilib/blob/main/wpimath/src/main/native/include/frc/DARE.h

I'd recommend std::expected instead of exceptions for your use case.

aos_dump doesn't respect maps

If you have /camera -> /pi1/camera on pi1, aos_dump /camera complains and doesn't respect the map.

This is a bit subtle, since we really would like aos_dump to be able to subscribe to any channel for debugging. But, it would also be nice to have aos_dump properly respect remaps.

Year 2018 arm feedforward

Hey,

I'm Gabor from team 114, and we're looking at your arm feedforward code to try to implement it on our robot. It's in the y2018/control_loops/python/arm_trajectory.py file. Can you please explain a few things to us?

  1. What are the G1 and G2 constants? I think this is the gear ratio of the motors, but I'm not 100% sure.

The following questions are not essential, but if you have time, we'd appreciate answers to them as well:

  1. What is the difference between the K2 and K4 matrices? I understand what the other matrices do, but K2 and K4 are both multiplied by omega, which would imply that both relate torque to omega. I don't understand why you would need two matrices for the same thing. Can you please clarify the exact purpose of these matrices?
  2. Can you please clarify how exactly the constant matrices are calculated and what the values in them are?

@AustinSchuh @platipus25 I'm pinging you because it seems that you are the contributors to this file, so I assume you know the implementation details.

Thanks in advance,
Gabor and team 114

Support logging remote timestamps on third-party node

If we have a system with nodes a, b, and logger, and are only running a logger on logger, then currently it is not possible to log timestamps for messages sent between a and b. So if there is a channel that is forwarded from a to b and the logger, and you attempt to replay a log from the perspective of b, you will not get any messages replayed on that channel: the logger didn't log the timestamps for the message arrivals on b, so it doesn't know when the messages actually arrived :(.

This means being able to specify arbitrary nodes in the timestamp_logger_nodes field for a connection, see:

// If the corresponding delivery timestamps for this channel are logged
// remotely, which node should be responsible for logging the data. Note:
// for now, this can only be the source node. Empty implies the node this
// connection is connecting to (i.e. name).
timestamp_logger_nodes:[string] (id: 2);

Dynamic flag-setting in starterd

Allow specifying flags as part of the starter RPC definition. It's unclear exactly how this should manage overriding flags specified in the AOS config. This is very helpful when you want to start applications in the same environment they would see under starterd, but with slightly changed flags.

irq_affinity should report top results

This would be incredibly handy for debugging what else is happening in the system when something goes wrong. 1 Hz is plenty (as we dredge through /proc to see how things are going), or an even lower frequency while we are dredging.

We should also put the scheduler + affinity + priority in that report too, along with memory usage.
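The scheduler + priority half of that report could be gathered with the POSIX scheduling API; a rough sketch (not AOS code, and it only covers one process rather than the whole /proc sweep):

```cpp
#include <sched.h>
#include <sys/types.h>
#include <string>

// Describe the scheduling policy and RT priority of a process.
// Pass 0 for the calling process, or a specific pid.
std::string DescribeScheduling(pid_t pid) {
  const int policy = sched_getscheduler(pid);
  sched_param param{};
  sched_getparam(pid, &param);
  const char *name = policy == SCHED_FIFO    ? "SCHED_FIFO"
                     : policy == SCHED_RR    ? "SCHED_RR"
                     : policy == SCHED_OTHER ? "SCHED_OTHER"
                                             : "unknown";
  return std::string(name) + " priority " +
         std::to_string(param.sched_priority);
}
```

Affinity (sched_getaffinity) and memory usage (/proc/<pid>/status) could be folded into the same per-process record.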

Dropping sent-too-fast messages in LogReader makes unreadable logs

We currently drop replayed messages if they get sent too fast when replaying a log. This is itself somewhat dubious behavior, but also creates an issue where if you create a log of the replay, then you can get errors like

F0402 12:20:39.001713 8892 logfile_utils.cc:1506] Check failed: result.timestamp == monotonic_remote_time ({.boot=0, .time=160.591261903sec} vs. {.boot=0, .time=160.623109070sec}) : Queue index matches, but timestamp doesn't. Please investigate!

Avoid double-sending on replayed channels

Add a check to logger to ensure that applications in log replay aren't sending on channels that are also getting replayed (i.e., all the relevant channels from the log are remapped).

Should pretty much just be a matter of doing this TODO:

// TODO(james): Enable exclusive senders on LogReader to allow us to
// ensure we are remapping channels correctly.
event_loop_unique_ptr_ = node_event_loop_factory_->MakeEventLoop(
    "log_reader", {NodeEventLoopFactory::CheckSentTooFast::kNo,
                   NodeEventLoopFactory::ExclusiveSenders::kNo});

and then seeing what blows up.

Proxy EventLoop

An EventLoop implementation which creates multiple EventLoops that are all scheduled via one underlying EventLoop would allow combining multiple applications in a single process. They must all be on the same node, and can communicate with each other and the outside world as normal. This would provide a similar API to SimulatedEventLoopFactory which can create new EventLoops on demand.

Some tricky things to keep in mind:

  • Make sure watchers and senders from all the proxied EventLoops work with each other
  • Multiple senders on the same channel in multiple proxied EventLoops. TimingReports end up doing this.
  • Timers and fetchers can mostly be used directly. Should tack on something in the name to help decipher TimingReports though (each one will be reported twice, once with a longer name in the proxy EventLoop and once with just the given name for the proxied EventLoop)

Sandbox escape: libpcre3

To reproduce:

$ bazel build //documentation/tutorials:create-a-new-autonomous
INFO: Analyzed target //documentation/tutorials:create-a-new-autonomous (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
ERROR: /home/steple/source/971-Robot-Code/documentation/tutorials/BUILD:14:13: Executing genrule //documentation/tutorials:create-a-new-autonomous failed: (Exit 127): bash failed: error executing command (from target //documentation/tutorials:create-a-new-autonomous) /bin/bash -c ... (remaining 1 argument skipped)

Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
external/pandoc/usr/bin/pandoc: error while loading shared libraries: libpcre.so.3: cannot open shared object file: No such file or directory
Target //documentation/tutorials:create-a-new-autonomous failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 0.881s, Critical Path: 0.01s
INFO: 2 processes: 2 internal.
FAILED: Build did NOT complete successfully

sudo apt install libpcre3 will fix this on Ubuntu 23.10.

Better C++ flatbuffers API

The current flatbuffers C++ API requires building up messages inside-out. It should be possible to use the C++ stack for this instead, to allow the C++ code to build up messages from the outside in (which is often more natural).

More concretely, this means generating C++ classes for each flatbuffer struct which hold values for each field (and track which ones are set). Nested structs should be contained in their parent objects, because a major use case is calling functions which return a nested object, and there's no other easy place to store these objects. Then, these objects can do a depth-first traversal of the C++ object graph to actually write the flatbuffer (aka each object writes out its children, tracking the resulting offsets in local variables, then writes out itself and returns the offset to its parent).

I think writing the buffer should be done in an offset_t Write method (or similar), which is passed the FlatBufferBuilder. The top level will normally be called via a templated wrapper type that holds onto the fbb, with a destructor that calls Write on the top-level object and then Finish, to keep everything in outside-in order.
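As a toy illustration of that outside-in pattern (this is not the real flatbuffers API: ToyBuilder, Engine, and Car are all made-up stand-ins, and "writing" just records values), the depth-first Write traversal looks like this -- children write themselves first, and the parent stores their offsets:

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Stand-in for FlatBufferBuilder: writing a value appends it to a buffer
// and returns its offset (here, just its index).
struct ToyBuilder {
  std::vector<int> buffer;
  uint32_t Push(int value) {
    buffer.push_back(value);
    return static_cast<uint32_t>(buffer.size() - 1);
  }
};

// Hypothetical generated leaf object: fields held by value, "set" tracked
// (the real version would use a bitmask rather than std::optional).
struct Engine {
  std::optional<int> rpm;
  uint32_t Write(ToyBuilder &fbb) const { return fbb.Push(rpm.value_or(0)); }
};

// Hypothetical parent: owns its nested object. Write() is depth-first --
// the child is written first, then the parent, referencing the child's
// offset, so the parent ends up last in the buffer.
struct Car {
  Engine engine;
  std::optional<int> wheels;
  uint32_t Write(ToyBuilder &fbb) const {
    const uint32_t engine_offset = engine.Write(fbb);
    fbb.Push(wheels.value_or(0));
    return fbb.Push(static_cast<int>(engine_offset));
  }
};
```

The point is only the traversal order; real generated code would emit actual tables and track offsets the same way.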

An alternative would be holding the fbb in each object, and then having their destructors write it out, but that means more stack space and makes it impossible to decide to skip writing it out later (for example, build up a sub object and then realize it's not actually needed, without taking up any space in the final buffer). Doing it in the destructor also means the parent object has to keep track of where the offsets to all its children are coming from.

Need to think through handling arrays of primitives. These can be big and variable-sized, neither of which interacts well with putting them on the stack. At the same time, allocating them immediately unlike other objects makes for a confusing API. Maybe provide APIs for both?

Handling arrays of objects is tricky. ArrayWriter<T> StartArray(int max_size) would work well for many cases, with convenience void CreateArray(span<const T>) when the temporary storage is managed externally. However, there's no place to stash C++ pointers to the intermediate objects. The flatbuffers array needs to be placed after those objects in the buffer, and C++ pointers are larger than offsets on 64-bit platforms. Forcing the user to allocate that array externally goes against making this API nice and easy to use, but that's the best I can think of right now. void CreateArray(span<const T*>), with an extra level of indirection, could be handy but also looks like a big foot-gun with dangling references.

Do we need to manage shared subobjects? Writing them out redundantly is easy, but not helpful for space efficiency. Maybe use a bit in the bitmask to track whether it's been written out, and make a union for all the variable storage which gets overwritten to the offset it was written to?

This will increase stack usage, which may be undesirable. It's probably worth using a bitmask in the generated code to track which fields are set, rather than using std::optional or a separate bool for each one.

Copying sub-objects could become expensive. It should be possible to structure this so that RVO (return value optimization) constructs the sub-objects in place for the common case of a function returning an entire sub-object.

These classes end up looking similar to the existing TableT, but without storing data in the objects. Do we want to expose reading the fields (and checking if they're set) and/or building them from a const Table*?
