frc971 / 971-Robot-Code (License: Apache License 2.0)
When merging configs, if the application names match, we concatenate the node lists. If there are duplicates, we preserve them. We should instead deduplicate them.
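A minimal sketch of the proposed deduplication (MergeNodeLists is a hypothetical helper, not the actual merge code): keep the first occurrence of each node and drop later duplicates while preserving order.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: concatenate two node lists, dropping duplicates
// while preserving the order of first appearance.
std::vector<std::string> MergeNodeLists(const std::vector<std::string>& a,
                                        const std::vector<std::string>& b) {
  std::vector<std::string> merged;
  std::unordered_set<std::string> seen;
  for (const auto& list : {a, b}) {
    for (const auto& node : list) {
      if (seen.insert(node).second) {
        merged.push_back(node);
      }
    }
  }
  return merged;
}
```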
If the OOM killer kills a process, all its handles get leaked, because the OOM killer doesn't run the robust-futex cleanup that would free the messages.
It actually looks like /proc/pid/stat has a start time for the thread. When opening the queue, we could look for any PIDs which are nonzero, see if they exist, and then check that the start time matches. If it does, it's pretty much guaranteed to be the same process, and is actually running.
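A sketch of that liveness check, assuming we record the owner's start time alongside its PID when it takes the queue slot (ProcStartTime and SameProcessStillRunning are hypothetical helpers). Field 22 of /proc/&lt;pid&gt;/stat is "starttime", in clock ticks since boot.

```cpp
#include <cstdint>
#include <fstream>
#include <sstream>
#include <string>

// Reads field 22 ("starttime") from /proc/<pid>/stat.
// Returns 0 if the process no longer exists or the file can't be parsed.
uint64_t ProcStartTime(int pid) {
  std::ifstream stat("/proc/" + std::to_string(pid) + "/stat");
  std::string contents((std::istreambuf_iterator<char>(stat)),
                       std::istreambuf_iterator<char>());
  // The comm field (field 2) can itself contain spaces and parens, so
  // parse starting after the last closing ')'.
  const size_t close = contents.rfind(')');
  if (close == std::string::npos) return 0;
  std::istringstream rest(contents.substr(close + 2));
  std::string field;
  // Skip fields 3..21; field 22 is starttime.
  for (int i = 3; i <= 21; ++i) rest >> field;
  uint64_t starttime = 0;
  rest >> starttime;
  return starttime;
}

// The slot owner is stale if the PID is gone or the start time changed
// (meaning the PID was recycled by a different process).
bool SameProcessStillRunning(int pid, uint64_t recorded_starttime) {
  const uint64_t now = ProcStartTime(pid);
  return now != 0 && now == recorded_starttime;
}
```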
Create one or both of:
When we hit https://github.com/frc971/971-Robot-Code/blob/master/aos/events/simulated_event_loop.cc#L1463 it typically seems to be people creating EventLoops outside of LogReader::OnStart. Confirm the nature of the issue here and improve the error message (and/or consider changing when EventLoop creation is even allowed) so that users know what to do when they see that error.
Currently, messages sent on shared memory channels are timestamped prior to actually being "sent" (https://github.com/frc971/971-Robot-Code/blob/master/aos/ipc_lib/lockless_queue.cc#L1028). While we should have the guarantees in place to ensure that messages do not get sent with out-of-order timestamps on a given channel, the current ordering does mean that a process listening on multiple channels could plausibly observe messages across channels out-of-order (which isn't supposed to happen).
If we instead figured out an appropriate way to have the first observer of a message timestamp it (this "observer" would be the first of any fetchers or watchers, as well as something that would run immediately after the send actually went through), then because of how the event processing happens on the listening side, it should no longer be possible to observe out of order events.
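One way the "first observer stamps the message" idea could work is a compare-and-swap on a per-slot atomic timestamp; whichever side touches the message first installs the time, and every later observer reads the installed value, so all observers agree. This is an illustrative sketch, not the lockless_queue implementation (MessageSlot and ObserveTimestamp are hypothetical names).

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical per-message slot carrying an atomic timestamp,
// initialized to 0 meaning "not yet stamped".
struct MessageSlot {
  std::atomic<int64_t> monotonic_sent_time_ns{0};
};

// First observer (fetcher, watcher, or post-send hook) wins the CAS and
// installs its clock reading; everyone else uses the installed value.
int64_t ObserveTimestamp(MessageSlot* slot, int64_t now_ns) {
  int64_t expected = 0;
  if (slot->monotonic_sent_time_ns.compare_exchange_strong(expected, now_ns)) {
    return now_ns;  // We were first; our clock reading wins.
  }
  return expected;  // Someone beat us; use their timestamp.
}
```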
The current API is cumbersome, confusing, and prone to human error.
And a rule that creates a test to confirm that, for a given config:
--skip_missing_forwarding_entries
) -- note that this should be configurable, because you may only care about this for a few nodes.

Sometimes it is helpful to only process the latest message with a watcher, or to disable the "die when you get behind" behavior. Add configuration support to watchers to enable all of this.
People consistently seem to struggle with identifying failures associated with the JSON->flatbuffers code. Some of this may be that the messages tend to not stand out much in the program output. Part of it is also that failures of the code tend to show up as segfaults rather than some sort of more coherent error message.
Provide a method that returns, for a given node/application combination, all the valid ways to refer to a given channel.
I've seen evidence that we don't enforce that channel names start with /. We should fix that.
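The missing check could be as simple as this (IsValidChannelName is a hypothetical name for wherever config validation lives):

```cpp
#include <string_view>

// Hypothetical validation helper: channel names must be absolute,
// i.e. non-empty and starting with '/'.
bool IsValidChannelName(std::string_view name) {
  return !name.empty() && name.front() == '/';
}
```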
We have a watcher which blocks forever (different bug). This makes it so the event loop isn't able to process signals.
RT signals queue up. RLIMIT_SIGPENDING is the limit per process, which defaults to somewhere around 7k of them.
Once this happens, we are unable to wake up any process, and everything gets triggered off timers. We need a way to not accumulate signals forever.
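The budget in question can be inspected with getrlimit(2); once rlim_cur RT signals are pending for the process, further sigqueue() calls fail with EAGAIN and wakeups are silently lost. A small sketch:

```cpp
#include <sys/resource.h>

// Returns the soft limit on queued (pending) signals for this process,
// or -1 on error. This is the RLIMIT_SIGPENDING budget that fills up
// when a watcher blocks forever and stops draining signals.
long PendingSignalLimit() {
  struct rlimit limit;
  if (getrlimit(RLIMIT_SIGPENDING, &limit) != 0) return -1;
  return static_cast<long>(limit.rlim_cur);
}
```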
aos_starter is designed to allow it to take an executable name instead of just the application name. But if there are multiple applications with the same executable, the behavior is ambiguous.
Guided by Austin-- I was looking at the following code out of y2022_roborio.json:
{
  "name": "/drivetrain",
  "type": "frc971.control_loops.drivetrain.Output",
  "source_node": "roborio",
  "frequency": 400,
  "max_size": 80,
  "num_senders": 2,
  "logger": "LOCAL_AND_REMOTE_LOGGER",
  "logger_nodes": ["imu"],
  "destination_nodes": [
    {
      "name": "imu",
      "priority": 5,
      "timestamp_logger": "LOCAL_AND_REMOTE_LOGGER",
      "timestamp_logger_nodes": ["imu"],
      "time_to_live": 5000000
    }
  ]
},
The timestamp_logger_nodes should be the source node, "roborio", rather than "imu" as it currently is. Our JSON creation/merging shouldn't accept choices other than the source_node.
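For reference, the corrected connection would look like this (only timestamp_logger_nodes changes, to match the source node):

```json
{
  "name": "imu",
  "priority": 5,
  "timestamp_logger": "LOCAL_AND_REMOTE_LOGGER",
  "timestamp_logger_nodes": ["roborio"],
  "time_to_live": 5000000
}
```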
cargo_raze__pcre fails to build. Digging in, the key lines in bazel-out/k8-opt/bin/external/cargo_raze__pcre/pcre_foreign_cc/CMake.log are:
-- Found BZip2: /usr/lib/x86_64-linux-gnu/libbz2.so (found version "1.0.8")
-- Looking for BZ2_bzCompressInit
-- Looking for BZ2_bzCompressInit - not found
-- Found ZLIB: /dev/shm/bazel-sandbox.f856700afb979694adb03c9da2b7c0b9dacd7824b0655f8fe6ce9ad53a8e0492/linux-sandbox/1656/execroot/org_frc971/bazel-out/k8-opt/bin/external/cargo_raze__pcre/pcre.ext_build_deps/lib/libzlib.a (found version "1.2.11")
-- Found Readline: /usr/include
-- Could not find OPTIONAL package Editline
We then explode with:
[ 83%] Building C object CMakeFiles/pcregrep.dir/pcregrep.c.o
/dev/shm/bazel-sandbox.f856700afb979694adb03c9da2b7c0b9dacd7824b0655f8fe6ce9ad53a8e0492/linux-sandbox/1656/execroot/org_frc971/external/cargo_raze__pcre/pcregrep.c:69:10: fatal error: 'bzlib.h' file not found
#include <bzlib.h>
         ^~~~~~~~~
1 error generated.
make[2]: *** [CMakeFiles/pcregrep.dir/build.make:76: CMakeFiles/pcregrep.dir/pcregrep.c.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:116: CMakeFiles/pcregrep.dir/all] Error 2
make: *** [Makefile:136: all] Error 2
Somehow, it looks like cmake is leaking outside the sandbox and finding things on the host when being built. Hmmm.
Create some sort of bridge between ROS nodes and AOS, to accommodate people who want to be able to experiment with both.
Exactly how this will look is not entirely clear, but the goal should be to be relatively easy to get working, rather than focusing on getting every single possible feature working.
I thought you may be interested that I contributed a new DARE solver to WPILib that's 2.8x faster than SLICOT, the solver you use now, based on roboRIO benchmarking on a 5 state, 2 input differential drive LQR problem.
The gist of the algorithm from DOI 10.1080/00207170410001714988 is:
#include <Eigen/Cholesky>
#include <Eigen/Core>
#include <Eigen/LU>

template <int States, int Inputs>
Eigen::Matrix<double, States, States> DARE(
    const Eigen::Matrix<double, States, States>& A,
    const Eigen::Matrix<double, States, Inputs>& B,
    const Eigen::Matrix<double, States, States>& Q,
    const Eigen::Matrix<double, Inputs, Inputs>& R) {
  // [1] E. K.-W. Chu, H.-Y. Fan, W.-W. Lin & C.-S. Wang
  //     "Structure-Preserving Algorithms for Periodic Discrete-Time
  //     Algebraic Riccati Equations",
  //     International Journal of Control, 77:8, 767-788, 2004.
  //     DOI: 10.1080/00207170410001714988
  //
  // Implements the SDA algorithm on p. 5 of [1] (initial A, G, H are from (4)).
  using StateMatrix = Eigen::Matrix<double, States, States>;

  StateMatrix A_k = A;
  StateMatrix G_k = B * R.llt().solve(B.transpose());
  StateMatrix H_k;
  StateMatrix H_k1 = Q;

  do {
    H_k = H_k1;
    StateMatrix W = StateMatrix::Identity() + G_k * H_k;
    auto W_solver = W.lu();
    StateMatrix V_1 = W_solver.solve(A_k);
    // Solve V₂Wᵀ = Gₖ for V₂:
    //
    //   V₂Wᵀ = Gₖ
    //   (V₂Wᵀ)ᵀ = Gₖᵀ
    //   WV₂ᵀ = Gₖᵀ
    //   V₂ᵀ = W.solve(Gₖᵀ)
    //   V₂ = W.solve(Gₖᵀ)ᵀ
    StateMatrix V_2 = W_solver.solve(G_k.transpose()).transpose();

    G_k += A_k * V_2 * A_k.transpose();
    H_k1 = H_k + V_1.transpose() * H_k * A_k;
    A_k *= V_1;
  } while ((H_k1 - H_k).norm() > 1e-10 * H_k1.norm());

  return H_k1;
}
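As a sanity check on what the iteration converges to, the scalar case collapses to the fixed point x = a²rx/(r + b²x) + q, which can be iterated directly (a toy sketch for intuition, not the solver above; ScalarDARE is an illustrative name):

```cpp
#include <cmath>

// Scalar DARE: x = a^2*x - a^2*b^2*x^2/(r + b^2*x) + q
//            = a^2*r*x / (r + b^2*x) + q.
// Plain fixed-point iteration; converges for the stabilizable scalar case.
double ScalarDARE(double a, double b, double q, double r) {
  double x = q;
  for (int i = 0; i < 1000; ++i) {
    const double next = a * a * r * x / (r + b * b * x) + q;
    if (std::abs(next - x) < 1e-12) return next;
    x = next;
  }
  return x;
}
```

With a = b = q = r = 1 the fixed point satisfies x² - x - 1 = 0, i.e. the golden ratio, which makes a convenient correctness check.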
The preconditions necessary for convergence are:
The paper proves convergence under weaker conditions, but it seems to involve solving a generalized eigenvalue problem with the QZ algorithm. SLICOT and Drake use that to solve the whole problem, so it seemed too expensive to bother attempting.
The precondition checks turned out to be 50-60% of the total algorithm runtime, so WPILib exposed a function that skips them if the user knows they'll be satisfied. This would be a good candidate for your Kalman filter error covariance init code, since a comment in there mentioned it didn't use the DARE solver because it had unnecessary checks.
Here's WPILib's impl, which supports static sizing (for performance) and dynamic sizing (for JNI) and throws exceptions on precondition violations.
https://github.com/wpilibsuite/allwpilib/blob/main/wpimath/src/main/native/include/frc/DARE.h
I'd recommend std::expected instead of exceptions for your use case.
If you have /camera -> /pi1/camera on pi1, aos_dump /camera complains and doesn't respect the map.
This is a bit subtle, since we really would like aos_dump to be able to subscribe to any channel for debugging. But, it would also be nice to have aos_dump properly respect remaps.
Hey,
I'm Gabor from team 114 and we're looking at your arm feedforward code to try to implement it in our robot. It's in the y2018/control_loops/python/arm_trajectory.py file. Can you please explain a few things to us?

What are the G1 and G2 constants? I think this is the gear ratio of the motors, but I'm not 100% sure.

The following questions are not essential, but if you have time, we'd appreciate answers:

What are the K2 and K4 matrices? I understand what the other matrices do, but K2 and K4 are both multiplied by omega, which would imply that both are used to relate torque to omega. But I don't understand why you would need two matrices for the same thing. Can you please clarify what the exact purpose of these matrices is?

@AustinSchuh @platipus25 I'm pinging you because it seems that you are the contributors to this file, so I assume you know the implementation details.
Thanks in advance,
Gabor and team 114
If we have a system with nodes a, b, and logger, and are only running a logger on logger, then currently it is not possible to log timestamps for messages sent between a and b. So if there is a channel that is forwarded from a to both b and logger, then if you attempt to replay a log from the perspective of b you will not get any messages replayed on that channel, since the logger didn't log the timestamps for the message arrivals on b, so it doesn't know when the messages actually arrived :(.

This means being able to specify arbitrary nodes in the timestamp_logger_nodes field for a connection, see:
971-Robot-Code/aos/configuration.fbs
Lines 28 to 32 in e7c7e58
@yimmy13 observed that by using const aos::Node*s to identify nodes we tend to create issues in log replay (namely, the node pointer was unstable across calls to RemapLoggedChannel*). Can we make this less confusing to developers?
Allow specifying flags as part of the starter RPC definition. Unsure exactly how this should manage override of flags specified in the AOS config. This is very helpful when wanting to start applications in the same environment as they would see under starterd but with slightly changed flags.
Jim was having trouble copying TargetEstimates. Not sure why.
Edit:
https://github.com/frc971/971-Robot-Code/blob/master/aos/flatbuffer_merge.cc#L682
This would be incredibly handy for debugging what is happening when something goes wrong, to see what else is happening in the system. 1 Hz is plenty (as we dredge through /proc to see how things are going), or even a lower frequency.
We should also put the scheduler + affinity + priority in that report too, along with memory usage.
LogReader::Register will skip any setting up of callbacks when a node has no log files with data in them. This leads to LogReader getting stuck indefinitely when calling Run() in this scenario.
We currently drop replayed messages if they get sent too fast when replaying a log. This is itself somewhat dubious behavior, but also creates an issue where if you create a log of the replay, then you can get errors like
F0402 12:20:39.001713 8892 logfile_utils.cc:1506] Check failed: result.timestamp == monotonic_remote_time ({.boot=0, .time=160.591261903sec} vs. {.boot=0, .time=160.623109070sec}) : Queue index matches, but timestamp doesn't. Please investigate!
It would be super helpful to see what went wrong in the log. We don't always have easy access to stdout/err of starterd.
Add a check to logger to ensure that applications in log replay aren't sending on channels that are also getting replayed (i.e., all the relevant channels from the log are remapped).
Should pretty much just be a matter of doing this TODO:
971-Robot-Code/aos/events/logging/log_reader.h
Lines 409 to 413 in 890c249
If multiple messages all configured to be forwarded are sent at the same time in simulation (in the same callback), they don't get forwarded.
I noticed you were using the initializer_list version. We added an overload that takes a vector to make things cleaner.
An EventLoop implementation which creates multiple EventLoops that are all scheduled via one underlying EventLoop would allow combining multiple applications in a single process. They must all be on the same node, and can communicate with each other and the outside world as normal. This would provide a similar API to SimulatedEventLoopFactory which can create new EventLoops on demand.
Some tricky things to keep in mind:
To reproduce:
$ bazel build //documentation/tutorials:create-a-new-autonomous
INFO: Analyzed target //documentation/tutorials:create-a-new-autonomous (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
ERROR: /home/steple/source/971-Robot-Code/documentation/tutorials/BUILD:14:13: Executing genrule //documentation/tutorials:create-a-new-autonomous failed: (Exit 127): bash failed: error executing command (from target //documentation/tutorials:create-a-new-autonomous) /bin/bash -c ... (remaining 1 argument skipped)
Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
external/pandoc/usr/bin/pandoc: error while loading shared libraries: libpcre.so.3: cannot open shared object file: No such file or directory
Target //documentation/tutorials:create-a-new-autonomous failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 0.881s, Critical Path: 0.01s
INFO: 2 processes: 2 internal.
FAILED: Build did NOT complete successfully
sudo apt install libpcre3 will fix this on Ubuntu 23.10.
The current flatbuffers C++ API requires building up messages inside-out. It should be possible to use the C++ stack for this instead, to allow the C++ code to build up messages from the outside in (which is often more natural).
More concretely, this means generating C++ classes for each flatbuffer struct which hold values for each field (and track which ones are set). Nested structs should be contained in their parent objects, because a major use case is calling functions which return a nested object, and there's no other easy place to store these objects. Then, these objects can do a depth-first traversal of the C++ object graph to actually write the flatbuffer (aka each object writes out its children, tracking the resulting offsets in local variables, then writes out itself and returns the offset to its parent).
I think writing the buffer should be done in an offset_t Write method (or similar), which is passed the FlatBufferBuilder. The top-level will normally be called via a templated wrapper type that holds onto the fbb, with a destructor which calls Write on the top-level object and then Finish, to keep everything in outside-in order.
An alternative would be holding the fbb in each object, and then having their destructors write it out, but that means more stack space and makes it impossible to decide to skip writing it out later (for example, build up a sub object and then realize it's not actually needed, without taking up any space in the final buffer). Doing it in the destructor also means the parent object has to keep track of where the offsets to all its children are coming from.
Need to think through handling arrays of primitives. These can be big and variable-sized, neither of which interacts well with putting them on the stack. At the same time, allocating them immediately unlike other objects makes for a confusing API. Maybe provide APIs for both?
Handling arrays of objects is tricky. ArrayWriter<T> StartArray(int max_size) would work well for many cases, with a convenience void CreateArray(span<const T>) when the temporary storage is managed externally. However, there's no place to stash C++ pointers to the intermediate objects. The flatbuffers array needs to be placed after those objects in the buffer, and C++ pointers are larger than offsets on 64-bit platforms. Forcing the user to allocate that array externally goes against making this API nice and easy to use, but that's the best I can think of right now. void CreateArray(span<const T*>), with an extra level of indirection, could be handy but also looks like a big foot-gun with dangling references.
Do we need to manage shared subobjects? Writing them out redundantly is easy, but not helpful for space efficiency. Maybe use a bit in the bitmask to track whether it's been written out, and make a union for all the variable storage which gets overwritten to the offset it was written to?
This will increase stack usage, which may be undesirable. It's probably worth using a bitmask in the generated code to track which fields are set, rather than using std::optional or a separate bool for each one.
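That bitmask pattern might look roughly like this in generated code (all names here are hypothetical; the real generator would also emit the Write method discussed above):

```cpp
#include <cstdint>

// Hypothetical generated object for a table with two scalar fields.
// A single bitmask tracks which fields are set, instead of paying for a
// std::optional or separate bool per field.
class TargetPoseObject {
 public:
  void set_x(double x) { x_ = x; set_fields_ |= kXSet; }
  void set_y(double y) { y_ = y; set_fields_ |= kYSet; }
  bool has_x() const { return (set_fields_ & kXSet) != 0; }
  bool has_y() const { return (set_fields_ & kYSet) != 0; }
  double x() const { return x_; }
  double y() const { return y_; }

 private:
  static constexpr uint32_t kXSet = 1u << 0;
  static constexpr uint32_t kYSet = 1u << 1;
  double x_ = 0;
  double y_ = 0;
  uint32_t set_fields_ = 0;  // One bit per field.
};
```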
Copying sub-objects could become expensive. It should be possible to structure this so that RVO (return value optimization) constructs the sub-objects in place for the common case of a function returning an entire sub-object.
These classes end up looking similar to the existing TableT, but without storing data in the objects. Do we want to expose reading the fields (and checking if they're set) and/or building them from a const Table*?