opentensor / bittensor-subnet-template
Template Design for a Bittensor subnetwork
License: MIT License
I am encountering issues while attempting to set up the subnet template locally, as outlined in your documentation (https://github.com/opentensor/bittensor-subnet-template/blob/main/docs/running_on_staging.md). I have run into a couple of challenges that I'd like to bring to your attention for guidance or resolution.
Non-existent Branch for User-Creation: In the documentation, step 4 instructs to switch to the 'user-creation' branch under the subnets section. However, this branch does not exist in the repository. This has led me to bypass step 4 entirely, but I am unsure whether this affects the subsequent steps or the overall setup.
Faucet Token Minting Error: Proceeding to step 9, which involves minting tokens from the faucet, I encountered an error. The specific error message is: Failed: Error: {'type': 'Module', 'name': 'FaucetDisabled', 'docs': []}. This suggests an issue with the faucet module, perhaps indicating that it is disabled or not functioning as expected.
Edge case in weight setting that only occurs when there's a single non-zero weight. In weight_utils.process_weights_for_netuid:
- Non-zero weight indices are extracted by calling np.argwhere(weights > 0).squeeze() (line 142). With a single non-zero weight, squeeze() produces a 0-d array scalar (so you can't call len on it).
- non_zero_weight_idx is then used to index into uids and weights, which again produces an array scalar (lines 143-144).
- len is called on non_zero_weights, which fails in the case of it being an array scalar (line 168).

Send rewards back to miners for fine-tuning.
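The squeeze() edge case reported above is easy to reproduce in isolation (a minimal sketch; np.atleast_1d is one possible fix, not necessarily the template's actual patch):

```python
import numpy as np

weights = np.array([0.0, 0.7, 0.0])  # exactly one non-zero weight

idx = np.argwhere(weights > 0).squeeze()
print(idx.ndim)  # 0 -- a 0-d array scalar, not a 1-d index array

try:
    len(idx)  # this is the kind of call that fails in process_weights_for_netuid
except TypeError as exc:
    print(f"len() fails: {exc}")

# One possible fix: guarantee at least one dimension after squeezing
idx_safe = np.atleast_1d(np.argwhere(weights > 0).squeeze())
print(len(idx_safe))  # 1
```

With two or more non-zero weights, squeeze() returns a 1-d array and the existing code works, which is why the bug only surfaces in this edge case.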
This will safely close down the miner, but won't print the full traceback. I actually found the traceback to be quite useful. For example, when a miner is stuck on a particular function/line, the traceback helps identify what the issue was. Can we keep the traceback from this operation?
Originally posted by @Eugene-hu in #26 (comment)
Use sphinx to generate docs automatically based on docstrings.
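A typical starting point (a sketch; the paths and extension choices below are assumptions, not the repo's current layout) is to enable autodoc in a Sphinx configuration and generate stubs from the package's docstrings:

```python
# docs/source/conf.py -- minimal Sphinx configuration for docstring-driven docs
# (a sketch; the paths below are assumptions about the repo layout)
import os
import sys

sys.path.insert(0, os.path.abspath("../.."))  # make the template package importable

project = "bittensor-subnet-template"
extensions = [
    "sphinx.ext.autodoc",    # pull documentation out of docstrings
    "sphinx.ext.napoleon",   # understand Google/NumPy-style docstrings
]

# Then, from the repo root:
#   sphinx-apidoc -o docs/source template
#   sphinx-build -b html docs/source docs/build
```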
The issue is discussed in depth on the community Discord:
https://discord.com/channels/799672011265015819/1240229792348110920
In short: boosting or setting weights via btcli doesn't work in a local environment.
To reproduce, follow all the steps up to the last one in these docs:
https://github.com/opentensor/bittensor-subnet-template/blob/main/docs/running_on_staging.md
The issue persists across multiple OS environments, and multiple users have experienced it.
Hello!
I was wondering if you could produce some documentation on how we can test miners and validators without having to run a full staging subtensor. Something with a quicker dev feedback loop would greatly speed up development time and help make subnets more competitive.
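One lightweight pattern for faster feedback (a sketch; all names here are assumptions, and the doubling behaviour is a stand-in mimicking the template's Dummy protocol): factor the miner's core logic into a plain function and exercise it directly, with a stub replacing the network-level synapse.

```python
# All names here are illustrative assumptions, not the template's actual API.

def miner_forward(dummy_input: int) -> int:
    # Stand-in for the miner's core logic (mimics a Dummy-style echo
    # that returns dummy_input * 2).
    return dummy_input * 2

class FakeSynapse:
    """Minimal stub replacing the network-level synapse object."""
    def __init__(self, dummy_input: int):
        self.dummy_input = dummy_input
        self.dummy_output = None

def handle(synapse: FakeSynapse) -> FakeSynapse:
    # Same shape as a miner's request handler, minus the networking.
    synapse.dummy_output = miner_forward(synapse.dummy_input)
    return synapse

# Unit-testable without a chain, a wallet, or a running subtensor:
result = handle(FakeSynapse(3))
print(result.dummy_output)  # 6
```

The same idea applies to validators: feed canned miner responses into the scoring function and assert on the resulting weights.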
Create a gating model in a Mixture of Experts (MoE) architecture using PyTorch. We can implement a soft gating mechanism where the weights act as probabilities for selecting different experts. We can use the Gumbel-Softmax trick to sample from the categorical distribution with temperature, making the sampling process differentiable.
This should be part of the validator, as most subnets will want some kind of automatic routing mechanism without having to reinvent the wheel.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatingModel(nn.Module):
    def __init__(self, input_dim, num_experts, temperature=1.0):
        super(GatingModel, self).__init__()
        self.num_experts = num_experts
        self.temperature = temperature
        # Gating network
        self.gating_network = nn.Sequential(
            nn.Linear(input_dim, num_experts),
            nn.Softmax(dim=-1)  # Softmax along the expert dimension
        )

    def forward(self, input):
        # Calculate gating probabilities
        gating_probs = self.gating_network(input)
        # Gumbel-Softmax sampling for differentiable (soft) selection
        gumbel_noise = torch.rand_like(gating_probs)
        gumbel_noise = -torch.log(-torch.log(gumbel_noise + 1e-20) + 1e-20)  # Gumbel(0, 1) noise
        logits = (torch.log(gating_probs + 1e-20) + gumbel_noise) / self.temperature
        selected_experts = F.softmax(logits, dim=-1)
        # Weighted sum over the expert dimension (here the input itself stands
        # in for each expert's output, so this combination is a placeholder)
        output = torch.sum(selected_experts.unsqueeze(-1) * input.unsqueeze(-2), dim=-2)
        return output, selected_experts

# Example usage
input_dim = 10
num_experts = 5
temperature = 0.1

# Create a GatingModel
gating_model = GatingModel(input_dim, num_experts, temperature)

# Generate dummy input
input_data = torch.randn(32, input_dim)

# Forward pass through the gating model
output, selected_experts = gating_model(input_data)
# 'output' is the combined MoE output; 'selected_experts' holds the soft
# selection weights, which approach a one-hot vector as temperature -> 0.
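For reference, PyTorch also ships a built-in Gumbel-Softmax, so the manual noise sampling could be replaced with a single call (a sketch using torch.nn.functional.gumbel_softmax):

```python
import torch
import torch.nn.functional as F

# torch.nn.functional.gumbel_softmax implements the same trick, including a
# straight-through "hard" mode that emits one-hot samples in the forward pass
# while keeping soft gradients for backprop.
logits = torch.randn(32, 5)                           # unnormalized gating scores
soft = F.gumbel_softmax(logits, tau=0.1)              # soft, differentiable sample
hard = F.gumbel_softmax(logits, tau=0.1, hard=True)   # one-hot in the forward pass
```

Using the built-in avoids re-deriving the noise transform and the epsilon handling by hand.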
This file imports from loguru, yet when you install the repo, loguru does not get installed because it is not listed in the requirements.txt file.
There are examples of subnets with unrealistic (read: expensive) hardware requirements. In many cases, this makes mining and/or validating economically impractical. We should help subnet owners define a ramp-up for the requirements so that members of the subnet can be eased in.
Lots of confusion about how to run subtensor locally, and also about the meaning and significance of the various chains.
@Eugene-hu @rajkaramchedu let's plan for improving this
Following the running_on_staging.md docs.
Blockchain node runs. No errors except one:
Error while running root epoch: "Not the block to update emission values."
Miner log has errors:
2024-01-18 07:44:15.992 | DEBUG | axon | <-- | 827 B | Dummy | 5FvVLxkyoKsLmnXAqZaap1Bp6HxLSzzg6p58DEiwd1Eva8q7 | 127.0.0.1:52292 | 200 | Success
2024-01-18 07:44:15.993 | ERROR | NotVerifiedException: Not Verified with error: Signature mismatch with 3542913964592439.5FvVLxkyoKsLmnXAqZaap1Bp6HxLSzzg6p58DEiwd1Eva8q7.5E7yNCEsJZdFKRV4qeZcdYuDX54tJ2ysoK8uPXd1ZUVQ2eQD.d6cd9c9c-b5cc-11ee-9ac9-a8a1591a7c9f.a7ffc6f8bf1ed76651c14756a061d662f580ff4de43b49fa82d80a4b80f8434a and 0x7a9ee7e33388e0d410692f08470c2fbafc4168072606b58e4fed369e29808c2e84b4ad66bac5fba2f421093de5177b96cf63c0596793dc3f14be8fbfe80df388
Validator log has errors:
2024-01-17 15:17:02.118 | ERROR | Error during validation 'Validator' object has no attribute 'moving_averaged_scores'
Traceback (most recent call last):
File "/home/bittensor-subnet-template/template/base/validator.py", line 141, in run
self.sync()
File "/home/bittensor-subnet-template/template/base/neuron.py", line 121, in sync
self.set_weights()
File "/home/bittensor-subnet-template/template/base/validator.py", line 220, in set_weights
self.moving_averaged_scores, p=1, dim=0
^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'Validator' object has no attribute 'moving_averaged_scores'
After commenting out the set_weights method, I see these messages:
2024-01-18 07:44:31.871 | DEBUG | dendrite | --> | 3068 B | Dummy | 5E7yNCEsJZdFKRV4qeZcdYuDX54tJ2ysoK8uPXd1ZUVQ2eQD | 77.237.52.204:8091 | 0 | Success
2024-01-18 07:44:31.876 | DEBUG | dendrite | <-- | 3221 B | Dummy | 5E7yNCEsJZdFKRV4qeZcdYuDX54tJ2ysoK8uPXd1ZUVQ2eQD | 77.237.52.204:8091 | 503 | Service at 77.237.52.204:8091/Dummy unavailable.
Important!
I'm running both neurons on a single node.
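The AttributeError in the validator log above indicates that moving_averaged_scores is read in set_weights before it is ever initialized. A minimal sketch of the usual pattern (the attribute name is taken from the traceback; everything else is an assumption, shown with NumPy for brevity): initialize the buffer in __init__ and update it as an exponential moving average.

```python
import numpy as np

class Validator:
    def __init__(self, num_uids: int):
        # Initialize the buffer the traceback says is missing, so that
        # set_weights can read it even before the first score update.
        self.moving_averaged_scores = np.zeros(num_uids)

    def update_scores(self, uids, rewards, alpha: float = 0.1):
        # Scatter per-uid rewards, then blend into the running average.
        scattered = np.zeros_like(self.moving_averaged_scores)
        scattered[uids] = rewards
        self.moving_averaged_scores = (
            alpha * scattered + (1 - alpha) * self.moving_averaged_scores
        )

v = Validator(num_uids=4)
v.update_scores([1, 3], [1.0, 0.5])
```

Whether this matches the template's intended fix is a guess; the point is that the attribute must exist before sync() triggers set_weights.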
We want to enable validators and miners to see the data created by themselves and other entities within subnets by default.
This will support network monitoring and analytics and help to create a standard of quality for subnets. Of course, subnet developers can and should improve upon the default behaviour but we will at least ensure that there is some telemetry from day zero.
To do this we will first implement basic logging for miners and validators:
The end of each forward pass will call log_event, which will write the following data to a local log file as a dict of lists:
Each validator query will call log_event with information such as
We can also introduce a specific event that is logged when weight setting occurs.
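A minimal sketch of what such a log_event helper could look like (the record fields and file name here are assumptions, not a finalized schema):

```python
import json
import time

def log_event(event: dict, log_path: str = "events.log") -> dict:
    # Append one JSON record per line; downstream analytics can then
    # aggregate these records into the dict-of-lists form described above.
    record = {"timestamp": time.time(), **event}
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return record

# e.g. at the end of a validator forward pass:
rec = log_event({"event": "forward", "uids": [1, 3], "rewards": [1.0, 0.5]})

# and a dedicated event when weight setting occurs:
log_event({"event": "set_weights"})
```

Keeping it append-only and line-delimited makes it cheap for miners and validators to emit telemetry from day zero without any external dependency.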
/base/validator.py:313: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
I get the warning above in the update_scores method.
I suppose it's a tensor-related warning; I'll make a PR soon.
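The warning comes from constructing a tensor out of an existing tensor; the message itself names the recommended replacements. A short illustration (variable names are placeholders):

```python
import torch

scores = torch.zeros(4)

# This pattern triggers the UserWarning, because it builds a new tensor
# from an existing tensor:
#   moving = torch.tensor(scores)

# Recommended replacement: copy the storage and detach from the autograd graph
moving = scores.clone().detach()

# If a dtype/device change was the goal, .to() also avoids the warning
moving_f32 = scores.to(torch.float32)
```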
The most important change concerns type hints: Python 3.9+ type-hint syntax will actually break Python 3.8, so we should be aware of this issue. My recommendation is that we start deprecating our use of Python 3.8 in favour of 3.9 or 3.10.
Originally posted by @Eugene-hu in #26 (comment)
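Concretely, the breakage comes from PEP 585 builtin generics, which are only subscriptable from Python 3.9 on (function names below are illustrative):

```python
# Runs on Python 3.9+, but raises TypeError on 3.8 at function-definition
# time, because list[int] / dict[str, float] are unsubscriptable there:
def top_scores(scores: dict[str, float], n: int) -> list[float]:
    return sorted(scores.values(), reverse=True)[:n]

# Python 3.8-compatible spelling via the typing module:
from typing import Dict, List

def top_scores_38(scores: Dict[str, float], n: int) -> List[float]:
    return sorted(scores.values(), reverse=True)[:n]

# Alternatively, `from __future__ import annotations` at the top of a module
# defers annotation evaluation and also sidesteps the runtime error.
```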