opentensor / bittensor-subnet-template
Template Design for a Bittensor subnetwork
License: MIT License
I am encountering issues while attempting to set up the subnet template locally, as outlined in your documentation (https://github.com/opentensor/bittensor-subnet-template/blob/main/docs/running_on_staging.md). I have run into a couple of challenges that I'd like to bring to your attention for guidance or resolution.
Non-existent Branch for User-Creation: In the documentation, step 4 instructs to switch to the 'user-creation' branch under the subnets section. However, this branch does not exist in the repository. This has led me to bypass step 4 entirely, but I am unsure whether this affects the subsequent steps or the overall setup.
Faucet Token Minting Error: Proceeding to step 9, which involves minting tokens from the faucet, I encountered an error. The specific error message is: Failed: Error: {'type': 'Module', 'name': 'FaucetDisabled', 'docs': []}. This suggests an issue with the faucet module, perhaps indicating that it is disabled or not functioning as expected.
Edge case in weight setting that only occurs when there's a single non-zero weight. In weight_utils.process_weights_for_netuid:
- Non-zero weight indices are extracted by calling np.argwhere(weights > 0).squeeze() (line 142). With a single non-zero weight, squeeze() produces a 0-d array scalar (so you can't call len on it).
- non_zero_weight_idx is then used to index into uids and weights, which again produces an array scalar (lines 143-144).
- len is called on non_zero_weights, which fails in the case of it being an array scalar (line 168).

Send rewards back to miners for fine-tuning.
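The squeeze() edge case reported above is easy to reproduce in isolation (a minimal sketch; np.atleast_1d is one possible fix, not necessarily the template's actual patch):

```python
import numpy as np

weights = np.array([0.0, 0.7, 0.0])  # exactly one non-zero weight

idx = np.argwhere(weights > 0).squeeze()
print(idx.ndim)  # 0 -- a 0-d array scalar, not a 1-d index array

try:
    len(idx)  # this is the kind of call that fails in process_weights_for_netuid
except TypeError as exc:
    print(f"len() fails: {exc}")

# One possible fix: guarantee at least one dimension after squeezing
idx_safe = np.atleast_1d(np.argwhere(weights > 0).squeeze())
print(len(idx_safe))  # 1
```

With two or more non-zero weights, squeeze() returns a 1-d array and the existing code works, which is why the bug only surfaces in this edge case.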
This will safely close down the miner, but won't print the full traceback. I actually found the traceback to be quite useful. For example, when a miner is stuck on a particular function/line, the traceback helps identify what the issue was. Can we keep the traceback from this operation?
Originally posted by @Eugene-hu in #26 (comment)
Use sphinx to generate docs automatically based on docstrings.
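A typical starting point (a sketch; the paths and extension choices below are assumptions, not the repo's current layout) is to enable autodoc in a Sphinx configuration and generate stubs from the package's docstrings:

```python
# docs/source/conf.py -- minimal Sphinx configuration for docstring-driven docs
# (a sketch; the paths below are assumptions about the repo layout)
import os
import sys

sys.path.insert(0, os.path.abspath("../.."))  # make the template package importable

project = "bittensor-subnet-template"
extensions = [
    "sphinx.ext.autodoc",    # pull documentation out of docstrings
    "sphinx.ext.napoleon",   # understand Google/NumPy-style docstrings
]

# Then, from the repo root:
#   sphinx-apidoc -o docs/source template
#   sphinx-build -b html docs/source docs/build
```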
The issue is discussed in depth on the community Discord:
https://discord.com/channels/799672011265015819/1240229792348110920
In short: boosting or setting weights via btcli doesn't work in a local environment.
To reproduce, follow all the steps up to the last one in these docs:
https://github.com/opentensor/bittensor-subnet-template/blob/main/docs/running_on_staging.md
The issue persists across multiple OS environments, and multiple users have experienced it.
Hello!
I was wondering if you could produce some documentation on how we can test miners and validators without having to run a full staging subtensor. Something with a quicker dev feedback loop would greatly speed up development time and help make subnets more competitive.
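One lightweight pattern for faster feedback (a sketch; all names here are assumptions, and the doubling behaviour is a stand-in mimicking the template's Dummy protocol): factor the miner's core logic into a plain function and exercise it directly, with a stub replacing the network-level synapse.

```python
# All names here are illustrative assumptions, not the template's actual API.

def miner_forward(dummy_input: int) -> int:
    # Stand-in for the miner's core logic (mimics a Dummy-style echo
    # that returns dummy_input * 2).
    return dummy_input * 2

class FakeSynapse:
    """Minimal stub replacing the network-level synapse object."""
    def __init__(self, dummy_input: int):
        self.dummy_input = dummy_input
        self.dummy_output = None

def handle(synapse: FakeSynapse) -> FakeSynapse:
    # Same shape as a miner's request handler, minus the networking.
    synapse.dummy_output = miner_forward(synapse.dummy_input)
    return synapse

# Unit-testable without a chain, a wallet, or a running subtensor:
result = handle(FakeSynapse(3))
print(result.dummy_output)  # 6
```

The same idea applies to validators: feed canned miner responses into the scoring function and assert on the resulting weights.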
Create a gating model in a Mixture of Experts (MoE) architecture using PyTorch. We can implement a soft gating mechanism where the weights act as probabilities for selecting different experts. We can use the Gumbel-Softmax trick to sample from the categorical distribution with temperature, making the sampling process differentiable.
This should be part of the validator, as most subnets will want some kind of automatic routing mechanism without having to reinvent the wheel.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatingModel(nn.Module):
    def __init__(self, input_dim, num_experts, temperature=1.0):
        super(GatingModel, self).__init__()
        self.num_experts = num_experts
        self.temperature = temperature
        # Gating network
        self.gating_network = nn.Sequential(
            nn.Linear(input_dim, num_experts),
            nn.Softmax(dim=-1)  # Softmax along the expert dimension
        )

    def forward(self, input):
        # Calculate gating probabilities
        gating_probs = self.gating_network(input)
        # Gumbel-Softmax sampling for differentiable (soft) selection
        gumbel_noise = torch.rand_like(gating_probs)
        gumbel_noise = -torch.log(-torch.log(gumbel_noise + 1e-20) + 1e-20)  # Gumbel(0, 1) noise
        logits = (torch.log(gating_probs + 1e-20) + gumbel_noise) / self.temperature
        selected_experts = F.softmax(logits, dim=-1)
        # Weighted sum over the expert dimension (here the input itself stands
        # in for each expert's output, so this combination is a placeholder)
        output = torch.sum(selected_experts.unsqueeze(-1) * input.unsqueeze(-2), dim=-2)
        return output, selected_experts

# Example usage
input_dim = 10
num_experts = 5
temperature = 0.1

# Create a GatingModel
gating_model = GatingModel(input_dim, num_experts, temperature)

# Generate dummy input
input_data = torch.randn(32, input_dim)

# Forward pass through the gating model
output, selected_experts = gating_model(input_data)
# 'output' is the combined MoE output; 'selected_experts' holds the soft
# selection weights, which approach a one-hot vector as temperature -> 0.
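For reference, PyTorch also ships a built-in Gumbel-Softmax, so the manual noise sampling could be replaced with a single call (a sketch using torch.nn.functional.gumbel_softmax):

```python
import torch
import torch.nn.functional as F

# torch.nn.functional.gumbel_softmax implements the same trick, including a
# straight-through "hard" mode that emits one-hot samples in the forward pass
# while keeping soft gradients for backprop.
logits = torch.randn(32, 5)                           # unnormalized gating scores
soft = F.gumbel_softmax(logits, tau=0.1)              # soft, differentiable sample
hard = F.gumbel_softmax(logits, tau=0.1, hard=True)   # one-hot in the forward pass
```

Using the built-in avoids re-deriving the noise transform and the epsilon handling by hand.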
This file imports from loguru, yet when you install the repo, loguru does not get installed because it is not listed in the requirements.txt file.
There are examples of subnets with unrealistic (read: expensive) hardware requirements. In many cases, this makes mining and/or validating economically impractical. We should help subnet owners define a ramp-up for the requirements so that members of the subnet can be eased in.
Lots of confusion about how to run subtensor locally, and also about the meaning and significance of the various chains.
@Eugene-hu @rajkaramchedu let's plan for improving this
Following the running_on_staging.md docs.
Blockchain node runs. No errors except one:
Error while running root epoch: "Not the block to update emission values."
Miner log has errors:
2024-01-18 07:44:15.992 | DEBUG | axon | <-- | 827 B | Dummy | 5FvVLxkyoKsLmnXAqZaap1Bp6HxLSzzg6p58DEiwd1Eva8q7 | 127.0.0.1:52292 | 200 | Success
2024-01-18 07:44:15.993 | ERROR | NotVerifiedException: Not Verified with error: Signature mismatch with 3542913964592439.5FvVLxkyoKsLmnXAqZaap1Bp6HxLSzzg6p58DEiwd1Eva8q7.5E7yNCEsJZdFKRV4qeZcdYuDX54tJ2ysoK8uPXd1ZUVQ2eQD.d6cd9c9c-b5cc-11ee-9ac9-a8a1591a7c9f.a7ffc6f8bf1ed76651c14756a061d662f580ff4de43b49fa82d80a4b80f8434a and 0x7a9ee7e33388e0d410692f08470c2fbafc4168072606b58e4fed369e29808c2e84b4ad66bac5fba2f421093de5177b96cf63c0596793dc3f14be8fbfe80df388
Validator log has errors:
2024-01-17 15:17:02.118 | ERROR | Error during validation 'Validator' object has no attribute 'moving_averaged_scores'
Traceback (most recent call last):
File "/home/bittensor-subnet-template/template/base/validator.py", line 141, in run
self.sync()
File "/home/bittensor-subnet-template/template/base/neuron.py", line 121, in sync
self.set_weights()
File "/home/bittensor-subnet-template/template/base/validator.py", line 220, in set_weights
self.moving_averaged_scores, p=1, dim=0
^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'Validator' object has no attribute 'moving_averaged_scores'
After commenting out the set_weights method, I see these messages:
2024-01-18 07:44:31.871 | DEBUG | dendrite | --> | 3068 B | Dummy | 5E7yNCEsJZdFKRV4qeZcdYuDX54tJ2ysoK8uPXd1ZUVQ2eQD | 77.237.52.204:8091 | 0 | Success
2024-01-18 07:44:31.876 | DEBUG | dendrite | <-- | 3221 B | Dummy | 5E7yNCEsJZdFKRV4qeZcdYuDX54tJ2ysoK8uPXd1ZUVQ2eQD | 77.237.52.204:8091 | 503 | Service at 77.237.52.204:8091/Dummy unavailable.
Important!
I'm running both neurons on a single node.
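The AttributeError in the validator log above indicates that moving_averaged_scores is read in set_weights before it is ever initialized. A minimal sketch of the usual pattern (the attribute name is taken from the traceback; everything else is an assumption, shown with NumPy for brevity): initialize the buffer in __init__ and update it as an exponential moving average.

```python
import numpy as np

class Validator:
    def __init__(self, num_uids: int):
        # Initialize the buffer the traceback says is missing, so that
        # set_weights can read it even before the first score update.
        self.moving_averaged_scores = np.zeros(num_uids)

    def update_scores(self, uids, rewards, alpha: float = 0.1):
        # Scatter per-uid rewards, then blend into the running average.
        scattered = np.zeros_like(self.moving_averaged_scores)
        scattered[uids] = rewards
        self.moving_averaged_scores = (
            alpha * scattered + (1 - alpha) * self.moving_averaged_scores
        )

v = Validator(num_uids=4)
v.update_scores([1, 3], [1.0, 0.5])
```

Whether this matches the template's intended fix is a guess; the point is that the attribute must exist before sync() triggers set_weights.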
We want to enable validators and miners to see the data created by themselves and other entities within subnets by default.
This will support network monitoring and analytics and help to create a standard of quality for subnets. Of course, subnet developers can and should improve upon the default behaviour but we will at least ensure that there is some telemetry from day zero.
To do this we will first implement basic logging for miners and validators:
The end of each forward pass will call log_event, which will write the following data to a local log file as a dict of lists:
Each validator query will call log_event with information such as
We can also introduce a specific event that is logged when weight setting occurs.
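A minimal sketch of what such a log_event helper could look like (the record fields and file name here are assumptions, not a finalized schema):

```python
import json
import time

def log_event(event: dict, log_path: str = "events.log") -> dict:
    # Append one JSON record per line; downstream analytics can then
    # aggregate these records into the dict-of-lists form described above.
    record = {"timestamp": time.time(), **event}
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return record

# e.g. at the end of a validator forward pass:
rec = log_event({"event": "forward", "uids": [1, 3], "rewards": [1.0, 0.5]})

# and a dedicated event when weight setting occurs:
log_event({"event": "set_weights"})
```

Keeping it append-only and line-delimited makes it cheap for miners and validators to emit telemetry from day zero without any external dependency.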
/base/validator.py:313: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
I get the warning above in the update_scores method.
I suppose it's a tensor-related warning; I'll make a PR soon.
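The warning comes from constructing a tensor out of an existing tensor; the message itself names the recommended replacements. A short illustration (variable names are placeholders):

```python
import torch

scores = torch.zeros(4)

# This pattern triggers the UserWarning, because it builds a new tensor
# from an existing tensor:
#   moving = torch.tensor(scores)

# Recommended replacement: copy the storage and detach from the autograd graph
moving = scores.clone().detach()

# If a dtype/device change was the goal, .to() also avoids the warning
moving_f32 = scores.to(torch.float32)
```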
The most important change concerns type hints: Python 3.9+ type-hint syntax will actually break Python 3.8, so we should be aware of this issue. My recommendation is that we start deprecating our use of Python 3.8 in favour of 3.9 or 3.10.
Originally posted by @Eugene-hu in #26 (comment)
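Concretely, the breakage comes from PEP 585 builtin generics, which are only subscriptable from Python 3.9 on (function names below are illustrative):

```python
# Runs on Python 3.9+, but raises TypeError on 3.8 at function-definition
# time, because list[int] / dict[str, float] are unsubscriptable there:
def top_scores(scores: dict[str, float], n: int) -> list[float]:
    return sorted(scores.values(), reverse=True)[:n]

# Python 3.8-compatible spelling via the typing module:
from typing import Dict, List

def top_scores_38(scores: Dict[str, float], n: int) -> List[float]:
    return sorted(scores.values(), reverse=True)[:n]

# Alternatively, `from __future__ import annotations` at the top of a module
# defers annotation evaluation and also sidesteps the runtime error.
```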