
bittensor-subnet-template's People

Contributors

alex-drocks, crazydevlegend, eugene-hu, gitphantomman, gus-opentensor, ibraheem-opentensor, ifrit98, jameszwifter, nimaaghli, opendansor, rajkaramchedu, shibshib, steffencruz, synthpolis, the-mx, unconst


bittensor-subnet-template's Issues

Can't mint tokens from faucet locally

I am encountering issues while attempting to set up the subnet template locally, as outlined in your documentation (https://github.com/opentensor/bittensor-subnet-template/blob/main/docs/running_on_staging.md). I have run into a couple of challenges that I'd like to bring to your attention for guidance or resolution.

  1. Non-existent Branch for User-Creation: In the documentation, step 4 instructs switching to the 'user-creation' branch under the subnets section. However, this branch does not exist in the repository. This has led me to bypass step 4 entirely, but I am unsure whether this impacts the subsequent steps or the overall setup.

  2. Faucet Token Minting Error: Following the steps and proceeding to step 9, which involves minting tokens from the faucet, I encountered an error. The specific error message is as follows: Failed: Error: {'type': 'Module', 'name': 'FaucetDisabled', 'docs': []}. This error suggests an issue with the faucet module, perhaps indicating that it is disabled or not functioning as expected.

process_weights_for_netuid breaks when there is only 1 non-zero weight

Edge case in weight setting that only occurs when there's a single non-zero weight:

  • In weight_utils.process_weights_for_netuid, non-zero weight indices are extracted by calling np.argwhere(weights > 0).squeeze() (line 142).
    • When there is only a single weight with a non-zero value, this returns a 0-d array scalar (an unsized object, i.e. you cannot call len on it).
  • The resulting variable non_zero_weight_idx is used to index into uids and weights, which again produces array scalars (lines 143-144).
  • Downstream code calls len on non_zero_weights, which fails when it is an array scalar (line 168).

Toy recreation of the issue:
[Screenshot: toy recreation of the issue (2024-06-06)]
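The edge case can be reproduced directly with NumPy, without involving process_weights_for_netuid itself (a minimal sketch; the weight values are illustrative):

```python
import numpy as np

# Weights with exactly one non-zero entry -- the edge case in question.
weights = np.array([0.0, 0.7, 0.0, 0.0])

# squeeze() collapses argwhere's (1, 1) result into a 0-d array scalar.
non_zero_weight_idx = np.argwhere(weights > 0).squeeze()
print(non_zero_weight_idx.ndim)  # 0

# Indexing with a 0-d array yields 0-d results as well...
non_zero_weights = weights[non_zero_weight_idx]

# ...so the downstream len() call raises TypeError.
try:
    len(non_zero_weights)
except TypeError as err:
    print(err)  # len() of unsized object

# One possible fix: force at least one dimension before indexing.
idx = np.atleast_1d(np.argwhere(weights > 0).squeeze())
print(len(weights[idx]))  # 1
```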

Reintroduce tracebacks for KeyboardInterrupt exceptions

          This will safely close down the miner, but won't print the full traceback. I actually found the traceback to be quite useful. For example, when a miner is stuck on a particular function/line, the traceback helps identify what the issue was. Can we keep the traceback from this operation?

Originally posted by @Eugene-hu in #26 (comment)
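A minimal sketch of the requested behaviour (illustrative, not the template's actual shutdown code): catch the interrupt, print the traceback, then shut down cleanly.

```python
import traceback

def run_miner():
    # Stand-in for the miner's main loop; in a real miner this would run
    # until the user presses Ctrl+C, which raises KeyboardInterrupt.
    raise KeyboardInterrupt

try:
    run_miner()
except KeyboardInterrupt:
    # Keep the traceback: it shows which function/line the miner was on
    # when interrupted, which helps diagnose a stuck miner.
    traceback.print_exc()
    print("Miner shut down safely.")
```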

Autodocs

Use sphinx to generate docs automatically based on docstrings.
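A minimal docs/conf.py sketch of what this could look like (the extension choices and project name are illustrative assumptions, not decisions from this issue):

```python
# docs/conf.py -- minimal Sphinx configuration for docstring-based autodocs
project = "bittensor-subnet-template"

extensions = [
    "sphinx.ext.autodoc",    # generate pages from docstrings
    "sphinx.ext.napoleon",   # support Google/NumPy docstring styles
    "sphinx.ext.viewcode",   # link documented objects to their source
]
```

Docs would then be built with `sphinx-build docs docs/_build`, pulling API pages from the docstrings.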

'netuid is not the root network' on staging

The issue is discussed in depth on the community Discord:
https://discord.com/channels/799672011265015819/1240229792348110920
In short: boosting or setting weights via btcli doesn't work in a local environment.

To reproduce, follow all the steps up to the last one in these docs:
https://github.com/opentensor/bittensor-subnet-template/blob/main/docs/running_on_staging.md

The issue persists on multiple different OS environments and multiple users experienced the issue.

Add MoE Gating model base

Create a gating model in a Mixture of Experts (MoE) architecture using PyTorch. We can implement a soft gating mechanism where the weights act as probabilities for selecting different experts. We can use the Gumbel-Softmax trick to sample from the categorical distribution with temperature, making the sampling process differentiable.

This should be part of the validator, as most subnets will want some kind of automatic routing mechanism without having to reinvent the wheel.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GatingModel(nn.Module):
    def __init__(self, input_dim, num_experts, temperature=1.0):
        super(GatingModel, self).__init__()

        self.num_experts = num_experts
        self.temperature = temperature

        # Gating network
        self.gating_network = nn.Sequential(
            nn.Linear(input_dim, num_experts),
            nn.Softmax(dim=-1)  # Softmax along the expert dimension
        )

    def forward(self, x):
        # Gating probabilities over the experts
        gating_probs = self.gating_network(x)

        # Gumbel-Softmax: perturb the log-probabilities with Gumbel noise,
        # then apply a temperature-scaled softmax (differentiable sampling)
        gumbel_noise = torch.rand_like(gating_probs)
        gumbel_noise = -torch.log(-torch.log(gumbel_noise + 1e-20) + 1e-20)
        logits = (torch.log(gating_probs + 1e-20) + gumbel_noise) / self.temperature
        selected_experts = F.softmax(logits, dim=-1)

        # Weighted combination over the expert dimension
        output = torch.sum(selected_experts.unsqueeze(-1) * x.unsqueeze(-2), dim=-2)

        return output, selected_experts

# Example usage
input_dim = 10
num_experts = 5
temperature = 0.1

# Create a GatingModel
gating_model = GatingModel(input_dim, num_experts, temperature)

# Generate dummy input
input_data = torch.randn(32, input_dim)

# Forward pass through the gating model
output, selected_experts = gating_model(input_data)

# 'output' is the final MoE output; 'selected_experts' holds the soft
# selection weights over experts for each example. For hard one-hot
# selection, use a straight-through estimator such as
# F.gumbel_softmax(logits, tau=temperature, hard=True).

Guide for SN owners to ease people into their subnets

There are examples of subnets with unrealistic (read: expensive) hardware requirements. In many cases, this makes mining and/or validating economically impractical. We should help subnet owners define a ramp-up for the requirements so that members of the subnet can be eased in.

Vanilla local setup on Ubuntu fails

Following the running_on_staging.md.

Blockchain node runs. No errors except one:

Error while running root epoch: "Not the block to update emission values."

Miner log has errors:

2024-01-18 07:44:15.992 | DEBUG | axon | <-- | 827 B | Dummy | 5FvVLxkyoKsLmnXAqZaap1Bp6HxLSzzg6p58DEiwd1Eva8q7 | 127.0.0.1:52292 | 200 | Success
2024-01-18 07:44:15.993 | ERROR | NotVerifiedException: Not Verified with error: Signature mismatch with 3542913964592439.5FvVLxkyoKsLmnXAqZaap1Bp6HxLSzzg6p58DEiwd1Eva8q7.5E7yNCEsJZdFKRV4qeZcdYuDX54tJ2ysoK8uPXd1ZUVQ2eQD.d6cd9c9c-b5cc-11ee-9ac9-a8a1591a7c9f.a7ffc6f8bf1ed76651c14756a061d662f580ff4de43b49fa82d80a4b80f8434a and 0x7a9ee7e33388e0d410692f08470c2fbafc4168072606b58e4fed369e29808c2e84b4ad66bac5fba2f421093de5177b96cf63c0596793dc3f14be8fbfe80df388

Validator log has errors:

2024-01-17 15:17:02.118 | ERROR | Error during validation 'Validator' object has no attribute 'moving_averaged_scores'
Traceback (most recent call last):
File "/home/bittensor-subnet-template/template/base/validator.py", line 141, in run
self.sync()
File "/home/bittensor-subnet-template/template/base/neuron.py", line 121, in sync
self.set_weights()
File "/home/bittensor-subnet-template/template/base/validator.py", line 220, in set_weights
self.moving_averaged_scores, p=1, dim=0
^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'Validator' object has no attribute 'moving_averaged_scores'
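One plausible workaround, sketched below, assumes the attribute is simply never initialized before the first sync(); the template may instead have renamed it, so treat the attribute name and shape as assumptions.

```python
import torch
import torch.nn.functional as F

class Validator:
    def __init__(self, n_uids: int):
        # Hypothetical fix: initialize the attribute that set_weights reads,
        # so the first sync() no longer raises AttributeError.
        self.moving_averaged_scores = torch.zeros(n_uids)

    def set_weights(self) -> torch.Tensor:
        # Mirrors the L1 normalization at validator.py line 220.
        return F.normalize(self.moving_averaged_scores, p=1, dim=0)

v = Validator(n_uids=4)
v.moving_averaged_scores = torch.tensor([1.0, 1.0, 2.0, 0.0])
print(v.set_weights())  # tensor([0.2500, 0.2500, 0.5000, 0.0000])
```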

After commenting out the set_weights method, I see these messages:

2024-01-18 07:44:31.871 | DEBUG | dendrite | --> | 3068 B | Dummy | 5E7yNCEsJZdFKRV4qeZcdYuDX54tJ2ysoK8uPXd1ZUVQ2eQD | 77.237.52.204:8091 | 0 | Success
2024-01-18 07:44:31.876 | DEBUG | dendrite | <-- | 3221 B | Dummy | 5E7yNCEsJZdFKRV4qeZcdYuDX54tJ2ysoK8uPXd1ZUVQ2eQD | 77.237.52.204:8091 | 503 | Service at 77.237.52.204:8091/Dummy unavailable.

Important!
I'm running both neurons on a single node.

Out of the box logging for validators and miners

Overview

We want to enable validators and miners to see the data created by themselves and other entities within subnets by default.

This will support network monitoring and analytics and help to create a standard of quality for subnets. Of course, subnet developers can and should improve upon the default behaviour but we will at least ensure that there is some telemetry from day zero.

To do this we will first implement basic logging for miners and validators:

Validators

The end of each forward pass will call log_event, which will write the following data to a local log file as a dict of lists:

  • UIDs queried
  • responses
    • deserialized outputs
    • status codes
    • response time
  • rewards

Miners

Each validator query will call log_event with information such as

  • Validator info
    • UID
    • hotkey
    • stake
  • Blacklist return value
  • Priority return value
  • Forward return value
  • Forward run time

We can also introduce a specific event that is logged when weight setting occurs.
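The validator side of this could be sketched as follows. Field names follow the lists above, but the JSON-lines format, file name, and function signature are assumptions for illustration:

```python
import json
import time

def log_event(log_path: str, event: dict) -> None:
    """Append one event to a local JSON-lines log file."""
    record = {"timestamp": time.time(), **event}
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")

# End of a validator forward pass: a dict of lists, as described above.
log_event(
    "validator_events.jsonl",
    {
        "uids": [3, 7, 12],
        "responses": {
            "outputs": ["...", "...", "..."],
            "status_codes": [200, 200, 503],
            "response_times": [0.41, 0.38, 12.0],
        },
        "rewards": [0.9, 0.85, 0.0],
    },
)
```

The miner-side log_event would reuse the same helper with the validator info, blacklist/priority/forward return values, and forward run time listed above.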

tensor warning in base validator

/base/validator.py:313: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).

I get the warning above in the update_scores method. I suppose it's a tensor-construction warning; I will make a PR soon.
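The warning is triggered by wrapping an existing tensor in torch.tensor(...); the replacement the warning message itself recommends is shown below (the scores tensor is a hypothetical stand-in):

```python
import torch

scores = torch.rand(5)

# This triggers the UserWarning seen at validator.py:313:
#   weights = torch.tensor(scores)

# Recommended replacement -- copies without re-wrapping the tensor:
weights = scores.clone().detach()
print(torch.equal(weights, scores))  # True
```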

Deprecate support for python 3.8

          The most important change concerns type hints: Python 3.9+ type-hint syntax will actually break Python 3.8, so we should be aware of this issue. My recommendation is that we start deprecating our use of Python 3.8 in favour of 3.9 or 3.10.

Originally posted by @Eugene-hu in #26 (comment)
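For reference, the most likely breaking construct is built-in generics in annotations (PEP 585), available from Python 3.9; a small illustration:

```python
# Runs on Python 3.9+; on Python 3.8, evaluating `list[int]` raises
#   TypeError: 'type' object is not subscriptable
# unless `from __future__ import annotations` defers annotation evaluation.
def head(xs: list[int]) -> int:
    return xs[0]

print(head([1, 2, 3]))  # 1
```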
