
stc's Introduction

Standard Template Construct

Welcome, developer! You've arrived at the repository for STC: a library, search engine, and AI toolkit offering free access to academic knowledge and works of fiction.

STC | Help Center

Getting Started

  • Explore our search features at Web STC, or through one of the Telegram bots listed in the bio of our channel (not an ad, just a safety measure)
  • Discover how to set up your own STC instance and get the same search capabilities in your local environment
  • Learn how to access a large corpus of high-quality scholarly texts using Python and use them in AI apps

Details

In essence, STC is the Summa search engine coupled with databanks. These databanks reside on IPFS in a format that allows searching without downloading the entire dataset. The search engine library can function as a standalone server, as an embeddable Python library (requiring no additional software!), or as a WASM-compiled module that runs in a browser. The last option makes it possible to embed the search engine in a static site, which can itself be deployed over IPFS. This is how Web STC is served.

Putting everything on IPFS lets you open STC in your browser or on your server and avoid centralized servers that may lose or censor data.

Components

  • Web STC is a browser-based interface with an embedded search engine that can be deployed entirely on IPFS and used in browsers
  • GECK is a Python library and Bash tool for setting up and interacting with STC programmatically
  • The Cybrex AI library pairs STC with AI tools, such as OpenAI or free LLMs, for processing stored data
  • STC Hub API is a plain API for accessing scholarly publications by their DOIs through kubo command-line tools or over HTTP.
  • Telegram Nexus Bot allows users to access STC via Telegram, one of the most popular messaging platforms.

Roadmap

Each part below lists its tasks in the form "Task: Description". ✅ marks completed tasks and 🚧 marks tasks in progress.

Library Stewardship

  • ✅ Assimilation of LibGen corpus: Transition of all items to nexus_science
  • 🚧 Assimilation of SciMag corpus: Significant task of transferring the scimag corpus to IPFS
  • ✅ Structured content: Enhance GROBID extraction (headers + content) and store content in the structured_content JSON column. Extract entities for cross-linking in Web STC
  • 🚧 Implementing classification (articles, books)

Web STC

  • UX improvement: STC often requires loading large data chunks, which is currently reflected only by a spinner. The UX needs improvement. Following the structured content implementation, we can highlight headers and generate cross-links in abstracts/content
  • Enhancing availability: Further testing needed on diverse devices and networks
  • Bookshelf: STC has all the tools for generating bookshelves that may offer users high-quality suggestions on what to read.

Cybrex AI

  • First-class support of local LLM: Extensive testing of prompts with documents is required to identify the smallest model capable of efficiently executing QA and summarization tasks. Most 13-15B models are currently failing (quantized, on CPU)
  • Building an embeddings dataset: The goal is to build a comprehensive dataset with DOIs and document embeddings. Currently, the Instructor XL model appears most promising, but further testing is necessary
  • Refining and fixing metadata (cleaning content): Areas for improvement include detected language, tags, keywords, automated abstracts, and Dewey classification
  • Build QA on local LLM: Such a system should be independently operable and also accessible via Telegram.
  • Fine-tuning LLMs on STC

Distribution

  • Building STC Box: Develop and maintain a definitive guide and scripts for replicating and launching STC on compact devices like PI computers or TV Boxes
  • Global replication: The goal is to replicate STC (including the search database and papers) a minimum of 100 times across at least 30 countries
  • Establishing Frontier Outposts: Investigate strategies to replicate STC on an orbiting satellite or another planet in the solar system (Mars or Europa preferred)

Communities

  • Forming Science Communities on Telegram: Initiate the first version of Telegram-based forums focusing on specific scientific topics
  • Addressing Copyright Issues: Organize more activities aimed at challenging the copyright laws for scholarly and educational writings

stc's People

Contributors

izzortsi, the-superpirate

stc's Issues

Node implementation?

Has anyone developed a node implementation that can search and download by doi?

Imidazole–Monoethanolamine-Based Deep Eutectic Solvent for Carbon Dioxide Capture: A Combined Experimental and Molecular Dynamics Investigation

Imidazole (IMI) and monoethanolamine (MEA) are mixed in various molar ratios to form a nonionic deep eutectic solvent (DES). This DES shows promising application for carbon dioxide (CO2) capture. Solubility of CO2 in the DES was directly related to changes in pressure while being inversely proportional to change in temperature. The highest CO2 loading of 0.711 mol CO2/mol DES was obtained at 30 °C, 10 bar and for a DES molar ratio of 1:4. Interestingly, upon addition of 50 vol % (47.62 wt %) water to the DES, the absorption capacity of the DES was almost doubled to 1.357 mol CO2/mol DES. The calculated Henry’s constant value and the negative CO2 absorption enthalpy indicate a strong interaction between the DES and a low regeneration energy requirement. Nonreactive molecular dynamics (MD) simulations were performed to investigate the local microstructure of IMI and MEA in neat and wet DES and the various key interactions responsible for CO2 absorption identified. The potential of mean force-based free energy MD calculations indicated that in the presence of water, the DES shows increased CO2 physisorption, consistent with our experimental results. The inclusion of water in the DES weakens the inter- and intramolecular interactions between MEA and IMI, which is observed from the reduction in peak heights for the various pairwise interactions obtained from molecular dynamics simulations. The weakening of the inter- and intramolecular hydrogen-bonding interactions in MEA and IMI in the presence of water results in the exposure of the amine and hydroxyl sites on MEA and the annular NH nitrogen group in IMI, thereby enabling such sites to interact favorably with CO2 and result in increased absorption. This fundamental study should open many avenues for more indepth investigations involving IMI/MEA-based DES and their potential selective absorption of other flue gases.

Database error

The STC web site shows the following error:
DatabaseClosedError: NotFoundError Table MetaDb not part of transaction NotFoundError: Table MetaDb not part of transaction NotFoundError: Table MetaDb not part of transaction NotFoundError: Table MetaDb not part of transaction

installation problem with "summa-embed"

During the installation process, the build for the "summa-embed" package fails with a Rust compilation error. This error appears to be related to the Rust code compilation process, specifically during the build of the "summa-core" component. The error message suggests a problem with the native library build through cargo, leading to a non-zero exit status.

Error Details:
Compiling summa-core v0.17.18 (/tmp/pip-install-8yw6q_9t/summa-embed_7e4bfb0350794254be97a96cde62bd59/local_dependencies/summa-core)
error: could not compile summa-core (lib) due to 2 previous errors
warning: build failed, waiting for other jobs to finish...
💥 maturin failed
Caused by: Failed to build a native library through cargo
Caused by: Cargo build finished with "exit status: 101": PYO3_ENVIRONMENT_SIGNATURE="cpython-3.9-64bit" PYO3_PYTHON="/usr/bin/python3.9" PYTHON_SYS_EXECUTABLE="/usr/bin/python3.9" "cargo" "rustc" "--release" "--features" "pyo3/extension-module" "--manifest-path" "/tmp/pip-install-8yw6q_9t/summa-embed_7e4bfb0350794254be97a96cde62bd59/Cargo.toml" "--message-format" "json" "--lib"
warning: associated type documentsStream should have an upper camel case name
--> /tmp/pip-install-8yw6q_9t/summa-embed_7e4bfb0350794254be97a96cde62bd59/target/release/build/summa-proto-cf39bd6a57042ac9/out/summa.proto.rs:2949:14
|
2949 | type documentsStream: futures_core::Stream<
| ^^^^^^^^^^^^^^^ help: convert the identifier to upper camel case: DocumentsStream
|
= note: #[warn(non_camel_case_types)] on by default

  warning: 1 warning emitted


  error: couldn't read local_dependencies/summa-core/src/lib.rs: No such file or directory (os error 2)


  error: aborting due to previous error


  Error: command ['maturin', 'pep517', 'build-wheel', '-i', '/usr/bin/python3.9', '--compatibility', 'off'] returned non-zero exit status 1 
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for summa-embed
Failed to build summa-embed
ERROR: Could not build wheels for summa-embed, which is required to install pyproject.toml-based projects

Environment:
Python version: 3.9
Docker container: hopeful_bohr
Operating System: Windows 11

Steps to Reproduce:
Clone the "cybrex" package repository.
Follow the installation instructions using Python 3.9 within a Docker container.
Observe the compilation failure during the build of the "summa-embed" package.

Expected Outcome:
Successful compilation and installation of the "summa-embed" package, allowing seamless usage of the "cybrex" package.

I have a bug,

I can't download any more papers. The bot says I have a bug.
What do I do?

Broken/missing file, but unable to report/request

The file for https://libstc.cc/#/nexus_science/id.nexus_id:26dss13ixr6rrsg53lbk4auwt is broken or at least inaccessible.

The Telegram Nexus bot returns "Oops! Something goes wrong and we are trying hard to revive. Please, try a little bit later." There is no way to report this type of issue to the Telegram bot or to request the file from the nexus defiler bot. This has been ongoing for weeks now.

Update 4/18/2024: The download link seems to have been fixed, but I'm not sure what was the cause of the issue, and other document objects seem to have been affected by this issue (and now resolved), so I will wait to close this issue until there's more information

pyo3_asyncio.RustPanic: rust future panicked

Hi,

Thanks for creating such a powerful, creative tool.
I followed the post on Reddit to use STC.

Last week I could download a file successfully with this command:
stc-tools - download doi:10.1177/1745691612459058 file.pdf

Today, however, the same command fails with the error pyo3_asyncio.RustPanic: rust future panicked.

(stc) PS E:\> stc-tools - download doi:10.1177/1745691612459058 file.pdf
INFO: Setting up indices: /ipns/standard-template-construct.org/data/nexus_science/...
thread 'tokio-runtime-worker' panicked at 'called `Result::unwrap()` on an `Err` value: Tantivy(IoError(Custom { kind: Other, error: "Unsuported dictionary version, expected 1, found 0" }))', summa-embed-py\src\lib.rs:84:14
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "E:\conda\miniconda3\envs\stc\Scripts\stc-tools.exe\__main__.py", line 7, in <module>
  File "E:\conda\miniconda3\envs\stc\Lib\site-packages\stc_tools\cli.py", line 94, in main
    fire.Fire(stc_tools_cli, name='stc-tools')
  File "E:\conda\miniconda3\envs\stc\Lib\site-packages\fire\core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\conda\miniconda3\envs\stc\Lib\site-packages\fire\core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
                                ^^^^^^^^^^^^^^^^^^^^
  File "E:\conda\miniconda3\envs\stc\Lib\site-packages\fire\core.py", line 689, in _CallAndUpdateTrace
    component = loop.run_until_complete(fn(*varargs, **kwargs))
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\conda\miniconda3\envs\stc\Lib\asyncio\base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "E:\conda\miniconda3\envs\stc\Lib\site-packages\stc_tools\cli.py", line 64, in download
    results = await self.search(query, index_name=index_name)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\conda\miniconda3\envs\stc\Lib\site-packages\stc_tools\cli.py", line 43, in search
    await self.setup()
  File "E:\conda\miniconda3\envs\stc\Lib\site-packages\stc_tools\client.py", line 38, in setup
    description = await self.index_registry.add({'config': {'remote': {
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pyo3_asyncio.RustPanic: rust future panicked

Environment

  • OS: Windows 10
  • stc-tools version: 1.0.10

Implement classification for papers

Motivation

Classification of papers is an essential task. It serves two purposes: it enables a navigational menu in the bot and on the web, and it makes it possible to cherry-pick papers on a specific topic for mass processing.

The task is to create a classifier that takes publication metadata and derives a list of highly likely classes for the record.

Classification approach

https://www.frontiersin.org/articles/10.3389/frma.2023.1149834/full

This approach is described in the paper, but no source code is available. One option is to contact the authors and request the sources to kick-start the implementation.

Technical description

What is needed: a library that accepts a paper description as a dict of the following format

authors: List[{first: str, given: str, name: str}]
abstract?: str
content?: str
id: {dois: List[str]}
issued_at?: int
languages: List[str]
metadata?: {container_title?: str, publisher?: str}
tags?: List[str]
title: str

and returns the SciNobo class for the paper. Fields are described more precisely in the schema. Assume that all fields except title and abstract are absent most of the time.
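The dict format above can be written down as Python types. This is only a minimal sketch under stated assumptions: the type names (Author, PaperId, PaperMetadata, Paper) and the helper classifier_input are hypothetical and are not part of STC or GECK. Fields marked with "?" in the format are modelled as optional keys, and the helper relies only on title and abstract, since the other fields are expected to be missing most of the time.

```python
from typing import List, TypedDict


class Author(TypedDict):
    first: str
    given: str
    name: str


class PaperId(TypedDict):
    dois: List[str]


class PaperMetadata(TypedDict, total=False):
    container_title: str
    publisher: str


class _RequiredPaperFields(TypedDict):
    # Keys written without "?" in the format above.
    authors: List[Author]
    id: PaperId
    languages: List[str]
    title: str


class Paper(_RequiredPaperFields, total=False):
    # Keys written with "?" in the format above.
    abstract: str
    content: str
    issued_at: int
    metadata: PaperMetadata
    tags: List[str]


def classifier_input(doc: dict) -> str:
    """Build the text a classifier would see (hypothetical helper).

    Only title and abstract are used, and either may be absent.
    """
    parts = [doc.get("title", ""), doc.get("abstract", "")]
    return "\n".join(part for part in parts if part)
```

A classifier built against this sketch would take a Paper and return a class label. Keeping the input construction in one small function makes it easy to add further fields later if they turn out to be present often enough.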

How to Start

pip install stc-geck
geck - documents

You will receive a stream of documents; these documents are the subject of the task.
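One way to consume such a stream is sketched below. It assumes, without confirmation from the GECK documentation, that the command writes one JSON object per line to standard output; if the real output format differs, the parsing step has to change. The function names iter_documents, has_text and main are hypothetical.

```python
import json
import sys
from typing import Iterable, Iterator


def iter_documents(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per line, assuming JSON Lines input (an assumption)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            doc = json.loads(line)
        except json.JSONDecodeError:
            # Skip anything that is not valid JSON, e.g. log lines.
            continue
        if isinstance(doc, dict):
            yield doc


def has_text(doc: dict) -> bool:
    """True when both title and abstract are present and non-empty."""
    return bool(doc.get("title")) and bool(doc.get("abstract"))


def main() -> None:
    # Print the titles of documents that carry enough text to classify.
    for doc in iter_documents(sys.stdin):
        if has_text(doc):
            print(doc["title"])
```

If the assumption about the output format holds, a script that calls main() could read the output of the geck command through a pipe.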

Broken link

The repo's provided link to your search engine website has a typo.

STC doesn't work in firefox based browsers.

Hey,
Although Safari seems to work just fine, in Firefox, LibreWolf and Tor Browser I just can't get past the loading screen.


Plugins are disabled.

The proper functionality would be great since I am not willing to switch to another browser.

Thanks, keep up your work.

Frozen upon loading

It will not list papers for me. I have tried different devices. I also tried a public gateway and a local IPFS node.

Unable to seed - make the list public?

Hi,
while trying to help this project, every time I just get:
wait, we are generating file...
Oops! Something goes wrong and we are trying hard to revive. Please, try a little bit later.

This needs to work. I am willing to give this project a few TB, but I need something to share first.

How about making the IPFS list a little bit more public?
