nexus-stc / stc

Distributed free search engine and AI tools that grant access to knowledge

Home Page: http://standard-template-construct.org

TypeScript 56.61% Python 33.33% Jupyter Notebook 1.84% Dockerfile 0.05% JavaScript 0.10% HTML 0.09% Shell 0.05% Vue 7.65% SCSS 0.28%

books database ipfs knowledge scholarly-articles summa

stc's Introduction

Standard Template Construct

Welcome, developer! You've arrived at the repository for STC, the library, search engine and AI tooling offering free access to academic knowledge and works of fictional literature.

STC | Help Center

Getting Started

Explore our search features at Web STC, or through one of the Telegram bots listed in the bio of our channel (this is a safety measure, not an ad)
Discover how to set up your own STC instance and get the same search capabilities in your local environment
Learn how to access a large corpus of high-quality scholarly texts using Python and use it in AI apps

Details

In essence, STC is the Summa search engine coupled with databanks. These databanks reside on IPFS in a format that allows searching without downloading the entire dataset. The search engine library can function as a standalone server, an embeddable Python library (requiring no additional software!), or a WASM-compiled module that runs in a browser. The last option makes it possible to embed the search engine in a static site that can itself be deployed over IPFS. This is how Web STC is served.

Putting everything on IPFS allows you to open STC in your browser or on your server and avoid centralized servers that may lose or censor data.
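
One common way to search a remote dataset without downloading it in full is to split the files into fixed-size chunks and fetch only the chunks that cover the bytes a query needs. The sketch below illustrates that idea only; the function name `chunks_for_range` and the chunk layout are hypothetical, and this is not a description of how Summa actually stores or reads its indices.

```python
from typing import List


def chunks_for_range(offset: int, length: int, chunk_size: int) -> List[int]:
    """Return the indices of the fixed-size chunks covering a byte range.

    Illustrative assumption: the remote file is cut into chunks of
    `chunk_size` bytes, numbered from 0. Not taken from the Summa codebase.
    """
    if offset < 0 or length < 0 or chunk_size <= 0:
        raise ValueError("offset and length must be >= 0 and chunk_size > 0")
    if length == 0:
        return []  # nothing to read, so no chunk is needed
    first = offset // chunk_size                 # chunk holding the first byte
    last = (offset + length - 1) // chunk_size   # chunk holding the last byte
    return list(range(first, last + 1))
```

For example, reading 10 bytes starting at offset 10 with 16-byte chunks touches chunks 0 and 1, so only those two would have to be fetched.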

Components

Web STC is a browser-based interface with an embedded search engine that can be deployed entirely on IPFS
GECK is a Python library and Bash tool for setting up and interacting with STC programmatically
Cybrex AI is a library that pairs STC with AI tools, such as OpenAI or free LLMs, for processing stored data
STC Hub API is a plain API for accessing scholarly publications by their DOIs through the kubo command-line tools or over HTTP
Telegram Nexus Bot allows users to access STC via Telegram, one of the most popular messaging platforms

Roadmap

Part	Task	Description
Library Stewardship
	✅ Assimilation of LibGen corpus	Transition of all items to `nexus_science`
	🚧 Assimilation of SciMag corpus	Significant task of transferring scimag corpus to IPFS
	✅ Structured content	Enhance GROBID extraction (headers + content) and store content in structured_content JSON column. Extract entities for cross-linking in Web STC
	🚧 Implementing classification (articles, books)
Web STC
	UX improvement	STC often requires loading of large data chunks, currently reflected only by a spinner. The UX needs improvement. Following structured content implementation, we can highlight headers and generate cross-links in abstracts/content
	Enhancing availability	Further testing needed on diverse devices and networks
	Bookshelf	STC has all the tools needed to generate bookshelves that can offer users high-quality reading suggestions.
Cybrex AI
	First-class support of local LLM	Extensive testing of prompts with documents is required to identify the smallest model capable of efficiently executing QA and summarization tasks. Most 13-15B models are currently failing (quantized, on CPU)
	Building an embeddings dataset	The goal is to build a comprehensive dataset with DOIs and document embeddings. Currently, the Instructor XL model appears most promising, but further testing is necessary
	Refining and fixing metadata (cleaning `content`)	Areas for improvement include: detected language, tags, keywords, automated abstracts, Dewey classification
	Build QA on local LLM	Such a system should be independently operable and also accessible via Telegram.
	Fine-tuning LLMs on STC
Distribution
	Building STC Box	Develop and maintain a definitive guide and scripts for replicating and launching STC on compact devices like PI computers or TV Boxes
	Global replication	The goal is to replicate STC (including the search database and papers) a minimum of 100 times across at least 30 countries
	Establishing Frontier Outposts	Investigate strategies to replicate STC on an orbiting satellite or another planet in the solar system (Mars or Europa preferred)
Communities
	✅ Forming Science Communities on Telegram	Initiate the first version of Telegram-based forums focusing on specific scientific topics
	Addressing Copyright Issues	Organize more activities aimed at challenging the copyright laws for scholarly and educational writings

stc's Issues

http://dx.doi.org/10.22251/jlcci.2023.23.1.689

id.nexus_id:37xucdl1i7012128jmkhr1c7j (Unable to download article)

Hi,
I tried both the Telegram bot and STC, but although the article is listed as available, I am unable to download it. I have been trying for over a week.

id.nexus_id:37xucdl1i7012128jmkhr1c7j

Thank you

Bot says I have a bug

I can't download any more papers. Bot says I have a bug.
What do I do?

Unable to download: it says the URL is incorrect

I am trying to download Harvard Medical School books, but it says the URL is not correct.

Node implementation?

Has anyone developed a node implementation that can search and download by DOI?

Imidazole–Monoethanolamine-Based Deep Eutectic Solvent for Carbon Dioxide Capture: A Combined Experimental and Molecular Dynamics Investigation

Imidazole (IMI) and monoethanolamine (MEA) are mixed in various molar ratios to form a nonionic deep eutectic solvent (DES). This DES shows promising application for carbon dioxide (CO2) capture. Solubility of CO2 in the DES was directly related to changes in pressure while being inversely proportional to change in temperature. The highest CO2 loading of 0.711 mol CO2/mol DES was obtained at 30 °C, 10 bar and for a DES molar ratio of 1:4. Interestingly, upon addition of 50 vol % (47.62 wt %) water to the DES, the absorption capacity of the DES was almost doubled to 1.357 mol CO2/mol DES. The calculated Henry’s constant value and the negative CO2 absorption enthalpy indicate a strong interaction between the DES and a low regeneration energy requirement. Nonreactive molecular dynamics (MD) simulations were performed to investigate the local microstructure of IMI and MEA in neat and wet DES and the various key interactions responsible for CO2 absorption identified. The potential of mean force-based free energy MD calculations indicated that in the presence of water, the DES shows increased CO2 physisorption, consistent with our experimental results. The inclusion of water in the DES weakens the inter- and intramolecular interactions between MEA and IMI, which is observed from the reduction in peak heights for the various pairwise interactions obtained from molecular dynamics simulations. The weakening of the inter- and intramolecular hydrogen-bonding interactions in MEA and IMI in the presence of water results in the exposure of the amine and hydroxyl sites on MEA and the annular NH nitrogen group in IMI, thereby enabling such sites to interact favorably with CO2 and result in increased absorption. This fundamental study should open many avenues for more indepth investigations involving IMI/MEA-based DES and their potential selective absorption of other flue gases.

The link in your GitHub "About" is wrong.

It's http://standard-template-consturct.org/

u and r are switched.

Database error

The STC web site shows this error:
DatabaseClosedError: NotFoundError Table MetaDb not part of transaction NotFoundError: Table MetaDb not part of transaction NotFoundError: Table MetaDb not part of transaction NotFoundError: Table MetaDb not part of transaction

installation problem with "summa-embed"

During the installation process, the build for the "summa-embed" package fails with a Rust compilation error. This error appears to be related to the Rust code compilation process, specifically during the build of the "summa-core" component. The error message suggests a problem with the native library build through cargo, leading to a non-zero exit status.

Error Details:
Compiling summa-core v0.17.18 (/tmp/pip-install-8yw6q_9t/summa-embed_7e4bfb0350794254be97a96cde62bd59/local_dependencies/summa-core)
error: could not compile summa-core (lib) due to 2 previous errors
warning: build failed, waiting for other jobs to finish...
💥 maturin failed
Caused by: Failed to build a native library through cargo
Caused by: Cargo build finished with "exit status: 101": PYO3_ENVIRONMENT_SIGNATURE="cpython-3.9-64bit" PYO3_PYTHON="/usr/bin/python3.9" PYTHON_SYS_EXECUTABLE="/usr/bin/python3.9" "cargo" "rustc" "--release" "--features" "pyo3/extension-module" "--manifest-path" "/tmp/pip-install-8yw6q_9t/summa-embed_7e4bfb0350794254be97a96cde62bd59/Cargo.toml" "--message-format" "json" "--lib"
warning: associated type documentsStream should have an upper camel case name
--> /tmp/pip-install-8yw6q_9t/summa-embed_7e4bfb0350794254be97a96cde62bd59/target/release/build/summa-proto-cf39bd6a57042ac9/out/summa.proto.rs:2949:14
|
2949 | type documentsStream: futures_core::Stream<
| ^^^^^^^^^^^^^^^ help: convert the identifier to upper camel case: DocumentsStream
|
= note: #[warn(non_camel_case_types)] on by default

  warning: 1 warning emitted


  error: couldn't read local_dependencies/summa-core/src/lib.rs: No such file or directory (os error 2)


  error: aborting due to previous error


  Error: command ['maturin', 'pep517', 'build-wheel', '-i', '/usr/bin/python3.9', '--compatibility', 'off'] returned non-zero exit status 1 
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for summa-embed
Failed to build summa-embed
ERROR: Could not build wheels for summa-embed, which is required to install pyproject.toml-based projects

Environment:
Python version: 3.9
Docker container: hopeful_bohr
Operating System: Windows 11

Steps to Reproduce:
Clone the "cybrex" package repository.
Follow the installation instructions using Python 3.9 within a Docker container.
Observe the compilation failure during the build of the "summa-embed" package.

Expected Outcome:
Successful compilation and installation of the "summa-embed" package, allowing seamless usage of the "cybrex" package.

Implement classification for books

I have a bug,

I can't download any more papers. Bot says I have a bug.
What do I do?

Scientific article

The article does not load when entering the DOI 10.1001/jama.2022.2350

Novel alkanolamine-based biphasic solvent for CO2 capture with low energy consumption and phase change mechanism analysis

Broken/missing file, but unable to report/request

The file for https://libstc.cc/#/nexus_science/id.nexus_id:26dss13ixr6rrsg53lbk4auwt is broken or at least inaccessible.

The Telegram Nexus bot returns "Oops! Something goes wrong and we are trying hard to revive. Please, try a little bit later." There is no way to report this type of issue to the Telegram bot or to request the file from the nexus defiler bot. This has been ongoing for weeks now.

Update 4/18/2024: The download link seems to have been fixed, but I'm not sure what caused the issue, and other documents seem to have been affected by it (and are now resolved), so I will wait to close this issue until there's more information.

Mousavi, Morteza; Raeisi, Jalil; Yahyapour, Hossein. (1398 SH). "Sustainable Development as a Foreign Policy Strategy in Energy Diplomacy", University of Humanities and Modern Technologies.

How to install IPFS without sudo access?

pyo3_asyncio.RustPanic: rust future panicked

Hi,

Thanks for creating such a powerful, creative tool.
I followed the post on Reddit to use STC.

Last week I could download a file successfully with this command:
stc-tools - download doi:10.1177/1745691612459058 file.pdf

But today the same command failed with a pyo3_asyncio.RustPanic: rust future panicked error.

(stc) PS E:\> stc-tools - download doi:10.1177/1745691612459058 file.pdf
INFO: Setting up indices: /ipns/standard-template-construct.org/data/nexus_science/...
thread 'tokio-runtime-worker' panicked at 'called `Result::unwrap()` on an `Err` value: Tantivy(IoError(Custom { kind: Other, error: "Unsuported dictionary version, expected 1, found 0" }))', summa-embed-py\src\lib.rs:84:14
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "E:\conda\miniconda3\envs\stc\Scripts\stc-tools.exe\__main__.py", line 7, in <module>
  File "E:\conda\miniconda3\envs\stc\Lib\site-packages\stc_tools\cli.py", line 94, in main
    fire.Fire(stc_tools_cli, name='stc-tools')
  File "E:\conda\miniconda3\envs\stc\Lib\site-packages\fire\core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\conda\miniconda3\envs\stc\Lib\site-packages\fire\core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
                                ^^^^^^^^^^^^^^^^^^^^
  File "E:\conda\miniconda3\envs\stc\Lib\site-packages\fire\core.py", line 689, in _CallAndUpdateTrace
    component = loop.run_until_complete(fn(*varargs, **kwargs))
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\conda\miniconda3\envs\stc\Lib\asyncio\base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "E:\conda\miniconda3\envs\stc\Lib\site-packages\stc_tools\cli.py", line 64, in download
    results = await self.search(query, index_name=index_name)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\conda\miniconda3\envs\stc\Lib\site-packages\stc_tools\cli.py", line 43, in search
    await self.setup()
  File "E:\conda\miniconda3\envs\stc\Lib\site-packages\stc_tools\client.py", line 38, in setup
    description = await self.index_registry.add({'config': {'remote': {
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pyo3_asyncio.RustPanic: rust future panicked

Environment

OS: Windows 10
stc-tools version: 1.0.10

Is that your [ethnic slur] in asschandmustdie_bot?

What stage of oligophrenia is he at?

Paper cannot be downloaded (times out) [DOI: 10.1038/s41562-023-01681-y]

The download link for https://libstc.cc/#/nexus_science/id.nexus_id:24j6q66ssfs59g1qflid31t66 has timed out every time I have tried it over the past 9 days, from multiple browsers and IPs, and it also times out for another person.

Implement classification for papers

Motivation

Classification of papers is an essential task. It serves two purposes: it enables a navigational menu in the bot and on the web, and it allows cherry-picking papers on a specific topic for mass processing.

The task is to create a classifier that takes publication metadata and derives a list of highly likely classes for the record.

Classification approach

https://www.frontiersin.org/articles/10.3389/frma.2023.1149834/full

This approach is described in the paper, but no source code is available. One option is to contact the authors and request the sources to kick-start the implementation.

Technical description

What is needed: a library that accepts a paper description as a dict of the following format

authors: List[{first: str, given: str, name: str}]
abstract?: str
content?: str
id: {dois: List[str]}
issued_at?: int
languages: List[str]
metadata?: {container_title?: str, publisher?: str}
tags?: List[str]
title: str

and returns the SciNobo class for the paper. The fields are described more precisely in the schema. Assume that all fields except title and abstract are absent most of the time.
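
Because most fields are usually missing, a classifier has to cope with sparse records. A minimal sketch of how its text input could be assembled from such a dict follows; `build_classifier_text` and the choice of fields to concatenate are hypothetical and not part of any STC library.

```python
from typing import Any, Dict


def build_classifier_text(record: Dict[str, Any]) -> str:
    """Join the textual fields that are present into one classifier input.

    Hypothetical helper: only the field names come from the record format
    above; the selection and ordering of fields are illustrative choices.
    """
    metadata = record.get("metadata") or {}
    parts = [
        record.get("title"),
        record.get("abstract"),
        metadata.get("container_title"),
        ", ".join(record.get("tags") or []),
    ]
    # Skip fields that are missing, None, or blank.
    return "\n".join(part.strip() for part in parts if part and part.strip())
```

A record that carries only a title yields just that title, so the classifier still receives usable input when everything else is absent.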

How to Start

pip install stc-geck
geck - documents

You will receive the stream of documents that is the subject of the task.
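
A consumer for such a stream could look like the sketch below. It assumes, without confirmation from the GECK documentation, that each document arrives as one JSON object per line; `iter_documents` is a hypothetical helper name.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def iter_documents(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one dict per line, assuming one JSON object per line.

    Assumption: the stream is newline-delimited JSON. Blank lines, lines
    that are not valid JSON, and non-object values are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            doc = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a log message mixed into the stream
        if isinstance(doc, dict):
            yield doc
```

If that assumption holds, the generator could be fed from standard input, for example with `for doc in iter_documents(sys.stdin)` in a script that receives the output of the command above through a pipe.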

Broken link

The repo's provided link to your search engine website has a typo.

STC doesn't work in Firefox-based browsers.

Hey,
although Safari seems to work just fine, in Firefox, LibreWolf and Tor Browser I just can't get past the loading screen.

Plugins are disabled.

Proper functionality would be great, since I am not willing to switch to another browser.

Thanks, keep up your work.

Frozen upon loading

It will not list papers for me. I have tried different devices. I also tried a public gateway and a local IPFS node.

Unable to seed - make the list public?

Hi,
every time I try to help this project, I just get:
wait, we are generating file...
Oops! Something goes wrong and we are trying hard to revive. Please, try a little bit later.

This needs to work. I am willing to give this project a few TB of storage, but I need something to share first.

How about making the IPFS list a little bit more public?