Comments (3)
It seems that the implementation with `tokio::spawn` doesn't work, because it requires the `Send` and `Sync` traits to be implemented.
Unfortunately, the current implementation with `tokio::spawn`, given below, is invalid:
//! This module provides the functionality to scrape and gathers all the results from the upstream
//! search engines and then removes duplicate results.
use std::{collections::HashMap, rc::Rc, time::Duration, vec};
use rand::Rng;
use super::{
    aggregation_models::{RawSearchResult, SearchResult, SearchResults},
    user_agent::{self, random_user_agent},
};
use crate::engines::{duckduckgo, engine_models::SearchEngine, searx};
/// A function that aggregates all the scraped results from the above upstream engines and
/// then removes duplicate results and if two results are found to be from two or more engines
/// then puts their names together to show the results are fetched from these upstream engines
/// and then removes all data from the HashMap and puts into a struct of all results aggregated
/// into a vector and also adds the query used into the struct. This is necessary because
/// otherwise the search bar on the search page remains empty if searched from the query url
///
/// # Example:
///
/// If you search from the url like `https://127.0.0.1/search?q=huston` then the search bar should
/// contain the word huston and not remain empty.
///
/// # Arguments
///
/// * `query` - Accepts a string to query with the above upstream search engines.
/// * `page` - Accepts a u32 page number.
/// * `random_delay` - Accepts a boolean value to add a random delay before making the request.
///
/// # Error
///
/// Returns a reqwest or scraping selector error if any error occurs in the results
/// function of either `searx` or `duckduckgo` or both; otherwise returns a `SearchResults` struct
/// containing appropriate values.
pub async fn aggregate(
    query: &str,
    page: u32,
    random_delay: bool,
    debug: bool,
) -> Result<SearchResults, Box<dyn std::error::Error>> {
    let user_agent: String = random_user_agent();
    let mut result_map: HashMap<String, RawSearchResult> = HashMap::new();
    // Add a random delay before making the request.
    if random_delay || !debug {
        let mut rng = rand::thread_rng();
        let delay_secs = rng.gen_range(1..10);
        std::thread::sleep(Duration::from_secs(delay_secs));
    }
    let engines: Vec<Box<dyn SearchEngine + Send + Sync>> =
        vec![Box::new(duckduckgo::Duckduckgo), Box::new(searx::Searx)];
    let mut tasks = Vec::with_capacity(engines.len());
    for engine in engines {
        tasks.push(tokio::spawn(async move {
            engine.results(query, page, &user_agent)
        }))
    }
    let initial: bool = true;
    for task in tasks {
        if initial {
            result_map.extend(task.await?);
            initial = false
        } else {
            task.into_iter().for_each(|(key, value)| {
                result_map
                    .entry(key)
                    .and_modify(|result| {
                        result.add_engines(value.clone().engine());
                    })
                    .or_insert_with(|| -> RawSearchResult {
                        RawSearchResult::new(
                            value.title.clone(),
                            value.visiting_url.clone(),
                            value.description.clone(),
                            value.engine.clone(),
                        )
                    });
            });
        }
    }
    Ok(SearchResults::new(
        result_map
            .into_iter()
            .map(|(key, value)| {
                SearchResult::new(
                    value.title,
                    value.visiting_url,
                    key,
                    value.description,
                    value.engine,
                )
            })
            .collect(),
        query.to_string(),
    ))
}
Do you have any solution to the problem or any alternative implementation @alamin655 @xffxff ?? 🙂
from websurfx.
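For context on the error: `tokio::spawn` requires the spawned future to be `Send + 'static`, and its output to be `Send + 'static` as well. In the code quoted above, the task borrows `query` and `&user_agent`, and the error type `Box<dyn std::error::Error>` is not `Send`, so either of these may be what the compiler rejects. The sketch below is only an illustration of one possible fix: give each task owned data and use a `Send + Sync` error type. It uses `std::thread::spawn`, which has analogous `Send + 'static` bounds, so that it runs without extra crates. The `Engine` trait, `EngineA`, `EngineB`, and this simplified `aggregate` are hypothetical stand-ins, not the project's real types.

```rust
use std::collections::HashMap;
use std::error::Error;
use std::thread;

// Error type that can cross thread (or task) boundaries.
type BoxError = Box<dyn Error + Send + Sync>;

// Hypothetical, synchronous stand-in for the project's `SearchEngine` trait.
// The `Send + Sync` supertraits make `Box<dyn Engine>` safe to move into a thread.
trait Engine: Send + Sync {
    fn name(&self) -> &'static str;
    fn results(&self, query: &str, page: u32) -> Result<Vec<String>, BoxError>;
}

struct EngineA;
struct EngineB;

impl Engine for EngineA {
    fn name(&self) -> &'static str {
        "a"
    }
    fn results(&self, query: &str, page: u32) -> Result<Vec<String>, BoxError> {
        Ok(vec![format!("https://example.com/{query}/{page}")])
    }
}

impl Engine for EngineB {
    fn name(&self) -> &'static str {
        "b"
    }
    fn results(&self, query: &str, page: u32) -> Result<Vec<String>, BoxError> {
        Ok(vec![
            format!("https://example.com/{query}/{page}"),
            format!("https://example.org/{query}"),
        ])
    }
}

/// Maps each result URL to the names of the engines that returned it.
fn aggregate(query: &str, page: u32) -> Result<HashMap<String, Vec<&'static str>>, BoxError> {
    let engines: Vec<Box<dyn Engine>> = vec![Box::new(EngineA), Box::new(EngineB)];
    let mut handles = Vec::with_capacity(engines.len());
    for engine in engines {
        // An owned copy per task: the closure no longer borrows from the caller,
        // so it satisfies the `'static` bound.
        let query = query.to_owned();
        handles.push(thread::spawn(move || {
            let name = engine.name();
            engine.results(&query, page).map(|urls| (name, urls))
        }));
    }

    // Merge in spawn order; a URL seen by several engines collects all their names.
    let mut merged: HashMap<String, Vec<&'static str>> = HashMap::new();
    for handle in handles {
        let (name, urls) = handle.join().expect("engine thread panicked")?;
        for url in urls {
            merged.entry(url).or_default().push(name);
        }
    }
    Ok(merged)
}
```

Calling `aggregate("rust", 1)` yields two entries, with the shared URL attributed to both engines. With `tokio::spawn` the same ownership idea would apply (clone the `String`s or wrap them in an `Arc` before the `async move` block, and `.await` the engine call inside it), but that variant is not shown or tested here.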
Do you have any solution to the problem or any alternative implementation @alamin655 @xffxff ?? 🙂
How about using `rayon` or `futures`?
from websurfx.
Do you have any solution to the problem or any alternative implementation @alamin655 @xffxff ?? 🙂
How about using `rayon` or `futures`?
Sorry for the late reply.
The problem with `rayon` is that it is only a multi-threading crate. That is good too, but the way `async/await` works in Rust is that the runtime switches between `async` tasks whenever one of them is waiting at an `.await` point, so for work that mostly waits it tends to be a lot cheaper than simple threading. `tokio` also supports async multi-threading through its `spawn` function. In our case the task is web based, so we have to wait for each response (in other words, it is awaitable), and I feel it would be more reasonable to use async multi-threading. What do you say??
As for the `futures` crate, I couldn't get `join_all` to work properly. That's why I avoided using it 🙂.
from websurfx.
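For comparison with the thread-based suggestion above, here is a minimal sketch of a standard-library alternative, `std::thread::scope`. Scoped threads may borrow from the caller, so `query` needs neither a clone nor a `'static` lifetime. The trade-off is the one raised in the reply: each request blocks an OS thread while it waits, which async tasks avoid. The `fetch` and `fetch_all` functions and their output format are hypothetical and only stand in for real engine requests.

```rust
use std::thread;

// Hypothetical blocking fetcher standing in for one upstream engine request.
fn fetch(engine: &str, query: &str, page: u32) -> Vec<String> {
    vec![format!("{engine}: {query} (page {page})")]
}

/// Runs one scoped thread per engine and returns the results in engine order.
fn fetch_all(engines: &[&str], query: &str, page: u32) -> Vec<Vec<String>> {
    thread::scope(|scope| {
        // Spawn every thread first; collecting forces the lazy iterator to run.
        let handles: Vec<_> = engines
            .iter()
            .map(|engine| scope.spawn(move || fetch(engine, query, page)))
            .collect();

        // Then wait for each one. The scope also guarantees that all threads
        // have finished before `fetch_all` returns, which is what makes the
        // borrows of `engines` and `query` sound.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("engine thread panicked"))
            .collect()
    })
}
```

For example, `fetch_all(&["duckduckgo", "searx"], "rust", 1)` returns one result list per engine, in the same order as the input slice.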