Comments (8)
I added logs before and after random_user_agent() and found that its processing time even exceeded 10 seconds. Is this expected?
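For reference, a minimal self-contained way to capture this kind of timing (this is a sketch, not websurfx's actual logging; the `random_user_agent()` body here is a stand-in that just sleeps) is to wrap the call with `std::time::Instant`:

```rust
use std::time::{Duration, Instant};

// Stand-in for the real random_user_agent(); simulated with a short sleep
// so this example compiles and runs on its own.
fn random_user_agent() -> String {
    std::thread::sleep(Duration::from_millis(50));
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36".to_string()
}

fn main() {
    let start = Instant::now();
    let ua = random_user_agent();
    // Elapsed time includes everything the function does internally,
    // e.g. any network fetch the crate performs.
    println!("random_user_agent() took {:?}, returned: {}", start.elapsed(), ua);
}
```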
This was unexpected 🙂 After testing it on my side, it turns out it takes 7 seconds for me, so it seems to depend on the performance of the system. But we can try enabling one option: let's change the caching option for generating the user agent from `false` to `true` and see if it improves the speed 🙂. The file that needs to be changed is located under `src/search_results_handler/user_agent.rs`.
from websurfx.
After studying and digging deep into the crate itself, I found that it actually fetches data from the upstream website and scrapes it to get the required user agents, which is why it causes a delay. It also looks like the project has been abandoned: the last commit seems to be 5 years old, which is a very long time for an open source repository.
Also, enabling the cache option did improve speed slightly, by 2-3 seconds. Still, I think having a delay of around 5 seconds is good, as it allows some random delay between requests, which helps evade IP blocking. I could reduce the random time delay that I added in the code from 1-10 seconds to 1-5 seconds to improve speed. What do you say @xffxff??

Also, maybe in the future we might need to either explore an alternative for this crate or implement our own 😄.
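A rough std-only sketch of the narrowed 1-5 second window (the real code presumably uses the `rand` crate; the clock-based jitter here is only to keep the example self-contained):

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Pick a pseudo-random delay in the 1..=5 second range (narrowed from 1..=10).
/// NOTE: deriving "randomness" from the clock is purely for illustration.
fn random_delay_secs() -> u64 {
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .unwrap()
        .subsec_nanos() as u64;
    1 + nanos % 5 // maps to 1, 2, 3, 4 or 5
}

fn main() {
    let delay = random_delay_secs();
    println!("sleeping for {delay} s before the next upstream request");
    // std::thread::sleep(std::time::Duration::from_secs(delay));
}
```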
> Also, enabling the cache option did improve speed slightly, by 2-3 seconds. Still, I think having a delay of around 5 seconds is good, as it allows some random delay between requests, which helps evade IP blocking. I could reduce the random time delay that I added in the code from 1-10 seconds to 1-5 seconds to improve speed.
@neon-mmd Hmm, do we have to insert a delay between different requests? This may conflict with our lightning-fast goal. Additionally, when there are many concurrent search requests, even with a delay there will still be a lot of requests hitting the engine at the same time.
`websurfx/src/search_results_handler/user_agent.rs` (lines 10 to 26 in 84dc6a9)

I believe that it is unnecessary to construct a new `UserAgent` object for every request, as all requests can utilize the same instance of `UserAgent`. Instead, you can call `UserAgent.random()` to obtain a random user agent string for each request. Here is an example of how this can be implemented:
```rust
// Construct the UserAgent object once when the server starts
let user_agents = UserAgentsBuilder::new()
    .cache(false)
    .dir("/tmp")
    .thread(1)
    .set_browsers(
        Browsers::new()
            .set_chrome()
            .set_safari()
            .set_edge()
            .set_firefox()
            .set_mozilla(),
    )
    .build();

// ...

// Retrieve a random user agent string in aggregator.rs
let user_agent = user_agents.random().to_string();
```
I think @xffxff is right. Here is my implementation using the `lazy_static` crate:
```rust
use fake_useragent::{Browsers, UserAgents, UserAgentsBuilder};
use lazy_static::lazy_static;

lazy_static! {
    static ref USER_AGENTS: UserAgents = {
        UserAgentsBuilder::new()
            .cache(false)
            .dir("/tmp")
            .thread(1)
            .set_browsers(
                Browsers::new()
                    .set_chrome()
                    .set_safari()
                    .set_edge()
                    .set_firefox()
                    .set_mozilla(),
            )
            .build()
    };
}

/// A function to generate a random user agent to improve privacy of the user.
///
/// # Returns
///
/// A randomly generated user agent string.
pub fn random_user_agent() -> String {
    USER_AGENTS.random().to_string()
}
```
> @neon-mmd Hmm, do we have to insert a delay between different requests? This may conflict with our lightning-fast goal. Additionally, when there are many concurrent search requests, even with a delay there will still be a lot of requests hitting the engine at the same time.
No, actually we need it, because if we do not add a random delay between requests, especially for large-scale server use cases, these servers will have thousands of users and will create a lot of traffic. This, in turn, may cause the upstream search engines to get DDoSed, which is not good, and they might ban the IP that caused the DDoS. But I can see one option: having a config option like `production_use` which, when enabled, puts a random delay after, let's say, every 4 concurrent requests, and when disabled either reduces the random delays or removes them completely. This would be very helpful for small-scale use, like if you are hosting on your home server just for your own use. What do you say @xffxff @alamin655 ??
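A rough sketch of that idea (the `production_use` flag and the every-4-requests rule are hypothetical here, taken only from the comment above; a real implementation would randomize the delay instead of using a fixed one):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::time::Duration;

// Global request counter shared across all handlers.
static REQUEST_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Returns the delay to apply before this request, if any.
/// With `production_use` enabled, every 4th request pays a delay;
/// with it disabled, no delay is applied at all.
fn throttle_delay(production_use: bool) -> Option<Duration> {
    let n = REQUEST_COUNT.fetch_add(1, Ordering::Relaxed) + 1;
    if production_use && n % 4 == 0 {
        Some(Duration::from_secs(2)) // placeholder for a random 1-5 s delay
    } else {
        None
    }
}

fn main() {
    for i in 1..=8 {
        match throttle_delay(true) {
            Some(d) => println!("request {i}: sleeping {d:?}"),
            None => println!("request {i}: no delay"),
        }
    }
}
```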
> I think @xffxff is right. Here is my implementation using the `lazy_static` crate: […]
This looks good 👍 but after doing some research to see whether there are any better and faster implementations, I found that `lazy_static` seems to be a bit slow, and there is an even better and faster crate for the same purpose called `once_cell`. It has also been merged into `std::lazy` and is available as an experimental feature in the nightly version right now, so I think `once_cell` should be the way to go forward. What do you say @alamin655 ??
Here are some links to follow:
- https://dev.to/rimutaka/rusts-lazystatic-usage-benchmarks-and-code-deep-dive-1bic
- https://docs.rs/once_cell/latest/once_cell/
- https://docs.w3cub.com/rust/std/lazy/struct.lazy
- rust-lang/rust#74465
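For what it's worth, `once_cell`'s core API did eventually land in the standard library as `std::sync::OnceLock` (stabilized in Rust 1.70), so a dependency-free version of the same lazy-initialization pattern looks roughly like this. Note the `UserAgents` type is replaced by a plain `Vec<String>` purely so the sketch compiles on its own; in websurfx the initializer would be the `UserAgentsBuilder` chain:

```rust
use std::sync::OnceLock;
use std::time::{SystemTime, UNIX_EPOCH};

// Stand-in for the crate's UserAgents type, so the example is self-contained.
static USER_AGENTS: OnceLock<Vec<String>> = OnceLock::new();

/// Lazily build the shared list exactly once, on first access.
fn user_agents() -> &'static Vec<String> {
    USER_AGENTS.get_or_init(|| {
        vec![
            "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0".to_string(),
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36".to_string(),
        ]
    })
}

/// Pick an entry; a real implementation would use a proper random index.
pub fn random_user_agent() -> String {
    let agents = user_agents();
    let idx = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .unwrap()
        .subsec_nanos() as usize
        % agents.len();
    agents[idx].clone()
}

fn main() {
    println!("{}", random_user_agent());
}
```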
> No, actually we need it, because if we do not add a random delay between requests, especially for large-scale server use cases, these servers will have thousands of users and will create a lot of traffic. This, in turn, may cause the upstream search engines to get DDoSed, which is not good, and they might ban the IP that caused the DDoS. But I can see one option: having a config option like `production_use` which, when enabled, puts a random delay after, let's say, every 4 concurrent requests, and when disabled either reduces the random delays or removes them completely. This would be very helpful for small-scale use, like if you are hosting on your home server just for your own use.
@neon-mmd Thank you for the explanation! I think you are right, and having a config option like `production_use` is helpful.