docsearch-configs's Introduction

DocSearch configurations

DEPRECATED

This repository is not maintained anymore in favor of our new infrastructure.

All of the configs can now be edited directly from our web interface, which also offers you a way to start new crawls.

If you have not joined your new application yet, please check your emails! :D

Summary

If you're looking for a way to add DocSearch to your site, the easiest solution is to apply to DocSearch. If you want to look at configurations in order to run your own scraper, you're in the right place.

Options

Please check the dedicated documentation for the list of all available options, along with examples.

docsearch-configs's People

Contributors

1ceb3rg, anneramey, beutlich, bodinsamuel, clemfromspace, damilola-paystack, damithc, dyncmark, elpicador, eyworldwide, gladius-mtl, iamsameeraliyanage, janpetr, joelmarcey, jshah4517, m-turek, maxiloc, nannanli, niden, phrawzty, pixelastic, posva, redox, robertmogos, shipow, shortcuts, solugebefola, tomklotzpro, unrealwork, xuechunl

docsearch-configs's Issues

Display search results content in the correct format

Do you want to request a feature or report a bug?

What is the current behaviour?

The output of {{{_highlightResult.content.value}}} shows only text, as if the results were just a paragraph:

img_197eb6bd7f74-1

What is the expected behaviour?

  • Show the headers:

screen shot 2018-03-23 at 11 23 21

  • Display code in code blocks, like Algolia does for their docs:

img_2a7bf74b1683-1

What have you tried to solve it?

I don't know how to solve this and couldn't find an answer anywhere; I'm probably searching with the wrong query.

@s-pace thoughts?

Why delete babeljs_cn?

Hey, guys! I'm the maintainer of babeljs.cn. I found that the docsearch config for babeljs_cn was deleted, but I don't know why. Please tell me why and how to solve it.
commit: e0cc0b8

Reindexing after domain changed

Hi,

Here is my config file: https://github.com/algolia/docsearch-configs/blob/master/configs/uniwebview.json

I changed my domain from "unidocs.onevcat.com" to "docs.uniwebview.com" several days ago. However, the search results still point to the old domain.

I guess the old index is still in use and the new one is not live yet.

Is there anything like an expiry period for an index? What is the reindexing policy, and is it possible to request an immediate reindex?

Thanks!

Update drone_io documentation

hey all, I was hoping for some help updating the indexing for the drone_io documentation. We have moved the documentation:

{
  "index_name": "drone_io",
  "start_urls": [
-   "http://readme.drone.io/"
+   "http://docs.drone.io/sitemap/"
  ],

We have also adjusted the structure. I was hoping that perhaps Algolia could crawl the documentation using the sitemap only (at http://docs.drone.io/sitemap/). This would allow us to make structural changes to the main documentation, without having to re-configure the crawling, since the sitemap structure would never change.

Do you think this would be possible?

I would offer to submit a pull request but was unsure if the below notation was correct, and I was having trouble setting up an environment to test myself (I will keep trying, though)

-    "lvl0": "header nav a.selected",
-    "lvl1": "main h1",
-    "lvl2": "main h2",
+    "lvl0": "body > ul > li > span",
+    "lvl1": "body > ul > li > ul > li > span",
+    "lvl2": "body > ul > li > ul > li > ul > li > a",

public "metadata" in the docsearch configs?

It would be useful for tracking and reporting purposes to have some fields in the configs that aren't necessarily used by the scraper. Immediately I'd like to add:

  • name: Canonical, human-readable label (such as company or project name) for a given config.
  • human_url: A URL that a human could go to and get the documentation site.

I'm not married to those key names; suggestions welcome. 😄
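
One way to picture the proposal, as a minimal sketch: the config carries the two suggested fields, and anything meant only for tracking is split off before the config reaches the scraper. The key names `name` and `human_url` come from the proposal above; the `METADATA_KEYS` set, the `split_config` helper, and the sample values are hypothetical and not part of any existing DocSearch tooling.

```python
import json

# Hypothetical: keys used for tracking/reporting only, not by the scraper.
METADATA_KEYS = {"name", "human_url"}


def split_config(config):
    """Return (scraper_config, metadata) from a single config dict."""
    metadata = {k: v for k, v in config.items() if k in METADATA_KEYS}
    scraper_config = {k: v for k, v in config.items() if k not in METADATA_KEYS}
    return scraper_config, metadata


# Illustrative config; the values are made up for this example.
raw = json.loads("""
{
  "index_name": "example",
  "start_urls": ["https://docs.example.com/"],
  "name": "Example Project",
  "human_url": "https://docs.example.com/"
}
""")

scraper_config, metadata = split_config(raw)
print(sorted(scraper_config))
print(metadata)
```

Keeping the metadata keys in an explicit set means the scraper side never has to know about them, and new reporting fields can be added without touching crawl behaviour.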

URLs not on sitemap are indexed

Do you want to request a feature or report a bug?

If it is a DocSearch index issue, what is the related index_name ?

index_name= pkgdown

What is the current behaviour?

Files that are not in the sitemap.xml are included in the index.

What is the expected behaviour?

Files that are not in the sitemap.xml should not be included in the index.

Summary

The pkgdown index includes the "Contributor Code of Conduct" page, which is not in the sitemap.xml.

To reproduce, go to the pkgdown website and search for "Contributor"; it's the first result.

The pkgdown config lists the sitemap.xml. Why is this page (and presumably other unwanted pages) included in the index?

From r-lib/pkgdown#626 (comment)
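
A minimal sketch of how one might check this from the outside, assuming the list of indexed URLs is already available (for example, exported from the index): parse the sitemap and report every indexed URL that is not listed in it. The sample sitemap, the URLs, and the helper names are illustrative assumptions, not taken from the pkgdown site or from the scraper.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return the set of <loc> URLs listed in a sitemap document."""
    root = ET.fromstring(xml_text.strip())
    return {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }


def urls_outside_sitemap(indexed_urls, xml_text):
    """Return indexed URLs (ignoring #anchors) that the sitemap does not list."""
    allowed = sitemap_urls(xml_text)
    return sorted(u for u in indexed_urls if u.split("#", 1)[0] not in allowed)


# Made-up data for illustration only.
SAMPLE_SITEMAP = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://docs.example.com/index.html</loc></url>
  <url><loc>https://docs.example.com/reference/index.html</loc></url>
</urlset>
"""

indexed = [
    "https://docs.example.com/index.html",
    "https://docs.example.com/reference/index.html#arguments",
    "https://docs.example.com/CODE_OF_CONDUCT.html",
]

extra = urls_outside_sitemap(indexed, SAMPLE_SITEMAP)
print(extra)
```

Anchors are stripped before comparing because index records usually point at headings inside a page, while a sitemap lists whole pages.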

Parse same website with 2 designs

Hi Algolia team!

I have a question about this configuration
https://github.com/algolia/docsearch-configs/blob/master/configs/akeneo.json

We will introduce a new design (the same as https://api.akeneo.com/) for v2.0 (the old design is still here: https://docs.akeneo.com/2.0/index.html).

So the parsing configuration for the paths https://docs.akeneo.com/1.x/ and https://docs.akeneo.com/2.x/ will not be the same.

My question is: is it possible to have 2 configurations (one for 1.x and one for 2.x)? If not, don't worry, we will bring the new design back to the previous 1.x paths.

Regards

Pierre
Akeneo
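
A single config with one selector set per version is one way this could look. As far as I know, the scraper documentation describes a `selectors_key` on `start_urls` entries for this purpose, but treat the key names, the CSS selectors, and the matching rule below as assumptions to verify against the documentation. The `selectors_for` helper is hypothetical and only illustrates the idea of choosing a selector set by URL prefix; it does not reproduce the scraper's own logic.

```python
# Hypothetical config fragment: one selector set per documentation version.
CONFIG = {
    "start_urls": [
        {"url": "https://docs.akeneo.com/1.x/", "selectors_key": "v1"},
        {"url": "https://docs.akeneo.com/2.x/", "selectors_key": "v2"},
    ],
    "selectors": {
        # Made-up CSS selectors, for illustration only.
        "default": {"lvl0": "h1", "text": "p"},
        "v1": {"lvl0": ".old-layout h1", "text": ".old-layout p"},
        "v2": {"lvl0": ".new-layout h1", "text": ".new-layout p"},
    },
}


def selectors_for(url, config):
    """Pick the selector set whose start URL is the longest prefix of `url`."""
    best_key, best_len = "default", -1
    for entry in config["start_urls"]:
        prefix = entry["url"]
        if url.startswith(prefix) and len(prefix) > best_len:
            best_key = entry.get("selectors_key", "default")
            best_len = len(prefix)
    return config["selectors"][best_key]


print(selectors_for("https://docs.akeneo.com/2.x/install.html", CONFIG))
print(selectors_for("https://docs.akeneo.com/other/page.html", CONFIG))
```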

Add `nb_hits_max` to documentation

It looks like a new parameter, nb_hits_max, has been introduced to the scraper. It would be great if information about this parameter were included in the documentation here 😄

Search not detecting lvl0 selector on some pages

For https://thumbprint.thumbtack.com our lvl0 selector looks like:

"lvl0": {
 "selector": "//*[@data-id='header__links']//a[@data-active='true']",
 "type": "xpath",
  "default_value": "Documentation"
},

https://github.com/algolia/docsearch-configs/blob/master/configs/thumbprint.json

When I search for the page title "Using Thumbprint in Sass" (https://thumbprint.thumbtack.com/guide/creating-pages/), the search result correctly categorizes it under "Guide".

But if I search for the titles of the following pages:

It categorizes them under "Documentation" instead of "Guide".

In this screenshot, "https://thumbprint.thumbtack.com/guide/utility-classes/" is among the results; note that it's categorized under "Documentation".

screen shot 2017-11-17 at 10 16 07 am

I've confirmed the lvl0 xpath works on those pages, so I'm not sure what would have caused it to fail. Maybe your crawler crawled cached pages that didn't have this selector available?
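
The fallback behaviour itself can be sketched as follows: whenever the XPath matches nothing in the markup that was actually crawled, lvl0 ends up as the `default_value`, which would produce exactly the "Documentation" label seen above. This is a minimal illustration using Python's standard library on made-up, well-formed markup. The `resolve_lvl0` helper and both sample pages are hypothetical, and it says nothing about why the crawler might have seen markup without the attribute.

```python
import xml.etree.ElementTree as ET

# The lvl0 rule from the config quoted above.
LVL0 = {
    "selector": "//*[@data-id='header__links']//a[@data-active='true']",
    "type": "xpath",
    "default_value": "Documentation",
}


def resolve_lvl0(page_markup, rule=LVL0):
    """Return the text of the first XPath match, else the rule's default value."""
    root = ET.fromstring(page_markup)
    # ElementTree only accepts relative paths, so anchor the path at the root.
    match = root.find("." + rule["selector"])
    if match is not None and match.text and match.text.strip():
        return match.text.strip()
    return rule["default_value"]


# Made-up pages: one where the active link is marked, one where it is not.
PAGE_WITH_ACTIVE_LINK = """<html><body>
<nav data-id="header__links">
  <a href="/guide/" data-active="true">Guide</a>
  <a href="/components/">Components</a>
</nav></body></html>"""

PAGE_WITHOUT_ACTIVE_LINK = """<html><body>
<nav data-id="header__links">
  <a href="/guide/">Guide</a>
  <a href="/components/">Components</a>
</nav></body></html>"""

print(resolve_lvl0(PAGE_WITH_ACTIVE_LINK))
print(resolve_lvl0(PAGE_WITHOUT_ACTIVE_LINK))
```

Comparing the crawled HTML for a misclassified page against this rule is a quick way to tell whether the selector or the fetched markup is at fault.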

CI that runs on PRs of configs

If it's an existing config, run that one and suggest to update the nbHits in a comment. You could use Danger for this, since it's an easy way to run some things and comment on GitHub.

Another thing that could be checked is whether the file is valid JSON and whether it's a valid docsearch config (by checking that the necessary keys are present and have the right values).
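
The second check could start out as something like the sketch below. The set of required keys and their expected types is an assumption made for illustration (the scraper may require more or fewer), and `validate_config` is a hypothetical helper, not an existing tool.

```python
import json

# Assumed required keys and expected types; adjust to what the scraper needs.
REQUIRED_KEYS = {"index_name": str, "start_urls": list, "selectors": dict}


def validate_config(text):
    """Return a list of problems; an empty list means the config passed."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg} (line {exc.lineno})"]
    if not isinstance(config, dict):
        return ["top-level value must be a JSON object"]

    problems = []
    for key, expected in REQUIRED_KEYS.items():
        if key not in config:
            problems.append(f"missing key: {key}")
        elif not isinstance(config[key], expected):
            problems.append(f"{key} must be of type {expected.__name__}")
    return problems


good = '{"index_name": "example", "start_urls": ["https://example.com/"], "selectors": {}}'
missing = '{"index_name": "example", "start_urls": ["https://example.com/"]}'
broken = '{"index_name": "example",'

print(validate_config(good))
print(validate_config(missing))
print(validate_config(broken))
```

Returning a list of problems rather than raising on the first one lets a CI job post every finding in a single comment on the pull request.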

Indexing ellipsis

Do you want to request a feature or report a bug?

More feature than bug

If it is a DocSearch index issue, what is the related index_name ?

index_name= pkgdown

What is the current behaviour?

Many R functions use an ellipsis (...) to catch arguments in function calls. In a pkgdown documentation website these appear in the Argument list (see third argument on this page).

We capture most arguments and their values with the current docsearch pkgdown config, but ellipses are not indexed because they are treated as punctuation.

What is the expected behaviour?

I'd like ... to be included in search results as an argument value.

What have you tried to solve it?

I have tried including a period (.) in separatorsToIndex, but the ellipsis is still not indexed.
