solrclient's Introduction

SolrClient

SolrClient 0.2.2

SolrClient is a simple Python library for Solr, built in Python 3 with support for the latest features of Solr 5 and 6. Development is focused on indexing and on parsing the various query responses and returning them as native Python data structures. Several helper classes are planned to automate querying and management of Solr clusters.

Enhancements in version 0.2.0:

  • Basic parsing for json.facet output
  • Better support for grouped results (SolrResponse)
  • Other minor enhancements to SolrClient
  • Fixed SolrClient.index method

Planned enhancements in version 0.3.0:

  • Solr node query routing (by @ddorian)
  • Streaming Expressions Support

Requirements

Features

  • Flexible and simple query mechanism
  • Response Object to easily extract data from Solr Response
  • Cursor Mark support
  • Indexing (raw JSON, JSON Files, gzipped JSON)
  • Specify multiple hosts/IPs for SolrCloud for redundancy
  • Basic Managed Schema field management
  • IndexManager for storing indexing documents off-line and batch indexing them

Getting Started

Installation:

pip install SolrClient

Basic usage:

>>> from SolrClient import SolrClient
>>> solr = SolrClient('http://localhost:8983/solr')
>>> res = solr.query('SolrClient_unittest',{
            'q':'product_name:Lorem',
            'facet':True,
            'facet.field':'facet_test',
    })
>>> res.get_results_count()
4
>>> res.get_facets()
{'facet_test': {'ipsum': 0, 'sit': 0, 'dolor': 2, 'amet,': 1, 'Lorem': 1}}
>>> res.get_facet_keys_as_list('facet_test')
['ipsum', 'sit', 'dolor', 'amet,', 'Lorem']
>>> res.docs
[{'product_name_exact': 'orci. Morbi ipsum
..... all the docs ....
 'consectetur Mauris dolor Lorem adipiscing'}]

See, easy.... you just need to know the Solr query syntax.

Roadmap

  • Better test coverage
  • Solr Streaming

Contributing

I realized that there wasn't really a well-maintained Solr Python library I liked, so I put this together. Contributions (code, tests, documentation) are definitely welcome; if you have a question about development, please open an issue on the GitHub page. If you have a pull request, please add tests and make sure all of them pass before submitting. See the tests README for testing resources.

Documentation: http://solrclient.readthedocs.org/en/latest/

solrclient's People

Contributors

ad510, ddorian, epistemery, hartym, nickvasilyev, pliniker, rlskoeser, verdan

solrclient's Issues

start, rows in Solr.query

How can I use 'start' in a query? I've already tried:
res = solr.query('library',{ 'q':'*:*', 'start':9, 'rows':15, })
but res.get_results_count() still returned 15 instead of 6.
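
For reference, Solr's `start` is a zero-based offset into the full result set and `rows` is the maximum page size, so the number of documents on a page depends on the total match count. A minimal sketch of that arithmetic follows; the helper name `expected_page_size` is hypothetical and not part of SolrClient.

```python
def expected_page_size(num_found, start, rows):
    """Number of documents one page should contain.

    num_found: total matches for the query (numFound in the Solr response)
    start:     zero-based offset of the first document to return
    rows:      maximum number of documents per page
    """
    if start < 0 or rows < 0:
        raise ValueError("start and rows must be non-negative")
    remaining = max(0, num_found - start)
    return min(rows, remaining)


# With 15 matching documents, start=9 leaves only 6 to return.
print(expected_page_size(15, 9, 15))  # 6
# With 24 or more matches, the same request fills the whole page.
print(expected_page_size(40, 9, 15))  # 15
```

So a count of 15 with start=9 and rows=15 would be consistent with a collection holding at least 24 matching documents.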

Test performance is very (extremely) slow

I have an i3 laptop with 8 GB RAM and an SSD, and the tests ran very slowly (taking up all the RAM and probably ...). Any idea how we can:

  1. make them faster
    or
  2. split them up, so we don't run the tests for everything?
    or both?

Command to run single test

example:
python run_tests.py -py 3.5 -solr 6.3.0 -test test_client.ClientTestIndexing.test_down_solr_exception

Should this go somewhere in the docs? Maybe under SolrVagrant?

multiple facets

Hi,

How can I query for multiple facets? It looks like SolrClient doesn't know how to do that, but it should be possible.

Regards,
Hans
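
In Solr itself, faceting on several fields is done by repeating the `facet.field` parameter. A minimal sketch of how a list value expands into repeated parameters, using only the standard library; whether SolrClient forwards list values unchanged is an assumption to verify, and the field names are made up.

```python
from urllib.parse import urlencode

params = {
    "q": "*:*",
    "facet": "true",
    "facet.field": ["genres", "title_year"],  # hypothetical field names
}

# doseq=True emits one key=value pair per list element.
query_string = urlencode(params, doseq=True)
print(query_string)
# q=%2A%3A%2A&facet=true&facet.field=genres&facet.field=title_year
```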

Using IndexQ with finalize=True can "overwhelm" todo directory

Been using SolrClient in my latest project and it is by far the cleanest SOLR api for python out there, so kudos for that! Hoping it will keep being developed and supported for the foreseeable future.

One thing I see that could be improved: indexing a set of several million documents using IndexQ with the finalize=True argument leads to the todo/ directory being overwhelmed (it becomes impossible to 'ls' normally or to 'rm -rf' because of the size of the directory listing metadata). I was thinking about creating subfolders under todo/ and putting the data there instead, capping each at a certain number of files (0.5M, for example).

I understand that it is bad to proceed like this when you understand how it all works, but for first time users and non-experts this could actually prevent this unfortunate case.
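
The subfolder idea could be sketched as below. This only illustrates the bucketing arithmetic: `bucket_path`, the `max_per_dir` cap, and the zero-padded directory names are assumptions, not existing IndexQ behaviour.

```python
import os


def bucket_path(todo_dir, file_index, filename, max_per_dir=500_000):
    """Return a path under todo_dir that keeps at most max_per_dir files per subfolder.

    file_index is a running counter of files written so far (0-based).
    """
    if max_per_dir <= 0:
        raise ValueError("max_per_dir must be positive")
    if file_index < 0:
        raise ValueError("file_index must be non-negative")
    bucket = file_index // max_per_dir
    return os.path.join(todo_dir, f"{bucket:05d}", filename)


# The first 500,000 files land in todo/00000, the next batch in todo/00001.
print(bucket_path("todo", 0, "batch_a.json"))
print(bucket_path("todo", 500_000, "batch_b.json"))
```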

Please update the PyPI package to reflect the latest changes

Hi,

My application code depends on the latest commit (a75912f73730d3aaaf64ca5684ab035c2666eedc), where the integer conversion step is removed. However, I see that the PyPI package of SolrClient does not include that fix. Could you please update the PyPI package to the latest version?

Thanks,
Abhay

Need a nice way to specify dynamic 'params' that works on every request/function

Currently all params (the query string) must go in the "params={}" keyword argument.
But this sometimes works and sometimes doesn't. One example where it doesn't is "query()": you currently can't set "route='ay'" there unless you put it in the query itself, which sucks.

So, how should this be done?
My idea would be to accept things like route, min_rf, etc. as kwargs, which can then be merged into "params" right before making a request. Does that make sense?

And if some kwargs keys are needed elsewhere (for example, a router may need "prefer_leader"), that code can pop them from the kwargs.

Does that make sense? Or is there a nicer way?
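
A minimal sketch of the kwargs idea, assuming a small lookup table from keyword names to Solr parameter names. `build_params` and `KWARG_TO_PARAM` are hypothetical names used for illustration, not SolrClient API.

```python
# Hypothetical mapping from keyword-argument names to Solr parameter names.
KWARG_TO_PARAM = {"route": "_route_", "min_rf": "min_rf"}


def build_params(params=None, **kwargs):
    """Merge recognised kwargs into a copy of params.

    Returns (merged_params, leftover_kwargs); leftovers such as
    prefer_leader stay available for other components, e.g. a router.
    """
    merged = dict(params or {})
    leftover = {}
    for key, value in kwargs.items():
        if key in KWARG_TO_PARAM:
            merged[KWARG_TO_PARAM[key]] = value
        else:
            leftover[key] = value
    return merged, leftover


merged, leftover = build_params({"q": "*:*"}, route="ay", prefer_leader=True)
print(merged)    # {'q': '*:*', '_route_': 'ay'}
print(leftover)  # {'prefer_leader': True}
```

Copying `params` rather than mutating it keeps the caller's dictionary unchanged between requests.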

Multiple faceted fields?

Hello, I am using SolrClient in my project.
I want to facet on multiple fields and read them with res.get_facet_keys_as_list; that is, I want to see more fields.
However, this does not seem to match the Solr syntax.
In Solr, it is written like this:
"fl":"genres,movie_title,title_year"
Is there any way to do this?
Thanks

Automatic Paging

Add a method in main SolrClient that easily pages through the data in Solr.
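
One way such a paging helper could look, sketched around Solr's cursorMark protocol: start with cursorMark="*", pass back the nextCursorMark from each response, and stop when Solr returns the same mark it was given. The `fetch_page` callable stands in for the real client call and `make_fake_backend` is a stub for the demo; both are assumptions, and a real query would also need a sort that includes the uniqueKey field.

```python
def iter_docs(fetch_page, params, rows=100):
    """Yield every document for a query by following cursor marks.

    fetch_page is any callable that takes a params dict and returns
    (docs, next_cursor_mark).
    """
    page_params = dict(params, rows=rows, cursorMark="*")
    while True:
        docs, next_mark = fetch_page(page_params)
        yield from docs
        # The end is signalled by getting back the mark that was sent.
        if next_mark == page_params["cursorMark"]:
            break
        page_params["cursorMark"] = next_mark


def make_fake_backend(all_docs):
    """Build an in-memory fetch_page that uses list offsets as cursor marks."""
    def fetch_page(p):
        offset = 0 if p["cursorMark"] == "*" else int(p["cursorMark"])
        docs = all_docs[offset:offset + p["rows"]]
        next_mark = str(offset + len(docs)) if docs else p["cursorMark"]
        return docs, next_mark
    return fetch_page


backend = make_fake_backend([{"id": i} for i in range(5)])
print([d["id"] for d in iter_docs(backend, {"q": "*:*"}, rows=2)])  # [0, 1, 2, 3, 4]
```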

Support Timeout on requests to solr

[Screenshot: the line in _send where a timeout could be added]

Hello community, and first of all thank you for creating this wonderful SolrClient library. Lately, when I run queries or high-volume commits, the client sometimes hangs indefinitely, and I think I should at least be able to handle this on the client side with a wait time. I see that the connection to Solr is made with requests, so the timeout parameter could be added right there. That is, you could send something like: data_solr = conectionSolr.query(collection, query, timeout=80) and inside the _send function it would be something like: res = self.session.request(method, url, params=params, data=data, headers=headers, verify=False, timeout=param_timeOut). How feasible would it be to integrate this? I still feel too much of a novice to make a change like this myself. Thank you very much for everything; I look forward to your contributions.

P.S.: In the screenshot I have marked the line where I think the timeout option could be added.
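
The suggestion could be sketched as follows. `requests.Session.request` does accept a `timeout` keyword (in seconds), but the `send` wrapper, the default of 80, and the `RecordingSession` stand-in are assumptions for illustration; the stand-in lets the example run without a Solr server or the requests package.

```python
DEFAULT_TIMEOUT = 80  # seconds; the value used in the post above


def send(session, method, url, timeout=DEFAULT_TIMEOUT, **request_kwargs):
    """Issue a request through session, always passing an explicit timeout."""
    return session.request(method, url, timeout=timeout, **request_kwargs)


class RecordingSession:
    """Stand-in for requests.Session that records the arguments it receives."""

    def __init__(self):
        self.calls = []

    def request(self, method, url, **kwargs):
        self.calls.append((method, url, kwargs))
        return "ok"


session = RecordingSession()
send(session, "GET", "http://localhost:8983/solr/collection/select", params={"q": "*:*"})
print(session.calls[0][2]["timeout"])  # 80
```

With a real `requests.Session`, a request that exceeds the timeout raises an exception that the caller can catch and retry, instead of blocking indefinitely.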

Change on index() ?

I need the .index() command to return the full JSON instead of True/False.

Should I create a new index_raw(?) or just edit .index() (if it's not in any release yet)? I ask because this changes the return type.

The reason is that I need to know the "rf" value when I set "min_rf".

get_field_values_as_list for empty fields

Hello,
I have recently worked with your package and noticed somewhat strange behaviour when it encounters fields that are present in only some of the documents.
In my example, I had, let's say, 2500 documents with a unique "id" field, but only 2000 or so had another field called "links".
Iterating over the "id" list and the "links" list together is not possible because they have different lengths: get_field_values_as_list("links") returns a list of only 2000 items, and I cannot tell which documents had no value.
To work around this, I simply altered the function to:
return [doc[field] if field in doc else [] for doc in self.docs]
which solved the problem in my case. It might not always be needed, but perhaps it could be included as get_field_values_as_full_list, or something similar.

Best regards,
Dennis
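
As a standalone illustration of the behaviour described above, here is a small sketch that keeps one entry per document. The function name `field_values_full` and the `default` argument are hypothetical; the post itself used an empty list as the filler.

```python
def field_values_full(docs, field, default=None):
    """Return one value per document, using default where the field is missing."""
    return [doc.get(field, default) for doc in docs]


docs = [
    {"id": 1, "links": ["a", "b"]},
    {"id": 2},                      # no "links" field
    {"id": 3, "links": ["c"]},
]

ids = field_values_full(docs, "id")
links = field_values_full(docs, "links")

# Both lists have the same length, so they can be iterated together.
for doc_id, doc_links in zip(ids, links):
    print(doc_id, doc_links)
```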

_route_ to correct shard ?

For example, I was thinking of also adding:

  1. method getting cluster state
  2. keep track of hash-->shard-->host linking
  3. when route is specified, route directly to correct node(with failover to replicas)
    3.1. route directly to leader for create,update,delete
  4. prefer_leader=True will always contact leader of shard first and then replicas.
  5. prefer_leader=False will contact leader last
  6. prefer_leader=None will random.choice(replicas)
  7. when multiple routes are given ("tenant1,tenant2,tenant3"), pick a random replica to contact

Does that make sense?

The whole point is to contact the minimum number of nodes necessary, reducing network hops and traffic and lowering CPU usage.
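
The prefer_leader rules in points 4 to 6 could be sketched as an ordering function over the nodes of one shard. `order_nodes` is a hypothetical helper, and shuffling all nodes for prefer_leader=None is a generalisation of the random.choice idea so that the remaining nodes are still available for failover.

```python
import random


def order_nodes(leader, replicas, prefer_leader=None, rng=random):
    """Return the hosts of one shard in the order they should be tried."""
    others = [r for r in replicas if r != leader]
    if prefer_leader is True:
        return [leader] + others          # leader first, then replicas
    if prefer_leader is False:
        return others + [leader]          # leader only as a last resort
    # prefer_leader=None: no preference, try all nodes in random order.
    nodes = [leader] + others
    rng.shuffle(nodes)
    return nodes


replicas = ["host2:8983", "host3:8983"]
print(order_nodes("host1:8983", replicas, prefer_leader=True))
print(order_nodes("host1:8983", replicas, prefer_leader=False))
print(order_nodes("host1:8983", replicas))  # random order
```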

"Invalid syntax" on Python 3.7.0

SolrClient 0.2.1 fails to run on Python 3.7.0:

$ python test.py 
Traceback (most recent call last):
  File "test.py", line 1, in <module>
    from SolrClient import SolrClient
  File "/home/aorth/.local/share/virtualenvs/solr-pipenv-nV7koRrN/lib/python3.7/site-packages/SolrClient/__init__.py", line 1, in <module>
    from .solrclient import SolrClient
  File "/home/aorth/.local/share/virtualenvs/solr-pipenv-nV7koRrN/lib/python3.7/site-packages/SolrClient/solrclient.py", line 10, in <module>
    from .zk import ZK
  File "/home/aorth/.local/share/virtualenvs/solr-pipenv-nV7koRrN/lib/python3.7/site-packages/SolrClient/zk.py", line 9, in <module>
    from kazoo.client import KazooClient
  File "/home/aorth/.local/share/virtualenvs/solr-pipenv-nV7koRrN/lib/python3.7/site-packages/kazoo/client.py", line 62, in <module>
    from kazoo.recipe.partitioner import SetPartitioner
  File "/home/aorth/.local/share/virtualenvs/solr-pipenv-nV7koRrN/lib/python3.7/site-packages/kazoo/recipe/partitioner.py", line 193
    self._child_watching(self._allocate_transition, async=True)
                                                        ^
SyntaxError: invalid syntax

Virtual environment was created with pipenv.
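
The traceback points at kazoo rather than at SolrClient itself: `async` became a reserved keyword in Python 3.7, so it can no longer be used as an argument name. A small check that reproduces the failure without kazoo installed; the snippet strings only mirror the shape of the offending call, and upgrading to a kazoo release that no longer uses `async` as an argument name is the likely remedy.

```python
import keyword
import sys

OFFENDING = "f(async=True)"      # same shape as the call in kazoo's partitioner.py
RENAMED = "f(is_async=True)"     # hypothetical replacement argument name


def compiles(source):
    """Return True if source parses as an expression on this interpreter."""
    try:
        compile(source, "<snippet>", "eval")
        return True
    except SyntaxError:
        return False


print("Python", sys.version_info[:2])
print("async is a keyword:", keyword.iskeyword("async"))
print("offending call compiles:", compiles(OFFENDING))
print("renamed call compiles:", compiles(RENAMED))
```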

facets and facet ranges don't preserve order

In the SolrResponse class, get_facets and get_facets_ranges return ordinary dict objects, so the ordering of the results returned by Solr is lost. They should just need to be returned as an OrderedDict instead.

This looks easy enough to fix, I'll see if I can create a pull request for it.
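
A minimal sketch of the idea, assuming the facet counts arrive in Solr's default JSON layout, a flat list alternating term and count. The helper name `facet_list_to_ordered` is hypothetical and this is not the actual SolrResponse code.

```python
from collections import OrderedDict


def facet_list_to_ordered(flat):
    """Convert [term, count, term, count, ...] into an OrderedDict, keeping Solr's order."""
    if len(flat) % 2:
        raise ValueError("expected an even-length list of term/count pairs")
    return OrderedDict(zip(flat[0::2], flat[1::2]))


# Terms already sorted by count, as Solr returns them with facet.sort=count.
flat = ["Bacillus francki", 472, "Arthrobacter siderocapsulatus", 433, "Pancreas Cancer", 122]
facets = facet_list_to_ordered(flat)
print(list(facets.items()))
# [('Bacillus francki', 472), ('Arthrobacter siderocapsulatus', 433), ('Pancreas Cancer', 122)]
```

Plain dicts also keep insertion order from Python 3.7 on, but an OrderedDict makes the guarantee explicit on older interpreters.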

Facet query with "facet.sort": "count" not returning sorted facets

Hello,
I'm trying to submit this query to a Solr core with SolrClient, but the result object doesn't match the query: the keys are not sorted by facet count.
Here is the query object passed to SolrClient:

onto_suggest = solr.query('merged', {
    "q": "*:*",
    "rows": 0,
    "facet": True,
    "facet.field": "onto_suggest",
    "facet.sort": "count",
    "facet.limit": -1,
    "facet.mincount": 2
})

I've also tried using "facet.count" to sort, with the same result.
Here is an excerpt of the result:

{
  "onto_suggest": {
    "Arthrobacter siderocapsulatus": 433,
    "Enterobacter aglomerans": 39,
    "Genetic endocrine tumor": 32,
    "ETOP": 15,
    "lymphoid lineage restricted progenitor cell": 47,
    "Escherichia coli O157:H7 EDL933": 127,
    "deferoxaminum": 2,
    "Bacillus francki": 472,
    "srf": 9,
    "Malignant Peripheral Nerve Sheath Tumors": 9,
    "Pancreas Cancer": 122,
