Comments (5)
Hi. You know about [https://github.com/DigitalPebble/behemoth-elasticsearch]? It is probably in need of an update but should be a good starting point.
from behemoth.
Nope, I didn't even see this, Julien. I've nearly finished this patch as well :(
You want me to send a PR to add the elasticsearch module?
Any reason you want to (or don't want to) have the ES module as part of the main codebase?
I've added a link to [https://github.com/DigitalPebble/behemoth/wiki/Behemoth-Modules].
> You want me to send a PR to add the elasticsearch module?
yep, would be the right place for it and the easiest way to compare your version with the existing one.
> Any reason you want to (or don't want to) have the ES module as part of the main codebase?
I decided not to have an ever-expanding list of modules in Behemoth per se, and to keep things as decoupled and modular as possible. This also serves as an example of how to build a new resource for Behemoth with the Maven pom, etc.
BTW, it would be interesting to hear about your project and how Behemoth fits in. There's a page for use cases, which you're welcome to contribute a short blurb to if you feel like it [https://github.com/DigitalPebble/behemoth/wiki/Users].
> I've added a link to [https://github.com/DigitalPebble/behemoth/wiki/Behemoth-Modules].
Great.
> I decided not to have an ever-expanding list of modules in Behemoth per se, and to keep things as decoupled and modular as possible. This also serves as an example of how to build a new resource for Behemoth with the Maven pom, etc.
I just went through around 4 hours of debugging network issues and upgrading various dependencies in order to get that ElasticSearch component to work with the master Behemoth codebase. I am, however, now able to persist data into the most recent release of ES and want to push this into the codebase, so I will send you a PR.
The issue I see here is that the Behemoth-elastic module is clearly not being maintained as much (and not being released and/or synced with master), and it is therefore difficult to pick it up and hit the ground running.
It is up to you; however, I would argue that, as one of many users of Behemoth, it would be great to see the ES module make it into the core codebase.
I very much take the point about how it can serve as an example module, though.
> BTW, it would be interesting to hear about your project and how Behemoth fits in. There's a page for use cases, which you're welcome to contribute a short blurb to if you feel like it [https://github.com/DigitalPebble/behemoth/wiki/Users].
Yes, I'll send you something right now. Please reply here with any thoughts on the above. Thanks, Julien.
> I am, however, now able to persist data into the most recent release of ES and want to push this into the codebase, so I will send you a PR.
Great.
> The issue I see here is that the Behemoth-elastic module is clearly not being maintained as much (and not being released and/or synced with master)
It was also kept separate because it was less mature than the other components. It should be in sync with core, otherwise it would not compile at all. I take your point about having it released alongside the other modules, though.
> It is up to you; however, I would argue that, as one of many users of Behemoth, it would be great to see the ES module make it into the core codebase.
The many users of Behemoth have been very quiet in the last couple of years ;-)
ES and SOLR are the main tools for search; I also use ES a lot on my various projects so yes, it would make sense to have it in the main repo alongside the other components.
I'll look at your PR before moving the code. BTW, do you leverage [https://github.com/elastic/elasticsearch-hadoop] at all?