Comments (2)
Are your components a driver and mapper using the TikaProcessor? We could definitely add them to the tika package in the same way as we have a driver and mapper for the other components.
Please clone and send a pull request. Thanks!
from behemoth.
Yes, they do use TikaProcessor, and they should make it easy for people to extend it too, since not everyone will want just text.
I've forked the repository; I'll push my changes to the fork soon and then submit a pull request.