Comments (6)
myInput.lzo.index is not an input file itself. An LZO index file exists to help the LZO input format handle the actual LZO file more efficiently: it records where each compressed block starts, so the file can be split at block boundaries.
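To make the role of the index concrete, here is a conceptual model in Java. It is not hadoop-lzo's actual code. It treats the index as a sorted array of compressed-block start offsets and moves each split boundary forward to the next indexed block, so no split begins in the middle of a block. Without an index, the model falls back to a single split covering the whole file, which is the single-split symptom reported in this thread. The class and method names (LzoSplitModel, computeSplits, alignToBlock) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, self-contained model of index-aligned splitting; not hadoop-lzo code.
class LzoSplitModel {

    /** A half-open byte range [start, end) of the compressed file. */
    static final class Split {
        final long start;
        final long end;

        Split(long start, long end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + end + ")";
        }
    }

    /**
     * Cuts the file roughly every splitSize bytes, moving each cut forward to the
     * next indexed block offset. blockOffsets must be sorted ascending.
     * With no index (null or empty), the whole file becomes one split.
     */
    static List<Split> computeSplits(long fileSize, long splitSize, long[] blockOffsets) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<Split> splits = new ArrayList<>();
        if (blockOffsets == null || blockOffsets.length == 0) {
            splits.add(new Split(0, fileSize)); // unindexed: treated as unsplittable
            return splits;
        }
        long start = 0;
        while (start < fileSize) {
            long end = alignToBlock(start + splitSize, blockOffsets, fileSize);
            splits.add(new Split(start, end));
            start = end;
        }
        return splits;
    }

    /** Returns the smallest indexed offset >= pos, or fileSize if there is none. */
    static long alignToBlock(long pos, long[] blockOffsets, long fileSize) {
        if (pos >= fileSize) {
            return fileSize;
        }
        int i = Arrays.binarySearch(blockOffsets, pos);
        if (i < 0) {
            i = -i - 1; // binarySearch returns (-(insertion point) - 1) when not found
        }
        return i < blockOffsets.length ? blockOffsets[i] : fileSize;
    }

    public static void main(String[] args) {
        long[] offsets = {0, 250, 500, 750};
        System.out.println(computeSplits(1000, 300, offsets)); // [[0, 500), [500, 1000)]
        System.out.println(computeSplits(1000, 300, null));    // [[0, 1000)]
    }
}
```

The real input format also has to skip the LZO file header and respect the configured split size, so treat this only as an illustration of why the index is what makes splitting possible.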
from hadoop-lzo.
Sorry, and thanks for the clarification. The documentation is a little confusing when it says "Running MR Jobs over Indexed Files: Now run any job, say wordcount, over the new file".
So how does one pass the index file? I run the above, get the .index file, and then pass the original file through my existing code, but I still get a single split.
You don't pass the index file directly. Its path is implicitly assumed to be (lzo file name).index (see https://github.com/twitter/hadoop-lzo/blob/master/src/main/java/com/hadoop/compression/lzo/LzoIndex.java#L41). The lzo file also has to be large enough to produce multiple splits.
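Because the index is located by naming convention rather than passed in, one way to rule out a missing or misnamed index is to check every .lzo file for a sibling (lzo file name).index. The sketch below does this for a local directory using only java.nio.file. The class and method names (LzoIndexCheck, indexPathFor, unindexedLzoFiles) are hypothetical; for files on HDFS, the same check would go through Hadoop's FileSystem API (for example FileSystem.exists) instead.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: finds .lzo files in a local directory that lack a ".index" sibling.
class LzoIndexCheck {

    /** Naming convention from the thread: "<lzo file name>.index" in the same directory. */
    static Path indexPathFor(Path lzoFile) {
        return lzoFile.resolveSibling(lzoFile.getFileName().toString() + ".index");
    }

    /** Returns the .lzo files in dir that have no matching .index file, sorted by path. */
    static List<Path> unindexedLzoFiles(Path dir) throws IOException {
        List<Path> missing = new ArrayList<>();
        // The glob "*.lzo" matches "a.lzo" but not "a.lzo.index".
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.lzo")) {
            for (Path lzo : stream) {
                if (!Files.exists(indexPathFor(lzo))) {
                    missing.add(lzo);
                }
            }
        }
        Collections.sort(missing);
        return missing;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path p : unindexedLzoFiles(dir)) {
            System.out.println("no index for " + p);
        }
    }
}
```

If every file has its index and you still see one split per file, the next suspects are the input format in use and the file size relative to the split size.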
If I decompress the file and run the job, I get multiple splits; if I keep it compressed, I always get 1 split. This happens across hundreds of files. Does this mean I'm not creating the index file properly?
As long as you use the right input format (e.g. LzoTextInputFormat) and the file is big enough, you should see multiple splits. You might want to experiment with LzoIndex to find out more about your file.
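For a quick look at an index file without the Hadoop classes, a rough stdlib-only reader is sketched below. It assumes the index is a flat sequence of big-endian 64-bit offsets, one per compressed block; that is my understanding of what hadoop-lzo's indexer writes, so check it against LzoIndex.java before relying on it. An index with very few entries for a large file would leave the input format few places to split. The class and method names (LzoIndexDump, readOffsets) are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical index reader; ASSUMES the index is a flat list of big-endian longs.
class LzoIndexDump {

    /** Reads every 8-byte offset; a trailing partial entry, if any, is ignored. */
    static List<Long> readOffsets(Path indexFile) throws IOException {
        List<Long> offsets = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(indexFile)))) {
            while (true) {
                try {
                    offsets.add(in.readLong()); // DataInputStream reads big-endian
                } catch (EOFException eof) {
                    break; // end of the index file
                }
            }
        }
        return offsets;
    }

    public static void main(String[] args) throws IOException {
        List<Long> offsets = readOffsets(Paths.get(args[0]));
        System.out.println(offsets.size() + " indexed blocks");
        if (!offsets.isEmpty()) {
            System.out.println("first offset: " + offsets.get(0)
                    + ", last offset: " + offsets.get(offsets.size() - 1));
        }
    }
}
```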