hive-json-serde's Issues

Can't partition when using serdeproperties list

I'm experiencing some incompatibility between the JSON SerDe and partitioning; here's
an example query:

CREATE TABLE clicks (
condition_set STRING,
creative STRING,
date_created STRING,
from_app STRING,
from_campaign STRING,
meta_country STRING,
meta_model STRING,
meta_os STRING,
to_app STRING,
to_campaign STRING,
uuid STRING,
`time` STRING,
`hour` STRING
)
PARTITIONED BY (`date` STRING)
ROW FORMAT 
SERDE 'com.amazon.elasticmapreduce.JsonSerde'
WITH SERDEPROPERTIES ('paths'='
condition_set, 
creative, 
date_created, 
from_app, 
from_campaign, 
meta_country, 
meta_model, 
meta_os, 
to_app, 
to_campaign, 
uuid, 
time, 
date,
hour')
LOCATION '/mnt/hdfsmall/'
;


Error is : Error in metadata: org.apache.hadoop.hive.ql.metadata.HiveException: 
MetaException(message:org.apache.hadoop.hive.serde2.SerDeException Expected a 
one-one correspondance between paths 14 and columns 13)

If I add the column to the table's column definitions, partitioning gives me
an error too.

I think I have tried every possibility in Hive to work around the problem. I
have no clue how to solve this, so I think it's just not possible.
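The count mismatch is mechanical: Hive hands the SerDe only the 13 data columns, while the `paths` property lists 14 entries because it also names the partition column `date`. A minimal sketch of the correspondence check the error message suggests the SerDe performs (helper name is hypothetical, not the actual Java code):

```python
def check_paths(paths_property: str, columns: list) -> None:
    # Split the comma-separated 'paths' SERDEPROPERTIES value.
    paths = [p.strip() for p in paths_property.split(",") if p.strip()]
    if len(paths) != len(columns):
        raise ValueError(
            "Expected a one-one correspondence between paths "
            "%d and columns %d" % (len(paths), len(columns))
        )

# The 13 data columns; partition columns such as `date` are NOT in this list.
cols = ["condition_set", "creative", "date_created", "from_app",
        "from_campaign", "meta_country", "meta_model", "meta_os",
        "to_app", "to_campaign", "uuid", "time", "hour"]

# Dropping 'date' from the paths list makes the counts match.
check_paths("condition_set, creative, date_created, from_app, from_campaign, "
            "meta_country, meta_model, meta_os, to_app, to_campaign, "
            "uuid, time, hour", cols)
```

Under this reading, removing `date` from `paths` (it is supplied by the partition, not by the JSON payload) should avoid the MetaException, though the reporter notes partitioning still failed for them afterwards.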

Original issue reported on code.google.com by [email protected] on 1 Dec 2011 at 7:01

SerDe does not work for any MapReduce job

What steps will reproduce the problem?
1. create any table using the SerDe (for me it was the one from the "Getting
Started" section)
2. execute in hive: select count(1) from table;

What is the expected output? What do you see instead?
I expect to see the number of records. Instead I get this error:

hive> select count(1) from jsontest;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201205071112_0089, Tracking URL = [...]
Kill Command = /usr/lib/hadoop/bin/hadoop job  -Dmapred.job.tracker=[...] -kill 
job_201205071112_0089
2012-06-06 09:28:49,506 Stage-1 map = 0%,  reduce = 0%
2012-06-06 09:29:26,655 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201205071112_0089 with errors
FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.MapRedTask


What version of the product are you using? On what operating system?
Hadoop: Hadoop 0.20.2-cdh3u3
Hive: 0.7.0-cdh3u0
OS: Ubuntu Server

Please provide any additional information below.
Full stack trace from jobtracker can be found in attached file.
"select * from jsontest" works fine.

Original issue reported on code.google.com by [email protected] on 6 Jun 2012 at 7:39

Attachments:

Version hive-json-serde.jar

What steps will reproduce the problem?
1. Checkout the project
2. Build with "ant build"
3. Observe the created artifact is build/hive-json-serde.jar

What is the expected output? What do you see instead?
The expected output would have a version number, so it can be referenced with 
ivy.

What version of the product are you using? On what operating system?
Unknown version :)

Please provide any additional information below.
One of my coworkers made a few changes to this project at:

    https://github.com/johanoskarsson/hive-json-serde

I pinged him about adding versioning, and got pointed at this project since 
we'd like to get all changes merged back here. We have this working
experimentally but would like to use it in production, and we need to push the
jar into ivy so we can build, which is when versioning came up. I see there are
some versioned jars published in the downloads section here, but we'd like to
build from source. Any objection to building versioned jars?

Original issue reported on code.google.com by [email protected] on 6 Jan 2011 at 7:05

Add build to svn ignore

This is mostly for my convenience. Would it be possible to add the directory
"build" to svn:ignore? It would make it easier to work with the project and svn.

Original issue reported on code.google.com by johan%[email protected] on 10 Jan 2011 at 11:45

Convert numbers to strings if need be

We have a data stream with columns which are usually integers, but sometimes 
strings. To handle this, we use a string column in Hive and convert integers to 
their string representation.

Right now the SerDe requires all objects in a column to be the same type. If 
you feed an integer into a column that is expecting a string you get:

Failed with exception java.io.IOException:java.lang.ClassCastException: 
java.lang.Integer cannot be cast to java.lang.String

Patch attached. If the column is a string, and the JSON data is a Number, 
convert it automatically instead of failing.
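The coercion rule the patch describes can be sketched like this (in Python rather than the SerDe's Java, purely as an illustration; the helper name is made up):

```python
import json

def coerce_to_column_type(value, hive_type):
    """Coerce a parsed JSON value to the declared Hive column type.

    If the column is a string and the JSON value is a number,
    convert it instead of failing with a ClassCastException.
    """
    if hive_type == "string" and isinstance(value, (int, float)):
        return str(value)
    return value

row = json.loads('{"id": 42, "name": "widget"}')
coerce_to_column_type(row["id"], "string")     # -> "42"
coerce_to_column_type(row["name"], "string")   # -> "widget"
```

The design choice is deliberately one-directional: numbers widen to strings on demand, but strings are never silently parsed into numbers, so genuinely malformed data still surfaces as an error.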

Original issue reported on code.google.com by [email protected] on 14 Jan 2011 at 11:31

Attachments:

Upper case column names

What steps will reproduce the problem?
1.  Create a table over a JSON file that has an upper-case field. For
example, with data {"TEST":1,"case":2}, run: create external table test1 (TEST int, case int).
2.  Running select * from test1 gives you NULL for TEST and 2 for case


What is the expected output? What do you see instead?

I would expect it to return 1 for TEST rather than NULL
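A likely cause (an assumption, not confirmed in the report): Hive lower-cases column names in its metadata, so the SerDe looks up the JSON key `test` and misses `TEST`. A case-insensitive fallback lookup would sketch the fix:

```python
import json

def get_json_value(obj, column):
    """Look up a JSON key by Hive column name.

    Falls back to a case-insensitive match, since Hive stores
    column names lower-cased while JSON keys are case-sensitive.
    """
    if column in obj:
        return obj[column]
    lowered = {k.lower(): v for k, v in obj.items()}
    return lowered.get(column.lower())

row = json.loads('{"TEST": 1, "case": 2}')
get_json_value(row, "test")   # -> 1 instead of NULL
```

One caveat of this approach: if the data contains both "TEST" and "test" as distinct keys, the case-insensitive fallback is ambiguous, so an exact match is always tried first.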




Original issue reported on code.google.com by [email protected] on 8 Oct 2010 at 10:39

Upgrade to work with Hive 0.10.0 [PATCH]

What steps will reproduce the problem?
1. Use this serde with Hive 0.10.0

What is the expected output? What do you see instead?
I'd like it to work, but naturally it doesn't since Hive APIs have changed 
slightly for 0.10.0.


What version of the product are you using? On what operating system?
Hive 0.10.0

I've attached a simple patch that allows me to use this serde in 0.10.0.

Sorry, I'm a git person so the patch came from git. But the diff is simple 
enough to be applied manually if necessary. The API differences are largely 
cosmetic and the new method added is a no-op since implementation of that 
method is optional (and ignored even for some of the native Hive serdes 
themselves).

Note, this patch omits the upgrading of the lib/*.jar files. I upgraded the 
following jar files (note I'm a Cloudera user, so I'm using CDH 4.2.0 jars):

* lib/hadoop-0.20.1-core.jar -> lib/hadoop-common-2.0.0-cdh4.2.0.jar
* lib/hive_serde.jar -> lib/hive-serde-0.10.0-cdh4.2.0.jar

Original issue reported on code.google.com by [email protected] on 3 May 2013 at 9:05

Attachments:

Implement serialization

For now, this SerDe only supports reading data (deserialization). In order
to write data in JSON format, the serialization process needs to be
implemented.

Original issue reported on code.google.com by [email protected] on 16 Feb 2010 at 9:57

Allow JSON 'null' value in input. Convert to Hive NULL.

What steps will reproduce the problem?

Put a null value in JSON input data 


What is the expected output? What do you see instead?

Expect to see Hive NULL in output.
Instead you get:
Failed with exception java.io.IOException:java.lang.ClassCastException: 
org.json.JSONObject$Null cannot be cast to java.lang.String

Please provide any additional information below.

Patch attached.
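The intent of the patch can be sketched as follows (a Python stand-in for the Java check against org.json's `JSONObject.NULL` sentinel; all names here are illustrative):

```python
class JsonNull:
    """Stand-in for org.json's JSONObject.NULL singleton sentinel."""
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super(JsonNull, cls).__new__(cls)
        return cls._instance

JSON_NULL = JsonNull()

def to_hive_value(value):
    # Convert the JSON library's null sentinel to Hive NULL (None here)
    # instead of letting it reach a String cast and throw
    # ClassCastException.
    if value is JSON_NULL:
        return None
    return value

to_hive_value(JSON_NULL)   # -> None (Hive NULL)
to_hive_value("abc")       # -> "abc"
```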

Original issue reported on code.google.com by [email protected] on 14 Jan 2011 at 11:16

Attachments:

Make SerDe testable

There is placeholder code for JUnit tests, but none of the tests are
implemented.

Get this thing tested!

Original issue reported on code.google.com by [email protected] on 17 Feb 2010 at 12:19

insert overwrite table fails with json serde

Steps
1. Created an external table that points to the gzip log files.
2. A select query with limit 10 or 100 returned results.
3. Created a secondary external table pointing to a different location.
4. Used an INSERT OVERWRITE clause to pull records from a certain day/month
into a partition.
5. The select statement succeeds but file creation fails.

What is the expected output? What do you see instead?
Expected: a flat table in text file format. Instead the job fails with the following error.
=====================================


ERROR="java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
{"atype":"type1","operation":"orize","status":"Allow","tme":156.25900000000001,"starttime":"/Date(1314981024895)/","remoteip":"x.y.z.t, x1.y1.z1.t1","requesturi":"uri","userid":"x","eidmid":"y","userlanguage":"en","usercountry":"US","mode":"normal_mode","servicekey":"key1","consumerkey":"ke2","line":null,"number":null,"dt":"2011.09.02"}
    at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:161)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:363)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:312)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
{"atype":"type1","operation":"orize","status":"Allow","tme":156.25900000000001,"starttime":"/Date(1314981024895)/","remoteip":"x.y.z.t, x1.y1.z1.t1","requesturi":"uri","userid":"x","eidmid":"y","userlanguage":"en","usercountry":"US","mode":"normal_mode","servicekey":"key1","consumerkey":"ke2","line":null,"number":null,"dt":"2011.09.02"}
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:483)
    at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
    ... 4 more
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat$1.write(HiveIgnoreKeyTextOutputFormat.java:97)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:606)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:470)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:743)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:470)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:743)
    at org.apache.hadoop.hive.ql.exec.FilterOperator.processOp(FilterOperator.java:87)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:470)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:743)
    at org.apache.hadoop.hive.ql.exec.FilterOperator.processOp(FilterOperator.java:87)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:470)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:743)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:77)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:470)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:743)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:466)
    ... 5 more "

=====================================

What version of the product are you using? On what operating system?
Mac OSX, Amazon EMR/Hive, --hadoop-version 0.20   --hive-interactive  
--hive-versions 0.7, hive-json-serde-0.2.jar

Please provide any additional information below.
I am also having additional issues with null values in columns, but I will
probably open a new issue for that.


Original issue reported on code.google.com by [email protected] on 4 Oct 2011 at 1:23

Add a SerDe property to use a different name for the Hive column name and the JSON key name.


This helps in case you have a data stream with columns named the same as Hive
reserved words (e.g. 'timestamp' and 'bucket').

Patch attached. Use like so:

CREATE TABLE foo (
 ts double,
 bckt string,
 event string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde'
WITH SERDEPROPERTIES ('rename_columns'='timestamp>ts,bucket>bckt');
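The property value can be parsed into a JSON-key-to-Hive-column map; a sketch of that parsing step (illustrative Python only, since the actual patch is Java):

```python
def parse_rename_columns(prop):
    """Parse a 'rename_columns' value of the form
    'json_key>hive_col,json_key2>hive_col2' into a dict mapping
    JSON key names to Hive column names."""
    mapping = {}
    for pair in prop.split(","):
        json_key, _, hive_col = pair.strip().partition(">")
        mapping[json_key] = hive_col
    return mapping

parse_rename_columns("timestamp>ts,bucket>bckt")
# -> {"timestamp": "ts", "bucket": "bckt"}
```

At deserialization time the SerDe would consult this map first, so the JSON key "timestamp" fills the Hive column `ts` without the DDL ever mentioning the reserved word.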

Original issue reported on code.google.com by [email protected] on 14 Jan 2011 at 11:48

Attachments:

ClassCastException: java.lang.Integer cannot be cast to java.lang.Long

One of my fields is of type bigint in Hive. When running a query over that
table I get the following exception.

I assume that the JSON library reads the field as an Integer, and when Hive
expects a Long things blow up. I tried changing the field to int in Hive, but
then I get the reverse class cast exception. I also tried making it a string in
Hive, but that fails with a class cast exception too.


java.io.IOException: java.lang.ClassCastException: java.lang.Integer cannot be 
cast to java.lang.Long
    at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:137)
    at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:684)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:146)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:302)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.lang.ClassCastException: java.lang.Integer cannot be cast to 
java.lang.Long
    at org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaLongObjectInspector.get(JavaLongObjectInspector.java:39)
    at org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitiveUTF8(LazyUtils.java:190)
    at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:480)
    at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:426)
    at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:129)
    ... 9 more
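A common fix for this class of failure is to coerce through the Number interface (e.g. Java's `Number.longValue()`) instead of casting. The coercion rule can be sketched in Python (hypothetical helper; the type names mirror Hive's, and the Java equivalents are noted in comments):

```python
def coerce_numeric(value, hive_type):
    """Coerce a parsed JSON number to the declared Hive type,
    mirroring Number.longValue()/intValue() rather than a cast."""
    if hive_type == "bigint":
        return int(value)     # Java: ((Number) value).longValue()
    if hive_type == "int":
        return int(value)     # Java: ((Number) value).intValue()
    if hive_type == "double":
        return float(value)   # Java: ((Number) value).doubleValue()
    if hive_type == "string":
        return str(value)
    return value

coerce_numeric(42, "bigint")   # -> 42, regardless of how the
                               #    parser sized the number
```

This would cover all three failures the reporter saw (bigint, int, and string columns), since the declared column type, not the parser's choice of boxed type, drives the conversion.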

Original issue reported on code.google.com by johan%[email protected] on 10 Sep 2010 at 1:09

Handle complex JSON objects

Deserializing basic JSON objects with simple key/values is fine, but it
cannot handle nested objects or arrays. Hive can support complex objects so
this SerDe should too.
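Supporting nested data means recursing: JSON objects map to Hive structs/maps and JSON arrays to Hive arrays. The conversion itself can be sketched as below (a Python illustration; the real work in the SerDe would be wiring this into Hive's ObjectInspector hierarchy):

```python
import json

def to_hive(value):
    """Recursively convert parsed JSON into Hive-shaped values:
    objects become maps/structs, arrays become lists, and
    scalars pass through unchanged."""
    if isinstance(value, dict):
        return {k: to_hive(v) for k, v in value.items()}
    if isinstance(value, list):
        return [to_hive(v) for v in value]
    return value

nested = json.loads('{"a": {"b": [1, 2, {"c": 3}]}}')
to_hive(nested)  # returns the same nested shape, built from plain dicts/lists
```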

Original issue reported on code.google.com by [email protected] on 16 Feb 2010 at 9:58
