
asakusafw's Introduction

Asakusa Framework

Asakusa is a full-stack framework for distributed and parallel batch processing. It provides a development platform and runtime libraries supporting a variety of distributed/parallel computing environments, such as Hadoop, Spark, and M3 for Batch Processing. Users can get the best performance for their data size by transparently switching the execution engine among MapReduce, Spark RDD, and the C++ native engine.

Beyond what query-based languages offer, Asakusa makes it easier to develop complex data-flow programs efficiently and comprehensively, thanks to the following components.

  • Data-flow oriented DSL

    A data-flow based approach suits the DAG construction that distributed/parallel computing requires. Asakusa offers a Java-based domain-specific language for data-flow design, integrated with its compilers.

  • Compilers

    A multi-tier compiler is provided. Java-based source code is first compiled to an intermediate representation, which is then optimized for each execution environment: Hadoop (MapReduce), Spark (RDD), or M3 for Batch Processing (C++ native).

  • Data-Modeling language

    A data-modeling language is provided for defining data models and mapping them to relational models, CSV files, and other data formats.

  • Test Environment

    JUnit-based unit testing and end-to-end testing are supported and are portable across execution environments: source code, test code, and test data are fully compatible across Hadoop, Spark, M3 for Batch Processing, and others.

  • Runtime execution driver

    A transparent job execution driver is supported.

All of these features were designed and developed with expertise drawn from decades of enterprise-scale system development, and they help make large-scale systems on distributed/parallel environments more robust and stable.

How to build

Maven artifacts

./mvnw clean install -DskipTests

Gradle plug-ins

cd gradle
./gradlew clean [build] install

How to run tests

Maven artifacts

export HADOOP_CMD=/path/to/bin/hadoop
./mvnw test

Gradle plug-ins

cd gradle
./gradlew [clean] check

How to import projects into Eclipse

Maven artifacts

./mvnw eclipse:eclipse

And then import existing projects from Eclipse.

If you run tests in Eclipse, enable Preferences > Java > Debug > 'Only include exported classpath entries when launching'.

Gradle plug-ins

cd gradle
./gradlew eclipse

And then import existing projects from Eclipse.

Sub Projects

Related Projects

Resources

Bug reports, Patch contribution

License

asakusafw's People

Contributors

akirakw, ashigeru, bohnen, cocoatomo, hishidama, kuenishi, okachimachiorz, shino, ueshin, yshirai


asakusafw's Issues

Build failed when mvn clean install

How to reproduce the issue

Running 'mvn clean install' fails.

What is expected result and actual result

 [java] usage: java -classpath ... com.asakusafw.compiler.bootstrap.AllBatchCompilerDriver [-compilerwork</path/to/temporary>] -hadoopwork <batch/working> [-link <classlib.jar;/path/to/classes>] -output</path/to/output> -package <pkg.name> [-plugin <plugin-1.jar;plugin-2.jar>] -scanpath </path/to/classlib> [-skiperror]
 [java]  -compilerwork </path/to/temporary>      working directory for the compiler
 [java]  -hadoopwork <batch/working>             working directory on Hadoop (relative to the home directory)
 [java]  -link <classlib.jar;/path/to/classes>   list of class libraries to link
 [java]  -output </path/to/output>               directory for the compilation output
 [java]  -package <pkg.name>                     base package for the compilation output
 [java]  -plugin <plugin-1.jar;plugin-2.jar>     list of compiler plug-ins to use
 [java]  -scanpath </path/to/classlib>           class library containing the batches to compile
 [java]  -skiperror                              continue compiling the next batch even if a compilation error occurs
 [java] java.lang.IllegalArgumentException: outputDirectory must not be null
 [java]     at com.asakusafw.compiler.common.Precondition.checkMustNotBeNull(Precondition.java:33)
 [java]     at com.asakusafw.compiler.flow.epilogue.parallel.ParallelSortClientEmitter.emit(ParallelSortClientEmitter.java:95)
 [java]     at com.asakusafw.compiler.flow.external.FileIoProcessor.emitEpilogue(FileIoProcessor.java:93)
 [java]     at com.asakusafw.compiler.flow.jobflow.JobflowCompiler.emit(JobflowCompiler.java:142)
 [java]     at com.asakusafw.compiler.flow.jobflow.JobflowCompiler.compile(JobflowCompiler.java:95)
 [java]     at com.asakusafw.compiler.flow.FlowCompiler.compileJobflow(FlowCompiler.java:203)
 [java]     at com.asakusafw.compiler.flow.FlowCompiler.compile(FlowCompiler.java:78)
 [java]     at com.asakusafw.compiler.batch.processor.JobFlowWorkDescriptionProcessor.build(JobFlowWorkDescriptionProcessor.java:67)
 [java]     at com.asakusafw.compiler.batch.processor.JobFlowWorkDescriptionProcessor.process(JobFlowWorkDescriptionProcessor.java:55)
 [java]     at com.asakusafw.compiler.batch.processor.JobFlowWorkDescriptionProcessor.process(JobFlowWorkDescriptionProcessor.java:39)
 [java]     at com.asakusafw.compiler.batch.BatchCompiler.processUnit(BatchCompiler.java:104)
 [java]     at com.asakusafw.compiler.batch.BatchCompiler.processUnits(BatchCompiler.java:93)
 [java]     at com.asakusafw.compiler.batch.BatchCompiler.compile(BatchCompiler.java:59)
 [java]     at com.asakusafw.compiler.testing.DirectBatchCompiler.compile(DirectBatchCompiler.java:105)
 [java]     at com.asakusafw.compiler.bootstrap.BatchCompilerDriver.compile(BatchCompilerDriver.java:214)
 [java]     at com.asakusafw.compiler.bootstrap.AllBatchCompilerDriver.start(AllBatchCompilerDriver.java:179)
 [java]     at com.asakusafw.compiler.bootstrap.AllBatchCompilerDriver.main(AllBatchCompilerDriver.java:110)

[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 10.677s
[INFO] Finished at: Thu Apr 28 14:03:36 JST 2011
[INFO] Final Memory: 9M/106M
[INFO] ------------------------------------------------------------------------

How to fix the issue

It seems that asakusa.batchc.dir is null, even though the property is configured.

cleanHDFS.sh/cleanLocalFS.sh does not work.

cleanHDFS.sh/cleanLocalFS.sh do not work, and the following message is displayed.

./cleanHDFS.sh.ORG: line 30: /home/asakusa/asakusa/cleaner/bin/conf/.clean_hdfs_profile: No such file or directory
./cleanHDFS.sh.ORG: line 33: /bin/java: No such file or directory

'mvn test' fails if the X Window System is not available

How to reproduce the issue

On an environment where X is not available (e.g., over SSH):

  1. Create an Asakusa batch project: 'mvn archetype:generate ...'
  2. In the project directory, launch the 'test' goal: 'mvn test'

What is expected result and actual result

The build should succeed. Instead, it failed with the following error.

[java] Exception in thread "main" java.lang.InternalError: Can't connect to X11 window server using ':0.0' as the value of the DISPLAY variable.
[java]     at sun.awt.X11GraphicsEnvironment.initDisplay(Native Method)
[java]     at sun.awt.X11GraphicsEnvironment.access$100(X11GraphicsEnvironment.java:52)
[java]     at sun.awt.X11GraphicsEnvironment$1.run(X11GraphicsEnvironment.java:155)
[java]     at java.security.AccessController.doPrivileged(Native Method)
[java]     at sun.awt.X11GraphicsEnvironment.<clinit>(X11GraphicsEnvironment.java:131)
[java]     at java.lang.Class.forName0(Native Method)
[java]     at java.lang.Class.forName(Class.java:169)
[java]     at java.awt.GraphicsEnvironment.getLocalGraphicsEnvironment(GraphicsEnvironment.java:68)
[java]     at sun.font.FontManager.initSGEnv(FontManager.java:1307)
[java]     at sun.font.FontManager.findFont2D(FontManager.java:1984)
[java]     at java.awt.Font.getFont2D(Font.java:455)
[java]     at java.awt.Font.canDisplay(Font.java:1904)
[java]     at java.awt.Font.canDisplayUpTo(Font.java:1970)
[java]     at java.awt.font.TextLayout.singleFont(TextLayout.java:451)
[java]     at java.awt.font.TextLayout.<init>(TextLayout.java:509)
[java]     at org.apache.poi.hssf.usermodel.HSSFSheet.autoSizeColumn(HSSFSheet.java:1701)
[java]     at org.apache.poi.hssf.usermodel.HSSFSheet.autoSizeColumn(HSSFSheet.java:1662)
[java]     at com.asakusafw.testtools.templategen.ExcelBookBuilder.createInputDataSheet(ExcelBookBuilder.java:441)
[java]     at com.asakusafw.testtools.templategen.ExcelBookBuilder.build(ExcelBookBuilder.java:140)
[java]     at com.asakusafw.testtools.templategen.Main.run(Main.java:89)
[java]     at com.asakusafw.testtools.templategen.Main.main(Main.java:47)
[java]     at com.asakusafw.generator.ModelSheetGenerator.main(ModelSheetGenerator.java:54)
[java] Java Result: 1
[sql] Executing resource: /home/asakusa/projects/quickhack/target/sql/bulkloader_generated_table.sql
[INFO] ------------------------------------------------------------------------
[ERROR] BUILD ERROR
[INFO] ------------------------------------------------------------------------
[INFO] An Ant BuildException has occured: The following error occurred while executing this line:
/home/asakusa/projects/quickhack/src/main/scripts/asakusa-build.xml:91: java.io.FileNotFoundException: /home/asakusa/projects/quickhack/target/sql/bulkloader_generated_table.sql (No such file or directory)

How to fix the issue

The build succeeds after adding the following line at line 72 of src/main/scripts/asakusa-build.xml in the project.

...
<java classname="com.asakusafw.generator.ModelSheetGenerator"
    classpath="${compile_classpath}"
    fork="true">
    <!-- START OF THE ADDED CODE -->
    <jvmarg value="-Djava.awt.headless=true" />
    <!-- END OF THE ADDED CODE -->
    <jvmarg value="-Dlogback.configurationFile=src/test/resources/logback-test.xml" />
    <jvmarg value="-DASAKUSA_MODELGEN_PACKAGE=${asakusa.modelgen.package}" />
...
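As a quick sanity check that the headless flag takes effect before any AWT class initializes, a small stand-alone program (not part of the framework) can be used:

```java
public class HeadlessCheck {
    static boolean headless() {
        // must be set before any AWT class is initialized
        System.setProperty("java.awt.headless", "true");
        // GraphicsEnvironment reads the property on first use
        return java.awt.GraphicsEnvironment.isHeadless();
    }

    public static void main(String[] args) {
        System.out.println(headless()); // prints true
    }
}
```

Because the Ant task forks a new JVM, the flag must be passed as a `<jvmarg>` as shown above; setting it only in the parent Maven JVM would not reach the forked generator process.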

TestUtilsTest.testNormal fails in rare cases

TestUtilsTest.testNormal rarely fails with the following stack trace.


Error message

Number of failed checks expected:<0> but was:<1>
Stack trace

junit.framework.AssertionFailedError: Number of failed checks expected:<0> but was:<1>
    at junit.framework.Assert.fail(Assert.java:47)
    at junit.framework.Assert.failNotEquals(Assert.java:283)
    at junit.framework.Assert.assertEquals(Assert.java:64)
    at junit.framework.Assert.assertEquals(Assert.java:195)
    at com.asakusafw.testtools.TestUtilsTest.testNormal(TestUtilsTest.java:52)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
    at org.junit.runners.BlockJUnit4ClassRunner.runNotIgnored(BlockJUnit4ClassRunner.java:79)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:71)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:49)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
    at org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:53)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:119)
    at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:101)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.maven.surefire.booter.ProviderFactory$ClassLoaderProxy.invoke(ProviderFactory.java:103)
    at $Proxy0.invoke(Unknown Source)
    at org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireStarter.java:150)
    at org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcess(SurefireStarter.java:91)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:69)
Standard output

@SLTests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.003 sec
@SLRunning com.asakusafw.testtools.TestUtilsTest
No actual data record exists for the expected data. table = ALL_TYPES_W_NOERR, Key = (C_TAG = STRING_DATETIME)

ThunderGate does not work on Ubuntu because its shell scripts use the source command

ThunderGate's importer.sh specifies /bin/sh in its shebang but uses the source command, which the Bourne shell does not provide. On Ubuntu, /bin/sh is linked to dash, which also lacks source, so ThunderGate does not work on Ubuntu.

The workaround is to run "sudo dpkg-reconfigure dash" and change the /bin/sh symbolic link to point to bash.

Inefficient retrieval of the FileSystem in HDFSCleaner

HDFSCleaner#cleanDir() calls FileSystem#get() for each traversed directory, although the FileSystem object needs to be obtained only once.

This should be fixed, because the repeated calls consume OS resources (such as file descriptors) unnecessarily.
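The fix is the usual hoisting pattern: obtain the handle once before the traversal and reuse it. A stand-alone sketch (the counter below is a stand-in for Hadoop's FileSystem.get() call, not the actual HDFSCleaner code):

```java
import java.util.List;

public class HoistDemo {
    static int acquisitions = 0;

    // stand-in for the expensive FileSystem.get(conf) lookup
    static String getFileSystem() {
        acquisitions++;
        return "fs";
    }

    static void cleanDirs(List<String> dirs) {
        // acquire once, outside the loop, instead of once per directory
        String fs = getFileSystem();
        for (String dir : dirs) {
            // ... clean dir using fs ...
        }
    }

    public static void main(String[] args) {
        cleanDirs(List.of("a", "b", "c"));
        System.out.println(acquisitions); // prints 1: one acquisition for three dirs
    }
}
```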

Failed to create join tables from distributed cache


11/05/03 12:03:49 INFO join.JoinResource: building the join table from "item"
11/05/03 12:03:49 WARN mapred.LocalJobRunner: job_local_0001
java.io.IOException: could not create a hash table from the distributed cache "item"
    at com.asakusafw.runtime.flow.join.JoinResource.setup(JoinResource.java:69)
    at test.batch.tutorial.order.stage0001.StageMapper1.setup(StageMapper1.java:22)
    at test.batch.tutorial.order.stage0001.StageMapper1.run(StageMapper1.java:36)
    at com.asakusafw.runtime.stage.input.StageInputMapper.run(StageInputMapper.java:51)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:646)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:322)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:210)
Caused by: java.io.FileNotFoundException: File /tmp/hadoop-root/mapred/local/archive/6898381811534945340_-693337239_869253271/filetarget/testdriver/hadoopwork/TutorialBatchTest_testExample_20110503120344/tutorial/order/prologue/bulkloader/item does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:383)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
    at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:776)
    at org.apache.hadoop.io.SequenceFile$Reader.(SequenceFile.java:1424)
    at org.apache.hadoop.io.SequenceFile$Reader.(SequenceFile.java:1419)
    at com.asakusafw.runtime.flow.join.JoinResource.createTable(JoinResource.java:88)
    at com.asakusafw.runtime.flow.join.JoinResource.setup(JoinResource.java:67)
    ... 6 more
11/05/03 12:03:49 INFO filecache.TrackerDistributedCacheManager: Deleted path /tmp/hadoop-root/mapred/local/archive/-8764149790598475393_1251655051_869252271/file/root/asakusa/batchapps/tutorial/lib/jobflow-order.jar
11/05/03 12:03:49 INFO filecache.TrackerDistributedCacheManager: Deleted path /tmp/hadoop-root/mapred/local/archive/-2958845997361501945_-1092765553_868593271/file/root/asakusa/core/lib/asakusa-runtime.jar
11/05/03 12:03:49 INFO filecache.TrackerDistributedCacheManager: Deleted path /tmp/hadoop-root/mapred/local/archive/6898381811534945340_-693337239_869253271/filetarget/testdriver/hadoopwork/TutorialBatchTest_testExample_20110503120344/tutorial/order/prologue/bulkloader/item

New data model generator

Currently, Asakusa Framework only supports generating data models from MySQL INFORMATION_SCHEMA. This requires a MySQL server and table/view DDLs even if you do not want to use an RDB at all.
We will replace this with the Data Model Definition Language (DMDL), which lets models be written in a textual DSL while still generating Java data model classes as before.
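For illustration, a model definition in such a textual DMDL might look like the following (a sketch of the intended style; the model and field names here are hypothetical, and the exact syntax is defined by the DMDL specification):

```
order = {
    id : LONG;
    item_name : TEXT;
    amount : INT;
};
```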

Generic operators support

Generic operators can accept polymorphic data models as their input and output.
That is, we will introduce sub-typing on data models, so that operators can handle various types of data models through common super-types (called "projective models").
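In plain Java terms, a projective model behaves like a common super-type that several concrete models project onto, and a generic operator is written once against that super-type. A stand-alone sketch (all names here are hypothetical, not the framework's actual API):

```java
import java.util.List;

public class GenericOperatorDemo {
    // a hypothetical "projective model": the common view shared by several models
    interface WithAmount {
        long getAmount();
    }

    // two concrete data models exposing the projective view
    static class Order implements WithAmount {
        final long amount;
        Order(long amount) { this.amount = amount; }
        public long getAmount() { return amount; }
    }

    static class Refund implements WithAmount {
        final long amount;
        Refund(long amount) { this.amount = amount; }
        public long getAmount() { return amount; }
    }

    // a "generic operator": written once against the super-type,
    // applicable to any model that projects onto WithAmount
    static long total(List<? extends WithAmount> records) {
        long sum = 0;
        for (WithAmount r : records) {
            sum += r.getAmount();
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(total(List.of(new Order(100), new Refund(30)))); // prints 130
    }
}
```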

bash dependency problems for some shell scripts

The following shell scripts depend on bash.

  • experimental.sh: uses the pushd command
  • some ThunderGate launch scripts: use the . (dot) command with parameters; these parameters are ignored when running on the Bourne shell.

These scripts use the shebang /bin/sh, so they fail on environments where /bin/sh is not linked to bash.

Empty cells are treated as invalid values in the test data definition sheet

In the test data definition sheet, empty cells are treated as invalid values, which frequently confuses testers.

As a workaround, the "Application Development Guide (Batch Implementation Guide)" advises marking the records that contain cells with a border line, but this specification should be reviewed.

For example, if at least one cell in a row is non-empty, the empty cells in that row could be treated as null values.

In addition, if this specification were improved, OpenOffice.org Calc might also be able to handle the test data definition sheets.
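The proposed rule can be sketched as follows (a stand-alone illustration of the suggested specification, not the framework's actual sheet parser; empty strings stand in for empty cells):

```java
import java.util.ArrayList;
import java.util.List;

public class EmptyCellRule {
    // Proposed rule: an entirely empty row yields no record; otherwise,
    // empty cells in a row that has at least one value become null values.
    static List<String> interpretRow(List<String> cells) {
        if (cells.stream().allMatch(String::isEmpty)) {
            return null; // entirely empty row: no record
        }
        List<String> record = new ArrayList<>();
        for (String c : cells) {
            record.add(c.isEmpty() ? null : c);
        }
        return record;
    }

    public static void main(String[] args) {
        System.out.println(interpretRow(List.of("a", "", "b"))); // prints [a, null, b]
    }
}
```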

The cache file table in ThunderGate is unnecessary

When ThunderGate is set up, a cache file table is created, but it is not currently required. Moreover, since the caching feature that uses this table is still pending, the table should be removed from the DDL.

Cleaner does not check for errors when getting the FileSystem

Cleaner calls FileSystem.get() in several places. This method returns null when a FileSystem cannot be obtained, but Cleaner performs no null check, so an unintended NullPointerException may occur.

Cleaner should check for a null return value from FileSystem.get() and handle the error properly.
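The fix amounts to a null check with explicit error handling. A stand-alone sketch (getFileSystem() below is a stand-in for the FileSystem.get() call described in the report):

```java
import java.io.IOException;

public class NullCheckDemo {
    // stand-in for FileSystem.get(), which the report says may return null
    static Object getFileSystem() {
        return null;
    }

    static Object obtainFileSystem() throws IOException {
        Object fs = getFileSystem();
        if (fs == null) {
            // fail fast with a meaningful error instead of a later NPE
            throw new IOException("could not obtain FileSystem");
        }
        return fs;
    }

    public static void main(String[] args) {
        try {
            obtainFileSystem();
        } catch (IOException e) {
            System.out.println(e.getMessage()); // prints: could not obtain FileSystem
        }
    }
}
```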
