thaliproject / ci

CI project for testing mobile devices

License: MIT License

JavaScript 86.60% Shell 13.40%

ci's People

Contributors

Stargazers

Watchers

Forkers

czyzm jareksl paeony

ci's Issues

Sometimes long delay with results and not getting all logs

Example result at https://github.com/ThaliTester/TestResults/tree/51193821e3da755_Performance_test_in_CI__vjrantal/ where the delay between the successful build and getting the results is more than 5 hours. Also, CI reported that there should be 16 Android devices, but logs were posted for only 5 of the 16.

Please enable Lollipop devices in CI with the right hardware

Apparently none of our Lollipop devices have the right BLE hardware to run the coordinator, so we need Lollipop devices that do.

Log is not posted if CI build result is a timeout

Example at https://github.com/ThaliTester/TestResults/blob/5809977105d9cee_Run_coordinated_desktop_tests_in_CI_vjrantal/build.md. It can be reproduced by having a build that runs longer than the timeout (currently 30 minutes).

Clearly, builds should not take more than 30 minutes, but if something is broken with the script, this may happen and if so, having logs would help debugging the root cause (in the example posted above, the root cause is known).

Enhance timeout handling for running tests

There are several items to be done within the android.js test runner:

Extend the timeout for checking whether a device is ready (after reboot), since for some devices 45 s is not enough.
When the timeout for running all tests occurs but some devices have not put anything into logArray, logArray is missing an entry for those devices, and as a result timeout handling fails with an exception. We need to ensure that a proper entry exists for each device in logArray.
When a timeout occurs while running instrumentation tests, the tests are stopped, which is visible in the logs, but the overall outcome of the run is still reported as "Tests passed successsfully". We need to handle this case.
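
The second item could be handled along these lines. This is only a sketch: the real shape of logArray in android.js is not shown here, so it is assumed to be an object keyed by device id, and ensureLogEntries and the placeholder fields are hypothetical names.

```javascript
// Hypothetical helper: assumes logArray is an object keyed by device id.
// Call it before timeout handling so every device has an entry to report.
function ensureLogEntries(logArray, deviceIds) {
  deviceIds.forEach(function (id) {
    if (!Object.prototype.hasOwnProperty.call(logArray, id)) {
      // Placeholder entry for a device that never reported anything.
      logArray[id] = { log: '', timedOut: true };
    }
  });
  return logArray;
}
```

With entries guaranteed for every device, the timeout handler can iterate over logArray without checking for missing keys, and the placeholder flag makes it possible to report the run as failed rather than passed.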

Update the README with the info about signing the app.

Entire project blocked on CI not behaving itself

It looks like 742 doesn't cause CI to run when it is updated. We need to know why.

Switch CI to using our custom JXcore build

We only use SpiderMonkey (SM) on the devices, so it's bad that we are using V8 in the VM.

WiFi connection to the access point is unreliable with Android devices

Based on the test results at https://raw.githubusercontent.com/ThaliTester/TestResults/48600797ce001f3_Story_001_juksilve_DrJukka/test_log.md

Each device turns WiFi and Bluetooth on successfully at the beginning. This can be verified from the following log lines:
"Turning radios to true
toggleBluetooth -
toggleWiFi"

The logs then show that the following devices:

samsung-SM-T232: (DEV6226)
HUAWEI-ALE-L21: (DEV5429)

are never assigned an IP address other than localhost, which indicates that they have no connection to the access point even though WiFi was turned on successfully.

As a second issue, the logs also show that some devices lose connectivity in the middle of the test. The following two devices illustrate this:

HTC-HTC Desire 820:(DEV3229)
HTC-HTC One_M8: (DEV2969)

They have connectivity at the start, but at some point in the middle of the run they start repeatedly reporting a connection error. From that point on they report only the localhost IP address, which indicates that they have disconnected from the access point.

In general this test went better than average: most often there are well under 20 successful connections, and this time there were 22 out of 25 at the beginning.

The missing device that did not make it into the test was LGE-Nexus 5: (DEV2582), which for some reason had a really slow start and connected to the coordinator server late.

Upgrade to latest adb

For Marshmallow, we would need a version of adb whose install command supports the -g switch to grant all runtime permissions.

We need to upgrade the CI host machine and VM to El Capitan and the latest Xcode

More verbose output format with logcat in CI

To get more information via logcat, we could pass -v threadtime to logcat.

There would also be -v long, but based on current understanding, that wouldn't bring useful new information and would generate longer outputs.

Output formats documented at http://developer.android.com/tools/debugging/debugging-log.html#outputFormat.
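
As a rough illustration, the argument list for such a logcat call could be built like this. The function name logcatArgs and the serial parameter are hypothetical; the format names are the ones listed in the Android logcat documentation.

```javascript
// Hypothetical helper that builds the adb argument list for logcat.
// Defaults to the threadtime format proposed above.
function logcatArgs(serial, format) {
  var allowed = ['brief', 'process', 'tag', 'thread', 'raw', 'time', 'threadtime', 'long'];
  var chosen = format || 'threadtime';
  if (allowed.indexOf(chosen) === -1) {
    throw new Error('Unknown logcat output format: ' + chosen);
  }
  return ['-s', serial, 'logcat', '-v', chosen];
}
```

The result can be passed to something like require('child_process').spawn('adb', logcatArgs(serial)).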

Review adb commands for failure cases

In android.js there are several places where an adb command is issued and the result is treated as an error only if the exit code of adb is non-zero. However, adb often exits with code 0 even though the command failed, for example:

$ adb -s VS986da9f36ea install ~/Desktop/android_0_625481247756efd.apk 
2953 KB/s (47606904 bytes in 15.741s)
    pkg: /data/local/tmp/android_0_625481247756efd.apk
Failure [INSTALL_FAILED_UPDATE_INCOMPATIBLE]
$ echo $?
0

In the example above, adb install fails because the signature in the .apk is different from that of the app already installed on the device. The failure is expected, but as seen above, the exit code is still 0.

The failure conditions probably require additional checks to ensure commands worked as expected. For example, when installing an app, it might be required to issue something like shell pm list packages | grep <package-name> and make sure the app actually got installed to the device and fail if not.
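
One possible additional check is to inspect the adb install output as well as the exit code. The sketch below assumes the caller has captured the combined output as a string; the function names are hypothetical, and the "Failure [...]" and "Success" lines are the ones adb install prints, as in the example above.

```javascript
// Hypothetical helpers: decide whether "adb install" really worked
// by looking at its output in addition to its exit code.
function adbInstallFailureReason(output) {
  // Matches lines such as: Failure [INSTALL_FAILED_UPDATE_INCOMPATIBLE]
  var match = /^Failure \[([^\]]+)\]/m.exec(output);
  return match ? match[1] : null;
}

function adbInstallSucceeded(exitCode, output) {
  if (exitCode !== 0) return false;
  if (/^Failure\b/m.test(output)) return false;
  // Require an explicit "Success" line rather than the absence of errors.
  return /^Success\b/m.test(output);
}
```

Checking for an explicit "Success" line is stricter than only looking for "Failure", so unexpected output is treated as a failed install.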

Build failures sometimes result in no logs

Check out thaliproject/Thali_CordovaPlugin#724 (comment) for example. I did get output from Appveyor but the logs for CI are empty. This makes debugging a nightmare.

We have to figure out how to not accidentally let in PRs that are not tested against the latest vNext

The issue is that someone creates two PRs and tests both at the same time against vNext using CI. They both pass. But when the first is checked in the tests for the second are no longer relevant because they aren't against the 'new' vNext created by the first check in. So how do we detect that the second PR needs to have its tests run again before it's checked in?

Tests are not giving results

In: thaliproject/Thali_CordovaPlugin#223 (comment)

There is a successful build with a 20-minute timeout whose tests have still not completed after 4 hours.

There is also a second one that was built an hour ago.

I suspect something is wrong with CI.

Set up Sinopia & Node 6 in CI

Set up Sinopia
Install Node 6 and latest NPM
Update the VM image after calling add user

For PR 811 to ever pass we have to have Sinopia set up. Please look here for instructions on how to do it. The key thing is that once you have run add user you have to take a snapshot of the VM and set up that snapshot to run in the future so we will be able to talk to the Sinopia instance.

As we discussed Sinopia should be run outside of the VM so we can persist information across VM re-runs.

Also we need to add Node 6 and NPM to the VM image so we can do our custom builds of PouchDB and Express-PouchDB.

CI exited due to an issue with killing some iOS tasks

Build finished
IS Args: /Users/thali/Github/CI/builder/builds/server_62548124bf1e918 [email protected]:~/Test/server_62548124bf1e918/test/TestServer/
Report server

/Users/thali/Github/CI/tasker/taskman.js:122
            iosChild.kill();
                    ^
TypeError: Cannot read property 'kill' of undefined
    at /Users/thali/Github/CI/tasker/taskman.js:122:21
    at ChildProcess.exithandler (child_process.js:669:1)
    at ChildProcess.emit (events.js:85:9)
    at maybeClose (child_process.js:773:12)
    at Socket.<anonymous> (child_process.js:994:1)
    at Socket.emit (events.js:82:9)
    at Pipe.close (net.js:422:6)
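
The TypeError above means iosChild was undefined when kill() was called. A minimal guard could look like the sketch below; taskman.js itself is not shown in this report, so killIfRunning is a hypothetical helper and not the project's actual fix.

```javascript
// Hypothetical helper: only call kill() when a child process object exists.
// Returns true if kill() was called, false otherwise.
function killIfRunning(child) {
  if (child && typeof child.kill === 'function') {
    child.kill();
    return true;
  }
  return false;
}
```

A guard like this avoids the crash, but the reason why the iOS child process was never assigned would still need to be investigated.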

CI starts a rebuild if PR assignment changes

For example, see thaliproject/Thali_CordovaPlugin#246 (comment) where a new build is triggered after assignment changes even though the commit sha1 doesn't change.

Make CI update itself

If we update the CI project we need the CI framework to detect this and then update itself on the next run.

CI doesn't build per commit, but per branch.

If I have a PR from a branch to which I push two commits fairly close to each other, CI is triggered twice (which is expected), but ends up building using the most recent commit, because it does the checkout based on the branch and not based on the commit sha1.

This can be surprising when looking at the logs afterwards, since the logs are posted as if the test run was made using a certain commit sha1, which isn't the one CI actually used.
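
A sketch of checking out the exact commit instead of the branch head is shown below. The function name, the 'repo' target directory and the parameters are hypothetical; the git options used (clone --branch, -C, checkout --detach) are standard.

```javascript
// Hypothetical helper: returns the git commands (as argument arrays) needed to
// build a specific commit of a PR branch rather than whatever the branch head is.
function checkoutCommands(repoUrl, branch, sha) {
  return [
    // Clone the PR branch so that the commit is available locally.
    ['git', 'clone', '--branch', branch, repoUrl, 'repo'],
    // Pin the working tree to the commit that triggered this build.
    ['git', '-C', 'repo', 'checkout', '--detach', sha]
  ];
}
```

With this approach two builds triggered by two commits pushed close together would each test their own commit, and the posted logs would match the sha1 they are labelled with.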

Agree on CI to GitHub and Test Server

I submitted a PR defining how I think the CI system should talk to GitHub and the Test Server.

Getting test results took very long (maybe due to branch deletion?)

The build was completed at thaliproject/Thali_CordovaPlugin#254 (comment) and the results were received at thaliproject/Thali_CordovaPlugin#254 (comment), which is about twenty hours later.

This might be due to the fact that the branch was deleted in between.

Update all iOS devices to 9

Is CI able to output logging from Java?

In looking through the logs I don't see any logs from our Java code, only jxcore-log. If we are building as a release build then generally Android suppresses logs. Which is bad because we need to see those logs!

iPhones can't see coordination server

Oguz is debugging

Make sure CI's router isn't allowing the iOS devices to talk to each other over the local WiFi

We have a new router in CI. We need to make sure it is configured so that the iOS devices can reach the coordination server (which, by the way, has an Internet-routable address) but cannot reach any of the other iOS devices over the local WiFi. Otherwise our MPCF testing is invalid, because we would be testing over normal WiFi and not using the P2P radio features.

Please also update the CI README to point out the need for this and explain how we did it with whatever router we are currently using.

Have Marshmallow devices in the device pool

For Marshmallow devices to work, the CI code needs to be updated to issue a command like:

adb shell pm grant <package name> android.permission.ACCESS_COARSE_LOCATION

after the app is installed.
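
For illustration, the argument list for that command could be built as follows. The function and parameter names are hypothetical; the pm grant command itself is the one quoted above.

```javascript
// Hypothetical helper: builds the adb arguments for granting one runtime
// permission to an installed package on a specific device.
function grantPermissionArgs(serial, packageName, permission) {
  return ['-s', serial, 'shell', 'pm', 'grant', packageName, permission];
}
```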

Proposal is that since node thali09 is currently used for offline testing, that node would also be used to test this code change and eventually have 2 Marshmallow devices plugged into it and then the node could be enabled back in CI configuration.

Update the README with the current link to the JXcore binaries

The installation section of readme.md contains an obsolete link to the JXcore binaries. Instead of jxcore.com, jxcore.azureedge.net/jxcore should be used.

CI is incorrectly returning success when it should be returning failure

I got the following message via Email from CI:

Test 79426650eeb77ae(eeb77ae) has successfully finished without an error
See https://github.com/ThaliTester/TestResults/tree/79426650eeb77ae_Almost_to_the_top_of_the_Thali_stack_yaronyg/ for the logs

This was both exciting and, of course, mystifying, since it shouldn't be working. In fact, when I groveled through the logs I saw that we are failing on "Cannot find module '../thalilogger'", but I'll file a separate bug on that.

This bug is - why the heck did I get a mail saying the tests passed when they didn't?

It doesn't look like all the iOS devices have been registered

We are getting errors for some iOS phones in CI that make us think that some of the phones have not been registered with Apple as belonging to our cert

install.js is a beast, it has to be refactored

install.js is a critical piece of code. It is the first thing that anyone using Thali ends up calling and it controls if their initial experience will be good or bad. It is also doing a lot of very complex stuff that will continue to evolve over time. We desperately need it to be less of a hacky mess.

How did this PR break CI?

This PR broke the CI build script. @vjrantal thinks it is because of an illegal character in the title. Can you please investigate and fix?

Implement multiConnect and multiConnectResolved

On iOS connect callback is not called if connection rejected

This can be reproduced with the unit test app.

Example log from the issue at https://github.com/ThaliTester/TestResults/blob/61362366b6c9a6f_Make_CI_build_and_unit_tests_pass_vjrantal/iOS_Iphone5-1.md.

Support for longer CI runs.

Currently, the maximum timeout we can set in mobile_test.json is effectively 30 minutes so that there is enough time to post all logs after the hard-coded "master-timeout" in the CI is reached.

In the future, we probably want to sometimes (e.g., nightly) run longer stress tests that last, for example, 2-3 hours. To support this kind of scenario, it would be good if there were no hard-coded "master-timeout" in CI. Instead, the internal logic should leave enough time for log gathering after the timeout value set in mobile_test.json is reached.
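
One way to express this is to derive the overall timeout from the configured test timeout plus a log-gathering allowance. The sketch below assumes both values are in milliseconds; masterTimeoutMs and its parameters are hypothetical names, and how the value is read from mobile_test.json is left out because the file's layout is not shown here.

```javascript
// Hypothetical helper: the overall CI timeout is the test timeout plus
// a fixed allowance for collecting and posting logs afterwards.
function masterTimeoutMs(testTimeoutMs, logGatheringMs) {
  if (!(testTimeoutMs > 0) || !(logGatheringMs >= 0)) {
    throw new Error('Timeouts must be a positive test timeout and a non-negative allowance');
  }
  return testTimeoutMs + logGatheringMs;
}
```

For example, a 3-hour stress test with a 10-minute allowance would give masterTimeoutMs(3 * 60 * 60 * 1000, 10 * 60 * 1000), so longer runs would no longer be cut off by a fixed limit.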

VM needs to be updated with the right path to the latest JX release

CI doesn't seem to test the merged content with PRs

When doing a PR, an important aspect to verify is how the new code works together with the base branch it targets. Here is how the Travis docs explain it:

Rather than test the commits from the branches the pull request is sent from, we test the merge between the origin and the upstream branch.

Currently, it seems like the CI system tests the code from the branch the pull request is from and not the merged content.

@yaronyg: Can you confirm how we would like this to work?

@obastemur: Please correct me if I have misunderstood the current CI behavior.

CI appears to be running an old version of JXcore

When our tests run in CI I am getting the error:

Missing PFX or certificate + private key.

I believe this error is from an old version of Shawn's build of JXcore before he fixed the need to put in a PFX cert even when you were using PSK. So we need to update the version of JXcore in CI to the latest so we don't see this error.

Follow up for connection status event

jxcore/jxcore-cordova#96

Handle unauthorized Android devices

Devices may sometimes be unauthorized for adb access:

pi@thali05 ~ $ adb devices
List of devices attached 
LGH8153b36be34    unauthorized

Currently android.js has code like:

  if (res[0].indexOf("List of devices") == 0) {
    for (var i = 1; i < res.length; i++) {
      if (res[i].trim().length == 0) continue;
      if (res[i].indexOf('offline') > 0) {
        logme("Warning: Phone " + res[i] + " OFFLINE");
        continue; // phone offline
      }
      var dev = res[i].split('\t');

This code should be updated to also take into account the unauthorized case.
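
A sketch of a parser that treats every state other than "device" (including unauthorized and offline) as not usable is shown below. The function name and the shape of the returned object are hypothetical and are not the existing android.js code.

```javascript
// Hypothetical parser for the output of "adb devices".
// Only devices in the "device" state are considered ready for testing.
function parseAdbDevices(output) {
  var lines = output.split('\n');
  var result = { ready: [], skipped: [] };
  if (lines[0].indexOf('List of devices') !== 0) return result;
  for (var i = 1; i < lines.length; i++) {
    var line = lines[i].trim();
    if (line.length === 0) continue;
    var parts = line.split(/\s+/); // serial and state are separated by whitespace
    if (parts[1] === 'device') {
      result.ready.push(parts[0]);
    } else {
      // Covers "offline", "unauthorized" and any other non-ready state.
      result.skipped.push({ id: parts[0], state: parts[1] });
    }
  }
  return result;
}
```

Listing ready devices explicitly, instead of excluding known bad states one by one, means a new unexpected state is skipped and logged rather than treated as a usable phone.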

CI might sometimes run old/other apps concurrently with the app under testing

For example at:
https://github.com/ThaliTester/TestResults/blob/49526184a3a5cb3_Reproduce_network_reliability_issues_in_the_CI__vjrantal/test_log.md

We can see that logcat output contains output from the app that is under test, but also from another app (different package id).

Old apps that were tested in the CI should be properly shut down and cleaned up so that they don't interfere with new test runs.
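
As an illustration, the cleanup could issue a force-stop and an uninstall for each stale package before a new run. The sketch only builds the adb argument lists; cleanupCommands is a hypothetical name, and how the list of stale package ids is obtained is left open.

```javascript
// Hypothetical helper: for each stale package, stop it and then remove it
// from the given device so it cannot write to logcat during the next run.
function cleanupCommands(serial, stalePackages) {
  var commands = [];
  stalePackages.forEach(function (pkg) {
    commands.push(['-s', serial, 'shell', 'am', 'force-stop', pkg]);
    commands.push(['-s', serial, 'uninstall', pkg]);
  });
  return commands;
}
```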

We really need a local NPM cache

Today we lost a whole day of productivity because NPM wasn't behaving itself. We really need CI to have its own NPM registry so we don't have problems like this. Something like https://www.npmjs.com/package/sinopia should be super easy to set up.

Please move the iOS devices to the non-guest network in CI

Until we have the native layers back we are going to test using the wifi based mock. So we need to move all the iOS devices over to the non-guest network in CI so that the wifi based mock will work.

Document what we need to install in the CI VM

We have never written down what software needs to be on the CI VM for it to work, so if we lose the VM we are in a bad place.

Show test server output first in test_log.md

The test server output should come first.

The status output (which currently comes first) should be moved to the end, and the test server output shouldn't be duplicated (as it is now).

Android logcat logs not available from all devices

As an example, see https://raw.githubusercontent.com/ThaliTester/TestResults/486007970c43223_Story_001_juksilve_DrJukka/test_log.md where there are 21 Android devices participating in the test.

On app startup, this line is printed to the jxcore log:
Initializing JXcore engine

The line above appears 13 times on the linked test results page, which indicates that logcat output was available for 13 of the 21 devices.

As a side note, the coordination server indicates that at least 19 devices were able to connect to it so at least 19 devices have been able to successfully start the app and thus at least 19 logcat outputs should be available.
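
The count above can be reproduced with a small helper like the following, assuming the contents of test_log.md have already been loaded into a string. countOccurrences is a hypothetical name.

```javascript
// Hypothetical helper: counts non-overlapping occurrences of a marker string,
// e.g. the startup line "Initializing JXcore engine", in a log text.
function countOccurrences(text, marker) {
  if (marker.length === 0) return 0;
  return text.split(marker).length - 1;
}
```

Comparing this count with the number of devices reported by the coordination server gives a quick check of how many logcat outputs are missing.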

Please change CI's maximum timeout to 30 minutes

Bring new Android devices online

We have a bunch of new devices that we have taken offline because of various bugs that are fixed or soon will be. Until we get Android stood up and working on the existing devices, we don't want to introduce new ones. So we'll schedule this work item in a sprint once we are ready to take the new devices.

CI sometimes posts wrong log files

I ran a test at thaliproject/Thali_CordovaPlugin#340 (comment), but when I followed the results link, some devices seemed to have log results from previous runs.

For example, open https://github.com/ThaliTester/TestResults/blob/51074821f72e61a_Enable_native_layer_and_replication_tests__vjrantal/thali07_samsung-SM-A500FU.md and search for string:

execPath /data/data/com.example.hello/files/www/jxcore

The reason why this line is suspicious is that the package name of the app under this test is not com.example.hello but should be com.test.thalitest.

CI tends to start failing builds if there are too many tasks in the queue

We need to fix this so we don't get random silly failures