peteruhnak / git-migration
Utility to migrate code from SmalltalkHub (or any MCZ-based repo) to Git
License: MIT License
"1284 mczs and then a file primitive failed. It was due to the limit of 256 open files for a Mac process."
Temporary workaround: sudo launchctl limit maxfiles 65536
There is no reason why the file limit should be hit, as files are read one by one, so the streams may not be closed properly.
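Before raising the system-wide limit, it may help to check which limit the process actually inherits. The following is only a POSIX shell sketch: `has_fd_headroom` is a hypothetical helper name, and 1284 is simply the number of mczs mentioned in the report above.

```shell
# Hypothetical helper: succeed if the soft open-file limit of this shell
# (inherited by processes started from it) allows at least $1 descriptors.
has_fd_headroom() {
    needed=$1
    limit=$(ulimit -n)
    [ "$limit" = "unlimited" ] || [ "$limit" -ge "$needed" ]
}

if has_fd_headroom 1284; then
    echo "soft limit $(ulimit -n) is enough for 1284 open files"
else
    echo "soft limit $(ulimit -n) is below 1284; unclosed streams would hit it"
fi
```

If the limit is comfortably above the number of files and the primitive still fails, that would point towards streams being left open rather than towards the limit itself.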
In Monticello there can be MCZs with an empty commit message (at least I found some in the GT repo). They currently get imported into git as commits with an empty commit message. This causes issues with some git operations such as rebase, which fails if a commit has an empty message. It can be fixed with something like the following on the git side:
git filter-branch -f --msg-filter '
# read the whole message; a plain "read msg" would keep only its first line
msg=$(cat)
if [ -n "$msg" ] ; then
printf "%s\n" "$msg"
else
echo "The commit message was empty"
fi'
But it could be better to have a default template for missing commit messages.
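Such a default template could look roughly like the following shell sketch. It is not part of the tool: `default_message` is a hypothetical helper, the placeholder text can be overridden by an optional argument, and treating whitespace-only messages as empty is an assumption made here.

```shell
# Hypothetical helper: read a commit message on stdin and print it unchanged,
# or print a placeholder when the message is empty or whitespace-only.
# $1 (optional) overrides the default placeholder text.
default_message() {
    msg=$(cat)
    if [ -n "$(printf '%s' "$msg" | tr -d '[:space:]')" ]; then
        printf '%s\n' "$msg"
    else
        printf '%s\n' "${1:-The commit message was empty}"
    fi
}

printf '' | default_message
printf 'Fix stream handling\n\nClose each stream after reading.\n' | default_message
```

Reading the message with `cat` keeps multi-line messages intact, which matters because many MC commit messages span several lines.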
Repository:
MCSmalltalkhubRepository
owner: 'PavelKrivanek'
project: 'Tuppu'
user: ''
password: ''
Generation of fast import file:
migration := GitMigration on: 'PavelKrivanek/Tuppu'.
migration cacheAllVersions.
migration allAuthors.
migration authors: {'PavelKrivanek' -> #('Pavel Krivanek' '<[email protected]>')}.
migration
fastImportCodeToDirectory: 'src'
initialCommit: '5e53cc6'
to: 'import-tuppu.txt'
fast import:
# git fast-import < ../import-tuppu.txt
git-fast-import statistics:
---------------------------------------------------------------------
Alloc'd objects: 5000
Total objects: 278 ( 302 duplicates )
blobs : 197 ( 300 duplicates 36 deltas of 191 attempts)
trees : 74 ( 2 duplicates 34 deltas of 74 attempts)
commits: 7 ( 0 duplicates 0 deltas of 0 attempts)
tags : 0 ( 0 duplicates 0 deltas of 0 attempts)
Total branches: 1 ( 1 loads )
marks: 1024 ( 7 unique )
atoms: 113
Memory total: 2294 KiB
pools: 2098 KiB
objects: 195 KiB
---------------------------------------------------------------------
pack_report: getpagesize() = 4096
pack_report: core.packedGitWindowSize = 33554432
pack_report: core.packedGitLimit = 268435456
pack_report: pack_used_ctr = 8
pack_report: pack_mmap_calls = 1
pack_report: pack_open_windows = 1 / 1
pack_report: pack_mapped = 32607 / 32607
---------------------------------------------------------------------
Then load both packages from the original repository and try to merge with the sources in the Git-migrated repository. Several methods are added:
Tuppu>>#mutex
Tuppu>>#mutex:
Tuppu>>#open
TuppuRepository>>#fileName
...and many others
I get this exception when trying to move the Cryptography repository to git (it is not the only one that triggers this error, it is just an example).
It happens when executing the last step:
migration fastImportCodeToDirectory: 'Cryptography' initialCommit: '785682d219b2dfef1320ee6657211c74fb15ebf0' to: 'Cryptography.txt'.
My entire script is:
`
migration := GitMigration on: 'http://www.squeaksource.com/Cryptography'.
migration downloadAllVersions.
migration populateCaches.
migration allAuthors.
newAuthors := OrderedCollection new.
migration allAuthors do: [ :each |
|email| email := '<',each,'>'.
newAuthors add: each -> {each . email} ].
newAuthors.
newAuthors add: ('GeorgeGanea' -> #('George Ganea' '[email protected]')).
migration authors: newAuthors.
migration
fastImportCodeToDirectory: 'Cryptography'
initialCommit: '785682d219b2dfef1320ee6657211c74fb15ebf0'
to: 'Cryptography.txt'.
`
This happens in
Pharo 7.0
Build information: Pharo-7.0+alpha.build.1261.sha.9ed1473c3fb9c3853ee730e406bfb012c9fa8297 (32 Bit)
Is there anything else I could do to help with this issue?
When converting the (legacy) Seaside30 Smalltalkhub repository to github, I encountered import errors which I traced to an incorrectly declared byte size of the data command that carries the commit message text.
I also discovered that the class GitMigrationCommitInfo already has code to clean a commit message in inlineDataFor:, but that it was unused. Changing the writeCommitPreambleFor: method so that it calls GitMigrationCommitInfo>>inlineDataFor: seems to fix the problem adequately.
I will prepare a PR after I check the result of migrating the Seaside30 repository with those changes.
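For reference, git fast-import expects the number after `data` to be the exact byte length of the payload that follows. The shell sketch below shows the difference this makes for non-ASCII text. `emit_data` is a hypothetical helper, and it is only an assumption (not confirmed above) that a character count used in place of a byte count caused the Seaside30 failure.

```shell
# Hypothetical helper: write a fast-import inline "data" command for $1.
# The count is taken in bytes (wc -c), not in characters.
emit_data() {
    count=$(printf '%s' "$1" | wc -c | tr -d ' ')
    printf 'data %s\n%s\n' "$count" "$1"
}

# "héllo" encoded as UTF-8: 5 characters but 6 bytes (é is 0xC3 0xA9).
msg=$(printf 'h\303\251llo')
emit_data "$msg"
```

Any cleaning of the message (line endings, stray control characters) has to happen before the count is computed, otherwise the declared size and the payload drift apart again.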
Hi, I understand that it is difficult to port branch history across several packages: information is missing as to whether package_P.branch_2 relates to package_Q.branch_4...
Since most MC branches are nameless, there is no hint for resolving such a dilemma.
But I'm convinced that we can do better, and we should do better.
Simple cases first: for a single package, it's perfectly doable (see for example https://github.com/hpi-swa/Squot).
Example of simple MC branches with versions A B C D:
    *-B---*
   /       \
A-*         *-D
   \       /
    *---C-*
will appear as:
A---B---C---D
When we look for what C was good for, we see the diffs between the parallel branches B and C, that is, all diffs between A and B (B-A) together with all diffs between A and C (C-A)... It's exactly the MC reparent option which was proposed recently in Squeak, and which is kind of dangerous for these reasons.
If (B-A) is big (it might in fact span several commits), then we completely lose the point of C, and history becomes useless. That's why we should do better if we can. It occurs quite often in VMMaker for example, see OpenSmalltalk/opensmalltalk-vm#305
In case of single package P, then I would say reproduce P topology.
For multi-package, I have some ideas (and heuristics), but it will be too long to discuss here.
I will try to think longer about details and gather them into another place.
For now, if we don't want to enter into multi-package parallel branches complexity, then the least we can do is to rebase C on B (with automatic conflict resolution on ours, that is, newest wins). At least, the diff will be relevant (modulo the conflicts, but conflicts are kind of relevant too, aren't they?).
This may not work so well when classes/methods are renamed in one of the branches, because MC does not track such operations, but in most cases it will lead to a more useful diff than we are getting now.
Also, the commit message should contain meta-information about which MC version is committed, and which special operation (rebase or reparent) took place during translation.
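As an illustration of that last point, the commit message could carry the MC version and the translation step as trailer-style lines. The shell sketch below is purely hypothetical: the trailer names `MC-Version` and `MC-Translation`, the helper `mc_commit_message`, and the sample version name are made up for the example and are not something the tool emits.

```shell
# Hypothetical helper: append MC meta-information to a commit message.
# $1 = original MC message, $2 = MC version name,
# $3 = translation step (for example: rebase, reparent, none)
mc_commit_message() {
    printf '%s\n\nMC-Version: %s\nMC-Translation: %s\n' "$1" "$2" "$3"
}

mc_commit_message 'Fix primitive fallback' 'Package-abc.4' 'rebase'
```

Keeping the meta-information on separate lines at the end leaves the original message untouched and makes it easy to grep for translated commits later.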
When migrating the GT repo (Moose/GToolkit) there is an "#hour was sent to nil" exception in #topologicallySort:, as an MCVersionInfo has the time attribute nil.
I can bypass this by setting a time of 00:00, but I am not sure why the time is not extracted from the mcz. Can it be missing? The MCVersionInfo looks kind of strange.
Is there anything that could be done in the migration tool?
When importing the history for Moose/GToolkit into git, there is a commit that has a null byte. Because of this it's not possible to push the changes.
Running git fsck shows:
Checking object directories: 100% (256/256), done.
warning in commit 5c31cc85a7b9e629a167998edfde8f4639b54670: nulInCommit: NUL byte in the commit object body
Checking objects: 100% (26884/26884), done.
When pushing:
Counting objects: 26884, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (10393/10393), done.
error: RPC failed; curl 55 SSL_write() returned SYSCALL, errno = 32
fatal: The remote end hung up unexpectedly
Writing objects: 100% (26884/26884), 14.59 MiB | 8.05 MiB/s, done.
Total 26884 (delta 16424), reused 26882 (delta 16422)
fatal: The remote end hung up unexpectedly
Everything up-to-date
The commit in question is:
commit 5c31cc85a7b9e629a167998edfde8f4639b54670
Author: Alexandre Bergel <[email protected]>
Date: Fri Aug 22 11:50:53 2014 +0000
Some tab to Class:
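Until the tool filters such characters itself, the NUL byte would have to be removed from the message text before the commit is written. A minimal shell sketch of that cleaning step follows. `strip_nul` and `count_nul` are hypothetical helpers, and the sample input is invented, since the position of the NUL byte in the real commit is not shown above.

```shell
# Hypothetical helper: copy stdin to stdout, dropping every NUL byte.
strip_nul() {
    tr -d '\000'
}

# Hypothetical helper: count the NUL bytes on stdin (0 means clean).
count_nul() {
    tr -cd '\000' | wc -c | tr -d ' '
}

printf 'Some\000 message' | count_nul
printf 'Some\000 message' | strip_nul
echo
```

This should be applied to the message alone, before the byte count of the fast-import data command is computed, and not to the finished import file, where removing bytes would invalidate the declared sizes.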
Is this currently possible? For example, I sometimes have an St repo with multiple tiny projects. When I'd like to port one to GH, I'd need to be able to specify a package filter (e.g. a block).
I hacked this behavior with:
GitMigration>>versionsByPackage
| versionsByPackage all |
versionsByPackage := Dictionary new.
all := repository versionsWithPackageNames.
(all select: [ :e | e first beginsWith: 'Val' ])
do: [ :quad |
(versionsByPackage at: quad first ifAbsentPut: [ OrderedCollection new ])
add: (self cachedVersions at: (quad last withoutSuffix: '.mcz')) ].
^ versionsByPackage
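The same prefix filter can be approximated outside the image by selecting the .mcz files of interest first. This is a shell sketch only, under the assumption that the selected files are then migrated from a separate copy of the repository; `filter_mcz_by_prefix` and the file names are hypothetical.

```shell
# Hypothetical helper: read .mcz file names on stdin (one per line) and
# print only those whose name starts with the package prefix given as $1.
filter_mcz_by_prefix() {
    prefix=$1
    while IFS= read -r name; do
        case $name in
            "$prefix"*.mcz) printf '%s\n' "$name" ;;
        esac
    done
}

printf '%s\n' 'Val-Core-ab.1.mcz' 'Other-ab.2.mcz' 'Val-Tests-ab.3.mcz' |
    filter_mcz_by_prefix 'Val'
```

A block-based filter inside GitMigration would still be the more flexible option, since it could match on more than a name prefix.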
During the export of the history in Glamour there seems to be an infinite loop when exporting the mcz Glamour-Examples-tg.36. After half an hour the export had made no progress. Interrupting the execution a few times always ends up in TonesWriter>>#splitMethodSource: for the method a MCMethodDefinition(GLMSTNamedModel>>#nameØY). The strange part is that the selector name is nameØY while the source code of the method is:
named: aString
^ self named: aString environment: self defaultEnvironment
It's sent in GitMigrationMemoryTreeGitRepository>>#memoryStoreVersion:
Do you have a copy of the method lying around?
thanks
both export methods call caching already, so I am not sure where the problem lies.
In either case also visualizations would fail without the call.
When exporting Glamour, for the package Glamour-Scripting-tg.2 there is an error "NotFound: [ :wc | cat beginsWith: wc packageName ] not found in Array" when exporting the file.
The error happens when exporting the method GLMPresentation>>#display:. This method is present in the mcz and looks fine there. The issue might be related to the fact that it's an extension method.
!GLMPresentation methodsFor: '*glamour-scripting' stamp: ' 25/2/09 11:51'!
display: aBlock
self transformation: aBlock! !