rbowler / spinhawk

spinhawk is the repository for the production-quality version (release 3.xx) of the Hercules mainframe virtualization platform

License: Other

Shell 2.25% C 86.39% C++ 1.95% HTML 6.63% CSS 0.03% Objective-C++ 0.06% Perl 0.06% Batchfile 0.01% Makefile 0.29% M4 1.89% Rich Text Format 0.45%

spinhawk's Issues

3.09 fails to build via homebrew on Mac OS X 10.9

When trying to install hercules version 3.09 via homebrew, the build failed with two errors involving SOL_TCP. Here is the first:

clang -DHAVE_CONFIG_H -I. -I. -I./decNumber -DPKGDATADIR="/usr/local/Cellar/hercules/3.09/share/hercules" -DMODULESDIR="/usr/local/Cellar/hercules/3.09/lib/hercules" -W -Wall -c hdl.c -fno-common -DPIC -o .libs/hdl.o
hscutl.c:704:26: error: use of undeclared identifier 'SOL_TCP'
rc = setsockopt(sfd, SOL_TCP, TCP_KEEPINTVL, &optval, sizeof(optval));
^
SOL_TCP appears to be a Linux-specific identifier (defined in include/linux/socket.h).
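On other systems, the same option level is normally spelled IPPROTO_TCP. On Linux, SOL_TCP and IPPROTO_TCP have the same value (6). The following is a minimal sketch of a portable fallback, assuming the build only needs a level constant that exists everywhere. The helper name set_keepalive_interval() is hypothetical and is not the fix that was applied to hscutl.c:

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>   /* IPPROTO_TCP */
#include <netinet/tcp.h>  /* TCP_KEEPINTVL (and SOL_TCP on Linux) */
#include <sys/socket.h>   /* setsockopt() */

/* SOL_TCP is a Linux-specific alias; fall back to the portable
   protocol-level constant where it is missing (e.g. Mac OS X). */
#ifndef SOL_TCP
#define SOL_TCP IPPROTO_TCP
#endif

/* Hypothetical helper: set the interval between keepalive probes.
   Returns 0 on success, or -1 with errno set on failure. */
static int set_keepalive_interval(int sfd, int seconds)
{
    assert(seconds > 0);
#if defined(TCP_KEEPINTVL)
    int optval = seconds;
    return setsockopt(sfd, SOL_TCP, TCP_KEEPINTVL, &optval, sizeof(optval));
#else
    /* Option not available on this platform: report it at run time
       instead of failing to compile. */
    (void)sfd;
    errno = ENOPROTOOPT;
    return -1;
#endif
}
```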

Hercules r command incorrect output

Enter r 8ffe.20, as I did:

15:31:56 r 8ffe.20
15:31:56 R:0000000000008FFE:K:06=0000 ..
15:31:56 R:000000000000900E:K:06=0F10 11121314 15161718 191A1B1C 1D1E ................

It seems that when the display crosses into the new page, the offset is not cleared.
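Stated as arithmetic, the expected behaviour is this: when a display range crosses a 4K page boundary, the next piece should start at the page-aligned address with an in-page offset of zero. The following is a minimal sketch of that splitting, assuming 4K pages. split_at_pages() is a hypothetical helper and not the display code in Hercules:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_4K 0x1000u

/* Hypothetical helper: split [addr, addr+len) into pieces that never
   cross a 4K page boundary. Every piece after the first starts on a
   page boundary, i.e. with an in-page offset of zero.
   Returns the number of pieces stored (at most max). */
static size_t split_at_pages(uint64_t addr, size_t len,
                             uint64_t *starts, size_t *lens, size_t max)
{
    size_t n = 0;
    assert(starts != NULL && lens != NULL);
    while (len > 0 && n < max) {
        size_t room  = PAGE_4K - (size_t)(addr & (PAGE_4K - 1));
        size_t chunk = len < room ? len : room;
        starts[n] = addr;
        lens[n]   = chunk;
        n++;
        addr += chunk;   /* lands exactly on the next page boundary */
        len  -= chunk;
    }
    return n;
}
```

For r 8ffe.20 (X'20' bytes starting at X'8FFE'), this yields X'8FFE' for 2 bytes and X'9000' for X'1E' bytes. The second line would therefore be labelled 9000 rather than 900E.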

Missing interval timer/CPU timer interruptions with ECPS:VM active

The following bug issue and text is reproduced from Hyperion issue 214 so that this bug may be documented and fixed in Spinhawk.

It was recently reported to me that the CPWATCH utility virtual machine was abending with a PIC 9 when connecting to CPWATCH after it had some eight-plus hours of inactivity. It was reported that it had something to do with the DISP0 assist of ECPS:VM. When DISP0 was disabled, CPWATCH operated properly. I was able to recreate and confirm these findings.

To boil down days of investigation and research, it turns out that this problem is related to the prior issue #63. In that prior issue, which was a case of a virtual machine being dispatched as runnable even though the virtual PSW wait bit was set, I mentioned that it was odd that the VMPSWAIT dispatchability flag had not been set in the VMBLOK even though the virtual machine was clearly in a wait.

As a result of the investigation into the CPWATCH issue, I found that the code for DISP0 is completely missing a statement that would set the VMPSWAIT bit. This bit is clearly set in the corresponding code in DMKDSP. Setting this bit in DISP0 resolves the issue with CPWATCH, and no interruptions go missing. Because the bit was not set, the scheduler (DMKSCH) and then DMKDSP2 (or the DISP2 assist) would later attempt to dispatch the virtual machine even though it was in a wait. The fix to #63 allowed DISP2 to bail out of an actual dispatch, but this whole sequence of events meant that CP was not able to reschedule a new TRQBLOK representing the next timer interruption. Thus, the timer interruption the virtual machine was waiting on would never come and was "missing".

Besides the CPWATCH issue, this could potentially also affect guest operating systems or any other virtual machine whose programming depended on periodic interruptions from one of the timers.

The following code from DISP0 shows the absence of code to set the VMPSWAIT bit. Just before testing the V-PSW for Wait, VMPSWAIT is set off, as is done in the corresponding DMKDSP code. But once determining that the V-PSW has Wait set, the VMPSWAIT flag is not set back on, even though DMKDSP (further below) does so.

 /* DMKDSP - CKWAIT */
 /* Clear Wait / Idle bits in VMRSTAT */
 B_VMRSTAT=EVM_IC(vmb+VMRSTAT);
 B_VMRSTAT &= ~(VMPSWAIT | VMIDLE);
 EVM_STC(B_VMRSTAT,vmb+VMRSTAT);
 if(F_VMPSWHI & 0x00020000)
 {
     DEBUG_CPASSISTX(DISP0,WRMSG(HHC90000, "D", "DISP0 : VWAIT - Taking exit #28"));
     /* Take exit 28  */
     regs->GR_L(11)=vmb;
     UPD_PSW_IA(regs, EVM_L(elist+28));   /* Exit +28 */
     CPASSIST_HIT(DISP0);
     EVM_ST(DISPCNT,dlist);
     return;
 }

Here is the corresponding code from DMKDSP:

 CKWAIT   EQU   *  CHECK HERE FOR DISABLED OR IDLE WAIT STATES: %V3M4038
     NI    VMRSTAT,X'FF'-(VMPSWAIT+VMIDLE)   UNFLAG WAIT   %V3M4038
     TM    VMPSW+1,WAIT   STILL IN WAIT ??                 %V3M4038
     BZ    DISPATCH       NO -- GO DISPATCH                %V3M4038
     OI    VMRSTAT,VMPSWAIT   FLAG IN WAIT                 %V3M4038

Therefore, the solution is easy: set the VMPSWAIT bit within the same if-block shown above, which is executed when the V-PSW wait bit is set. The two newly inserted statements follow the "Take exit 28" comment; the rest of the block is unchanged:

 if(F_VMPSWHI & 0x00020000)
 {
     DEBUG_CPASSISTX(DISP0,WRMSG(HHC90000, "D", "DISP0 : VWAIT - Taking exit #28"));
     /* Take exit 28  */
     B_VMRSTAT |= VMPSWAIT;
     EVM_STC(B_VMRSTAT,vmb+VMRSTAT);
     regs->GR_L(11)=vmb;
     UPD_PSW_IA(regs, EVM_L(elist+28));   /* Exit +28 */
     CPASSIST_HIT(DISP0);
     EVM_ST(DISPCNT,dlist);
     return;
 }

This resolves the issue with CPWATCH.
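The CKWAIT sequence from DMKDSP, which DISP0 is supposed to mirror, can be modelled in a few lines of C. The bit values of VMPSWAIT and VMIDLE below are placeholders chosen for illustration, not the real VMRSTAT definitions, and ckwait() is a hypothetical helper:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit values for illustration only; the real VMRSTAT
   flag definitions live in the ECPS:VM headers. */
#define VMPSWAIT 0x20u
#define VMIDLE   0x10u

/* Wait bit as tested by DISP0 in the high word of the virtual PSW. */
#define VPSW_WAIT 0x00020000u

/* Model of DMKDSP CKWAIT: unflag wait/idle, then re-flag VMPSWAIT
   if the virtual PSW is still in a wait state. */
static uint8_t ckwait(uint8_t vmrstat, uint32_t vpsw_hi)
{
    vmrstat &= (uint8_t)~(VMPSWAIT | VMIDLE);  /* NI VMRSTAT,X'FF'-(VMPSWAIT+VMIDLE) */
    if (vpsw_hi & VPSW_WAIT)                    /* TM VMPSW+1,WAIT */
        vmrstat |= VMPSWAIT;                    /* OI VMRSTAT,VMPSWAIT (missing in DISP0) */
    return vmrstat;
}
```

Without the final OR, a waiting virtual machine leaves CKWAIT with VMPSWAIT clear, which is the state the scheduler then misreads as dispatchable.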

hscmisc.c:1218:29: warning: duplicated 'if' condition [-Wduplicated-cond]

Source code is

    if (inst[1] >= 0x29 && inst[1] <= 0x2C)
        addr2 = regs->GR(b2) & ADDRESS_MAXWRAP_E(regs);
    else
    if (inst[1] >= 0x29 && inst[1] <= 0x2C)
        addr2 = regs->GR(b2) & ADDRESS_MAXWRAP(regs);

The second if can never be executed, because its condition duplicates the first. Suggest reworking the code.

double NaN to float conversion problem

I was told to report here an issue that I suspected could be a Hercules emulator problem, and I was told that the program works (does not abort) on actual hardware. This small test program fails on hercules-3.08.2-1.fc20.x86_64 using a Fedora 16 image:

extern void abort(void);

void
test(float f)
{
    float fnan = 0.0f/0.0f;
    if (*(int*)&f != *(int*)&fnan)
        abort();
}

int
main(int argc, char *argv[])
{
        double dnan = 0.0/0.0;
        test(dnan);
        return 0;
}
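When chasing a difference like this, it can help to print the bit patterns instead of only calling abort(). The sketch below is a hypothetical diagnostic aid, not part of the original report. It makes the following assumptions and choices:

- float is IEEE 754 binary32, as it is for Linux on s390/s390x.
- The bits are read with memcpy rather than through an int pointer, because the pointer cast in the test program is technically a strict-aliasing violation.
- The sign, exponent and fraction are printed separately, so that a differing sign bit or payload is easy to spot.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Copy the bits of a float into an integer without violating
   strict aliasing (memcpy instead of *(int *)&f). */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* IEEE 754 binary32: a NaN has all exponent bits set and a non-zero
   fraction; the most significant fraction bit is the quiet bit. */
static int is_quiet_nan_bits(uint32_t u)
{
    return (u & 0x7F800000u) == 0x7F800000u && (u & 0x00400000u) != 0;
}

/* Print sign, exponent and fraction separately. */
static void dump_float(const char *label, float f)
{
    uint32_t u = float_bits(f);
    printf("%s: %08X sign=%u exp=%02X frac=%06X\n", label, (unsigned)u,
           (unsigned)(u >> 31), (unsigned)((u >> 23) & 0xFF),
           (unsigned)(u & 0x7FFFFF));
}
```

Calling dump_float() on both f and fnan inside test(), before the comparison, would show whether Hercules and real hardware differ in the sign bit, the quiet bit, or the payload.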

ctcadpt.c has wrong ethertype for RARP

(Originally found by Joe Monk)

In ctcadpt.h, the ethertype value defined for RARP frames is 0x0835:

#define  ETH_TYPE_IP        0x0800
#define  ETH_TYPE_ARP       0x0806
#define  ETH_TYPE_RARP      0x0835
#define  ETH_TYPE_SNA       0x80D5

The correct ethertype value for a RARP frame is 0x8035. Ref:
https://www.iana.org/assignments/ieee-802-numbers/ieee-802-numbers.xhtml#ieee-802-numbers-1

As a result, the CTC adapter (ctc_lcs.c) does not process RARP frames correctly, because it does not recognize them.

--- a/ctcadpt.h
+++ b/ctcadpt.h
@@ -129,7 +129,7 @@ typedef struct _ETHFRM ETHFRM, *PETHFRM;
 
 #define  ETH_TYPE_IP        0x0800
 #define  ETH_TYPE_ARP       0x0806
-#define  ETH_TYPE_RARP      0x0835
+#define  ETH_TYPE_RARP      0x8035
 #define  ETH_TYPE_SNA       0x80D5
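With the corrected constant, recognizing a RARP frame only requires reading the EtherType. In an untagged Ethernet II frame, the EtherType is the big-endian 16-bit field at offset 12, right after the destination and source MAC addresses. The following is a minimal sketch; frame_ethertype() is a hypothetical helper and not the code in ctc_lcs.c:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_TYPE_IP    0x0800
#define ETH_TYPE_ARP   0x0806
#define ETH_TYPE_RARP  0x8035   /* corrected value (was 0x0835) */
#define ETH_TYPE_SNA   0x80D5

/* Hypothetical helper: return the EtherType of an untagged Ethernet II
   frame (6-byte destination MAC, 6-byte source MAC, then 2 bytes of
   type, big-endian), or -1 if the frame is too short. */
static int frame_ethertype(const uint8_t *frame, size_t len)
{
    assert(frame != NULL);
    if (len < 14)
        return -1;
    return (frame[12] << 8) | frame[13];
}
```

A RARP frame on the wire carries the bytes 80 35 at offset 12, so a comparison against 0x0835 can never match it.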

Why has Hercules turned into a Fish Fest?

Can you please explain to me why a project that started off as a decent working Linux project has morphed into a broken Windows project.

There is no need whatsoever for a GUI or for loadable modules. It seems to me that the whole direction is merely to support a crappy Windows GUI that nobody asked for or needs. Once you are working in the emulation, so to speak, all there is is devices and terminals, and devices and terminals have no obvious need for any GUI whatsoever.

It is impossible to build the spinhawk Hercules without dynamic module support, and this is utterly ridiculous.

You can tell by looking at the hyperion fork that Hercules has become a joke. In order to support a crappy Windows telnet client, the console telnet code has become so broken that vanilla telnet clients do not work any longer.

Why have you allowed any of this mess to happen? And what are you going to do about it?

Obviously not every contribution in the New Era has been disastrous, but there have been enough to warrant a complete overhaul of the code and a return to sanity.

dasdutil.c: capacity_calc possible bug (not enforcing track record limits)

Hi Roger,
I'm probably in the tiny minority (besides Tommy Sprinkle) of Hercules users who do a lot of "No O.S." / "bare metal" testing, exploring, and learning.

For example, with KL=0 and DL=74 I get:

Device    KeyLen  DataLen (dec)  Records Written  Expected (allowed) Number of Records
--------  ------  -------------  ---------------  ------------------------------------
3330      0       74             161
3340      0       74             105              ??
3350      0       74             236              74
3375      0       74             255              74
3380      0       74             255              74
3390-1    0       74             255              ???
3390-3    0       74             255              ???
3390-9    0       74             255              ???
9345      0       74             255              ???

https://github.com/mstram/hercules-bare-metal-DASD-tests

https://github.com/mstram/hercules-bare-metal-DASD-tests/blob/master/test-results.txt

Virtual interval timer incorrectly updated with ECPS:VM active

The following bug issue and text is reproduced from Hyperion issue 187 so that this bug may be documented and fixed in Spinhawk.

Ivan,

I believe that I have found the cause of the interval timer problems that occur when ECPSVM is YES. It took me about 3 days to track it down, but I think I have it. The solution is simple, but I would like to run it by you, since you were the author of the ECPS support, to see if you agree with my conclusions.

Briefly, a little background. Around three years or so ago, a member of the VM group posted that he could not get STIMER to work correctly in CMS when running ECPS, and of course it worked fine when ECPS was turned off. That issue is easy to recreate and I was able to validate that user's claims. The cause was that with ECPS active, the real machine's interval timer value was being stored in the virtual interval timer for the CMS user. This means that CP's time slice value in the real timer was being imposed upon the virtual machine user, regardless of what value the user wanted to store in his own timer for the purposes of his STIMER application. (In this case, the user wanted to wait for a few seconds). No matter what a CMS application stored in the virtual timer, ECPS overrode it with the real timer's value.

Of course that should not occur. The virtual timer should be decremented at the same rate as the real timer, but the contents of the virtual timer should not be reset to the real value.

The cause appears to be in the Hercules source in clock.c. A small snippet of code is pasted below for reference. The last statement shown is the one in question.

 #if defined(FEATURE_INTERVAL_TIMER)
  static INLINE void ARCH_DEP(_store_int_timer_2) (REGS *regs,int getlock)
  {
  S32 itimer;
   .
   .
   itimer=int_timer(regs);
   STORE_FW(regs->psa->inttimer, itimer);

 #if defined(FEATURE_ECPSVM)
  if(regs->ecps_vtmrpt)
  {
     vtimer=ecps_vtimer(regs);
     STORE_FW(regs->ecps_vtmrpt, itimer);

What is happening here is that the current real interval timer value is obtained and stored in PSA, and then if ECPS is active, the current virtual timer value is obtained but the real timer value is stored in the virtual timer location.

The problems are resolved if I change the last statement to:

   STORE_FW(regs->ecps_vtmrpt, vtimer);

It is that simple. All of your timer calculations elsewhere and in ecpsvm.c appear to be done correctly. I can find no errors in processing after this change, and the STIMER example works perfectly and with whatever proper interval I specify. The results are now the same with ECPS on or off.
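The intended behaviour is that both timers run at the same rate, but each storage location receives its own value. The toy model below illustrates that. The struct, field names and units are hypothetical; the real code in clock.c works on S/370 timer units and PSA fields:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy model: one real interval timer owned by CP and one
   virtual interval timer owned by the guest. Units are arbitrary. */
struct timers {
    int32_t real_itimer;   /* CP's time-slice value */
    int32_t virt_itimer;   /* value the guest set, e.g. via STIMER */
};

/* Both timers are decremented at the same rate... */
static void advance(struct timers *t, int32_t elapsed)
{
    assert(elapsed >= 0);
    t->real_itimer -= elapsed;
    t->virt_itimer -= elapsed;
}

/* ...but each location receives its own value. Storing the real value
   into the virtual slot (the original bug) would discard the guest's
   interval. */
static void store_timers(const struct timers *t, int32_t *psa_inttimer,
                         int32_t *ecps_vtmr)
{
    *psa_inttimer = t->real_itimer;
    *ecps_vtmr    = t->virt_itimer;   /* vtimer, not itimer */
}
```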

I'm pretty certain of my facts and findings here, but I always have a bit of doubt because I am not sure of your intentions at the time and there are no comments in the code to help clarify. If you could weigh in and say whether you agree with this change or not it would be very helpful.

Regards,
Bob

general2.c:2308: bad condition ?

general2.c:2308:28: warning: logical 'and' of mutually exclusive tests is always false [-Wlogical-op]

Source code is

if(utf16[2] < 0xdc && utf16[2] > 0xdf)

A possible fix:

if(utf16[2] < 0xdc || utf16[2] > 0xdf)
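This sketch assumes that utf16[] holds the big-endian bytes of the UTF-16 input, so that utf16[2] is the high-order byte of the second code unit. Under that assumption, the check is meant to reject anything that is not a low (trailing) surrogate, i.e. anything outside U+DC00..U+DFFF. The sketch shows both forms of the test, using hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: in big-endian UTF-16, a low surrogate is a code
   unit in U+DC00..U+DFFF, so its high-order byte is 0xDC..0xDF. */
static int is_low_surrogate_hi(uint8_t hi)
{
    return hi >= 0xDC && hi <= 0xDF;
}

/* The error test is the negation. By De Morgan,
   !(hi >= 0xDC && hi <= 0xDF) == (hi < 0xDC || hi > 0xDF);
   writing '&&' here instead makes the test always false. */
static int bad_low_surrogate_hi(uint8_t hi)
{
    return hi < 0xDC || hi > 0xDF;
}
```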

Request to add S37x to 3.13

I have my own customized MVS 3.8 system, and thus TK4- is not a good fit for me. I am looking to make some modifications to implement dual address space, and I would like a stable (not regularly changing) Hercules to do this work on. As I am currently unemployed due to layoffs, I can work on this, but it will take me a few months to relearn my way around the code. I also seem to remember that Jurgen has made some updates to his version of Hercules.

PATCH: Fix all-zero MAC bug in LCS cmd reply frame handling

http://hercgui.pedroramos-si.com/downloads/3.10-lcs-zeromac.patch

Further details:

https://groups.yahoo.com/neo/groups/hercules-390/conversations/messages/73963

Issues with write_socket() and read_socket() in hsocket.c

I am attempting to create a new device emulator for Hercules based on commadpt.c. I am compiling under VMS, but I believe the same issues exist on the major Hercules platforms. Like commadpt.c, I am calling write_socket() and read_socket(). My main reason for using these functions is that the code seems to imply that using send()/recv() or read()/write() directly would cause difficulties on one or another of the main platforms. Firstly, is this really the case? Both sets of routines seem to work equally well on my platform. Would send() and recv() be suitable for use on any and all platforms?

Secondly and more importantly, it appears that write_socket() and read_socket() only work correctly with blocking sockets. In the case of non-blocking sockets, such as in commadpt.c and in my code, write_socket() can end up writing some of the requested data to the network and returning -1 to indicate an error, with errno containing EWOULDBLOCK. The problem is that there is no indication of how much data was successfully written and how much was not, so there is no possibility of retrying the write later and getting it to work correctly. If the code is modified to always return the amount of data successfully written, the problem can be avoided and errors can still be detected by the caller examining errno if the returned value indicates the amount of data written was less than requested.

A similar issue exists with read_socket() but I have not been able to come up with a suitable fix as there is an added complication of detecting when the connection has been closed by the other end.
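The following is a minimal sketch of the return convention proposed above. It assumes POSIX descriptors and plain read()/write(). The names write_some(), read_some() and set_nonblocking() are hypothetical and are not the functions in hsocket.c. On Windows, socket handles cannot be passed directly to the C runtime read()/write(), so a real version would have to use send()/recv() there. The read side reports an orderly close through a separate flag, because a 0 return from read() means the other end closed the connection, which is different from having no data available yet:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: put a descriptor into non-blocking mode. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Returns the number of bytes actually written. If that is less than
   len, the caller inspects errno (EAGAIN/EWOULDBLOCK: retry the rest). */
static ssize_t write_some(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t done = 0;
    assert(buf != NULL || len == 0);
    while (done < len) {
        ssize_t n = write(fd, p + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;        /* interrupted: just retry */
            break;               /* would block or real error: keep progress */
        }
        done += (size_t)n;
    }
    return (ssize_t)done;
}

/* Same convention for reading; *peer_closed is set when read()
   returns 0, i.e. the other end closed the connection. */
static ssize_t read_some(int fd, void *buf, size_t len, int *peer_closed)
{
    char *p = buf;
    size_t done = 0;
    *peer_closed = 0;
    while (done < len) {
        ssize_t n = read(fd, p + done, len - done);
        if (n == 0) {
            *peer_closed = 1;
            break;
        }
        if (n < 0) {
            if (errno == EINTR)
                continue;
            break;
        }
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```

If a caller gets back fewer bytes than it asked for, it can keep the unsent remainder. It can then retry once select() or poll() reports that the socket is writable again.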

ARCHLVL ENABLE BITxx

Hi,
Are there any plans to implement ARCHLVL ENABLE BITxx ?

This is apparently what is needed to run Linux390 guests.

At least, that's what I've found with SLES 11.

I know it's present in Hyperion, but I'm running into another error running Hyperion (HHC00136W Error in function setenv(): cannot set CUU: Not thread safe--setting disabled)

Mike

Minor bug issues in ECPS:VM assist DISP2

The following bug issue and text is reproduced from Hyperion issue 228 so that this bug may be documented and fixed in Spinhawk.

Several errors have been discovered in the original DISP2 assist code. They manifest themselves in strange ways, which probably explains why they were not found until recently.

For starters, this block of code from DISP2:

/* If an Extended VM, Load CRs 3-13 */
/* CR6 Will be overwritten in a second */
if(B_VMESTAT & VMV370R)
{
    for(i=4;i<14;i++)
    {
        regs->CR_L(i)=EVM_L(F_ECBLOK+(3*4)+(i*4));
    }
}

There are three things wrong in the code above:
1. The referenced byte B_VMESTAT is incorrect; bit VMV370R is part of byte VMPSTAT.
2. The comment says to load CR 3-13, but the code loads CR 4-13. Which is it? The correct answer is CR 4-13, per the corresponding code in DMKDSP.
3. The expression F_ECBLOK+(3*4)+(i*4) is incorrect. Specifically, the +(3*4) term is the bad part: it displaces the load point, causing the wrong control register contents to be loaded into the corresponding control registers.

The corrected code will be:

/* If an Extended VM, Load CRs 4-13 */
/* CR6 Will be overwritten in a second */
if(B_VMPSTAT & VMV370R)
{
    for(i=4;i<14;i++)
    {
        regs->CR_L(i)=EVM_L(F_ECBLOK+(i*4));
    }
}
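The following is a quick check of the displacement arithmetic. It assumes, as the corrected code implies, that the ECBLOK holds control register i at offset i*4 from F_ECBLOK. The helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Offset of CR i in the ECBLOK, assuming (as the corrected code
   implies) that CR i is saved at displacement i*4. */
static uint32_t ecblok_offset_fixed(int i)
{
    assert(i >= 0 && i < 16);
    return (uint32_t)(i * 4);
}

/* Offset produced by the original expression F_ECBLOK+(3*4)+(i*4). */
static uint32_t ecblok_offset_original(int i)
{
    assert(i >= 0 && i < 16);
    return (uint32_t)(3 * 4 + i * 4);
}
```

With the extra +(3*4), CR4 is loaded from the slot holding CR7. In general, CR i receives the contents saved for CR i+3.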

The next issue refers to the following block of code:

SET_PSW_IA(regs);
/* Dispatch..... */
DEBUG_CPASSISTX(DISP2,logmsg("DISP2 - Next Instruction : %2.2X\n",ARCH_DEP(vfetchb)(regs->psw.IA,USE_PRIMARY_SPACE,regs)));

The reported problem occurred after ECPS:VM debugging messages were turned on (via the command 'ecpsvm DEBUG disp2'). When CMS was then IPLed, the following messages were received from CP on the virtual machine console:

OPERAND MISSING OR INVALID
CMS HRC SYSTEM - MAY 2, 2006
DMSITP143T OPERATION EXCEPTION OCCURRED AT 01D280 IN SYSTEM ROUTINE WAITRD, RE-IPL CMS.
CP ENTERED; DISABLED WAIT PSW '00020000 4001B0D6'

It turns out that the DEBUG_CPASSISTX statement in the code block above is the culprit. This statement is only executed when DISP2 debug is active. Within the statement, the ARCH_DEP(vfetchb) is the real source of the problem.

The ARCH_DEP(vfetchb) is attempting to fetch the opcode byte of the next instruction to be executed once the virtual machine is dispatched by DISP2; the DEBUG_CPASSISTX statement simply serves to display the message within and the opcode byte since debugging was active. This becomes an issue when a page fault occurs trying to fetch the opcode byte.

This is difficult to explain, but recall that as this code is executed, the CP assist DISP2 is in progress. The real machine is in supervisor state and CP is in control; from the real machine's point of view, all it has done is execute an E611 opcode assist instruction (which happens to invoke DISP2). DISP2 is preparing to dispatch a virtual machine user, and just before this code block is executed, the user's register contents have been loaded into the GPRs and the real PSW has been built and loaded into the PSW (the 'runpsw'). This means that DAT is now on and the machine has been switched to problem state.

Then the flow encounters this DEBUG_CPASSISTX statement containing the ARCH_DEP(vfetchb). If the referenced storage is paged in there is no problem. If it is not, then a page fault occurs. When ARCH_DEP(vfetchb) encounters a page fault, it does not return to the caller (that is, it will not return to the MSGBUF( ) function within DEBUG_CPASSISTX( ). This also means the debug message is not displayed). Instead, ARCH_DEP(vfetchb) presents a page fault to the real machine. CP gets control at this point, at the program check new PSW. CP notes the page fault and believes a user was in control (because the interrupt PSW is in problem state) and dutifully adjusts the instruction address in the virtual PSW based on the ILC and performs the usual page fault handling to bring in the page. Then the dispatcher is called again to dispatch the same user and eventually re-executes the E611 instruction to invoke the DISP2 assist. This time, as the offending DEBUG_CPASSISTX( ) function containing the ARCH_DEP(vfetchb) is encountered, there is no page fault as this has been resolved by CP. The debug message is displayed and DISP2 proceeds to dispatch the virtual machine user.

Except that the virtual PSW had been decremented by the length of the E611 instruction! The actual virtual machine user's opcode was never executed in order to set a proper ILC! The page fault occurred before the actual dispatch completed. The last instruction executed by the real machine was E611 which has an ILC of 6. But because the run PSW had been loaded before the page fault occurred, CP thought it was a valid page fault coming from a virtual machine and CP correctly adjusted the virtual PSW for the ILC. This incorrect backing up of 6 bytes caused the virtual machine to be dispatched with an instruction address pointing in the middle of another instruction. This caused the error messages on the virtual machine console shown above. (Additionally, the DEBUG_CPASSISTX message that was finally displayed instead displays the wrong opcode - six bytes earlier).

There are two possible solutions for this issue. The simplest is to remove the offending DEBUG_CPASSISTX( ) statement entirely. There is little value in showing the next opcode at dispatch anyway, and we cannot guarantee that ARCH_DEP(vfetchb) returns when a page fault occurs. As long as it might not return, this problem can occur.

The other solution could be to place another statement prior to DEBUG_CPASSISTX( ) to call ARCH_DEP(translate_addr) to determine if the virtual address pointing to the opcode byte is available. This function provides a return code and does not generate a program check. If the return code says translation was not available then simply avoid issuing the DEBUG_CPASSISTX( ). However, given the dubious value of displaying the opcode of the first dispatched instruction as well as the high frequency of calls to DISP2, the added overhead to do the ARCH_DEP(translate_addr) does not seem worth it. Therefore, I will resolve this issue by removing the offending DEBUG_CPASSISTX( ) function.

Many thanks to Peter Coghlan for discovering these issues and making me aware of them. Peter did the vast majority of the legwork to discover what was happening and why these code blocks were causing problems, including the remarkable scenario of the bogus page fault in a CP assist instruction! I was able to recreate the scenarios and verify that the solutions resolved the issues.

No configure script (only configure.AC, which doesn't work cleanly)

The project has a configure.AC and config.sh script, but no configure script, and no instructions on how to successfully use autoconf to create one.

Environments

MacOS Sierra (10.12.6)
Ubuntu 16.04.2 LTS (GNU/Linux 4.4.0-87-generic x86_64)

Steps to Reproduce

Install autoconf via MacPorts (macOS) or apt-get (Ubuntu).
Install Sierra Xcode (macOS).
Execute autoconf.

Expected Result

Working configure script is created which can configure a build of Hercules.

Actual Result

Numerous undefined-variable messages are produced, and a broken configure script is created.

./configure missing

I just cloned the repository and the configure program (./configure) is missing.

Building a 32-bit Windows Version fails

Intermittent PRG001 failures in VM/370 with ECPS:VM active

The following bug issue and text is reproduced from Hyperion issue 191 so that this bug may be documented and fixed in Spinhawk.

Very occasional PRG001 failures can occur in CP when using the ECPS:VM assist. A prominent feature of all of these failures is what appears to be an address stored into location 0 real, in the PSA. This address points to a word within a dispatcher parameter list used by the assist. This alteration of word 0 does not itself cause the system to fail. However, there are other storage overlays that are also occurring and sometimes CP code is overlaid resulting in the PRG001 abends.

It turns out that the cause of these overlays is mis-coded C macros in the DISP2 assist in ecpsvm.c. If that assist encounters a problem trying to fret a CPEXBLOK, a special code path is invoked to store some CPEXBLOK values into a dispatcher parameter list, so that CP itself can fret the block upon resuming control from the assist. The mis-coded macros cause the CPEXBLOK values to be used as addresses of "where to store", and the parameter list addresses are the values that are splattered all over real storage. One of the CPEXBLOK words contains 0 (CPEXBKUP[13]), resulting in the store of a parameter list address at location 0.

The solution is simple. Reverse the operands in the EVM_ST macros. The original code is below, followed by the corrected code.

Original code:

  EVM_ST(dl+40,CPEXBKUP[12]);
  EVM_ST(dl+44,CPEXBKUP[13]);
  EVM_ST(dl+48,CPEXBKUP[14]);
  EVM_ST(dl+52,EVM_L(F_CPEXB+12)); /* DSPSAVE + 12 = CPEXADD */
  EVM_ST(dl+56,CPEXBKUP[0]);
  EVM_ST(dl+60,CPEXBKUP[1]);

Corrected Code:

  EVM_ST(CPEXBKUP[12],dl+40);
  EVM_ST(CPEXBKUP[13],dl+44);
  EVM_ST(CPEXBKUP[14],dl+48);
  EVM_ST(EVM_L(F_CPEXB+12),dl+52); /* DSPSAVE + 12 = CPEXADD */
  EVM_ST(CPEXBKUP[0],dl+56);
  EVM_ST(CPEXBKUP[1],dl+60);
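The effect of the reversed operands can be reproduced with a toy storage array. This is a hypothetical model, not the Hercules EVM_ST macro. It only follows the (value, address) operand order implied by the corrected code, and stores fullwords big-endian as S/370 does:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy real storage; Hercules has its own storage model. */
static uint8_t mainstor[4096];

/* Model of a fullword store with operands (value, address). */
static void evm_st(uint32_t value, uint32_t addr)
{
    assert(addr + 4 <= sizeof mainstor);
    mainstor[addr]     = (uint8_t)(value >> 24);
    mainstor[addr + 1] = (uint8_t)(value >> 16);
    mainstor[addr + 2] = (uint8_t)(value >> 8);
    mainstor[addr + 3] = (uint8_t)value;
}

/* Model of a fullword load from 'addr'. */
static uint32_t evm_l(uint32_t addr)
{
    assert(addr + 4 <= sizeof mainstor);
    return ((uint32_t)mainstor[addr] << 24) | ((uint32_t)mainstor[addr + 1] << 16)
         | ((uint32_t)mainstor[addr + 2] << 8) | mainstor[addr + 3];
}
```

With CPEXBKUP[13] equal to 0, the original EVM_ST(dl+44,CPEXBKUP[13]) stores the parameter list address dl+44 into location 0. The corrected order stores the 0 into the parameter list.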

Other notes about this problem:
Most users that choose to run the ECPS:VM assist as is would not encounter this abend because the FREE/FRET storage trap (HRC0035DK) is enabled in VM 5-pack and in VM Sixpack. When this trap is enabled, several of the ECPS:VM assists are disabled because of incompatibility with the trap. One of these disabled assists is DISP2. Hence, trap users would never invoke the assist DISP2 and would never encounter the storage overlay due to the errors in the assist code.

QDIO/QETH backport request from Hyperion

This is an enhancement request.
Is there any plan to backport to Spinhawk the QDIO/QETH enhancements present in Hyperion ?
The current QETH code in Hyperion (including QDIO interrupt control in the channel code) is functional, at least for Layer 2 support. It has been tested under z/VM 6.3, in VSWITCH mode and in direct z/VM TCPIP control mode, as well as with Linux running directly under Hercules and under z/VM.
The code in hyperion channel code includes QDIO Signaling, QDIO interrupt support and QDIO Thin Interrupt support (Thin Interrupt means an interrupt does not require a TSCH to be cleared).

However, I have no idea how much work it would take to port that particular portion without importing anything else that might have changed in the I/O support code.

Cannot define a full 3350 emulated disk (somitcw)

C:\Trash>dasdinit -z test.3350 3350 test50 1026
HHCDU044I Creating 3350 volume TEST50: 1026 cyls, 30 trks/cyl, 19456 bytes/track
HHCDU028E device type 3350 not found in dasd table
HHCDI002I DASD initialization unsuccessful

ECPS:VM incorrectly altering behavior of SPKA instruction

The following issue and text is reproduced from Hyperion issue 186, so that this bug can be documented and fixed in Spinhawk.

About 4 years ago, I reported a bug to the Hercules Yahoo discussion group about the ECPS:VM feature when running VM under VM because the SPKA instruction was failing to execute (that is, acting as if it were a no-op).

Several days ago I decided to track down this issue because it keeps me from running ECPS and I really want to do so (with ASSIST ON SVC NOTMR of course). I'll spare you all the gory details of how I tracked it down but the bottom line is I found the problem and I have a solution. I'll try to keep it reasonably brief but some explanatory detail is necessary.

It turns out that when VM is IPLed, there is an LCTL 2,3 instruction very early in the IPL. This code is in DMKCKP just past label CKP002. Just after IPL, the LCTL instruction is located at or near absolute address X'DB8'. CP is trying to initialize the channel masks in CR2 and, for some reason, also loads FFs into CR3.

The ECPSVM support code in ecpsvm.c in function ecpsvm_dolctl( ) has case statements in order to handle the loading of individual control registers. In particular, this brief snippet of code:

  case 3: /* DAS Control regs (not used under VM/370) */
  case 4:
  case 5:
  case 7:
  ocrs[j]=crs[j];
  rcrs[j]=crs[j];
  break;

Basically, nothing special is being done for CR3-7 and the contents requested by the LCTL instruction are copied into the virtual machine control registers (ultimately to the ECBLOK), and also copied to the real machine's (Hercules') control registers.

The flaw here I believe, is the copying of the new control register values into the real control registers. Remember, we're issuing the LCTL in a virtual machine and ECPS wants to handle it because a supported privileged instruction was issued in real problem state. In no case should a virtual machine be allowed to directly alter the real control registers. If CP were simulating the LCTL execution instead of ECPS, CP loads CR3 into the ECBLOK for the virtual control registers for the virtual machine, but does not alter the real CR3. But the ECPS:VM support is allowing the alteration because of the "rcrs[j]=crs[j];" statement above.

Once CR3 is filled with FFs, then the SPKA instruction begins to fail, and it fails in all virtual machines. This is because there is code in the SPKA instruction handling in control.c that checks CR3 and the PSW key mask bits. Since there is now a non-zero value in CR3, this has meaning. More on this in a bit.

I believe the solution to the problem is to reject the LCTL instruction handling by ECPS when CR3-7 are specified and kick it back to CP to simulate it. At first I thought that simply removing the second-to-last line, and therefore not updating the real CRs, would be sufficient, and that certainly works well in my testing. But then it occurred to me that such a solution might not be a good idea, because someone might be running VM/SP. As I recall, VM/SP supports DAS, but I am not sure what CP's exact handling of LCTL is in VM/SP with regard to CR3-7. The safest route is therefore to kick it back and let whatever CP version is running handle it. All other LCTL instructions that do not specify these CRs could still benefit from ECPS, and these are the vast majority of LCTLs issued.

Therefore, my proposed solution is:

  case 3: /* DAS Control regs (not used under VM/370) */
  case 4:
  case 5:
  case 7:
  return 1; /* let CP deal with it */
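Under the proposed change, ECPS:VM handles an LCTL only when its register range avoids CR3, CR4, CR5 and CR7. The hypothetical helper below makes that decision explicit. It also accounts for the fact that LCTL R1,R3 wraps around from CR15 to CR0 when R3 is less than R1:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: LCTL R1,R3 loads CRs R1 through R3, wrapping
   from CR15 to CR0. Returns true if the range touches CR3, CR4, CR5
   or CR7, in which case the assist would return 1 and let CP
   simulate the instruction. */
static bool lctl_needs_cp(int r1, int r3)
{
    assert(r1 >= 0 && r1 < 16 && r3 >= 0 && r3 < 16);
    for (int i = r1; ; i = (i + 1) & 15) {
        if (i == 3 || i == 4 || i == 5 || i == 7)
            return true;
        if (i == r3)
            break;
    }
    return false;
}
```

For the LCTL 2,3 in DMKCKP, this returns true, so the instruction would be left to CP. An LCTL covering only CR0-CR1 or CR8-CR2 (wrapping) would still be handled by the assist.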

Regarding the SPKA instruction handling in control.c and the check it does against CR3: this is questionable behavior in my mind. I think CR3 should only be checked when the DAS feature is available. Otherwise, it should not be checked because CR3-CR7 are not defined in earlier S/370 machines. However, I am also aware that the Principles of Operation says that undefined bits should be set to zero and in this case that more or less takes care of the issue if that guideline is followed. I mention it here for completeness.

not possible to build with disable-dynamic-load

pr@dev64:~/hercules/hercules/archiv/x/hercules-3.10$ ./configure --disable-dynamic-load --disable-external-gui && make
...
/bin/bash ./libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I. -I./decNumber -I./softfloat -DPKGDATADIR="/usr/local/share/hercules" -DMODULESDIR="/usr/local/lib/hercules" -W -Wall -O3 -march=k8 -fomit-frame-pointer -MT hdl.lo -MD -MP -MF .deps/hdl.Tpo -c -o hdl.lo hdl.c
gcc -DHAVE_CONFIG_H -I. -I. -I./decNumber -I./softfloat -DPKGDATADIR="/usr/local/share/hercules" -DMODULESDIR="/usr/local/lib/hercules" -W -Wall -O3 -march=k8 -fomit-frame-pointer -MT hdl.lo -MD -MP -MF .deps/hdl.Tpo -c hdl.c -fPIC -DPIC -o .libs/hdl.o
hdl.c:60:36: error: unknown type name 'HDLINS'
hdl.c:58:13: warning: 'hdl_didf' declared 'static' but never defined [-Wunused-function]
make[2]: *** [hdl.lo] Error 1
make[2]: Leaving directory '/home/pr/hercules/hercules/archiv/x/hercules-3.10'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory '/home/pr/hercules/hercules/archiv/x/hercules-3.10'
make: *** [all] Error 2

pr@dev64:~/hercules/hercules/archiv/x/hercules-3.10$ uname -a
Linux dev64 3.8.0-35-generic #50-Ubuntu SMP Tue Dec 3 01:24:59 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

U - disassemble command not outputting line breaks on Win xp

I downloaded http://www.smrcc.org.uk/members/g4ugm/snapshots/Hercules-R3.08-W32-RC3.ZIP.

The U (disassemble) command is not outputting line breaks when given a range, e.g. u 2000.20.

I'm on Windows XP SP3.

Mike

typo in URL in man page

The hercules homepage URL in the manpage reads http://www.hercules-s390.eu instead of http://www.hercules-390.eu

The following pull request fixes the typo:

#88

MVS 3.8J intermittent hangs during IPL under VM/370 with ECPS:VM active

The following bug issue and text is reproduced from Hyperion issue 188 so that this bug may be documented and fixed in Spinhawk.

MVS will occasionally hang after message IGF992I when IPLed in a virtual machine with ECPS:VM active. The hang is very intermittent and perhaps occurs once in 10 tries.

This issue is caused by an incorrect address in ecpsvm.h of the CP field named APSTAT2 in PSA. The correct address of APSTAT2 is X'69B'. ecpsvm.h currently has the address as X'69D'.

Correcting the improper address in the define for APSTAT2 resolves the problems with MVS. ECPS uses this field to indicate to CP when the TLB must be purged, and the same byte holds other flag bits as well. In addition, the improper address was causing bits to be improperly set in a different byte in the PSA, which could cause other unpredictable problems when using ECPS:VM.

Incorrect code in ecpsvm.h:

  /* PSA + 69D : APSTAT2 - Machine check recov & PTLB Required */
  #define APSTAT2 0x69D

Corrected code:

  /* PSA + 69B : APSTAT2 - Machine check recov & PTLB Required */
  #define APSTAT2 0x69B

Cross compilation not possible

Currently, cross compiling is not possible.
Usually, autoconf projects can be cross compiled by giving the configure option --target=....,
but this does not work with the spinhawk code, because configure tries to execute temporary test programs, which is not possible when cross compiling.
It seems that the autoconf files need updating.

Concretely:

• Use AC_CANONICAL_SYSTEM instead of AC_CANONICAL_HOST in configure.ac (AC_CANONICAL_SYSTEM calls AC_CANONICAL_HOST internally). This will solve most of the issues.
• Check the usage of host_os/host_cpu/host_vendor, as most probably target_os/target_cpu/target_vendor should really be used.

configure hangs

configure stops at this:

checking whether getopt wrapper kludge is necessary...

This is Ubuntu 13, using gcc 4.7.3. Please let me know if you need any more information.

Building 3.13 on Linux - where is autogen.sh?

I downloaded hercules-3.13.tar.gz from hercules-390.eu to build on Linux, which is not my usual platform. My fading memory and the installation guide at hercules-390.eu/hercinst.html suggest I should execute ./autogen.sh, but I cannot find autogen.sh in 3.13, while it is present in 3.12. I hunted around for any documentation that might alert me to a change in the installation procedure, but I can't find any either. Am I missing something?

Can I get away with going straight to ./configure ?

(I have successfully built 3.13 on VMS some time ago but I use my own build procedure for that)

ctcadpt.c:2240: bad test ?

ctcadpt.c:2240:20: warning: logical 'and' of mutually exclusive tests is always false [-Wlogical-op]

Source code is

if( (argc < 3) && (argc > 5) )

Maybe better code:

if( (argc < 3) || (argc > 5) )
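A minimal standalone check of the two conditions follows. The function names are hypothetical and not taken from ctcadpt.c, and the accepted range of 3 to 5 arguments is read off the condition itself. The `&&` form can never be true, which is what -Wlogical-op reports, while the `||` form rejects counts outside the range.

```c
#include <stdbool.h>

/* As written at ctcadpt.c:2240: no int is both below 3 and above 5,
   so this is always false and a bad argument count is never rejected. */
static bool argc_out_of_range_as_written(int argc)
{
    return (argc < 3) && (argc > 5);
}

/* Suggested form: true when argc lies outside the accepted range 3..5. */
static bool argc_out_of_range_fixed(int argc)
{
    return (argc < 3) || (argc > 5);
}
```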

Program received signal SIGSEGV, Segmentation fault.

When redirecting stdin, Hercules crashes in panel.c.

0xb7ca9db3 in lines_scrolled () at ../hercules-3.11/panel.c:512
512         if (topmsg->msgnum <= curmsg->msgnum)
(gdb) where
#0  0xb7ca9db3 in lines_scrolled () at ../hercules-3.11/panel.c:512
#1  visible_lines () at ../hercules-3.11/panel.c:519
#2  is_currline_visible () at ../hercules-3.11/panel.c:524
#3  0xb7c87936 in do_panel_command (cmd=cmd@entry=0xb7fac7c0 <cmdline>) at ../hercules-3.11/panel.c:669
#4  0xb7caf140 in panel_display_r () at ../hercules-3.11/panel.c:2522
#5  0xb7ca3e28 in impl (argc=argc@entry=1, argv=argv@entry=0xbffff5f4) at ../hercules-3.11/impl.c:632
#6  0x08049197 in main (ac=1, av=0xbffff5f4) at ../hercules-3.11/bootstrap.c:32
(gdb) print topmsg 
$1 = (PANMSG *) 0x0
(gdb) print curmsg 
$2 = (PANMSG *) 0x0
(gdb) up 4
#4  0xb7caf140 in panel_display_r () at ../hercules-3.11/panel.c:2522
2522                                    do_panel_command(cmdline);
(gdb) print cmdline
$3 = "r 0", '\000' <repeats 253 times>
(gdb) print msgbuf 
$4 = (PANMSG *) 0xb726c008
(gdb) print wrapped
$5 = 0
(gdb) print numkept 
$6 = 0
(gdb) print sysblk .panel_init 
$7 = 1

32-bit Ubuntu 14.04

Hyperion also (still?) has this bug when not run in daemon mode. I haven't been able to figure out where it is. Hyperion is about to bypass logger and panel when run in daemon mode. This works perfectly with all kinds of redirections.

I've looked for uses of isatty(), but none seem to be related to stdin. Could this be a race between the user and the buffered messages from startup?

Herc 3.08 repeatedly issues HHCLC042E Port 00: Read error: Bad file descriptor

... for a hardware network error (disconnected device, etc.),

whereas the dev version (sandhawk) issues the messages shown below only once.

Some kind of option would probably be best?

(I'm not sure whether there is an option to govern the message frequency. I looked at MSGLEVEL, but it doesn't seem to be it.)

HHC00140E Invalid net device name
HHC00140E Invalid net device name
HHC00140E Invalid net device name
HHC00140E Invalid net device name

HHC00944E CTC: lcs device read error from port 00: Bad file descriptor
HHC00941E CTC: ioctl SIOCGIFHWADDR failed for device : Bad file descriptor

Dynamic loader, instructions do not get actually loaded.

I'm currently testing J. Winter dyn75 (75/TCPIP instruction for MVS3.8j) under hercules-3.13.

On both the GitHub master and the reference hercules-3.13.tar.gz distribution, new dynamically loadable instructions get compiled and linked correctly and are loaded into the optable with the "ldmod" command, as "lsmod" shows.

Unfortunately, although correctly loaded, they never actually get executed, as a simple test with the reference TESTINS.C/BARF instruction shows.

I did some debugging and found that the function "set_opcode_pointers()" in cpu.c is called ONLY at emulator initialization time, from cpu.c:1989, and never again.

So the set_opcode_pointers() in opcode.c is not called to initialize the "regs->s370_opcode_table" runtime table AFTER the "ldmod" command loads a new instruction, and the instruction is NEVER actually seen at execution time.
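The mechanism described above can be modelled in a few lines. This is a hypothetical sketch, not Hercules code: `master_table` stands in for the optable that "ldmod" updates, `cpu_regs.opcode_table` for the per-CPU runtime copy, `copy_opcode_pointers()` for set_opcode_pointers(), and opcode 0xFF for the test instruction. A CPU keeps executing from its old snapshot until the copy is rebuilt.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model; names are illustrative, not the Hercules API. */
typedef int (*insn_func)(void);

enum { NUM_OPCODES = 256 };

static int op_undefined(void) { return -1; } /* stand-in for an operation exception */
static int op_barf(void)      { return 42; } /* stand-in for a dynamically loaded instruction */

static insn_func master_table[NUM_OPCODES];  /* table that "ldmod" updates */

typedef struct {
    insn_func opcode_table[NUM_OPCODES];     /* per-CPU runtime copy */
} cpu_regs;

static void init_master(void)
{
    for (size_t i = 0; i < NUM_OPCODES; i++)
        master_table[i] = op_undefined;
}

/* Model of set_opcode_pointers(): snapshot the master table into the CPU. */
static void copy_opcode_pointers(cpu_regs *regs)
{
    memcpy(regs->opcode_table, master_table, sizeof master_table);
}

/* Model of "ldmod": only the master table changes. */
static void load_instruction(unsigned char opcode, insn_func f)
{
    master_table[opcode] = f;
}

/* Execution always goes through the per-CPU copy. */
static int execute(const cpu_regs *regs, unsigned char opcode)
{
    return regs->opcode_table[opcode]();
}
```

In the sketch, opcode 0xFF still reaches `op_undefined` after `load_instruction()`; it reaches `op_barf` only after `copy_opcode_pointers()` runs again, which is the role the ipl.c patch gives to set_opcode_pointers().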

A simple change in ipl.c:

  @@ -408,6 +408,9 @@ int i; /* Array subscript */
  ARCH_DEP(store_int_timer_nolock) (regs);
  #endif

  /* Initialize opcode table pointers */
  set_opcode_pointers (regs);
  if(regs->host && regs->guestregs)
  {
  ARCH_DEP(cpu_reset)(regs->guestregs);

allows the reg->opcode_table to be reloaded just BEFORE an IPL (which seems the right, safe time to perform this action), and with this simple patch dynamically loadable instructions are executed correctly.

I now have the J. Winter dyn75/tcpip instructions correctly loaded (with "ldmod dyn75") and executed from MVS3.8J (personal SYSGEN), as a test with the TK4- J.Winter FTPD shows.

TESTINS.C also begins to work correctly after applying this patch to ipl.c.

Regards, Orfheo (AKA Peppe/G. Vitillaro on hercules main group).

commadpt.c: BSC mode. Should stop writing after ETX/ETB

The IBM 2703 Transmission Control Component Description document specifies on
page 58 that a channel write shall stop if an ETX or ETB control character
is detected in the data stream.

It is also mentioned on page 48 in the same document that bytes following the ETX or ETB shall
not be transferred to the line if they are received from the channel.

This fix detects the presence of these characters and breaks out of the loop
if these occur in the data stream.
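The stop rule can be sketched as a scan for the first terminator. This is an illustration only, not the code in PR #93: it assumes EBCDIC line code (ETX = X'03', ETB = X'26'), treats the terminator itself as the last byte sent, and ignores transparent-text mode, where DLE ETX and DLE ETB end the block instead. The function name is hypothetical.

```c
#include <stddef.h>

/* EBCDIC BSC control characters (assumes EBCDIC line code). */
#define BSC_ETX 0x03
#define BSC_ETB 0x26

/* Number of bytes from a channel write buffer that should go to the line:
   everything up to and including the first ETX or ETB, or the whole buffer
   if neither occurs. Transparent mode (DLE ETX / DLE ETB) is not handled. */
static size_t bsc_bytes_to_send(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == BSC_ETX || buf[i] == BSC_ETB)
            return i + 1;   /* stop here; later bytes are not transferred */
    }
    return len;
}
```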

I have made a PR for this:
#93

PATCH: External GUI individual CPU Percent Utilization reporting

This patch allows external GUIs to display the percent utilization individually for each processor engine.

http://hercgui.pedroramos-si.com/downloads/3.10-gui-cpupct.patch

ECPS:VM assist DISP2 exiting back to CP incorrectly

The following bug issue and text is reproduced from Hyperion issue 192 so that this bug may be documented and fixed in Spinhawk.

The ECPS supported dispatcher assist DISP2 is incorrectly returning to CP rather than completing the dispatch of the run user. This condition occurs with virtual machines that themselves have virtual storage and therefore require shadow page tables. The DISP2 assist has code that checks for the presence of shadow tables and is supposed to exit back to CP if those shadow tables need to be invalidated.

However, that is not what is happening. DISP2 is exiting if shadow tables are present, regardless of whether they need to be invalidated. This means that DISP2 will never complete its assist for guest operating systems in a virtual machine.

The assist contains this statement:

if(B_VMESTAT & (VMINVPAG | VMSHADT))

The intention here is to execute the if-block when shadow tables are present AND they need to be invalidated. If they do not need to be invalidated the if-block should fall through. The compiler (Visual Studio 2012) is generating the following two lines of code:

  TEST reg,81h
  JE ..exit-the-if-block

The 81h is the combined value of the two flags VMINVPAG and VMSHADT. On an x86/x64 platform the TEST instruction works by ANDing the specified bits from the register and setting flags based on the result. If the resulting byte is all zeros, the zero flag (ZF) is set. The JE instruction takes the branch if ZF is set. A common value for B_VMESTAT during debugging is x'90'. Applying the TEST rules, the resulting byte is non-zero, and so the JE instruction will not branch thereby executing the if-block content. This is exactly the opposite of what the original intention is! And as a result DISP2 exits to CP (that's what is in the if-block).

This issue can be corrected by simply reworking the if-statement:

 if((B_VMESTAT & (VMINVPAG | VMSHADT)) == (VMINVPAG|VMSHADT))

which causes the compiler to generate proper code to analyze the correct bits and branch accordingly.
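The arithmetic with the debugging value x'90' can be checked in a small standalone sketch. Only the combined mask 81h comes from the issue; which bit belongs to VMINVPAG and which to VMSHADT is not stated, so `FLAGS_MASK` and both function names are hypothetical. The original expression is true when either bit is set (the TEST/JE pair above computes exactly this any-bit test), whereas the reworked one requires both.

```c
#include <stdbool.h>

/* Combined value of VMINVPAG | VMSHADT as given in the issue (81h).
   The individual flag values are not stated, so only the mask is used. */
#define FLAGS_MASK 0x81u

/* Original form: true if ANY of the two flags is set. */
static bool any_flag_set(unsigned char vmestat)
{
    return (vmestat & FLAGS_MASK) != 0;
}

/* Reworked form: true only if BOTH flags are set. */
static bool both_flags_set(unsigned char vmestat)
{
    return (vmestat & FLAGS_MASK) == FLAGS_MASK;
}
```

With x'90', `0x90 & 0x81` is `0x80`: non-zero, so the original test enters the if-block, but not equal to `0x81`, so the reworked test does not.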

Halt Device -- wrong condition code

When using a "Teletype" (ASCII terminal) to log in to VM/370, every console Prepare command ends with an attention signal. Example:

VM/370 ONLINE
?!

.login maint
!
ENTER PASSWORD:
.cpcmsSSS

!
DASD 190 LINKED R/W; R/O BY OPERATOR
DASD 194 LINKED R/W; R/O BY OPERATOR
LOGON AT 20:36:41 GMT FRIDAY 07/10/15
!
.q n

!
OPERATOR - 01F, MAINT    - 0F1
!
.i cms

!
RELEASE 6 CMS 12/25/78
!
.

!
.#cp log

!
CONNECT= 00:36:35 VIRTCPU= 000:00.02 TOTCPU= 000:00.22
!
LOGOFF AT 21:13:16 GMT FRIDAY 07/10/15

There are artifacts here from the implementation of commadpt.c but, besides that, VM prints an exclamation mark in response to Attention. There are a lot of exclamation marks there. IPL CMS never gets to the Ready message, because the VM read is retried over and over. I used #CP to break out.

Prepare marks time when there is nothing else to do. If the terminal user presses Break, this is Attention; Prepare completes with CE+DE. Before starting a new console task, VM uses Halt Device to stop the Prepare. After that, Prepare is assumed to have completed without Attention.

The problem is that Halt Device signals, via dev->halt_device(dev), for the device to stop, then sets Condition Code 0, which indicates failure. Normal, successful completion should be CC 1, with zeroes in the CSW status field. Because Halt Device failed, VM (DMKCNS) takes the ensuing I/O interrupt as an Attention signal.

Quoting from the S/370 PoP (09/74, p.202):

"Condition Code 0 indicates that HALT DEVICE cannot signal the control unit until an interrupt condition on the same subchannel is cleared.

"Condition Code 1 with Control-Unit-Busy Status in the CSW indicates that HALT DEVICE cannot signal the control unit until the control-unit-end status is received from that control unit.

"Condition Code 1 with Zeroes in the Status Field of the CSW indicates that the addressed device was selected and signaled to terminate the current operation, if any.

"Condition Code 2 indicates that the control unit cannot be signaled until the end of a busy condition in the channel. The end of the busy condition can be detected by noting an interruption from the channel or by noting the results of repeatedly executing HALT DEVICE.

"Condition Code 3 indicates that manual intervention is required to allow HALT DEVICE to signal the control unit to terminate."

This can be cured in channel.c. Here is the diff:

361c361,364
<             cc=0;                               /* @ISW */
---
>             psa = (PSA_3XX*)( regs->mainstor + regs->PX );
>             psa->csw[4] = 0;    /*  Store partial CSW       */
>             psa->csw[5] = 0;
>             cc = 1;             /*  Set CC for CSW stored   */
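The effect of the proposed change can be modelled in isolation. This sketch is hypothetical: `psa_csw_model` is a stand-in for the 8-byte CSW that S/370 stores at real location X'40', where byte 4 is the unit status and byte 5 the channel status; the real code writes through `PSA_3XX` as in the diff. Only the status field is cleared, and CC 1 is returned, which the PoP text above describes as the device having been selected and signaled to terminate.

```c
/* Hypothetical stand-in for the 8-byte S/370 CSW (real location X'40').
   Byte 4 = unit status, byte 5 = channel status. */
typedef struct {
    unsigned char csw[8];
} psa_csw_model;

/* Sketch of the proposed Halt Device ending: store a partial CSW whose
   status field is zero and report condition code 1 (CSW stored). */
static int hdv_signal_complete(psa_csw_model *psa)
{
    psa->csw[4] = 0;   /* unit status    */
    psa->csw[5] = 0;   /* channel status */
    return 1;          /* CC 1: device selected and signaled to terminate */
}
```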

After applying it, the VM login looks like this:

VM/370 ONLINE
?!

.l maint

ENTER PASSWORD:
.cpcmsSSS

DASD 190 LINKED R/W; R/O BY OPERATOR
DASD 194 LINKED R/W; R/O BY OPERATOR
LOGMSG - 22:05:10 GMT MONDAY 06/29/15
* WELCOME TO THE FIRST LEVEL 3330 SYSTEM.
LOGON AT 03:44:42 GMT SUNDAY 07/12/15
.i cms

RELEASE 6 CMS 12/25/78
.

R; T=0.01/0.01 03:44:48

.l dmkcns(d

FILENAME FILETYPE  FM  FORMAT    RECS BLOCKS   DATE   TIME
DMKCNS   ASSEMBLE  A1  F    80  2181   219   2/22/15 14:46
DMKCNS   UPDATES   A1  F    80    20     2   2/22/15 14:46
DMKCNS   UPDLOG    A1  F    97   129    16   2/22/15 14:46
R; T=0.01/0.01 03:44:59

.log hold

CONNECT= 00:00:44 VIRTCPU= 000:00.01 TOTCPU= 000:00.09
LOGOFF AT 03:45:25 GMT SUNDAY 07/12/15

VM/370 ONLINE

Typo in HHCCP027E error message in ipl.c

conneceted should read connected.

Fix for problems with commadpt when running on Windows

I don't run Windows myself, but I am aware of long-standing issues using the commadpt 2703 bisync adaptor emulation when running Hercules on Windows. While working on a different device emulator for Hercules, I asked someone who does run Hercules on Windows to test my code, and we ran into similar issues.

Bob Polmanter and I came up with solutions to make my code work properly on Windows, and it occurred to me that the same solutions would apply to the issues with commadpt when running on Windows.

The main problem is that when running on Windows, outgoing connections from commadpt result in messages like this one:

HHCCA001I ccuu:Connect out to a.b.c.d:p failed during initial status : A non-blocking socket operation could not be completed immediately.

Commadpt then goes on to assume the connection immediately failed when it may well succeed a short time later. Once that issue is overcome, commadpt can fail to notice actual problems connecting. Both issues arise because some Windows TCP/IP functions return different error values from those on other platforms in similar circumstances. For instance, Windows connect() returns the equivalent of EWOULDBLOCK where connect() on other platforms would return EINPROGRESS. Failures from connect() are also reported differently by select() on Windows: they result in a bit being set in the exception file descriptor set, not in the write file descriptor set.
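A minimal sketch of the platform-dependent "still connecting" test follows, assuming the caller passes `errno` on POSIX systems or the value of `WSAGetLastError()` on Windows. The function name is hypothetical and this is not the commadpt fix described in the issue.

```c
#include <errno.h>
#include <stdbool.h>

#ifdef _WIN32
#include <winsock2.h>
#endif

/* After a non-blocking connect() returns an error, decide whether the error
   only means "connection still in progress". POSIX reports EINPROGRESS;
   Winsock reports WSAEWOULDBLOCK (via WSAGetLastError()). */
static bool connect_still_pending(int err)
{
#ifdef _WIN32
    return err == WSAEWOULDBLOCK;
#else
    return err == EINPROGRESS;
#endif
}
```

The sketch covers only the connect() return code; the select() difference (failures showing up in the exception set on Windows rather than the write set) would need separate handling.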

Bob and I have come up with a fix for commadpt which should address these issues with commadpt on Windows without any change to functionality on other platforms.

MVS wait code 00090064 during IPL under VM with ECPS:VM active

The following bug issue and text is reproduced from Hyperion issue 193 so that this bug may be documented and fixed in Spinhawk.

When attempting to IPL MVS 3.8J under VM/370 with ECPS:VM active, a disabled wait code PSW of 00020000 00090064 is sometimes issued by MVS. When it occurs, the wait code appears after responding (just pressing ENTER) to the MVS message IEA101A SPECIFY SYSTEM PARAMETERS.

According to OS/VS2 System Codes, wait 064 is issued because of a program check during nucleus initialization (the x'09' in the code specifically means program check), and that the program old PSW points to the instruction that failed. The problem is, the program old PSW contains a wait PSW: 070E0000 00000004. You cannot have a program check while in a wait state, so something is amiss here.

Skipping over the details of hours of research, debugging, single stepping and so forth, I tracked the issue to the DISP2 assist of ECPS:VM. It turns out that DISP2 is dispatching the run user (MVS) even though the user's virtual PSW is in a wait. DISP2 dutifully builds the dispatch PSW by merging the virtual instruction address into a standard CP dispatch PSW. Since the virtual PSW instruction address is 0, the resulting dispatch PSW is 070D0000 00000000. DISP2 then exits so that control can be given to the run user. MVS immediately program checks because the instruction address is 0. The value 070E0000 ends up in MVS's program old PSW because that's what the virtual PSW was.

The bottom line is that DISP2 should not be dispatching a user that is in a virtual PSW wait. Moreover, there are dispatchability flags in the VMBLOK that indicate that a user should not be dispatched for a number of reasons, and one of them is VMPSWAIT (in byte VMRSTAT), which means the user is in virtual PSW wait. The assist code in DISP2 is not checking this flag.

But even if DISP2 did check this flag, it would not resolve the problem. It turns out that the flag is not set anyway. I have been unable to resolve how the user can have a virtual PSW with the wait bit set and not have the VMPSWAIT bit set.

Regardless, adding a check in the DISP2 code to see if the wait bit is set in the virtual PSW is the likely solution. This will cause the user to be skipped and another runnable user to be selected or the machine idled. The solution is this code snippet:

if(EVM_LH(vmb+VMPSW) & 0x0002)
{
    DEBUG_CPASSISTX(DISP2,logmsg("DISP2 : VMB @ %6.6X Not eligible : User in virtual PSW wait\n",vmb));
    continue;
}

This new code should be located immediately after this line in ecpsvm_do_disp2( ):

  for(vmb=EVM_L(FW1);vmb!=FW1;vmb=EVM_L(vmb))
  {
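The wait-bit test can be checked against the two PSWs quoted earlier. In this hypothetical sketch the PSW is an 8-byte big-endian array and the first halfword is read the way EVM_LH(vmb+VMPSW) is assumed to read it; the wait bit is PSW bit 14, i.e. 0x0002 in that halfword. For 070E0000 00000004 the second byte is X'0E' and the bit is set; for the dispatch PSW 070D0000 00000000 it is X'0D' and the bit is clear.

```c
#include <stdbool.h>
#include <stdint.h>

/* First halfword of a PSW stored in big-endian (S/370) byte order. */
static uint16_t psw_first_halfword(const uint8_t psw[8])
{
    return (uint16_t)((psw[0] << 8) | psw[1]);
}

/* PSW bit 14 is the wait bit: mask 0x0002 within the first halfword. */
static bool psw_wait_bit(const uint8_t psw[8])
{
    return (psw_first_halfword(psw) & 0x0002) != 0;
}
```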

My justification for this solution is based on these points:
• There is no case where a user in a virtual wait state should be dispatched.
• While I cannot explain the reason for the discrepancy between the VMPSWAIT dispatchability flag and the wait bit in the virtual PSW, the rules throughout the ECPS code logic say: when in doubt about something, let CP handle it. The new code does exactly that. This is a dispatch case that cannot be reconciled, so let CP deal with it.
• I do think there is a problem somewhere that allows this discrepancy to occur but I have been unable to find it. Nevertheless, the solution code does resolve the issue. Thus, the safest course when something isn't right is to turn it over to CP.

The problem of the MVS wait 064 is resolved after implementing the solution above.

Build errors - decNumber.c

I'm trying to build on

Linux ip-172-31-16-159 3.13.0-29-generic #53-Ubuntu SMP

gcc (Ubuntu 4.8.2-19ubuntu1)

decNumber.c: In function 'decSetSubnormal':
decNumber.c:5947:14: warning: variable 'dnexp' set but not used [-Wunused-but-set-variable]
Int dnexp; // saves original exponent
^

mv: cannot stat ‘t-nl.gmo’: No such file or directory

*** Warning: linker path does not have real file for library -lpthread.
*** using a file magic. Last file checked: /usr/lib/x86_64-linux-gnu//libpthread.so

*** Warning: linker path does not have real file for library -lrt.
*** using a file magic. Last file checked: /lib/x86_64-linux-gnu/librt-2.19.so

*** Warning: linker path does not have real file for library -lz.
*** using a file magic. Last file checked: /lib/x86_64-linux-gnu/libz.so.1.2.8

*** Warning: linker path does not have real file for library -lresolv.
*** using a file magic. Last file checked: /lib/x86_64-linux-gnu/libresolv-2.19.so

*** Warning: linker path does not have real file for library -lnsl.
*** using a file magic. Last file checked: /lib/x86_64-linux-gnu/libnsl-2.19.so

*** Warning: linker path does not have real file for library -lm.
*** using a file magic. Last file checked: /lib/x86_64-linux-gnu/libm-2.19.so

*** Warning: linker path does not have real file for library -ldl.
*** using a file magic. Last file checked: /lib/x86_64-linux-gnu/libdl-2.19.so

*** Since this library must not contain undefined symbols,
*** because either the platform does not support them or
*** it was explicitly requested with -no-undefined,
*** libtool will only create a static version of it.

Incorrect instruction trace operand data and "u" command disassembles incorrect virtual page

When tracing instructions or single stepping, incorrect instruction operand data can sometimes be displayed on the Hercules console. The data is incorrect because it comes from the wrong virtual page. This is due to a stale page table entry being retrieved from a temporary translation lookaside buffer in translate_addr(), which in turn is due to regs->tlbID being set to zero, a value that appears not to be valid. Before translate_addr() is called, copy_regs() makes a temporary copy of part of the regs structure, but tlbID is not copied (because it appears in the regs structure after regs_copy_end) and the temporary tlbID is not initialised.

The "u" command is similarly affected in that it can end up disassembling code from the wrong virtual page.

The "v" command is not affected as it bypasses the temporary TLB.

Data used for actual instruction execution is not affected because regs->tlbID is never zero when this is being fetched.

S/370 mode and S/390 mode are both affected. I have not tested in z/Arch mode.

The bug is elusive and difficult to provoke. It appears to affect only pages that have been remapped at least once. Fixed pages in an operating system kernel, for example, are not affected.

One possible fix appears to be to use ARCH_DEP(purge_tlb) to clear the new temporary TLBs in copy_regs() instead of memset(). Another possible fix is to always bypass use of the temporary TLB for data displayed on the Hercules console in the same way the "v" command does by calling virt_to_abs(), display_virt() etc with ACCTYPE_LRA instead of ACCTYPE_READ or ACCTYPE_INSTFETCH.

Some further thought may be required about the involvement of TLBs in fetching data to be displayed on the Hercules console.
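The partial-copy pattern behind this bug can be shown with a miniature structure. Everything here is hypothetical and simplified: `mini_regs`, `copy_end_marker` (standing in for regs_copy_end) and `tlb_id` (standing in for tlbID) are illustrative names, and the destination is zeroed first so that the uncopied field predictably ends up 0. Any member placed after the marker keeps the destination's value rather than the source's.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature of a register structure that is copied only up
   to a marker member, in the spirit of copy_regs()/regs_copy_end. */
typedef struct {
    unsigned long gpr[4];
    int           copy_end_marker;   /* stand-in for regs_copy_end */
    unsigned int  tlb_id;            /* after the marker: never copied */
} mini_regs;

/* Zero the destination, then copy only the prefix before the marker. */
static void copy_prefix(mini_regs *dst, const mini_regs *src)
{
    memset(dst, 0, sizeof *dst);
    memcpy(dst, src, offsetof(mini_regs, copy_end_marker));
}
```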

Bugs in cipher message test cases

Taking the km* test cases to real iron (z12) shows:

KM fc52 shows an incorrect expected result. Hardware gives:

r 600.10
00000600 52F67E0A 657CDCE5 A26ADC57 422AC064

KMO fc 3 is similar:

r 600.8
00000600 C8A06C9E FF17D58A

Several fc 3 tests show fc2 in their initial comment.

Illegal instruction in panel_display_r () from /usr/local/lib/libherc.so

Fresh OpenSUSE 13.2, newly built Hercules, unable to start because of illegal instruction:

(gdb) run
Starting program: /usr/local/bin/hercules
Got object file from memory but can't read symbols: File truncated.
Missing separate debuginfos, use: zypper install glibc-debuginfo-2.19-16.9.1.x86_64
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff7fd6700 (LWP 5904)]
Hercules Version 3.11
(c)Copyright 1999-2010 by Roger Bowler, Jan Jaeger, and others
Built on Apr 20 2015 at 17:41:57
Build information:
Modes: S/370 ESA/390 z/Arch
Max CPU Engines: 8
Using setresuid() for setting privileges
Dynamic loading support
Using shared libraries
HTTP Server support
No CCKD BZIP2 support
No ZLIB support
Regular Expressions support
Automatic Operator support
No HET BZIP2 support
Machine dependent assists: cmpxchg1 cmpxchg4 cmpxchg8
Running on maywood-lx-0002 Linux-3.16.7-7-desktop.#1 SMP PREEMPT Wed Dec 17 18:00:44 UTC 2014 (762f x86_64 MP=4
HHCHD018I Loadable module directory is /usr/local/lib/hercules
Crypto module loaded (c) Copyright Bernard van der Helm, 2003-2010
Active: Message Security Assist
Message Security Assist Extension 1
Message Security Assist Extension 2
Message Security Assist Extension 3
Message Security Assist Extension 4
[New Thread 0x7ffff560f700 (LWP 5905)]
HHCHT001I HTTP listener thread started: tid=7FFFF560F700, pid=5900
HHCHT013I Using HTTPROOT directory "/usr/local/share/hercules/"
HHCHT006I Waiting for HTTP requests on port 8081
HHCCF065I Hercules: tid=7FFFF7FD7740, pid=5900, pgid=5900, priority=0
[New Thread 0x7ffff4aee700 (LWP 5906)]
HHCTE001I Console connection thread started: tid=7FFFF4AEE700, pid=5900
HHCTE003I Waiting for console connection on port 3270
[New Thread 0x7ffff49ed700 (LWP 5907)]
[New Thread 0x7ffff48ec700 (LWP 5908)]
HHCCP002I CPU0000 thread started: tid=7FFFF49ED700, pid=5900, priority=15
HHCTT002I Timer thread started: tid=7FFFF48EC700, pid=5900, priority=0
HHCCP003I CPU0000 architecture mode ESA/390
[New Thread 0x7ffff47eb700 (LWP 5909)]
[New Thread 0x7ffff46ea700 (LWP 5910)]

Program received signal SIGILL, Illegal instruction.
0x00007ffff78c9196 in panel_display_r () from /usr/local/lib/libherc.so
(gdb) bt full
#0 0x00007ffff78c9196 in panel_display_r () from /usr/local/lib/libherc.so
No symbol table info available.
#1 0x00007ffff78bc4c5 in impl () from /usr/local/lib/libherc.so
No symbol table info available.
#2 0x0000000000401992 in main ()
No symbol table info available.

TZOFFSET 0000 not accepted

TZOFFSET 0000 gives :

HHCCF023S Error in hercules.cnf line 7: 0000 is not a valid timezone offset

Both TZOFFSET +0000 and TZOFFSET -0000 work.
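A minimal sketch of a parser that accepts an unsigned offset follows. It is hypothetical, not the Hercules TZOFFSET parser: it assumes the [+|-]HHMM form, treats an unsigned value as positive, and uses limits of 23 hours and 59 minutes as an assumption.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical validator for "[+|-]HHMM". An unsigned value such as
   "0000" is accepted as positive. On success, stores minutes east of UTC. */
static bool parse_tzoffset(const char *s, int *minutes)
{
    int sign = 1;
    if (*s == '+' || *s == '-') {
        if (*s == '-')
            sign = -1;
        s++;
    }
    for (int i = 0; i < 4; i++) {
        if (!isdigit((unsigned char)s[i]))   /* also stops at '\0' */
            return false;
    }
    if (s[4] != '\0')
        return false;                        /* trailing characters */
    int hh = (s[0] - '0') * 10 + (s[1] - '0');
    int mm = (s[2] - '0') * 10 + (s[3] - '0');
    if (hh > 23 || mm > 59)                  /* assumed limits */
        return false;
    *minutes = sign * (hh * 60 + mm);
    return true;
}
```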

V3.13 Does not replace strings in *.rc files

Hercules V3.13 does not replace strings in *.rc files. I hope this is a bug and not an intended feature.

Attached are three test files that demonstrate the problem under Windows in the TK4- setup. The file "test.bat" asks you to select a Hercules version, sets the path to that version of Hercules and a unique log file name component, and then runs the selected version of Hercules using the test.cnf file after setting HERCULES_RC to point to the "test.rc" file. Hercules output is directed to the log dataset using the unique name component. "test.bat" should be placed in the TK4- root directory (where the "mvs.bat" file lives), "test.cnf" goes into the TK4\conf directory, and "test.rc" into the TK4\scripts directory.

The attached test.zip file contains the "test.bat", "test.cnf" and "test.rc" files.

I have attached the log file results for the tests that I ran for four different versions of Hercules:

TK4- Hyperion version
Hyperion 4.0.0 release
SDL Hyperion version 4.1.0.9426
Spinhawk version 3.13

Peter Farley
Brooklyn, NY

test.h4x.log
test.h40.log
test.s3.log
test.tk4.log
test.zip

dasdls writes on Linux to stdin, does 'write(0,....)'

I use dasdls under Linux. First I noticed that two lines of the dasdls output can't be redirected to a file; they always show up on the controlling terminal. Only with unbuffer(1) does the whole output get redirected into a file. The lines are:

  HHCDA004I opening mvsres.148 readonly
  HHCDA020I mvsres.148 cyls=560 heads=30 tracks=16800 trklen=19456

Running dasdls under strace(1) gives

  write(2, "Hercules DASD list program Versi"..., 
  write(2, "(c)Copyright 1999-2015 by Roger "..., 
  write(0, "HHCDA004I opening mvsres.148 rea"..., 
  write(0, "HHCDA020I mvsres.148 cyls=560 he"...,
  write(1, "mvsres.148: VOLSER=MVSRES\nZ99999"...,

So there are writes to stdout and stderr, but also two writes to stdin. The issue can be reproduced with the current HEAD (commit 59f0556) and can be traced to

ckddasd.c#L357-L359 --> logmsg (_("HHCDA004I opening...
logmsg.c#L159-L160 --> log_write(0,bfr)
logmsg.c#L247-L268 --> log_write(int panel,char *msg)

The logic of log_write somehow ends up producing a write(0,...) in the context of dasdls.

The dasdls coming with tk4- exhibits the same issue, while the dasdls from hercules-390/hyperion does no writes to stdin.
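One plausible reason the two lines escape redirection can be demonstrated. When stdin is an interactive terminal, descriptor 0 is typically opened read/write on the tty, so write(0,...) lands on the screen no matter where stdout and stderr point. When stdin is redirected from a file opened read-only, the same write fails with EBADF. The helper below is hypothetical, not dasdls code: it opens a path read-only and reports the errno from a one-byte write.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Open `path` read-only (as fd 0 is when stdin is redirected from a file)
   and try to write one byte. Returns 0 if the write succeeded, the errno
   value if it failed, or -1 if the file could not be opened. */
static int write_errno_on_readonly(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    int err = 0;
    if (write(fd, "x", 1) < 0)
        err = errno;
    close(fd);
    return err;
}
```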

What about x86_64 and x86 binaries on hercules-390.org?

Are these builds obsolete? Can you give SHA sums for the builds?

ECPS:VM causing 2nd level CP abends when IPLing guest MVS at 3rd level

The following bug issue and text is reproduced from Hyperion issue 189 so that this bug may be documented and fixed in Spinhawk.

The ECPS support is causing 2nd level CP (VM/370) to experience a PRG017 failure and restart moments after attempting to IPL MVS 3.8 at 3rd level. The PRG017 is caused by the SVC assist loading the SVC new PSW from MVS's page 0 instead of 2nd level CP's page 0. CP program checks because the new PSW loaded is not correct for its environment; the new PSW has DAT on as well as an invalid virtual instruction address as known to CP. This results in a page fault which CP cannot handle and results in the PRG017 abend. The solution is below; for more details please continue reading.

The SVC assist is loading the incorrect SVC new PSW because an LCTL 1 instruction issued shortly before by MVS was also assisted and caused MVS's CR1 content to be incorrectly placed into the CR1 contents of 2nd level CP. When the SVC assist code attempts to get the internal pointer to page 0 to fetch the new PSW it gets the internal pointer for MVS page 0 (because that's what CR1 points to) and therefore the wrong SVC new PSW.

All of this is occurring because of a missing check during instruction assist to see if the virtual PSW is in problem state. When instruction assist is operating, the real machine must be in problem state by definition of the assist. However, at second level (be it any CMS user or guest operating system) any execution of a privileged instruction must be issued in virtual supervisor state. No problem there. But for a guest operating system such as VM at second level, the 2nd level CP will dispatch 3rd level users (CMS or other guest systems) in virtual problem state so that 2nd level CP can field instruction simulation requests for priv ops. Because the ECPS is not checking for virtual problem state in the virtual PSW, ECPS tries to assist the instruction. This results in registers and/or control registers being updated for 2nd level CP even though a third level machine issued the priv op.

This is difficult to explain easily but suffice to say that the assist is designed to help the real machine and 1st level CP by avoiding simulating the supported privileged instructions by 2nd level users. The assist is not supposed to help 3rd level or higher users; these must be fielded by the operating system at the next lowest level.

This issue is corrected by adding a quick check to see if the 2nd level user is in virtual problem state. If so, return control to CP and do not do the assist. This check below is added to SASSIST_PROLOG in ecpsvm.c :

  /* 2017-01-24 Reject if Virtual PSW is in problem state */ \
  /* All instruction assists should be rejected if VPSW is in problem state */ \
  /* and be reflected back to CP for handling. This affects 2nd level VM with */ \
  /* 3rd level guests. */ \
  if(CR6 & ECPSVM_CR6_VIRTPROB) \
  { \
      DEBUG_SASSISTX(_instname,WRMSG(HHC90000, "D", "SASSIST "#_instname" reject : Virtual problem state\n")); \
      return(1); \
  } \
  /* End of 2017-01-24 */ \

At present, ecpsvm.c does contain a virtual problem state check for the LPSW and SSM assists. But the other assisted instructions are not checked. This fix adds the check for all instructions by adding it to the prolog code and removes the individual checks in the LPSW and SSM assist code.

Issues with 3330 disks?

I've tried attaching a 3330 device containing an install of TSS to several instances, both running and non-running. Regardless of device number or current state, the machine eventually gets stuck in a WAIT state and responds to nothing.

This has been reproduced on PowerPC (OS X) only.

http://gewt.net/tssres.250 is the offending disk. Let me know if I'm just nuts or if I found a bug.

