Vector Facility for z/Architecture #650

Open
Fish-Git opened this issue Apr 9, 2024 · 97 comments
Labels
Discussion Developers are invited to discuss a design change or solution to a coding problem. Enhancement This issue does not describe a problem but rather describes a suggested change or improvement. Ongoing Issue is long-term. Variant of IN PROGRESS: it's being worked on but maybe not at this exact moment.

Comments

@Fish-Git
Member

Fish-Git commented Apr 9, 2024

This issue was created for discussing development of the z/Architecture Vector Facility.

All discussion regarding this effort should take place HERE, in THIS GitHub Issue, and not in Issue #77, which is a generic GitHub Issue regarding all yet-to-be-developed z/Architecture facilities.

Please refrain from discussing z/Architecture Vector Facility development anywhere else, and discuss it here instead.

Thank you.

@Fish-Git Fish-Git added Enhancement This issue does not describe a problem but rather describes a suggested change or improvement. Discussion Developers are invited to discuss a design change or solution to a coding problem. Ongoing Issue is long-term. Variant of IN PROGRESS: it's being worked on but maybe not at this exact moment. labels Apr 9, 2024
@Fish-Git
Member Author

Fish-Git commented Apr 9, 2024

My first approach was to use another area for VR and REFRESH/UPDATE from/to AFPR at every use. Then @Fish-Git proposed the shared area in POC SWAP128 (in this thread).

My development of instructions for zVector involves big-endian storage and the REFRESH/UPDATE mechanism, so I will absolutely have to change it.

Not necessarily! What I would like to do is determine which way -- yours or mine -- is more efficient. Thus I would be inclined to leave your current implementation as-is for the time being.

Implementing my shared registers proposal might be less efficient. I don't know. Maybe. Maybe not. I proposed it only because I thought (believed) it would be more efficient, but that remains to be seen!

I would prefer to see BOTH designs implemented (controlled via a temporary #define build option) so that we could then compare the performance of each one. It might well be that your current design is more efficient! I don't know! It might be. It might not be. It remains to be seen.

The idea here is I don't want to paint ourselves into a corner. I don't want to commit to one technique or the other until we know which one is best.

@Fish-Git
Member Author

Fish-Git commented Apr 9, 2024

... and I don't have access to a real mainframe to test some complex instructions.

That's something else we will need to eventually do too: verify the correctness of each implemented instruction on real hardware. I seem to recall that one of our developers (I forget who) has access to a real mainframe. We will eventually need to test/debug our implementation on a real machine.

Then, once verified, we will of course ALSO need to develop a QA (Quality Assurance) runtime test ("runtest" make check test) as well, to ensure any future changes don't break our implementation.

So there is definitely enough "meat" in this project for multiple people to bite off their own piece of it. The more people we have contributing the greater the chance of our succeeding in our effort.

@salva-rczero
Contributor

salva-rczero commented Apr 10, 2024

@Fish-Git I agree with the double implementation design.

@mcisho

Is this your understanding too?

Yes, this is how it was proposed. The conversion is not necessary for byte-size instructions, some logical operations (AND/OR), or moves, but in general you always have to do: BIG -> LIT -> BIG.
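A minimal sketch of the BIG -> LIT -> BIG pattern described above, assuming the register image is kept as 16 big-endian bytes. The helper names (`load_be32`, `store_be32`, `add_words_be`) are hypothetical and are not taken from the Hercules source:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: read a 32-bit element stored in big-endian byte order
   into a host integer (the "BIG -> LIT" step on a little-endian host). */
static uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] <<  8) |  (uint32_t)p[3];
}

/* Hypothetical helper: write a host integer back in big-endian byte order
   (the "LIT -> BIG" step). */
static void store_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >>  8);
    p[3] = (uint8_t)(v);
}

/* Add the four word elements of two 16-byte register images.
   Arithmetic is done on host integers; storage stays big-endian. */
static void add_words_be(uint8_t *dst, const uint8_t *a, const uint8_t *b)
{
    assert(dst && a && b);
    for (int i = 0; i < 4; i++)
        store_be32(dst + 4 * i, load_be32(a + 4 * i) + load_be32(b + 4 * i));
}
```

Because the helpers use shifts rather than pointer casts, the same code gives the same result on big-endian and little-endian hosts; byte-wise operations such as AND/OR could skip the conversion entirely.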

The FP registers contents in the regs structure are currently kept in the endianness of the host. If vector registers must be kept as big endian, then fp registers will also have to be kept as big endian. Which will have an impact on the design and usage of the shared area for vector/fp registers.

I think FP must continue with the current behaviour with regard to endianness. It will be easier to adapt zVector to the double treatment proposed by Fish.

p.s. Is your name Salva, am I addressing you correctly?

Yes! it's a common short name for Salvador.

Ian, tell me what you think!

Regards, salva.

@mcisho
Contributor

mcisho commented Apr 11, 2024

Can you please have a look at the attached proposal of the Hercules changes for shared zVector/FP registers. All comments, suggestions, etc. are welcome.

@Fish-Git
Member Author

Can you please have a look at the attached proposal of the Hercules changes for shared zVector/FP registers. All comments, suggestions, etc. are welcome.

Nice! I like it!

@mcisho
Contributor

mcisho commented Apr 12, 2024

Can you please have a look at the revised attached proposal of the Hercules changes for shared zVector/FP registers. I had forgotten we would need to move data between the instruction processors variables and the zVector registers preserving host endianness. Again, all comments, suggestions, etc. are welcome:

p.s. Fish, how did you add the bullet point before the link? I can't see it in the Github formatting syntax.

@Peter-J-Jansen
Collaborator

Peter-J-Jansen commented Apr 12, 2024

I too am interested to participate in the Vector Facility and have read the proposal text with interest. So far I only have some probably very basic questions which I'm seeking an answer to:

  1. Do I assume correctly that the current FPR registers are only ever used for floating point numbers, but that when overlaid with the VR registers they will, for certain VR instructions, also contain integers, i.e. non-floating point numbers?

  2. Can anyone with more historical Hercules information perhaps offer some insight as to why the FPR instructions were implemented using the "softfloat" external package vs. using the host's IEEE 754 floating point support like available on e.g. X86-64 and ARM?

  3. Is it the intention to keep using "softfloat" also for the VR instructions (instead of the host's IEEE 754 floating point support)?

  4. If the answer to question #3 above is no, would there be any hope of using the host's SIMD instructions to implement (at least some of) the IBM VR instructions?

Thanks !

Cheers,

Peter

P.S.: I'll be off-line next week.

@salva-rczero
Contributor

salva-rczero commented Apr 12, 2024

@mcisho:   Can you explain to me why this line for LITTLE-ENDIAN?

int iv = ~(_v) & 0x1f;

@mcisho
Contributor

mcisho commented Apr 12, 2024

@salva-rczero:  Whoops, confusion on my part, taking little endian way too far! You are quite right, the register number does not need to be flipped. Well spotted.

@Peter-J-Jansen:

  1. At any one instant a 128-bit VR/FPR will contain either 128 bits of vector register data, or 64 bits of floating point data and 64 bits of unpredictable data. If an FPR instruction was the last thing to place a value into the VR/FPR, the VR/FPR will contain a floating point value. If a VR instruction was the last thing to place a value into the VR/FPR, the VR/FPR might contain a string, an integer value, a decimal value, or even a floating point value. I'm not clear whether the vector register elements have to be the same type, or even size.

  2. I don't know.

  3. I simply intend to change those statements that use regs->fpr to use regs->FPR_x, i.e Hercules will still use "softfloat".

  4. I don't know what @salva-rczero plans for the future, or whether the host's SIMD is part of those plans.

@mcisho
Contributor

mcisho commented Apr 12, 2024

Can you please have a look at the revised attached proposal of the Hercules changes for shared zVector/FP registers, with the corrections for the errors pointed out by @salva-rczero. Yet again, all comments, suggestions, etc. are welcome.

p.s. Fish, how did you add the bullet point before the link? I can't see it in the Github formatting syntax.

@salva-rczero
Contributor

salva-rczero commented Apr 12, 2024

@mcisho While I appreciate your effort, I really don't understand the need for all these macros.
Currently the working code only needs:

#define VR_B(_v,_i)     regs->vr[(_v)].B[(_i)]
#define VR_H(_v,_i)     regs->vr[(_v)].H[(_i)]
#define VR_F(_v,_i)     regs->vr[(_v)].F[(_i)]
#define VR_G(_v,_i)     regs->vr[(_v)].G[(_i)]

We would only need to add a little-endian mode:

#define VR_B(_v,_i)     regs->vr[(_v)].B[15-(_i)]
#define VR_H(_v,_i)     regs->vr[(_v)].H[7-(_i)]
#define VR_F(_v,_i)     regs->vr[(_v)].F[3-(_i)]
#define VR_G(_v,_i)     regs->vr[(_v)].G[1-(_i)]
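As a rough illustration of how such element macros behave, here is a self-contained sketch. `DEMO_VR`, `DEMO_REGS` and `first_halfword` are hypothetical stand-ins rather than the actual Hercules definitions, and hosts that do not define `__BYTE_ORDER__` are simply assumed to be little-endian:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the Hercules register types (illustration only). */
typedef union {
    uint8_t  B[16];
    uint16_t H[8];
    uint32_t F[4];
    uint64_t G[2];
} DEMO_VR;

typedef struct { DEMO_VR vr[32]; } DEMO_REGS;

/* Assumption: a host without __BYTE_ORDER__ is treated as little-endian. */
#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
  #define VR_B(_v,_i)   regs->vr[(_v)].B[(_i)]
  #define VR_H(_v,_i)   regs->vr[(_v)].H[(_i)]
#else
  /* Register held as one host-order 128-bit value: element 0 is at the top. */
  #define VR_B(_v,_i)   regs->vr[(_v)].B[15 - (_i)]
  #define VR_H(_v,_i)   regs->vr[(_v)].H[ 7 - (_i)]
#endif

/* Store two architectural bytes (elements 0 and 1), then read them back
   as architectural halfword element 0. */
static uint16_t first_halfword(DEMO_REGS *regs, int v, uint8_t b0, uint8_t b1)
{
    assert(v >= 0 && v < 32);
    VR_B(v, 0) = b0;
    VR_B(v, 1) = b1;
    return VR_H(v, 0);
}
```

With index flipping applied consistently to every element size, byte elements 0 and 1 always form halfword element 0 in the architectural (big-endian) sense, whichever byte order the host uses.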

@salva-rczero
Contributor

@Peter-J-Jansen The first goal is to get it working, but yes, I have thought about using x86 SIMD for performance. In fact, a couple of Galois arithmetic instructions already use it.

@Fish-Git
Member Author

Fish-Git commented Apr 12, 2024

p.s. Fish, how did you add the bullet point before the link? I can't see it in the Github formatting syntax.

Asterisk or dash (minus sign) followed by a blank, which is the markdown code for an unordered list:

  • one,
  • two,
  • buckle...
    • ...my...
      • ...shoe.

@Fish-Git
Member Author

Fish-Git commented Apr 12, 2024

2. Can anyone with more historical Hercules information perhaps offer some insight as to why the FPR instructions were implemented using the "SoftFloat" external package vs. using the host's IEEE 754 floating point support like available on e.g. X86-64 and ARM?

I believe Steve Orso (@srorso) would probably be the best person to answer this question, but as I recall, it was basically because of 2 things:

  1. A compiler's IEEE 754 floating point instruction/hardware support did not behave the same way as what the architecture (Principles of Operation) required out-of-the-box. I believe it had mostly to do with rounding modes. In order to use the host CPU's IEEE 754 floating point hardware/instructions support, you would have to set the proper rounding mode beforehand, which might prove to be tricky.

    (Usually you define your desired rounding mode as a compiler option and the compiler uses/presumes that rounding mode for all instruction sequences that it generates. To change the rounding mode dynamically (at run time), one would have to insert hardware instructions to change the desired default rounding mode beforehand, which, as I said, might prove to be tricky when the compiler is the one deciding which FP instruction sequences to generate and in which order they are to be executed.)

  2. I also seem to recall that IBM's Principles of Operation also defined several new non-standard rounding modes as well. That is to say, the formal specification of how IEEE 754 floating point was to behave (with respect to its defined rounding modes) either differed slightly from IBM's definition, and/or IBM defined in their architecture several new rounding modes that were not formally defined in the official IEEE 754 floating point specification.

    So, in order to support those new and/or different rounding modes, SoftFloat would need to be used anyway, so why not just keep it simple and use SoftFloat for everything?

But those are just guesses. The truth is, I don't remember what the real reason(s) was/were. Ask Steve. He might remember the details better than me since I believe he did a lot of work on our SoftFloat code.

@mcisho
Contributor

mcisho commented Apr 13, 2024

@mcisho While I appreciate your effort, I really don't understand the need for all these macros.

I proposed the macros as an aid for endianness, but if you think they are superfluous that's fine, I'll forget about them.

The most important thing is we all agree on how the shared VR/FPR are defined in REGS.

@mcisho
Contributor

mcisho commented Apr 14, 2024

Can you please have a look at the fourth and hopefully final revision of the proposal of the Hercules changes for shared zVector/FP registers. The superfluous stuff has been removed, and the suggestions from @salva-rczero have been incorporated. As always, all comments, suggestions, etc. are welcome.

@mcisho
Contributor

mcisho commented Apr 23, 2024

As it appears that no one disagrees with the proposal I will proceed. In the next few days I will branch the SDL-Hercules-390 hyperion develop branch into a branch named sharedvfp, where the changes to the floating-point instructions will be implemented.

The z/Architecture Principles of Operation says:

"Whenever a floating-point instruction or floating point support instruction writes to a floating point register, or a floating point instruction that reads a register pair reads from floating-point registers, bits 64-127 of the corresponding vector register are unpredictable.".

However, in a March 2015 presentation to SHARE titled "z13 Vector Extension Facility (SIMD)", IBM said:

"Be very aware that any use of a FPR will change all 16 bytes of the corresponding VR (this includes even LD)".

Empirical evidence from instructions executed on a z15 shows that use of a FPR changes bits 64-127 of the corresponding VR to zero.

So should Hercules set bits 64-127 of the corresponding VR to zero, or leave them unchanged (i.e. unpredictable), when an instruction writes to a FPR? Leaving the bits unchanged is simpler and less prone to coding error, but Hercules wouldn't be emulating the actions of real machines (or at least the machines to date).
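The two options in that question can be sketched as follows. This is only an illustration: `DEMO_VFP` and `write_fpr` are hypothetical names, and the layout (FPR value in bits 0-63 of the shared register) is the one described in the Principles of Operation quote above:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical shared register: hi holds bits 0-63 (the FPR view),
   lo holds bits 64-127 (visible only through the VR view). */
typedef struct {
    uint64_t hi;
    uint64_t lo;
} DEMO_VFP;

/* Write a 64-bit floating-point image to FPR r.
   zero_low_half == 0: leave bits 64-127 alone ("unpredictable" per the PoP).
   zero_low_half != 0: clear bits 64-127, as reportedly observed on a z15. */
static void write_fpr(DEMO_VFP *vfp, int r, uint64_t value, int zero_low_half)
{
    assert(r >= 0 && r < 16);   /* only VRs 0-15 overlay the FPRs */
    vfp[r].hi = value;
    if (zero_low_half)
        vfp[r].lo = 0;
}
```

Either behavior satisfies the architecture as quoted, since "unpredictable" permits both; the first variant simply does less work per FPR write.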

@salva-rczero
Contributor

salva-rczero commented Apr 23, 2024

@mcisho Great!

As soon as you make the branch and push the changes to esa390.h and structs.h, I'll start changes to zvector.c for endianness independence.

On 64-127 bits, I would prefer to leave them unchanged. IMHO, Hercules should mimic z/Arch not real machines.

Regards, salva.

@Fish-Git
Member Author

Fish-Git commented Apr 23, 2024

So should Hercules set bits 64-127 of the corresponding VR to zero, or leave them unchanged (i.e. unpredictable), when an instruction writes to a FPR? Leaving the bits unchanged is simpler and less prone to coding error, but Hercules wouldn't be emulating the actions of real machines (or at least the machines to date).

I agree 100% with Salva. Hercules does not -- and indeed IMHO should not -- try to emulate any particular model of mainframe, whether manufactured by IBM or anyone else. Its sole responsibility is to only try to accurately emulate the published mainframe architecture as defined in the Principles of Operation.

The behavior of mainframes varies from model to model. The behavior of the architecture does not.

Stick to the architecture.

@mcisho
Contributor

mcisho commented Apr 24, 2024

The sharedvfp branch has been created, and the esa390.h and hstructs.h changes have been pushed.

Please note that the REGS structure still contains the old U32 fpr[32] variable. It will be removed when the numerous references to it have all been changed to the new shared QW vfp[32] variable.

@salva-rczero
Contributor

I've just created a pull request for the changes needed for vector instructions (E7xx).

@salva-rczero
Contributor

@Fish-Git Will you provide the changes for U128, vfetch16, vstore16... from swap128 or should I do it myself?

Thanks in advance.

@Fish-Git
Member Author

Will you provide the changes for U128, vfetch16, vstore16... from swap128 or should I do it myself?

I will have to review my original implementation. What I originally coded might no longer be correct/appropriate for our current design. Maybe it is. Maybe it isn't. I don't know. I'll have to brush off the dust and take a look at it.

If you want to do it, please feel free to do so! You might actually be able to do it faster than me. Personal issues have been affecting my ability to contribute as of late. (Don't worry, it's nothing serious.)

@mcisho
Contributor

mcisho commented Apr 30, 2024

The FP instructions using the shared zVR/FPR are complete, and the tests that we have pass. All of the changes have been committed to the sharedvfp branch.

@mcisho
Contributor

mcisho commented May 5, 2024

After several days of testing I haven't discovered any problem with FP instructions using the shared VR/FPR. I would like to pull the FP changes into the develop branch, so that the changes can be exposed to a wider range of environments than I have available. Does anyone object, feel it's premature, etc?

@Fish-Git
Member Author

Fish-Git commented May 6, 2024

Does anyone object, feel it's premature, etc?

No objection here! Sounds like a good plan to me!

@Fish-Git
Member Author

How about splitting ieee.c into ieee.h+ieee.c and exposing the necessary types, macros and function declarations, and then including ieee.h from zvector.c?

Yes, that IS the correct way.

Alternatively, the vector fp instructions could be moved from zvector.c to ieee.c? Might be simpler than splitting ieee.c?

That would work too.

@Fish-Git
Member Author

As an aside the following instructions probably should have the 64_bit removed from their function names.

Agreed.

@mcisho
Contributor

mcisho commented May 12, 2024

I have attached my changes to ieee.c and zvector.c so that you can see what I have done so far, and discuss/decide whether we should continue on this path? The changes to ieee.c add the vector fp instructions and implement some of them, the changes to zvector.c remove the vector fp instructions and add some comments re where they can be found.

@Fish-Git
Member Author

QUICK QUESTION:

Is the sharedvfp branch obsolete now? That is to say, is all current VFP development now being done in the normal develop branch now? Is the sharedvfp branch "finished"? Has the reason (purpose) for its creation been completed now? I just need some clarity on this. Thanks!

@Fish-Git
Member Author

Fish-Git commented May 12, 2024

I have attached my changes to ieee.c and zvector.c so that you can see what I have done so far, and discuss/decide whether we should continue on this path?

Looks okay to me, Ian! And IMO yes, it seems to be a valid working path that we should probably continue on. I'm thinking the bulk of the Vector instructions should of course continue to be in zvector.c, with the few exceptions to the rule that deal with floating point moved into ieee.c just like you have them in your example .zip.

@mcisho
Contributor

mcisho commented May 12, 2024

Is the sharedvfp branch obsolete now?

No. The develop branch doesn't have zVector support. If you want to try zVector you need to use the sharedvfp branch, and the latest commit of progress by @salva-rczero was to the sharedvfp branch.

@salva-rczero
Contributor

QUICK QUESTION:

Is the sharedvfp branch obsolete now? That is to say, is all current VFP development now being done in the normal develop branch now? Is the sharedvfp branch "finished"? Has the reason (purpose) for its creation been completed now? I just need some clarity on this. Thanks!

For my part, I believe that my contribution to this project has come to an end. I have already warned that I do not have the necessary skills and I find everything related to the discussion/design very difficult. It is better to leave that task to those of you who know it.

Farewell and thank you very much for your time and advice (especially to @Fish-Git).

Good luck and long live to Hercules!

@JamesWekel
Contributor

JamesWekel commented May 12, 2024

I'm working on the E6 z/vector instructions, which involve a lot of changes to the infrastructure, just as the E7 z/vector instructions did. My work is based on the sharedvfp branch. I'm hoping to be at a stable place late next week for a pull request for your review. It will need more review as ecpsvm.c implements E6 instructions for S370 which overlap with new E6 z/vector instructions.

The E6 instructions will be in zvector2.c. Rather than move vector decimal instructions to decimal.c, I was planning on changing some of the functions in decimal.c from static void to void with new function prototypes in opcode.h.

Do we have a consistent type definition for U128? For some instructions, I need to do 128 bit arithmetic.

Jim

@Fish-Git
Member Author

I'm working on the E6 z/vector instructions ...

Thank you, James! I still say you should consider becoming an official Hercules developer. Your contributions over the past many months (past year?) have been invaluable.

Do we have a consistent type definition for U128?

AFAIK, type U128 does not exist in Hercules. gcc and clang both support the __int128 type, but unfortunately Microsoft's compiler still does not (even though people have been complaining about it for years now).  :(
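One common way around the missing compiler type is a two-halves struct with explicit carry handling. The sketch below is only an illustration of that idea; `DEMO_U128` and `u128_add` are hypothetical names, not an existing Hercules type or function:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical portable 128-bit unsigned integer: two 64-bit halves. */
typedef struct {
    uint64_t hi;   /* most significant 64 bits  */
    uint64_t lo;   /* least significant 64 bits */
} DEMO_U128;

/* 128-bit addition, wrapping modulo 2^128. */
static DEMO_U128 u128_add(DEMO_U128 a, DEMO_U128 b)
{
    DEMO_U128 r;
    r.lo = a.lo + b.lo;
    /* If the low half wrapped around, carry one into the high half. */
    r.hi = a.hi + b.hi + (r.lo < a.lo ? 1u : 0u);
    return r;
}

#if defined(__SIZEOF_INT128__)
/* Optional cross-check against the gcc/clang native type where available. */
static void u128_add_selfcheck(DEMO_U128 a, DEMO_U128 b)
{
    unsigned __int128 x = ((unsigned __int128)a.hi << 64) | a.lo;
    unsigned __int128 y = ((unsigned __int128)b.hi << 64) | b.lo;
    unsigned __int128 s = x + y;
    DEMO_U128 r = u128_add(a, b);
    assert(r.hi == (uint64_t)(s >> 64) && r.lo == (uint64_t)s);
}
#endif
```

The struct version compiles everywhere, including with Microsoft's compiler, while the `__int128` path is guarded so it is only used where the compiler defines `__SIZEOF_INT128__`.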

@Fish-Git
Member Author

For my part, I believe that my contribution to this project has come to an end. I have already warned that I do not have the necessary skills and I find everything related to the discussion/design very difficult. It is better to leave that task to those of you who know it.

We will miss you, Salva!  :(

Farewell and thank you very much for your time and advice (especially to @Fish-Git).

You are VERY welcome, Salva! We all thank you from the bottom of our hearts for all of the tremendous contributions you have made to Hercules! You are a true Herculean in my book! If you send me your full real name, I will be very happy to add you to our Herculeans list.

Good luck and long live to Hercules!

Abso-fricking-lutely!  :)))

@mcisho
Contributor

mcisho commented May 13, 2024

For my part, I believe that my contribution to this project has come to an end.

That's a pity, I thought you were doing a great job.

... I find everything related to the discussion/design very difficult.

Don't worry, you're not alone there.

@JamesWekel
Contributor

JamesWekel commented May 16, 2024

mcisho

As part of pull request [https://github.com//pull/661], I have enabled the following features in feat900.h:

#define FEATURE_134_ZVECTOR_PACK_DEC_FACILITY
#define FEATURE_135_ZVECTOR_ENH_FACILITY_1
#define FEATURE_148_VECTOR_ENH_FACILITY_2
#define FEATURE_152_VECT_PACKDEC_ENH_FACILITY
#define FEATURE_165_NNET_ASSIST_FACILITY
#define FEATURE_192_VECT_PACKDEC_ENH_2_FACILITY

as all/most of the E6 instructions are defined as part of or enhanced with these facilities. I suspect that is causing some of the Windows build problems, as you are referencing FEATURE_135_ZVECTOR_ENH_FACILITY_1.

Hope I haven't caused too many problems, but I wanted to get the basics in for the E6 instructions to minimize merge conflicts.

Jim

@Fish-Git
Member Author

FYI: James's changes to the sharedvfp branch have been merged.

@JamesWekel
Contributor

The z/vector E6 instructions, for example VECTOR FP CONVERT TO NNP, reference NNP-Data-Type-1 Format. The z/Architecture Principles of Operation, SA22-7832-13, page 26-1, states:

Neural Network Processing Data

The NEURAL NETWORK PROCESSOR ASSIST
instruction, as well as the related convert instructions
described in this chapter, perform operations on
model-dependent data types.

NNP-Data-Type-1 Format

NNP-data-type-1 format represents a 16-bit signed
floating-point number in a proprietary format with a
range and precision tailored toward neural-network
processing. Other models may use other data formats.

But the NNP-data-type-1 format is not described. Does anyone have additional reference information on the format? The closest that I've found is a DLFLOAT presentation: https://pdfs.semanticscholar.org/5359/1b203af986668ca6586f80d30257d3ee52d7.pdf

Thanks,
Jim

@Fish-Git
Member Author

Does anyone have additional reference information on the format?

I'm not aware of any, no. But then I haven't tried looking for it either.

The closest that I've found is a DLFLOAT presentation: https://pdfs.semanticscholar.org/5359/1b203af986668ca6586f80d30257d3ee52d7.pdf

THAT looks to me like that's probably it! Great find, James! I say go with it!

@JamesWekel
Contributor

JamesWekel commented Jul 29, 2024

Fish,

I've coded initial versions of the five E6 vector "neural network processing assist" instructions (VCNF, ...). As part of this implementation, I use SoftFloat f32_to_f16 and f16_to_f32 routines. But when I ran 'make', I received:

 CCLD     hercules
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `f32_to_f16'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `f16_to_f32'

Whoa... Got to be my problem! Yep, the routines are in the source, the softfloat.h has prototypes... It took me quite a while to determine that the Hercules softfloat libraries only contain routines used by Hercules!

Why these routines? These vector instructions convert to/from Tiny (F16) Binary Floats.
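For readers unfamiliar with the 16-bit binary format, here is a rough sketch of what a binary16-to-binary32 widening does. It is not the SoftFloat implementation and does not cover NNP-data-type-1; `half_to_float` is a hypothetical name, and the code assumes the host `float` is IEEE 754 binary32:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Widen an IEEE 754 binary16 bit pattern (1 sign, 5 exponent, 10 fraction
   bits) to a host float. Every binary16 value is exactly representable. */
static float half_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t frac = h & 0x3FFu;
    uint32_t bits;
    float    f;

    assert(sizeof(float) == sizeof(uint32_t));

    if (exp == 0x1Fu) {                 /* infinity or NaN */
        bits = sign | 0x7F800000u | (frac << 13);
    } else if (exp != 0) {              /* normal: rebias 15 -> 127 */
        bits = sign | ((exp + 112u) << 23) | (frac << 13);
    } else if (frac == 0) {             /* signed zero */
        bits = sign;
    } else {                            /* subnormal: normalize first */
        uint32_t shift = 0;
        while ((frac & 0x400u) == 0) {
            frac <<= 1;
            shift++;
        }
        frac &= 0x3FFu;                 /* drop the now-implicit leading 1 */
        bits = sign | ((113u - shift) << 23) | (frac << 13);
    }

    memcpy(&f, &bits, sizeof f);        /* reinterpret bits without aliasing */
    return f;
}
```

The narrowing direction (f32_to_f16) is the harder one, because it has to round and handle overflow and underflow, which is part of why a tested library routine is preferable there.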

I would appreciate if the softfloat libraries could be refreshed to include f32_to_f16 and f16_to_f32 routines.

Thanks,
Jim

@Fish-Git
Member Author

Fish-Git commented Jul 29, 2024

I would appreciate if the softfloat libraries could be refreshed to include f32_to_f16 and f16_to_f32 routines.

10-4. I'll get right on it.

Can you provide for me your Hercules changes in the form of a patch, so I can test my softfloat changes before actually committing them?

That is to say, I'd like to try building Hercules with your changes for myself, so I can see (recreate) your reported link error, and then temporarily make my softfloat changes and then rebuild Hercules (with your changes again), to verify that the problem is now fixed.

Then I can commit my changes with absolute confidence to the softfloat repository.

Thanks.

@JamesWekel
Contributor

Fish,

I'll post a patch tomorrow. I'm in the middle of moving the zvector instructions to a new file nnpa.c which will include NNPA: Function Code 0: NNPA-QAF (Query Available Functions). All the NNPA stuff will then be in one source file.

Jim

@JamesWekel
Contributor

JamesWekel commented Jul 30, 2024

Fish,

As requested, here is a patch with my current nnpa.c with associated updates to 15 files.

As always, comments / suggestions are appreciated.

Jim

@Fish-Git
Member Author

Fish-Git commented Jul 30, 2024

As requested, here is a patch...

Thanks. I'm on it!

It looks like this "simple" change is going to take me longer than originally expected though. My first attempt to just move the f32_to_f16 and f16_to_f32 functions into "hercsource" directory (and update the sources.txt appropriately of course) failed with yet even more unresolved link errors:

softfloat_normSubnormalF16Sig  referenced in function f16_to_f32
softfloat_f16UIToCommonNaN     referenced in function f16_to_f32
softfloat_roundPackToF16       referenced in function f32_to_f16
softfloat_commonNaNToF16UI     referenced in function f32_to_f16

So now I'm going to have to do the same thing for the source files containing those functions too. I'm hoping this "simple" change doesn't end up snowballing into some huge complicated mess!

In any case, I'll let you know when I eventually have something for you to test with.

Fish-Git added a commit to SDL-Hercules-390/SoftFloat that referenced this issue Jul 30, 2024
@Fish-Git
Member Author

Fish-Git commented Jul 30, 2024

SoftFloat fix committed!

"Fix for GitHub z/Arch Issue #650"
Commit: c114c53e672d92671e0971cfbf8fe2bed3d5ae9e

Tested on both Windows and Linux (with your nnpa.patch applied): Both now build cleanly! (whereas before they got "unresolved" errors).

You should now be good to go!

@Fish-Git
Member Author

Fish-Git commented Jul 30, 2024

NOTE:

You will of course need to git update your SoftFloat external package repo and rebuild it in order for your libs directory to get updated with the new softfloat libs, so that Hercules links correctly. You know how to do that, yes? You just use the extpkgs script to either "update" or re-"clone" package "s" (i.e. softfloat). Enter "extpkgs /?" (or "extpkgs.sh --help") for more information.

Or you can simply use Bill's Hercules Helper, of course.

@JamesWekel
Contributor

Fish,

Thank you for the SoftFloat update.

I'm currently just using the SoftFloat X64 libraries that are part of the 'develop' branch. I have built the external packages but it has been a while.

Just to be clear, the 'develop' branch does not have updated SoftFloat libraries. Once I commit the nnpa code, everyone doing X86-64 development on the 'develop' branch will have to update their SoftFloat libraries with a new version from https://github.com/SDL-Hercules-390/SoftFloat.git. Is this going to cause some confusion/downstream support issues?

Jim

@Fish-Git
Member Author

Fish-Git commented Jul 30, 2024

I'm currently just using the SoftFloat X64 libraries that are part of the 'develop' branch.

Then you should be okay. The last commit I made was to update those lib files. So for Windows, you should be okay, as well as any x86 Linux user that is able to use the Herc libs.

It's just for some Linux users that might have to update and rebuild their softfloat repo/libs if they're unable to use the ones that come with Herc, such as those who have a non-x86 system (such as ARM for example).

Make sense?

@Fish-Git
Member Author

Just to be clear, the 'develop' branch does not have updated SoftFloat libraries.

That was true, but is now no longer true as of a couple hours ago, since, as I said, Herc's libraries have since been updated:

  • 855f126 "Update Hercules' SoftFloat libs to new version".

@Fish-Git
Member Author

Once I commit the nnpa code, everyone doing X86-64 development on the 'develop' branch will have to update their SoftFloat libraries with a new version from https://github.com/SDL-Hercules-390/SoftFloat.git.

ONLY if they're running on non-x86 hardware (or otherwise are unable to use the libs that come with Hercules).

Is this going to cause some confusion/downstream support issues?

Possibly.

If they build Herc themselves the hard way, then yes, they will have to update and rebuild their softfloat external package libraries.

If they build Herc using Bill's Hercules Helper however, then probably not. I believe Bill's Hercules Helper builds Hercules just fine for most all non-x86 systems. @wrljet Bill? Is that true? Does your script always refresh (git pull) for all of the external package repos each time? (and rebuild them if they've changed?)

But if they, like you, simply link with the libs that come delivered with Hercules, then no, they should be unaffected.

@wrljet
Member

wrljet commented Jul 30, 2024

Fish,

Hercules-Helper rebuilds the extpkgs from source, with a fresh git clone, on all systems except Windows.
(unless --no-clone option is used)

Bill

@JamesWekel
Contributor

Fish,

Thank you. Thank you for your last commit to update the SoftFloat libraries! My nnpa.c code compiles and links. Now to work on some tests.

Whenever I've used hercules-helper to install hercules on my Raspberry PI 5, all the external packages are built.

Jim

@mfsysprog

mfsysprog commented Aug 18, 2024

The z/vector E6 instructions, for example VECTOR FP CONVERT TO NNP, reference NNP-Data-Type-1 Format. The z/Architecture Principles of Operation, SA22-7832-13, page 26-1, states:

Neural Network Processing Data

The NEURAL NETWORK PROCESSOR ASSIST
instruction, as well as the related convert instructions
described in this chapter, perform operations on
model-dependent data types.

NNP-Data-Type-1 Format

NNP-data-type-1 format represents a 16-bit signed
floating-point number in a proprietary format with a
range and precision tailored toward neural-network
processing. Other models may use other data formats.

But the NNP-data-type-1 format is not described. Does anyone have additional reference information on the format? The closest that I've found is a DLFLOAT presentation: https://pdfs.semanticscholar.org/5359/1b203af986668ca6586f80d30257d3ee52d7.pdf

Thanks, Jim

I came across this patent from IBM that describes the whole workings of the neural networks assist processing. It seems it also explains the NNP-data-type-1.

https://patents.justia.com/patent/11669331

Edit: This links to a pdf version that also has the images:
https://image-ppubs.uspto.gov/dirsearch-public/print/downloadPdf/11669331

8 participants