Compiling to Z-machine code?

Guest
Archived from groups: rec.games.int-fiction

Is there a somewhat easy to understand spec of the z-machine? Just for
fun, I am interested in writing a simple compiler that can produce
basic z-machine runnable code (ZIP? ZIL? i forget the terminology) with
commands like variable assignment (string/integer), printing to the
screen, asking the user for input, if/then, etc. Any pointers
appreciated...
 
Guest

On that note: has anyone ever done a program that could take a story file
and generate a basic program (or any other language) from it?
 

samwyse

Distinguished
Feb 9, 2002
166
0
18,680

On or about 2/5/2005 6:58 PM, Mad Scientist Jr did proclaim:
> Is there a somewhat easy to understand spec of the z-machine? Just for
> fun, I am interested in writing a simple compiler that can produce
> basic z-machine runnable code (ZIP? ZIL? i forget the terminology) with
> commands like variable assignment (string/integer), printing to the
> screen, asking the user for input, if/then, etc. Any pointers
> appreciated...

First of all, there's the Inform compiler, which produces either Z-code
or Glulx (which is, sort-of, kind-of, a 32-bit version of Z-code,
although its author would disagree). You can find it here:
http://www.inform-fiction.org/

Given that, if you still want to write your own compiler then you need
"The Z-Machine Standards Document" found at:
http://www.inform-fiction.org/zmachine/standards/index.html
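To give a feel for what such a compiler has to emit, here is a minimal Python sketch of one small piece: packing text into Z-characters for a print instruction. It assumes version 3 or later, handles only space, lowercase and uppercase letters (uppercase via the one-character shift, Z-char 4), and ignores abbreviations and ZSCII escapes. The function names are hypothetical; check the details against section 3 of the Standards Document before relying on them.

```python
A0 = "abcdefghijklmnopqrstuvwxyz"  # alphabet 0: Z-chars 6..31
A1 = A0.upper()                     # alphabet 1, reached with shift Z-char 4


def to_zchars(text: str) -> list[int]:
    """Map text to 5-bit Z-characters (space, a-z, A-Z only in this sketch)."""
    zchars: list[int] = []
    for ch in text:
        if ch == " ":
            zchars.append(0)                      # Z-char 0 prints a space
        elif ch in A0:
            zchars.append(6 + A0.index(ch))
        elif ch in A1:
            zchars.extend([4, 6 + A1.index(ch)])  # shift to A1 for one char
        else:
            raise ValueError(f"unsupported character {ch!r} in this sketch")
    return zchars


def encode_text(text: str) -> bytes:
    """Pack Z-chars three per big-endian 16-bit word; set bit 15 on the last word."""
    zchars = to_zchars(text)
    while len(zchars) % 3 != 0 or not zchars:
        zchars.append(5)                          # pad with Z-char 5
    out = bytearray()
    for i in range(0, len(zchars), 3):
        a, b, c = zchars[i:i + 3]
        word = (a << 10) | (b << 5) | c
        if i + 3 == len(zchars):
            word |= 0x8000                        # end-of-string marker
        out += word.to_bytes(2, "big")
    return bytes(out)


def decode_text(data: bytes) -> str:
    """Inverse of encode_text for the same restricted alphabet."""
    out, shift = [], False
    for i in range(0, len(data), 2):
        word = int.from_bytes(data[i:i + 2], "big")
        for z in ((word >> 10) & 31, (word >> 5) & 31, word & 31):
            if z == 4:
                shift = True                      # next char comes from A1
                continue
            if z == 0:
                out.append(" ")
            elif z >= 6:
                out.append((A1 if shift else A0)[z - 6])
            shift = False                         # shift lasts one char; pads ignored
        if word & 0x8000:
            break
    return "".join(out)
```

For instance, `encode_text("hi")` should give the two bytes `B5 C5`: Z-chars 13 and 14 plus one padding 5, with the top bit of the final word set.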
 
Guest

>> On that note: has anyone ever done a program that could take
>> a story file and generate a basic program (or any other
>> language) from it?
> Not off the top of my head. Back in '97-'98, there was a
> big effort to decipher storyfiles, as part of the project
> to figure out how storyfiles were put together and exactly
> what all of the Z-machine opcodes were supposed to do.
> All of the Ztools were created at that time; you can
> download them here:

Inform 1 was released to the public on 10 May 1993, and I recall Graham
Nelson claiming that he didn't decode the Zcode story file format
himself - others had already done that before him. Your claimed date
is probably wrong by at least five years.
 
 
Guest

Not that I am in a great hurry to do this, but I can't imagine it NOT
being possible to disassemble z-code. Are you saying that due to this
compressed format, any meaningful variable or function/procedure names
would be lost, and instead you would be left with A,B,C, functionA,
functionB, functionC, etc.? Because if the z-machine can run it, a
z-disassembler ought to be able to render source code from it (albeit
without friendly var/function names).

> You are especially out of luck if you want a file that you can modify
> and re-compile, since storyfiles use a compressed format to store
> pointers to objects, strings and procedures.
 

samwyse

On or about 2/5/2005 9:25 PM, Terry Olsen did proclaim:

> On that note: has anyone ever done a program that could take a story file
> and generate a basic program (or any other language) from it?

Not off the top of my head. Back in '97-'98, there was a big effort to
decipher storyfiles, as part of the project to figure out how storyfiles
were put together and exactly what all of the Z-machine opcodes were
supposed to do. All of the Ztools were created at that time; you can
download them here:

http://www.inform-fiction.org/zmachine/ztools.html

InfoDump will display all of the strings in a storyfile, while txd will
give you just about every byte formatted for easy viewing. Nothing will
look at a procedure and create high-level code; the best you can ever
hope for is Inform assembler op-codes. You are especially out of luck
if you want a file that you can modify and re-compile, since storyfiles
use a compressed format to store pointers to objects, strings and
procedures. This means that deciding which variables and object
properties are pointers to objects and which are just numbers would be
especially uncertain. Any changes to the file that caused things to
move would invalidate any pointers that had been misidentified as
numbers (and vice versa), causing interesting errors at run time.
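To make the "compressed pointers" point concrete: routine and string addresses in Z-code are usually stored as packed addresses, which the interpreter multiplies by a version-dependent factor. Below is a minimal Python sketch following the unpacking rules in section 1.2.3 of the Standards Document; `offset` stands for the V6/V7 routine or string offset from the header. The `relocate` helper is hypothetical, meant only to show how a value misclassified as a plain number goes stale when things move.

```python
def unpack_address(packed: int, version: int, offset: int = 0) -> int:
    """Convert a packed routine/string address to a byte address.

    `offset` is the routine or string offset from the header (V6/V7 only).
    """
    if version <= 3:
        return 2 * packed
    if version in (4, 5):
        return 4 * packed
    if version in (6, 7):
        return 4 * packed + 8 * offset
    if version == 8:
        return 8 * packed
    raise ValueError(f"unknown Z-machine version {version}")


def relocate(operands: list[tuple[int, bool]], delta: int) -> list[int]:
    """Hypothetical recompile step: strings moved by `delta` packed units.

    Each operand is (value, believed_to_be_pointer). Only values believed
    to be pointers are adjusted, so a misclassified pointer silently keeps
    pointing at the old location.
    """
    return [value + delta if is_ptr else value for value, is_ptr in operands]
```

If a value such as 0x4A8E really was a string pointer but was classified as a number, `relocate` leaves it untouched, which is the kind of run-time breakage described above.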
 
Guest

okay i think i understand that explanation. thanks to everyone who has
posted. i read some articles by the infocom guys that showed some
Muddle (aka MDL, the LISP like language they used) source code and it's
ugly! Did they ever make available any of their complete source codes
anywhere?

> The interpreter doesn't have to distinguish a value's meaning until it
> actually gets used by an instruction. When the interpreter sees the
> opcode to print local variable 1, it can assume that the variable
> contains the address of a string. (And if it doesn't, it can signal a
> run-time error.) But the value may have been stored in that variable at
> a completely different time.
 
Guest

Here, Mad Scientist Jr <usenet_daughter@yahoo.com> wrote:
> Not that I am in a great hurry to do this, but I can't imagine it NOT
> being possible to disassemble z-code. Are you saying that due to this
> compressed format, any meaningful variable or function/procedure names
> would be lost, and instead you would be left with A,B,C, functionA,
> functionB, functionC, etc.?

You have that problem, yes.

A bigger problem is distinguishing addresses from constants. If you
see a z-code opcode "load $4A8E into local variable 1", you have to
figure out that $4A8E actually refers to a string at that address,
rather than a numeric constant. This is important for a human reading
the code. It's even more important if you plan to modify the code and
recompile it, because in the new binary that string may be at a
different address. If you guess wrong about what that instance of
$4A8E means, you'll either print garbage or change an important
numeric constant in the program.

You can use some reasonable heuristics. For example, if local variable
1 is immediately printed, it's a very good bet that $4A8E is a string
address. If there *is* a string with address $4A8E, then again it's a
fair bet. But none of these heuristics is perfectly reliable, so any
true disassembling effort will require some going-over with human eyes.

(If that problem seems easy, then consider property identifiers. These
have the same difficulty, but they're small numbers like 4 or 7, which
-- unlike $4A8E -- are as likely to be *intended* as numbers as they
are to be property ids.)
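The heuristics above can be written down as a simple scoring function. This is a minimal Python sketch, not any existing tool's algorithm: it assumes the disassembler has already collected the set of known packed string addresses and, for each stored constant, the opcodes that later consume it. The names `classify_constant`, `known_strings` and `later_uses`, and the particular weights, are hypothetical.

```python
from collections.abc import Iterable

STRING_USES = {"print_paddr"}                      # opcodes taking a packed string
NUMERIC_USES = {"add", "sub", "mul", "div", "mod"}  # arithmetic on plain numbers


def classify_constant(value: int, known_strings: set[int],
                      later_uses: Iterable[str]) -> str:
    """Guess whether `value` is a string address or a number.

    Returns "string", "number", or "ambiguous" (left for a human).
    """
    score = 0
    for opcode in later_uses:
        if opcode in STRING_USES:
            score += 2        # strong evidence: the value gets printed
        elif opcode in NUMERIC_USES:
            score -= 2        # arithmetic suggests a plain number
    if value in known_strings:
        score += 1            # weaker evidence: a string happens to start there
    if score > 1:
        return "string"
    if score < -1:
        return "number"
    return "ambiguous"
```

Note that "a string exists at that address" alone only scores 1, so it stays "ambiguous": a fair bet, but not enough to skip the human review.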

--Z

"And Aholibamah bare Jeush, and Jaalam, and Korah: these were the borogoves..."
*
I'm still thinking about what to put in this space.
 
Guest

"Andrew Plotkin" <erkyrath@eblong.com> wrote in message
news:cu5knp$fu$2@reader2.panix.com...
> Here, Mad Scientist Jr <usenet_daughter@yahoo.com> wrote:
>> Not that I am in a great hurry to do this, but I can't imagine it NOT
>> being possible to disassemble z-code. Are you saying that due to this
>> compressed format, any meaningful variable or function/procedure names
>> would be lost, and instead you would be left with A,B,C, functionA,
>> functionB, functionC, etc.?
>
> You have that problem, yes.
>
> A bigger problem is distinguishing addresses from constants. If you
> see a z-code opcode "load $4A8E into local variable 1", you have to
> figure out that $4A8E actually refers to a string at that address,
> rather than a numeric constant. This is important for a human reading
> the code. It's even more important if you plan to modify the code and
> recompile it, because in the new binary that string may be at a
> different address. If you guess wrong about what that instance of
> $4A8E means, you'll either print garbage or change an important
> numeric constant in the program.
>
> You can use some reasonable heuristics. For example, if local variable
> 1 is immediately printed, it's a very good bet that $4A8E is a string
> address. If there *is* a string with address $4A8E, then again it's a
> fair bet. But none of these heuristics is perfectly reliable, so any
> true disassembling effort will require some going-over with human eyes.

Ok, so a big problem is distinguishing addresses from constants. How then
does an interpreter distinguish them? A disassembler can see just as much of
the story file as an interpreter. So why wouldn't it be able to distinguish
the cases from each other? If an interpreter can do it, so can a
disassembler.

> (If that problem seems easy, then consider property identifiers. These
> have the same difficulty, but they're small numbers like 4 or 7, which
> -- unlike $4A8E -- are as likely to be *intended* as numbers as they
> are to be property ids.)

And again, same reasoning. If an interpreter can distinguish them, then why
couldn't a disassembler?
 

samwyse

On or about 2/6/2005 12:00 PM, Rioshin an'Harthen did proclaim:
> "Andrew Plotkin" <erkyrath@eblong.com> wrote in message
> news:cu5knp$fu$2@reader2.panix.com...
>>You can use some reasonable heuristics. For example, if local variable
>>1 is immediately printed, it's a very good bet that $4A8E is a string
>>address. If there *is* a string with address $4A8E, then again it's a
>>fair bet. But none of these heuristics is perfectly reliable, so any
>>true disassembling effort will require some going-over with human eyes.
>
> Ok, so a big problem is distinguishing addresses from constants. How then
> does an interpreter distinguish them? A disassembler can see just as much of
> the story file as an interpreter. So why wouldn't it be able to distinguish
> the cases from each other? If an interpreter can do it, so can a
> disassembler.

The interpreter doesn't have to distinguish a value's meaning until it
actually gets used by an instruction. When the interpreter sees the
opcode to print local variable 1, it can assume that the variable
contains the address of a string. (And if it doesn't, it can signal a
run-time error.) But the value may have been stored in that variable at
a completely different time. Also note that if the variable was
originally named 'temp', then it will likely have different types of
values at different times, which can further bollix up the disassembler.
And static analysis of the entire program doesn't help since there might
be dead code and/or undiscovered bugs. Just because there's an
instruction somewhere to print local variable 1 doesn't mean that it
will ever be executed.
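A toy illustration of that point in Python: the interpreter below stores plain integers and only treats one as a packed string address when a print instruction actually executes. The tuple-based instruction format, `jump_if_zero`, and the `STRINGS` table are invented for this sketch (only the name `print_paddr` is borrowed from the real opcode set); real Z-code is a byte stream decoded as described in the Standards Document.

```python
# Hypothetical toy program: (opcode, operands...) tuples, not real Z-code bytes.
PROGRAM = [
    ("store", 1, 0x4A8E),    # 0: local 1 gets a bare number
    ("store", 2, 0),         # 1: local 2 gets zero
    ("jump_if_zero", 2, 4),  # 2: always taken here...
    ("print_paddr", 2),      # 3: ...so this bogus print is dead code
    ("print_paddr", 1),      # 4: only now is 0x4A8E treated as a string
]
STRINGS = {0x951C: "You jump."}  # byte address -> decoded text (stand-in)


def run(program, strings):
    """Run the toy program; values carry no type until an opcode uses them."""
    locals_ = [0] * 16
    output = []
    pc = 0
    while pc < len(program):
        op, *args = program[pc]
        pc += 1
        if op == "store":
            var, value = args
            locals_[var] = value             # no type is recorded
        elif op == "jump_if_zero":
            var, target = args
            if locals_[var] == 0:
                pc = target
        elif op == "print_paddr":
            (var,) = args
            address = 2 * locals_[var]       # V1-3 packed-address rule
            if address not in strings:
                raise RuntimeError(f"no string at {address:#x}")
            output.append(strings[address])
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return output
```

`run(PROGRAM, STRINGS)` prints "You jump." without complaint, even though a static scan finds a `print_paddr` applied to a local whose only stored value is 0; that instruction is simply never reached.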

BTW, Infocom used a LISP-based language that is very different from
Inform. There are storyfiles (Seastalker comes immediately to mind)
that the Inform compiler would be unable to duplicate, because it won't
let you put procedures and strings in readable memory.
 
Guest

samwyse wrote:
> A really interesting (though very short) discussion of ZIL (Zork
> Implementation Language) can be found here:
>
> http://lambda-the-ultimate.org/node/view/429
>
> It indicates that Activision *does* have the source code for both
> ZIL and all of the Infocom games. "A lot of people might not
> realize it, but as late as 1993, Activision's "Return to Zork"
> was still written in a (heavily extended) version of ZIL, i.e. a
> dialect of Lisp."

Having been a beta-tester for "Return to Zork", I would have to
disagree with the quote from this page... RtZ was written in a language
that _looked_ Lispy, but it had zero to do with MDL or ZIL... no common
code, no common authorship, etc.

-ethan
 

samwyse

On or about 2/6/2005 2:49 PM, Mad Scientist Jr did proclaim:
> okay i think i understand that explanation. thanks to everyone who has
> posted. i read some articles by the infocom guys that showed some
> Muddle (aka MDL, the LISP like language they used) source code and it's
> ugly! Did they ever make available any of their complete source codes
> anywhere?

LISP, APL and Forth are all interpreted languages that are incredibly
powerful, but which have incredibly steep learning curves. LISP-like
languages are particularly suited for natural language applications,
which is why they are used so much in Artificial Intelligence and were
judged well suited for interactive fiction.

The excerpts that you've seen published are pretty much all that there
is, unless Activision has some code hidden away in a vault somewhere.
The circumstantial evidence is pretty strong, however, that all they
have are the binaries, the same as us. Someone posted last year that
they were going to try to recreate the Infocom compiler; nothing has
been heard on the subject since then.
 
Guest

Here, samwyse <dejanews@email.com> wrote:
> On or about 2/6/2005 2:49 PM, Mad Scientist Jr did proclaim:
> > okay i think i understand that explanation. thanks to everyone who has
> > posted. i read some articles by the infocom guys that showed some
> > Muddle (aka MDL, the LISP like language they used) source code and it's
> > ugly! Did they ever make available any of their complete source codes
> > anywhere?
>
> LISP, APL and Forth are all interpreted languages that are incredibly
> powerful, but which have incredibly steep learning curves. LISP-like
> languages are particularly suited for natural language applications,
> which is why they are used so much in Artificial Intelligence and were
> judged well suited for interactive fiction.

The bits of Infocom code we have are not LISP. They're in a compiled
(not interpreted) language, which is imperative in form (not
functional or dynamic). It's a lot like early Inform, which is not a
surprise. Only the syntax is LISP-like.

--Z

 
Guest

Here, Rioshin an'Harthen <rioshin@breakthru.com> wrote:
> "Andrew Plotkin" <erkyrath@eblong.com> wrote in message
> news:cu5knp$fu$2@reader2.panix.com...
> >
> > A bigger problem is distinguishing addresses from constants. If you
> > see a z-code opcode "load $4A8E into local variable 1", you have to
> > figure out that $4A8E actually refers to a string at that address,
> > rather than a numeric constant. This is important for a human reading
> > the code. It's even more important if you plan to modify the code and
> > recompile it, because in the new binary that string may be at a
> > different address. If you guess wrong about what that instance of
> > $4A8E means, you'll either print garbage or change an important
> > numeric constant in the program.
>
> Ok, so a big problem is distinguishing addresses from constants. How then
> does an interpreter distinguish them?

The interpreter doesn't distinguish them. It just passes the value
through blindly, and assumes that the right thing will happen.

Note that I specified the tasks of "making human-readable code" and
"modifying and recompiling code". Those are tasks which an interpreter
doesn't have to do. They are tasks which the Z-code format is not
designed to make easy. Unsurprisingly, therefore, they're *not* easy.

--Z

 
Guest

In article <cu5knp$fu$2@reader2.panix.com>,
Andrew Plotkin <erkyrath@eblong.com> wrote:
>A bigger problem is distinguishing addresses from constants. If you
>see a z-code opcode "load $4A8E into local variable 1", you have to
>figure out that $4A8E actually refers to a string at that address,
>rather than a numeric constant. This is important for a human reading
>the code. It's even more important if you plan to modify the code and
>recompile it, because in the new binary that string may be at a
>different address. If you guess wrong about what that instance of
>$4A8E means, you'll either print garbage or change an important
>numeric constant in the program.

My knowledge of decompilers has deteriorated a lot in the last 25 years, but
I'm fairly sure this is a "standard problem" solved by some variant of type
inference or some other form of dynamic code analysis. I think there are
similar issues with byte-code-to-native compilers. I'm not saying it's easy,
just that there is literature in the compiler community that addresses similar problems.

>You can use some reasonable heuristics. For example, if local variable
>1 is immediately printed, it's a very good bet that $4A8E is a string
>address. If there *is* a string with address $4A8E, then again it's a
>fair bet. But none of these heuristics is perfectly reliable, so any
>true disassembling effort will require some going-over with human eyes.

IIRC, the techniques I mentioned amount to formalizing these kinds of heuristics and
verifying that the code really does treat the value that way.

>(If that problem seems easy, then consider property identifiers. These
>have the same difficulty, but they're small numbers like 4 or 7, which
>-- unlike $4A8E -- are as likely to be *intended* as numbers as they
>are to be property ids.)

The interesting trick is if some clever compiler treats them as property
identifiers in one place and simple integers in another. I have vague
memories of some peephole optimizers generating such multi-use situations.
--
"Yo' ideas need to be thinked befo' they are say'd" - Ian Lamb, age 3.5
http://www.cs.queensu.ca/~dalamb/ qucis->cs to reply (it's a long story...)
 

samwyse

On or about 2/6/2005 4:06 PM, Andrew Plotkin did proclaim:
> Here, samwyse <dejanews@email.com> wrote:
>
>>On or about 2/6/2005 2:49 PM, Mad Scientist Jr did proclaim:
>>
>>>okay i think i understand that explanation. thanks to everyone who has
>>>posted. i read some articles by the infocom guys that showed some
>>>Muddle (aka MDL, the LISP like language they used) source code and it's
>>>ugly! Did they ever make available any of their complete source codes
>>>anywhere?
>>
>>LISP, APL and Forth are all interpreted languages that are incredibly
>>powerful, but which have incredibly steep learning curves. LISP-like
>>languages are particularly suited for natural language applications,
>>which is why they are used so much in Artificial Intelligence and were
>>judged well suited for interactive fiction.
>
> The bits of Infocom code we have are not LISP. They're in a compiled
> (not interpreted) language, which is imperative in form (not
> functional or dynamic). It's a lot like early Inform, which is not a
> surprise. Only the syntax is LISP-like.

Quite so, but I was mostly discussing why a language was chosen that was
"ugly". I've used a few variants of LISP, including (in reverse order)
CommonLisp, Scheme and an IBM mainframe version that was probably a
close descendant of McCarthy's original implementation. A really
interesting (though very short) discussion of ZIL (Zork Implementation
Language) can be found here:

http://lambda-the-ultimate.org/node/view/429

It indicates that Activision *does* have the source code for both ZIL
and all of the Infocom games. "A lot of people might not realize it,
but as late as 1993, Activision's "Return to Zork" was still written in
a (heavily extended) version of ZIL, i.e. a dialect of Lisp."
 
Guest

On 2005-02-06 09:21:52 -0800, "Mad Scientist Jr"
<usenet_daughter@yahoo.com> said:

> Are you saying that due to this
> compressed format, any meaningful variable or function/procedure names
> would be lost, and instead you would be left with A,B,C, functionA,
> functionB, functionC, etc.? Because if the z-machine can run it, a
> z-disassembler ought to be able to render source code from it (albeit
> without friendly var/function names).

No, but many things are referred to as simply numbers, and you can't
tell what those numbers mean unless they're being used in their
functions.

For example, suppose that I have the following code:

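[Foo direction;    ! 'direction' holds a property number
    print "That way is ", (name)location.direction;
];
[Bar obj;          ! 'obj' holds an object number
    print "That is ", (the)obj;
];
[Baz verb;         ! 'verb' holds an action number
    if (verb == ##Jump)
        "You jump.";
    "You don't jump.";
];
[Quux;             ! each call passes nothing but a bare number
    Foo(n_to);
    Bar(platypus);
    Baz(##Climb);
];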

Now, in Quux, you're passing a number to Foo, a number to Bar, and a
number to Baz. The compiler is kind enough to let you use n_to (a
property name), platypus (an object identifier), and ##Climb (a verb
identifier) to refer to their numbers, but in the compiled code,
they're just numbers-- and they overlap. (In fact, the standard Inform
library relies on the fact that n_to and n_obj have the same number--
something I found out, much to my dismay, when I violated this just
before putting my program in beta.) A disassembler examining Quux
wouldn't know which is which, that is, it wouldn't know if the number
for n_to was meant to represent a property, object, verb, etc.
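A sketch of what a disassembler is left with at the call sites in Quux: a bare number that may match entries in several symbol tables. The numeric values below are made up for illustration (they are not what any Inform version actually assigns); the point is only that one operand can have several plausible readings.

```python
# Hypothetical symbol tables a disassembler might reconstruct; values invented.
PROPERTIES = {4: "n_to", 7: "description"}
OBJECTS = {4: "n_obj", 31: "platypus"}
ACTIONS = {4: "##Jump", 31: "##Climb"}

TABLES = {"property": PROPERTIES, "object": OBJECTS, "action": ACTIONS}


def candidate_meanings(value: int) -> list[str]:
    """Every symbolic reading of `value`, plus the plain-number reading."""
    readings = [f"{kind} {table[value]}"
                for kind, table in TABLES.items() if value in table]
    readings.append(f"number {value}")
    return readings
```

With these invented numbers, `Bar(platypus)` and `Baz(##Climb)` would compile to the same operand, 31, and `candidate_meanings(31)` cannot tell the two calls apart without looking inside Bar and Baz.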

"But can't it look at Foo, Bar, and Baz?" Well, kind of. That would
take a considerable amount of analysis. (It's not entirely impossible;
MrSpidey, a Scheme analyzer, does some such analysis. But I suspect it's
impossible to always perform such analysis.) Also, the routines may
not clearly define what type of object they're expecting. In the
program I'm working on now, some subroutines can be passed dictionary
words or objects, for example.

There are decompilers for some languages, but they're also unreliable
for similar reasons. It's more of a problem for Inform, since it
relies on lots of types. Decompilers are most common for languages
like C, which is almost a macro assembler anyway.

Cheers,
Piquan
 
Guest

In article <1107693861.155522.259590@c13g2000cwb.googlegroups.com>,
Fredrik Ramsberg <f.r@mail.com> wrote:
>>> On that note: has anyone ever done a program that could take
>>> a story file and generate a basic program (or any other
>>> language) from it?
>> Not off the top of my head. Back in '97-'98, there was a
>> big effort to decypher storyfiles, as part of the project
>> to figure out how storyfiles were put together and exactly
>> what all of the Z-machine opcodes were supposed to do.
>> All of the Ztools were created at that time; you can
>> download them here:
>
>Inform 1 was released to the public on 10 May 1993, and I recall Graham
>Nelson claiming that he didn't decode the Zcode story file format
>himself - others had already done that before him. Your claimed date
>is probably wrong by at least five years.

The Ztools date back to 1992, and I believe they were the _second_
generation of Z-machine story file tools.

However, there was a later push to decode V6 and fill in
some of the gaps.
 
Guest

In article <1107710512.185343.281190@l41g2000cwc.googlegroups.com>,
Mad Scientist Jr <usenet_daughter@yahoo.com> wrote:
>Not that I am in a great hurry to do this, but I can't imagine it NOT
>being possible to disassemble z-code. Are you saying that due to this
>compressed format, any meaningful variable or function/procedure names
>would be lost, and instead you would be left with A,B,C, functionA,
>functionB, functionC, etc.? Because if the z-machine can run it, a
>z-disassembler ought to be able to render source code from it (albeit
>without friendly var/function names).

Certainly Z-code can be disassembled. But at present, only to a
low-level assembler. Nobody has made a totally successful decompiler,
though there have been partially successful attempts.
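The first step of any disassembler, whether it stops at assembler or goes on to decompile, is working out each instruction's form and operand types from its opcode byte. A minimal Python sketch of that step, following section 4 of the Standards Document; it deliberately ignores the V5+ extended form (0xBE) and the second operand-type byte used by `call_vs2`/`call_vn2`. The function name and return layout are this sketch's own.

```python
OPERAND_TYPES = {0b00: "large", 0b01: "small", 0b10: "variable", 0b11: "omitted"}


def decode_opcode(code: bytes) -> tuple[str, int, list[str], int]:
    """Return (operand count, opcode number, operand types, prefix length).

    `code` starts at the instruction's first byte; prefix length counts the
    opcode byte plus any operand-type byte that precedes the operands.
    """
    first = code[0]
    top = first >> 6
    if top == 0b11:                                  # variable form
        count = "VAR" if first & 0x20 else "2OP"
        types = []
        type_byte = code[1]
        for shift in (6, 4, 2, 0):                   # four 2-bit fields
            kind = OPERAND_TYPES[(type_byte >> shift) & 0b11]
            if kind == "omitted":
                break                                # no further operands
            types.append(kind)
        return count, first & 0x1F, types, 2
    if top == 0b10:                                  # short form
        kind = OPERAND_TYPES[(first >> 4) & 0b11]
        if kind == "omitted":
            return "0OP", first & 0x0F, [], 1
        return "1OP", first & 0x0F, [kind], 1
    # Long form: always 2OP; bits 6 and 5 give the two operand types.
    types = ["variable" if first & bit else "small" for bit in (0x40, 0x20)]
    return "2OP", first & 0x1F, types, 1
```

For example, `0x0D` decodes as long-form 2OP:13 (`store`) with two small constants, and `0xB2` as short-form 0OP:2 (`print`).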

In theory, most Z-code shouldn't be all that hard to decompile. It
comes down to a matter of bookkeeping; some manual help would be
needed when information was totally lost and to fill in variable and
function names, but not as much as you might think. In practice, no
one has done it. It doesn't help that TXD isn't really suitable as
the starting point for a decompiling effort, so you'd have to re-write
the disassembler part before you got to the fun stuff.
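If you do rewrite the disassembler from scratch, the natural entry point is the 64-byte header at the start of every story file, which locates the other tables. A minimal Python sketch reading a few of the fields described in section 11 of the Standards Document; the `Header` class and the choice of fields are this sketch's own.

```python
import struct
from dataclasses import dataclass


@dataclass
class Header:
    version: int
    high_memory: int     # byte address where high memory starts
    initial_pc: int      # V1-5: byte address of the first instruction
    dictionary: int
    object_table: int
    globals_table: int
    static_memory: int   # byte address where static memory starts
    abbreviations: int


def read_header(story: bytes) -> Header:
    """Parse selected header fields; all words are big-endian."""
    if len(story) < 64:
        raise ValueError("story file shorter than the 64-byte header")
    version = story[0]
    # Six consecutive words at offsets 0x04, 0x06, 0x08, 0x0A, 0x0C, 0x0E.
    high, pc, dic, obj, glob, static = struct.unpack_from(">6H", story, 0x04)
    (abbrev,) = struct.unpack_from(">H", story, 0x18)
    return Header(version, high, pc, dic, obj, glob, static, abbrev)
```

Usage would be something like `read_header(Path("game.z5").read_bytes())` with `pathlib.Path`, where `game.z5` stands for any story file you have.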
 
Guest

In article <cu666g$6jj$1@knot.queensu.ca>,
David Alex Lamb <dalamb@qucis.queensu.ca> wrote:
>
>My knowledge of decompilers has deteriorated a lot in the last 25 years, but
>I'm fairly sure this is a "standard problem" solved by some variant of type
>inference or some other form of dynamic code analysis. I think there are
>similar issues with byte-code-to-native compilers. I'm not saying it's easy,
>just that there is literature in the compiler community that
>addresses similar problems.

Most recent byte-code to native work has been done on type-aware VMs
like Java. This simplifies the task enormously.

On the flip side, the Z-machine's paucity of types simplifies that
side quite a bit.

>IIRC, the techniques I mentioned amount to formalizing these kinds of heuristics and
>verifying that the code really does treat the value that way.

Right. This is what I call "bookkeeping". If you can figure out that
parameter two of routine R_A562 must be a string (because it's used in a
print_paddr), then you can figure out that anything passed in as the
actual second parameter to R_A562 is a string also. You can work
things like that forward and backwards, and come up with the types of
a lot of things. With Infocom (but not Inform) code, you can also
assume that global variables have the same type throughout the code,
and that all properties of a given number have the same type.
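The bookkeeping described here is essentially a fixed-point propagation over the call graph. A minimal Python sketch under simplifying assumptions: the disassembler has already recorded which routine parameters are consumed by type-revealing opcodes (the seeds) and, for every call site, what was passed. Routine names like `R_A562` follow the post's example; the slot layout, `merge` and `propagate` are hypothetical.

```python
from typing import Optional

# A slot is ("param", routine, index) or, for a constant argument,
# ("const", call_site_address, index) so each call site is tracked separately.


def merge(old: Optional[str], new: str) -> str:
    """Combine two observations; disagreement becomes "mixed"."""
    if old is None or old == new:
        return new
    return "mixed"


def propagate(seeds: dict, calls: list) -> dict:
    """Spread types between call-site arguments and callee parameters.

    seeds: {("param", routine, index): type} from type-revealing opcodes,
           e.g. a parameter consumed by print_paddr is a "string".
    calls: [(callee, [arg_slot, ...]), ...], one entry per call site.
    """
    types = dict(seeds)
    changed = True
    while changed:                           # iterate until nothing changes
        changed = False
        for callee, args in calls:
            for index, arg in enumerate(args):
                param = ("param", callee, index)
                # param -> argument (backwards), argument -> param (forwards)
                for src, dst in ((param, arg), (arg, param)):
                    if src in types:
                        merged = merge(types.get(dst), types[src])
                        if types.get(dst) != merged:
                            types[dst] = merged
                            changed = True
    return types
```

A slot that ends up "mixed" corresponds to the case mentioned earlier in the thread of a routine that accepts either dictionary words or objects; that is where a human has to step in.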

>The interesting trick is if some clever compiler treats them as property
>identifiers in one place and simple integers in another. I have vague
>memories of some peephole optimizers generating such multi-use situations.

Fortunately, the ZIL compiler does not do this.
 
Guest

Alas if I only had a slightly bigger brain. I would think the value in
decompiling z-code would mainly be to see how the masters wrote the
classic Infocom games - more for learning and game design purposes. I'm
sure a lot of that can be done from newer Inform games whose source is
freely available, but I still would love to start by learning from the
original Zork.

> In theory, most Z-code shouldn't be all that hard to decompile. It
> comes down to a matter of bookkeeping; some manual help would be
> needed when information was totally lost and to fill in variable and
> function names, but not as much as you might think. In practice, no
> one has done it. It doesn't help that TXD isn't really suitable as
> the starting point for a decompiling effort, so you'd have to re-write
> the disassembler part before you got to the fun stuff.
 
Guest

If I only had a slightly bigger brain (and more free time) I would try
to write the decompiler myself. As it is, I looked at the z-machine
spec and it's beyond my current knowledge and time... I once knew some
8-bit assembly and my understanding of low level computing ends there.
It's just a better use of my time to learn Inform and start writing
games. My original hope was I could write a simple z-code compiler to
translate my existing text games (which are no Zorks, I assure you) to
z-code from their current homebrewed language (a sub BASIC scripting
tongue at best). I think it best to learn Inform, and then if I'm
*still* interested in porting these ancient games of mine over for comic
relief, it would be easier to write something to translate them to
Inform code. I think it's best to just write new games and attempt
something good. I begin reading the Inform manual tonight...
 

samwyse

On or about 2/6/2005 4:37 PM, David Alex Lamb did proclaim:
> My knowledge of decompilers has deteriorated a lot in the last 25 years, but
> I'm fairly sure this is a "standard problem" solved by some variant of type
> inference or some other form of dynamic code analysis. I think there are
> similar issues with byte-code-to-native compilers. I'm not saying it's easy,
> just that there is literature in the compiler community that addresses similar problems.

Quite so. Java is very strongly typed, and (to support the
reflection APIs) the class files keep a ton of data that would
normally be of interest only to a debugger. As a result, Java
decompilers are a dime a dozen, because you can get by with a purely
static analysis. The Z machine is the exact opposite of Java, looking
more like a MOS 6502. The architecture even seems to support
self-modifying code, although no one seems to have ever tried writing any.

Another problem is that there are multiple compilers. Turning a Java
class file back into Java is aided by the fact that there's really only
been a couple of versions of Java, and those have only very minor
differences. The major versions of Inform are quite different from each
other, there is a C compiler (AFAIK only used for "Silicon Castles"),
and then there's ZIL, of which little is known but was used to create
"the canon". Turning a storyfile compiled using Inform 4 into Inform 5
source would be very difficult; turning something that started out as C
or ZIL would, I suspect, be almost impossible.