Friday, April 5, 2013

The Perfect Programming Language: Introduction

This is part one of a series: The Perfect Programming Language.

I grew up hacking away in BASIC on what I believe was a 386. I always found the creation and resolution of problems similar to the tension and release of musical compositions. I love that before you solve a problem you must first create that problem to solve. The relationship between creating problems and solving them within a logical context is both a lot of fun and a great mental exercise.

This is where a very blatant problem in the software development world comes to light: user interface. When we talk about UI we are typically talking about creating UIs for end-users to interact with. I'm infinitely more concerned with the UI that I am using to develop software: the language and the paradigm. Most programming languages are made by hobbyists, academics, or engineers, typically to implement a specific paradigm. Unfortunately, many paradigms force certain problems to be solved in a manner that may not be very elegant for that particular problem. You're typically faced with a choice: switch languages, emulate another paradigm, or use an inappropriate but highly familiar paradigm to solve the problem. As a programmer, I don't ever want to face this issue. It's surprising that after 40 years of languages no one has even attempted to address this problem in a formal manner. We're either bloating new features into C++ or hoping things work as intended in Python.

There is a point at which a programmer doesn't have any issues switching paradigms or languages. Whether it's assembly or a domain-specific language, imperative or functional, each addresses the underlying logic of Boolean algebra in familiar ways. Nonetheless, many of these paradigms are developed in iterations, adding features that appear to be missing or patching problems up with standard libraries. This approach has given rise to a massive quantity of keywords in languages like C++ and Java, where it isn't very clear to the user what the best way to solve a problem is. For example, tail calls in C++ aren't a feature of the language, but of a compiler. That compiler may or may not optimize tail calls depending on the arrangement of keywords. It's silly, because the language does NOT explicitly communicate to the programmer what is going on. Tail calls should be a specification of the language rather than an optimization by the compiler. In this way, C++ achieves the functionality that it intends to, but the language itself does not send a coherent message to the user about whether their implementation is ideal or not. VM languages usually address the problem of language consistency across tool-chains, but there are always instances where Linux and Windows implementations of the JVM behave differently (the most recent for me had to do with high-resolution timers).

Then there are instances where we simply don't know how a VM or compiler is interpreting our code to solve the problem. We know that the problem will be solved, but we don't know if the solution is optimal. Python is a very expressive language, but that expressiveness comes at the cost of opaque engineering. The idea of code being Pythonic is wonderful: the shortest solution to a problem should always be the most readable and resource-efficient. However, the design of the language doesn't communicate what will be ideal. We don't know how our instructions are being optimized for any particular set of commands, so we can't write programs with the expectation that they will be optimized in a predictable manner. This is why Python is great for hobbyists, prototyping, math, and scripting, but in no way am I going to build anything with it that needs performance: I don't really know how it is processing my code.

A language shouldn't just perform tasks as we describe, but describe to us how a task will perform. A language should, by the nature of its design, promote optimal code. This optimal code should be obvious to the programmer and coherent to the compiler. The language should also not impose strict requirements on the developer, but facilitate freedom of expression within a paradigm that is inherently optimal for the problem being solved. How could we possibly describe a paradigm that could do that? Instead of specifying a language around a particular paradigm, we specify a meta-language around a meta-paradigm.

There are already a lot of neat meta-languages out there that give us a lot of meta-programming power, but they aren't created for the meta-paradigm. By specifying a unique paradigm for any given problem we can easily and directly solve aggregations of problems. Our solution may emerge through the specificity of a paradigm rather than by applying a general paradigm to a specific problem.

The next article in this series will describe the meta-paradigm in greater detail, with the implementation and technical specification to follow in subsequent articles.


  1. On high-resolution timers: that isn't the bailiwick of a language. It really can't be, as a language ultimately needs to run on real hardware at some point (even emulated, it's still running on some physical computer hardware). A year or two ago, I ran into the problem of high-resolution timers on Linux and Solaris, and I work in C, not Java. Annoying, yes, but I could still work around the issue.

    Hardware differences exist and it can be hard to hide *all* the details. Even on a single computer, differences can arise. The Amiga 500 had two slightly different variations---the North American version and the European version. The difference was *very* slight---a few Hz difference in running speed, and some differences in video (because of the different video standards in North America and Europe). In 99% of cases it made very little difference, but I did come across a few programs that failed to function properly because of said difference (basically, it was code that hit the hardware *hard* to achieve certain effects---the demo scene---and this had *nothing* to do with the language used; in all cases it was the same language, MC68000 assembly).

    On tail-calls---yes, some languages don't specify it (C/C++), some go out of their way to *not* support it (Python) and some *require* it (Scheme/Lua). All three are arguable positions. Guido van Rossum does not want Python to support tail calls as he feels it's an impediment to debugging (it leads to an inconsistent call stack), and if left unspecified, you could end up with code that depends upon a particular implementation of the language. C and C++ do not specify it, so a compiler (like GCC or the Solaris Sun Studio C) *can* support it with the right command line options since certain constructs can benefit from the optimization. This gives compiler writers liberties to include it if they can, but doesn't force the compiler writers into supporting it on a platform that it may be difficult to implement. Lua and Scheme require tail call optimizations. In Lua's case, it's because the VM was designed to support it and it is very useful in many instances, but anyone providing a Lua language *must* support it (like LuaJIT).

    The problem here is there's a fine line between under-specification (like C before it was standardized in 1989) and over-specification. A good example of over-specification (and of knowing how a task will be performed) is Chill, a proprietary language used in telephony. Because it runs on an 80286, the language is permeated with 64k segments (a quirk of how things are addressed on the 80286), and because the programmers "know" how it will translate Chill into 286 assembly, the compiler is now prohibited from making many common optimizations because that would interfere with live system patching (Chill is used to program phone switches, which can only have a downtime of about 5 minutes per year). In this case, "optimal code" isn't "fastest" or "smallest" but "easiest to patch".

    1. Thank you for your replies! I obviously have no formal experience in language design, so many of the concepts I'm talking about already exist in other contexts. I appreciate any and every thought on the matter.

      High-res timers: it isn't a big deal; it's just an example of how I expected a VIRTUAL machine to be consistent across platforms, as advertised. For something as huge as the JVM, I'd expect them to make platform-specific translations of system time into something that is consistent across platforms.

      In the case of Lua, though, if you don't want a tail call it's trivial to avoid one syntactically. I think that the optimization should be an expectation provided by the language; whether you use it or not, for whatever reason, should be up to the programmer. Because C/C++ doesn't specify precisely how compilers should behave, your experience can feel non-deterministic, especially when it comes to unusual platforms.

  2. You probably didn't have an 80186; they weren't PC compatible and as a result were extremely rare. Most of them were used outside the US by a handful of businesses, and they came out in 1982. Since you're 27, it's much more likely that you used a 386 or a 486---probably a 486, because they were extremely common, but many people did have 386s.

    Anyway, I can tell you've given the rest of this a lot of thought, and really believe in it. So, I'm a little hesitant to comment on it... But...

    For me personally, a programming language is a programming language is a programming language. Asking which one is better is like asking "which is better, English or German or French?" They're all Turing complete, interchangeable. Perfect is subjective. What's perfect to you will be imperfect to a lot of other people, and vice versa.

    In terms of C and C++, they're actually very different languages. People often group them, but if you get down to the nuts and bolts they diverge. Unlike C++, C is amazingly simple. It has one of the smallest sets of keywords of any language, and they're all used with cold, hard consistency.

    C was designed for use cases that many modern languages don't consider. It is very, very close to the hardware. C can write directly to any memory address the processor has access to (and even some it doesn't), and the way you do so is really elegant: it's literally identical to writing to a normal variable. In fact, there is no difference in C between writing directly to a memory address and writing to a variable. That's not so useful these days, but imagine you have a system like the NES with no operating system, no system functions, no libraries---everything done through writing to specific addresses in memory, communicating directly with the processor. C was designed for environments like that, where there is absolutely nothing between the program, the processor, and the memory.

    Java and Lua weren't designed for situations like that. They have no good way to deal with memory addresses or the processor directly. This is why C is considered "dangerous," and these other languages "safe." C can very easily do anything, even things that can't or shouldn't be done. Don't get me wrong, I like Java and Lua. I've even used Lua as a scripting language in one of my own programs, and it was very good. Keep in mind though, both Java and Lua were extremely heavily influenced by C++. They are practically descendants of C++ much as C++ is a descendant of C.

    1. Thanks for replying-- I believe we had a 386 ^_^ (just corrected it).

      I agree that all programming languages can accomplish the same tasks if they share Turing completeness; however, that doesn't mean every language is equivalent in utility. Languages aren't designed with a focus on UI specifically, which I think could result in a powerful solution to many problems programmers deal with. Of course, I don't think such a thing could be perfect---I just used 'perfect' as a buzzword. I have a more sincere description of the overall goals here

      I actually had a brief correspondence with Bjarne about the differences between C and C++. I was concerned that C++'s objectives of being both a superset of C and compatible with C resulted in an irreconcilable contradiction. C++ is a ridiculously powerful language, but its own design goals compromise usability without ever being fully achieved. I tend to pick C over C++ to avoid the unnecessary obfuscation, and pick other high-level languages when I need high-level concepts (I've recently fallen for Lua; C with Lua is very, very strong).

      I don't think C++ had a strong influence on Lua, at least not in terms of usability (both because Lua is written in C and because Lua, despite being imperative, is almost functional in style). I think that the low-level memory management C provides can be accomplished implicitly with syntax in an intuitive manner; I describe this to some degree in a later post in this series. I've been working on a thesis, though, so I haven't had time to work on this recently.