Friday, April 5, 2013

The Perfect Programming Language: Killing C/C++

This is part three of a series: The Perfect Programming Language.

The previous article doesn't describe, in any detail, how such a language would work, but I believe it raised some general contentions that will let us approach the problem of the meta-paradigm from both the engineering side and the UI side. For now, let's focus on what the properties of such a language would be.

Expression-Table: The Gender Fluid Data Type
In LISP, the program itself is the evaluation of a list that represents an expression. In the same way, I believe there is a single "gender fluid" data type that can conceptually encapsulate grammar rules, functions, classes, expressions, data structures, arrays, et al. This type would be a Table that is also an expression describing itself- not unlike particle-wave duality. Early C++ compilers translated classes into C structs, with virtual methods dispatched through a vTable- a table of function pointers. The struct links functions and variables together within a table, and special grammar is then used to interact with those expressions. In a very genuine way, an object in C or C++ is analogous to a table in Lua, or more generically to a Dictionary or hash-table (so long as functions are first-class data types). When we access a member of a class, what we're really doing is looking up an expression in a table whose key is the name of that member.

The following expressions are equivalent (syntax is Lua-like)-
Class.Member = Class.Subroutine(param);
Class['Member'] = Class['Subroutine'](param);
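The same duality is easy to sketch in Python, where attribute access on an object is itself a dictionary lookup (a hypothetical illustration, not the proposed language's syntax):

```python
# A minimal sketch: an "object" is a table whose keys name expressions,
# and functions are just entries like any other value.
class Table:
    pass

t = Table()
t.Subroutine = lambda param: param * 2   # a function stored as a table entry
t.Member = t.Subroutine(21)              # member access, then a call

# Attribute access and key lookup resolve to the same table entry:
assert t.Member == t.__dict__['Member'] == 42
```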


This works exactly the same way for functions. A pure function, that is, one with no side-effects, maps each input to exactly one output. Is this not the same as creating an expression to describe the relationship of all key-value pairs in a hypothetical associative array? Dictionaries are then just functions whose key-value pairs are explicitly declared- which is also just an expression.

The difference between an associative array and a function is mainly appearance,
Class.Member = Class.Subroutine(param);
Class['Member'] = Class['Subroutine'][param]; --not like Lua
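A quick Python sketch of the claim, with a hypothetical pure function and the same mapping written out as a dictionary:

```python
# A pure function and a dictionary are both key -> value mappings.
def square(x):          # pure: each input maps to exactly one output
    return x * x

# The same mapping, with its key-value pairs explicitly declared:
table = {x: x * x for x in range(5)}

# The only real difference is the access syntax: square(3) vs table[3].
assert square(3) == table[3] == 9
```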


If we can conceptually think of all functions and types as expression-tables, then there is quite a lot of flexibility we can achieve. It's also important to realize that a program in such a language would itself be an expression-table. The implications of this are very interesting- we can easily export the state of any program as a data structure, wrap programs within each other for sandboxing or exception handling, stream either routines or data across networks, and interface with databases in a very natural way. These sorts of expression-tables can also function as a file system- instead of 'everything is a file', everything is an expression-table. This design suggests a lot of unnecessary overhead, as can be the case in LISP. In a future article in the series I will talk about how such a paradigm can be engineered without generating any additional overhead. Right now I'm focusing on the UI and how a language should communicate its own functionality to the programmer. Obviously, at some point we need to be able to represent primitives. It's important to note that there aren't any primitives that are exceptions to this idea. Every last bit can also be thought of as an expression-table.
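A rough Python sketch of the export idea, treating a tiny hypothetical program as a table whose entries are either data or callables:

```python
import json

# A "program" as an expression-table: entries are plain data or callables.
program = {
    'x': 4,
    'double': lambda env: env['x'] * 2,   # evaluated against its own table
}

# Evaluate one entry with the table itself serving as its scope.
result = program['double'](program)       # 8

# Export the data portion of the program's state as a data structure.
state = {k: v for k, v in program.items() if not callable(v)}
print(json.dumps(state))                  # {"x": 4}
```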

Tables as Scope
Scope is just a measure of what expressions currently occupy the call stack. When we modify the value of a reference type inside a function, we are cheating how scope should work*- usually to get around the limitation of a function returning a single type. Ideally, scope should sandbox an expression as a discrete processing chunk. In this way, all parameters or keys are value types, and the evaluation of the expression, or the return value, is a table. If we want to modify by reference, it's more data-safe to pass by value and return a modified version that we bubble up to replace the originally passed parameter. For this reason (to become apparent later), it's important for an expression in a table to be able to access, but not necessarily modify, its parent table. Through scoping mechanisms like this, we can engage in automatic memory management without running a garbage collector as a process. That is, scope implicitly defines how memory will be managed (which is partially true for many languages already). This will be discussed in further detail when I get into the engineering side.

* This is my own assertion. I can go into detail about why I feel this way if anyone is interested.
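The pass-by-value-and-bubble-up idea can be sketched in Python (the player table and its fields are hypothetical):

```python
import copy

# Instead of mutating a parameter by reference, copy it, modify the copy,
# and return it so the caller can replace the original binding.
def add_score(player):
    player = copy.deepcopy(player)   # sandbox: the caller's table is untouched
    player['score'] += 1
    return player                    # the evaluation of the expression is a table

alice = {'name': 'alice', 'score': 0}
alice2 = add_score(alice)            # caller "bubbles up" the replacement

assert alice['score'] == 0           # original unmodified
assert alice2['score'] == 1
```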

Concurrency, Threading, and Co-routines
What happens when you have two expressions whereby neither expression depends on the other to be evaluated?

What should happen?
Expression1 = someEvaluation(4);
Expression2 = someOtherEvaluation(4);


Given the pseudo-specification of our Scope above, there is no way that either Expression1 or Expression2 needs to be evaluated before the other. In situations like these, a compiler should use concurrent call stacks while leveraging whatever advantage it can from hardware (i.e. threading). Values can be read from concurrent processes through the parent-table, though they cannot be modified (though an expression could be called from the parent-table that modifies a parent-table- oh look, privacy). Using tables of greater scope, it should be easy to infer how co-routines, synchronicity, and threading can be extrapolated from the way in which Scoping would work. While these aren't new ideas, most languages demand that you explicitly declare, either with compiler flags or a variety of keywords, how something is going to be used. I think the ideas behind each of these keywords can be implied by the structure of the program in a way that is clear to the programmer.
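A rough Python sketch of what the compiler could infer, with a thread pool standing in for concurrent call stacks (the evaluation functions are hypothetical):

```python
from concurrent.futures import ThreadPoolExecutor

# Two expressions with no mutual dependency can be evaluated concurrently.
def some_evaluation(n):
    return n * n

def some_other_evaluation(n):
    return n + n

with ThreadPoolExecutor() as pool:
    f1 = pool.submit(some_evaluation, 4)         # Expression1
    f2 = pool.submit(some_other_evaluation, 4)   # Expression2
    r1, r2 = f1.result(), f2.result()

print(r1, r2)   # 16 8
```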

Imperative is Arbitrary
edit: See Lazy Evaluation.

Consider the following expressions,
X = 10
X = fib(10000000)
X = 0
X = fib(X)
X = 4
print(X)

Assuming that fib is a pure function with no side-effects, what should a compiler do? An imperative language would try to evaluate and store fib(10000000) and fib(0), despite the fact that these evaluations aren't used anywhere. As we step through the evaluation of an expression-table, it isn't necessary to evaluate every expression in that table as it appears. We only need to evaluate expression-tables that are necessary for other expression-tables. Since print(X) has a side-effect, we need to know the value of X. My question to the reader: What is X?
X is a key in the parent-table. Since our expressions are managed in a table, all we need to do is update the expression that X represents. We don't actually need to evaluate those expressions when they are assigned (lazy evaluation, rather than eager RAII-style initialization). Instead, we just point to the location of the expression in memory and, at compile-time, we can safely reduce the above program to print(4). It's important to realize that we can achieve imperative behavior by embedding assertions into our expressions that enforce evaluation. Since a program is an expression-table itself, we can create an expression-table that wraps any other expression-table into an imperative paradigm without any change in the original program's language (we can also do this to disable or manipulate concurrency). While this in itself is cool, the ramifications include the creation of grammar rules. So long as we have a syntactically distinct way to declare not only what expression a table represents but also how that table will be used grammatically, we can make things sweet in whatever way we want.
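The rebinding idea can be sketched in Python using thunks, with an instrumented stand-in for fib to show that the dead assignments are never evaluated:

```python
# Each assignment rebinds the key 'X' to an unevaluated thunk; only the
# side-effecting print forces evaluation, so fib is never actually run.
calls = []

def fib(n):
    calls.append(n)        # record whether we were ever evaluated
    return n               # (the body is irrelevant to the demonstration)

env = {}
env['X'] = lambda: 10
env['X'] = lambda: fib(10000000)
env['X'] = lambda: 0
prev = env['X']
env['X'] = lambda: fib(prev())
env['X'] = lambda: 4
print(env['X']())          # 4

assert calls == []         # neither fib expression was ever evaluated
```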

edit: Instancing
I don't think it's really necessary to explicitly discuss how an expression-table would create instances of itself to emulate data-types, but it really is no different from a constructor function. We create an expression-table that evaluates to an expression-table with specific and desired qualities. We can modify the grammar to make it look like traditional type definitions and even include flags or a wrapper to enforce strong type checking.
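As a Python sketch, instancing is just a function that evaluates to a fresh table with the desired entries (the Point type here is hypothetical):

```python
# A constructor: an expression-table that evaluates to a new
# expression-table with specific and desired qualities.
def Point(x, y):
    instance = {'x': x, 'y': y}
    # a method is just another entry, closing over its own table
    instance['length2'] = lambda: instance['x'] ** 2 + instance['y'] ** 2
    return instance

p = Point(3, 4)
assert p['length2']() == 25
```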

The idea is really quite simple-
  • All data, subroutines, programs, chunks, processes, scope, types et al. can be described with dual-type expression-tables.
  • Scope implicitly and intrinsically handles concurrency and memory during the evaluation of an expression-table.
  • The syntax for defining grammar and expressions is unified.

For now, we have a conceptual UI with a rational basis behind it and an idea about how it might work. The expression-table is a very powerful idea that, when engineered appropriately, should carry no significant overhead and allow a genuinely fluid expression of ideas.

In the next few articles I'll specify, more explicitly, how this language's syntax would look and how it could be engineered (the fun and interesting parts). At some point, I'll try and create a formal specification and go about implementing this language.

The Perfect Programming Language: Premise

This is part two of a series: The Perfect Programming Language.

In my previous article, I provided a short opinion piece about how it is a problem that languages are not designed around the user interface- that is, the programmer's UI for developing software is the language itself. In reality, the paradigm that you program in is a facade that helps you organize your thoughts in a way that can be understood by a compiler. A facade should not be designed by an engineer.

I didn't talk about LISP, Haskell, or Lua in the previous post, all of which deserve mention. LISP makes it very clear what is happening. Since all expressions are lists and all lists are expressions, the translation between programmer and compiler is very easy to understand. Unfortunately, the aesthetics of the language are terrible- it is very ugly and symbolically verbose. Haskell is semantically beautiful, but the syntax and the appearance of the language are troublesome and complicated. Functions are expressions that are evaluated when they need to be, but we're building expressions upon expressions to achieve a greater goal, which can be conceptually difficult for a programmer to manage. Lua, in my opinion, solves the problems of both languages, but it does so with a few quirks. The learning curve is nonetheless very short, despite the language boasting greater expressiveness than Python and performance that can come close to C in a lot of cases. Functions are first-class in Lua as in Haskell, but expressions could also be thought of as Tables (the only aggregate data structure in Lua) in the way that Lists work in LISP. A lot of my inspiration comes from Lua, but I don't like how Lua's type system works. A table in Lua can be modified to work as a function, but we cannot implicitly treat a function, or any other data type, as a table- despite the fact that all types appear to have similar qualities. A Number feels like a Table with a meta-table that has arithmetic operators- but that's not really how it works. This, to me, generates some type inflexibility where it doesn't feel like there should be any- or rather, where the syntax of the language doesn't suggest that there should be. Similarly, we aren't quite using tables as expressions as we would in LISP, but we get that sensation about 80% of the time. Lua's facade has a direct correlation to how it was engineered, rather than being designed around how it would look or feel.
Therefore, many things look and feel the same, but act differently.

Most programming languages provide us with a specific paradigm, but this isn't necessarily going to be the best way to solve a given problem. The question is- do we really need a new language to handle the optimal paradigm for a specific problem? In most cases, a library or framework sufficiently ameliorates the differences, but the result can be a lot of bloat in the quantity and variety of symbols and syntax. Another big issue can be the flexibility of whatever framework you're using. In most cases, for every program there is a unique paradigm that most effectively solves that problem. Any language designed around a general paradigm will not necessarily include, as a subset, the unique paradigm that most effectively solves a particular problem. All problems could be solved within a single paradigm, but that paradigm isn't going to be as concise as the solution itself. Solving a problem with programming is the accretion of a general paradigm into a specific one.

The question of solving a problem with programming starts with the meta-question of what paradigm to use. It follows that the most inclusive programming paradigm is one that attempts to answer the paradigm question directly. The implications of this are clear-- the ability to express grammar, types, data structures and memory management all implicitly through syntax thereby allowing a programmer to quickly and easily develop their own rules for their own particular problems. We also want to accomplish this in the simplest way that allows for the greatest amount of emergence. Want special rules for if-statements? There is no reason why 'If', as a keyword, should receive special treatment- it can be expressed either as a function, an expression, a statement, or a data structure. As a programmer, I want to be able to encapsulate all of these ideas within a single meta-paradigm- and it can all be done with a single type and unified syntax.
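To make the if-statement point concrete, here is a Python sketch of 'If' as an ordinary function rather than a privileged keyword (the branches are passed as thunks so that only the chosen one is evaluated):

```python
# 'if' expressed as a plain function- no special treatment from the language.
def If(condition, then_branch, else_branch):
    return then_branch() if condition else else_branch()

x = 10
result = If(x > 5, lambda: 'big', lambda: 'small')
print(result)   # big
```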
I'm not yet talking about the engineering aspect- but the User Interface. I want to think of all types as expressions that represent how aggregations of data are evaluated.

Consider the following:
int foo() { return 4; }
const int bar = 4;


There isn't a good reason why a compiler should make a distinction between these two ideas. There also isn't a good reason why a compiler shouldn't optimize them to inline, immutable, const, static, or whatever else would make them work better when translated to machine code. As such, I think it's important for such constructs to not exist in an explicit manner at all- relative to the UI. Since these two things essentially serve the same purpose, why am I bound by types and syntax with their usage? They're nothing more than expressions. Expressions that, when evaluated or referenced, describe a set of data. In this sense, the meta-language for our meta-paradigm only really needs a single type that describes expressions. Expressions need to be manageable in a way that makes sense to the programmer, and the idea behind vTables does just that. In the end, we'll find that the best way to represent expressions is with tables, and the best way to represent tables is with expressions. Thus we have the atomic building block for our meta-paradigmatic meta-language-- the dual expression-table.
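A tiny Python sketch of the unification: the constant and the nullary function from the C snippet above can live in one table as the same kind of entry:

```python
# Both constructs become the same kind of entry- an expression describing 4.
table = {
    'foo': lambda: 4,   # int foo() { return 4; }
    'bar': lambda: 4,   # const int bar = 4;
}

# To the user of the table, there is no distinction between them:
assert table['foo']() == table['bar']() == 4
```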

In the next article I will describe how, with just a single type, we can express some of the most important concepts in programming paradigms.

The Perfect Programming Language: Introduction

This is part one of a series: The Perfect Programming Language.

I grew up hacking away in BASIC on what I believe was a 386. I always found the creation and resolution of problems similar to the tension and release of musical compositions. I love that before you solve a problem you must first create that problem to solve. The relationship between creating problems and solving them within a logical context is both a lot of fun and a great mental exercise.

This is where a very blatant problem in the software development world comes to light- User Interface. When we talk about UI we are typically talking about creating UIs for end-users to interact with. I'm infinitely more concerned with the UI that I am using to develop software- the Language and the Paradigm. Most programming languages are made either by hobbyists, academics, or engineers- typically to implement a specific paradigm. Unfortunately, many paradigms force certain problems to be solved in a manner that may not be very elegant for that particular problem. You're typically faced with a choice- switch languages, emulate another paradigm, or use an inappropriate but highly familiar paradigm to solve the problem. As a programmer, I don't ever want to face this issue. It's surprising that after 40 years of language design no one has even attempted to address this problem in a formal manner. We're either bloating new features into C++ or hoping things work as intended in Python.

There is a point at which a programmer doesn't have any issues switching paradigms or languages. Whether it's assembly or a domain-specific language, imperative or functional, each addresses the underlying logic of Boolean algebra in familiar ways. Nonetheless, many of these paradigms are developed in iterations, adding features that appear to be missing or patching problems up with standard libraries. This approach has given rise to a massive quantity of keywords in languages like C++ and Java, where it isn't very clear to the user what the best way to solve a problem is. For example, Tail Calls in C++ aren't a feature of the language, but of a compiler. That compiler may or may not optimize Tail Calls depending on the arrangement of keywords. It's silly because the language does NOT explicitly communicate to the programmer what is going on. Tail Calls should be a specification of the language rather than an optimization by the compiler. In this way, C++ achieves the functionality that it intends to, but the language itself does not send a coherent message to the user about whether their implementation is ideal or not. VM languages usually address the problem of language consistency across tool-chains, but there are always instances where Linux and Windows implementations of the JVM behave differently (the most recent for me had to do with high-resolution timers).

Then there are instances where we simply don't know how a VM or compiler is interpreting our code to solve the problem. We know that the problem will be solved, but we don't know if the solution is optimal. Python is a very expressive language, but that expressiveness comes at the expense of unknown engineering. The idea of code being Pythonic is wonderful: the shortest solution to a problem should always be the most readable and resource-efficient. However, the design of the language doesn't communicate what will be ideal. We don't know how our instructions are being optimized for any particular set of commands, and therefore we can't write programs with an expectation that they will be optimized in a predictable manner. This is why Python is great for hobbyists, prototyping, math, and scripting, but in no way am I going to build anything with it that needs performance. I don't really know how it is processing my code.

A language shouldn't just perform tasks as we describe, but describe to us how a task will perform. A language should, by the nature of its design, promote optimal code. This optimal code should be obvious to the programmer and coherent to the compiler. The language should also not impose strict requirements on the developer, but facilitate freedom of expression within a paradigm that is inherently optimal for the problem we're solving. How could we possibly describe a paradigm that could do that? Instead of specifying a language around a particular paradigm, we specify a meta-language around a meta-paradigm.

There are already a lot of neat meta-languages out there that give us a lot of meta-programming power, but they aren't created for the Meta-Paradigm. By specifying a unique paradigm for any given problem we can easily and directly solve aggregations of problems. Our solution may emerge through specificity of a paradigm rather than applying a general paradigm to a specific problem.

The next article in this series will describe the meta-paradigm in greater detail, with the implementation and technical specification to follow in subsequent articles.