Friday, April 5, 2013

The Perfect Programming Language: Killing C/C++

This is part three of a series: The Perfect Programming Language.

Although the previous article doesn't describe, in any detail at all, how such a language would work, I believe that I've brought up some general contentions that will allow us to approach the problem of solving the meta-paradigm both from the engineering side and the UI side. For now, let's focus on what the properties of such a language would be.

Expression-Table: The Gender Fluid Data Type
In LISP, the program itself is the evaluation of a list that represents an expression. In such a way, I believe that there is a single "gender fluid" data type that can be used to conceptually encapsulate grammar rules, functions, classes, expressions, data structures, arrays, et al. This type would be a Table that is an expression that describes itself- not unlike a particle-wave duality. In C++, classes are originally programmed as vTables in C. A vTable just describes the linking of functions and variables within a table, from which special grammar can then be used to interact with those expressions. In a very genuine way, an object in C or C++ is analogous to a table in Lua or more generically as a Dictionary or hash-table (so long as functions are first-class data types). When we access a member of a class, what we're really doing is looking up an expression in a table whose key is the name of that member.

The following expressions are the same (Syntax is lua-like)-
Class.Member = Class.Subroutine(param);
Class['Member'] = Class['Subroutine'](param);


This works exactly the same way for functions. A pure function, that is one with no side-effects, has a 1-to-1 mapping of inputs and outputs. Is this not the same as creating an expression to describe the relationship of all key-value pairs in a hypothetical associative array? Dictionaries are then just functions whose key-value pairs are explicitly declared- this is also just an expression.

The difference between an associative array and a function is mainly appearance,
Class.Member = Class.Subroutine(param);
Class['Member'] = Class['Subroutine'][param]; --not like Lua


If we can conceptually think of all functions and types as expression-tables, then there is quite a lot of flexibility we can achieve. It's also important to realize that a program in such a language would also be an expression-table. The implications of this are very interesting- we can easily export the state of any program as a data structure, wrap programs within each other for sandboxing or exception handling, stream either routines or data across networks, and interface with databases in a very natural way. These sort of table-expressions can also function as a sort of file system- instead of everything is a file, everything is an expression-table. This design suggests a lot of unnecessary overhead, as can be the case in LISP. In a future article in the series I will talk about how such a paradigm can be engineered without generating any additional overhead. Right now I'm focusing on the UI and how a language should communicate its own functionality to the programmer. Obviously, at some point we need to be able to represent primitives. It's important to note that there aren't any primitives that are exceptional to this idea. Every last bit can also be thought of as an expression-table.

Tables as Scope
Scope is just a measure of what expressions are currently occupying the call stack. When we modify the value of a reference type in a function, we are cheating how scope should work*- this is usually to get around the limitation of a function returning a single type. Ideally, scope should sandbox an expression as a discrete processing chunk. In this way, all parameters or keys are value types and the evaluation of the expression, or the return value, is a table. If we want to modify by reference, it's more data-safe to pass by value and return a modified version that we bubble up to replace the originally passed parameter. For this reason (to become apparent later), it's important for an expression in a table to access, but not necessarily modify, its parent table. Through scoping mechanisms like this, we can engage in automatic memory management without running a garbage collector as a process. That is, scope implicitly defines how memory will be managed (which is partially true true for many languages already). This will be discussed in further detail when I get into the engineering-side.

* This is my own assertion. I can go into detail about why I feel this way if anyone is interested.

 Concurrency, Threading, and Co-routines
What happens when you have two expressions whereby neither expression depends on the other to be evaluated?

What should happen?
Expression1 = someEvaluation(4);
Expression2 = someOtherEvaluation(4);


Given the pseudo-specification of our Scope above, there is no way that either Expression1 or Expression2 will need to be evaluated before the other. In situations like these, a compiler should use concurrent call stacks while leveraging whatever advantage it can from hardware (ie. threading). Values can be read from concurrent processes through the parent-table, though they cannot be modified (though an expression could be called from the parent-table that modifies a parent-table- oh look, privacy). Using tables of greater scope, it should  be easy to infer how co-routinessynchronicity,  and threading can be extrapolated from the way in which Scoping would work. While these aren't new ideas, most languages demand that you explicitly, either with compiler flags or a variety of keywords, declare how something is going to be used. I think that the ideas behind each of such keywords can be implied by the structure of the program in a way that is clear to the programmer.

Imperative is Arbitrary
edit: See Lazy Evaluation.

Consider the following expressions,
X = 10
X = fib(10000000)
X = 0
X = fib(X)
X = 4
print(X)

Assuming that fib is a pure function with no side-effects, what should a compiler do? An imperative language would try to evaluate and store fib(10000000) and fib(0), despite the fact that these evaluations aren't used anywhere. As we step through the evaluation of an expression-table, it isn't necessary to evaluate every expression in that table as it appears. We only need to evaluate expression-tables that are necessary for other expression-tables. Since print(X) has a side-effect, we need to know the value of X. My question to the reader: What is X?
X is a key in the parent-table. Since our expressions are managed in a table, all we need to do is update the expression that is represented by X. We don't actually need to evaluate those expressions when they are assigned (RAII). Instead, we just point to the location of the expression in memory and, at compile-time, we can safely reduce the above program to print(4). It's important to realize that we can achieve imperative behavior by embedding assertions into our expressions that enforce evaluation. Since a program is an expression-table itself, we can create an expression-table that wraps any other expression-table into an imperative paradigm without any change in the original program's language (we can also do this to disable or manipulate concurrency). While this in itself is cool, the ramifications include the creation of grammar rules. So long as we have a syntactically distinct way to not only declare what expression a table represents but also how that table will be used grammatically, we can make things sweet in whatever way we want.

edit: Instancing
i don't think it's really necessary to explicitly discuss how an expression-table would create instances of itself to emulate data-types, but it really is no different from a constructor function. We create an expression-table that evaluates to an expression-table with specific and desired qualities. We can modify the grammar to make it look like traditional type definitions and even include flags or a wrapper to enforce strong type checking.

The idea is really quite simple-
  • All data, subroutines, programs, chunks, processes, scope, types et al. can be described with dual-type expression-tables.
  • Scope implicitly and intrinsically handles concurrency and memory during the evaluation of an expression-table.
  • The syntax for defining grammar and expressions are unified.

For now, we have a conceptual UI with a rational basis behind it and an idea about how it might work. The expression-table is a very powerful idea that, when engineered appropriately, should carry no significant overhead and allow a genuinely fluid expression of ideas.

In the next few articles I'll specify, more explicitly, how this language's syntax would look and how it could be engineered (the fun and interesting parts). At some point, I'll try and create a formal specification and go about implementing this language.

4 comments:

  1. I think you are limiting scope too much; there's more to it than just local to block to local to process. Other types of scopes can (and do) exist. What about thread scope (aka "thread specific data" in pthreads)? There even exists scopes beyond a single process. What about node (or computer) scope? You know, like an IP address of an interface, or the computer name? There's local network scope. Intra-network scope, and inter-network scope (which finally brings in a true definition of "global").

    Heck, now can have inter-global scope, what with all the probes on Mars.

    About concurrency? Congratulations, you have now invented lazy evaluation, a feature that quite a few functional programming languages have.

    ReplyDelete
    Replies
    1. The main 'feature' I'm pushing for a single type to describe everything. Permutations of that type with itself can allow different types to emerge. An intimate understanding of this single type will provide a lucid understanding of all lingual derivatives.

      How would my explanation of scope make higher-level abstractions of scope impossible? I was looking for a simple rationalization of scoping rules that could implicitly manage memory and concurrency. From there, any higher-level constructs could be specified by an arrangement of expression-tables. I don't see how network scope causes problems. Inter-computer communication relies on messages that ultimately become scoped relative to a given machine.

      Delete
  2. Here's an approach to describing everything with a single type: http://codon.com/programming-with-nothing

    ReplyDelete
    Replies
    1. Yay! Thanks. That's the sort of stuff I'm interested in. If you have a mutual interest, I found a thesis paper that tries to achieve some similar concepts http://www.chrisseaton.com/katahdin/. It isn't built from a single type, but it allows you to define grammar rules within a similar (it isn't quite the same to be honest) syntax as a part of the definition of objects.

      Delete