The question is what would make a good introductory language now? It certainly isn't assembler-lite like CESIL was, but neither is it a language like Python that throws in everything including the kitchen sink and has idiosyncrasies that make it non-representative of programming languages as a whole.
I'm really not convinced by the idea that Python is fundamentally a bad language for introducing Computer Science because it 'throws in everything'. The argument seems to be that 1. rigour and 2. lack of alternatives contribute to a language's success as a teaching language. In other words, a language that forces your hand into doing things 'properly', because there simply is no other way. I'm not sure this is actually good for learning the underlying 'whys', though, and actually understanding the fundamental concepts that a computer science education is trying to give you.
My objections to Python:
- the syntax. The most minor of my complaints, but I want it out of the way
- there is toooooo much language to learn. Too many constructs, too many rules. You spend more time learning the language than learning how to use it. Sure, Java and C++ and almost everything else are worse. Scheme, assembly language, and Logo are better. Maybe Smalltalk is better too, but it's weird. Maybe Forth or PostScript.
- there is too much that is magical, that you can't see how it works. Arrays, dictionaries, strings ... all that stuff.
- anyone who learns with Python is going to come out of it with no idea what a pointer is. No idea about memory management using heaps or stacks.
- you can't replicate the magical stuff, make your own version of it. In some cases you can, but it will be 50x slower than what is built-in.
- in general, Python is a language for gluing other people's libraries together, not for writing the libraries. OK, that might be what a lot of people spend their working lives doing, but someone has to know how to write the libraries. And I think that should be anyone with "Computer Science" in the title of their degree. The pure library users can learn that at a trade school, not in a Computer Science degree at a proper university. Or in that one CS paper that people in other subjects have to do.
I think assembly language is good to start with. Not an extremely complex one such as x86_64 or armv8 with manuals several thousand pages thick. And not something like CESIL. I had to look that up. It's got 14 instructions; an implied accumulator register; as many named (6 char names) variables as you want (implicitly declared); simple I/O and arithmetic; an unconditional jump; branch if negative and branch if zero; named labels with the same naming rules as variables (no way to tell them apart visually). No arrays, no strings (except string literals you can only print), no pointers or concept of a memory array, no subroutines.
I think RISC-V RV32I is an excellent intermediate position. Only 37 instructions, and only half a dozen instruction types in 4 binary instruction formats, as load & JALR share a format with ALU immediate and store shares a format with conditional branch. You can write *any* kind of program in it. Multiply and divide and floating point need library functions, but you can write your own and if you do a good job they can be as good/fast as the standard ones.
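To give a feel for what "write your own" multiply means, here is a minimal sketch of a shift-and-add multiply. It is written in Python purely as executable pseudocode, and it only uses operations that have direct RV32I counterparts (add, shift left/right, AND with 1, compare and branch). The names `mul32` and `MASK32` are made up for illustration and are not taken from any actual library.

```python
MASK32 = 0xFFFFFFFF  # model a 32-bit register by masking after each step


def mul32(a, b):
    """Return the low 32 bits of a * b using only add, shift, AND and branches."""
    a &= MASK32
    b &= MASK32
    result = 0
    while b != 0:                        # branch if not zero
        if b & 1:                        # test the low bit of the multiplier
            result = (result + a) & MASK32
        a = (a << 1) & MASK32            # shift the multiplicand left by one
        b >>= 1                          # logical shift of the multiplier right
    return result
```

The low 32 bits of the product are the same whether the operands are treated as signed or unsigned, so one routine covers both cases.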
Perhaps more importantly, you can tell gcc or llvm to compile any C or C++ (or Rust, ...) program to RV32I, which makes it very easy to incrementally move to another language once you get more advanced, see how things are compiled without getting surprised by instructions you haven't learned etc.
As small as it is, I did an experiment a few weeks ago where I tried to subset RV32I to the bare minimum. I got it down to the 10 most common instructions which can emulate any of the missing instructions efficiently -- no more than 3 or 4 needed. And the missing instructions are mostly relatively rare, statically and dynamically.
I hand-translated a compiled version of my Primes benchmark to this 10 instruction subset, and the code expanded by less than 30% -- and the execution time by much less than that.
You could start people programming using this 10 instruction subset that can be learned in half an hour (5 minutes for the bright ones).
And it does *everything*. Registers and RAM. Pointers and arrays and structs. Stacks and recursive function calls. Objects and virtual functions. You just need to provide some library for I/O and maybe for malloc/free (or implement your own). You can run the resulting programs on emulators or on real, cheap, hardware at full native speeds.
GCC and LLVM could be taught how to use only these instructions. I don't know how easy it would be to get that upstreamed. Maybe too hard.
As the inconvenience of their lack is felt, the missing RV32I instructions can be emulated with a very simple macro facility, I think probably with just one dedicated temporary register reserved for it.
My earlier message:
https://www.eevblog.com/forum/microcontrollers/what-is-the-right-way-to-learn-programming/msg4328953/#msg4328953
NB. One of those 10 instructions, NAND, is not in fact a RISC-V instruction. I wanted to get rid of AND/OR/XOR, at least temporarily. They can all be synthesised with 2-4 NANDs, as seen in any digital logic textbook. NAND could be replaced a bit less conveniently by ANDN (x & ~y), which *is* a RISC-V instruction, in the B extension. You'd have to keep -1 in a register semi-permanently (e.g. load it once in the prolog of any function that needs bitwise instructions) to get a NOT(x) operation by ANDN(-1,x). And NAND(x,y) = ANDN(-1,ANDN(x,ANDN(-1,y))).
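The NAND and ANDN identities above are easy to check mechanically. The following is a small sketch that models 32-bit registers in Python (used here only as executable pseudocode). The function names are invented for illustration. The AND/OR/XOR constructions are the usual textbook ones using 2, 3 and 4 NANDs respectively.

```python
MASK32 = 0xFFFFFFFF   # model 32-bit registers
ALL_ONES = MASK32     # the "-1 kept in a register" from the text


def nand(x, y):
    return ~(x & y) & MASK32


def andn(x, y):
    """ANDN as described above: x & ~y."""
    return x & ~y & MASK32


def not_(x):
    return nand(x, x)                     # 1 NAND


def and_(x, y):
    t = nand(x, y)
    return nand(t, t)                     # 2 NANDs


def or_(x, y):
    return nand(nand(x, x), nand(y, y))   # 3 NANDs


def xor_(x, y):
    t = nand(x, y)
    return nand(nand(x, t), nand(y, t))   # 4 NANDs


def not_via_andn(x):
    return andn(ALL_ONES, x)              # NOT(x) = ANDN(-1, x)


def nand_via_andn(x, y):
    # NAND(x,y) = ANDN(-1, ANDN(x, ANDN(-1, y)))
    return andn(ALL_ONES, andn(x, andn(ALL_ONES, y)))
```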
What it also means, more practically, is that such a language will have a narrow scope - it might be suitable for object-oriented programming, but not functional programming, and so on, because that would offer too many choices and the flexibility to do stupid things that don't make sense. This leads to a lot of time and energy wasted on learning the syntax and boilerplate associated with the 'appropriate' language for the given topic, rather than the actual computer science you're there to learn. Instead of computer science, you spend a lot of time learning a handful of languages poorly, and precious little learning concepts (i.e. 'programming' and not 'computer science'). In fact, you make this point yourself ('too many introductory courses to programming are introductory courses to programming in language x'),
I agree with all of that.
so I am surprised that you don't see that the alternative to this is a flexible language like Python that includes the constructs necessary for any sort of programming.
Or assembly language :-)
It would be good to have an assembler with a macro language at least capable of making pseudo-instructions and also boilerplate for assigning register names to local variables and saving callee-save registers in a function prologue (and restoring them at the end). Being able to declare structs and fields in them is helpful too.
IBM mainframes had a really good assembler. Apple copied a lot of it in the MPW assembler in the late 80s, which was capable of implementing Object-Pascal and C++ classes (and subclassing library functions) and virtual functions using macros.
You don't want to take it too far, or you've just reintroduced a lot of complexity to learn. But something a little more than EQU would be nice. Just to ease implementing reasonably complex algorithms and data structures in asm and so postpone moving to C a little.
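As a rough idea of how little machinery "a little more than EQU" needs for pseudo-instructions, here is a sketch of a text-level expander that rewrites missing bitwise instructions into NAND sequences. Everything in it is an assumption for illustration: the `nand rd, rs1, rs2` syntax, the choice of `t6` as the reserved temporary, and the function name `expand`. It is written in Python only as executable pseudocode, not as a proposal for the macro language itself.

```python
TEMP = "t6"  # hypothetical: the one register reserved for macro expansions


def expand(line):
    """Expand one 'op rd, rs1, rs2' line into subset instructions.

    Lines using instructions that need no emulation pass through unchanged.
    """
    op, _, rest = line.strip().partition(" ")
    args = [a.strip() for a in rest.split(",")]
    if op == "not":
        rd, rs = args
        return [f"nand {rd}, {rs}, {rs}"]
    if op == "and":
        rd, rs1, rs2 = args
        # Safe even when rd aliases a source: both sources are read first.
        return [f"nand {rd}, {rs1}, {rs2}",
                f"nand {rd}, {rd}, {rd}"]
    if op == "or":
        rd, rs1, rs2 = args
        # ~rs1 goes in the temp so rd may alias either source register.
        return [f"nand {TEMP}, {rs1}, {rs1}",
                f"nand {rd}, {rs2}, {rs2}",
                f"nand {rd}, {TEMP}, {rd}"]
    return [line.strip()]
```

For example, `expand("or a0, a0, a1")` yields three `nand` lines and touches only `a0` and the reserved temporary. XOR is left out because the 4-NAND form needs more care when the destination aliases a source.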
I believe that such an education, like any other, should be guided by a teacher, who can introduce the computer science concepts in a natural progressive fashion, while providing feedback and guidance based on the students' choices.
Absolutely. Entirely self-guided learning tends to head off in bad directions.