I personally hold the view that designing, writing, maintaining, and enhancing software is best described as a "craft" rather than "science" or "engineering". I liken it to architecture myself: there are aspects based in physics and mechanical engineering, yet there are also aspects of aesthetics, style, atmosphere, art, and so on; both aspects are essential.
Spoken like a true Windows-only C# programmer who has zero idea about verifiability, robustness, or security. Congratulations!
I don't recall when I last read such idiotic drivel, and I often read people spouting conspiracy theories at Rense and AboveTopSecret.
It's like listening to a bridge-builder describing how they scoff at stuff like engineering calculations and instead prefer to call themselves a craftsman. Would you trust such a bridge? I wouldn't.
The world is already chock full of crappy software written by craftsmen and monkeys, ignoring all engineering principles and scientific thinking and research.
We do not need a single more. Please!
My last few posts have been an experiment. As I've mentioned in other discussions, I sometimes entertain ideas and concepts I do not believe in at all, like conspiracy theories, just to find out where they lead and how their proponents see the world around them. In this case, I inverted everything I have learned in practice about teamwork, portability, maintainability, and so on, and wanted to see whether OP would push back against the known non-working ideas I described.
The concepts I described:
- #743: token stream, user-defined mapping between tokens and their human-readable/printable representations, ZIP archives containing multiple files, encoding the source as a Gödel number, encrypting source files, centralized vendor storage for encryption keys for keeping source code 'hostage'
- #761: replacing interaction and common rules ('laws') by each team doing their own thing, minimizing inter-team interaction; using localization as a barrier between teams by encouraging them to use different languages during development; leveraging political fads instead of technical suitability to entice/force developers to use (this) particular programming language
- #803: leave the token pattern/sequence to AST mapping to the users to define; use of a centralized source code management to score and enforce political fads and views on developers; use of ideology as the central idea in the programming language to attract attention, while completely ignoring all technical aspects
- #805: banning comments, using remote and often unavailable resources for 'documentation' and 'explanation' in code; also, re-read the post subject
- #811, as the discussion was concentrating on ridiculously dysfunctional or irrelevant details: clearly referencing Intercal (see the subject) and Malbolge. Apparently, throwing around buzzwords with a couple of examples of ridiculous, dysfunctional programming languages, and concentrating on old (FORTRAN 77, COBOL) and single-vendor-controlled, basically single-OS environments (CIL), raises no objections or comments at all.
None of these work in practice, but they are enticing if and only if one does not consider the implementation or functionality and the related consequences.
Human-readable text is converted to tokens using lexical analysis. The sequence of tokens is converted to some form of abstract syntax, usually an abstract syntax tree (AST), by a parser performing syntactic analysis. Many – if not most – interpreter and compiler frontends use C code generated from a Backus-Naur form specification by flex and Yacc or GNU Bison. The latter can also generate parsers for C++ and even Java.
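To illustrate just the two stages – a token stream, then an AST – here is a hand-written Python sketch for a toy arithmetic language (deliberately not flex/bison; the grammar and the node shapes are made up for the example):

    import re

    # Toy lexer: turn human-readable text into a token stream.
    TOKEN_RE = re.compile(r"\s*(?:(\d+)|(.))")

    def tokenize(text):
        tokens = []
        for number, operator in TOKEN_RE.findall(text):
            if number:
                tokens.append(("NUM", int(number)))
            elif operator in "+-*/()":
                tokens.append(("OP", operator))
            else:
                raise SyntaxError(f"unexpected character {operator!r}")
        tokens.append(("EOF", None))
        return tokens

    # Toy recursive-descent parser implementing:
    #   expr   := term (('+' | '-') term)*
    #   term   := factor (('*' | '/') factor)*
    #   factor := NUM | '(' expr ')'
    def parse(tokens):
        pos = 0

        def peek():
            return tokens[pos]

        def take(kind, value=None):
            nonlocal pos
            tok = tokens[pos]
            if tok[0] != kind or (value is not None and tok[1] != value):
                raise SyntaxError(f"expected {kind} {value}, got {tok}")
            pos += 1
            return tok

        def factor():
            if peek()[0] == "NUM":
                return ("num", take("NUM")[1])
            take("OP", "(")
            node = expr()
            take("OP", ")")
            return node

        def term():
            node = factor()
            while peek() in (("OP", "*"), ("OP", "/")):
                node = (take("OP")[1], node, factor())
            return node

        def expr():
            node = term()
            while peek() in (("OP", "+"), ("OP", "-")):
                node = (take("OP")[1], node, term())
            return node

        tree = expr()
        take("EOF")
        return tree

    print(parse(tokenize("1 + 2 * (3 - 4)")))
    # ('+', ('num', 1), ('*', ('num', 2), ('-', ('num', 3), ('num', 4))))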
It is the mapping from this abstract form to the abstract forms that can be turned into machine code that defines a programming language. For example, there is no such concept as a "function" in machine code. There are subroutines, an instruction that calls a subroutine, and some kind of return instruction at the end of the subroutine that causes execution to continue at the instruction following the call. So, to map a function definition to a form comprehensible to the backend, you need to include things like "include this symbol (the name) in the local symbol table using type 'function', ..." and so on.
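A hypothetical Python sketch of that mapping (the node shapes and 'instructions' are mine, not any real compiler's): lowering a frontend 'function' node into a symbol-table entry plus a labelled subroutine that ends in a return.

    # Hypothetical language-specific AST node for a function definition.
    func_ast = ("function", "square", ["x"],
                ("return", ("*", ("var", "x"), ("var", "x"))))

    def lower_function(node, symbol_table):
        """Map a frontend 'function' node onto backend concepts:
        a symbol-table entry plus a labelled subroutine ending in a return."""
        kind, name, params, body = node
        assert kind == "function"
        # "include this symbol (the name) in the local symbol table using type 'function'"
        symbol_table[name] = {"type": "function", "params": params}
        # The backend has no 'function': only a labelled subroutine and a return.
        instructions = [("label", name)]
        instructions += lower_expr(body)
        instructions.append(("ret",))
        return instructions

    def lower_expr(node):
        if node[0] == "return":
            return lower_expr(node[1])          # leave the value in the result slot
        if node[0] == "var":
            return [("load", node[1])]
        if node[0] == "*":
            return lower_expr(node[1]) + lower_expr(node[2]) + [("mul",)]
        raise ValueError(f"unhandled node {node!r}")

    symbols = {}
    for insn in lower_function(func_ast, symbols):
        print(insn)
    print(symbols)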
There is a lot of scientific research on how this can be done. Leaving it to users is like giving a student who speaks only English a Hungarian dictionary and telling them to write a novel in Hungarian. Perhaps one can consider writing to be a craft, but unless you know the rules and the science, all you generate is word soup – or horribly crappy software.
The above means that syntax and grammar, including the selection of reserved keywords and operators – all recognized tokens – is something that can and should be finalized last, just before the language specification is released. Until then, the syntax and grammar can be varied at will without affecting the core of the language: exactly what it does, i.e. the mapping between the language-specific AST and the language- and architecture-independent AST used by the code-generating backend. Starting with syntax and grammar is like deciding to create a new spoken language like Esperanto and prioritizing the decision of whether it will use the Oxford comma or not. Utterly ridiculous and irrelevant! Such details are best resolved last, not first.
A lot of the practical progress in computer science in the last decade or so has been in optimization at both the frontend (language-specific AST) and the backend (language- and architecture-independent AST), using (old) discoveries in graph theory and, basically, the exponentially increasing computational power that allows slower, more computationally costly algorithms to be used to simplify the ASTs. Using resources like Compiler Explorer (godbolt.org) to examine the machine code generated by different compilers at different optimization levels shows both how much work has already been done here and how much work there is still to be done in current compilers.
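If Compiler Explorer is not at hand, the same comparison can be done locally. A minimal sketch (assuming gcc is installed and on the PATH) that emits the assembly for one small function at -O0 and -O2:

    import os, subprocess, tempfile

    # A tiny C function whose generated code differs noticeably between -O0 and -O2.
    C_SOURCE = """
    int sum_to(int n) {
        int total = 0;
        for (int i = 1; i <= n; i++)
            total += i;
        return total;
    }
    """

    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "sum.c")
        with open(src, "w") as f:
            f.write(C_SOURCE)
        for opt in ("-O0", "-O2"):
            asm = os.path.join(tmp, f"sum{opt}.s")
            # -S stops after compilation proper and writes assembly instead of an object file.
            subprocess.run(["gcc", "-std=c99", opt, "-S", src, "-o", asm], check=True)
            print(f"=== gcc {opt} ===")
            with open(asm) as f:
                print(f.read())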
For the GNU Compiler Collection, the backend language is called GENERIC (GIMPLE is its simplified form; GENERIC can be lowered to GIMPLE by splitting certain constructs into multiple simpler ones). For LLVM-based compilers, LLVM itself is the backend language, although 'intermediate representation' (IR) is used when 'LLVM' would be ambiguous.
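Both intermediate representations can be inspected directly if you want to see what your own compilers produce. A small sketch (assuming gcc and clang are installed; note that the exact GIMPLE dump file name varies between GCC versions):

    import glob, os, subprocess, tempfile

    C_SOURCE = "int square(int x) { return x * x; }\n"

    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "square.c"), "w") as f:
            f.write(C_SOURCE)

        # GCC: -fdump-tree-gimple writes the GIMPLE form of each function to a
        # dump file named square.c.*.gimple next to the object file.
        subprocess.run(["gcc", "-c", "-fdump-tree-gimple", "square.c"],
                       cwd=tmp, check=True)
        for dump in glob.glob(os.path.join(tmp, "square.c.*gimple")):
            print(f"=== {os.path.basename(dump)} ===")
            print(open(dump).read())

        # Clang: -S -emit-llvm writes textual LLVM IR instead of native assembly.
        subprocess.run(["clang", "-S", "-emit-llvm", "square.c", "-o", "square.ll"],
                       cwd=tmp, check=True)
        print("=== square.ll ===")
        print(open(os.path.join(tmp, "square.ll")).read())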
In a typical complex application/service/firmware build, there are multiple stages. It is not only the 'compiler' that accesses the source code: various tools do, from code generators (like flex and yacc/bison) to static code analysers to dependency checkers (to speed up builds), and so on. Having the exact source code in plain text format allows the widest possible set of tools here, including project-specific custom tools (like the many used in building the Linux kernel, for example: from configurators to formatting checks to locking scheme verification).
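Plain text is what makes such project-specific tools trivial to write. As an illustrative sketch (the rule itself is made up for the example), here is a checker that scans a source tree for calls to a function the project has banned:

    import re, sys
    from pathlib import Path

    # Hypothetical project rule, for illustration only: plain strcpy() is banned;
    # developers must use the project's checked wrapper instead.
    BANNED = re.compile(r"\bstrcpy\s*\(")

    problems = 0
    root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    for path in root.rglob("*.c"):
        for lineno, line in enumerate(path.read_text(errors="replace").splitlines(), 1):
            if BANNED.search(line):
                print(f"{path}:{lineno}: banned call: {line.strip()}")
                problems += 1

    sys.exit(1 if problems else 0)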
Even scripting languages like Python use the plain text source and 'compile' it to an internal bytecode format dynamically, although Python can also cache the binary representation of the source in .pyc files to reduce load latencies; even then, it verifies that the cached binary still matches the text source file (via the recorded source timestamp and size) before using it. This is not just an arbitrary decision, but a decision based on lots and lots of experience. Plain text sources will always be more versatile (and therefore more useful to users and developers) than binary formats.
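The .pyc caching is easy to observe from Python itself. A small sketch (standard library only) that compiles a module to its cached .pyc and reads back the source metadata recorded in the cache header:

    import importlib.util, os, py_compile, struct, tempfile

    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "hello.py")
        with open(src, "w") as f:
            f.write("print('hello')\n")

        # Explicitly compile to the __pycache__ .pyc (imports normally do this automatically).
        pyc = py_compile.compile(src)
        print("cache file:", pyc)
        print("same as   :", importlib.util.cache_from_source(src))

        # In CPython 3.7+ the .pyc header is: magic, flags, source mtime, source size.
        with open(pyc, "rb") as f:
            magic, flags, mtime, size = struct.unpack("<4sIII", f.read(16))
        print("magic:", magic.hex(), "flags:", flags)
        print("recorded source mtime:", mtime, "actual:", int(os.stat(src).st_mtime))
        print("recorded source size :", size, "actual:", os.stat(src).st_size)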
While ZIP archives sound good, they are awkward to modify: to change just one file, essentially the entire archive has to be recompressed, or at least recopied. Thus, it makes zero sense to use any kind of packed archive format for source code during development, because it simply slows down I/O. Using a sensible directory/folder structure will always work better. Learning how distributed source management tools like Git work will give you a much better picture of the existing knowledge and experience in this subfield.
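To make the ZIP point concrete: Python's zipfile module has no way to replace a member in place, so 'editing one file' means copying every other member into a new archive. A sketch (the archive and file names are made up for the example):

    import os, zipfile

    def replace_member(archive, member, new_bytes):
        """'Edit' one file inside a ZIP: copy everything else into a new archive."""
        tmp = archive + ".tmp"
        with zipfile.ZipFile(archive) as src, \
             zipfile.ZipFile(tmp, "w", zipfile.ZIP_DEFLATED) as dst:
            for item in src.infolist():
                if item.filename != member:
                    dst.writestr(item, src.read(item.filename))  # recompressed copy
            dst.writestr(member, new_bytes)
        os.replace(tmp, archive)

    # Hypothetical example archive: edit main.c inside sources.zip.
    with zipfile.ZipFile("sources.zip", "w", zipfile.ZIP_DEFLATED) as z:
        z.writestr("main.c", "int main(void) { return 0; }\n")
        z.writestr("util.c", "/* ... */\n")

    replace_member("sources.zip", "main.c", b"int main(void) { return 1; }\n")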
The Gödel number was an indication that I was being surreal; I couldn't just play it straight, because that could actually lead astray people who find my contributions worth reading. It means that instead of a sequence of natural numbers (identifying the tokens), the source code is encoded as a single huge natural number: a product of prime powers, one per token. The i'th factor is the i'th prime raised to the power of the natural number representing the i'th token. While encoding is reasonably fast, decoding involves prime factorization of a natural number possibly hundreds of thousands of bits long. There is absolutely no reason to do this!
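For the curious, both the encoding and its absurd decoding cost are easy to demonstrate (a sketch using a naive prime generator and trial-division factoring; it assumes token ids are small positive integers, and real source files would produce numbers far too large to factor in any reasonable time):

    def primes():
        """Naive prime generator; fine for a handful of tokens."""
        found, candidate = [], 2
        while True:
            if all(candidate % p for p in found):
                found.append(candidate)
                yield candidate
            candidate += 1

    def goedel_encode(token_ids):
        """Encode a token sequence as prod(p_i ** t_i): one prime per position."""
        number = 1
        for p, t in zip(primes(), token_ids):
            number *= p ** t
        return number

    def goedel_decode(number):
        """Decode by factoring: the exponent of the i'th prime is the i'th token id.
        Trial division; for realistic sizes this is hopeless by design.
        Assumes every token id is >= 1."""
        token_ids = []
        for p in primes():
            if number == 1:
                return token_ids
            exponent = 0
            while number % p == 0:
                number //= p
                exponent += 1
            token_ids.append(exponent)

    tokens = [3, 1, 4, 1, 5]           # token ids from some imaginary lexer
    n = goedel_encode(tokens)          # 2**3 * 3**1 * 5**4 * 7**1 * 11**5
    print(n, goedel_decode(n) == tokens)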
Vendor lock-in and vendor control of our projects is something you might not be afraid of at all as a developer, until it bites you. Then you'll avoid it like the plague it is. The vendor owning the encryption keys (and essentially having access to all clients' encrypted sources) is probably the scariest scenario of rent-seeking vendor behaviour that I can think of. No sane, experienced developer will subject themselves to it; it is simply too risky: years of work may be irretrievably lost because of a 'minor vendor policy change'.
As described by others, source code comments are worthless if they merely describe what the code does or how it does it. We know from decades-old projects that are still alive and maintainable that the most useful and practical comments are those that describe developer intent and assumptions. For example, if you refactor a function, you typically should not need to change the comment describing the purpose and assumptions of the function, unless the assumptions (for example, the locking scheme) change.
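A hypothetical contrast (the code and comments are made up for illustration): the kind of comment that survives refactoring versus the kind that does not.

    from collections import namedtuple

    LineItem = namedtuple("LineItem", "price quantity")

    # Worthless: restates what the code does; stale as soon as the code changes.
    # Loop over the items and add price times quantity to the total.
    def order_total_bad(items):
        total = 0
        for item in items:
            total += item.price * item.quantity
        return total

    # Useful: records intent and assumptions; still valid after a rewrite.
    # The total is in the order's own currency; the caller has already verified
    # that every line item uses that currency, so no conversion happens here.
    # Prices are tax-exclusive; tax is applied by the invoicing layer.
    def order_total(items):
        return sum(item.price * item.quantity for item in items)

    print(order_total([LineItem(5, 2), LineItem(3, 1)]))  # 13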
I have done my best work in teams with wildly differing members. I have also taught basic computer skills (both to youngsters at the turn of the century and to teachers older than myself), mentored dozens of programmers (from just about every continent except Antarctica), and worked for two different universities with lots of exchange students (so much so that typically even lunch discussions were in English, so as not to exclude anyone present – it was especially funny when we noticed everybody present was fluent in Finnish). I have led teams, and interviewed and hired people.
In my experience, synergistic collaboration – where teams become more than just the sum of their parts – requires effective, clear, honest, direct, and free discussion, debate, and even argument. Everybody participating must agree that arguments and debates are conducted at the logical level and concern only things; people's personalities must be kept out of it. The rules must be clear, easily checked, and apply to everyone equally.
An intersectionalist, cultural-relativist approach, where each person is governed by rules depending on non-technical aspects of their personality, will not work; it will destroy teams and significantly degrade their work product and performance, because humans are social animals with an inherent, biological concept of fairness – specifically procedural justice and distributive justice (both observed in several other animal species as well).
Competitiveness based on meritocracy is useful, but may lead to internal conflict within a team, so in most cases I recommend a larger emphasis on team rewards than on individual rewards; it does vary with the situation, though.
In simple terms, the current fad of political correctness is dysfunctional in development environments. It leads to reduced discussion (because every discussion carries a risk of offense, so avoiding discussion is always the safer option in PC-conscious environments). We can see this in how certain open source projects (funded by Red Hat/IBM in particular) are already asking for isolation from users, especially with regard to problem reports. They would prefer to receive only automatically generated problem reports, even though these contain far less information.
The period of stagnation in the GCC community in the first decade of the current century was largely because the key GCC developers refused to consider anything external to the project unless the reporter had at least a PhD in Computer Science. I do not know how those developers were retired – whether it was mutually agreed, or some kind of machination between the GNU project and the GCC maintainers – nor how much of the change was driven by the eventual shift from C to C++; but overall, it is a perfect example of how isolated open source developers can kill projects.
Also, the longest-lived, most 'successful' (for various definitions of 'successful') projects have traditionally been led by 'dictatorial' leaders, often with analogs of 'advisory councils'. As usual, the leader mostly sets the social atmosphere and long-term goals of the project, but also tends to be the quality enforcer: responsible for rejecting sub-standard suggestions. For the Linux kernel, Linus Torvalds has been berated for his harsh language, not for his 'autocratic control'. Democratic processes are easily subverted (as seen, for example, in the Debian project's handling of init systems; just read the discussions prior to the general resolution, and compare them to the consequences observed today), and many respectable developers occasionally produce utter garbage that deserves to be rejected. See for example Greg KH's suggested kdbus subsystem, discussed on the linux-kernel mailing list. (Red Hat proponents at the time believed that if they got Greg KH to suggest it, his authority would be enough to silence any objections. No, that didn't work, because the contribution was crap. The good thing about it was that Greg KH simply accepted the rejection, and that was it: it did not, and should not, affect his later contributions, because contributions are and must be evaluated on their own merits, not based on who submitted them. He is still a very highly regarded Linux kernel developer, and I personally 'trust' him; nevertheless, whatever he submits as a new feature requires exactly the same scrutiny as any other submission. This is how a divergent, distributed team functions well.)
I could also describe why I find complex runtimes like Java and CIL objectionable, but I'm not sure it is worth going into here. Simply put, they are an unnecessary abstraction layer that attempts to gloss over differences in environments, operating systems, and processor architectures. In my experience, this leads to the phenomenon historically referred to as "writing FORTRAN code in any programming language". While they make development accessible to less skilled developers, and therefore cheaper, they bring no real benefits (compared to native code written in a comparable language) to the end users.
I just care more about end users than about developers myself; specifically, the end users who contribute their time, effort, or money to my open source projects, and the paying end users of the proprietary projects I've contributed to. With sufficient skill and the ability to implement anything I want in about a dozen programming languages, the Java RE and MS CIL are unnecessary and unwanted abstraction layers to me: a waste of resources.
To repeat once again: to develop a new language, you need to consider both how the concepts and algorithms are implemented in that language, and what kind of machine code one should expect them to generate. Starting from the syntax and grammar is starting from how the code should look; something that logically should be decided last, just before the language specification itself is finalized. You need to ask yourself "Why would I want to implement algorithm X or system Y in this language, as opposed to some other language?". If your answer is "Because it's better", you do not have the skill in design, logic, and software engineering to pull it off, sorry; you are just daydreaming about appearances without any actual content.