Instead of flat text files, the source code of this language would be better stored as a token stream, along with token-to-text mappings for each language used in that project, say in a zip archive like Microsoft Office OOXML files.
Source code editors would handle lexical analysis, with the compiler, JIT, or interpreter processing the token stream instead.
Each token could be typographical (say, "Newline", "Indentation of one level", "Indentation of two levels"), syntactical (say, the beginning of a quoted string, the end of a quoted string, an object member reference), an operator (addition, subtraction, negation, multiplication, division, assignment, equality comparison), a name, a language keyword, and so on. This would allow one developer to see quoted strings as "Thus" while another sees «Thus» and yet another “Thus”; assignment could be := or = or even equals for each developer working on the same source, depending on personal preference (just by modifying their personal token mapping). The editor would be responsible for ensuring that whatever the developer writes is unambiguously tokenized.
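To make that concrete, here is a minimal Java sketch of one shared token stream being rendered two different ways from two personal token-to-text mappings. Every name in it is hypothetical and purely for illustration; a real editor would of course work with a much richer token set.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: one shared token stream, rendered differently
// per developer according to a personal token-to-text mapping.
public class TokenRenderingDemo {

    enum Token { ASSIGN, EQUALS, STRING_BEGIN, STRING_END, NAME_X, INT_1 }

    // Developer A prefers ":=" and guillemets; developer B prefers "=" and straight quotes.
    static final Map<Token, String> DEV_A = Map.of(
            Token.ASSIGN, ":=", Token.EQUALS, "==",
            Token.STRING_BEGIN, "«", Token.STRING_END, "»",
            Token.NAME_X, "x", Token.INT_1, "1");

    static final Map<Token, String> DEV_B = Map.of(
            Token.ASSIGN, "=", Token.EQUALS, "equals",
            Token.STRING_BEGIN, "\"", Token.STRING_END, "\"",
            Token.NAME_X, "x", Token.INT_1, "1");

    // Naively joins tokens with single spaces; real layout handling would be richer.
    static String render(List<Token> stream, Map<Token, String> mapping) {
        StringBuilder out = new StringBuilder();
        for (Token t : stream) out.append(mapping.get(t)).append(' ');
        return out.toString().trim();
    }

    public static void main(String[] args) {
        List<Token> stored = List.of(Token.NAME_X, Token.ASSIGN, Token.INT_1);
        System.out.println(render(stored, DEV_A)); // x := 1
        System.out.println(render(stored, DEV_B)); // x = 1
    }
}
```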
Literal strings themselves could be handled by a similar mapping, although it should probably be kept separate from the developer token mapping, since it could then double as runtime localization (say, for multi-language user interfaces).
One option for these mappings would be to store the source as its Gödel number, assuming each token (and literal string) is assigned a unique natural number.
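For illustration only (the numbers grow enormous, so this is shown to make the idea concrete rather than as a practical storage format): a token sequence with codes c1, c2, …, cn would be encoded as 2^c1 · 3^c2 · 5^c3 · …, i.e. the i-th prime raised to the i-th code.

```java
import java.math.BigInteger;
import java.util.List;

// Illustrative sketch of Gödel-numbering a token sequence: each token has a
// positive integer code, and the stream is encoded as 2^c1 * 3^c2 * 5^c3 * ...
public class GoedelEncoding {

    static BigInteger encode(List<Integer> tokenCodes) {
        BigInteger result = BigInteger.ONE;
        BigInteger prime = BigInteger.TWO;
        for (int code : tokenCodes) {
            result = result.multiply(prime.pow(code));
            prime = prime.nextProbablePrime(); // next prime in the sequence (2, 3, 5, ...)
        }
        return result;
    }

    public static void main(String[] args) {
        // e.g. hypothetical codes: NAME_X = 3, ASSIGN = 1, INT_LITERAL = 7
        System.out.println(encode(List.of(3, 1, 7))); // 2^3 * 3^1 * 5^7 = 1875000
    }
}
```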
For security, the ZIP files could be encrypted with a per-project key pair: the public key used to encrypt the contents, the private key used to decrypt them. Or you could use centralized project key storage. The latter would be very interesting in the business sense: the development of both the editing environment and the toolchain obviously requires resources, so the vendor managing the project keys could ensure licensee validity even while letting toolchain and IDE downloads be freely available.
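In practice that key arrangement would most likely be implemented as hybrid encryption: a random AES key protects the archive bytes, and the project's RSA public key wraps that AES key. A hedged sketch using only standard JDK crypto, not tied to any particular product:

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.SecureRandom;

// Sketch of hybrid encryption for a project archive: AES-GCM for the bulk
// data, RSA-OAEP to wrap the AES key with the per-project public key.
public class ProjectArchiveCrypto {

    public static void main(String[] args) throws Exception {
        byte[] archiveBytes = "zip archive contents".getBytes(StandardCharsets.UTF_8);

        // Per-project key pair: public key distributed, private key held by the key manager.
        KeyPairGenerator kpg = KeyPairGenerator.getInstance("RSA");
        kpg.initialize(2048);
        KeyPair projectKeys = kpg.generateKeyPair();

        // A fresh AES session key encrypts the archive itself.
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(256);
        SecretKey aesKey = kg.generateKey();
        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv);

        Cipher aes = Cipher.getInstance("AES/GCM/NoPadding");
        aes.init(Cipher.ENCRYPT_MODE, aesKey, new GCMParameterSpec(128, iv));
        byte[] encryptedArchive = aes.doFinal(archiveBytes);

        // The project public key wraps the AES key; only the private key can unwrap it.
        Cipher rsa = Cipher.getInstance("RSA/ECB/OAEPWithSHA-256AndMGF1Padding");
        rsa.init(Cipher.WRAP_MODE, projectKeys.getPublic());
        byte[] wrappedKey = rsa.wrap(aesKey);

        System.out.println("archive bytes: " + encryptedArchive.length
                + ", wrapped key bytes: " + wrappedKey.length);
    }
}
```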
Quite true; this has crossed my mind too. But then we would not be able to peruse raw text files; we'd need tooling to replace the abstract token codes with some real human vocabulary.
How else would multicultural and multilingual development teams cooperate?
Consider this the programming equivalent of personal pronouns: each subgroup gets to define how the code looks to them, without any oppressor forcing them to use a specific form. I'm quite sure this is the future of socially aware software development.
(This might also open up interesting possibilities for funding the development of such a programming language.)
Yes, good questions. I don't know exactly how a team might choose to work, so let me elaborate on what I've been doing and then try to answer you.
At this stage I've shown that a self-consistent, unambiguous grammar can be devised that is insensitive to the exact spelling of its keywords. I wasn't 100% sure of that, but I suspected it was possible for a grammar that has no reserved words (like PL/I, Fortran, etc.). I also wasn't sure whether current parser tools could do what I needed; I've used them before, and they can block progress once you try certain ideas in a grammar. Hand-crafted lexers and parsers are possible too (I've written them before), but they make experiments and changes very hard and very slow going, so slow that one ends up not experimenting much.
Anyway, Antlr has exceeded my highest expectations. It is truly very powerful, far beyond anything I could craft by hand: in a couple of seconds I can tweak a grammar rule, regenerate the parser source code, and test it. Very powerful indeed.
At parse time I specify the keyword lexicon code, "en" (English), "fr" (French), etc., and the generated lexer knows what the keywords should be for that language code. The parser code is agnostic; it has no knowledge of keyword spelling at all, other than in the abstract sense.
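The grammar itself lives in Antlr, but the underlying idea can be sketched in plain Java (the names and the French spellings below are invented for illustration, not the project's actual lexicon): keywords are abstract token kinds, and a lexicon selected by language code supplies the concrete spellings, so the parser only ever sees the abstract kinds.

```java
import java.util.Map;

// Illustrative sketch (not the Antlr-generated code): keywords are abstract
// token kinds; a lexicon chosen by language code ("en", "fr", ...) maps
// concrete spellings to those kinds, so the parser never sees spellings.
public class KeywordLexicon {

    enum Keyword { IF, ELSE, WHILE, RETURN }

    static final Map<String, Map<String, Keyword>> LEXICONS = Map.of(
            "en", Map.of("if", Keyword.IF, "else", Keyword.ELSE,
                         "while", Keyword.WHILE, "return", Keyword.RETURN),
            "fr", Map.of("si", Keyword.IF, "sinon", Keyword.ELSE,
                         "tantque", Keyword.WHILE, "retourner", Keyword.RETURN));

    // Returns null when the spelling is not a keyword in that lexicon
    // (i.e. it is just an ordinary identifier).
    static Keyword lookup(String lexiconCode, String spelling) {
        return LEXICONS.getOrDefault(lexiconCode, Map.of()).get(spelling);
    }

    public static void main(String[] args) {
        System.out.println(lookup("en", "while"));   // WHILE
        System.out.println(lookup("fr", "tantque")); // WHILE
        System.out.println(lookup("fr", "while"));   // null
    }
}
```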
So, how to use this?
Well, the language code could be a compiler option, a preprocessor setting within a source file, or inferred from the name of the file; there are several ways one could do that, none of them a huge effort (a filename-based version is sketched below).
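For instance, inferring the lexicon from a name like "test_1_abc.ru.ipl" could be as simple as this sketch; the actual naming convention is undecided, so treat it purely as an illustration.

```java
// Sketch: infer the keyword lexicon from a file name such as
// "test_1_abc.ru.ipl" -> "ru", falling back to a default when the
// inner suffix is absent ("test_1_abc.ipl" -> default).
public class LexiconFromFileName {

    static String inferLexicon(String fileName, String defaultLexicon) {
        String[] parts = fileName.split("\\.");
        // name . lexicon . ipl => at least three parts, lexicon is second-to-last
        if (parts.length >= 3 && parts[parts.length - 2].length() == 2) {
            return parts[parts.length - 2];
        }
        return defaultLexicon;
    }

    public static void main(String[] args) {
        System.out.println(inferLexicon("test_1_abc.ru.ipl", "en")); // ru
        System.out.println(inferLexicon("test_1_abc.ipl", "en"));    // en
    }
}
```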
Also, the parser code is a class library (Java or C#), so it can be used as part of a compiler or within another tool. For example, I'm looking at a simple command-line tool that can consume a source file in one language and create an output in another. That is not very hard to do; yes, there's code involved in recreating a text file from the parse tree (spaces, comments, line endings, etc.), but that is, in principle, not much of a problem. I asked the Antlr team about it; I've not looked at it in earnest yet, but I might do soon.
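Reduced to its essential idea, the heart of such a tool would look something like the sketch below. This is only a stand-in: the real version would drive the Antlr parse tree and preserve layout exactly, whereas here a flat token list and a hand-made keyword map take its place.

```java
import java.util.List;
import java.util.Map;

// Sketch of a "convert source from one lexicon to another" core: walk the
// tokens, swap keyword spellings via a source-to-target map, and pass
// everything else (identifiers, literals, comments, whitespace) through.
public class LexiconTranslator {

    record Tok(String text, boolean isKeyword) {}

    static String translate(List<Tok> tokens, Map<String, String> sourceToTarget) {
        StringBuilder out = new StringBuilder();
        for (Tok t : tokens) {
            out.append(t.isKeyword()
                    ? sourceToTarget.getOrDefault(t.text(), t.text())
                    : t.text());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "while" (en) -> "tantque" (illustrative fr); spacing and identifiers survive as-is.
        Map<String, String> enToFr = Map.of("while", "tantque", "if", "si");
        List<Tok> tokens = List.of(
                new Tok("while", true), new Tok(" ", false),
                new Tok("count", false), new Tok(" > 0", false));
        System.out.println(translate(tokens, enToFr)); // tantque count > 0
    }
}
```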
Anyway, it should be easy to build such power into an IDE or editor. There are numerous editors that support all kinds of extensibility (VS Code is very good), so one could just open and edit a *.ipl file, click a "Convert to..." dropdown, choose some target, and the tool would instantly refresh the file with the source code in that chosen language.
Or, just as easily, we can envisage "Save As...": we could save "test_1_abc.ipl" as (say) "test_1_abc.ru.ipl" (or any name, really) and specify "Russian" as the keyword lexicon, and the tool would regenerate the file being saved into its Russian equivalent using the same mechanism I describe above.
Or "Open As" could open a file in whatever language, into whatever language one wanted to edit the file in!
I've also been thinking about how to detect the lexicon used in a source file; there are ways to do this, and one simple approach is sketched below.
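One simple approach (again only a sketch, with invented keyword lists): score the file's words against each known lexicon's keyword set and pick the best match.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Set;

// Sketch of lexicon detection: count how many words in the source appear in
// each lexicon's keyword set and choose the lexicon with the highest score.
public class LexiconDetector {

    static final Map<String, Set<String>> KEYWORDS = Map.of(
            "en", Set.of("if", "else", "while", "return"),
            "fr", Set.of("si", "sinon", "tantque", "retourner"));

    static String detect(String source) {
        String best = "en";
        long bestScore = -1;
        for (var entry : KEYWORDS.entrySet()) {
            long score = Arrays.stream(source.split("\\W+"))
                    .filter(entry.getValue()::contains)
                    .count();
            if (score > bestScore) { bestScore = score; best = entry.getKey(); }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(detect("si x > 0 retourner x sinon retourner 0")); // fr
    }
}
```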
These ideas mean we could live in a world where we open any file in any language, see it only in our chosen language, and have it saved back in its original language when we save, all invisibly. These are all serious possibilities.
These and other reasons are why I've been focused on the grammar. One cannot start to implement semantic processing (and, really, code generation) until one commits to a grammar, and once that code is written it is very hard indeed to go back and adjust the grammar without complex rework on the semantic processor and so on. The goal has been to get to a grammar that can support the essential features, then start designing the semantic processing, and then the rest of the parts in a compiler's back end.
The back end is, to all intents and purposes, a relatively routine phase of a project like this; back ends are (largely) decoupled from the front-end grammar, but we do need the middle: the semantic phase, optimizers, etc.
Antlr does not generate (or assist with the generation of) an abstract syntax tree; that code is an essential part of a full compiler. As you can appreciate, the AST generator must consume the parse tree, so if the grammar had to change much, the parse tree would change too, and then we'd have to rework the AST generator along with any code we'd written that consumed that AST: lots of needless work and wasted time.
These and other real-world issues are being glossed over by some of the naïve detractors posting in this thread. They asked several times "why are you fixated on syntax rather than the nitty-gritty compiler and code generator?" I'm afraid such a question only reveals their naivety about real-world, "gloves off" compiler design.
Just to stress: the multiple keyword lexicon was not initially on my list of goals for the grammar; I only added it after realizing it was possible, with very little effort, when using powerful tooling like Antlr. An eventual compiler would let one work wholly in a single language, seamlessly, if that's all one wanted to do. There's no impact on simple, basic use from having this multi-language feature; if one doesn't care for it, one can simply disregard it.