Sorry, I wasn't totally clear in the topic, but a lexer is basically a lexical scanner: it transforms program source into a stream of tokens. The lexer doesn't generate an AST or bytecode; it only derives the information a parser needs, which is essentially the tokens.
The lexer I posted on GitHub uses a singleton TokenFeed class to hold the current token's information, such as its kind (from the Token enumeration) and its value (double, String or Boolean).
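To illustrate the singleton-feed idea, here is a minimal TypeScript sketch (TypeScript because its syntax is close to AS3). The field names (numberValue, stringValue, booleanValue, start, end) and the helper emitNumber() are hypothetical, not taken from the GitHub code. The point is that the lexer writes every token into one shared object instead of allocating a new token each time.

```typescript
// Hypothetical sketch: Token members, TokenFeed fields and emitNumber() are
// assumptions modelled on the post, not the published lexer's API.
const Token = {
  EOF: 0,
  Identifier: 1,
  NumericLiteral: 2,
  StringLiteral: 3,
  BooleanLiteral: 4,
  Slash: 5,
  SlashAssign: 6,
  RegexpLiteral: 7,
} as const;
type Token = (typeof Token)[keyof typeof Token];

class TokenFeed {
  // The one shared instance; the lexer overwrites it on every lex() call.
  static readonly token: TokenFeed = new TokenFeed();

  kind: Token = Token.EOF;
  numberValue = 0;     // AS3 Number is an IEEE 754 double
  stringValue = "";
  booleanValue = false;
  start = 0;           // index of the token's first character
  end = 0;             // index just past the token's last character

  private constructor() {}
}

// How a scanning routine might publish a numeric token into the feed.
function emitNumber(value: number, start: number, end: number): void {
  const t = TokenFeed.token;
  t.kind = Token.NumericLiteral;
  t.numberValue = value;
  t.start = start;
  t.end = end;
}
```

The trade-off of this design is that a consumer which needs an earlier token (for lookbehind) must copy the fields before calling lex() again, since the next token overwrites them.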
This latest lexer is a bit easier to extend. Instead of computing numeric literal values manually, I decided to just use parseInt (and parseFloat, which I forgot to mention), since performance isn't a concern for now. Also, in earlier lexers I used to scan punctuators in a hierarchical, derived way.
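One way to combine hand scanning with parseInt/parseFloat is sketched below in TypeScript, assuming a hypothetical helper scanNumber() that only finds where the literal ends and then hands the lexeme to the built-in conversion. It covers decimal literals (with fraction and exponent) and 0x hex literals; error reporting is omitted.

```typescript
// Sketch: find where a numeric literal ends by hand, then let parseInt or
// parseFloat convert the lexeme. Error reporting (e.g. "0x" with no digits)
// is left out.
function isDigit(c: string): boolean {
  return c >= "0" && c <= "9";
}

function isHexDigit(c: string): boolean {
  return /^[0-9a-fA-F]$/.test(c);
}

// Hypothetical helper: `start` must point at a digit or at a '.' followed by a
// digit. charAt() returns "" past the end, which none of the tests accept.
function scanNumber(src: string, start: number): { value: number; end: number } {
  let i = start;
  const next = src.charAt(i + 1);
  if (src.charAt(i) === "0" && (next === "x" || next === "X")) {
    i += 2;
    while (isHexDigit(src.charAt(i))) i++;
    return { value: parseInt(src.slice(start + 2, i), 16), end: i };
  }
  while (isDigit(src.charAt(i))) i++;
  if (src.charAt(i) === ".") {
    i++;
    while (isDigit(src.charAt(i))) i++;
  }
  const e = src.charAt(i);
  if (e === "e" || e === "E") {
    // Only consume the exponent if a digit follows the optional sign.
    let j = i + 1;
    const sign = src.charAt(j);
    if (sign === "+" || sign === "-") j++;
    if (isDigit(src.charAt(j))) {
      i = j;
      while (isDigit(src.charAt(i))) i++;
    }
  }
  return { value: parseFloat(src.slice(start, i)), end: i };
}
```

For example, scanNumber("x = 3.25e2;", 4) stops before the semicolon and returns the value 325.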
This lexer is ready to use and supports messages in multiple languages, but so far I have only written the English ones (I know Portuguese, but I think English is more widely preferred). It's basically complete: it follows the ECMA-262 rules.
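Multi-language messages are commonly handled with a table of templates per locale that falls back to English when a translation is missing. The TypeScript sketch below shows that pattern; the message ids, locale codes, the {0} placeholder syntax and formatMessage() are all hypothetical, not the lexer's actual API.

```typescript
// Hypothetical sketch of localized diagnostics: ids, locales and the {0}
// placeholder syntax are assumptions for illustration.
type Locale = "en" | "pt";

const messages: Record<Locale, Partial<Record<string, string>>> = {
  en: {
    unterminatedString: "Unterminated string literal at line {0}.",
    invalidToken: "Invalid token '{0}'.",
  },
  pt: {
    unterminatedString: "Literal de string não terminado na linha {0}.",
  },
};

function formatMessage(locale: Locale, id: string, ...args: string[]): string {
  // Fall back to English, then to the raw id, when a template is missing.
  const template = messages[locale][id] ?? messages.en[id] ?? id;
  // Replace {0}, {1}, ... with the matching argument; leave unknown ones as-is.
  return template.replace(/\{(\d+)\}/g, (whole: string, n: string) => args[Number(n)] ?? whole);
}
```

With this layout, adding a language means adding one table; untranslated ids keep working in English.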
I'm guessing an AS3 lexer would be useful if you don't want to depend on a specific operating system, since then you wouldn't need mxmlc? But MXMLC is a compiler. If the lexer is written in AS3, can you run it on the web, desktop or mobile?
Well, as I explained above, the lexer just transforms raw AS3 source into tokens, so it won't serve as an MXMLC alternative. It can, however, be used to prototype a parser and then a compiler in AS3, which can be compiled with MXMLC and then bootstrapped if preferred.
This lexer can be used as a syntax highlighter, but you have to drive it correctly, because some tokens depend on context. Its main method is lex(), which scans a single token, but there are three more methods: scanRegexpLiteral(), XMLTagMode::lex() and XMLContentMode::lex(). The XML scan modes are for the E4X lexical grammar, while scanRegexpLiteral() must be called when you find a slash punctuator (the division operators / and /=) in an expression context:
const token: TokenFeed = TokenFeed.token;
if ((token.kind === Token.Slash) || (token.kind === Token.SlashAssign))
{
    lexer.scanRegexpLiteral(); // "lexer" stands for your lexer instance (name assumed)
    // Now, token.kind === Token.RegexpLiteral
}
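The reason for the separate call is that the lexer alone cannot tell whether / starts a division or a regular expression; only the parser knows whether an expression is expected (ECMA-262 expresses this with the InputElementDiv and InputElementRegExp goal symbols). The TypeScript sketch below models that split. MiniLexer and its fields are hypothetical; only the method names lex() and scanRegexpLiteral() echo the post.

```typescript
// Hypothetical sketch: MiniLexer is illustrative only; just lex() and
// scanRegexpLiteral() echo method names from the post.
const Token = {
  EOF: 0, Identifier: 1, Slash: 2, SlashAssign: 3, RegexpLiteral: 4, Other: 5,
} as const;
type Token = (typeof Token)[keyof typeof Token];

class MiniLexer {
  kind: Token = Token.EOF;
  start = 0;
  value = "";
  private pos = 0;
  private readonly src: string;

  constructor(src: string) {
    this.src = src;
  }

  // Scans one token; a '/' is always read as division here.
  lex(): Token {
    while (this.src.charAt(this.pos) === " ") this.pos++;
    this.start = this.pos;
    const c = this.src.charAt(this.pos);
    if (c === "") return this.finish(Token.EOF, this.pos);
    if (c === "/") {
      return this.src.charAt(this.pos + 1) === "="
        ? this.finish(Token.SlashAssign, this.pos + 2)
        : this.finish(Token.Slash, this.pos + 1);
    }
    if (/[A-Za-z_$]/.test(c)) {
      let i = this.pos + 1;
      while (/[A-Za-z0-9_$]/.test(this.src.charAt(i))) i++;
      return this.finish(Token.Identifier, i);
    }
    return this.finish(Token.Other, this.pos + 1);
  }

  // Called by the parser when the '/' or '/=' it just received sits where an
  // expression is expected: rescans from that slash as a regexp literal.
  scanRegexpLiteral(): Token {
    let i = this.start + 1;
    let inClass = false; // a '/' inside [...] does not end the literal
    for (;;) {
      const c = this.src.charAt(i);
      if (c === "" || c === "\n") throw new SyntaxError("Unterminated regexp literal");
      if (c === "\\") { i += 2; continue; } // skip the escaped character
      if (c === "[") inClass = true;
      else if (c === "]") inClass = false;
      else if (c === "/" && !inClass) break;
      i++;
    }
    i++; // consume the closing '/'
    while (/[A-Za-z]/.test(this.src.charAt(i))) i++; // flags such as g, i, m
    return this.finish(Token.RegexpLiteral, i);
  }

  private finish(kind: Token, end: number): Token {
    this.kind = kind;
    this.value = this.src.slice(this.start, end);
    this.pos = end;
    return kind;
  }
}
```

For `x = /a[/]b/g;` the parser sees `=`, so when lex() returns Slash it calls scanRegexpLiteral() and gets the whole literal `/a[/]b/g`; for `a /= b` it keeps the SlashAssign token.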
The XML modes correspond exactly to the InputElementXMLTag and InputElementXMLContent goal symbols of the ECMA-357 standard (2nd edition), which defines E4X.
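The goal-symbol idea means the same input position is read differently depending on which mode the parser selects. Below is a heavily simplified TypeScript sketch of the two modes. lexXMLTag(), lexXMLContent() and the XMLToken shape are hypothetical, the name character set is reduced, and it assumes the opening '<' has already been consumed in the ordinary goal.

```typescript
// Hypothetical, simplified sketch of the two E4X goal symbols: the parser
// chooses which function reads the next token from the same position.
interface XMLToken {
  kind: "XMLName" | "Punctuator" | "AttributeValue" | "XMLText" | "EOF";
  text: string;
  end: number; // index just past the token
}

// InputElementXMLTag: names, '=', quoted values, '>', '/>' and '{'.
function lexXMLTag(src: string, pos: number): XMLToken {
  while (/\s/.test(src.charAt(pos))) pos++;
  const c = src.charAt(pos);
  if (c === "") return { kind: "EOF", text: "", end: pos };
  if (src.startsWith("/>", pos)) return { kind: "Punctuator", text: "/>", end: pos + 2 };
  if (c === "=" || c === ">" || c === "{") return { kind: "Punctuator", text: c, end: pos + 1 };
  if (c === '"' || c === "'") {
    const close = src.indexOf(c, pos + 1);
    if (close < 0) throw new SyntaxError("Unterminated attribute value");
    return { kind: "AttributeValue", text: src.slice(pos, close + 1), end: close + 1 };
  }
  let i = pos;
  while (/[\w:.-]/.test(src.charAt(i))) i++; // reduced XML name characters
  if (i === pos) throw new SyntaxError(`Unexpected '${c}' in XML tag`);
  return { kind: "XMLName", text: src.slice(pos, i), end: i };
}

// InputElementXMLContent: everything up to '<' or '{' is XMLText.
function lexXMLContent(src: string, pos: number): XMLToken {
  if (pos >= src.length) return { kind: "EOF", text: "", end: pos };
  if (src.startsWith("</", pos)) return { kind: "Punctuator", text: "</", end: pos + 2 };
  const c = src.charAt(pos);
  if (c === "<" || c === "{") return { kind: "Punctuator", text: c, end: pos + 1 };
  let i = pos;
  while (i < src.length && src.charAt(i) !== "<" && src.charAt(i) !== "{") i++;
  return { kind: "XMLText", text: src.slice(pos, i), end: i };
}
```

Reading `a href="x">hi</a>`, the parser stays in tag mode until `>`, switches to content mode to get the text `hi` and the `</` punctuator, then returns to tag mode for the closing name.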
My lexer isn't oriented toward syntax highlighting, partly because of the way it pushes ignored comments, but you have full access to each token's line and start/end column (really start/end indices). I think it just needs an update to handle dynamic changes to the source.
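For a highlighter, start/end indices can be mapped to line and column numbers with a table of line-start offsets. The TypeScript sketch below assumes 0-based lines and columns and uses hypothetical helpers computeLineStarts() and toLineColumn(). It treats LF, CR, U+2028 and U+2029 as line terminators and counts CR LF as a single break.

```typescript
// Sketch: precompute the index at which each line starts.
function computeLineStarts(src: string): number[] {
  const starts = [0];
  for (let i = 0; i < src.length; i++) {
    const c = src.charAt(i);
    if (c === "\r" && src.charAt(i + 1) === "\n") i++; // CR LF is one break
    if (c === "\r" || c === "\n" || c === "\u2028" || c === "\u2029") starts.push(i + 1);
  }
  return starts;
}

// Binary search for the last line start <= index; both results are 0-based.
function toLineColumn(lineStarts: number[], index: number): { line: number; column: number } {
  let lo = 0;
  let hi = lineStarts.length - 1;
  while (lo < hi) {
    const mid = (lo + hi + 1) >> 1;
    if (lineStarts[mid] <= index) lo = mid;
    else hi = mid - 1;
  }
  return { line: lo, column: index - lineStarts[lo] };
}
```

Regarding dynamic changes: after an edit, only the entries following the edited line shift, so an editor could patch this array (and re-lex from the start of the edited line) rather than rebuilding everything. That incremental part is not shown here.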