Pure ActionScript 3.0 lexer


#1

I’ve written many AS3 lexers privately, but here’s one written in pure AS3.

Quick show:

const hey: Ãufin = NothingEverDies, yeah

Some sample errors:

\\u0000 .999f


#2

Would it not be simpler to take the Java sources of ASC and port those to AS3?

also, because a compiler is in general a command-line tool
I would suggest using Redtamarin


#3

@zwetan ASC is very hackish and its AST depends on ASC semantics, which complicates the learning curve.

About RedTamarin, maybe it could be a good idea. https://github.com/Corsaair/redtamarin/issues/66 If I execute a SWF, then I’m able to have normal package dependency. Maybe I need to strip SWF content and only keep DoABC content, so RedTamarin is able to execute it instead of running ASC, which doesn’t handle packages flexibly.

Edit: Nice. RedTamarin already parses SWF DoABC and DoABC2!


#4

Ok, I’m done with the bullshit

despite that you keep posting without giving a shit on this forum
your comment above is what really gets me pissed off

you’re judging ASC source code while on your side you seem to be 12yo, with no experience in programming,
and to have never written a compiler before

so here’s a bit of advice, you arrogant prick: have a bit more respect for the work of other, more experienced developers

I could go in length explaining this and that but it seems it is over your head, so whatever


#5

OK so I’m gonna give more context to all that

the user @Hydp keeps posting on this forum

  • without selecting a category
  • using many different accounts, e.g. @hydroper @SuperAnimexKai
  • posting stuff and then deleting it afterward

and let’s be clear, I’m not “losing it” over a couple of posts without categories,
this occurred many many times under different accounts, one such account was warned
and then blocked after continuing such behaviour

that is what I meant by “despite that you keep posting without giving a shit on this forum”

Now about being pissed off
@Hydp is obviously new to programming, and that’s OK I would not blame anyone for that
but … when you’re new to stuff you don’t come and shit on other developers’ work
especially when you pick a quite advanced subject that is apparently above your head

so yeah when I see @Hydp commenting “ASC is very hackish”
I’m pretty sure he does not know what he is talking about

so here I’m gonna give a couple of pieces of advice: instead of shitting on the work of developers who probably have decades of programming experience over you, just say that you don’t know or you don’t understand and you need help

don’t pretend you are “above it” because it does feel a lot like “you are full of it” (and yes I mean shit)

and to be complete, the final comment
“I could go in length explaining this and that but it seems it is over your head, so whatever”

it is a reaction to you not having a clue what you’re talking about

in detail

About RedTamarin, maybe it could be a good idea. https://github.com/Corsaair/redtamarin/issues/66 If I execute a SWF, then I’m able to have normal package dependency.

executing a SWF is not related to having package dependencies, which in turn is not related to compiling source code to bytecode

Maybe I need to strip SWF content and only keep DoABC content, so RedTamarin is able to execute it instead of running ASC, which doesn’t handle packages flexibly.

you are confusing everything

Redtamarin is a runtime that interprets bytecode from either ABC or SWF
and to some extent can also interpret raw source code from AS files

But ASC is the opposite, it is a compiler: it takes a set of symbols (source code) and converts it to another set of symbols (bytecode)

Simply put, when I suggest using Redtamarin, it is to be able to program in AS3
but still get a command-line executable and an API somewhat similar to Java’s,
to read/write files for example

And I would have gladly explained stuff and nudged you in the right direction like I did numerous times before,
but you started with that ASC remark …


#6

I’m not entirely new to programming: I started about 3.1 years ago (December 2014), and my first language was ActionScript 2 or Lua (with which I could script Transformice modules). I have many different things to do, but this project of building my own compiler (though really I was just trying to write a compliant parser first), which started after I looked at the luaparse parser, had the objective of ensuring I have a consistent base for my projects. Right now, today, I have a bad headache and am still unsure whether I’ll continue with this journey. I’m done with the lexer: I worked hard to have my own lexer. When I restart something, I prefer to do it from scratch, and I wrote that lexer in many different ways. But in the end I must confess I wasted a good deal of time doing that, between 2016 and 2018, and this affected the other things I used to do.

Note that by consistent base I meant something that does good things directly for me, and which is very portable and modern. Basically, I only learned about ASC recently, because I always used Macromedia/Flash Professional (a copy cracked by someone who cracks Adobe applications). ASC is almost good because it’s written in Java, but it’s not like MXMLC, which resolves package definitions before full AS3 AOT/strict verification. So I could just use MXMLC, but I found it quite inelegant… But today I’m thinking differently.

executing a SWF is not related to having package dependencies, which in turn is not related to compiling source code to bytecode

I know. I said “If I execute a SWF, then I’m able to have normal package dependency” because SWF is the only output format of MXMLC. (There’s also SWC, but it’s a ZIP.)

you are confusing everything

You didn’t understand what I said. By “instead of running ASC”, I meant I would not run a command such as redshell myInput.as, which invokes ASC automatically (AFAIK).


#7

I have many questions. So what happens once someone is able to create a lexer? Can you then compile AS3 source code into bytecode or an abstract syntax tree, or show errors in AS3 source code?

I’m guessing an AS3 lexer would be good if you do not want to depend on a specific operating system and then you don’t need mxmlc? But MXMLC is a compiler. If the lexer is AS3 then you can put it on the web or desktop or mobile?


#8

Sorry, I wasn’t totally clear in the topic. A lexer is basically a lexical scanner: it transforms program source into tokens. The lexer doesn’t generate an AST or bytecode; it just derives the information a parser would normally need, which is the tokens.
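To make "source in, tokens out" concrete, here is a minimal sketch. It is written in TypeScript only because it is close to AS3 and easy to run; it is not the lexer from this thread, and the `TokenKind`, `Token` and `tokenize` names are hypothetical. It handles only identifiers, simple numbers and single-character punctuators.

```typescript
// Hypothetical token kinds; a real AS3 Token enumeration is much larger.
type TokenKind = "Identifier" | "Number" | "Punctuator" | "EOF";

interface Token {
  kind: TokenKind;
  text: string;
  start: number; // index of the first character
  end: number;   // index just past the last character
}

// Turns source text into a flat list of tokens; no AST, no bytecode.
function tokenize(source: string): Token[] {
  const tokens: Token[] = [];
  let i = 0;
  while (i < source.length) {
    const ch = source.charAt(i);
    if (/\s/.test(ch)) { i++; continue; } // skip whitespace
    const start = i;
    let kind: TokenKind;
    if (/[A-Za-z_$]/.test(ch)) {
      while (i < source.length && /[A-Za-z0-9_$]/.test(source.charAt(i))) i++;
      kind = "Identifier";
    } else if (/[0-9]/.test(ch)) {
      while (i < source.length && /[0-9.]/.test(source.charAt(i))) i++;
      kind = "Number";
    } else {
      i++; // simplification: every other character is its own punctuator
      kind = "Punctuator";
    }
    tokens.push({ kind, text: source.slice(start, i), start, end: i });
  }
  tokens.push({ kind: "EOF", text: "", start: i, end: i });
  return tokens;
}
```

A parser would then consume this list (or pull tokens one at a time) and build the AST from it.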

The lexer I posted on GitHub uses a singleton TokenFeed class for updating token information, such as the token kind (from the Token enumeration) and values (double, String and Boolean).

This latest lexer is a bit easier to extend. Instead of calculating numeric literals manually, I decided to just use parseInt, as it won’t be a performance worry for now. (Wait, I forgot about parseFloat too.) Also, in earlier lexers I used to scan punctuators in a hierarchical, derived way.
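As a rough illustration of delegating numeric literals to the built-in parsers, the sketch below validates the literal's shape first and then calls parseInt or parseFloat (both exist as global functions in AS3 as well as in TypeScript). The function name `numericLiteralValue` and the accepted literal forms are assumptions for the example, not taken from the actual lexer.

```typescript
// Converts the text of an already scanned numeric literal to its value.
// Hypothetical helper: validation happens here because parseInt and
// parseFloat silently ignore trailing garbage such as the "f" in ".999f".
function numericLiteralValue(text: string): number {
  // Hexadecimal literal such as 0xFF
  if (/^0[xX][0-9A-Fa-f]+$/.test(text)) {
    return parseInt(text.slice(2), 16);
  }
  // Decimal literal such as 10, 1.5, .5 or 1e3
  if (/^(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?$/.test(text)) {
    return parseFloat(text);
  }
  throw new SyntaxError("Invalid numeric literal: " + text);
}
```

The shape check matters because `parseFloat(".999f")` would otherwise return a number instead of reporting an error.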

This lexer is okay to use and is multi-lingual, but so far I haven’t implemented messages for languages other than English (I know Portuguese, but I think English is preferred over it). It’s basically complete: it follows the ECMA-262 rules.

I’m guessing an AS3 lexer would be good if you do not want to depend on a specific operating system and then you don’t need mxmlc? But MXMLC is a compiler. If the lexer is AS3 then you can put it on the web or desktop or mobile?

Well, like I explained above, the lexer just transforms raw AS3 source into tokens, so it won’t serve as an alternative to the MXMLC compiler, but it can be used to prototype a parser and then a compiler in AS3, which can be compiled with MXMLC and then bootstrapped if preferred.

This lexer can be used as a syntax highlighter, but it’s necessary to scan tokens correctly. Its main method is lex(), which scans a single token, but… there are three more methods: scanRegexpLiteral(), XMLTagMode::lex() and XMLContentMode::lex(). The XML scan modes are for the E4X lexical grammar, whilst scanRegexpLiteral() must be called when you find a slash punctuator (/ or /=, which would otherwise be a division operator) in an expression context…

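import hydroper.asparse.*;
import hydroper.asparse.lexer.*;

// TokenFeed is a singleton that the lexer updates on every scan.
const token: TokenFeed = TokenFeed.token;

// myLexer is an already constructed lexer instance; scan one token.
myLexer.lex();

// In an expression context, a slash punctuator starts a regular expression literal.
if ((token.kind === Token.Slash) || (token.kind === Token.SlashAssign))
{
    myLexer.scanRegexpLiteral();
    // Now, token.kind === Token.RegexpLiteral
}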
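For readers wondering what such a re-scan involves, here is a simplified sketch in TypeScript (chosen because it is close to AS3 and easy to run). It is not the implementation of scanRegexpLiteral() from this lexer; the `scanRegExpLiteral` function and the `RegExpLiteral` shape are hypothetical. Starting at the slash, it reads up to the closing slash, treating slashes inside a character class or after a backslash as part of the body, and then reads the flags.

```typescript
interface RegExpLiteral {
  body: string;  // text between the slashes
  flags: string; // letters right after the closing slash
  end: number;   // index just past the literal
}

// slashIndex is the index of the opening "/" (also the "/" of a "/=" token).
function scanRegExpLiteral(source: string, slashIndex: number): RegExpLiteral {
  let i = slashIndex + 1;
  let inClass = false; // true while inside [...]
  for (;;) {
    if (i >= source.length) {
      throw new SyntaxError("Unterminated regular expression literal");
    }
    const ch = source.charAt(i);
    if (ch === "\n" || ch === "\r") {
      throw new SyntaxError("Line break in regular expression literal");
    }
    if (ch === "\\") { i += 2; continue; } // simplification: skip any escaped character
    if (ch === "[") inClass = true;
    else if (ch === "]") inClass = false;
    else if (ch === "/" && !inClass) break; // closing slash found
    i++;
  }
  const body = source.slice(slashIndex + 1, i);
  i++; // skip the closing slash
  const flagsStart = i;
  while (i < source.length && /[A-Za-z]/.test(source.charAt(i))) i++;
  return { body, flags: source.slice(flagsStart, i), end: i };
}
```

Because the scan restarts from the slash itself, the same routine works whether the lexer first produced a / token or a /= token.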

The XML modes correspond exactly to the InputElementXMLTag and InputElementXMLContent goal symbols of the ECMA-357 standard (2nd edition), the E4X language.

My lexer isn’t oriented to syntax highlighting, partly because of the way it pushes ignored comments, but you have full access to line and start/end column (really start/end index) information. I think it just needs an update to handle dynamic changes. :wink: