A little copying is better than a little dependency



This started from a comment on a Hacker News post:

“Compiler writers are sometimes surprisingly clueless about programming.”

Sure, but there’s another aspect, which is trying to guess at what the minimum
level of knowledge the user of the language should be expected to have. For an
interesting debate on if an isOdd(int i) as opposed to just using (i & 1)
should exist as a standard library component or not, see:

https://digitalmars.com/d/archives/digitalmars/…

So what intrigued me at first was not really that “compiler writers are clueless about programming”,
because I think the opposite, “programmers are clueless about compilers”, should be more worrying.

But then this long discussion about including isOdd(int i) (or not) is much, much more interesting.

Humm… how to put it? …

Writing compilers is something apart: it is needed, but it is not really a daily problem.

Deciding whether to include a function, a class, etc. in a reusable library, on the other hand, is
something everyone faces every day, and it is a deceptive problem: most developers think they have already figured it out, because it is “so simple”, right?

Usually it goes like this:

Uh oh, here I need to trim those particular chars at the start of this string

If you’re using programming language A, you might already have this functionality built into the language, so you just reuse that.

Let’s say you use C#: you’re just going to use String.TrimStart
because it is already there and it is exactly the functionality you need

And then if you’re using another programming language B, you might not have this particular functionality, so you have to write it yourself, or find a library that already defines it and reuse that.

So if you’re using ActionScript 3, well yeah, there you don’t have such a method, so you could quickly define it

package utils
{
    public function trimStart( str:String ):String
    {
        return str.replace(/^(\s|\u00A0)+/g, "");
    }
}

you would define it “like that” because at that particular moment you just need to trim whitespace, and only at the start: job done, move on.

But what if you wanted to make that into a library, something reusable?

Suddenly, the scope of the problem changes

  • what if we want to trim other chars than whitespace?
  • what if we want to trim at the end?
  • what if we want to trim both the start and the end?
  • what if we want an easy default (trim whitespace)
    but have the option to trim other chars too?
    or only other chars, without trimming whitespace?

and so you can end up with all that

package core.strings
{
    public const whiteSpaces:Array = 
    [ 
        "\u0009" /*Horizontal tab*/ ,
        "\u000A" /*Line feed or New line*/,
        "\u000B" /*Vertical tab*/,
        "\u000C" /*Formfeed*/,
        "\u000D" /*Carriage return*/,
        "\u0020" /*Space*/,
        "\u00A0" /*Non-breaking space*/,
        "\u1680" /*Ogham space mark*/,
        "\u180E" /*Mongolian vowel separator*/,
        "\u2000" /*En quad*/,
        "\u2001" /*Em quad*/,
        "\u2002" /*En space*/,
        "\u2003" /*Em space*/,
        "\u2004" /*Three-per-em space*/,
        "\u2005" /*Four-per-em space*/,
        "\u2006" /*Six-per-em space*/,
        "\u2007" /*Figure space*/,
        "\u2008" /*Punctuation space*/,
        "\u2009" /*Thin space*/,
        "\u200A" /*Hair space*/,
        "\u200B" /*Zero width space*/,
        "\u2028" /*Line separator*/,
        "\u2029" /*Paragraph separator*/,
        "\u202F" /*Narrow no-break space*/,
        "\u205F" /*Medium mathematical space*/,
        "\u3000" /*Ideographic space*/ 
    ];

    public const trimStart:Function = function( source:String , chars:Array = null ):String
    {
        if( chars == null )
        {
            chars = whiteSpaces;
        }
        if ( source == null || source == "" )
        {
            return "";
        }
        var i:int;
        var l:int = source.length;
        for( i = 0; (i < l) && (chars.indexOf( source.charAt( i ) ) > - 1) ; i++ )
        {
        }
        return source.substring( i );
    };

    public const trimEnd:Function = function( source:String , chars:Array = null ):String
    {
        if( chars == null )
        {
            chars = whiteSpaces;
        }
        if ( source == null || source == "" )
        {
            return "";
        }
        var i:int;
        var l:int = source.length ;
        for( i = source.length - 1; (i >= 0) && (chars.indexOf( source.charAt( i ) ) > - 1) ; i-- )
        {
        }
        return source.substring( 0, i + 1 );
    };

    public const trim:Function = function( source:String , chars:Array = null ):String
    {
        if( chars == null )
        {
            chars = whiteSpaces;
        }
        if ( source == null || source == "" )
        {
            return "";
        }
        
        var i:int , l:int;
        
        l = source.length;
        for( i = 0; (i < l) && (chars.indexOf( source.charAt( i ) ) > - 1) ; i++ )
        {
        }
        source = source.substring( i );
        
        l = source.length;
        for( i = source.length - 1; (i >= 0) && (chars.indexOf( source.charAt( i ) ) > - 1) ; i-- )
        {
        }
        source = source.substring( 0, i + 1 );
        
        return source;
    };
}

So right away it gets bigger: you basically trade space for reuse. You write it with the most needed options, you test it, you organise it, compile it into a library, and then you can reuse it everywhere you need to trim a string.
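To make that trade a bit more concrete, here is a minimal sketch of the same design in JavaScript (chosen only because it runs anywhere, it is not the ActionScript 3 library itself). The WHITESPACE constant is a shortened, hypothetical stand-in for the whiteSpaces list above, and trim() is composed from the two one-sided functions instead of repeating their loops, which is one possible way to keep the "bigger" version a little smaller.

```javascript
// Shortened, hypothetical stand-in for the full whiteSpaces list above.
const WHITESPACE = ["\t", "\n", "\r", " ", "\u00A0"];

// Remove every leading character that appears in `chars`.
function trimStart(source, chars) {
  if (chars == null) chars = WHITESPACE;          // easy default
  if (source == null || source === "") return "";
  let i = 0;
  while (i < source.length && chars.includes(source.charAt(i))) i++;
  return source.substring(i);
}

// Remove every trailing character that appears in `chars`.
function trimEnd(source, chars) {
  if (chars == null) chars = WHITESPACE;
  if (source == null || source === "") return "";
  let i = source.length - 1;
  while (i >= 0 && chars.includes(source.charAt(i))) i--;
  return source.substring(0, i + 1);
}

// Trimming both sides is the two one-sided trims chained together.
function trim(source, chars) {
  return trimEnd(trimStart(source, chars), chars);
}

console.log(JSON.stringify(trimStart("  hello  ")));   // "hello  "
console.log(JSON.stringify(trim("--hello--", ["-"]))); // "hello"
console.log(JSON.stringify(trim("  --hello", ["-"]))); // "  --hello"
```

The last call shows the "only other chars" option: whitespace is left alone as soon as a custom list is passed.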

But here is the catch: what is better?

One side could go

var trimstr:String = somestr.replace(/^\s+/g, "");

Another side could go

import core.strings.*;
var trimstr:String = trimStart( somestr );

(or any other variant of reusing a library)

If in your whole program you only need to trim the start of a string once, well … you don’t really need a library, right?

But if your program needs to trim the start of a string in a dozen different places, sure … you would want to apply the good old principle of not copy/pasting code, isolate the functionality in a utility function, and why not just reuse an already existing library.

Simple right?

Not so fast …

if it is just for yourself, your own code, your own programs etc.
either way is good: make a library, don’t … your choice

the problem really shows up when you want to share the code with others
and it is a problem because you cannot please everyone

some will find the code so trivial they will find it useless
some others will find it a bit useful but think it misses those 50 other options
and some more will dislike your code so much they will write their own library

And in fact the problem is so far from figured out that, depending on the language,
you get more or less by “default”

Java, C# will have a “core framework” with a lot of stuff

JavaScript will just have minimal “builtins”

ActionScript 3 also has minimal “builtins”
and then the Flash Platform added the Flash API but that’s not really a framework

So yeah, there is no universal solution: different devs have different tastes / knowledge / standards / etc., and the programming language itself has its own way of doing things, so it is extremely hard to find a good balance.

but then what do we do?

You could follow what seems good advice or even what some people call “best practice”

In some cases you will hear “code reuse is good”
or in a similar vein “do not repeat (copy/paste) the same code”

but then this can be interpreted differently

see this for example
Why has there been nearly 3 million installs of is-odd - npm in the last 7 days?

people will justify it with that “code reuse” thing
others will think it is utterly stupid

so what do we do?
isOdd(int i) or (i & 1) ?
use is-odd dependency or not?

so from the reddit mockery you get to this Twitter post (from the guy who publishes is-odd)

“no dependencies” is just another way of
saying “I recreated the wheel, added code debt
for insufficiently tested code to my library,
and ended up with the same amount of code as
battle-tested code from dependencies would have created,
but without the licenses”. #nodejs #GitHub

in the comments of that tweet you can find the following video, which contains some pretty good advice

Go Proverbs - Rob Pike - Gopherfest - November 18, 2015

watch from 09:25

A little copying is better than a little dependency
so this is when I first went to Google
one of the first things that was told to me
that scared me was somebody who said
“we really care most importantly about code reuse”
and I thought that was a weird thing to say
and then I found what they meant was
if you could borrow, if you could avoid writing one line
by doing a #include you should do that
and I learned that was a very very bad idea
and Google internally is still suffering from that decision
but you can actually make your programs compile faster
be easier to maintain and simpler
if you keep the dependency tree really really small
and one of the ways you can do that is sometimes
that “you know what I don’t need that whole library,
all I need is these three lines of code that I can just copy”
and it’s fine

don’t be afraid to copy when it makes sense

so what do we do?
who is right?

Rob Pike or Jon Schlinkert?

well… in theory both pieces of advice are valid

but in practice you can always find someone pushing good advice to the extreme
and turning it into a very very bad idea

it’s like this quote

Everything Should Be Made as Simple as Possible, But Not Simpler
– Albert Einstein

and that is exactly what happened with those npm libraries
that publish only one function into a package

you see it well with the ansi packages
for example: ansi-yellow

'use strict';

var wrap = require('ansi-wrap');

module.exports = function yellow(message) {
  return wrap(33, 39, message);
};

one package for one color which is basically one integer 33
really? WTF?

and it happens that I published such a library for Redtamarin a few years ago
here: ansilib

well… that’s about the difference between “simple” and “too simple”

I’m not saying my library is “better”, but the right amount of granularity
or the right balance is to put together all ansi escape codes into one library,
having one library per escape code is way overkill.
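As a rough illustration of that granularity, here is a minimal JavaScript sketch of "all the colors in one place". It is not the source of ansilib nor of the ansi-wrap package: the wrap() helper, the colors table and colorize() are written here only for the example, using the standard SGR codes (31 to 34 for the foreground colors, 39 to reset the foreground).

```javascript
// Illustrative helper (an assumption, not the ansi-wrap package):
// put a message between an opening and a closing SGR escape code.
function wrap(open, close, message) {
  return "\x1b[" + open + "m" + message + "\x1b[" + close + "m";
}

// One table for the whole subject: each color is just one integer.
const colors = { red: 31, green: 32, yellow: 33, blue: 34 };

// 39 resets the foreground color to the terminal default.
const RESET_FOREGROUND = 39;

function colorize(message, color) {
  return wrap(color, RESET_FOREGROUND, message);
}

console.log(colorize("warning: disk almost full", colors.yellow));
console.log(colorize("error: disk full", colors.red));
```

Adding a color is one more entry in the table, not one more package to publish, install and keep up to date.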

And you know what?
Sometimes I do use this library because I do need to do a lot of ANSI escaping,
but some other times I just don’t use it because I just need to display one error message in red
and what do I do in this case? I just copy the code I need

public function showerr( msg:String ):void
{
    printf( "\x1b[31m%s\x1b[0m", msg );
}

aka A little copying is better than a little dependency

but it is not only that, it is also a cultural thing

with npm and a programming language like JavaScript
where you need to load everything dynamically

the culture is to reuse dependencies
and most people do recognise that it is too much

they even joke about it

[image: node_modules joke]

and yet most developers are doing it

so what do we do?

for ActionScript 3 specifically, we do both, even if we do not realise it
if you reuse SWC libraries, in most cases those libraries contain “all the code”
and they embed their own dependencies

there are not really any rules: you could compile a SWC against other libraries
with or without linking to them, nothing in the compilers enforces one or the other

I guess culturally, ActionScript developers thought it was simpler to compile it all into one library,
and as it did not generate huge conflicts, it kept going

also the libraries were pretty much focused on covering “one thing” and that helped

but what most people did not realise is that, whatever the size of that SWC library,
when you reuse it in your own code you don’t necessarily embed all of it in your final binary

when I released gaforflash.swc a long time ago, people complained “the SWC is 10MB!!! it is too big!!!” without realising that using it would add only a few KB to their final SWF

and that is the great thing with the compilers: even if it is all bytecode, they are smart enough to only embed what is really needed, so yeah, they optimise that by default for you

and only in rare cases will you find people who compile against other libraries without linking to them, or who use metadata such as [inline] with ASC2, but in general they document it because it is so rarely used.

but still, what do we do?
would it be pertinent to have an isOdd() directly defined in the AS3 builtins?

my take on that is that it is good to have a named function for a specific functionality
but it is not essential to have it builtin or defined in a “core framework”
because

  1. it is trivial to do using % (modulo) even if not everyone knows modulo
  2. % (modulo) is built into the language
  3. adding isOdd() would be a kind of duplicate of the functionality that is already there
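"Trivial" still has one sharp edge worth knowing before copying the one-liner. The sketch below is JavaScript and assumes integer inputs; the function names isOddModulo and isOddBitwise are made up for the comparison. ActionScript 3 has the same operators, but check the behaviour for negative numbers in your own language before relying on it.

```javascript
// Parity with the remainder operator.
// Comparing against 0 (not 1) matters: in JavaScript -3 % 2 is -1,
// so "n % 2 === 1" would miss every negative odd number.
function isOddModulo(n) {
  return n % 2 !== 0;
}

// Parity by testing the lowest bit.
// Bitwise operators first convert their operand to a 32-bit integer.
function isOddBitwise(n) {
  return (n & 1) === 1;
}

console.log(isOddModulo(7), isOddBitwise(7));   // true true
console.log(isOddModulo(-3), isOddBitwise(-3)); // true true
console.log(-3 % 2 === 1);                      // false
console.log(isOddModulo(10), isOddBitwise(10)); // false false
```

That is also the best argument for a named function: not that the code is hard, but that the name hides the detail everyone gets wrong once.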

but then is it worth it to have a library that only defines isOdd() ?

no it is not, not just for one such trivial function

that said, you could perfectly make a “Number library”
where you could define such things as isOdd(), isEven(), etc.
a group of functionalities around the same “goal”/“theme”/“subject”
eg. working with numbers

See it like the Single responsibility principle, but applied to external libraries:
your library only deals with one subject, “working with numbers”,
with a set of definitions: classes, interfaces, functions, const, etc.

And if someone just needs a small part of that library, they can just look in the source code of isOdd() and copy/paste it.
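A minimal JavaScript sketch of what such a grouping could look like. The numbers object and its isMultipleOf() helper are hypothetical, they only show several small related definitions living under one subject, each short enough to be read and copied on its own.

```javascript
// A tiny "working with numbers" module: small related helpers
// grouped under one subject instead of one package per function.
const numbers = {
  isOdd(n) {
    return n % 2 !== 0;
  },
  isEven(n) {
    return n % 2 === 0;
  },
  // Hypothetical extra helper, only here to show the grouping.
  isMultipleOf(n, divisor) {
    return n % divisor === 0;
  },
};

console.log(numbers.isOdd(3));            // true
console.log(numbers.isEven(3));           // false
console.log(numbers.isMultipleOf(12, 4)); // true
```

Someone who only needs isOdd() can lift its single line out of the source and skip the dependency entirely.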

And by extension you do the same when you define an API
in Redtamarin for example, you will find builtin functionalities related to the command-line
hasColor(), isBash(), isCommandPrompt(), isPowerShell(), isTerminalEmulator(), isWSL(), etc.

but you will not find ansilib as a builtin: the library is not “worth” enough to be part of the core functionalities, and yes, it is kind of an arbitrary choice, but that is where the limit is drawn

should the functionality to display colors in a command-line be core, as in available all the time?
no, not really, it is NICE TO HAVE but SHOULD BE optional, if you need it just reuse a library

as opposed to
should the functionality to detect the command-line shell be available all the time?
yes, this is something used all the time in scripts, CLI tools etc. It HAS TO be there by default

and here is the logic in Redtamarin
if a user can detect the shell supports colors with hasColor()
he or she can then decide to

  • directly write ANSI escape code
    eg. printf( "\x1b[31m%s\x1b[0m", "some error here" );
  • load the ansilib and use its functionalities
    eg. trace( colorize( "some error here", colors.red ) );
  • and many other scenarios
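That order (detect first, then decide how to format) can be sketched in JavaScript for Node.js. This is not how Redtamarin implements hasColor(): supportsColor() and formatError() are hypothetical names, and the check used here (the stream is a terminal and the NO_COLOR environment variable is not set) is only one common convention, assumed for the example.

```javascript
// Hypothetical stand-in for a hasColor()-style check: use color only
// when writing to a terminal and the NO_COLOR variable is unset or empty.
function supportsColor(stream, env) {
  return Boolean(stream && stream.isTTY) && !env.NO_COLOR;
}

// Once detection exists, formatting is a tiny detail anyone can copy.
function formatError(message, useColor) {
  return useColor ? "\x1b[31m" + message + "\x1b[0m" : message;
}

const useColor = supportsColor(process.stdout, process.env);
console.log(formatError("some error here", useColor));
```

The detection part needs knowledge of the environment, so it belongs with the platform. The formatting part is one line of escape codes, so it can live in a library or simply be copied.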

simply put, detecting is more important than formatting
that’s where I draw the line and yes it is arbitrary and mostly based
on my daily usage of Redtamarin for the last few years

in another situation, where 1000s of users tell me “this is needed as a builtin”
that would probably change

so, “what do we do?”, well…

  • if it’s only for you, do whatever you want
  • but if you do share the code with others
    • follow the programming language builtin features
    • follow the culture if there is a strong one
    • don’t do it “too simple”
    • make it easy to look at the sources

but yeah, there are no definitive rules, it will probably be an arbitrary choice
and so be prepared to change it if it does not fit the users, the usage, etc.

it’s like adding sugar in tea or adding salt to chips
“not enough” you barely feel it
“too much” it kind of ruins it
“just good” requires a bit of practice :)