Page Moved

March 7, 2021June 30, 2021 AbigailLeave a comment

This page has been moved to abigail.github.io

Perl Weekly Challenge 100: Fun Time

February 21, 2021 AbigailLeave a comment

Challenge

You are given a time (12 hour / 24 hour).

Write a script to convert the given time from 12 hour format to 24 hour format and vice versa.

Ideally we expect a one-liner.

Examples

Example 1

Input: 05:15 pm or 05:15pm
Output: 17:15

Example 2

Input: 19:15
Output: 07:15 pm or 07:15pm

Overview

We have to do three things:

Parse the input and extract the interesting parts (hours, minutes, am/pm/none).
Calculate the result. Note that the minutes part stays as is, this doesn’t change whether the time is using am/pm or a 24 hour clock.
Print the resulting time.

For the am/pm marker, note that if the input uses am or pm, then the output doesn’t. If the input doesn’t have am or pm (that is, it uses a 24 hour clock), then the output will have either am or pm. In particular, it will have am in the output if the input hour is less than 12, else it will have pm.

For the hours, it’s important to realize that a time with am or pm doesn’t use the 0 hour — instead it uses 12. So, beside adding/subtracting 12 (if going from/to pm time), we have to compensate for this. We do this as follows:

We start off with taking the hour modulo 12. The effect of this is that if the start time uses am or pm, and the hour is 12, the hour now is 0. If the start time is in 24 hour format, we now have hours in range 0 – 11.
If the start time used pm, we add 12 to the hour.
If the start time was in 24 hour format, and the hour is 0 (after the modulo 12), set the hour to 12.

Solutions

For each of the languages, we assume the input is on standard input, one time per line. Output is written to standard output, one time per line.

Perl

We will use a substitution regular expression to modify the time, with an executable replacement part:

    say s {^\s* ([0-9]+) : ([0-9]+) \s* ([pa]?)m? \s*\n}
          {sprintf "%02d:%02d%s",
              $3 ? ($1 % 12) + (lc ($3) eq "a" ? 0 : 12)
                 : ($1 % 12) || 12,
              $2,
              $3 ? ""
                 : $1 >= 12 ? "pm" : "am"}xeir

From the input, we parse the hours (the first ([0-9]+)), the minutes (the second ([0-9]+)) and the am/pm setting (if any) (([pa]?)m?). The result is that $1 contains the hours; $2 contains the minutes, and $3 is either a (for an am time), p (for a pm time) or the empty string (for a time using 24 hours).

In the replacement part, we use sprintf to do formatting: hours and minutes as two digit numbers (0 padded) (%02s), followed by either am, pm, or none. The new hours and am/pm marker are calculated as explained in the overview section.

We’re using four regexp modifiers:

x means we can freely have white space in the pattern, which won’t be matched.
e means the replacement part is code which needs to be executed — the result is the replacement.
i means we will be doing the matching case insensitive (so we accept both am and AM, and pm and PM).
r means we don’t modify the original string, instead, we return the result. And this result is passed directly to sprintf.

Does this count as a one-liner? I say it does; the newlines are just there for readability.

Find the complete program on GitHub.

Perl Weekly Challenge 100: Triangle Sum

February 21, 2021 AbigailLeave a comment

Challenge

You are given triangle array.

Write a script to find the minimum path sum from top to bottom.

When you are on index `i` on the current row then you may move to either index `i` or index `i + 1` on the next row.

Examples

Example 1

Input: Triangle = [ [1], [2,4], [6,4,9], [5,1,7,2] ]
Output: 8

Explanation: The given triangle

            1
           2 4
          6 4 9
         5 1 7 2

The minimum path sum from top to bottom:  1 + 2 + 4 + 1 = 8

             [1]
           [2]  4
           6 [4] 9
          5 [1] 7 2

Example 2

Input: Triangle = [ [3], [3,1], [5,2,3], [4,3,1,3] ]
Output: 7

Explanation: The given triangle

            3
           3 1
          5 2 3
         4 3 1 3

The minimum path sum from top to bottom: 3 + 1 + 2 + 1 = 7

             [3]
            3  [1]
           5 [2] 3
          4 3 [1] 3

Overview

We will read in the data, and put it in a (2-dimensional) array. We will then work from bottom to top, calculating for each point, the minimum path length to the bottom. We can do this by adding the minimum of the two elements "below" the current element to it.

For instance, assume the input is:

we can calculate the minimum path length to the bottom for the penultimate row by, for each element in that row, adding to it the minimum of the two elements below. This results in:

In the same way, we can calculate the minimum path length for each element in the second row:

One more step, and we get the wanted result in the top of the triangle:

This clearly is a linear time algorithm.

Solutions

Each of the solutions assumes a single triangle on standard input, one row per line, numbers separated by white space. We’re assuming the numbers are integers. The result, as single number, will be written on standard output.

Perl

We start with reading in the data, which can be done in a single statement:

my @nums = map {[/-?[0-9]+/g]} <>;

For each line of input, we’re extracting the numbers using a simple regular expression. This stores the triangle of numbers in the array @nums; each element of @nums is an array with numbers (a row in the triangle).

We can now calculate the lengths using a nested loop; the other loop loops over the rows (bottom to top, starting with the penultimate row), the inner loop loops over the elements of a row.

for (my $x = @nums - 2; $x >= 0; $x --) {
    foreach my $y (keys @{$nums [$x]}) {
        $nums [$x] [$y] += 
               $nums [$x + 1] [$y] < $nums [$x + 1] [$y + 1]
             ? $nums [$x + 1] [$y] : $nums [$x + 1] [$y + 1]
    }
}

For each element, we adding the minimum of the two elements below it.

All what needs to be done is to print the result:

say $nums [0] [0];

Find the complete program on GitHub.

Feature Compare: Characters to Code Points

February 3, 2021February 3, 2021 AbigailLeave a comment

In this article we are going to compare how to converts characters/single character strings to their ASCII or Unicode code points in various languages.

For instance, character "A" has code point 65, character "~" has code point 126, and character "ด" has code point 3604.

AWK

AWK doesn’t have any buildins to do these mappings, but it does have sprintf, which can be used to map a number to a single string character:

char = sprintf "%c", codepoint

To map characters to code points, we need to do some preprocessing. For instance, if we need to map printable ASCII characters to code points, we may want to do:

BEGIN {
    for (o = 32; o < 127; o ++) {
        c = sprintf ("%c", o)
        ord [c] = o
    }
}

Now in the rest of the program, we can do:

codepoint = ord [char]

assuming that char is a printable ASCII character.

Do note that AWK does not know about Unicode, but gAWK (GNU AWK) does.

Example (gAWK)

BEGIN {
    for (o = 32; o < 3700; o ++) {
        c = sprintf ("%c", o)
        ord [c] = o
    }
}

BEGIN {
    print ord ["A"]        # Prints: 65
    print ord ["ด"]        # Prints: 3604
    printf "%c\n", 70      # Prints: E
    printf "%c\n", 3604    # Prints: ด
}

Bash

For bash, we need to resort to printf. It behaves more or less like printf(1). And printf(1) has subtle differences between implementations on BSD and GNU. And this is a bit of a problem. Where as in the GNU implementation, the %c format takes an integer, and returns the character with that code point, while on BSD, the %c format takes a string, and returns the first character. This makes that we cannot use the %c format to map code points to characters.

But the printf buildin (and printf(1)) has a feature not present in the C implementation of printf(3): if the format string contains "\xHH", for some hexadecimal number HH, then this is mapped to the ASCII character with HH as code point.

So, to turn a code point into a character, we’re going to use printf twice. First to create a string with a "\xHH" escape, and then another printf to turn this into a character:

printf -v char "\x$(printf %x $codepoint)"

This takes the code point in $codepoint, and places the corresponding character in $char.

To go from a character to a code point, we need to use another feature of the printf buildin/printf(1) utility: if an argument of printf starts with single or double quote, the value is the code point of the first character following the quote. Note that we have to be careful to use this syntax, as the quote character itself is special to the shell — we need to escape it.

We can then use the following:

printf -v codepoint "%d" "'$char"

This takes the character in the variable $char, and places the corresponding code point into $codepoint.

Example

printf "%d\n" "'A"             # Prints: 65
printf "\x$(printf %x 70)\n"   # Prints: F

C

In C, characters are code points: a string is just an array of numbers, where each number is the corresponding code point. So we can just use:

ch [0] = codepoint;
codepoint = ch [0];

where codepoint contains the code point, and ch is a string (of type char *, and appropriately malloc -ed).

Single quote characters are also just code points, so we can also things like:

codepoint = 'A';

Example

char ch [3];
ch [0] = 70;
ch [1] = 'G';
ch [2] = '\0';
printf ("%d\n", 'A');   /* Prints: 65 */
printf ("%s\n", ch);    /* Prints: FG */

Lua

In Lua, there are methods char and byte in the string module to map between characters and code points:

char = string . char (codepoint)
codepoint = string . byte (char)

The string module does not have to be explicitly loaded.

Example

print (string . byte ("A"))  -- Prints: 65
print (string . char (70))   -- Prints: F

Node.js

Node.js has methods fromCodePoint and codePointAt in the String module to map between code points and code points:

char = Strings . fromCodePoint (codepoint, ...)
codepoint = char . codePointAt (0)

Note that fromCodePoint is called as a method in the String module, while codePointAt is called on a string. The latter takes a parameter, indicating from which position of the string we want the code point.

Node.js also as fromCharCode and charCodeAt which are subtly different. Node.js works with UTF-16, which means each character is encoded as one or two 16-bit units. While fromCodePoint and codePointAt deal with Unicode code points, fromCharCode andcharCodeAt deal with those 16-bit units. This only becomes relevant for code points outside of the Basic Multilingual Plane; that is, for code points exceeding U+FFFF (65535).

The String module does not to be loaded explicitly.

Example

console . log ("A" . codePointAt (0))               // Prints: 65
console . log ("ด" . codePointAt (0))               // Prints: 3604
console . log (String . fromCodePoint (70))         // Prints: F
console . log (String . fromCodePoint (70, 72, 74)) // Prints: FHJ
console . log (String . fromCodePoint (3604))       // Prints: ด

Perl

Perl has chr which takes a code point and returns the corresponding character, and ord which takes a character and returns the corresponding code point:

$char = chr $codepoint;
$codepoint = ord $char;

This works for all Unicode characters.

Example

say ord "A";    # Prints: 65
say ord "ด";    # Prints: 3604
say chr 70;     # Prints: F
say chr 3604;   # Prints: ด

Python

Just like Perl, Python has chr and ord which maps between code points and characters:

char = chr (codepoint)
codepoint = ord (char)

In Python 2, this works for characters with code points up to 255; Python 3 supports the full Unicode range.

Example

print (ord ("A"))   # Prints: 65
print (ord ("ด"))   # Prints: 3604
print (chr (70))    # Prints: F
print (chr (3604))  # Prints: ด

Ruby

Ruby has chr and ord which are methods called on integers and strings:

char = codepoint . chr
codepoint = char . ord

Without arguments, chr only works on numbers 0 - 255, giving a range error on larger numbers. To make use of the full Unicode range, pass in the encoding:

char = codepoint . chr (Encoding::UTF_8)

Example

puts "A" . ord                       # Prints: 65
puts "ด" . ord                       # Prints: 3604
puts 70 . chr                        # Prints: F
puts 3604 . chr (Encoding::UTF_8)    # Prints: ด

Perl Weekly Challenge 97: Binary Substrings

January 30, 2021 AbigailLeave a comment

Challenge

You are given a binary string $B and an integer $S.

Write a script to split the binary string $B of size $S and then find the minimum number of flips required to make it all the same.

Overview

Just like in this weeks other challenge, we will use a command line parameter. We will use -s SIZE to indicate the size of the substrings. The binary strings will be read from standard input.

To calculate the minimum number of flips required, note that flipping a bit on position i of a sub string has no bearing on the flips required in any other position. So we can calculate the minimum number of flips required for each position i independently, and sum the totals.

To calculate the minimum number of flips required for a specific position i, we calculate the number of 0s in position i in all the sub strings. Given the number of 0s, we can calculate the number of 1s as both numbers should add up to the number of sub strings (and that number can be easily calculate by dividing the length of the binary string with the length of the sub strings). Now, the minimum number of flips required is the minimum of the number of 0s and the number of 1s. Observe that is may be possible we end up flipping to a string which is not one of the sub strings. Consider the binary string 111011011011, and sub string length of 4. Then we get the following sub strings:

1110
1101
1011

It requires flipping two bits to change one of the sub string to any of the others, so four flips would be required if the target where any of the sub strings. However, if we just flip the 0 bits, all sub strings end up being 1111, and only three flips are required.

Note that we don’t have to split the binary string into sub strings to calculate the number of flips. We can just leave the binary string as is, and use a simple formula. The bit at position i of sub string j, is at position j * size + i, where size is the given length of the sub strings. For languages which index strings starting from index 1, the formula is j * size + i + 1.

Solutions

We will not be showing any code related to parsing command line options. This is the same as for the other challenge of this week. In each language, we will have a variable size (or $size) which will contain the value passed in.

The implementation in each language is almost identical, except for some syntax variation.

Perl

while (<>) {
    chomp;
    my $sections = length () / $size;
    my $sum = 0;
    for my $i (0 .. $size - 1) {
        my $zeros = 0;
        for my $j (0 .. $sections - 1) {
            my $index = $j * $size + $i;
            $zeros ++ if substr ($_, $index, 1) eq "0";
        }
        my $ones = $sections - $zeros;
        $sum += min ($zeros, $ones);
    }
    say $sum;
}

For each position i, we count the number of 0 on that position. We then calculate the number of 1s in that position, and sum the minimum of the two.

Find the complete program on GitHub.

AWK

{
    sum = 0
    sections = length ($0) / size
    for (i = 0; i < size; i ++) {
        zeros = 0;
        for (j = 0; j < sections; j ++) {
            indx = j * size + i + 1  # Can't use 'index' as a variable
            if (substr ($0, indx, 1) == "0") {
                zeros ++
            }
        }
        ones = sections - zeros
        sum += zeros < ones ? zeros : ones
    }
    print sum
}

Find the complete program on GitHub.

Bash

while read line
do    sections=$((${#line} / $size))
      sum=0
      for ((i = 0; i < size; i ++))
      do   zeros=0
           for ((j = 0; j < sections; j ++))
           do    index=$(($j * $size + $i))
                 if   [ "${line:$index:1}" == "0" ]
                 then zeros=$(($zeros + 1))
                 fi
           done
           ones=$(($sections - $zeros))
           sum=$(($sum + ($zeros < $ones ? $zeros : $ones) ))
      done
      echo $sum
done

Find the complete program on GitHub.

C

    while ((strlen = getline (&line, &len, stdin)) != -1) {
        strlen --;                      /* We don't care about the newline */
        int sections = strlen / size; 
        int sum = 0;
        for (int i = 0; i < size; i ++) {
            int zeros = 0;
            for (int j = 0; j < sections; j ++) {
                int index = j * size + i;
                if (line [index] == '0') {
                    zeros ++;
                }
            }
            int ones = sections - zeros;
            sum += zeros < ones ? zeros : ones;
        }
        printf ("%d\n", sum);
    }
    free (line);

Find the complete program on GitHub.

Lua

for line in io . lines () do
    local sections = #line / size
    local sum = 0
    for i = 0, size - 1 do
        local zeros = 0
        for j = 0, sections - 1 do
            index = j * size + i + 1
            if   string . sub (line, index, index) == "0"
            then zeros = zeros + 1
            end
        end
        local ones = sections - zeros
        if   zeros < ones
        then sum = sum + zeros
        else sum = sum + ones
        end
    end
    io . write (sum, "\n")
end

Find the complete program on GitHub.

Node.js

require ('readline')
. createInterface ({input: process . stdin})   
. on ('line', _ => {
    let sum      = 0
    let sections = _ . length / size
    for (let i = 0; i < size; i ++) {
        let zeros = 0
        for (let j = 0; j < sections; j ++) {
            let index = j * size + i;
            if  (_ . substring (index, index + 1) == "0") {
                zeros ++
            }
        }
        let ones = sections - zeros
        sum += zeros < ones ? zeros : ones
    }
    console . log (sum)
});

Find the complete program on GitHub.

Python

for line in fileinput . input ():
    sections = len (line) / size
    sum = 0
    for i in range (size):
        zeros = 0
        for j in range (sections):
            index = j * size + i
            if line [index : index + 1] == "0":
                zeros += 1
        ones = sections - zeros
        sum += min (zeros, ones)
    print (sum)

Find the complete program on GitHub.

Ruby

ARGF . each_line do |line|
     sections = (line . length - 1) / size
     sum = 0
     for i in 0 .. size - 1 do
         zeros = 0
         for j in 0 .. sections - 1 do
             index = j * size + i
             if   line [index, 1] == "0"
             then zeros += 1
             end
         end
         ones = sections - zeros
         sum += [zeros, ones] . min
     end
     puts sum
end

Find the complete program on GitHub.

Perl Weekly Challenge 97: Ceasar Cipher

January 30, 2021 AbigailLeave a comment

Challenge

You are given string $S containing alphabets A..Z only and a number $N.

Write a script to encrypt the given string $S using Caesar Cipher with left shift of size $N.

Overview

Wikipedia has a description of the Ceasar Cipher.

First thing we need to define is how to read the input. We’ve decided to read the plain text from standard input, and we use a command line option (-s SHIFT) to indicate the left shift. The encrypted text will be written to standard output.

Shifting will be done by looking at each characters, and if it is a capital letter, we look at its code point (ASCII or Unicode — which for letters A to Z is identical), and subtract the left shift. If the result is less than 65, we add 26. 65 being the code point of A, and 26 the number of letters. We than map the resulting code point back to a character.

Solutions

Perl

use Getopt::Long;

GetOptions 's=i'  =>  \my $shift;
die "-s SHIFT option required" unless defined $shift;
$shift %= $NR_OF_LETTERS;

We first parse to the command line. Getopt::Long does the work of parsing @ARGV for us. If no option is given, we die. Else, we mode the value (now in $shift) with 26, because shifting by 26 is a no-op.

We can now do the actual shifting:

my $NR_OF_LETTERS = 26;
my $ORD_A         = ord ('A');

while (<>) {
    chomp;
    s {([A-Z])}
      {   my $ch = ord ($1) - $shift;
          $ch += $NR_OF_LETTERS if $ch < $ORD_A;
          chr $ch
      }eg;
    say;
}

We iterate over the input, and for each capital letter found, we take its code point (ord), and subtract the given shift value. We add 26 if we end up below the code point of the letter A. Then we translate the number back to a (one letter) string (chr).

Find the complete program on GitHub.

AWK

AWK does not have a method to map a character to its code point. We therefor start by building an array (ord), so we can look up the code point. In AWK, we can index arrays with strings:

BEGIN {
    NR_OF_LETTERS = 26
    ORD_A         = 65

    for (i = ORD_A; i < ORD_A + NR_OF_LETTERS; i ++) {
        t = sprintf ("%c", i)
        ord [t] = i
    }
}

We then need to parse the command line option:

BEGIN {
    for (i = 1; i < ARGC; i ++) {
        if (ARGV [i] == "-s") {
            shift = ARGV [i + 1]
        }
    }
    ARGC = 0
}

ARGC is the number of elements in ARGV, which, unlike most arrays in AWK, is indexed starting from 0. We’re scanning the array starting from index 1, as ARGV [0] contains the name of the executable (typically awk).

Note the assignment ARGC = 0. If we don’t do this, awk itself will treat the options as subsequent files to be executed.

Now the shifting of the characters:

{   
    out = ""
    for (i = 1; i <= length (letters); i ++) {
        char = substr ($0, i, 1)
        if (ord [char]) {
            n = ord [char] - shift
            if (n < ORD_A) {
                n = n + NR_OF_LETTERS
            }
            char = sprintf ("%c", n)
        }
        out = out char
    }
    print out
}

AWK does not have a direct way of mapping a code point to a string (like chr in Perl and several other languages), but it does have sprintf, which has a %c specifier in the format, which does exactly that job. With AWK, sprintf works exactly the same as its C counterpart.

Find the complete program on GitHub.

Bash

First, getting the option:

while getopts "s:" name
do  if [ "$name" = "s" ]
    then shift=$OPTARG
    fi
done

Bash (and the Bourne Shell) has a getopts buildin, which parses command line options. Here, we’re using the format "s:" to indicate we’re looking for a -s parameter with a mandatory option. We can iterate over the results of getopts, which sets the option in a variable $name in each iteration. The value belonging to that option is in $OPTARG.

To shift the characters, we use a different algorithm that we’re using in the other languages. Bash doesn’t have easy methods to get the code point of a character, or to get the character given a code point. So, we’re using tr to shift the text one character to the left — and we do this as often as needed:

while read line
do    for ((i = 0; i < $shift; i ++))
      do line=`echo $line | tr A-Z ZA-Y`
      done
      echo $line
done

Find the complete program on GitHub.

C

C has a getopt function, which like the Bash buildin, can be iterated over. It takes the same format string as we used in Bash, "s:", indicating a -s parameter taking an option. It take two further arguments, a number argc and an array of strings argv; the first is the length of the array argc, and argv contains the name of the program as the first element, and the arguments to the program as subsequent elements. Typically, argc and argv are just copied from the parameters to main ():

int main (int argc, char ** argv) {
    int     ch;
    int     shift   = -1;
    int     NR_OF_LETTERS = 26;

    while ((ch = getopt (argc, argv, "s:")) != -1) {
        switch (ch) {
            case 's':
                shift = atoi (optarg) % NR_OF_LETTERS;
                break;
        }
    }
    if (shift < 0) {
        fprintf (stderr, "Requires an -s parameter\n");
        exit (1);
    }

C doesn’t automagically convert between integers and strings, so we’re calling atoi on the -s argument; atoi takes a string, and returns an integer.

In C, strings are just arrays of numbers, so shifting is easy. We can directly modify the numeric value in the array. We’re using getline to read lines from standard input. We’re then using a moving pointer, line_ptr to look (and modify) each character in the string line. If line_ptr points to a capital letter — which we check using the function isupper, we shift the character the required amount:

    while ((strlen = getline (&line, &len, stdin)) != -1) {
        char * line_ptr = line;
        while (* line_ptr) {
            if (isupper (* line_ptr)) {
                * line_ptr -= shift;
                if (* line_ptr < 'A') {
                    * line_ptr += NR_OF_LETTERS;
                }
            }
            line_ptr ++;
        }
        printf ("%s", line);
    }

Find the complete program on GitHub.

Lua

In our Lua solution, we’re parsing the command line arguments ourselves. The array with arguments is called arg, and prefixing an array with # gives its size:

local NR_OF_LETTERS =  26
local shift         = -1
if   #arg == 2 and arg [1] == "-s"
then shift = arg [2] % NR_OF_LETTERS
end

if shift < 0
then io . stderr : write ("Requires a '-s SHIFT' option\n")
     os . exit (1)
end

We define a function to shift a capital character; it takes two arguments, char, the character to be shifted, and shift, the distance to shift:

local ORD_A = string . byte ("A")
function do_shift (char, shift)
    local n = string . byte (char) - shift
    if   n < ORD_A
    then n = n + NR_OF_LETTERS
    end
    return string . char (n)
end

Note the byte and char functions from the string class. The first returns the code point of the character passed in; the latter does the reverse, returning the character of the code point passed in.

We then iterate over the input, and use the gsub method to shift capital letters. gsub does a global substitution, and can take a function as parameter. If so, for each match, the function is called, and its return value is used as the substitution part. This is more or less equivalent to Perls s///e.

for line in io . lines () do
    io . write (string . gsub (line, "[A-Z]", function (ch)
             return do_shift (ch, shift)
         end), "\n")
end

Find the complete program on GitHub.

Node.js

Node.js has an impressive module to parse command line arguments, yargs, which we will be using the parse the command line:

const argv = require ('yargs')
. option ('s', {
    type: 'number',
  })
. demandOption ('s')
. argv;

const shift = argv . s

A function to shift an individual character:

const NR_OF_LETTERS = 26
const ORD_A         = "A" . charCodeAt (0)
function shift_char (char, shift) {
    if (char . match (/[A-Z]/)) {
        let n = char . charCodeAt (0) - (shift % NR_OF_LETTERS)
        if (n < ORD_A) {
           n = n + NR_OF_LETTERS
        }
        return String . fromCharCode (n)
    }
    else {
        return char
    }
}

Note that here, unlike with our Lua solution, the function is called for any character in the input, so we need an additional check so we only shift capital letters. To get the code point of a character, we call charCodeAt; to get the character given a code point, we use fromCharCode.

To process the input, we split each line into individual characters, call shift_char on each of them, then join the return values back to a string:

require ('readline')
. createInterface ({input: process . stdin})   
. on ('line', _ => 
       console . log (_ . split ("")
                        . map   (_ => shift_char (_, shift))
                        . join  ("")))

Find the complete program on GitHub.

Python

Python has a getopt module which has a getopt function which is similar to the one used in C and Bash. (Python also has argparse, but we stick with something we’re more familiar with).

NR_OF_LETTERS = 26

shift = -1
opts, args = getopt . getopt (sys . argv [1:], 's:')
for opt, val in opts:
    if opt == "-s":
        shift = int (val) % NR_OF_LETTERS

sys . argv [1:] = []

if shift < 0:
    sys . stderr . write ("Argument -s SHIFT is required\n")
    sys . exit (1)

Note the assignment sys . argv [1:] = [], this clears the argv array. That way, we will be reading from standard input instead of Python trying to open a file called -s.

Strings are immutable in Python, so we will be construction a new string, with the capital letters shifted:

for line in fileinput . input ():
    out = ""
    for i in range (len (line)):
        if "A" <= line [i] <= "Z":
            n = ord (line [i]) - shift
            if n < ORD_A:
                n = n + NR_OF_LETTERS
            out += chr (n)
        else:
            out += line [i]
    sys . stdout . write (out)

Find the complete program on GitHub.

Ruby

Ruby has the optparse module to parse command line parameters. This supplies a method getopts which returns an hash with options and their values.

require 'optparse'
NR_OF_LETTERS = 26

params = ARGV . getopts ('s:')
shift  = params ["s"] ? params ["s"] . to_i % NR_OF_LETTERS : -1

if shift < 0
    STDERR . puts "Requires a -s SHIFT option"
    exit 1
end

We’re using to_i to make the parameter a string.

A function to shift a capital letter:

def shift_letter (letter, shift)
    n = letter . ord - shift
    if   n < 'A' . ord
    then n = n + NR_OF_LETTERS
    end
    return n . chr
end

We’re iterating over the input, and are using gsub to replace each capital letter with a shifted one:

ARGF . each_line do |line|
     line = line . gsub (/[A-Z]/) {|_| shift_letter _, shift}
     puts line
end

Find the complete program on GitHub.

Perl Weekly Challenge 96: Edit Distance

January 23, 2021January 30, 2021 AbigailLeave a comment

Challenge

You are given two strings $S1 and $S2.

Write a script to find out the minimum operations required to convert $S1 into $S2. The operations can be insert, remove or replace a character. Please check out [the] Wikipedia page for more information.

Overview

What we are asked to do, is to calculate the Levenshtein distance between two strings. If we have two strings, $source and $dest, we can recursively define how to go from $source to $dest using inserts, deletes, and replacements. Let head ($str) be the first character of a string $str, and tail ($str) be the string, except the first character. Then:

If $source is the empty string, we add each of the characters of $dest.
If $dest is the empty string, we delete each of the characters of $source.
If head ($source) equals head ($dest), then we recurse with tail ($source) and tail ($dest).
Otherwise, we either:
- add head ($dest), and recurse with $source and tail ($dest).
- delete head ($dest), and recurse with tail ($source) and $dest.
- replace head ($source) with head ($tail), and recurse with tail ($source) and tail ($dest).

Since we just want to number of operations, we can translate this. Let $\text{lev }(s, d)$ be the Levenshtein distance between strings $s$ and $d$ . Let $|t|$ be the length of the string $t$ , $\text{head }(t)$ the first character of the string $t$ and $\text{tail }(t)$ the string $t$ minus the first character. Then:

$\text{lev }(s, d) = \begin{cases} |a| & \text{if }|b| = 0\\ |b| & \text{if }|a| = 0\\ \text{lev }(\text{tail }(s), \text{tail }(d)) & \text{if }\text{head }(s) = \text{head }(d) \\ 1 + \text{min } \begin{cases}\text{lev }(\text{tail }(s), d) \\ \text{lev }(s, \text{tail }(d)) \\ \text{lev }(\text{tail }(s), \text{tail }(d)) \end{cases} & \text{otherwise} \end{cases}$

We can rearrange this. Define $\text{hneq }(s, d)$ as:

$\text{hneq }(s, d) = \begin{cases} 1, & \text{if }\text{head }(s) \neq \text{head }(d) \\ 0, & \text{if }\text{head }(s) = \text{head }(d) \end{cases}$

That is, $\text{hneq }(s, d)$ is 1 if the first characters differ, and 0 if they are equal. Then:

$\text{lev }(s, d) = \begin{cases} |a| + |b| & \text{if }|a| = 0 \text{ or }|b| = 0 \\ \text{min }\begin{cases} \text{lev }(s, \text{tail }(d)) + 1 \\ \text{lev }(\text{tail }(s), d) + 1 \\ \text{lev }(\text{tail }(s), \text{tail }(d)) + \text{hneq }(s, d) \end{cases} & \text{otherwise} \end{cases}$

We can use this to construct a recursive algorithm. However, this would require $\mathcal{O} (|s| \cdot |d|)$ recursive calls. Instead, we will be using an iterative algorithm, the Wagner Fischer algorithm. This is a more or less straight forward implementation of the above definition — except that it works the other way around. Instead of iterating through the strings, finding the shortest number of edits from the current positions to the end of the strings, it keeps track of the shortest path from the beginning to the current position. Which boils down to the same thing, it’s just finding the shortest number of edits of the reversed strings.

We will be calculating a 2-dimensional array distance. distance [i] [j] will be the minimum number of edits required to change the first i characters of the source string (s) to the first j characters of the destination string (d). We then have, in pseudo-code:

for j = 1 .. |d| do
    distance [0] [j] = j for j = 1 .. |d|
end
for i = 1 .. |s| do
    distance [i] [0] = i for i = 1 .. |s|
done
for i = 1 .. |s| do
    for j = 1 .. |d| do
        cost = s [i] == d [j] ? 0 : 1
        distance [i] [j] = min (distance [i - 1] [j]     + 1,
                                distance [i]     [j - 1] + 1,
                                distance [i - 1] [j - 1] + cost)
    done
done
return distance [|s|] [|d|]

Note that strings are indexed 1-based.

This uses $\Theta\ (|s| \cdot |d|)$ memory. We can improve on this. Note that we are using the current and previous rows of the array, that is, in the inner loop, we only index by i and i - 1. With a little rearrangement and deletion of rows we no longer use, we can reduce the memory to $\mathcal{O}\ (|s| + |d|)$ :

for i = 0 .. |s| do
    for j = 0 .. |d| do
        if i == 0 || j == 0
        then
            distance [i] [j] = i + j
        else
            cost = s [i] == d [j] ? 0 : 1
            distance [i] [j] = min (distance [i - 1] [j]     + 1,
                                    distance [i]     [j - 1] + 1,
                                    distance [i - 1] [j - 1] + cost)
        end
    end
    if i > 0
    then
        delete distance [i]
    end
end
return distance [|s|] [|d|]

By just using two one dimensional arrays, and swapping the strings if the destination string is larger than the source string, we could reduce the memory usage to $\mathcal{O}\ (\text{min }(|s|, |d|))$ , but we decided not to bother.

Implementation of the algorithm above in various languages is now easy.

Solutions

Perl

sub LevenshteinDistance ($first, $second) {
    my $distance;
    for (my $i = 0; $i <= length ($first); $i ++) {
        for (my $j = 0; $j <= length ($second); $j ++) {
            $$distance [$i] [$j] =
                $i == 0 || $j == 0 ? $i + $j
              : min ($$distance [$i - 1] [$j]     + 1,
                     $$distance [$i]     [$j - 1] + 1,
                     $$distance [$i - 1] [$j - 1] +
                        (substr ($first,  $i - 1, 1) eq
                         substr ($second, $j - 1, 1) ? 0 : 1))
        }
        undef $$distance [$i - 1] if $i;
    }
    $$distance [-1] [-1];
}