Scanner Troubles

I got hold of a text file containing the ZF05 zine. My goal was to write a program that would extract the good parts of the zine so I could read them quickly. Of course, I was writing my program in Java.

My process was test-driven development (TDD). So before I wrote the complex algorithms for smart filtering, I decided to write some simple I/O operations first. For this I used the Scanner class in Java. That should be easy: you pass the Scanner constructor a File object, then you keep calling nextLine() for as long as hasNextLine() returns true.
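The read loop described above can be sketched roughly like this. The class name LineReader and the readLines helper are made up for illustration; only Scanner, hasNextLine(), and nextLine() come from the JDK.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class LineReader {
    // Hypothetical helper: collects every line the Scanner is willing to hand back.
    static List<String> readLines(File source) throws FileNotFoundException {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the Scanner (and the underlying file) for us.
        try (Scanner in = new Scanner(source)) {
            while (in.hasNextLine()) {
                lines.add(in.nextLine());
            }
        }
        return lines;
    }

    public static void main(String[] args) throws FileNotFoundException {
        // Usage: java LineReader <file>
        for (String line : readLines(new File(args[0]))) {
            System.out.println(line);
        }
    }
}
```

Note that this version uses the platform default charset, which is what the single-argument Scanner(File) constructor does.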

My first TDD exercise was to write a copy program. It should just take a file and duplicate its contents to a second file. That sounds easy. My first attempt compiled quickly. But when it ran, it only copied part of the file. I thought perhaps it was some type of buffer overflow. However, no exceptions were thrown. The program ended gracefully. I put my debugger hat on and determined how far the copy was getting in the source file. That is when I discovered some strange characters in the source file. They seemed to be causing hasNextLine() to return false prematurely, even though there were more lines left to read. It is going to take some digging into the Scanner class to figure out why I was getting this false negative. For now I will just give you some lines around the place where hasNextLine() choked. See if you can see why it is failing:

-rw-r--r-- 1 thalakan thalakan 386 Jan 5 2006 zeller.cd
rwxr-xr-x 2 thalakan thalakan 512 Jun 22 2006 かたかã�ls
$ cat watcher.pl
#!/usr/bin/perl -w
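
One place the digging might start, as a sketch only: the Scanner documentation says that when the underlying source throws an IOException, the scanner treats it as end of input and keeps the exception for ioException() to return. A decoding error on bytes that are not valid in the chosen charset is one kind of IOException, so a copy program can ask the Scanner afterwards whether it stopped because of a hidden error. Whether that is what happened with this file is an assumption on my part. The class name ScannerCopyCheck, the copyAndReport helper, and the choice of UTF-8 are all hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Scanner;

public class ScannerCopyCheck {
    // Hypothetical helper: copies src to dst line by line, then returns the
    // IOException the Scanner swallowed while reading, or null if there was none.
    static IOException copyAndReport(File src, File dst, String charsetName) throws IOException {
        try (Scanner in = new Scanner(src, charsetName);
             PrintWriter out = new PrintWriter(dst, charsetName)) {
            while (in.hasNextLine()) {
                out.println(in.nextLine());
            }
            // hasNextLine() returning false does not tell us why it stopped;
            // ioException() reports the last read error the Scanner suppressed.
            return in.ioException();
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ScannerCopyCheck <source> <destination>
        // "UTF-8" is an assumed encoding, not something known about the zine file.
        IOException hidden = copyAndReport(new File(args[0]), new File(args[1]), "UTF-8");
        System.out.println(hidden == null
                ? "copy finished cleanly"
                : "scanner stopped early: " + hidden);
    }
}
```

If the check does report a decoding error, one thing to try is reading with a single-byte charset such as ISO-8859-1, which maps every byte to a character, so no input is rejected as malformed. That keeps the copy going, though it does not recover what the strange characters were supposed to be.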