Encoding misunderstanding

Yesterday I was a little confused. I had a very simple task: import data from CSV files into a database. A guy sent me two data files that had been created in MS Excel and then saved as CSV. I stored both files on my Linux machine and imported the first one without any problems using a simple Perl script.
I was very surprised when I started working with the second file. My script gave me many errors. I tried to view the file with less and was even more surprised, because less treated the file as binary! I copied the file to a public share, logged in to a Windows terminal and opened it with Notepad. It was a simple text file! I tried to view it on Linux again and the result was the same: it was binary! My first thought was “Brain overload”!
FAR helped me understand what the problem was. When I opened the file in the FAR editor, I saw ‘Unicode’. The file had been saved in the Unicode CSV format. After re-saving it as plain CSV, I could finish my task!
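
For next time, here is a minimal Python sketch of how such a file could be detected and converted without going back to Windows. It assumes that the ‘Unicode’ file was UTF-16 with a byte order mark (BOM) at the start, which I did not actually verify; in UTF-16 every plain ASCII character carries an extra zero byte, and that would explain why less saw the file as binary. The function names detect_bom and to_utf8 are made up for this example.

```python
import codecs

# BOM signatures and the Python codec that decodes each of them.
# UTF-32 comes before UTF-16 because the UTF-32-LE BOM starts with
# the same two bytes as the UTF-16-LE BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def detect_bom(raw):
    """Return a codec name if raw starts with a known BOM, else None."""
    for bom, name in BOMS:
        if raw.startswith(bom):
            return name
    return None


def to_utf8(raw, fallback="utf-8"):
    """Decode raw bytes using the BOM (or the fallback) and re-encode as UTF-8."""
    encoding = detect_bom(raw) or fallback
    # These codecs consume the BOM, so it does not end up in the output.
    return raw.decode(encoding).encode("utf-8")
```

Reading the suspicious file in binary mode and passing its bytes through to_utf8 should give data that the original import script can handle, as long as the BOM assumption holds.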

Playing around with ‘wc’

Recently I’ve played around with wc, which is not actually a water closet :). It is a Linux utility that counts the bytes, words and newlines in a file. It also prints a total when more than one file is specified. So I decided to count the lines of the Billing System (I have worked on that project for three years!). The Billing System is a pure Perl application consisting of Perl modules and scripts, plus Embperl scripts for the web interface. Let’s see the results:

  • Perl modules: 102707 lines;
  • Perl scripts: 16320 lines;
  • Embperl scripts: 83494 lines;
  • Grand Total: 202521 lines!

Uh-h-h. At least one third of them are mine (he-he-he)!
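
The same kind of count can be sketched in Python. This is only an illustration, not what I actually ran: it counts newline characters the way wc -l does and groups files by extension. The extensions in the usage note (.pm, .pl, .epl) are my assumption about how the three groups could be told apart, and count_lines and count_by_extension are names invented for this example.

```python
import os


def count_lines(path):
    """Count newline characters in a file, like `wc -l`."""
    total = 0
    with open(path, "rb") as f:
        # Read in chunks so that large files do not have to fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            total += chunk.count(b"\n")
    return total


def count_by_extension(root, groups):
    """Walk root and sum line counts per label; groups maps extension to label."""
    totals = {label: 0 for label in groups.values()}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1]
            if ext in groups:
                totals[groups[ext]] += count_lines(os.path.join(dirpath, name))
    return totals
```

Calling count_by_extension(".", {".pm": "Perl modules", ".pl": "Perl scripts", ".epl": "Embperl scripts"}) returns a dictionary of per-group counts, and summing its values gives the grand total.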