Saturday, July 11, 2009

Counting the Lines in a LARGE File with PHP

Firstly, I am no PHP Guru, so if someone has a better method, I would love to have it explained so I can post it here (credit attributed of course).

Problem:
I wanted to count the lines in a 4GB text file, and all the examples online showed me how to load the file into an array and call count() on it. Well, that's great if your file is small, but trying to load a 4GB file into an array on a machine with 2GB of memory is not going to be pretty.
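
For reference, the array-based approach from those examples looks roughly like the sketch below. The helper name countLinesInMemory and the throwaway sample file are made up for illustration, and the type hints assume PHP 7 or later.

```php
<?php
// Hypothetical helper: counts lines by loading the whole file into memory.
// Fine for small files, unworkable for a multi-gigabyte one.
function countLinesInMemory(string $path): int
{
    // file() returns an array with one element per line, or false on failure
    $lines = file($path);
    if ($lines === false) {
        throw new RuntimeException("Cannot read $path");
    }
    return count($lines);
}

// Usage with a small temporary file (illustrative data only)
$tmp = tempnam(sys_get_temp_dir(), 'lines');
file_put_contents($tmp, "a\nb\nc\n");
echo "Line count: " . countLinesInMemory($tmp) . "\n";
unlink($tmp);
```

Every line of the file lives in the array at once, so memory use grows with the file size.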

I tried anyway and quickly ran into PHP's per-script limits, something I had not come across before. I updated my php.ini to allow 1.5GB of memory per script and up to 6 days before execution would time out. Obviously this was never going to work on a 2GB machine, but it was fun to discover.
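
If you want to experiment with those limits without editing php.ini, something like the following sketch may work at runtime. It assumes the settings involved are PHP's standard memory_limit and max_execution_time directives (the post does not name them); the values simply mirror the 1.5GB and 6 days mentioned above.

```php
<?php
// Assumed directives: memory_limit and max_execution_time.

// 1.5GB expressed in megabytes
ini_set('memory_limit', '1536M');

// 6 days expressed in seconds: 6 * 24 * 60 * 60 = 518400
ini_set('max_execution_time', (string) (6 * 24 * 60 * 60));

// Show the values now in effect for this script
echo "memory_limit: " . ini_get('memory_limit') . "\n";
echo "max_execution_time: " . ini_get('max_execution_time') . "\n";
```

Raising the limits only changes what PHP will allow; it does not give the machine more memory than it has.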

Solution:
This is also very obvious, but worth documenting because someone will want to know how to do it: I can read the file line by line, incrementing a variable until the end of the file has been reached.

I know, I know! There's a WAAAAAY better method to do this somewhere, but time did not permit me to discover it. Perhaps you have the answer, perhaps you are looking for an answer. If you're looking, you can try the code below, which took about 3 minutes to run on my machine.

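// Set your filename
$file = "ffdls.tsv";

// Start the counter at zero
$lineCount = 0;

// Open the file for reading, and stop if it cannot be opened
$handle = fopen($file, "r");
if ($handle === false) {
    die("Could not open $file");
}

// Loop through the file until you reach the last line
while (!feof($handle)) {

    // Read a line
    $line = fgets($handle);

    // Increment the counter, but only if a line was actually read;
    // fgets() returns false at the end of the file, which would
    // otherwise add one extra line to the count
    if ($line !== false) {
        $lineCount++;
    }

}

// Release the file for access
fclose($handle);

// Display number of lines
echo "Line count: $lineCount";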
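
One possible candidate for a better method is to read the file in fixed-size chunks and count the newline characters in each chunk, rather than asking PHP to find one line at a time. The sketch below is untested against the 4GB file and makes no claim about how much faster it is; the function name countLinesChunked and the 1MB chunk size are arbitrary choices for illustration, and the type hints assume PHP 7 or later.

```php
<?php
// Hypothetical helper: counts lines by reading fixed-size chunks
// and counting the newline characters in each one.
function countLinesChunked(string $path, int $chunkSize = 1048576): int
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $count = 0;
    $lastChar = '';

    // fread() returns an empty string once the end of the file is reached
    while (($chunk = fread($handle, $chunkSize)) !== false && $chunk !== '') {
        $count += substr_count($chunk, "\n");
        $lastChar = substr($chunk, -1);
    }
    fclose($handle);

    // A final line without a trailing newline still counts as a line
    if ($lastChar !== '' && $lastChar !== "\n") {
        $count++;
    }
    return $count;
}

// Usage with a small temporary file (illustrative data only)
$tmp = tempnam(sys_get_temp_dir(), 'lines');
file_put_contents($tmp, "one\ntwo\nthree");
echo "Line count: " . countLinesChunked($tmp) . "\n";
unlink($tmp);
```

Only one chunk is held in memory at a time, so memory use stays flat regardless of file size.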

2 comments:

  1. A curious thought: if you change "$line = fgets($handle);" to just "fgets($handle);" (i.e. don't store the line, since that shouldn't be needed), does that make it go faster?

  2. Technically you don't even need that line if you are ONLY after the number of lines.

    As that is purely for reading the line, which you don't technically need unless filtering out empty lines etc.

    Nice post though, thanks!
