I have a new project I am hacking on. The goal is to process a huge file of URLs. Then I want to crawl those web sites and extract information into a database. I will provide more specifics in the future.
The first task at hand was to deal with this huge file. It was 10 gigabytes in size, and most text editors could not handle it. I wanted to see whether my Java classes could deal with it, so I coded up a version of the UNIX head program. It lists the first 10 lines of the file.
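Here is a minimal sketch of what such a head program might look like, assuming java.util.Scanner reading the file line by line. The class name Head, the method head(File, int), and the UTF-8 encoding are hypothetical choices for illustration. Scanner reads through a small internal buffer, so only the start of the file is actually pulled into memory, not all 10 gigabytes.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical "head" for very large files. Scanner reads lazily through a
// buffer, so only the beginning of the file is ever loaded.
class Head {

    // Returns at most maxLines lines from the start of the file.
    static List<String> head(File file, int maxLines) throws FileNotFoundException {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the Scanner (and the file) even on error
        try (Scanner scanner = new Scanner(file, "UTF-8")) {
            while (lines.size() < maxLines && scanner.hasNextLine()) {
                lines.add(scanner.nextLine());
            }
        }
        return lines;
    }

    public static void main(String[] args) throws FileNotFoundException {
        if (args.length < 1) {
            System.err.println("usage: java Head <file> [lines]");
            System.exit(1);
        }
        // Default of 10 lines mirrors UNIX head
        int n = args.length > 1 ? Integer.parseInt(args[1]) : 10;
        for (String line : head(new File(args[0]), n)) {
            System.out.println(line);
        }
    }
}
```

Running it as java Head urls.txt prints the first 10 lines. One caveat: Scanner does not throw read errors to the caller. It records them (see scanner.ioException()) and simply reports no more input, so a malformed byte in a huge file can look like an early end of file.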
Bamm. It worked. The Scanner class could read the first 10 lines with ease. There was a small pause the first time it ran, but I could read and process the first 10 lines. It stands to reason that I can process the next 10 lines, and so on.
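To go beyond the first 10 lines, one option is to stream the whole file in fixed-size batches. This sketch swaps Scanner for a BufferedReader from Files.newBufferedReader, which is generally faster for plain line-by-line reading because it does no regex matching. The BatchReader class, the forEachBatch method, and the batch size of 10 are hypothetical names and choices for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: streams a large file and hands it to a callback
// batchSize lines at a time, so memory use is bounded by one batch.
class BatchReader {

    // Calls handler once per batch and returns the total number of lines read.
    static long forEachBatch(Path file, int batchSize, Consumer<List<String>> handler)
            throws IOException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        long total = 0;
        List<String> batch = new ArrayList<>(batchSize);
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                batch.add(line);
                total++;
                if (batch.size() == batchSize) {
                    handler.accept(batch);
                    // Start a fresh list in case the handler keeps a reference to the old one
                    batch = new ArrayList<>(batchSize);
                }
            }
        }
        if (!batch.isEmpty()) {
            handler.accept(batch); // final, possibly shorter batch
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java BatchReader <file>");
            System.exit(1);
        }
        // Example: report the size of each 10-line batch
        long n = forEachBatch(Paths.get(args[0]), 10,
                batch -> System.out.println("batch of " + batch.size() + " lines"));
        System.out.println(n + " lines total");
    }
}
```

Because only one batch is held at a time, memory use stays flat no matter how large the file is, and each batch could later be handed to the crawler or the database code. Unlike Scanner, a reader from Files.newBufferedReader throws a MalformedInputException on bytes that are not valid UTF-8, so bad data surfaces as an error instead of a silent early stop.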
I will keep you posted on the progress of my little project. The next step is to figure out how to do database access with Java. I am thinking JDBC, but there may be better or more interesting technologies to try out.