A simple Aho-Corasick based virus scanner.

Here is a simple virus scanner. It is a detector only.

It is based on the Aho-Corasick string searching algorithm. A hybrid approach was used in its construction to save memory. The initial implementation, which used a vector of 256 entries, one for each possible input symbol (a byte in this case), consumed some 500 megabytes of memory with a full virus database. Whilst quick, that was thought far too extravagant even for a demonstration. The current implementation uses some 20 MB, which is a vast improvement. Further improvements could certainly bring the working space down to approximately 5 to 10 MB, but at the cost of some efficiency and, more importantly, code readability.
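The packaged source is not reproduced on this page, so the following is only a sketch of one way the memory trade-off can be made, not the scanner's actual layout. It assumes each state stores just the edges that exist, in a small vector sorted by byte, instead of a full 256-entry table. The names SparseAC, child, add, build and scan are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sparse Aho-Corasick automaton (illustrative names only).
struct SparseAC {
    using Edge = std::pair<std::uint8_t, int>;  // (input byte, target state)
    struct Node {
        std::vector<Edge> edges;  // only existing edges, kept sorted by byte
        int fail = 0;             // failure link; the root is state 0
        std::vector<int> out;     // ids of patterns recognised in this state
    };
    std::vector<Node> nodes;

    SparseAC() : nodes(1) {}      // start with the root state only

    static bool edge_less(const Edge& e, std::uint8_t b) { return e.first < b; }

    // Binary search in the sorted edge list; returns -1 if there is no edge.
    int child(int s, std::uint8_t c) const {
        const std::vector<Edge>& e = nodes[s].edges;
        auto it = std::lower_bound(e.begin(), e.end(), c, edge_less);
        return (it != e.end() && it->first == c) ? it->second : -1;
    }

    // Insert one pattern into the trie.
    void add(const std::string& pat, int id) {
        int s = 0;
        for (unsigned char c : pat) {
            int t = child(s, c);
            if (t < 0) {
                t = static_cast<int>(nodes.size());
                nodes.emplace_back();
                // Take the reference only after emplace_back may have reallocated.
                std::vector<Edge>& e = nodes[s].edges;
                e.insert(std::lower_bound(e.begin(), e.end(), c, edge_less), Edge(c, t));
            }
            s = t;
        }
        nodes[s].out.push_back(id);
    }

    // Compute failure links breadth-first; call once after all add() calls.
    void build() {
        std::queue<int> q;
        for (const Edge& e : nodes[0].edges) q.push(e.second);  // depth 1: fail = root
        while (!q.empty()) {
            int s = q.front();
            q.pop();
            for (const Edge& e : nodes[s].edges) {
                int f = nodes[s].fail;
                while (f != 0 && child(f, e.first) < 0) f = nodes[f].fail;
                int g = child(f, e.first);
                nodes[e.second].fail = (g >= 0) ? g : 0;
                // Inherit the matches of the failure state (proper suffixes).
                const std::vector<int>& fo = nodes[nodes[e.second].fail].out;
                std::vector<int>& to = nodes[e.second].out;
                to.insert(to.end(), fo.begin(), fo.end());
                q.push(e.second);
            }
        }
    }

    // Returns (offset of last matched byte, pattern id) for every match.
    std::vector<std::pair<std::size_t, int>> scan(const std::string& data) const {
        std::vector<std::pair<std::size_t, int>> hits;
        int s = 0;
        for (std::size_t i = 0; i < data.size(); ++i) {
            std::uint8_t c = static_cast<std::uint8_t>(data[i]);
            int t;
            while ((t = child(s, c)) < 0 && s != 0) s = nodes[s].fail;
            s = (t >= 0) ? t : 0;
            for (int id : nodes[s].out) hits.emplace_back(i, id);
        }
        return hits;
    }
};
```

With this layout, memory grows with the number of edges actually present rather than with 256 times the number of states. The price is a binary search per transition instead of a direct array lookup, which is the kind of speed-for-space trade described above.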

Some of the virus search strings present in the Clam database are extremely long. For these, the need for the Aho-Corasick algorithm lessens somewhat, since the number of overlapping prefixes diminishes rapidly as the search string length increases. In this case, once a certain number of input symbols have been consumed by an Aho-Corasick machine, an alternative and more memory-efficient method can be used to check the remainder of a potential search match. A binary search may be apt for this purpose.
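As a rough illustration of that idea, and not something taken from the packaged source, the sketch below assumes the automaton only recognises the first prefix_len bytes of each signature. The remainder is then checked against a sorted table of full signatures using a binary search. The function name verify_tail and its parameters are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper. Assumes `sigs` is sorted and that every signature is
// at least `prefix_len` bytes long. `pos` is the offset in `data` where a
// signature prefix was recognised. Returns the index of the first signature
// that matches `data` at `pos` in full, or -1 if none does.
int verify_tail(const std::vector<std::string>& sigs, std::size_t prefix_len,
                const std::string& data, std::size_t pos) {
    if (pos > data.size() || data.size() - pos < prefix_len) return -1;
    const std::string key = data.substr(pos, prefix_len);

    // Binary search for the first signature whose prefix is not less than key.
    auto lo = std::lower_bound(
        sigs.begin(), sigs.end(), key,
        [prefix_len](const std::string& s, const std::string& k) {
            return s.compare(0, prefix_len, k) < 0;
        });

    // Walk the (usually short) run of signatures sharing this prefix and
    // compare each one in full against the input.
    for (auto it = lo; it != sigs.end() && it->compare(0, prefix_len, key) == 0; ++it) {
        if (data.compare(pos, it->size(), *it) == 0) {
            return static_cast<int>(it - sigs.begin());
        }
    }
    return -1;
}
```

Because long signatures rarely share long prefixes, the run of candidates after the binary search should normally be very short, and the automaton itself only needs states for the first prefix_len symbols of each signature.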

Full package with C++ source available here.

You will also need a signature database. The Clam AV database, viruses.db, can be downloaded from http://clamav.sourceforge.net/database/viruses.db (an older version of viruses.db is also included in the source code .zip file).

Testing

The scanner detects real viruses; however, if you don't happen to have any of those around, the standard EICAR test string can be used. It is, in fact, not a virus: it does not replicate or cause any other harm.

"To make use of the EICAR test string, type or copy/paste the 
following text into a file called EICAR.COM, or TEST.COM or whatever.

X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*

Running the file displays the text "EICAR-STANDARD-ANTIVIRUS-TEST-FILE!".
" - alt.comp.virus FAQ, Version 1.03: Part 4 of 4, 12th May 1997


Scanning the file with the scanner should trigger an alert.
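If typing the string by hand is inconvenient, a few lines of C++ can write the test file instead. This is only a convenience sketch: the function names eicar_test_string and write_eicar_file are hypothetical and are not part of the scanner package.

```cpp
#include <fstream>
#include <string>

// The 68-byte EICAR test string quoted above, written as a raw string
// literal so that the backslash needs no escaping.
std::string eicar_test_string() {
    return R"eicar(X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*)eicar";
}

// Writes the test string to `path` in binary mode, with no trailing newline.
// Returns true if the file was written successfully.
bool write_eicar_file(const std::string& path) {
    std::ofstream out(path, std::ios::binary);
    out << eicar_test_string();
    return static_cast<bool>(out.flush());
}
```

Calling write_eicar_file("EICAR.COM") should produce a file that the scanner can then be pointed at.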

Sample output

Sample output can be found here.

 

UML Class diagram