Cahit Arf 1.0 User's Guide

Installation

Since you are reading this manual on your computer, Cahit Arf is already installed. Now you need to obtain the JDBC driver for your database and create your options file(s), either with your favorite text editor or with Cahit Arf's interactive wizard.

To make your JDBC drivers available to Cahit Arf, copy the drivers' Java archive (.jar) files into the '<CahitArf base dir>/lib' directory. No classpath modification is necessary: at run time, Cahit Arf scans that directory for drivers. If you do not have the relevant JDBC driver yet, see the Obtaining JDBC Drivers section.
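The run-time scan of the lib directory can be pictured with the following Java sketch. It is not Cahit Arf's source: the class LibScanner and its methods are hypothetical. It collects every .jar file in a directory and builds a java.net.URLClassLoader over them, which is the standard way for a Java program to load classes from archives that are not on its classpath.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper (not part of Cahit Arf): finds driver archives in a
// directory such as <CahitArf base dir>/lib without touching the CLASSPATH.
final class LibScanner {
    private LibScanner() {}

    /** Returns the URLs of all .jar files directly inside dir, sorted by file name. */
    static List<URL> jarUrls(File dir) throws MalformedURLException {
        List<URL> urls = new ArrayList<>();
        File[] jars = dir.listFiles((d, name) -> name.toLowerCase(Locale.ROOT).endsWith(".jar"));
        if (jars == null) {
            return urls; // directory missing or unreadable: no extra drivers
        }
        Arrays.sort(jars);
        for (File jar : jars) {
            urls.add(jar.toURI().toURL());
        }
        return urls;
    }

    /** Builds a class loader that sees the application's classes plus every jar in dir. */
    static ClassLoader driverLoader(File dir) throws MalformedURLException {
        List<URL> urls = jarUrls(dir);
        return new URLClassLoader(urls.toArray(new URL[0]), LibScanner.class.getClassLoader());
    }
}
```

One design consideration: java.sql.DriverManager only hands out drivers whose classes are visible to the caller's class loader, so a program that loads drivers this way usually instantiates the driver class through this loader and calls its connect method directly.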

Options File Format

You can skip this section if you wish to use the Wizard.

Cahit Arf needs to know how to connect to your database, which query to run, and how to map the columns of the query result to ARFF attributes.

You deliver all this information to Cahit Arf in an options file consisting of lines of the form

option name=option value

Each option-value pair must be placed on its own line. Lines starting with the '#' character are comments and have no effect on Cahit Arf. To improve readability, you can also leave some lines blank. An options file may be created either manually with a text editor or with the Cahit Arf Wizard. The Cahit Arf options are as follows:

relation
The WEKA relation name (e.g. Weather)
jdbc.driver
The fully qualified Java class name of your JDBC driver's main driver class (e.g. org.gjt.mm.mysql.Driver)
jdbc.url
A URL indicating your database location. Refer to your JDBC driver documentation for URL composition rules (e.g. jdbc:mysql://localhost/mydb)
jdbc.user, jdbc.password
User name and password for logging in to the database server. If one of them is not needed, omit it or leave its value blank.
jdbc.select
The SQL SELECT statement (e.g. SELECT * FROM WEATHER_OBSERVATIONS). The number of columns returned by the query must equal the number of attributes. Attribute names, however, may differ from the column names of the result set.
attr.0, attr.1 ... attr.n
Attribute definitions corresponding to the columns of the result set of the given SELECT statement. Be careful: attribute indexes start from 0, not 1. attr.0 corresponds to the first column of the query results, attr.1 to the second column, and so forth. If you omit the attribute definitions, Cahit Arf generates them from the query result.
Each attribute definition consists of two or three parts separated by the ':' (colon) character. The format is: attribute-name:class|numeric|string[:q|qs]. attribute-name does not have to match the corresponding column name. The type must be one of class, numeric, or string. The class type differs from string in that all of its possible values are declared in the header of the ARFF file. The last, optional part specifies whether attribute values are quoted: q means double quotes, and qs single quotes. If you omit the quote part, values are not quoted. Do not list the possible values of class-typed attributes; the Cahit Arf extractor generates them from the result set automatically.
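The attribute definition syntax can be illustrated with a short Java sketch that parses one attr.N value and renders the matching ARFF @attribute header line (a class attribute becomes an ARFF nominal attribute listing its values in braces). The class AttrDef and its methods are hypothetical, and two behaviors are assumptions of this sketch rather than documented Cahit Arf behavior: class values are declared in first-seen order, and SQL NULL is written as '?', ARFF's missing-value marker.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of one attr.N value: attribute-name:class|numeric|string[:q|qs]
// Illustrative sketch only, not Cahit Arf's own parser.
final class AttrDef {
    final String name;
    final String type;  // "class", "numeric" or "string"
    final String quote; // "" (no quoting), "\"" (q) or "'" (qs)

    private AttrDef(String name, String type, String quote) {
        this.name = name;
        this.type = type;
        this.quote = quote;
    }

    static AttrDef parse(String spec) {
        String[] parts = spec.trim().split(":");
        if (parts.length < 2 || parts.length > 3) {
            throw new IllegalArgumentException("expected name:type[:q|qs], got " + spec);
        }
        String type = parts[1];
        if (!type.equals("class") && !type.equals("numeric") && !type.equals("string")) {
            throw new IllegalArgumentException("unknown attribute type: " + type);
        }
        String quote = "";
        if (parts.length == 3) {
            if (parts[2].equals("q")) {
                quote = "\"";
            } else if (parts[2].equals("qs")) {
                quote = "'";
            } else {
                throw new IllegalArgumentException("unknown quote option: " + parts[2]);
            }
        }
        return new AttrDef(parts[0], type, quote);
    }

    /** Formats one value for the @data section; null becomes ARFF's missing value "?". */
    String format(String value) {
        return value == null ? "?" : quote + value + quote;
    }

    /** Renders this attribute's ARFF header line from the values of its column. */
    String headerLine(List<String> columnValues) {
        if (!type.equals("class")) {
            return "@attribute " + name + " " + type; // numeric or string
        }
        // class: declare every distinct non-null value, in first-seen order
        Set<String> distinct = new LinkedHashSet<>();
        for (String v : columnValues) {
            if (v != null) {
                distinct.add(format(v));
            }
        }
        return "@attribute " + name + " {" + String.join(",", distinct) + "}";
    }
}
```

For example, parsing "outlook:class:qs" and passing the column values sunny, overcast, sunny, rainy yields the header line @attribute outlook {'sunny','overcast','rainy'}.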

The following sample options file corresponds to the 'weather.arff' sample data file that comes with the WEKA distribution:

#Created by Cahit Arf Wizard
#Mon Sep 22 12:35:04 GMT+02:00 2003
relation=Weather

jdbc.driver=org.gjt.mm.mysql.Driver
jdbc.url=jdbc:mysql://localhost/weatherdb
jdbc.user=ayhan
jdbc.password=mypasswd
jdbc.select=SELECT * FROM WEATHER_OBSERVATIONS

attr.0=outlook:class:qs
attr.1=temperature:numeric
attr.2=humidity:numeric
attr.3=windy:class
attr.4=play:class
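The two comment lines at the top of this sample (a '#Created by' line followed by a timestamp) look like the header that java.util.Properties.store writes, which suggests, although this guide does not say so, that the options file is a standard Java properties file. Under that assumption, the Java sketch below loads an options file and collects attr.0, attr.1, ... in index order. The class OptionsFile is hypothetical, and stopping at the first missing index is a choice of this sketch.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Sketch ASSUMING the options file is a java.util.Properties file; not Cahit Arf's code.
final class OptionsFile {
    private OptionsFile() {}

    static Properties load(Reader in) throws IOException {
        Properties p = new Properties();
        p.load(in); // skips '#' comments and blank lines, reads name=value pairs
        return p;
    }

    /** Returns attr.0, attr.1, ... in index order, stopping at the first missing index. */
    static List<String> attributeSpecs(Properties p) {
        List<String> specs = new ArrayList<>();
        for (int i = 0; p.getProperty("attr." + i) != null; i++) {
            specs.add(p.getProperty("attr." + i));
        }
        return specs; // empty list: no attribute definitions were given
    }
}
```

If the assumption holds, the usual properties escaping rules apply, so a literal backslash in a value would have to be written as \\.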

Command Line Mode

If the java executable is in your PATH and your current directory is the Cahit Arf root directory, a typical command line is:

java -cp CahitArf.jar com.prcomps.cahitarf.Db2Arff <options file path>  <output file path>

If the output file path refers to an existing file, that file will be overwritten.

Wizard Mode

You can start the Cahit Arf Wizard either by double-clicking the CahitArf.jar file or by typing a command such as:

java -jar CahitArf.jar

Step 1 - Creating a new or selecting an existing options set

Before you begin, decide whether to open an existing options file or create a new one. Besides building options files, you may prefer to use the wizard rather than command line mode to run the query and create an ARFF output file.

Step 2 - Providing database connection data and testing connection

Refer to your JDBC driver's documentation for the driver class name and for how to construct the JDBC URL. The wizard supplies the class names and URL templates of some widely used drivers. If you are creating a new options file or modifying the connection properties of an existing one, we strongly recommend that you test your connection before proceeding to the next step.
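What a connection test involves can be sketched in Java with the standard JDBC API: load the driver class, open a connection through java.sql.DriverManager and ask the connection whether it is valid. The class ConnectionCheck and the 5-second timeout are assumptions for illustration; the wizard's own test may work differently. The sketch also assumes the driver .jar is already on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical equivalent of the wizard's connection test (not Cahit Arf's code).
final class ConnectionCheck {
    private ConnectionCheck() {}

    /** Returns true if the driver class loads and a valid connection can be opened. */
    static boolean canConnect(String driverClass, String url, String user, String password) {
        try {
            // Older drivers register themselves with DriverManager when their class is loaded.
            Class.forName(driverClass);
            try (Connection c = DriverManager.getConnection(url, user, password)) {
                return c.isValid(5); // wait at most 5 seconds for the server to answer
            }
        } catch (ClassNotFoundException e) {
            System.err.println("Driver class not found (is its .jar in lib?): " + driverClass);
            return false;
        } catch (SQLException e) {
            System.err.println("Connection failed: " + e.getMessage());
            return false;
        }
    }
}
```

With the sample options above, the call would be canConnect("org.gjt.mm.mysql.Driver", "jdbc:mysql://localhost/weatherdb", "ayhan", "mypasswd").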

Step 3 - Building SQL query string

After typing the SQL SELECT statement, press the 'Retrieve Sample Rows' button, which attempts to retrieve up to 50 rows using the query you provided.
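Fetching a bounded sample can be done with java.sql.Statement.setMaxRows, which makes the driver silently drop rows beyond the limit. The following Java sketch is a hypothetical illustration (SampleRows and fetch are not Cahit Arf names) of retrieving up to a given number of rows as strings. Note that JDBC column indexes start at 1, whereas attr.N indexes start at 0.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of "Retrieve Sample Rows": run the query, keep at most `limit` rows.
final class SampleRows {
    private SampleRows() {}

    static List<String[]> fetch(String url, String user, String password, String sql, int limit)
            throws SQLException {
        List<String[]> rows = new ArrayList<>();
        try (Connection c = DriverManager.getConnection(url, user, password);
             Statement st = c.createStatement()) {
            st.setMaxRows(limit); // rows beyond the limit are silently dropped
            try (ResultSet rs = st.executeQuery(sql)) {
                int columns = rs.getMetaData().getColumnCount();
                while (rs.next()) {
                    String[] row = new String[columns];
                    for (int i = 0; i < columns; i++) {
                        row[i] = rs.getString(i + 1); // JDBC columns are 1-based
                    }
                    rows.add(row);
                }
            }
        }
        return rows;
    }
}
```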

Step 4 - Defining attributes

You can change an attribute name by double-clicking it. Attribute type and quotation options are selected from combo boxes. At this point you have given Cahit Arf all the necessary information, so it is a good idea to save it as an options file now.

Step 5 - Generating an ARFF output

If you have tested your connection and query, you are ready to generate an ARFF output. You can do this in three ways: 1) If you expect a large amount of output, it is a good idea to generate a sample output to the screen first. 2) You can generate the complete output to the screen and copy and paste it into a file with a text editor. 3) You can write the output directly to a file you specify.
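The three output choices differ only in where the ARFF text goes and how many data rows are written. The Java sketch below illustrates this under stated assumptions: the class ArffOutput is hypothetical, the attribute header lines and data lines are passed in already formatted, and files are written as UTF-8. It produces the standard ARFF layout of an @relation line, the @attribute lines, and then the @data section.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical sketch of the three output choices; not Cahit Arf's own code.
final class ArffOutput {
    private ArffOutput() {}

    /** Writes the ARFF text with at most maxRows data lines (maxRows < 0 means all rows). */
    static void write(Writer out, String relation, List<String> attributeLines,
                      List<String> dataLines, int maxRows) throws IOException {
        out.write("@relation " + relation + "\n\n");
        for (String line : attributeLines) {
            out.write(line + "\n");
        }
        out.write("\n@data\n");
        int n = maxRows < 0 ? dataLines.size() : Math.min(maxRows, dataLines.size());
        for (int i = 0; i < n; i++) {
            out.write(dataLines.get(i) + "\n");
        }
        out.flush(); // flush but do not close, so System.out stays usable
    }

    /** Choice 3: write everything to a file; an existing file is truncated and overwritten. */
    static void toFile(Path file, String relation, List<String> attributeLines,
                       List<String> dataLines) throws IOException {
        try (Writer w = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            write(w, relation, attributeLines, dataLines, -1);
        }
    }
}
```

For choices 1 and 2, the writer would be new OutputStreamWriter(System.out, StandardCharsets.UTF_8), with a small maxRows for a sample or -1 for the complete output.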

Obtaining JDBC Drivers

JDBC drivers are, in general, operating-system-independent pure Java classes packaged as Java archive (.jar) files. As mentioned in the Installation section, to make a JDBC driver available to Cahit Arf, obtain the driver from your database vendor (or a third-party JDBC driver developer) and put its .jar file into the <CahitArf directory>/lib directory. Here are the web pages of some widely used JDBC drivers:

MySQL

Oracle

MS SQL Server

PostgreSQL

Mckoi SQL

SAP DB

Informix