Cahit Arf 1.0 User's Guide
Since you are reading this manual on your computer, Cahit Arf is already installed. Next, you need to obtain the JDBC driver for your database and create your options file(s), using either your favorite text editor or Cahit Arf's interactive wizard.
To make your JDBC drivers available to Cahit Arf, just copy the
Java archive (.jar) files of the drivers into
'<CahitArf base dir>/lib'
No additional classpath modification is necessary. At run time, Cahit Arf
scans that directory for drivers. If you do not have the relevant JDBC driver yet,
the Obtaining JDBC Drivers section might be useful.
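The guide does not describe how Cahit Arf performs this scan. As a rough illustration only, a Java program could collect every .jar file in a directory and expose it through a class loader as sketched below. The class name DriverJarScanner and its methods are hypothetical and are not part of Cahit Arf.

```java
import java.io.File;
import java.io.UncheckedIOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not Cahit Arf code: finds driver archives in a directory.
class DriverJarScanner {

    /** Returns the .jar files directly inside libDir (empty if the directory is missing). */
    static List<File> findJars(File libDir) {
        List<File> jars = new ArrayList<>();
        File[] entries = libDir.listFiles(); // null when libDir is not a readable directory
        if (entries == null) {
            return jars;
        }
        for (File entry : entries) {
            String name = entry.getName().toLowerCase(Locale.ROOT);
            if (entry.isFile() && name.endsWith(".jar")) {
                jars.add(entry);
            }
        }
        return jars;
    }

    /** Builds a class loader that can see every jar found in libDir. */
    static URLClassLoader loaderFor(File libDir) {
        List<URL> urls = new ArrayList<>();
        for (File jar : findJars(libDir)) {
            try {
                urls.add(jar.toURI().toURL());
            } catch (MalformedURLException e) {
                throw new UncheckedIOException(e);
            }
        }
        return new URLClassLoader(urls.toArray(new URL[0]),
                DriverJarScanner.class.getClassLoader());
    }

    public static void main(String[] args) {
        // Assumption: the directory is passed as the first argument, default "lib".
        File libDir = new File(args.length > 0 ? args[0] : "lib");
        for (File jar : findJars(libDir)) {
            System.out.println("driver archive: " + jar.getName());
        }
    }
}
```

A driver class could then be looked up through the returned loader with Class.forName(name, true, loader). Whether Cahit Arf works this way is not stated in this guide.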
You can skip this section if you wish to use the Wizard.
Cahit Arf needs to know
- How to connect to your database : The full Java class name of the JDBC driver, the database server location (URL), your login id and password.
- How to collect data : The SQL 'select' statement for querying data table(s).
- How to convert data : The 'Relation' name; attribute names and types according to columns of query results.
We deliver all this information to Cahit Arf through an options file, which contains several options formatted as
option name=option value
Each option-value pair must be placed on a single line. Lines starting with the '#' character are comment lines and have no effect on Cahit Arf. To improve readability, you can leave some lines blank as well. An options file may be created either manually with a text editor or with the Cahit Arf Wizard. The Cahit Arf options are as follows :
- The value of WEKA relation name (e.g.
- The full Java class name of the main driver class of your JDBC driver (e.g
- A URL that indicates your database location. Refer to your JDBC driver documentation
for URL composition rules (e.g.
- jdbc.user, jdbc.password
- User name and password for logging in to the database server. If either one is not needed, it may be omitted or its value left blank.
- The SQL Select statement (e.g.
SELECT * FROM WEATHER_OBSERVATIONS). The number of columns returned by the query must equal the number of attributes. Attribute names, on the other hand, may differ from the column names of the result set.
- attr.0, attr.1 ... attr.n
- Attribute definitions corresponding to the columns of the result set of the given select statement. Be careful: attribute indexes start from 0, not 1. attr.0 corresponds to the first column of the query results, attr.1 to the second column, and so forth. If you omit the attribute definitions, Cahit Arf generates them from the query result.
- Each attribute definition contains two or three parts separated by the ':' (colon) character.
The parts are, in order, the attribute name, the type and an optional quote field (for example, outlook:class:qs).
The attribute name does not have to be the same as the corresponding column name. The type must be one of the supported type names; this guide uses numeric, class and string. The class type differs from string in that all of its possible values are declared in the header of the ARFF file. The last, optional part specifies whether attribute values are quoted: q means double quotes and qs means single quotes. If you omit the quote field, the values are not quoted. You should not list the possible values of class-typed attributes; the Cahit Arf extractor generates them from the result set automatically.
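To make the attribute definition syntax concrete, here is a minimal Java sketch that splits a definition such as outlook:class:qs into its parts. The class AttrDef is hypothetical and is not Cahit Arf's own parser. How the real tool treats surrounding whitespace, an empty third part or an unknown quote code is not documented here, so those choices are assumptions.

```java
// Hypothetical illustration of the two-or-three-part attribute definition.
class AttrDef {
    final String name;
    final String type;
    final String quote; // "" (no quoting), "\"" for q, "'" for qs

    AttrDef(String name, String type, String quote) {
        this.name = name;
        this.type = type;
        this.quote = quote;
    }

    static AttrDef parse(String definition) {
        String[] parts = definition.split(":", -1);
        if (parts.length < 2 || parts.length > 3) {
            throw new IllegalArgumentException(
                    "expected two or three ':'-separated parts: " + definition);
        }
        String quote = "";
        if (parts.length == 3) {
            String code = parts[2].trim();
            if (code.equals("q")) {
                quote = "\"";
            } else if (code.equals("qs")) {
                quote = "'";
            } else if (!code.isEmpty()) {
                // Assumption: anything other than q or qs is treated as an error.
                throw new IllegalArgumentException("unknown quote option: " + code);
            }
        }
        return new AttrDef(parts[0].trim(), parts[1].trim(), quote);
    }

    /** Wraps a value in this attribute's quote character, if any. */
    String quoted(String value) {
        return quote + value + quote;
    }

    public static void main(String[] args) {
        AttrDef outlook = parse("outlook:class:qs");
        // prints: outlook / class / 'sunny'
        System.out.println(outlook.name + " / " + outlook.type + " / " + outlook.quoted("sunny"));
    }
}
```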
A sample options file for the 'weather.arff' sample data file, which comes with the WEKA distribution, would look as follows :
#Created by Cahit Arf Wizard
#Mon Sep 22 12:35:04 GMT+02:00 2003
relation=Weather
jdbc.driver=org.gjt.mm.mysql.Driver
jdbc.url=jdbc:mysql://localhost/weatherdb
jdbc.user=ayhan
jdbc.password=mypasswd
jdbc.select=SELECT * FROM WEATHER_OBSERVATIONS
attr.0=outlook:class:qs
attr.1=temperature:numeric
attr.2=humidity:numeric
attr.3=windy:class
attr.4=play:class
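The name=value layout with '#' comments and blank lines can be read with the standard java.util.Properties class. The sketch below is only an illustration of the file format: the guide does not say which parser Cahit Arf uses, and the class name OptionsFileDemo is hypothetical. One caveat of this approach is that Properties treats the backslash as an escape character.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical illustration: reading name=value options with java.util.Properties.
class OptionsFileDemo {

    /** Parses options text; comment lines ('#') and blank lines are skipped. */
    static Properties parse(String text) {
        Properties options = new Properties();
        try {
            options.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return options;
    }

    public static void main(String[] args) {
        String text = String.join("\n",
                "#Created by Cahit Arf Wizard",
                "relation=Weather",
                "",
                "jdbc.select=SELECT * FROM WEATHER_OBSERVATIONS",
                "attr.0=outlook:class:qs",
                "attr.1=temperature:numeric");
        Properties options = parse(text);
        System.out.println("relation = " + options.getProperty("relation"));
        // Attribute indexes start at 0 and run until the first missing index.
        for (int i = 0; options.getProperty("attr." + i) != null; i++) {
            System.out.println("attr." + i + " = " + options.getProperty("attr." + i));
        }
    }
}
```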
If the java executable is in your PATH and your current directory is the Cahit Arf root directory, a typical command line would be as follows :
java -cp CahitArf.jar com.prcomps.cahitarf.Db2Arff <options file path> <output file path>
If the output file path denotes an existing file, it will be overwritten.
You can start the Cahit Arf Wizard either by double-clicking its program
file or by typing a command such as :
java -jar CahitArf.jar
Step 1 - Creating a new or selecting an existing options set
Before you begin, decide whether to open an existing options file or create a new one. Besides building options files, you may prefer to use the wizard, rather than command line mode, for querying and creating an ARFF output file.
Step 2 - Providing database connection data and testing connection
Refer to your JDBC driver's documentation for the driver class name and for how to construct the JDBC URL. We supply the class names and URL templates of some widely used drivers. If you are creating a new options file or modifying the connection properties of an existing one, we strongly recommend that you test your connection before moving to the next step.
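For readers who want to check the same connection properties outside the wizard, a connection test in plain JDBC might look like the sketch below. It assumes the driver .jar is on the Java classpath. The class ConnectionCheck is hypothetical, and this is not necessarily how the wizard's own test works.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical illustration: trying a driver class, URL, user and password.
class ConnectionCheck {

    /** Returns null on success, or a short description of what went wrong. */
    static String test(String driverClass, String url, String user, String password) {
        try {
            Class.forName(driverClass); // loads (and thereby registers) the driver
        } catch (ClassNotFoundException e) {
            return "driver class not found: " + driverClass;
        }
        try (Connection conn = DriverManager.getConnection(url, user, password)) {
            return null; // connection opened; it is closed again automatically
        } catch (SQLException e) {
            return "connection failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        if (args.length < 4) {
            System.out.println("usage: ConnectionCheck <driver class> <url> <user> <password>");
            return;
        }
        String problem = test(args[0], args[1], args[2], args[3]);
        System.out.println(problem == null ? "connection OK" : problem);
    }
}
```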
Step 3 - Building SQL query string
After typing the SQL select statement, press the 'Rietrive Sample Rows' button, which attempts to retrieve up to 50 rows using the query string you provided.
Step 4 - Defining attributes
You can change an attribute name by double-clicking the name you wish to change. Attribute type and quotation options are selected from combo boxes. At this point you have given Cahit Arf all the necessary information, so it is a good idea to save it as an options file right away.
Step 5 - Generating an ARFF output
If you have tested your connection and query, you are ready to generate an ARFF output. You can do this in three ways : 1) If you are expecting a large amount of output, it is a good idea to generate a sample output to the screen first. 2) You can generate the complete output to the screen and copy and paste it into a file via a text editor. 3) You can redirect the output to a file you specify.
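To show what kind of output to expect, the following Java sketch builds ARFF text from already-fetched rows: a relation line, one attribute line per column (with the values of class-typed columns collected from the data) and a data section. The class ArffSketch is hypothetical. Listing class values in first-seen order and skipping missing-value handling are assumptions made for brevity, not documented Cahit Arf behavior.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of turning query rows into ARFF text.
class ArffSketch {

    /** quotes[col] is "" (unquoted), "\"" or "'" for each column. */
    static String toArff(String relation, String[] names, String[] types,
                         String[] quotes, List<String[]> rows) {
        StringBuilder out = new StringBuilder();
        out.append("@relation ").append(relation).append("\n\n");
        for (int col = 0; col < names.length; col++) {
            out.append("@attribute ").append(names[col]).append(' ');
            if (types[col].equals("class")) {
                // Collect the distinct values of this column, in first-seen order.
                Set<String> values = new LinkedHashSet<>();
                for (String[] row : rows) {
                    values.add(quotes[col] + row[col] + quotes[col]);
                }
                out.append('{').append(String.join(",", values)).append('}');
            } else {
                out.append(types[col]); // e.g. numeric or string
            }
            out.append('\n');
        }
        out.append("\n@data\n");
        for (String[] row : rows) {
            for (int col = 0; col < row.length; col++) {
                if (col > 0) {
                    out.append(',');
                }
                out.append(quotes[col]).append(row[col]).append(quotes[col]);
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"sunny", "85"},
                new String[] {"rainy", "70"});
        System.out.print(toArff("Weather",
                new String[] {"outlook", "temperature"},
                new String[] {"class", "numeric"},
                new String[] {"'", ""},
                rows));
    }
}
```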
JDBC drivers are, in general, operating-system-independent pure Java classes
packed as Java archive (.jar) files. As mentioned in the
Installation part, in order to make a JDBC driver available to Cahit Arf,
you should obtain the driver from your database developer (or a third-party JDBC driver
developer), and then put its .jar file into the
directory described there. Here are the web pages of some widely used JDBC drivers :