Create Instances From Database, Weka Java Eclipse January 17, 2010

Posted by vyolian in weka.

Setup and Configuration
1) Make sure you have Weka downloaded and Weka.jar extracted into a folder.
2) Create a new project
3) Navigate to “Java Build Path” -> “Libraries”
4) “Add External Class Folder” and choose your extracted weka folder

5) Navigate inside the extracted weka folder to weka/experiment. Look for two files, DatabaseUtils.props and DatabaseUtils.props.postgresql.
6) Back up DatabaseUtils.props, then replace the contents of the original with those of DatabaseUtils.props.postgresql.
7) Edit jdbcURL to match your own setup, e.g. “jdbcURL=jdbc:postgresql://localhost:5432/my_database_name”

Optional: In the listing below, I added the mapping “numeric=2” so that the Postgres type numeric is translated as a Java double.

# JDBC driver (comma-separated list)
jdbcDriver=org.postgresql.Driver

# database URL
jdbcURL=jdbc:postgresql://localhost:5432/my_database_name

# specific data types
# string, getString() = 0;    --> nominal
# boolean, getBoolean() = 1;  --> nominal
# double, getDouble() = 2;    --> numeric
# byte, getByte() = 3;        --> numeric
# short, getShort() = 4;      --> numeric
# int, getInteger() = 5;      --> numeric
# long, getLong() = 6;        --> numeric
# float, getFloat() = 7;      --> numeric
# date, getDate() = 8;        --> date
# text, getString() = 9;      --> string
# time, getTime() = 10;       --> date

varchar=0
text=0
float4=2
float8=2
int4=5
oid=5
timestamp=8
date=8
numeric=2
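To make the mapping table above easier to read, here is a minimal sketch in plain Java (no Weka on the classpath) that loads a few `type=code` lines with `java.util.Properties` and looks up which attribute kind each code stands for, following the comment table in the props file. The class name `TypeMappingSketch` and its helper methods are made up for illustration; this is not how Weka itself reads the file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class TypeMappingSketch {

    // Attribute kind for each type code 0..10, copied from the comment
    // table in the props file (0 = string/nominal ... 10 = time/date).
    static final String[] KINDS = {
        "nominal", "nominal", "numeric", "numeric", "numeric", "numeric",
        "numeric", "numeric", "date", "string", "date"
    };

    // Parse "sqlType=code" lines written in .props syntax.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Return the attribute kind for a SQL type name, or null if unmapped.
    static String kindFor(Properties props, String sqlType) {
        String code = props.getProperty(sqlType);
        if (code == null) {
            return null;
        }
        return KINDS[Integer.parseInt(code.trim())];
    }

    public static void main(String[] args) {
        Properties props = parse("varchar=0\nnumeric=2\nint4=5\ntimestamp=8\n");
        // "bytea" is deliberately left unmapped to show the null case.
        for (String type : new String[] {"varchar", "numeric", "int4", "timestamp", "bytea"}) {
            System.out.println(type + " -> " + kindFor(props, type));
        }
    }
}
```

A type with no entry comes back as null here, which is why the post adds “numeric=2” for Postgres numeric columns.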

Code

The code below shows two examples combined. One uses the configuration above to create instances directly from a query. The other is just a reference for general use of the database.

import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

import weka.core.Instances;
import weka.experiment.InstanceQuery;

public class Driver2 {

	public static void main(String[] args) throws ClassNotFoundException,
			SQLException, Exception {

		/***************************
		 * Instances from Database
		 ****************************/
		InstanceQuery query = new InstanceQuery();
		query.setUsername("my_username");
		query.setPassword("my_password");
		query.setQuery("SELECT price FROM products LIMIT 20");

		Instances data = query.retrieveInstances();
		System.out.println(data);

		/***************************
		 * General from Database
		 ****************************/
		// User Configurations
		String database = "my_database_name";
		String username = "my_username";
		String password = "my_password";

		// Create and Check Connection
		Class.forName("org.postgresql.Driver");
		Connection db = DriverManager.getConnection("jdbc:postgresql:"
				+ database, username, password);
		DatabaseMetaData dbmd = db.getMetaData();
		System.out.println("Connection to " + dbmd.getDatabaseProductName()
				+ " " + dbmd.getDatabaseProductVersion() + " successful.\n");

		// Query
		Statement sql = db.createStatement();
		ResultSet results = sql
				.executeQuery("SELECT id FROM products LIMIT 10");
		if (results != null) {
			while (results.next()) {
				System.out.println("id = " + results.getInt("id"));
			}
		}
		results.close();
		sql.close();

		// Clean Up
		db.close();
	}
}

Weka with Java (Eclipse), Getting Started January 16, 2010

Posted by vyolian in Uncategorized.

Quick, rough guide to getting started with Weka using Java and Eclipse.

1) Make sure you’ve downloaded Weka

2) Create a new project in Eclipse. Find Java Build Path -> Libraries either during project creation or afterwards under “Package Explorer” -> right-click the project -> Properties.

3) “Add External Jars…” and select the weka.jar from your download.

4) Create a class file under the “src” folder. This code is taken pretty much line for line from weka.wikispaces.

import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.core.Attribute;
import weka.core.FastVector;
import weka.core.Instance;
import weka.core.Instances;

public class Driver {

	public static void main(String[] args) throws Exception{

		 // Declare two numeric attributes
		 Attribute Attribute1 = new Attribute("firstNumeric");
		 Attribute Attribute2 = new Attribute("secondNumeric");

		 // Declare a nominal attribute along with its values
		 FastVector fvNominalVal = new FastVector(3);
		 fvNominalVal.addElement("blue");
		 fvNominalVal.addElement("gray");
		 fvNominalVal.addElement("black");
		 Attribute Attribute3 = new Attribute("aNominal", fvNominalVal);

		 // Declare the class attribute along with its values
		 FastVector fvClassVal = new FastVector(2);
		 fvClassVal.addElement("positive");
		 fvClassVal.addElement("negative");
		 Attribute ClassAttribute = new Attribute("theClass", fvClassVal);

		 // Declare the feature vector
		 FastVector fvWekaAttributes = new FastVector(4);
		 fvWekaAttributes.addElement(Attribute1);
		 fvWekaAttributes.addElement(Attribute2);
		 fvWekaAttributes.addElement(Attribute3);
		 fvWekaAttributes.addElement(ClassAttribute);

		 // Create an empty training set
		 Instances isTrainingSet = new Instances("Rel", fvWekaAttributes, 10);       

		 // Set class index
		 isTrainingSet.setClassIndex(3);

		 // Create the instance
		 Instance iExample = new Instance(4);
		 iExample.setValue((Attribute)fvWekaAttributes.elementAt(0), 1.0);
		 iExample.setValue((Attribute)fvWekaAttributes.elementAt(1), 0.5);
		 iExample.setValue((Attribute)fvWekaAttributes.elementAt(2), "gray");
		 iExample.setValue((Attribute)fvWekaAttributes.elementAt(3), "positive");

		 // add the instance
		 isTrainingSet.add(iExample);
		 Classifier cModel = (Classifier)new NaiveBayes();
		 cModel.buildClassifier(isTrainingSet);

		 // Test the model
		 Evaluation eTest = new Evaluation(isTrainingSet);
		 eTest.evaluateModel(cModel, isTrainingSet);

		 // Print the result à la Weka explorer:
		 String strSummary = eTest.toSummaryString();
		 System.out.println(strSummary);

		 // Get the confusion matrix
		 double[][] cmMatrix = eTest.confusionMatrix();
		 for(int row_i=0; row_i<cmMatrix.length; row_i++){
			 for(int col_i=0; col_i<cmMatrix[row_i].length; col_i++){
				 System.out.print(cmMatrix[row_i][col_i]);
				 System.out.print("|");
			 }
			 System.out.println();
		 }
	}
}
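The confusion matrix printed at the end of the listing can be reduced to a single accuracy figure: the sum of the diagonal divided by the sum of all cells. Below is a small sketch in plain Java, without Weka, that does this for a `double[][]` shaped like the one `confusionMatrix()` returns. The class name `ConfusionMatrixSketch` and the counts in `main` are invented for illustration and are not output from the Driver class.

```java
class ConfusionMatrixSketch {

    // Accuracy = correctly classified (diagonal) / all classified instances.
    static double accuracy(double[][] matrix) {
        double correct = 0.0;
        double total = 0.0;
        for (int row = 0; row < matrix.length; row++) {
            for (int col = 0; col < matrix[row].length; col++) {
                total += matrix[row][col];
                if (row == col) {
                    correct += matrix[row][col];
                }
            }
        }
        // An empty matrix has no instances, so report 0 rather than NaN.
        return total == 0.0 ? 0.0 : correct / total;
    }

    public static void main(String[] args) {
        // Hypothetical counts for a two-class problem.
        double[][] cm = { {8, 2}, {1, 9} };
        System.out.println("accuracy = " + accuracy(cm));
    }
}
```

The summary string from `toSummaryString()` already reports the share of correctly classified instances, so a helper like this is mainly useful when only the matrix has been kept.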

Weka Output Class Predictions January 14, 2010

Posted by vyolian in Uncategorized.

I’m building a predictive model that’s time-series related. I wanted to visualize the instances I predicted incorrectly on a time-series plot. The first step, however, is to get it into a format that R is happy with. Here’s how to add a “predicted” column to your training file.

BASIC (only need existing columns):

In the Weka Explorer, under the Classify tab, click “More Options”. Make sure “Store predictions for visualization” is checked.

Click “Start” to build and run the model.

When finished, right-click the model name from the Result List. Click on “Visualize classifier errors.”

Click “Save” in that new window and the outputted file will have the new predicted column.

To convert the resulting ARFF file to CSV, do “java weka.core.converters.CSVSaver -i your.arff -o your.csv”

ADVANCED (need excluded columns):

Say you have extra columns for debugging that you need to exclude before you use it for training — think instance IDs or date markers. Here’s how you would do that.

In the Explorer GUI, go to the classify tab

Choose “FilteredClassifier” under the “meta” folder

Go inside the FilteredClassifier options and choose your base classifier (J48)

In “filter” option, remove “AllFilter” and add “Unsupervised -> Attribute -> Remove”.

In the “Remove” option, choose the attribute index that you want to remove. Then click Add.

You’re now ready to run your model. Follow the steps in “BASIC” above to run it, open the visualization, and save the ARFF with the predicted column.

[R] Multiple Plots in Histogram December 24, 2009

Posted by vyolian in R.

Might be a misleading title but that’s what I searched for when I wanted to plot multiple series on a histogram. Looks like what I should have looked for was barplot.

  t1 <- table( c(1,1,2,3,1,1,2,3,1) )
  t2 <- table( c(1,2,2,3,2,2,2,2,2) )
  t <- rbind(t1,t2)
  barplot(t, beside=TRUE)

Dataframes in R (basic cheatsheet) December 17, 2009

Posted by vyolian in R.

Construct

  #construct and initialize dataframe
  df <- data.frame(x=10:15, y=c(2,4,6,1,3,0))
  #construct empty dataframe
  df <- data.frame(x=numeric(0), y=numeric(0))
  #add new row
  df <- rbind(df, data.frame(x=16, y=-1))

Get/Set

  column_names <- colnames(df)

  ys <- df$y

  y_row2 <- df$y[2]

  df$y[2] = 12
  range_xs <- range(df$x)

  range_xs_and_ys <- range( c(df$x, df$y) )

  num_rows <- nrow(df)
  #select certain rows

  subset_xs <- df[ (df$x > 11 & df$y < 6),  ]

  subset_xs <- df[ with(df, x > 11 & y < 6),  ]

  subset_xs <- subset(df, x > 11 & y < 6)

  #select certain columns

  subset_ys <- subset(df, select=c(x,y))

  subset_ys <- subset(df, select=-c(y))

Manipulate

  ordered <- df[ order(df$y),  ]

Transform

  colnames(df) = c("rename_x", "rename_y")

  df <- transform(df, new_z=(x+y))

  #add column
  df <- merge(df, list(c=15:20), by=0, all.x=TRUE)

3 Year Dating Anniversary December 15, 2009

Posted by vyolian in Community.

Writing Source Code in WordPress December 13, 2009

Posted by vyolian in Development.

I always find myself looking this up and the search results being really bad. This is for easy access for myself. From http://en.support.wordpress.com/code/posting-source-code/.

  [sourcecode language="<language from below>"]
    ...your code here...
  [//sourcecode] <- except only single slash.

And languages that I actually care about:

  • bash
  • cpp
  • csharp
  • css
  • java
  • javascript
  • ruby
  • sql
  • xml

Inserting From One Table Into Another – Postgres December 13, 2009

Posted by vyolian in database.

Inserts a subset of one table into another.


INSERT INTO <another_table>
  (<column1>, <column2>)
SELECT
  <column1>, <column2>
FROM <table>
WHERE <conditions>;

Plotting Week Days in R December 13, 2009

Posted by vyolian in R.

Say you’re plotting week days against frequency, but your week days are stored as numbers (Sunday is 0, Monday is 1, etc). Here’s how you label the x axis with the right tick marks so that Sunday is ‘Sun’, Monday is ‘Mon’, etc.

day_of_week <- c(0, 1, 2, 3, 4, 5, 6)
frequency <- c(10, 15, 2, 9, 15, 16, 7)

plot(day_of_week, frequency, xaxt="n")
axis(1, at=day_of_week, labels=c('Sun','Mon','Tues','Wed','Thurs','Fri','Sat'))

Postgres Duplicate Key Unique Constraint December 11, 2009

Posted by vyolian in database.

In case you restored a table in Postgres and came across this error while inserting additional rows:


PGError: ERROR: duplicate key value violates unique constraint \"[tablename]_pkey\"\n: INSERT INTO ... VALUES(...) RETURNING \"id\"

Try:


select setval('[tablename]_id_seq', (select max(id) + 1 from [tablename]));
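The likely cause is that the restore brought back the rows with their original ids but left the sequence behind, so the next generated id collides with an existing row. If you need to run the fix for several tables, a rough sketch like the one below builds the same statement for a given table name. It assumes the default naming the post uses, a sequence called `<table>_id_seq` on a column named `id`. The class `SequenceResetSketch` and the method `resetSql` are made-up names, and the code only builds the SQL string; it does not connect to a database.

```java
class SequenceResetSketch {

    // Build the setval statement from the post for one table.
    // Assumes the sequence is named <table>_id_seq and the key column is "id".
    static String resetSql(String table) {
        // The name is spliced into SQL, so accept plain identifiers only.
        if (!table.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("unexpected table name: " + table);
        }
        return "select setval('" + table + "_id_seq', (select max(id) + 1 from "
                + table + "));";
    }

    public static void main(String[] args) {
        // "products" is only an example table name.
        System.out.println(resetSql("products"));
    }
}
```

The generated string can then be run in psql or through a JDBC `Statement`.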