Create Instances From Database, Weka Java Eclipse January 17, 2010
Posted by vyolian in weka.
Setup and Configuration
1) Make sure you have Weka downloaded and Weka.jar extracted into a folder.
2) Create a new project
3) Navigate to “Java Build Path” -> “Libraries”
4) “Add External Class Folder” and choose your extracted weka folder
5) Navigate inside the extracted weka folder to weka/experiment. Look for two files, DatabaseUtils.props and DatabaseUtils.props.postgresql.
6) Create a backup of DatabaseUtils.props and replace the contents of the original with those of DatabaseUtils.props.postgresql.
7) Edit the jdbcURL to match your own setup, e.g. “jdbcURL=jdbc:postgresql://localhost:5432/my_database_name”
Optional: In the file below, I added the mapping “numeric=2” at the end so that the Postgres type numeric is read with getDouble() and becomes a numeric attribute.
# JDBC driver (comma-separated list)
jdbcDriver=org.postgresql.Driver
# database URL
jdbcURL=jdbc:postgresql://localhost:5432/my_database_name
# specific data types
# string, getString() = 0; --> nominal
# boolean, getBoolean() = 1; --> nominal
# double, getDouble() = 2; --> numeric
# byte, getByte() = 3; --> numeric
# short, getByte() = 4; --> numeric
# int, getInteger() = 5; --> numeric
# long, getLong() = 6; --> numeric
# float, getFloat() = 7; --> numeric
# date, getDate() = 8; --> date
# text, getString() = 9; --> string
# time, getTime() = 10; --> date
varchar=0
text=0
float4=2
float8=2
int4=5
oid=5
timestamp=8
date=8
numeric=2
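The type codes in that file work like a small lookup table: a database column type maps to a code, and the code decides what kind of attribute the column becomes. The sketch below is plain Java and does not call Weka at all; it only mirrors the comment table from the props file. The class name TypeMappingSketch, its methods, and the "unmapped" fallback are made up for illustration and are not how Weka itself reports a missing mapping.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper, not part of Weka: mirrors the comment table in the props file.
class TypeMappingSketch {

    // Attribute kind for codes 0..10, copied from the comments in the props file.
    static final String[] KIND_BY_CODE = {
        "nominal", "nominal", "numeric", "numeric", "numeric", "numeric",
        "numeric", "numeric", "date", "string", "date"
    };

    // Parse "type=code" lines with the standard properties reader.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected for an in-memory string
        }
        return props;
    }

    // Look up which kind of attribute a database column type would become.
    // "unmapped" is this sketch's own placeholder for a type with no entry.
    static String kindFor(Properties props, String dbType) {
        String code = props.getProperty(dbType);
        if (code == null) {
            return "unmapped";
        }
        return KIND_BY_CODE[Integer.parseInt(code.trim())];
    }

    public static void main(String[] args) {
        Properties props = parse("varchar=0\ntext=0\nfloat8=2\nint4=5\ntimestamp=8\nnumeric=2\n");
        for (String type : new String[] {"varchar", "numeric", "timestamp", "money"}) {
            System.out.println(type + " -> " + kindFor(props, type));
        }
    }
}
```

Without the “numeric=2” line, a lookup for numeric falls through to the fallback, which is the reason for adding the mapping by hand.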
Code
The code below shows two examples combined. One uses the configuration above to create instances directly from a query. The other is just a reference for general use of the database.
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import weka.core.Instances;
import weka.experiment.InstanceQuery;
public class Driver2 {
public static void main(String[] args) throws ClassNotFoundException,
SQLException, Exception {
/***************************
* Instances from Database
****************************/
InstanceQuery query = new InstanceQuery();
query.setUsername("my_username");
query.setPassword("my_password");
query.setQuery("SELECT price FROM products LIMIT 20");
Instances data = query.retrieveInstances();
System.out.println(data);
/***************************
* General from Database
****************************/
// User Configurations
String database = "my_database_name";
String username = "my_username";
String password = "my_password";
// Create and Check Connection
Class.forName("org.postgresql.Driver");
Connection db = DriverManager.getConnection("jdbc:postgresql:"
+ database, username, password);
DatabaseMetaData dbmd = db.getMetaData();
System.out.println("Connection to " + dbmd.getDatabaseProductName()
+ " " + dbmd.getDatabaseProductVersion() + " successful.\n");
// Query
Statement sql = db.createStatement();
ResultSet results = sql
.executeQuery("SELECT id FROM products LIMIT 10");
if (results != null) {
while (results.next()) {
System.out.println("id = " + results.getInt("id"));
}
}
results.close();
sql.close();
// Clean Up
db.close();
}
}
Weka with Java (Eclipse), Getting Started January 16, 2010
Posted by vyolian in Uncategorized.
Quick, rough guide to getting started with Weka using Java and Eclipse.
1) Make sure you’ve downloaded Weka
2) Create a new project in Eclipse. Find Java Build Path -> Libraries either during project creation or afterwards under “Package Explorer” -> right-click the project -> Properties.
3) “Add External Jars…” and select the weka.jar from your download.
4) Create a class file under the “src” folder. This code is taken pretty much line for line from weka.wikispaces.
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.core.Attribute;
import weka.core.FastVector;
import weka.core.Instance;
import weka.core.Instances;
public class Driver {
public static void main(String[] args) throws Exception{
// Declare two numeric attributes
Attribute Attribute1 = new Attribute("firstNumeric");
Attribute Attribute2 = new Attribute("secondNumeric");
// Declare a nominal attribute along with its values
FastVector fvNominalVal = new FastVector(3);
fvNominalVal.addElement("blue");
fvNominalVal.addElement("gray");
fvNominalVal.addElement("black");
Attribute Attribute3 = new Attribute("aNominal", fvNominalVal);
// Declare the class attribute along with its values
FastVector fvClassVal = new FastVector(2);
fvClassVal.addElement("positive");
fvClassVal.addElement("negative");
Attribute ClassAttribute = new Attribute("theClass", fvClassVal);
// Declare the feature vector
FastVector fvWekaAttributes = new FastVector(4);
fvWekaAttributes.addElement(Attribute1);
fvWekaAttributes.addElement(Attribute2);
fvWekaAttributes.addElement(Attribute3);
fvWekaAttributes.addElement(ClassAttribute);
// Create an empty training set
Instances isTrainingSet = new Instances("Rel", fvWekaAttributes, 10);
// Set class index
isTrainingSet.setClassIndex(3);
// Create the instance
Instance iExample = new Instance(4);
iExample.setValue((Attribute)fvWekaAttributes.elementAt(0), 1.0);
iExample.setValue((Attribute)fvWekaAttributes.elementAt(1), 0.5);
iExample.setValue((Attribute)fvWekaAttributes.elementAt(2), "gray");
iExample.setValue((Attribute)fvWekaAttributes.elementAt(3), "positive");
// add the instance
isTrainingSet.add(iExample);
Classifier cModel = (Classifier)new NaiveBayes();
cModel.buildClassifier(isTrainingSet);
// Test the model
Evaluation eTest = new Evaluation(isTrainingSet);
eTest.evaluateModel(cModel, isTrainingSet);
// Print the result à la Weka explorer:
String strSummary = eTest.toSummaryString();
System.out.println(strSummary);
// Get the confusion matrix
double[][] cmMatrix = eTest.confusionMatrix();
for(int row_i=0; row_i<cmMatrix.length; row_i++){
for(int col_i=0; col_i<cmMatrix[row_i].length; col_i++){
System.out.print(cmMatrix[row_i][col_i]);
System.out.print("|");
}
System.out.println();
}
}
}
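For reference, a confusion matrix is just a table of counts over (actual, predicted) pairs. The sketch below builds one by hand in plain Java, with no Weka dependency. It assumes rows index the actual class and columns the predicted class, which is how the Explorer prints it; the class and method names are made up for illustration.

```java
// Minimal sketch, independent of Weka: count (actual, predicted) pairs.
class ConfusionMatrixSketch {

    // Rows index the actual class, columns the predicted class.
    static int[][] build(int[] actual, int[] predicted, int numClasses) {
        if (actual.length != predicted.length) {
            throw new IllegalArgumentException("actual and predicted differ in length");
        }
        int[][] matrix = new int[numClasses][numClasses];
        for (int i = 0; i < actual.length; i++) {
            matrix[actual[i]][predicted[i]]++;
        }
        return matrix;
    }

    // Fraction of instances on the diagonal, i.e. predicted correctly.
    static double accuracy(int[][] matrix) {
        int correct = 0;
        int total = 0;
        for (int row = 0; row < matrix.length; row++) {
            for (int col = 0; col < matrix[row].length; col++) {
                total += matrix[row][col];
                if (row == col) {
                    correct += matrix[row][col];
                }
            }
        }
        return total == 0 ? 0.0 : (double) correct / total;
    }

    public static void main(String[] args) {
        // 0 = positive, 1 = negative (labels chosen for this example only)
        int[] actual = {0, 0, 1, 1};
        int[] predicted = {0, 1, 1, 1};
        int[][] matrix = build(actual, predicted, 2);
        for (int[] row : matrix) {
            for (int count : row) {
                System.out.print(count + "|");
            }
            System.out.println();
        }
        System.out.println("accuracy = " + accuracy(matrix));
    }
}
```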
Weka Output Class Predictions January 14, 2010
Posted by vyolian in Uncategorized.
I’m building a predictive model that’s time-series related. I wanted to visualize the instances I predicted incorrectly on a time-series plot. The first step, however, is to get it into a format that R is happy with. Here’s how to add a “predicted” column to your training file.
BASIC (only need existing columns):
In the Weka explorer, under the classify tab, click “More Options”. Make sure the “Store predictions for visualization” is checked.
Click “Start” to build and run the model.
When finished, right-click the model name from the Result List. Click on “Visualize classifier errors.”
Click “Save” in that new window and the outputted file will have the new predicted column.
To convert the resulting ARFF file to CSV, run “java -cp weka.jar weka.core.converters.CSVSaver -i your.arff -o your.csv” (adjust the path to weka.jar, or drop -cp if Weka is already on your classpath).
ADVANCED (need excluded columns):
Say you have extra columns for debugging that you need to exclude before you use it for training — think instance IDs or date markers. Here’s how you would do that.
In the Explorer GUI, go to the classify tab
Choose “FilteredClassifier” under the “meta” folder
Go inside the FilteredClassifier options and choose your base classifier (J48)
In “filter” option, remove “AllFilter” and add “Unsupervised -> Attribute -> Remove”.
In the “Remove” option, choose the attribute index that you want to remove. Then click Add.
You’re now ready to run your model. Follow the remaining steps from “BASIC” above (store predictions, run, visualize classifier errors, save) to get the ARFF with the predicted column.
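Once the saved file has been converted to CSV, picking out the incorrectly predicted instances is a matter of comparing two columns. The plain-Java sketch below shows one way to do it before handing the data to R. It assumes a simple CSV with a header row and no quoted commas, and the column names “class” and “predicted” are placeholders; check the header of your own file for the real names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: lists the data rows where actual and predicted values differ.
class MispredictionSketch {

    // Position of a named column in the header row, or -1 if it is missing.
    static int columnIndex(String headerLine, String name) {
        String[] names = headerLine.split(",", -1);
        for (int i = 0; i < names.length; i++) {
            if (names[i].trim().equals(name)) {
                return i;
            }
        }
        return -1;
    }

    // Returns 0-based data-row indices (header excluded) where the two columns differ.
    static List<Integer> mismatches(List<String> csvLines, String actualCol, String predictedCol) {
        int actual = columnIndex(csvLines.get(0), actualCol);
        int predicted = columnIndex(csvLines.get(0), predictedCol);
        if (actual < 0 || predicted < 0) {
            throw new IllegalArgumentException("column not found in header");
        }
        List<Integer> rows = new ArrayList<>();
        for (int i = 1; i < csvLines.size(); i++) {
            String[] cells = csvLines.get(i).split(",", -1);
            if (!cells[actual].trim().equals(cells[predicted].trim())) {
                rows.add(i - 1);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for the converted file.
        List<String> lines = Arrays.asList(
            "id,class,predicted",
            "1,positive,positive",
            "2,negative,positive",
            "3,negative,negative");
        System.out.println("mispredicted rows: " + mismatches(lines, "class", "predicted"));
    }
}
```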
[R] Multiple Plots in Histogram December 24, 2009
Posted by vyolian in R. Tags: boxplot, histogram, R
Might be a misleading title but that’s what I searched for when I wanted to plot multiple series on a histogram. Looks like what I should have looked for was barplot.
t1 <- table( c(1,1,2,3,1,1,2,3,1) )
t2 <- table( c(1,2,2,3,2,2,2,2,2) )
t <- rbind(t1,t2)
barplot(t, beside=TRUE)
Dataframes in R (basic cheatsheet) December 17, 2009
Posted by vyolian in R.
Construct
#construct and initialize dataframe
df <- data.frame(x=10:15, y=c(2,4,6,1,3,0))
#construct empty dataframe
df <- data.frame(x=numeric(0), y=numeric(0))
#add new row
df <- rbind(df, data.frame(x=16, y=-1))
Get/Set
column_names <- colnames(df)
ys <- df$y
y_row2 <- df$y[2]
df$y[2] = 12
range_xs <- range(df$x)
range_xs_and_ys <- range( c(df$x, df$y) )
num_rows <- nrow(df)
#select certain rows
subset_xs <- df[ (df$x > 11 & df$y < 6), ]
subset_xs <- df[ with(df, x > 11 & y < 6), ]
subset_xs <- subset(df, x > 11 & y < 6)
#select certain columns (keep x and y, or drop y)
subset_ys <- subset(df, select=c(x,y))
subset_ys <- subset(df, select=-c(y))
Manipulate
ordered <- df[ order(df$y), ]
Transform
colnames(df) = c("rename_x", "rename_y")
df <- transform(df, new_z=(x+y))
#add column
df <- merge(df, list(c=15:20), by=0, all.x=TRUE)
Writing Source Code in WordPress December 13, 2009
Posted by vyolian in Development.
I keep having to look this up, and the search results are always bad, so this is here for my own easy access. From http://en.support.wordpress.com/code/posting-source-code/.
[sourcecode language="<language from below>"]
...your code here...
[//sourcecode] <- except only single slash.
And languages that I actually care about:
- bash
- cpp
- csharp
- css
- java
- javascript
- ruby
- sql
- xml
Inserting From One Table Into Another – Postgres December 13, 2009
Posted by vyolian in database.
Inserts a subset of one table into another.
INSERT INTO <another_table> (<column1>, <column2>) SELECT <column1>, <column2> FROM <table> WHERE <conditions>;
Plotting Week Days in R December 13, 2009
Posted by vyolian in R.
Say you’re plotting a graph of week days vs frequency, but your week days are stored as numbers (Sunday is 0, Monday is 1, etc). Here’s how to label the x axis with the right tick marks so that Sunday is ‘Sun’, Monday is ‘Mon’, etc.
day_of_week <- c(0, 1, 2, 3, 4, 5, 6)
frequency <- c(10, 15, 2, 9, 15, 16, 7)
plot(day_of_week, frequency, xaxt="n")
axis(1, at=day_of_week, labels=c('Sun','Mon','Tues','Wed','Thurs','Fri','Sat'))
Postgres Duplicate Key Unique Constraint December 11, 2009
Posted by vyolian in database.
In case you restored a table in Postgres and came across this error while inserting additional rows:
PGError: ERROR: duplicate key value violates unique constraint "[tablename]_pkey": INSERT INTO ... VALUES(...) RETURNING "id"
Try:
select setval('[tablename]_id_seq', (select max(id) + 1 from [tablename]));
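The error happens because the restored rows bring their ids with them while the sequence behind the id column still points at an old value, so the next generated id collides with an existing row. The toy model below reproduces that in plain Java with no database involved. SequenceSketch and its methods are made up for illustration; the only Postgres behaviour it imitates is that after setval(n) the next nextval returns n + 1.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Toy model of a table with a sequence-backed primary key. Not a database client.
class SequenceSketch {
    private long last = 0;                          // last value the sequence handed out
    private final Set<Long> ids = new HashSet<>();  // primary keys already in the table

    // Advance the sequence and return the new value.
    long nextval() {
        return ++last;
    }

    // Reposition the sequence; the next nextval() returns value + 1.
    void setval(long value) {
        last = value;
    }

    // A restored row keeps its id and does not touch the sequence.
    void restoreRow(long id) {
        ids.add(id);
    }

    // Insert with a generated id; false stands for the duplicate key error.
    boolean insertWithNextval() {
        return ids.add(nextval());
    }

    long maxId() {
        return ids.isEmpty() ? 0 : Collections.max(ids);
    }

    public static void main(String[] args) {
        SequenceSketch table = new SequenceSketch();
        table.restoreRow(1);
        table.restoreRow(2);
        table.restoreRow(3);
        System.out.println(table.insertWithNextval()); // prints false: id 1 already exists
        table.setval(table.maxId() + 1);               // same idea as the setval query above
        System.out.println(table.insertWithNextval()); // prints true: new row gets id 5
    }
}
```

In this model, setting the sequence to max(id) without the + 1 would also avoid the collision, since the next value handed out is one past whatever was set; the + 1 just skips an id.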