Ian Ma

Life Labs

Installing RMySQL on Mac Mountain Lion

1. Download MySQL from dev.mysql.com. Make sure to get the 64-bit version.

2. Uncompress mysql-xxx.tar.gz. We’ll refer to the extracted directory as MYSQL_HOME. It should contain directories such as bin, lib, and include.

3. Download R from CRAN.

4. Download the RMySQL package source from CRAN. We’ll refer to the path of that RMySQL_xxx.tar.gz as RMYSQL_PATH.

5. Before installing RMySQL, place libmysqlclient.18.dylib where RMySQL can find it (use the absolute path for MYSQL_HOME):

  sudo ln -s MYSQL_HOME/lib/libmysqlclient.18.dylib /usr/lib/libmysqlclient.18.dylib

6. In a terminal, install RMySQL with the following command:

  R CMD INSTALL --configure-args='--with-mysql-dir=MYSQL_HOME --with-mysql-inc=MYSQL_HOME/include --with-mysql-lib=MYSQL_HOME/lib' RMYSQL_PATH

This is documented as one of the options in the installation guide.

7. library(RMySQL) should now work in the R interactive shell.

These steps will get you past these ugly errors:

** testing if installed package can be loaded
Error : .onLoad failed in loadNamespace() for ‘RMySQL’, details:
call: dyn.load(file, DLLpath = DLLpath, …)
error: unable to load shared object ‘/usr/local/Cellar/r/2.15.2/R.framework/Versions/2.15/Resources/library/RMySQL/libs/RMySQL.so’:
dlopen(/usr/local/Cellar/r/2.15.2/R.framework/Versions/2.15/Resources/library/RMySQL/libs/RMySQL.so, 6): Library not loaded: libmysqlclient.18.dylib
Referenced from: /usr/local/Cellar/r/2.15.2/R.framework/Versions/2.15/Resources/library/RMySQL/libs/RMySQL.so
Reason: image not found
Error: loading failed
Execution halted
ERROR: loading failed

– and –

Configuration error:
could not find the MySQL installation include and/or library
directories. Manually specify the location of the MySQL
libraries and the header files and re-run R CMD INSTALL.

Postgresql Create Random Dates

NOW() - '1 day'::INTERVAL * ROUND(RANDOM() * 100)
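
The expression picks a whole number of days between 0 and 100 and subtracts it from the current time. For generating the same kind of test data from application code, the arithmetic can be sketched in Java. This is only an illustration: the class and method names are made up, and Duration.ofDays counts exact 24-hour days, so results may differ from Postgres interval arithmetic around daylight-saving changes.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Random;

// Hypothetical helper mirroring: NOW() - '1 day'::INTERVAL * ROUND(RANDOM() * 100)
final class RandomDates {

    // Returns "now" shifted back by a random whole number of days in [0, 100].
    static Instant randomPastInstant(Instant now, Random rng) {
        // nextDouble() is in [0, 1), so the rounded value is 0..100 inclusive.
        long days = Math.round(rng.nextDouble() * 100);
        return now.minus(Duration.ofDays(days));
    }

    public static void main(String[] args) {
        Random rng = new Random();
        Instant now = Instant.now();
        for (int i = 0; i < 5; i++) {
            System.out.println(randomPastInstant(now, rng));
        }
    }
}
```

Passing the clock value and the Random in as parameters keeps the helper easy to test with a fixed seed.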

Credit to ilovebonnie

Access restriction: Java OpenJdk Rt.jar

Environment: Ubuntu Jaunty 9.10, Eclipse
Error: “Access restriction: The type HttpServer is not accessible due to restriction on required library /usr/lib/jvm/java-6-openjdk/jre/lib/rt.jar”
Solution (quick): Switch your project’s JRE from OpenJDK to Sun’s Java

Solution (detailed):

  1. Make sure you have sun-java6-bin and sun-java6-jre installed (using Synaptic Package Manager perhaps)
  2. In Eclipse, right-click JRE System Library > Build Path > Configure Build Path
  3. Add Library > Alternate JRE > Installed JREs…
  4. Add > Standard VM > Next > Directory > /usr/lib/jvm/java-6-sun-1.6.x.xx
  5. Remove the OpenJDK JRE System Library
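
To confirm which JRE the project actually runs on after the change, a small check like the one below can help. It is a minimal sketch: the class name is made up, while the system properties it reads (java.vendor, java.vm.name, java.version, java.home) are standard ones.

```java
// Hypothetical helper: prints which JVM is running the project.
final class WhichJvm {

    // Builds a one-line description from standard system properties.
    static String describeJvm() {
        return System.getProperty("java.vendor") + " / "
                + System.getProperty("java.vm.name") + " "
                + System.getProperty("java.version") + " at "
                + System.getProperty("java.home");
    }

    public static void main(String[] args) {
        System.out.println(describeJvm());
    }
}
```

If java.home still points into the OpenJDK directory, the project is still using the old JRE.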


Ruby OpenSSL and FTP

This is going to be my most random post. I ran into a series of problems related to OpenSSL and FTP using Ruby 1.9. Here are some useful links.

Problem: LoadError: no such file to load -- openssl
Solution: cd ruby-v.v.v/ext/openssl; ruby extconf.rb; make; make install
Source: http://www.ruby-forum.com/topic/90083

Problem: Net::FTPTempError: 425 Can’t build data connection: Connection timed out
Solution: ftp.passive = true
Source: http://groups.google.com/group/ruby-talk-google/browse_thread/thread/2c2c5258f8beb83a

Create Instances From Database, Weka Java Eclipse

Setup and Configuration
1) Make sure you have Weka downloaded and weka.jar extracted into a folder.
2) Create a new project
3) Navigate to “Java Build Path” -> “Libraries”
4) “Add External Class Folder” and choose your extracted weka folder

5) Navigate inside the extracted weka folder to weka/experiment. Look for two files, DataUtils.props and DataUtils.props.postgresql.
6) Create a backup of DataUtils.props and replace the contents of the original with that of DataUtils.props.postgresql.
7) Edit the jdbcURL to match your own configuration, e.g. “jdbcURL=jdbc:postgresql://localhost:5432/my_database_name”

Optional: In the configuration below, I added the mapping “numeric=2” so that the Postgres type numeric is read as a Java double.

# JDBC driver (comma-separated list)
jdbcDriver=org.postgresql.Driver

# database URL
jdbcURL=jdbc:postgresql://localhost:5432/my_database_name

# specific data types
# string, getString() = 0;    --> nominal
# boolean, getBoolean() = 1;  --> nominal
# double, getDouble() = 2;    --> numeric
# byte, getByte() = 3;        --> numeric
# short, getByte()= 4;        --> numeric
# int, getInteger() = 5;      --> numeric
# long, getLong() = 6;        --> numeric
# float, getFloat() = 7;      --> numeric
# date, getDate() = 8;        --> date
# text, getString() = 9;      --> string
# time, getTime() = 10;       --> date

varchar=0
text=0
float4=2
float8=2
int4=5
oid=5
timestamp=8
date=8
numeric=2
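
To make the type codes concrete, here is a small sketch that reads mappings in the same key=value form and looks up how a column type would be treated. It is not Weka’s own parsing code: the class and method names are made up, and the label table is copied from the comments in the file above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical illustration of the "sqlType=code" mappings in DataUtils.props.
final class TypeMappingSketch {

    // Index = type code; labels taken from the comments in the props file.
    static final String[] WEKA_KIND = {
        "nominal", "nominal", "numeric", "numeric", "numeric", "numeric",
        "numeric", "numeric", "date", "string", "date"
    };

    // Parses key=value lines; lines starting with '#' are treated as comments.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Returns the attribute kind for a SQL type, or "unmapped" if absent.
    static String wekaKindFor(Properties props, String sqlType) {
        String code = props.getProperty(sqlType);
        if (code == null) {
            return "unmapped";
        }
        return WEKA_KIND[Integer.parseInt(code.trim())];
    }

    public static void main(String[] args) {
        Properties props = parse("varchar=0\ntimestamp=8\nnumeric=2\n");
        System.out.println("numeric   -> " + wekaKindFor(props, "numeric"));
        System.out.println("timestamp -> " + wekaKindFor(props, "timestamp"));
        System.out.println("bytea     -> " + wekaKindFor(props, "bytea"));
    }
}
```

A type with no mapping, like bytea in this sketch, is the situation the extra “numeric=2” line avoids for numeric columns.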

Code

The code below shows two examples combined. One uses the configuration above to create instances directly from a query. The other is just a reference for general use of the database.

import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

import weka.core.Instances;
import weka.experiment.InstanceQuery;

public class Driver2 {

	public static void main(String[] args) throws Exception {
		
		/***************************
		 * Instances from Database
		 ****************************/
		InstanceQuery query = new InstanceQuery();
		query.setUsername("my_username");
		query.setPassword("my_password");
		query.setQuery("SELECT price FROM products LIMIT 20");

		Instances data = query.retrieveInstances();
		System.out.println(data);
		
		/***************************
		 * General from Database
		 ****************************/
		// User Configurations
		String database = "my_database_name";
		String username = "my_username";
		String password = "my_password";

		// Create and Check Connection
		Class.forName("org.postgresql.Driver");
		Connection db = DriverManager.getConnection("jdbc:postgresql:"
				+ database, username, password);
		DatabaseMetaData dbmd = db.getMetaData();
		System.out.println("Connection to " + dbmd.getDatabaseProductName()
				+ " " + dbmd.getDatabaseProductVersion() + " successful.\n");

		// Query
		Statement sql = db.createStatement();
		ResultSet results = sql
				.executeQuery("SELECT id FROM products LIMIT 10");
		if (results != null) {
			while (results.next()) {
				System.out.println("id = " + results.getInt("id"));
			}
		}
		results.close();
		sql.close();

		// Clean Up
		db.close();
	}
}

Weka with Java (Eclipse), Getting Started

Quick, rough guide to getting started with Weka using Java and Eclipse.

1) Make sure you’ve downloaded Weka

2) Create a new project in Eclipse. Find Java Build Path -> Libraries either during project creation or afterwards under “Package Explorer” -> right-click the project -> Properties.

3) “Add External Jars…” and select the weka.jar from your download.

4) Create a class file under the “src” folder. This code is taken pretty much line for line from weka.wikispaces.

import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.core.Attribute;
import weka.core.FastVector;
import weka.core.Instance;
import weka.core.Instances;

public class Driver {

	public static void main(String[] args) throws Exception{
		
		 // Declare two numeric attributes
		 Attribute Attribute1 = new Attribute("firstNumeric");
		 Attribute Attribute2 = new Attribute("secondNumeric");
		 
		 // Declare a nominal attribute along with its values
		 FastVector fvNominalVal = new FastVector(3);
		 fvNominalVal.addElement("blue");
		 fvNominalVal.addElement("gray");
		 fvNominalVal.addElement("black");
		 Attribute Attribute3 = new Attribute("aNominal", fvNominalVal);
		 
		 // Declare the class attribute along with its values
		 FastVector fvClassVal = new FastVector(2);
		 fvClassVal.addElement("positive");
		 fvClassVal.addElement("negative");
		 Attribute ClassAttribute = new Attribute("theClass", fvClassVal);
		 
		 // Declare the feature vector
		 FastVector fvWekaAttributes = new FastVector(4);
		 fvWekaAttributes.addElement(Attribute1);    
		 fvWekaAttributes.addElement(Attribute2);    
		 fvWekaAttributes.addElement(Attribute3);    
		 fvWekaAttributes.addElement(ClassAttribute);
		 
		 // Create an empty training set
		 Instances isTrainingSet = new Instances("Rel", fvWekaAttributes, 10);       
		 
		 // Set class index
		 isTrainingSet.setClassIndex(3);
		 
		 // Create the instance
		 Instance iExample = new Instance(4);
		 iExample.setValue((Attribute)fvWekaAttributes.elementAt(0), 1.0);      
		 iExample.setValue((Attribute)fvWekaAttributes.elementAt(1), 0.5);      
		 iExample.setValue((Attribute)fvWekaAttributes.elementAt(2), "gray");
		 iExample.setValue((Attribute)fvWekaAttributes.elementAt(3), "positive");
		 
		 // add the instance
		 isTrainingSet.add(iExample);
		 Classifier cModel = (Classifier)new NaiveBayes();	 
		 cModel.buildClassifier(isTrainingSet);

		 // Test the model
		 Evaluation eTest = new Evaluation(isTrainingSet);
		 eTest.evaluateModel(cModel, isTrainingSet);
		 
		 // Print the result à la Weka explorer:
		 String strSummary = eTest.toSummaryString();
		 System.out.println(strSummary);
		 
		 // Get the confusion matrix
		 double[][] cmMatrix = eTest.confusionMatrix();
		 for(int row_i=0; row_i<cmMatrix.length; row_i++){
			 for(int col_i=0; col_i<cmMatrix[row_i].length; col_i++){
				 System.out.print(cmMatrix[row_i][col_i]);
				 System.out.print("|");
			 }
			 System.out.println();
		 }
	}
}
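
As a rough aid for reading the printed confusion matrix, the sketch below computes overall accuracy from a matrix of the same shape: the diagonal holds the correctly classified counts, so accuracy is the diagonal sum divided by the total. The class and method names are made up, and the numbers in main are invented sample counts, not output from the model above.

```java
// Hypothetical helper: overall accuracy from a confusion matrix
// (rows = actual class, columns = predicted class).
final class ConfusionMatrixSketch {

    static double accuracy(double[][] matrix) {
        double correct = 0.0;
        double total = 0.0;
        for (int row = 0; row < matrix.length; row++) {
            for (int col = 0; col < matrix[row].length; col++) {
                total += matrix[row][col];
                if (row == col) {
                    correct += matrix[row][col]; // diagonal = correct predictions
                }
            }
        }
        return total == 0.0 ? 0.0 : correct / total;
    }

    public static void main(String[] args) {
        // Invented counts for a two-class problem.
        double[][] sample = { {8, 2}, {1, 9} };
        System.out.println("accuracy = " + accuracy(sample));
    }
}
```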

Weka Output Class Predictions

I’m building a predictive model that’s time-series related. I wanted to visualize the instances I predicted incorrectly on a time-series plot. The first step, however, is to get it into a format that R is happy with. Here’s how to add a “predicted” column to your training file.

BASIC (only need existing columns):

In the Weka Explorer, under the Classify tab, click “More Options”. Make sure “Store predictions for visualization” is checked.

Click “Start” to build and run the model.

When finished, right-click the model name from the Result List. Click on “Visualize classifier errors.”

Click “Save” in that new window and the outputted file will have the new predicted column.

To convert the resulting ARFF file to CSV, run “java weka.core.converters.CSVSaver -i your.arff -o your.csv” (with weka.jar on the classpath).

ADVANCED (need excluded columns):

Say you have extra columns for debugging, such as instance IDs or date markers, that you need to exclude before training. Here’s how you would do that.

In the Explorer GUI, go to the classify tab

Choose “FilteredClassifier” under the “meta” folder

Go inside the FilteredClassifier options and choose your base classifier (e.g. J48)

In “filter” option, remove “AllFilter” and add “Unsupervised -> Attribute -> Remove”.

In the “Remove” option, choose the attribute index that you want to remove. Then click Add.

You’re now ready to run your model. Follow the remaining steps in “BASIC” above to go through visualization and save the ARFF with the predicted column.

[R] Multiple Plots in Histogram

Might be a misleading title but that’s what I searched for when I wanted to plot multiple series on a histogram. Looks like what I should have looked for was barplot.

  t1 <- table( c(1,1,2,3,1,1,2,3,1) )
  t2 <- table( c(1,2,2,3,2,2,2,2,2) )
  t <- rbind(t1,t2)
  barplot(t, beside=TRUE)  

Dataframes in R (basic cheatsheet)

Construct

  #construct and initialize dataframe
  df <- data.frame(x=10:15, y=c(2,4,6,1,3,0)) 
  #construct empty dataframe
  df <- data.frame(x=numeric(0), y=numeric(0)) 
  #add new row
  df <- rbind(df, data.frame(x=16, y=-1))  

Get/Set

  column_names <- colnames(df)

  ys <- df$y

  y_row2 <- df$y[2]

  df$y[2] = 12
  range_xs <- range(df$x)

  range_xs_and_ys <- range( c(df$x, df$y) )

  num_rows <- nrow(df) 
  #select certain rows

  subset_xs <- df[ (df$x > 11 & df$y < 6),  ]

  subset_xs <- df[ with(df, x > 11 & y < 6),  ]

  subset_xs <- subset(df, x > 11 & y < 6)

  #select certain columns

  subset_xy <- subset(df, select=c(x,y))

  subset_ys <- subset(df, select=-c(y))

Manipulate

  ordered <- df[ order(df$y),  ]

Transform

  colnames(df) <- c("rename_x", "rename_y")

  #columns were renamed above, so use the new names
  df <- transform(df, new_z=(rename_x+rename_y))

  #add column
  df <- merge(df, list(c=15:20), by=0, all.x=TRUE) 

3 Year Dating Anniversary
