Wednesday, February 16, 2011

Descriptors generated from an SDF file to Excel

I just came across a way to get the output of a descriptor calculation in Excel format. We have already seen how to calculate descriptors using the Chemistry Development Kit (CDK); the same output can be written to an Excel sheet using jxl.jar, the Java Excel API. The Java Excel API is a mature, open source Java API that lets developers read, write, and modify Excel spreadsheets dynamically. The program works with cdk-1.3.4.jar and cdk-jchempaint-8.jar. Hope everyone is hacking their way across CDK. Please follow the blog and subscribe to the posts. Until next time :wq



import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Iterator;
import java.util.Map;
import jxl.Workbook;
import jxl.write.Label;
import jxl.write.WritableSheet;
import jxl.write.WritableWorkbook;
import jxl.write.WriteException;
import org.openscience.cdk.DefaultChemObjectBuilder;
import org.openscience.cdk.Molecule;
import org.openscience.cdk.exception.CDKException;
import org.openscience.cdk.interfaces.IMolecule;
import org.openscience.cdk.io.iterator.IteratingMDLReader;
import org.openscience.cdk.qsar.DescriptorEngine;
import org.openscience.cdk.qsar.DescriptorSpecification;
import org.openscience.cdk.qsar.DescriptorValue;
import org.openscience.cdk.qsar.descriptors.molecular.BCUTDescriptor;
import org.openscience.cdk.smiles.SmilesGenerator;
import org.openscience.cdk.smiles.SmilesParser;
/**
*
* @author Harish Sankar
*/
public class Main {

    /**
     * Reads molecules from an SDF file, calculates the CDK molecular descriptors
     * for each one, and writes the SMILES and descriptor values to output.xls.
     *
     * @param args the command line arguments (unused)
     */
    public static void main(String[] args) throws FileNotFoundException, CDKException, IOException, WriteException {
        File sdfFile = new File("C:/CID_241.sdf");
        WritableWorkbook workbook = Workbook.createWorkbook(new File("output.xls"));
        WritableSheet sheet = workbook.createSheet("Sheet1", 0);
        IteratingMDLReader reader = new IteratingMDLReader(new FileInputStream(sdfFile), DefaultChemObjectBuilder.getInstance());
        SmilesGenerator sg = new SmilesGenerator();
        String smile = "";
        int k = 0; // current row in the sheet
        while (reader.hasNext()) {
            IMolecule mol = (IMolecule) reader.next();
            smile = sg.createSMILES(mol);
            System.out.print(smile + "\n");
            BCUTDescriptor descriptor = new BCUTDescriptor(); // not used below
            // Molecule molecule = (Molecule) new SmilesParser(DefaultChemObjectBuilder.getInstance()).parseSmiles("NC(CO)C(=O)O");
            Molecule molecule = (Molecule) new SmilesParser(DefaultChemObjectBuilder.getInstance()).parseSmiles(smile);
            // Second program: calculate the descriptors.
            // The jar path tells the engine where to search for descriptor classes;
            // point it at the CDK jar you actually use.
            DescriptorEngine engine = new DescriptorEngine(DescriptorEngine.MOLECULAR, new String[]{"lib/cdk-1.3.7.jar"});
            engine.process(molecule);
            /**
             * getProperties() returns a Map, i.e. a set of key/value pairs, which is
             * walked here with an iterator. Each key is a DescriptorSpecification and
             * each value a DescriptorValue, so both are cast to those types on
             * retrieval, since neither class overrides toString().
             */
            Map i = molecule.getProperties();
            Iterator iterator = i.keySet().iterator();
            /**
             * Write the SMILES first, then one row per descriptor:
             * the descriptor name in column index 1 and its value in column index 2.
             */
            Label label0 = new Label(1, k, smile);
            sheet.addCell(label0);
            k += 2; // leave a blank row after the SMILES
            while (iterator.hasNext()) {
                Object l = iterator.next();
                String key = ((DescriptorSpecification) l).getSpecificationReference().toString();
                // Keep only the part of the specification reference after the last '#'.
                Label label1 = new Label(1, k, key.substring(key.lastIndexOf("#") + 1));
                sheet.addCell(label1);
                Label label2 = new Label(2, k, ((DescriptorValue) i.get((DescriptorSpecification) l)).getValue().toString());
                sheet.addCell(label2);
                k++;
            }
            k += 2; // leave blank rows between molecules
        }
        workbook.write();
        workbook.close();
        System.out.print(k + "\n"); // number of rows used
    }
}


Screenshot of the Descriptors in Excel


5 comments:

  1. Nice!

    Have you considered tilting the data, with the descriptors in columns? Also, for each descriptor class, you can request the descriptor labels, which gives a label for each value, rather than one label for each class...

  2. I used this code, but most of your programs report access errors saying they cannot access:
    org.openscience.cdk.DefaultChemObjectBuilder;
    org.openscience.cdk.interfaces.IMolecule;
    org.openscience.cdk.io.iterator.IteratingMDLReader;
    org.openscience.cdk.qsar.descriptors.molecular.BCUTDescriptor;

    I ran the program from the folder above the org folder.

  3. I am facing the same problem: it cannot access the library. I have put the CDK jar file in the jre/lib/ext folder.

  4. Generated the Excel file, but it only prints the top row. I think the second while loop doesn't get executed. What might be the problem? Used cdk-1.4.1.jar...

  5. Hi, hmmm, will take a look into it. No issues with cdk-1.4.1....

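The layout suggested in the first comment (one row per molecule, one column per descriptor) could be sketched roughly as below. This is only an illustration in plain Java, not the code from the post: the class `DescriptorPivot`, the method `pivot`, and the names `descA`, `descB`, `descC` with their values are all made up for the example. It assumes you have already collected each molecule's descriptor names and values as strings. Each cell of the resulting grid could then be written with `new Label(column, row, text)` as in the program above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of CDK or jxl.
class DescriptorPivot {

    /**
     * Builds a grid with one header row and one row per molecule.
     * Column 0 holds the SMILES; the other columns are the union of all
     * descriptor names in first-seen order. Missing values become "".
     */
    static String[][] pivot(Map<String, Map<String, String>> valuesBySmiles) {
        // Collect every descriptor name once, keeping insertion order.
        Set<String> names = new LinkedHashSet<>();
        for (Map<String, String> values : valuesBySmiles.values()) {
            names.addAll(values.keySet());
        }
        List<String> columns = new ArrayList<>(names);

        String[][] grid = new String[valuesBySmiles.size() + 1][columns.size() + 1];
        grid[0][0] = "SMILES";
        for (int c = 0; c < columns.size(); c++) {
            grid[0][c + 1] = columns.get(c);
        }

        int row = 1;
        for (Map.Entry<String, Map<String, String>> entry : valuesBySmiles.entrySet()) {
            grid[row][0] = entry.getKey();
            for (int c = 0; c < columns.size(); c++) {
                String value = entry.getValue().get(columns.get(c));
                grid[row][c + 1] = (value == null) ? "" : value;
            }
            row++;
        }
        return grid;
    }

    public static void main(String[] args) {
        // Placeholder data: the descriptor names and values are invented.
        Map<String, Map<String, String>> data = new LinkedHashMap<>();
        Map<String, String> first = new LinkedHashMap<>();
        first.put("descA", "1.0");
        first.put("descB", "2.0");
        data.put("CCO", first);
        Map<String, String> second = new LinkedHashMap<>();
        second.put("descB", "3.0");
        second.put("descC", "4.0");
        data.put("CC", second);

        for (String[] line : pivot(data)) {
            System.out.println(String.join("\t", line));
        }
    }
}
```

Building the grid in memory first keeps the column order stable even when molecules do not share the same set of descriptors. For the per-value labels mentioned in the same comment, `DescriptorValue.getNames()` appears to be the relevant CDK call, but check it against the CDK version you are using.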