My favorite features in Java 7

So far, the following are my favorite new features in Java 7. I wonder why they were not added in an earlier version.

1. The try-with-resources Statement

Any object that implements java.lang.AutoCloseable, which includes all objects which implement java.io.Closeable, can be used as a resource.

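import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// br is closed automatically when the try block exits
static String readFirstLineFromFile(String path) throws IOException {
    try (BufferedReader br = new BufferedReader(new FileReader(path))) {
        return br.readLine();
    }
}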

In this example, the resource declared in the try-with-resources statement is a BufferedReader. The declaration statement appears within parentheses immediately after the try keyword. The class BufferedReader, in Java SE 7 and later, implements the interface java.lang.AutoCloseable. Because the BufferedReader instance is declared in a try-with-resources statement, it will be closed regardless of whether the try statement completes normally or abruptly (for example, as a result of the method BufferedReader.readLine throwing an IOException).
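To see the "closed regardless" behavior with your own class, here is a minimal sketch. The names TryWithResourcesDemo, TrackedResource, and fail are made up for illustration; the only real API involved is java.lang.AutoCloseable. The resource records what happens to it, so you can check that close runs even when the body throws, and that it runs before the catch block.

```java
import java.util.ArrayList;
import java.util.List;

class TryWithResourcesDemo {
    // Hypothetical resource that records its lifecycle events for inspection.
    static class TrackedResource implements AutoCloseable {
        private final List<String> events;

        TrackedResource(List<String> events) {
            this.events = events;
            events.add("open");
        }

        // Simulates a failure while the resource is in use.
        void fail() {
            events.add("use");
            throw new IllegalStateException("simulated failure");
        }

        @Override
        public void close() {
            events.add("close");
        }
    }

    // Returns the recorded events in the order they happened.
    static List<String> run() {
        List<String> events = new ArrayList<>();
        try (TrackedResource r = new TrackedResource(events)) {
            r.fail();
        } catch (IllegalStateException e) {
            // By the time we get here, the resource has already been closed.
            events.add("caught");
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [open, use, close, caught]
    }
}
```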

Prior to Java SE 7, you could use a finally block to ensure that a resource is closed regardless of whether the try statement completes normally or abruptly. The following example uses a finally block instead of a try-with-resources statement:

static String readFirstLineFromFileWithFinallyBlock(String path) throws IOException {
    BufferedReader br = new BufferedReader(new FileReader(path));
    try {
        return br.readLine();
    } finally {
        // Explicit close; try-with-resources in Java 7 makes this block unnecessary.
        if (br != null) br.close();
    }
}

And you can declare more than one resource to close:

try (InputStream in = new FileInputStream(src);
     OutputStream out = new FileOutputStream(dest)) {
    // code
}
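One detail worth knowing when you declare several resources: they are closed in the reverse order of their declaration. Below is a small sketch that makes the order visible. The class CloseOrderDemo and the helper resource Named are hypothetical names of my own, used only to log the calls.

```java
import java.util.ArrayList;
import java.util.List;

class CloseOrderDemo {
    // Hypothetical resource that logs its name when it is closed.
    static class Named implements AutoCloseable {
        private final String name;
        private final List<String> log;

        Named(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        @Override
        public void close() {
            log.add("close " + name);
        }
    }

    // Opens two resources in one try statement and returns the event log.
    static List<String> run() {
        List<String> log = new ArrayList<>();
        try (Named first = new Named("first", log);
             Named second = new Named("second", log)) {
            log.add("using " + first.name + " and " + second.name);
        }
        return log;
    }

    public static void main(String[] args) {
        // prints [using first and second, close second, close first]
        System.out.println(run());
    }
}
```

So in the stream example above, out would be closed before in.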

2. Catching Multiple Exception Types and Rethrowing Exceptions with Improved Type Checking

In Java SE 7 and later, a single catch block can handle more than one type of exception. This feature can reduce code duplication and lessen the temptation to catch an overly broad exception.

Consider the following example, which contains duplicate code in each of the catch blocks:

   catch (IOException ex) {
       logger.log(ex);
       throw ex;
   }
   catch (SQLException ex) {
       logger.log(ex);
       throw ex;
   }

In releases prior to Java SE 7, it was difficult to create a common method to eliminate the duplicated code because the variable ex has different types.
The following example, which is valid in Java SE 7 and later, eliminates the duplicated code:

  catch (IOException|SQLException ex) {
    logger.log(ex);
    throw ex;
  }

The catch clause specifies the types of exceptions that the block can handle, and each exception type is separated with a vertical bar (|).

Note: If a catch block handles more than one exception type, then the catch parameter is implicitly final. In this example, the catch parameter ex is final and therefore you cannot assign any values to it within the catch block.

Bytecode generated by compiling a catch block that handles multiple exception types will be smaller (and thus superior) than compiling many catch blocks that handle only one exception type each. A catch block that handles multiple exception types creates no duplication in the bytecode generated by the compiler; the bytecode has no replication of exception handlers.
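The catch fragments above are not complete programs, so here is a small runnable sketch of a multi-catch block. The class MultiCatchDemo and the methods riskyOperation and describe are invented for this illustration; only IOException and SQLException come from the JDK.

```java
import java.io.IOException;
import java.sql.SQLException;

class MultiCatchDemo {
    // Hypothetical operation that fails with one of two unrelated checked exceptions.
    static void riskyOperation(boolean failWithIo) throws IOException, SQLException {
        if (failWithIo) {
            throw new IOException("disk error");
        }
        throw new SQLException("query error");
    }

    // Handles both exception types in a single catch block.
    static String describe(boolean failWithIo) {
        try {
            riskyOperation(failWithIo);
            return "ok";
        } catch (IOException | SQLException ex) {
            // ex is implicitly final here; an assignment such as `ex = null;` would not compile.
            return ex.getClass().getSimpleName() + ": " + ex.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(true));  // prints IOException: disk error
        System.out.println(describe(false)); // prints SQLException: query error
    }
}
```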

3. Type Inference for Generic Instance Creation

You can replace the type arguments required to invoke the constructor of a generic class with an empty set of type parameters (<>) as long as the compiler can infer the type arguments from the context. This pair of angle brackets is informally called the diamond.

For example, consider the following variable declaration:

    Map<String, List<String>> myMap = new HashMap<String, List<String>>();

In Java SE 7, you can substitute the parameterized type of the constructor with an empty set of type parameters (<>):

    Map<String, List<String>> myMap = new HashMap<>();
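As a rough, self-contained sketch of the diamond in use (the class name DiamondDemo, the method build, and the sample data are my own, not part of the original example):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DiamondDemo {
    static Map<String, List<String>> build() {
        // The compiler infers <String, List<String>> from the declared type on the left.
        Map<String, List<String>> myMap = new HashMap<>();

        // Type arguments are spelled out here because Java 7 does not infer the
        // diamond from a method-argument context (later releases do).
        myMap.put("fruits", new ArrayList<String>());
        myMap.get("fruits").add("apple");
        return myMap;
    }

    public static void main(String[] args) {
        System.out.println(build()); // prints {fruits=[apple]}
    }
}
```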

Class projects for Hadoop

The best way to learn anything is by doing it. To master the Hadoop ecosystem, you need to go beyond the Word Count program. Here is a list of projects I would like to work on if I get time; it can also serve as a good list of class projects for Hadoop.

1) Matrix Decomposition routines (QR, Cholesky etc)

2) Decision Trees with ID3, C4.5 or other heuristic (https://issues.apache.org/jira/b… ).

Note: It looks like Mahout has a partial implementation of random decision forest, you may be able to use it to test your code (if questions arise please ask on Mahout mailing list, the community there is very helpful):
https://cwiki.apache.org/MAHOUT/…
https://cwiki.apache.org/MAHOUT/…
https://cwiki.apache.org/MAHOUT/…

3) Linear Regression: https://cwiki.apache.org/conflue…

Ordinary Least Squares or other linear least squares methods: http://en.wikipedia.org/wiki/Ord…

4) Gradient Descent and other optimization and linear programming algorithms. See Convex Optimization: What are some good resources for learning about distributed optimization?; What are some fast gradient descent algorithms?; the Matlab optimization toolbox: http://www.mathworks.com/help/to… ; and Convex Optimization: Which optimization algorithms are good candidates for parallelization with MapReduce?

5) AdaBoost and other meta-algorithms: http://en.wikipedia.org/wiki/Ada…

6) SVM:

https://issues.apache.org/jira/b…

https://issues.apache.org/jira/b…

https://issues.apache.org/jira/b…

Support Vector Machines: What is the best way to implement an SVM using Hadoop?

7) Vector space models http://en.wikipedia.org/wiki/Vec…

8) Hidden Markov Models – an extremely popular method in NLP & bioinformatics.

9) Slope One by Daniel Lemire http://en.wikipedia.org/wiki/Slo… or other Collaborative Filtering algorithms.

See Mahout in Action by Sean Owen: http://www.manning.com/owen/

10) DFT/FFT, wavelets, z-transform, and other popular signal and image processing transforms. See the Matlab Signal Processing toolbox: http://www.mathworks.com/help/to… , Image Processing toolbox: http://www.mathworks.com/help/to… , and Wavelet Toolbox: http://www.mathworks.com/help/to… ; also see the OpenCV catalog: http://opencv.willowgarage.com/w…

11) PageRank, here is a good tutorial: http://michaelnielsen.org/blog/u…

12) Build an eigensolver: http://www.cs.cmu.edu/~ukang/pap…

13) For a wealth of open ended problems see Programming Challenges: What are some good “toy problems” in data science?

Notes: