Tuesday, August 31, 2010

How to parse a PDF

PDFBox is a Java API from Ben Litchfield that will let you access the contents of a PDF document. It comes with integration classes for Lucene to translate a PDF into a Lucene document.
 
JPedal is a Java API for extracting text and images from PDF documents.
 
PDFTextStream is a Java API for extracting text, metadata, and form data from PDF documents. It also comes with an integration module making it easier to convert a PDF document into a Lucene document.
 
XPDF is an open source tool that is licensed under the GPL. It's not a Java tool, but there is a utility called pdftotext that can translate PDF files into text files on most platforms from the command line.
 
Based on xpdf, there is a utility called pdftohtml that can translate PDF files into HTML files. This is also not a Java application.

How to change the encoding of a java String

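// An encoding applies only to bytes; name the charset on both sides so the platform default is never used.
String newStr = new String(someString.getBytes("UTF-8"), "UTF-8");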
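
A minimal sketch of what "changing the encoding" usually amounts to: bytes in one charset are decoded into a String and then encoded into another charset. The class name EncodingDemo, the method name and the sample text are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

class EncodingDemo {
    // Decodes ISO-8859-1 bytes into a String, then encodes that String as UTF-8.
    static byte[] latin1ToUtf8(byte[] latin1Bytes) {
        String text = new String(latin1Bytes, StandardCharsets.ISO_8859_1);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "caf\u00e9" is "cafe" with an acute accent on the last letter (hypothetical sample text).
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = latin1ToUtf8(latin1);
        System.out.println(latin1.length + " bytes in ISO-8859-1, " + utf8.length + " bytes in UTF-8");
    }
}
```

The String in the middle is the same text either way; only the byte representation differs between the two charsets.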

Monday, August 30, 2010

JAVA -- write a java.sql.blob to File

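public void saveToFile(Blob blob) {
    File file = new File("c:/someFileName.ext");
    // try-with-resources closes the stream even if the write fails
    try (FileOutputStream os = new FileOutputStream(file)) {
        byte[] bytes = getBlobBytes(blob);
        // getBlobBytes returns null when there is nothing to write
        if (bytes != null) {
            os.write(bytes);
        }
    } catch (Exception ex) {
        ex.printStackTrace();
        JOptionPane.showMessageDialog(null, "Error!");
    }
}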



public byte[] getBlobBytes(Blob blob) throws Exception {
    final int MAXBUFSIZE = 4096;
    if (blob == null) {
        return null;
    }
    // try-with-resources closes both streams even if a read fails;
    // errors now reach the caller instead of being swallowed here
    try (BufferedInputStream bis = new BufferedInputStream(blob.getBinaryStream());
         ByteArrayOutputStream bo = new ByteArrayOutputStream()) {
        byte[] buf = new byte[MAXBUFSIZE];
        int n;
        // read() returns -1 at the end of the stream
        while ((n = bis.read(buf, 0, MAXBUFSIZE)) != -1) {
            bo.write(buf, 0, n);
        }
        return bo.toByteArray();
    }
}
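
For large blobs it can help to avoid holding the whole content in a byte array. The following is a sketch of an alternative, not the method above: it streams the blob straight to a file with java.nio.file.Files.copy. The class name BlobToFile is made up, and SerialBlob is used only as an in-memory stand-in for a Blob that would normally come from a ResultSet.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.sql.Blob;
import javax.sql.rowset.serial.SerialBlob;

class BlobToFile {
    // Copies the blob's stream to the target file and returns the number of bytes written.
    static long save(Blob blob, Path target) throws Exception {
        try (InputStream in = blob.getBinaryStream()) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in blob; in real code this would come from ResultSet.getBlob(...).
        Blob blob = new SerialBlob(new byte[] {1, 2, 3, 4});
        Path target = Files.createTempFile("blob-demo", ".bin");
        System.out.println(save(blob, target) + " bytes written to " + target);
    }
}
```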

JDBC -- Inserting binary data

Inserting Data :

    public void saveMedia(File file, short type) {
        String sql = "insert into media (content,mediatype,fileName) values(?,?,?)";
        // try-with-resources closes the file stream and the statement even on failure;
        // if the file cannot be opened, the insert is never attempted
        try (FileInputStream io = new FileInputStream(file);
             PreparedStatement statement = connection.prepareStatement(sql)) {
            statement.setBinaryStream(1, io, file.length());
            statement.setShort(2, type);
            statement.setString(3, file.getName());
            statement.executeUpdate();
            // kept from the original: drop this line if the connection is reused elsewhere
            connection.close();
        } catch (IOException ioEx) {
            ioEx.printStackTrace();
        } catch (SQLException sqlEx) {
            sqlEx.printStackTrace();
        }
    }

Monday, August 23, 2010

Important matter in indexing --> Solr

Data sent to Solr is not immediately searchable, nor do deletions take immediate
effect. Like a database, changes must be committed first. Unlike a database, there
are no distinct sessions (that is, transactions) between each client, and instead there
is in effect one global modification state. This means that if more than one Solr client
were to submit modifications and commit them at similar times, it is possible for part
of one client's set of changes to be committed before that client told Solr to commit.
Usually, you will have just one process responsible for updating Solr. But if not, then
keep this in mind.

From :
Solr 1.4 Enterprise Search Server (Packt, 2009, 1847195881) 

index-time-boosting while posting an xml to solr

Here is a sample XML file you can HTTP POST to Solr:

<add allowDups="false">
<doc boost="2.0">
<field name="id">5432a</field>
<field name="type" ...</field>
<field name="a_name" boost="0.5"></field>
<!-- the date/time syntax MUST look just like this (ISO-8601)-->
<field name="begin_date">2007-12-31T09:40:00Z</field>
</doc>
<doc>
<field name="id">5432a</field>
<field name="type" ...
<field name="begin_date">2007-12-31T09:40:00Z</field>
</doc>
<!-- more here as needed -->
</add>


The allowDups defaults to false to guarantee the uniqueness of values in the field
that you have designated as the unique field in the schema (assuming you have such
a field). If you were to add another document that has the same value for the unique
field, then this document would override the previous document, whether it is
pending a commit or it's already committed. You will not get an error.
If you are sure that you will be adding a document that is not
a duplicate, then you can set allowDups to true to get a
performance improvement.

Boosting affects the scores of matching documents in order to affect ranking in 
score-sorted search results. Providing a boost value, whether at the document or
field level, is optional. The default value is 1.0, which is effectively a non-boost.
Technically, documents are not boosted, only fields are. The effective boost value 
of a field is that specified for the document multiplied by that specified for the field.
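
The multiplication rule can be sketched as plain arithmetic. This only models the rule quoted above; it is not Solr's scoring code, and the class and method names are made up for illustration.

```java
class BoostDemo {
    static final float DEFAULT_BOOST = 1.0f;

    // Effective field boost = document boost * field boost; a missing boost counts as 1.0.
    static float effectiveBoost(Float docBoost, Float fieldBoost) {
        float d = (docBoost == null) ? DEFAULT_BOOST : docBoost;
        float f = (fieldBoost == null) ? DEFAULT_BOOST : fieldBoost;
        return d * f;
    }

    public static void main(String[] args) {
        // Values taken from the sample XML: <doc boost="2.0"> and <field name="a_name" boost="0.5">.
        System.out.println("a_name: " + effectiveBoost(2.0f, 0.5f));
        // The id field has no boost of its own, so only the document boost applies.
        System.out.println("id: " + effectiveBoost(2.0f, null));
    }
}
```

With the sample document, the a_name field ends up with an effective boost of 1.0 (2.0 × 0.5), while the unboosted fields inherit the document's 2.0.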

Specifying boosts here is called index-time boosting, which is rarely
done as compared to the more flexible query-time boosting. Index-time
boosting is less flexible because such boosting decisions must be decided
at index-time and will apply to all of the queries.



From :
Solr 1.4 Enterprise Search Server (Packt, 2009, 1847195881)

Friday, August 20, 2010

The OSGi Architecture

The OSGi technology is a set of specifications that define a dynamic component system for Java. These specifications enable a development model where applications are (dynamically) composed of many different (reusable) components. The OSGi specifications enable components to hide their implementations from other components while communicating through services, which are objects that are specifically shared between components. This surprisingly simple model has far reaching effects for almost any aspect of the software development process.

Though components have been on the horizon for a long time, so far they failed to make good on their promises. OSGi is the first technology that actually succeeded with a component system that is solving many real problems in software development. Adopters of OSGi technology see significantly reduced complexity in almost all aspects of development. Code is easier to write and test, reuse is increased, build systems become significantly simpler, deployment is more manageable, bugs are detected early, and the runtime provides an enormous insight into what is running. Most important, it works as is testified by the wide adoption and use in popular applications like Eclipse and Spring. 

see :
http://www.osgi.org/About/WhatIsOSGi

Thursday, August 19, 2010

How to deploy Solr on Tomcat

  • Step 2 : Make a folder somewhere on your computer and name it 'solr-home' (it can have any name). I assume that you have made a folder with the path : C:\solr-home.
  • Step 3 : Copy the following folders to your solr-home directory, which you made in the last step. 
    1. apache-solr-x.x.x/example/lib
    2. apache-solr-x.x.x/example/solr/conf
    3. apache-solr-x.x.x/example/solr/bin
  • Step 4 : Copy the war file placed in the ./apache-solr-x.x.x/dist folder which has a name like apache-solr-x.x.x.war (where x.x.x is the version of your solr core) and paste it in your tomcat webapps directory. 
  • Step 5 : Rename the file 'apache-solr-x.x.x.war' to solr.zip so that you can open it and edit its contents.
  • Step 6 : Now you have to set the Solr home directory in order to tell Solr where to save your indexes. The first way to do this is to open the web.xml file, located in the solr.zip/WEB-INF directory, in Notepad. Find the <env-entry> element (it should be commented out by default). Copy it whole and paste it at the bottom of the document, before the closing </web-app> tag, so that it looks something like :
<env-entry>
<env-entry-name>solr/home</env-entry-name>
<env-entry-value>C:\solr-home</env-entry-value>
<env-entry-type>java.lang.String</env-entry-type>
</env-entry>
When you have saved the change, rename solr.zip to solr.war so that Tomcat deploys it.
Alternatively, you can set the Solr home directory in the Tomcat configuration panel. To do that, right-click the Tomcat icon in the notification area, select Configure, go to the Java tab, and add the following line to the Java options :
-Dsolr.solr.home=C:\solr-home

Friday, August 13, 2010

Java Message Service API (the JMS API)

General idea of messaging

Messaging is a form of loosely coupled distributed communication, where in this context the term 'communication' can be understood as an exchange of messages between software components. Message-oriented technologies attempt to relax tightly coupled communication (such as TCP network sockets, CORBA or RMI) by the introduction of an intermediary component, which in this case would be a queue. The latter approach allows software components to communicate 'indirectly' with each other. Benefits of this include message senders not needing to have precise knowledge of their receivers, since communication is performed using this queue. This queue-based model is the first of the two messaging types: point-to-point and publish-and-subscribe.
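
The idea of communicating through an intermediary queue can be sketched inside a single JVM with java.util.concurrent.BlockingQueue. This is only an in-process stand-in for a real message broker, not the JMS API; the class name and the message text are made up.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class QueueMessagingDemo {
    // The sender knows only the queue, not who will consume the message.
    static void send(BlockingQueue<String> queue, String message) throws InterruptedException {
        queue.put(message);
    }

    // The receiver blocks until a message is available and then removes it from the queue.
    static String receive(BlockingQueue<String> queue) throws InterruptedException {
        return queue.take();
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        Thread receiver = new Thread(() -> {
            try {
                System.out.println("received: " + receive(queue));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        receiver.start();
        send(queue, "order-created"); // hypothetical message
        receiver.join();
    }
}
```

Each message is taken by exactly one receiver, which is the point-to-point behaviour; in publish-and-subscribe every subscriber would get its own copy.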

Java Message Service API Overview

The Java Message Service (JMS) defines the standard for reliable Enterprise Messaging. Enterprise messaging, often also referred to as Message-Oriented Middleware (MOM), is universally recognized as an essential tool for building enterprise applications. By combining Java technology with enterprise messaging, the JMS API provides a powerful tool for solving enterprise computing problems.

Enterprise messaging provides a reliable, flexible service for the asynchronous exchange of critical business data and events throughout an enterprise. The JMS API adds to this a common API and provider framework that enables the development of portable, message based applications in the Java programming language.

The JMS API improves programmer productivity by defining a common set of messaging concepts and programming strategies that will be supported by all JMS technology-compliant messaging systems.

The JMS API is an integral part of the Java 2, Enterprise Edition (J2EE) platform, and application developers can use messaging with components using J2EE APIs ("J2EE components").

Version 1.1 of the JMS API in the J2EE 1.4 platform has the following features:
  • Message-driven beans enable the asynchronous consumption of JMS messages.
  • Message sends and receives can participate in Java Transaction API (JTA) transactions.
  • J2EE Connector Architecture interfaces allow JMS implementations from different vendors to be externally plugged into a J2EE 1.4 application server.
The addition of the JMS API enhances the J2EE platform by simplifying enterprise development, allowing loosely coupled, reliable, asynchronous interactions among J2EE components and legacy systems capable of messaging. As a developer, you can easily add new behavior to a J2EE application with existing business events by adding a new message-driven bean to operate on specific business events.

The J2EE platform's Enterprise JavaBeans (EJB) container architecture, moreover, enhances the JMS API in two ways:
  • By allowing for the concurrent consumption of messages
  • By providing support for distributed transactions, so that database updates, message processing, and connections to EIS systems using the J2EE Connector Architecture can all participate in the same transaction context.

See : 
http://www.oracle.com/technetwork/java/overview-137943.html

See also: Message-oriented middleware and Message passing

And a complete, useful tutorial you can't miss :
http://download-llnw.oracle.com/javaee/1.3/jms/tutorial/1_3_1-fcs/doc/overview.html 

The Java Management Extensions (JMX) API

The JMX API is a standard API for management and monitoring of resources such as applications, devices, services, and the Java virtual machine.
Typical uses of the JMX technology include:
  • Consulting and changing application configuration.
  • Accumulating and publishing statistics about application behavior.
  • Notifying users or applications of state changes and erroneous conditions.
The JMX API includes remote access, so a remote management program can interact with a running application for the above purposes.
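
As a small sketch of the monitoring use case, the following reads the JVM's own heap usage through the platform MBean server. The class name JmxHeapProbe is made up; the MBean name and the HeapMemoryUsage attribute are the ones the JVM registers for its memory system.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;

class JmxHeapProbe {
    // Reads the "used" value of the HeapMemoryUsage attribute from the JVM's memory MBean.
    static long usedHeapBytes() throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName memory = new ObjectName(ManagementFactory.MEMORY_MXBEAN_NAME);
        CompositeData usage = (CompositeData) server.getAttribute(memory, "HeapMemoryUsage");
        return (Long) usage.get("used");
    }

    public static void main(String[] args) throws Exception {
        System.out.println("heap in use: " + usedHeapBytes() + " bytes");
    }
}
```

A remote management tool would read the same attribute over a JMX connector instead of calling the MBean server in-process.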

see :
http://openjdk.java.net/groups/jmx/
http://en.wikipedia.org/wiki/Java_Management_Extensions

tutorial for starting Spring Roo

This is where you can find a very good tutorial for Spring Roo version 1.0.2, creating a Roo-based project from scratch :

http://www.lalitbhatt.com/tiki-index.php?page=Spring+Roo

Tuesday, August 10, 2010

how to prevent lack of memory while executing large jasper reports

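        // Swap file under /report/swap/: 50 KB blocks, growing by at least 2 blocks at a time
        JRSwapFile swapFile =
                    new JRSwapFile(getServletContext().getRealPath("/report/swap/"), 1024 * 50/* 50 KB */, 2);
        // Keep at most 40 pages in memory; the rest are paged out to the swap file
        virtualizer = new JRSwapFileVirtualizer(40, swapFile);
        virtualizer.setReadOnly(false);
        // Pass the virtualizer to the fill process as a report parameter
        reportParam_.put(JRParameter.REPORT_VIRTUALIZER, virtualizer);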

Monday, August 9, 2010

Software versioning

Software versioning is the process of assigning either unique version names or unique version numbers to unique states of computer software. Within a given version number category (major, minor), these numbers are generally assigned in increasing order and correspond to new developments in the software.
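
Because the numbers in each category increase, two versions can be ordered by comparing them category by category. The following is a minimal sketch that assumes purely numeric, dot-separated versions (no suffixes such as "-beta"); the class name VersionCompare is made up.

```java
class VersionCompare {
    // Compares dotted numeric versions component by component; missing components count as 0.
    // Returns a negative number, zero, or a positive number, like Comparator.compare.
    static int compare(String a, String b) {
        String[] pa = a.split("\\.");
        String[] pb = b.split("\\.");
        int n = Math.max(pa.length, pb.length);
        for (int i = 0; i < n; i++) {
            int x = (i < pa.length) ? Integer.parseInt(pa[i]) : 0;
            int y = (i < pb.length) ? Integer.parseInt(pb[i]) : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        // A plain string comparison would rank "1.4.10" before "1.4.2"; the numeric one does not.
        System.out.println(compare("1.4.10", "1.4.2") > 0);
    }
}
```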

see :
http://en.wikipedia.org/wiki/Software_versioning

Apache Incubator

The Incubator project is the entry path into The Apache Software Foundation (ASF) for projects and codebases wishing to become part of the Foundation's efforts. All code donations from external organisations and existing external projects wishing to join Apache enter through the Incubator.
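The Apache Incubator has two primary goals:
  • Ensure all donations are in accordance with the ASF legal standards.
  • Develop new communities that adhere to the ASF's guiding principles.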

see :
http://incubator.apache.org/

Spring Roo

Spring Roo is an open source software tool that uses convention-over-configuration principles to provide rapid application development of Java-based enterprise software. The resulting applications use common Java technologies such as Spring Framework, Java Persistence API, Java Server Pages, Apache Maven and AspectJ. Spring Roo is a member of the Spring portfolio of projects.

see :
http://en.wikipedia.org/wiki/Spring_Roo
http://www.springsource.org/roo

Convention over configuration

Convention over Configuration (also known as Coding by convention) is a software design paradigm which seeks to decrease the number of decisions that developers need to make, gaining simplicity, but not necessarily losing flexibility.
The phrase essentially means a developer only needs to specify unconventional aspects of the application. For example, if there's a class Sale in the model, the corresponding table in the database is called sales by default. It is only if one deviates from this convention, such as calling the table "products_sold", that one needs to write code regarding these names.
When the convention implemented by the tool you are using matches your desired behavior, you enjoy the benefits without having to write configuration files. When your desired behavior deviates from the implemented convention, then you configure your desired behavior.
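
The Sale/sales example can be sketched as a lookup that falls back to a naming convention unless an explicit mapping has been configured. This is an illustration only, not how any particular framework does it: the class TableNames, its methods and the naive "lower-case the name and add s" pluralisation are all assumptions.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class TableNames {
    // Explicit configuration, needed only where the convention does not fit.
    private final Map<Class<?>, String> overrides = new HashMap<>();

    void configure(Class<?> type, String tableName) {
        overrides.put(type, tableName);
    }

    // Convention: lower-case class name plus "s" (a deliberately naive pluralisation).
    String tableFor(Class<?> type) {
        String configured = overrides.get(type);
        if (configured != null) {
            return configured;
        }
        return type.getSimpleName().toLowerCase(Locale.ROOT) + "s";
    }

    // Hypothetical model class used for the demonstration.
    static class Sale {}

    public static void main(String[] args) {
        TableNames names = new TableNames();
        System.out.println(names.tableFor(Sale.class));
        names.configure(Sale.class, "products_sold");
        System.out.println(names.tableFor(Sale.class));
    }
}
```

Nothing has to be written for Sale until the table name deviates from the convention; only then is the configure call needed.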

see :
http://en.wikipedia.org/wiki/Convention_over_configuration