Wednesday, December 4, 2013

DBMS for a data analysis project.

RDBMS vs. NoSQL
Recently, I embarked on a small data analysis project. The project itself does not store any critical data: it retrieves data from various sources, guides the user through an analysis flow, and finally generates a report. The generated report is stored for future reference. Moreover, there are no scalability or transaction requirements; for the foreseeable future, it will be used by about a dozen people. So what is a good technology stack for this?
The standard technology stack for this would look like this:
  1. MySQL for the database
  2. JPA (Hibernate) for persistence
  3. DAOs to encapsulate JPA
  4. JSF/JSP for server-side web technology
  5. Bootstrap/jQuery/AngularJS for the client side
SOA and AngularJS have also been advocated along the way.
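The DAO layer in item 3 exists so that the rest of the application never talks to the persistence API directly. Below is a minimal sketch of that idea, assuming Python's built-in sqlite3 module as a stand-in for MySQL plus JPA; the class and method names (`Report`, `ReportDao`, `SqliteReportDao`) are hypothetical and only illustrate the pattern.

```python
import sqlite3
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass
class Report:
    """Hypothetical entity: the generated analysis report."""
    report_id: str
    title: str
    body: str


class ReportDao(ABC):
    """Persistence contract seen by the rest of the application."""

    @abstractmethod
    def save(self, report: Report) -> None: ...

    @abstractmethod
    def find(self, report_id: str) -> Optional[Report]: ...


class SqliteReportDao(ReportDao):
    """One possible implementation; sqlite3 stands in for MySQL + JPA."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS report ("
            " report_id TEXT PRIMARY KEY, title TEXT, body TEXT)"
        )

    def save(self, report: Report) -> None:
        # Insert a new row or overwrite the existing one; the `with` block
        # commits the transaction on success.
        with self._conn:
            self._conn.execute(
                "INSERT OR REPLACE INTO report VALUES (?, ?, ?)",
                (report.report_id, report.title, report.body),
            )

    def find(self, report_id: str) -> Optional[Report]:
        # Return the report, or None if no row has this id.
        row = self._conn.execute(
            "SELECT report_id, title, body FROM report WHERE report_id = ?",
            (report_id,),
        ).fetchone()
        return Report(*row) if row else None


if __name__ == "__main__":
    dao: ReportDao = SqliteReportDao(sqlite3.connect(":memory:"))
    dao.save(Report("r1", "Q4 analysis", "summary text"))
    print(dao.find("r1"))
```

Callers depend only on `ReportDao`, so swapping the storage engine (for example, to a document database) would mean writing another implementation of the same two methods rather than touching the callers.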

Why use an RDBMS at all? To answer that, we need to understand what an RDBMS is actually good at.
  1. An RDBMS is very good at ad-hoc queries across the whole database once indexes are built. But if you rarely run ad-hoc queries, why use one? For example, in one application that manages research projects, each user only accesses his own projects; the application does not let users search or query all projects by their properties. In this case, we could simply organize data by project and never query across the project table.
  2. An RDBMS is good at structured data. If your data are not well defined or change constantly, you will end up either altering the table structure constantly or storing most of the data as blobs. This is clearly not a situation an RDBMS handles gracefully.
  3. An RDBMS excels at managing real-time transactions.
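To make the contrast in point 1 concrete, here is a minimal sketch using Python's built-in sqlite3 module. The first function is the kind of ad-hoc, cross-project query an RDBMS answers efficiently with an index. The second is the only access pattern the research-project application actually needs: a plain key lookup. The table, columns, and sample data are hypothetical.

```python
import sqlite3

# Hypothetical sample data: (project_id, owner, field, budget)
PROJECTS = [
    ("p1", "alice", "genomics", 120_000),
    ("p2", "bob", "genomics", 80_000),
    ("p3", "alice", "climate", 50_000),
]


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE project (project_id TEXT PRIMARY KEY, owner TEXT,"
        " field TEXT, budget INTEGER)"
    )
    # An index lets the database filter on `field` without scanning every row.
    conn.execute("CREATE INDEX idx_project_field ON project(field)")
    conn.executemany("INSERT INTO project VALUES (?, ?, ?, ?)", PROJECTS)
    return conn


def adhoc_query(conn, field, min_budget):
    """Ad-hoc query across every project: what an RDBMS is good at."""
    rows = conn.execute(
        "SELECT project_id FROM project WHERE field = ? AND budget >= ?"
        " ORDER BY project_id",
        (field, min_budget),
    )
    return [r[0] for r in rows]


# If users only ever open their own project, data can simply be organized
# by project id; no cross-project query is ever issued.
PROJECT_STORE = {pid: {"owner": o, "field": f, "budget": b}
                 for pid, o, f, b in PROJECTS}


def open_project(project_id):
    """Project-scoped access: a single key lookup."""
    return PROJECT_STORE[project_id]


conn = build_db()
print(adhoc_query(conn, "genomics", 100_000))  # ['p1']
print(open_project("p3")["field"])             # climate
```

If the application never calls anything like `adhoc_query`, the index and the relational machinery behind it bring little benefit; a store keyed by project is enough.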

Given all this, NoSQL seems appropriate for the small data analysis project described above. First, we do not store any critical data, and the final result (the report) is never queried intensively. Second, we need to capture any information gathered during the analysis process so that end users can look back at how the analysis was performed. Clearly, this "any information" does not have a well-defined structure, and it can change at any time as the analysis flow is improved or changed. Most likely, it would be serialized and stored in the database as a blob or clob anyway. Third, we have no transaction requirements.
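The second point is where a document model fits naturally. Below is a minimal sketch, assuming each analysis step is recorded as a free-form dictionary: different steps carry different fields, and the whole trail is serialized to one JSON string. That string has the same shape whether it ends up in a relational blob/clob column or as a document in a NoSQL store. The step names, fields, and report id are hypothetical; the `_id` key merely follows CouchDB's document-id convention.

```python
import json
from datetime import datetime, timezone


def record_step(trail, step, **details):
    """Append one analysis step; `details` may hold any fields at all."""
    entry = {"step": step,
             "at": datetime.now(timezone.utc).isoformat()}
    entry.update(details)
    trail.append(entry)


def to_document(report_id, trail):
    """Serialize the report's analysis trail as a single JSON document."""
    return json.dumps({"_id": report_id, "trail": trail}, sort_keys=True)


trail = []
record_step(trail, "fetch", source="survey_db", rows=1200)
record_step(trail, "filter", rule="age >= 18", rows_kept=950)
# A step added later in the flow's life brings new fields; no schema change.
record_step(trail, "chart", kind="histogram", bins=20)

doc = to_document("report-2013-12-04", trail)
restored = json.loads(doc)
print([s["step"] for s in restored["trail"]])  # ['fetch', 'filter', 'chart']
```

In an RDBMS, the same trail would either require a table change every time a step gains a field, or end up as the opaque text column that `to_document` produces.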

Another consideration is AngularJS. Where does the data come from if AngularJS is used? Conceivably, web services (WS) on the server could be designed to deliver data to the browser as JSON. I did some research on NoSQL databases and found CouchDB. CouchDB serves data directly through an HTTP API, and it both stores and serves data in JSON format. If CouchDB is adopted for this project, I can imagine that even a J2EE server would no longer be needed. We would only need HTML + AngularJS, with all dynamic behavior handled on the client side. What a nice, lightweight solution!
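To show what "serves data directly through an HTTP API" means in practice, here is a minimal sketch that builds the two basic CouchDB document requests with Python's standard urllib: `PUT /{db}/{docid}` to create a JSON document and `GET /{db}/{docid}` to read it back. The host, the default port 5984, the database name `reports`, and the document id are assumptions for illustration. The requests are only constructed here; `send` would need a running CouchDB server.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumptions: a local CouchDB on its default port and a database named
# "reports"; both are placeholders for illustration.
COUCH = "http://localhost:5984"
DB = "reports"


def doc_url(doc_id):
    # Escape the id so characters like spaces or "/" stay inside one path segment.
    return f"{COUCH}/{DB}/{quote(doc_id, safe='')}"


def put_document(doc_id, doc):
    """Build (but do not send) a request that creates a JSON document."""
    return urllib.request.Request(
        doc_url(doc_id),
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def get_document(doc_id):
    """Build (but do not send) a request that fetches a document as JSON."""
    return urllib.request.Request(
        doc_url(doc_id),
        headers={"Accept": "application/json"},
        method="GET",
    )


def send(req):
    """Send a request and decode CouchDB's JSON reply (needs a live server)."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


req = put_document("report-1", {"title": "Q4 analysis", "steps": 3})
print(req.get_method(), req.full_url)
# PUT http://localhost:5984/reports/report-1
```

In the AngularJS-only design, the browser would issue these same GET/PUT calls itself (for example through AngularJS's `$http` service), which is what removes the need for a J2EE tier. A page served from a different origin than CouchDB would also need CouchDB's CORS support enabled, and updating an existing document additionally requires sending its current `_rev`.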
