Detailed Description

Apruary 7, 2015

First, we need to visit a relational database provided by the user and know the structure of the database through SQL queries. The main method can be found in mySQL.java. After that, we can know the database, table, column, and record information. Meanwhile, we also need to know the Foreign Key (FK) in the database, so we the FK.java are used to find the Foreign Key.

After we have the access to the relational database, we need to import it into Neo4j, a graphic database system. We then rebuild the data structure into Neo4j while our basic assumption is that there is no circle in the nodes of database. The main methods can be found in SQL2neo4j.java, which is the main class to implement the methods in Neo4j.java.

The following step is to search the data according to the query. This function is realized in KeywordSearchDao.java.

For the joined columns, we need to find the shortest path with minimum cost value among tables. To implement this method, we use the Steiner Tree algorithm. This function is implement in Join.java, which is the main function, Kruskai.java or SteineerTree.java used to find the shortest path between two nodes.

The following shows the major terms:

Node Property
Database value(database name), type(database)
Record value(record value), type(record)
Column value(column name), type(record), parent (table name)
Table value(table name), type(table), parent (database)
Relational Label Discription Property
Belong to Record - (belong to) - Column
Column - (belong to) -Table
Table - (belong to) - Database
Relation type:
column-table, record-column, table-database
Link In the same database, if 2 columns have FK constraint.
In 2 different database, if 2 columns have the same name.
RelationType:
PK-FK(same database column)
SameNameCol(different database)
Path If 2 columns can be joined together, there is a path between 2 tables. RelationType:
table.columnname
cost:
same database=1
different database=2

Our project is working with MetaStore, Query (PrestoDB) and Visualization projects. We allow users to type in more than one keywords and show list of datasets contains these keywords, the data sourc is coming from MetaStore group. Each result item should include datasource name (id), table name, column name where keywords were found. And we also add a button to each search result to show data as a result. By clicking on the button, the query (http://54.174.80.167:7654/Query/{databaseid}/{tablename}) will be sent to Presto project via REST request. The query then be executed and the results we be sent back and showed in the html table. We also provide an access to visualization project.

It is quite easy to start our project, an account of Ubuntu, knowledge of using RESTful API and understanding of Neo4j database are required.

Click on : Preview this instruction

Usage: geyword.sh {install|start|rebuild}     

  • intall: install required package and build project     
  • rebuild: build/rebuild projects     
  • start: start server