Detailed Description
First, we need to visit a relational database provided by the user and know the structure of the database through SQL queries. The main method can be found in mySQL.java. After that, we can know the database, table, column, and record information. Meanwhile, we also need to know the Foreign Key (FK) in the database, so we the FK.java are used to find the Foreign Key.
After we have the access to the relational database, we need to import it into Neo4j, a graphic database system. We then rebuild the data structure into Neo4j while our basic assumption is that there is no circle in the nodes of database. The main methods can be found in SQL2neo4j.java, which is the main class to implement the methods in Neo4j.java.
The following step is to search the data according to the query. This function is realized in KeywordSearchDao.java.
For the joined columns, we need to find the shortest path with minimum cost value among tables. To implement this method, we use the Steiner Tree algorithm. This function is implement in Join.java, which is the main function, Kruskai.java or SteineerTree.java used to find the shortest path between two nodes.
The following shows the major terms:
Node | Property |
Database | value(database name), type(database) |
Record | value(record value), type(record) |
Column | value(column name), type(record), parent (table name) |
Table | value(table name), type(table), parent (database) |
Relational Label | Discription | Property |
Belong to | Record - (belong to) - Column Column - (belong to) -Table Table - (belong to) - Database |
Relation type: column-table, record-column, table-database |
Link | In the same database, if 2 columns have FK constraint. In 2 different database, if 2 columns have the same name. |
RelationType: PK-FK(same database column) SameNameCol(different database) |
Path | If 2 columns can be joined together, there is a path between 2 tables. | RelationType: table.columnname cost: same database=1 different database=2 |
Our project is working with MetaStore, Query (PrestoDB) and Visualization projects. We allow users to type in more than one keywords and show list of datasets contains these keywords, the data sourc is coming from MetaStore group. Each result item should include datasource name (id), table name, column name where keywords were found. And we also add a button to each search result to show data as a result. By clicking on the button, the query (http://54.174.80.167:7654/Query/{databaseid}/{tablename}) will be sent to Presto project via REST request. The query then be executed and the results we be sent back and showed in the html table. We also provide an access to visualization project.
It is quite easy to start our project, an account of Ubuntu, knowledge of using RESTful API and understanding of Neo4j database are required.
Click on : Preview this instruction
Usage: geyword.sh {install|start|rebuild}