Geyword

Detailed Description

Apruary 7, 2015

Main parts, algorithm, and not easy concepts

First, we need to visit a relational database provided by the user and know the structure of the database through SQL queries. The main method can be found in mySQL.java. After that, we can know the database, table, column, and record information. Meanwhile, we also need to know the Foreign Key (FK) in the database, so we the FK.java are used to find the Foreign Key.

After we have the access to the relational database, we need to import it into Neo4j, a graphic database system. We then rebuild the data structure into Neo4j while our basic assumption is that there is no circle in the nodes of database. The main methods can be found in SQL2neo4j.java, which is the main class to implement the methods in Neo4j.java.

The following step is to search the data according to the query. This function is realized in KeywordSearchDao.java.

For the joined columns, we need to find the shortest path with minimum cost value among tables. To implement this method, we use the Steiner Tree algorithm. This function is implement in Join.java, which is the main function, Kruskai.java or SteineerTree.java used to find the shortest path between two nodes.

The following shows the major terms:

Node	Property
Database	value(database name), type(database)
Record	value(record value), type(record)
Column	value(column name), type(record), parent (table name)
Table	value(table name), type(table), parent (database)

Relational Label	Discription	Property
Belong to	Record - (belong to) - Column Column - (belong to) -Table Table - (belong to) - Database	Relation type: column-table, record-column, table-database
Link	In the same database, if 2 columns have FK constraint. In 2 different database, if 2 columns have the same name.	RelationType: PK-FK(same database column) SameNameCol(different database)
Path	If 2 columns can be joined together, there is a path between 2 tables.	RelationType: table.columnname cost: same database=1 different database=2

How this project work with other projects

Our project is working with MetaStore, Query (PrestoDB) and Visualization projects. We allow users to type in more than one keywords and show list of datasets contains these keywords, the data sourc is coming from MetaStore group. Each result item should include datasource name (id), table name, column name where keywords were found. And we also add a button to each search result to show data as a result. By clicking on the button, the query (http://54.174.80.167:7654/Query/{databaseid}/{tablename}) will be sent to Presto project via REST request. The query then be executed and the results we be sent back and showed in the html table. We also provide an access to visualization project. Requirements to start your project

It is quite easy to start our project, an account of Ubuntu, knowledge of using RESTful API and understanding of Neo4j database are required.

How to deploy project on AWS Ubuntu servers

Click on : Preview this instruction

Installation of our project

Usage: geyword.sh {install|start|rebuild}

intall: install required package and build project

rebuild: build/rebuild projects

start: start server