By Richard Loveday

Convert Data From Multiple Tables To RDF Using The SQL Lens

In our blog The fastest way to convert your data to RDF, I went through the suite of DataLens ETL products and walked through using the Static File Lens to convert some sample data to RDF. In this blog I will take the opportunity to show how the SQL Lens can use a SQL query to extract whatever data you want from a database and convert it into RDF.


All DataLens products can be found on the Data-Lens AWS Marketplace home page. The SQL Lens allows users to ingest data from SQL databases with ease; all popular SQL databases are supported via a JDBC connection. Full user documentation is available here.


The Lenses are available as Docker images to be used wherever you wish, but in this blog I will show you how to use the Lens by obtaining it as a product from the AWS Marketplace and running it within AWS.


The sample data is going to be taken from two MySQL tables in a database. The first contains IMDB basic title information and the second contains user ratings information about the titles. The CSV file containing the basic title data to be uploaded to your SQL database of choice can be found here, and the ratings data here.


Basic Title Info


Title Ratings Info
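To make the two source tables concrete, here is a minimal sketch of one plausible shape for them, using SQLite as a lightweight stand-in for MySQL. The column names are assumptions taken from the mapping file used later in this post, so adjust them to match your actual CSV headers.

```python
import sqlite3

# A sketch of the two source tables, using SQLite as a stand-in for
# MySQL. Column names are inferred from the mapping file templates
# later in this post; adjust them to match your own CSV headers.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("""
    CREATE TABLE titles_basics_small (
        tconst TEXT PRIMARY KEY,
        titleType TEXT,
        primaryTitle TEXT,
        originalTitle TEXT,
        startYear TEXT,
        runtimeMinutes INTEGER
    )
""")
cur.execute("""
    CREATE TABLE titles_ratings_small (
        ratings_tconst TEXT PRIMARY KEY,
        averageRating REAL,
        numVotes INTEGER
    )
""")

# Two sample rows matching the output shown at the end of this post.
cur.execute("INSERT INTO titles_basics_small VALUES (?, ?, ?, ?, ?, ?)",
            ("tt0000001", "short", "Carmencita", "Carmencita", "1894", 1))
cur.execute("INSERT INTO titles_ratings_small VALUES (?, ?, ?)",
            ("tt0000001", 5.7, 1834))
conn.commit()

print(cur.execute("SELECT COUNT(*) FROM titles_basics_small").fetchone()[0])  # 1
```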


To help visualise the mapping we will create between the source SQL and target RDF, I have drawn an ontology. This is not a required step for the transformation, but it gives you a document/diagram describing your RDF model that you can discuss and share.


Unified Basic Title And Ratings Model

Once we know how our data is going to be modelled, we can move on to the next part of the process: the creation of a mapping file. The key elements of creating a mapping file are laid out in my first blog on how to convert data to RDF, found here.

The key difference in a SQL mapping file is the logical source.

rml:logicalSource [
  rml:source <#DB_source>;
  rr:sqlVersion rr:SQL2008;
  rr:tableName "titles_basics_small";
];

The above is the logical source used to return all the data from a single table. For our purposes in this blog we need a SQL query that returns data from two tables, so the rr:tableName line is removed and a SQL query put in its place. This is done using rml:query, with the query surrounded in triple double-quotes: """ [query] """. When using your own query, you must then specify the reference formulation to be CSV, as that is the format the returned results will be parsed as. This is done with rml:referenceFormulation ql:CSV. For our mapping file, the logical source with the SQL query added is as below.


rml:logicalSource [
  rml:source <#DB_source>;
  rr:sqlVersion rr:SQL2008;
  rml:query """
  SELECT * FROM titles_basics_small LEFT JOIN titles_ratings_small
  ON titles_basics_small.tconst = titles_ratings_small.ratings_tconst
  """;
  rml:referenceFormulation ql:CSV
];
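As a sanity check on the query itself, here is a minimal sketch of what the LEFT JOIN returns, using SQLite as a stand-in for MySQL; the column names are assumptions based on the mapping file. The point to note is that every title row is kept, with NULLs where no ratings row matches, so unrated titles still produce RDF for their basic fields.

```python
import sqlite3

# A minimal sketch, using SQLite as a stand-in for MySQL; column names
# are assumptions based on the mapping file templates.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE titles_basics_small (tconst TEXT, primaryTitle TEXT)")
cur.execute("CREATE TABLE titles_ratings_small (ratings_tconst TEXT, averageRating REAL)")
cur.executemany("INSERT INTO titles_basics_small VALUES (?, ?)",
                [("tt0000001", "Carmencita"), ("tt0000003", "Pauvre Pierrot")])
cur.execute("INSERT INTO titles_ratings_small VALUES ('tt0000001', 5.7)")

# The left join keeps every title; the ratings-table columns come back
# as NULL (None) for titles with no matching rating row.
rows = cur.execute(
    "SELECT * FROM titles_basics_small LEFT JOIN titles_ratings_small "
    "ON titles_basics_small.tconst = titles_ratings_small.ratings_tconst "
    "ORDER BY titles_basics_small.tconst"
).fetchall()
for row in rows:
    print(row)
# ('tt0000001', 'Carmencita', 'tt0000001', 5.7)
# ('tt0000003', 'Pauvre Pierrot', None, None)
```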

The only other difference to the mapping file is that we also have to add the connection details for our database at the end of the file.


<#DB_source> a d2rq:Database;
    d2rq:jdbcDSN "jdbc:mysql://localhost:3306/imdb";
    d2rq:jdbcDriver "com.mysql.jdbc.Driver";
    d2rq:username "admin";
    d2rq:password "somepassword".

The d2rq:jdbcDSN specifies the URL of your database, and the d2rq:jdbcDriver specifies the JDBC driver for your database type. d2rq:username and d2rq:password allow you to specify your authentication credentials. Full details for creating mapping files, including further JDBC connection options, can be found in the documentation here.


Once the SQL specifics have been added we end up with the following mapping file. To follow the tutorial you will need to amend the <#DB_source> section to contain the connection details for the database you are using. A downloadable copy of the file is also available here. NB: use the downloadable file if following the tutorial rather than copying the below, as stray spaces can be introduced when copying and pasting from blog code snippets!


@prefix rr:  <http://www.w3.org/ns/r2rml#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ql:  <http://semweb.mmlab.be/ns/ql#> .
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix imdb:  <http://www.imdb.com/> .
@prefix d2rq: <http://www.wiwiss.fu-berlin.de/suhl/bizer/D2RQ/0.1#> .
@prefix jdbc: <http://d2rq.org/terms/jdbc/> .
@base <http://www.imdb-lens.co.uk/> .

<TriplesMap1>
  a rr:TriplesMap;
  rml:logicalSource [
    rml:source <#DB_source>;
    rr:sqlVersion rr:SQL2008;
    rml:query """
    SELECT * FROM titles_basics_small LEFT JOIN titles_ratings_small
    ON titles_basics_small.tconst = titles_ratings_small.ratings_tconst
    """;
    rml:referenceFormulation ql:CSV
  ];
  rr:subjectMap [
    rr:template "http://www.imdb.com/{tconst}";
    rr:class imdb:Title
  ];
  rr:predicateObjectMap [
    rr:predicate imdb:hasTitleType;
    rr:objectMap [
      rr:template "http://www.imdb-lens.co.uk/{titleType}";
    ]
  ];
  rr:predicateObjectMap [
    rr:predicate imdb:hasPrimaryTitle;
    rr:objectMap [
      rr:template "{primaryTitle}";
      rr:termType rr:Literal;
      rr:datatype xsd:string;
    ]
  ];
  rr:predicateObjectMap [
    rr:predicate imdb:hasOriginalTitle;
    rr:objectMap [
      rr:template "{originalTitle}";
      rr:termType rr:Literal;
      rr:datatype xsd:string;
    ]
  ];
  rr:predicateObjectMap [
    rr:predicate imdb:hasStartYear;
    rr:objectMap [
      rr:template "{startYear}";
      rr:termType rr:Literal;
      rr:datatype xsd:date;
    ]
  ];
  rr:predicateObjectMap [
    rr:predicate imdb:hasRunTimeMinutes;
    rr:objectMap [
      rr:template "{runtimeMinutes}";
      rr:termType rr:Literal;
      rr:datatype xsd:int;
    ]
  ];
  rr:predicateObjectMap [
    rr:predicate imdb:hasAverageRating;
    rr:objectMap [
      rr:template "{averageRating}";
      rr:termType rr:Literal;
      rr:datatype xsd:double;
    ]
  ];
  rr:predicateObjectMap [
    rr:predicate imdb:hasNumberOfVotes;
    rr:objectMap [
      rr:template "{numVotes}";
      rr:termType rr:Literal;
      rr:datatype xsd:int;
    ]
  ].

<#DB_source> a d2rq:Database;
    d2rq:jdbcDSN "jdbc:mysql://localhost:3306/imdb";
    d2rq:jdbcDriver "com.mysql.jdbc.Driver";
    d2rq:username "admin";
    d2rq:password "somepassword".

With the source data and a mapping file we can now use the SQL Lens to generate our RDF. The diagram below gives a basic overview of how the SQL Lens works.


Typical AWS Architecture


Here we are using AWS infrastructure, but the SQL Lens is a Docker image, so it can be used on whatever cloud or on-premises infrastructure you like. Additionally, in this tutorial we are using a REST call to trigger the Lens, but it could also be triggered by a cron expression supplied as an environment variable input to the Lens.


The Lens runs as a service in Amazon ECS. For security, it runs within a private subnet. The rest of the architectural information on the SQL Lens CloudFormation template and how to run it is essentially the same as in the Static File Lens blog here. The only differences for the SQL Lens are as follows:

  • The URL for the mapping file is wherever you wish to store your amended version of the original file in S3 on your AWS account.

  • There are SQL Limit and SQL Offset parameters in the CloudFormation template which can be used in conjunction with the SQL query in the mapping file, but for the purposes of this blog they can be ignored.

  • The curl command to the SQL Lens's process endpoint to run the data transformation does not require an input parameter: we are getting the data to be transformed from a SQL database rather than an input file, and the connection details for the database are in the mapping file.

  • The product page to access the SQL Lens is here.
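To illustrate that last difference, here is a hypothetical sketch in Python of the process-endpoint call; the host, port, and path are placeholders for your own deployment, and the point is simply that no input-file parameter is passed.

```python
from urllib import request

# Hypothetical endpoint for a running SQL Lens; substitute the host and
# port of your own deployment. Unlike the Static File Lens, no input
# parameter is needed: the Lens reads its source data from the database
# named in the mapping file.
lens_url = "http://your-sql-lens-host:8080/process"

req = request.Request(lens_url, method="GET")
print(req.full_url)  # http://your-sql-lens-host:8080/process

# To actually trigger the transformation, send the request:
# with request.urlopen(req) as resp:
#     print(resp.status, resp.read())
```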

The output N-Quads file should look like this:

<http://www.imdb.com/tt0000001> <http://www.imdb.com/hasAverageRating> "5.7" <http://www.data-lens.co.uk/6bd8adc8-5588-475b-bdcd-e8b850f013eb> .
<http://www.imdb.com/tt0000001> <http://www.imdb.com/hasNumberOfVotes> "1834" <http://www.data-lens.co.uk/6bd8adc8-5588-475b-bdcd-e8b850f013eb> .
<http://www.imdb.com/tt0000001> <http://www.imdb.com/hasOriginalTitle> "Carmencita" <http://www.data-lens.co.uk/6bd8adc8-5588-475b-bdcd-e8b850f013eb> .
<http://www.imdb.com/tt0000001> <http://www.imdb.com/hasPrimaryTitle> "Carmencita" <http://www.data-lens.co.uk/6bd8adc8-5588-475b-bdcd-e8b850f013eb> .
<http://www.imdb.com/tt0000001> <http://www.imdb.com/hasRunTimeMinutes> "1" <http://www.data-lens.co.uk/6bd8adc8-5588-475b-bdcd-e8b850f013eb> .
<http://www.imdb.com/tt0000001> <http://www.imdb.com/hasStartYear> "1894" <http://www.data-lens.co.uk/6bd8adc8-5588-475b-bdcd-e8b850f013eb> .
<http://www.imdb.com/tt0000001> <http://www.imdb.com/hasTitleType> <http://www.imdb-lens.co.uk/short> <http://www.data-lens.co.uk/6bd8adc8-5588-475b-bdcd-e8b850f013eb> .
<http://www.imdb.com/tt0000001> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.imdb.com/Title> <http://www.data-lens.co.uk/6bd8adc8-5588-475b-bdcd-e8b850f013eb> .
<http://www.imdb.com/tt0000002> <http://www.imdb.com/hasAverageRating> "6" <http://www.data-lens.co.uk/6bd8adc8-5588-475b-bdcd-e8b850f013eb> .
<http://www.imdb.com/tt0000002> <http://www.imdb.com/hasNumberOfVotes> "236" <http://www.data-lens.co.uk/6bd8adc8-5588-475b-bdcd-e8b850f013eb> .
<http://www.imdb.com/tt0000002> <http://www.imdb.com/hasOriginalTitle> "Le clown et ses chiens" <http://www.data-lens.co.uk/6bd8adc8-5588-475b-bdcd-e8b850f013eb> .
<http://www.imdb.com/tt0000002> <http://www.imdb.com/hasPrimaryTitle> "Le clown et ses chiens" <http://www.data-lens.co.uk/6bd8adc8-5588-475b-bdcd-e8b850f013eb> .
<http://www.imdb.com/tt0000002> <http://www.imdb.com/hasRunTimeMinutes> "5" <http://www.data-lens.co.uk/6bd8adc8-5588-475b-bdcd-e8b850f013eb> .
<http://www.imdb.com/tt0000002> <http://www.imdb.com/hasStartYear> "1892" <http://www.data-lens.co.uk/6bd8adc8-5588-475b-bdcd-e8b850f013eb> .
<http://www.imdb.com/tt0000002> <http://www.imdb.com/hasTitleType> <http://www.imdb-lens.co.uk/short> <http://www.data-lens.co.uk/6bd8adc8-5588-475b-bdcd-e8b850f013eb> .

If you want to double-check the full output of your file, an example of the output can be found here.
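If you would rather check the output programmatically, a simplified pattern like the one below (not a full N-Quads parser) can confirm that each line has the subject, predicate, object, and graph structure shown above.

```python
import re

# A quick sanity check to run over the output: each N-Quads line should
# contain a subject, predicate, object (IRI or plain literal), and graph,
# terminated by a full stop. This simplified pattern does not cover
# typed or language-tagged literals.
NQUAD = re.compile(
    r'^<[^>]+>\s+<[^>]+>\s+(?:<[^>]+>|"[^"]*")\s+<[^>]+>\s+\.$'
)

line = ('<http://www.imdb.com/tt0000001> <http://www.imdb.com/hasAverageRating> '
        '"5.7" <http://www.data-lens.co.uk/6bd8adc8-5588-475b-bdcd-e8b850f013eb> .')
print(bool(NQUAD.match(line)))  # True
```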

Now you know how to transform data into RDF quickly and easily using the SQL Lens. You can launch as many instances of the SQL Lens, or any of the other Data-Lens products, as you have requirements or data feeds for. Links to the AWS Marketplace product pages for all the Lenses are below, and full documentation for the Data-Lens products can be found here. If you have any queries, feel free to contact us through our website here.

The Structured File Lens can be accessed here.

The SQL Lens can be accessed here.

The RESTful Lens can be accessed here.

The Document Lens can be accessed here.

The Lens Writer can be accessed here.

Happy converting!
