Exercise Sqoop
Install crime-data-la database in MySQL
Copy datasets crime_data_la.csv and crime_data_area_name.csv in /user/ubuntu/crime_la in HDFS.
From the console, copy crime_data_la.csv and crime_data_area_name.csv to the local filesystem using:
hadoop fs -copyToLocal /user/ubuntu/crime_la/crime_data_la.csv
hadoop fs -copyToLocal /user/ubuntu/crime_la/crime_data_area_name.csv
ls -a
Create a new database crime_la in MySQL
mysql -u root -p
CREATE DATABASE crime_la;
USE crime_la;
Create table crime_data_la and area_lookup:
CREATE TABLE crime_data_la
(dr_number INT, date_reported VARCHAR(255), date_occured VARCHAR(255),
tm_occured INT, area_id INT, reporting_district INT, crime_code INT, victime_age INT,
victim_sex VARCHAR(255), victim_descent VARCHAR(255), coord_lat FLOAT, coord_long FLOAT);
alter table crime_data_la add primary key (dr_number);
CREATE TABLE area_lookup (area_id INT, area_name VARCHAR(255));
alter table area_lookup add primary key (area_id);
Load data into crime_la database:
LOAD DATA LOCAL INFILE 'crime_data_la.csv'
INTO TABLE crime_data_la
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
(dr_number, date_reported, date_occured,tm_occured,area_id, reporting_district, crime_code, victime_age,
victim_sex, victim_descent, coord_lat, coord_long);
LOAD DATA LOCAL INFILE 'crime_data_area_name.csv'
INTO TABLE area_lookup
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
(area_id, area_name);
Check the data has been correctly loaded:
SELECT * FROM crime_data_la LIMIT 10;
SELECT * FROM area_lookup;
Grant all privileges on crime_la
GRANT ALL PRIVILEGES
ON crime_la.*
TO 'ubuntu'@'%'
IDENTIFIED BY 'ubuntu';
Exercise 1
- Create the crime_la database in MySQL using the course material.
- Using sqoop, import crime_data_la table from MySQL into HDFS such that fields are separated by a ‘*' and lines are separated by '\n'.
- Null values are represented as -1 for numbers and "NA" for strings.
- Only records with date_reported between 2013-01-15 and 2014-05-27 should be imported.
- Compare date strings using CAST('date_string' AS DATE) or str_to_date( 'date_string' , '%Y-%m-%d’). Either way works.
CAST(date_reported AS DATE)
CAST('2015-05-27' AS DATE)
str_to_date(date_reported,'%Y-%m-%d')
str_to_date('2015-05-27','%Y-%m-%d')
- Use 3 mappers for importing.
- The destination file should be stored as a text file to directory /user/ubuntu/crime_la/export.
Exercise 2
- Create the crime_la database in Hive (i.e. in HUE).
Using sqoop, import area_lookup table from MySQL into crime_la Hive database such that fields are separated by a '|' and lines are separated by '\n'.
Only records with area_id between 5 and 20 should be imported.
Use 2 mappers for importing.
Overwrite the table if it already exists.