SQL Server PolyBase to Twitter data

api

#1

Hello Developers,
I am simply trying to take advantage of some of the new components in SQL Server 2016 & 2017 that allow users to connect to and query big data servers, i.e. Hadoop-type servers like Twitter. This new component of SQL Server is called PolyBase, and it gives the user the ability to use the full T-SQL language to manipulate big data. Wow.

Because this method of connecting to Twitter data is new, there are not many examples of how to connect to the data. Microsoft has supplied a sample on how to connect to Hadoop-type servers, and the user just needs to fill in the blanks. With that said, I can't seem to find the missing Twitter information. Please help.
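One prerequisite worth noting before the sample: PolyBase's Hadoop connectivity has to be enabled on the instance, or none of the statements below will work. A minimal sketch of that configuration step (the @configvalue of 7 is an assumption that targets Hortonworks/Cloudera distributions; pick the value that matches your Hadoop distribution):

-- Enable PolyBase connectivity to an external Hadoop provider.
-- @configvalue selects the Hadoop distribution; 7 is assumed here.
sp_configure @configname = 'hadoop connectivity', @configvalue = 7;
GO
RECONFIGURE;
GO
-- A restart of the SQL Server and PolyBase services is required afterwards.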

Below is the Microsoft sample code and the link to the Microsoft page.

-- 1: Create a database scoped credential.
-- Create a master key on the database. This is required to encrypt the credential secret.

CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'S0me!nfo';

-- 2: Create a database scoped credential for Kerberos-secured Hadoop clusters.
-- IDENTITY: the Kerberos user name.
-- SECRET: the Kerberos password.

CREATE DATABASE SCOPED CREDENTIAL HadoopUser1
WITH IDENTITY = '<hadoop_user_name>', SECRET = '<hadoop_password>';

-- 3: Create an external data source.
-- LOCATION (Required): Hadoop Name Node IP address and port.
-- RESOURCE_MANAGER_LOCATION (Optional): Hadoop Resource Manager location to enable pushdown computation.
-- CREDENTIAL (Optional): the database scoped credential, created above.

CREATE EXTERNAL DATA SOURCE MyHadoopCluster WITH (
    TYPE = HADOOP,
    LOCATION = 'hdfs://10.xxx.xx.xxx:xxxx',
    RESOURCE_MANAGER_LOCATION = '10.xxx.xx.xxx:xxxx',
    CREDENTIAL = HadoopUser1
);

-- 4: Create an external file format.
-- FORMAT_TYPE: Type of format in Hadoop (DELIMITEDTEXT, RCFILE, ORC, PARQUET).
CREATE EXTERNAL FILE FORMAT TextFileFormat WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = '|',
                    USE_TYPE_DEFAULT = TRUE)
);

-- 5: Create an external table pointing to data stored in Hadoop.
-- LOCATION: path to file or directory that contains the data (relative to HDFS root).

CREATE EXTERNAL TABLE [dbo].[CarSensor_Data] (
    [SensorKey] int NOT NULL,
    [CustomerKey] int NOT NULL,
    [GeographyKey] int NULL,
    [Speed] float NOT NULL,
    [YearMeasured] int NOT NULL
)
WITH (LOCATION = '/Demo/',
      DATA_SOURCE = MyHadoopCluster,
      FILE_FORMAT = TextFileFormat
);

-- 6: Create statistics on an external table.
CREATE STATISTICS StatsForSensors ON CarSensor_Data (CustomerKey, Speed);
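For what it's worth, once steps 1 through 6 succeed, the external table can be queried with ordinary T-SQL, which is the payoff PolyBase promises. A hypothetical query against the sample table above (the WHERE clause, grouping, and OPTION hint are my own additions, not part of the Microsoft sample):

-- Query the Hadoop-backed external table exactly like a local table.
-- FORCE EXTERNALPUSHDOWN asks PolyBase to push the predicate down to Hadoop;
-- it requires RESOURCE_MANAGER_LOCATION to be set, as in step 3.
SELECT   [CustomerKey], AVG([Speed]) AS AvgSpeed
FROM     [dbo].[CarSensor_Data]
WHERE    [Speed] > 60
GROUP BY [CustomerKey]
OPTION   (FORCE EXTERNALPUSHDOWN);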


#2

Twitter itself is not a Hadoop service, so I'm not sure exactly what information you believe you need here. You'd need to use the Twitter APIs to access data from the service, and then write additional code to deposit that data into your Hadoop or PolyBase storage.
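In other words, PolyBase only enters the picture after the tweets are already sitting in HDFS. As a rough sketch: if you exported tweets through the Twitter API as pipe-delimited text files into an HDFS directory, an external table over them would follow the same pattern as the CarSensor sample. The path /twitter/ and the column list below are hypothetical, not anything Twitter or Microsoft provides:

-- Hypothetical external table over tweet data previously exported to HDFS.
-- Reuses MyHadoopCluster and TextFileFormat from steps 3 and 4 above.
CREATE EXTERNAL TABLE [dbo].[Twitter_Data] (
    [TweetId]   bigint        NOT NULL,
    [UserName]  nvarchar(50)  NOT NULL,
    [CreatedAt] datetime2     NOT NULL,
    [TweetText] nvarchar(280) NULL
)
WITH (LOCATION = '/twitter/',        -- hypothetical HDFS directory
      DATA_SOURCE = MyHadoopCluster,
      FILE_FORMAT = TextFileFormat
);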