美文网首页
SQL Queries from inside R

SQL Queries from inside R

作者: 腾子_Lony | 来源:发表于2017-08-30 01:33 被阅读0次

    In your life as a data scientist, you'll often be working with huge databases that contain tables with millions of rows. If you want to do some analyses on this data, it's possible that you only need a fraction of this data. In this case, it's a good idea to send SQL queries to your database, and only import the data you actually need into R.

    dbGetQuery() is what you need. As usual, you first pass the connection object to it. The second argument is an SQL query in the form of a character string. This example selects theagevariable from thepeopledataset where gender equals "male":

    dbGetQuery(con, "SELECT age FROM people WHERE gender = 'male'")

    Apart from checking equality, you can also check forless thanandgreater thanrelationships, with, just like in R.

    # Connect to the database

    library(DBI)

    con <- dbConnect(RMySQL::MySQL(),

    dbname = "tweater",

    host = "courses.csrrinzqubik.us-east-1.rds.amazonaws.com",

    port = 3306,

    user = "student",

    password = "datacamp")

    # Import post column of tweats where date is higher than '2015-09-21': latest

    latest<-dbGetQuery(con,"select post from tweats where date>'2015-09-21'")

    # Print latest

    latest

    # Create data frame specific

    specific<-dbGetQuery(con,"select message from comments where tweat_id=77 and user_id>4")

    # Print specific

    specific

    There are also dedicated SQL functions that you can use in theWHEREclause of an SQL query. For example,CHAR_LENGTH() returns the number of characters in a string.

    Of course, SQL does not stop with the the three keywordsSELECT,FROMandWHERE. Another very often used keyword isJOIN, and more specificallyINNER JOIN. Take this call for example:

    SELECT name, post

    FROM users INNER JOIN tweats on users.id = user_id

    WHERE date > "2015-09-19"

    Here, theuserstable is joined with thetweatstable. This is possible because theidcolumn in theuserstable corresponds to theuser_idcolumn in thetweatstable. Also notice howname, from theuserstable, andpostanddate, from thetweatstable, can be referenced to without problems.

    Can you predict the outcome of the following query?

    SELECT post, message

    FROM tweats INNER JOIN comments on tweats.id = tweat_id

    WHERE tweat_id = 77

    You've used dbGetQuery() multiple times now. This is a virtual function from the DBI package, but is actually implemented by the RMySQL package. Behind the scenes, the following steps are performed:

    Sending the specified query with dbSendQuery();

    Fetching the result of executing the query on the database with dbFetch();

    Clearing the result with dbClearResult().

    # Send query to the database

    res <- dbSendQuery(con, "SELECT * FROM comments WHERE user_id > 4")

    # Use dbFetch() twice

    dbFetch(res, n = 2)

    dbFetch(res)

    # Clear res

    dbClearResult(res)

    相关文章

      网友评论

          本文标题:SQL Queries from inside R

          本文链接:https://www.haomeiwen.com/subject/avczdxtx.html