How To Use Lm Function In R

By Sharon Machlis, Executive Editor, Data & Analytics, Computerworld |

About |

The focus here is on data: from R tips to desktop tools to taking a hard look at data claims.

Useful R functions you might not know

tmaptools palette_explorer screen — Screenshot by Sharon Machlis

Almost every R user knows about popular packages like dplyr and ggplot2. But with 10,000+ packages on CRAN and yet more on GitHub, it's not always easy to unearth libraries with great R functions. One of the best ways to find cool, new-to-you R code is to see what other useRs have discovered. So, I'm sharing a few of my discoveries -- and hope you'll share some of yours in return (contact info below).

Choose a ColorBrewer palette from an interactive app. Need a color scheme for a map or app? ColorBrewer is well known as a source for pre-configured palettes, and the RColorBrewer package imports those into R. But it's not always easy to remember what's available. The tmaptools package's palette_explorer creates an interactive application that shows you the possibilities.

First, install tmaptools with install.packages("tmaptools"), then load tmaptools with library("tmaptools") and run palette_explorer() (or, don't load tmaptools and run tmaptools::palette_explorer() ). You'll see all available palettes as in the image above, as well as sliders to adjust options like number of colors. There's also info about basic syntax for using a color scheme below each group of palettes.

palette_explorer also needs the shiny and shinyjs packages installed in order to generate the interactive app.

Create character vectors without quotation marks. It can be a bit annoying to manually turn Firefox, Chrome, Edge, Safari, Internet Explorer, Opera into the c("Firefox", "Chrome", "Edge", "Safari", "InternetExplorer", "Opera") format R needs to use such text as a vector of character strings.

That's what the Hmisc package's Cs function was designed to do. After loading the Hmisc package,

                Cs(Firefox, Chrome, Edge, Safari, InternetExplorer, Opera)

will evaluate the same as

                c("Firefox", "Chrome", "Edge", "Safari", "InternetExplorer", "Opera")

If you've ever manually added quotation marks to a lengthy string of words, you'll appreciate the elegance. Note the lack of a space in InternetExplorer -- spaces will trip up the Cs function.
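To see why bare words can work at all, here's a minimal base R sketch of the same idea. The bare_c() name is hypothetical and this isn't claimed to be how Hmisc implements Cs(); it just captures the unevaluated arguments and converts each one to text.

```r
# Hypothetical helper (not part of Hmisc): turn bare names into strings.
bare_c <- function(...) {
  # substitute() captures the arguments without evaluating them
  args <- as.list(substitute(list(...)))[-1]
  # deparse() turns each captured symbol into its text form
  unname(vapply(args, deparse, character(1)))
}

browsers <- bare_c(Firefox, Chrome, Edge, Safari, InternetExplorer, Opera)
print(browsers)
```

As with Cs(), a space inside a name would be a syntax error, since R parses each argument as code before the function ever sees it.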

RStudio bonus: If you use RStudio, there's another option for sleek vector-string creation. Security pro Bob Rudis created an RStudio add-in that takes selected comma-separated text and adds the necessary quotes and c(). And it can handle spaces. Install it with devtools::install_github("hrbrmstr/hrbraddins") (which means you need the devtools package as well), and you'll see Bare Combine as an option in the RStudio Tools > Addins menu.

You can run it from that Addins menu, but selecting text then leaving your coding window to get to the Tools > Addins menu to select Bare Combine doesn't necessarily feel less cumbersome than typing a few quotation marks. Much better to create a custom keyboard shortcut for the addin.

You can do that by going to Tools > Modify Keyboard Shortcuts. Scroll down until you see Bare Combine in the Addins section -- or search for Bare Combine in the filter box. Double click in the shortcut area and type the keystroke(s) you want to assign to the addin (I used alt-shift-').

Now, any time you want to turn comma-separated plain text into an R vector of character strings, you can highlight the text and apply your keyboard shortcut.

By the way, RStudio add-ins are mostly just plain R. If you'd like having keyboard shortcuts for R tasks like this, it might be worth learning the syntax.

Finally, the datapasta package's vector_paste() offers another unconventional alternative. You can copy a string like Firefox, Chrome, Edge, Safari, Internet Explorer, Opera into your clipboard then run vector_paste(). That's it, just vector_paste(), and it converts your clipboard contents into R code, such as c("Firefox", "Chrome", "Edge", "Safari", "Internet Explorer", "Opera"). This works if there are tabs between the words as well as commas, or if each word is on its own line.

If you'd rather include data in your command, you can use vector_paste() with a syntax such as vector_paste("Firefox, Chrome, Safari, Edge") to generate code such as c("Firefox", "Chrome", "Safari", "Edge"). datapasta has some other great functions, including df_paste(), which will turn a table copied into your clipboard from the Web, Excel, or other source, into code to generate a data frame.

Produce an interactive table with one line of code. Regardless of how much you like and use the command line, sometimes it's still nice to look at a spreadsheet-like table of data to scan, sort and filter. RStudio provides a basic view like this; but for large data sets, I like RStudio's DT package, a wrapper for the DataTables JavaScript library. DT::datatable(mydf) creates an interactive HTML table; DT::datatable(mydf, filter = "top") adds a filter box above each column.

Easy file conversions. rio is one of my favorite R packages. Instead of remembering which functions to use for importing what types of files (read.csv? read.table? read_excel?), rio vastly simplifies the process with one import function for a couple of dozen file formats. As long as the file extension is a format that rio recognizes, it will import accordingly from files such as .csv, .json, .xlsx and .html (tables). Same for rio's export command if you'd like to save to a particular file format. But rio has a third major function: convert, which will import and export in a single step. Have a million-row Excel file you need to save as a CSV? An HTML table you'd like to save as JSON? Use a syntax like convert("myfile.xlsx", "myfile.csv"), where the first argument is your existing file and the second is your desired file with the desired extension, and your file will be created.
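Here's a small sketch of that import / export / convert trio, assuming rio is installed. The file names are made up and written to a temporary directory, and the built-in mtcars data set stands in for your own data.

```r
# Assumes the rio package is installed; file names are illustrative only.
csv_path <- file.path(tempdir(), "cars.csv")
tsv_path <- file.path(tempdir(), "cars.tsv")

# export() picks the output format from the file extension
rio::export(mtcars, csv_path)

# convert() imports the first file and exports it as the second in one step
rio::convert(csv_path, tsv_path)

# import() also picks the reader from the extension
cars_again <- rio::import(tsv_path)
print(dim(cars_again))
```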

Copy and paste from R to your clipboard. rio bonus: You can copy between your clipboard and R with rio. Send some data from a small R variable to your clipboard with export(myRobject, "clipboard"). Importing from the clipboard should work as well, although I've had mixed success with that.

Import large files quickly -- and save space. It recently took close to 30 seconds to read in a large spreadsheet. That's doable once, but annoying when I needed to access it multiple times. For saving space as well as wait time, the fst package was an excellent choice because it offers compression as well as high performance. In my testing, write.fst(mydf, "myfile.fst", 100) -- maximum compression -- was extremely quick -- and the .fst file took about one-third the space of the original spreadsheet.
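A minimal round-trip sketch, assuming the fst package is installed. It uses write_fst() and read_fst(), the underscore-style names in current fst releases, rather than the write.fst() spelling above; the data frame and file path are invented for illustration.

```r
# Assumes the fst package is installed; mydf and the path are illustrative.
mydf <- data.frame(
  id    = 1:1000,
  value = rep(c(1.5, 2.5), 500),
  label = rep(c("a", "b", "c", "d"), 250),
  stringsAsFactors = FALSE
)

fst_path <- file.path(tempdir(), "myfile.fst")

# compress runs from 0 (none) to 100 (maximum compression)
fst::write_fst(mydf, fst_path, compress = 100)

# Read the whole file back as a data frame
mydf_again <- fst::read_fst(fst_path)
print(file.size(fst_path))
```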

Turn a data frame of numbers into one of percents. If you've got a data frame with one column of categories and the rest numbers -- imagine, say, a data frame showing election results by candidate and precinct -- the janitor package's adorn_percentages() will calculate all the percentages for you. You can choose whether the denominator for each percent should be summed by "row", "col" or "all". And, the function automatically assumes the first column has category information and skips it, without you having to manually deal with a non-numeric column.
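Here's a quick sketch with made-up election numbers, assuming janitor is installed. The results data frame and its column names are invented for illustration.

```r
# Assumes the janitor package is installed; the vote counts are invented.
results <- data.frame(
  candidate  = c("Smith", "Jones"),
  precinct_1 = c(30, 70),
  precinct_2 = c(20, 80),
  stringsAsFactors = FALSE
)

# Each precinct column is divided by its own column total;
# the first (category) column is left alone
pct_by_precinct <- janitor::adorn_percentages(results, denominator = "col")
print(pct_by_precinct)

# Each row is divided by that candidate's total across precincts
pct_by_candidate <- janitor::adorn_percentages(results, denominator = "row")
print(pct_by_candidate)
```

The results are proportions (0.3, not 30%), so multiply by 100 or format them if you want display-ready percents.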

janitor has several other handy functions worth knowing. adorn_totals() adds a totals row and/or column to a data frame. get_dupes() will find duplicated rows in a data frame based on one or more columns. And, clean_names() takes column names with spaces and other non-R-friendly characters in them and makes them R-compatible.

table() alternatives. Need to calculate frequencies of variables in a data frame? I like janitor's tabyl() function, which easily creates crosstabs with counts and percents and returns a data frame.

In addition, janitor's tabyl() can be used instead of base R's table(), helpfully returning a conventional data frame with counts and percents.
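A short sketch of both uses, assuming janitor is installed and using the built-in mtcars data set as a stand-in:

```r
# Assumes the janitor package is installed.
# One variable: a data frame with the value, a count (n) and a percent column
cyl_counts <- janitor::tabyl(mtcars, cyl)
print(cyl_counts)

# Two variables: a crosstab with cyl as rows and one column per gear value
cyl_by_gear <- janitor::tabyl(mtcars, cyl, gear)
print(cyl_by_gear)
```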

A few additional favorite functions from readers and social media:

"I'm a huge fan of xtabs()" for crosstabs, Timothy Teravainen posted at Google+. "It's in base R, but I sadly went years without knowing about it."

The format is xtabs(~df$col1 + df$col2), which will return a frequency table with col1 as the rows and col2 as the columns.
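For instance, with the built-in mtcars data set (my choice of example data, not Teravainen's), the same formula can be written with a data argument instead of repeating df$:

```r
# Base R only: cyl becomes the rows, gear becomes the columns
tab <- xtabs(~ cyl + gear, data = mtcars)
print(tab)

# The df$col form from the text gives the same counts
tab2 <- xtabs(~ mtcars$cyl + mtcars$gear)
```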

More with quotes. In response to the Cs() function that adds quotes, Kwan Lowe touted the usefulness of noquote(), which strips quotes -- useful for importing certain types of data into R. noquote() is a base R function, aimed at making it easier to wrangle variables.
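A tiny illustration in base R. Note that noquote() changes how a character vector prints; the underlying values are still strings.

```r
browsers <- c("Firefox", "Chrome", "Edge")

# Prints with quotation marks
print(browsers)

# Prints the same values without quotation marks
unquoted <- noquote(browsers)
print(unquoted)
```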

Un-factoring factors. Another useful function: unfactor() in the varhandle package, which aims to detect the "real" class of an R data frame column of factors and then turn it into either numeric or character variables.
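To show the problem that kind of helper addresses, here's a base R sketch (it doesn't use varhandle itself). Calling as.numeric() directly on a factor returns the internal level codes, not the numbers you see printed.

```r
f <- factor(c("10", "20", "10"))

# Pitfall: this returns the level codes, not the values
codes <- as.numeric(f)
print(codes)

# Manual fix: go through character first
values <- as.numeric(as.character(f))
print(values)
```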

Text searching. If you've been using regular expressions to search for text that starts or ends with a certain character string, there's an easier way. "startsWith() and endsWith() -- did I actually not know these?" tweeted data scientist Jonathan Carroll. "That's it, I'm sitting down and reading through dox for every #rstats function."
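Both are base R and vectorized, so they slot straight into subsetting. The file names here are invented for illustration.

```r
files <- c("report_2018.csv", "report_2017.xlsx", "notes.txt")

# Logical vector: which names start with "report"?
is_report <- startsWith(files, "report")

# Logical vector: which names end with ".csv"?
is_csv <- endsWith(files, ".csv")

# Combine them to pick out the CSV reports
print(files[is_report & is_csv])
```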

Loading packages -- and auto-installing if they're not present. For reproducible research, an R script can't just load external packages -- it's got to check whether those packages are installed on the user's machine and install them if they're not. There are several ways to do this in base R, such as using require() to check if various packages load and then installing the packages if they're not. The pacman package simplifies this immensely. To load packages and install them from CRAN if not available, the syntax is: p_load("package1", "package2", "package3"). There's also a p_load_gh() version for packages on GitHub. Thanks to Twitter user @Himmie_He for the tip.
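For comparison, here's one way the base R approach might look. The load_or_install() name is hypothetical, and the sketch uses requireNamespace() for the check instead of require(); the install step assumes a CRAN mirror is configured.

```r
# Hypothetical helper: attach each package, installing from CRAN first if needed.
load_or_install <- function(pkgs) {
  for (pkg in pkgs) {
    # requireNamespace() returns FALSE if the package isn't installed
    if (!requireNamespace(pkg, quietly = TRUE)) {
      install.packages(pkg)
    }
    # character.only = TRUE lets library() take the name as a string
    library(pkg, character.only = TRUE)
  }
  invisible(pkgs)
}

# "stats" and "utils" ship with R, so nothing gets installed here
load_or_install(c("stats", "utils"))
```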

Identifying your project's home directory. The here package's here() function finds the working directory for a current R project. This is especially handy for RStudio projects when a) your code needs to access other directories and b) you'd like that code to work on other systems with a different directory structure. Thanks to Jenny Bryan and Hadley Wickham for that info via Twitter.

Get minimum and maximum values with a single command. Need to find the minimum and maximum values in a vector? Base R's range() function does just that, returning a 2-value vector with lowest and highest values. The help file says range() works on numeric and character values, but I've also had success using it with date objects.
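A quick check of both cases in base R, with values invented for illustration:

```r
# Numeric vector: lowest value first, highest second
scores <- c(5, 1, 9, 3)
score_range <- range(scores)
print(score_range)

# Date objects work the same way in my experience
dates <- as.Date(c("2018-03-01", "2017-12-25", "2018-01-15"))
date_range <- range(dates)
print(date_range)
```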

Extract or operate on items in a list that are several layers deep. This is especially useful if you're working with XML or JSON data imported into R, or you'd like to operate on multiple data frames but keep them separate. For example, this task tweeted by @netzstreuner asking if there was a better way to add a column to each data frame in a list of identically structured data frames:

R question on Twitter -- Question from @netzstreuner on Twitter about operating on a specific column in every data frame inside a list

The answer: purrr's modify_depth() function. modify_depth(mylist, 2, ~ myfunction) will run myfunction() on every item in mylist at the second level of that list.

That's for a generic list. Specifically for this question involving a list of data frames, dplyr's mutate() can add a new column to one data frame. To do this for a list of data frames, you can combine mutate() and modify_depth(). Here's my proposed solution to @netzstreuner's question:

                ll_edited <- modify_depth(ll, 2, ~ mutate(.x, b = a %% 2 == 0))

That code says: "For every item two levels deep in the list ll, add a column b computing if the value in column a is divisible by 2 with no remainder."
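To try this yourself, here's a self-contained sketch, assuming purrr and dplyr are installed. The structure of ll is my guess at a suitable test case (a list of lists of data frames, each with a column a), not the data from the original tweet.

```r
# Assumes purrr and dplyr are installed; ll is invented test data.
ll <- list(
  group1 = list(
    df1 = data.frame(a = 1:4),
    df2 = data.frame(a = 5:6)
  ),
  group2 = list(
    df3 = data.frame(a = 7:9)
  )
)

# Depth 2 reaches each data frame; .x stands for the current data frame
ll_edited <- purrr::modify_depth(
  ll, 2,
  ~ dplyr::mutate(.x, b = a %% 2 == 0)
)

print(ll_edited$group1$df1)
```

If your data frames sit directly in a flat list, they are only one level deep, so purrr::map() or base R's lapply() with the same mutate() call would be enough.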

Easily filter a list. dplyr::filter() is a super-easy way to filter data frames. Have you ever wanted something like that for lists? Check out the rlist package's list.filter() function, which uses the syntax list.filter(mylist, mycondition) such as the package's example of list.filter(x, type == "B").
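A small sketch of that call, assuming rlist is installed. The list x below is invented to fit the type == "B" condition; it isn't the package's own example data.

```r
# Assumes the rlist package is installed; x is invented test data.
x <- list(
  p1 = list(type = "A", score = 10),
  p2 = list(type = "B", score = 9),
  p3 = list(type = "B", score = 8)
)

# Keep only the elements whose type field equals "B"
type_b <- rlist::list.filter(x, type == "B")
print(names(type_b))
```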

Get a number from a string. Have character strings that should be numbers? readr's parse_number() can handle formats such as parse_number("#3") and parse_number("1,012"). Columbia University stats lecturer Joyce Robbins noted on Twitter that you just want to be careful about negative numbers with certain formats. readr includes other handy parse_ functions, such as parse_time("4:30 pm").
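Here are the two examples above as runnable code, assuming readr is installed and using its default locale, where the comma is treated as a grouping mark:

```r
# Assumes the readr package is installed.
raw <- c("#3", "1,012")

# Non-numeric characters around the number are dropped
nums <- readr::parse_number(raw)
print(nums)
```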

Preview an R Markdown document each time you save. "Just a friendly reminder that xaringan:::inf_mr() works on any Rmd, and allows you to **live** preview your RMarkdown in the Viewer," data scientist Colin Fay tweeted. And that is indeed the case. Each time you save, a document will be re-generated automatically without specifically needing to knit or render.

Check user input when writing a function. Base R's match.arg() lets you supply a vector of allowed values for an argument, so users are told they've entered something that won't work instead of getting a more generic error message. That tip comes from Irene Steves' FUNctional programming tricks in httr tweeted by @dataandme.
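Here's a minimal sketch of the pattern in base R. The summarize_values() function and its method argument are hypothetical; the point is that the allowed values live in the argument's default.

```r
# Hypothetical function: method must be one of the values listed in the default
summarize_values <- function(x, method = c("mean", "median")) {
  # match.arg() picks the first value if none is given,
  # and stops with an error listing the choices if the value isn't allowed
  method <- match.arg(method)
  switch(method,
    mean   = mean(x),
    median = median(x)
  )
}

print(summarize_values(c(1, 2, 9)))            # uses "mean"
print(summarize_values(c(1, 2, 9), "median"))

# An unsupported value is rejected before any calculation runs
bad_call <- try(summarize_values(c(1, 2, 9), "mode"), silent = TRUE)
```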

Want to share your own favorites? Tell me via Twitter @sharon000 or email at sharon_machlis@idg.com.

For more on useful R functions, see Great R packages for data import, wrangling and visualization.

Sharon Machlis is Executive Editor, Data & Analytics at IDG, where she works on data analysis and in-house editor tools in addition to writing and editing. Her book Practical R for Mass Communication and Journalism was published in December 2018.