Channel: R Programming Archives - Mark Niemann-Ross

Party Buzz Kill: Data Storage

I’m at this party where Bob and Marsha and I are discussing the best languages for programming a Raspberry Pi. Bob advocates for Python; Marsha is a devout student of C. I’m defending my use of R. After all, Raspberry Pi starts with R. We have chased all the other guests out of the room with our conversation.

“With R, I have all sorts of built-in data management,” I say. “Manipulating matrices is in R’s basic DNA.”

Steve wanders in from the other room and joins our conversation. “Matrices aren’t a proper data strategy. You should be using a database. You can run SQLite on a Raspberry Pi with hardly any effort.”

Bob and Marsha simultaneously turn to stare me down. They are curious about how I’m going to get around this supposition.

“Sure. SQL with R, in particular SQLite, would have been easy to implement,” I pontificate. “Just call up RSQLite, push a few buttons, and Bob’s your uncle.”

“And that’s not what you did?” Steve is incredulous.

“I store the R object on disk and pull it into memory when I need it.”

“What kind of knucklehead stores data as a file on disk?”

I resist reminding Steve that SQLite stores data as a file on disk.

“Let me explain my code philosophy,” I tell Steve. Marsha and Bob roll their eyes; they have already heard this diatribe.

“My time is way more important than code execution speed. I’m not doing signal processing or high-speed computation. So I write the easiest code possible. The easiest for me.”

“In this case, I’m tracking eight observations (rows) of 366 variables (columns). Each column represents one day of the year. Using 366 columns accounts for leap years.”

I grab a napkin and sketch out the data object…

> str(waterByZone)
 num [1:8, 1:366] 0 0 0 0 0 0 0 0 0 0 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:8] "rainfall" "neededInFront" "neededInRear" "wateredInFront" ...
  ..$ : NULL

> rownames(waterByZone)
[1] "rainfall"              "neededInFront"         "neededInRear"         
[4] "wateredInFront"        "wateredInRear"         "secondsWateredInFront"
[7] "secondsWateredInRear"  "evapotranspiration"

Okay…I am taking some liberties with this narrative. Nobody sketches out a data object on a napkin at a party. But it reads well. Ride along with me.
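
For readers without a napkin handy, here is a minimal sketch of how a matrix with that shape could be created in base R. The row names are copied from the output above; the variable name zoneRows and the choice to start with all zeros are assumptions for illustration, not necessarily how the original object was built.

```r
# Row names taken from the rownames() output shown above
zoneRows <- c("rainfall", "neededInFront", "neededInRear",
              "wateredInFront", "wateredInRear",
              "secondsWateredInFront", "secondsWateredInRear",
              "evapotranspiration")

# Assumption: start with zeros; one column per day of the year (366 for leap years)
waterByZone <- matrix(0,
                      nrow = length(zoneRows),
                      ncol = 366,
                      dimnames = list(zoneRows, NULL))

# Inspect the structure; it should have 8 named rows and 366 unnamed columns
str(waterByZone)
```

Leaving the column names as NULL keeps the column position itself as the index, so a day-of-year number can be used directly as a subscript.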

“The data I’m using for the irrigation system is a simple matrix. It updates once a day. The index is simply the day of the year.”

> yearDay <- as.POSIXlt(Sys.Date())$yday + 1
> yearDay
[1] 86
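
The `+ 1` matters because `$yday` counts from zero, while R matrices are indexed from one. A small sketch of the same calculation wrapped in a helper makes the edge cases visible; the function name dayOfYear and the sample dates are hypothetical, chosen only to show the first and last columns.

```r
# Hypothetical helper: convert a date to a 1-based day-of-year index
dayOfYear <- function(d) {
  as.POSIXlt(as.Date(d))$yday + 1
}

firstDay    <- dayOfYear("2024-01-01")  # first column of the matrix
lastLeapDay <- dayOfYear("2024-12-31")  # 2024 is a leap year, so this uses column 366
lastDay     <- dayOfYear("2023-12-31")  # in a non-leap year, column 366 stays unused

c(firstDay, lastLeapDay, lastDay)
```

In a non-leap year the 366th column simply never gets touched, which is the price of keeping the index this simple.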

To access today’s rainfall, I subset the matrix:

# grab the matrix from disk
waterByZone <- readRDS("sprinklR/waterByZone.RDS")

# get today's rainfall
waterByZone["rainfall",yearDay]

# If I update waterByZone, save back to disk
saveRDS(waterByZone, "sprinklR/waterByZone.RDS")
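
The whole read, update, save cycle can be tried end to end with a throwaway file. This is only a sketch under stated assumptions: the temporary path, the two-row stand-in matrix, and the rainfall value of 0.25 are all made up for illustration and are not taken from the sprinklR project.

```r
# Assumption: a temporary file stands in for "sprinklR/waterByZone.RDS"
rdsPath <- file.path(tempdir(), "waterByZone.RDS")

# Assumption: a small stand-in matrix with two of the row names from above
waterByZone <- matrix(0, nrow = 2, ncol = 366,
                      dimnames = list(c("rainfall", "evapotranspiration"), NULL))
saveRDS(waterByZone, rdsPath)

# Today's column index, computed the same way as in the post
yearDay <- as.POSIXlt(Sys.Date())$yday + 1

# Read the object, change one cell, and write it back
stored <- readRDS(rdsPath)
stored["rainfall", yearDay] <- 0.25   # made-up rainfall value
saveRDS(stored, rdsPath)

# Reading the file again confirms the update survived the round trip
reloaded <- readRDS(rdsPath)
reloaded["rainfall", yearDay]
```

Because saveRDS() serializes the whole object, row names and dimensions come back exactly as they were saved, with no schema to define.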

“Why not use the tools R already provides?” I say.

Steve is skeptical. Marsha and Bob grab their napkins and pencils. I can see we are in for a debate.

How about you? If you were at my imaginary little party, what thoughts would you share?

By the way, did you know I teach R for LinkedIn Learning and Educative?

The post Party Buzz Kill: Data Storage appeared first on Mark Niemann-Ross.

