2054: "Data Pipeline"

Hiferator
Posts: 82
Joined: Fri Feb 15, 2013 8:23 am UTC

2054: "Data Pipeline"

Postby Hiferator » Wed Oct 03, 2018 3:25 pm UTC

Image
Title text: "Is the pipeline literally running from your laptop?" "Don't be silly, my laptop disconnects far too often to host a service we rely on. It's running on my phone."

Is this a reference to some specific example?

(Created with chridd's xkcd thread formatter.)

pebkac
Posts: 7
Joined: Mon Dec 05, 2016 8:50 pm UTC

Re: 2054: "Data Pipeline"

Postby pebkac » Wed Oct 03, 2018 3:45 pm UTC

I am a data warehouse administrator, and I approve of this message.

alanbbent
Posts: 42
Joined: Sat Nov 06, 2010 5:46 pm UTC

Re: 2054: "Data Pipeline"

Postby alanbbent » Wed Oct 03, 2018 4:01 pm UTC

Ouch, this one is pointed. I'm always the one saying "I can automate the collection and parsing of that data!"

In my defense, I never intend for those scripts to be used by anyone but me. Because yeah, web sites and servers change, and the script needs to be updated. I'm just... the guy who is in charge of getting that data, and I automate it. And even if the script needs fixing once or twice a year, I still think it's way better than throwing my hands up and saying "We shouldn't try to automate this! Sometimes inputs change and it breaks the automation!"

I guess I've never run into anyone like Ponytail who points out how fragile the whole idea is. I'd probably look a lot like panel 3 if I did.

HES
Posts: 4875
Joined: Fri May 10, 2013 7:13 pm UTC
Location: England

Re: 2054: "Data Pipeline"

Postby HES » Wed Oct 03, 2018 4:15 pm UTC

alanbbent wrote:In my defense, I never intend for those scripts to be used by anyone but me. Because yeah, web sites and servers change, and the script needs to be updated. I'm just... the guy who is in charge of getting that data, and I automate it. And even if the script needs fixing once or twice a year, I still think it's way better than throwing my hands up and saying "We shouldn't try to automate this! Sometimes inputs change and it breaks the automation!"

I mean, as long as you're within the bounds of https://xkcd.com/1205/ and not https://xkcd.com/1319/ , then what could go wrong?
He/Him/His

rmsgrey
Posts: 3477
Joined: Wed Nov 16, 2011 6:35 pm UTC

Re: 2054: "Data Pipeline"

Postby rmsgrey » Wed Oct 03, 2018 4:23 pm UTC

So long as you plan for the scripts to break periodically, there's nothing wrong with the basic idea. The trick is planning for them to break and keeping the conglomeration robust rather than having everything collapse when something weird comes in...
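
A minimal sketch of that idea, assuming a made-up `name,value` record format and a hypothetical `parse_record` step (neither comes from any real pipeline): each record is handled on its own, and anything weird is set aside together with its error instead of taking the whole run down.

```python
def parse_record(line):
    """Hypothetical parser: expects 'name,value' with a numeric value."""
    name, value = line.split(",")
    return name.strip(), float(value)


def run_stage(lines):
    """Parse every line; quarantine failures instead of aborting the run."""
    good, quarantined = [], []
    for line in lines:
        try:
            good.append(parse_record(line))
        except ValueError as err:
            # Keep the raw input and the reason so a human can fix it later.
            quarantined.append((line, str(err)))
    return good, quarantined


good, quarantined = run_stage(["a,1.5", "b,oops", "no comma here", "c,2"])
print(good)
print(len(quarantined), "records need attention")
```

The two malformed lines end up in `quarantined`, and the two good ones still make it through.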

richP
Posts: 190
Joined: Wed Aug 17, 2011 3:28 pm UTC

Re: 2054: "Data Pipeline"

Postby richP » Wed Oct 03, 2018 4:42 pm UTC

Hiferator wrote:...
Is this a reference to some specific example?

(Created with chridd's xkcd thread formatter.)

No universal example, but most of us have an example in our past.
I always thought of the process as less "house of cards" and more "meat grinder to sausage stuffer". Of course, my process usually involved grep, sed, awk, and maybe a Perl one-liner if things got really hairy.

cellocgw
Posts: 1954
Joined: Sat Jun 21, 2008 7:40 pm UTC

Re: 2054: "Data Pipeline"

Postby cellocgw » Wed Oct 03, 2018 5:01 pm UTC

richP wrote:
Hiferator wrote:...
Is this a reference to some specific example?

No universal example, but most of us have an example in our past.
I always thought of the process as less "house of cards" and more "meat grinder to sausage stuffer". Of course, my process usually involved grep, sed, awk, and maybe a Perl one-liner if things got really hairy.


Contest Time!

Write a one-line shell command (80 char or less) using all of grep, sed, and awk that actually does something recognizable.

Prizes will be given on the basis of both style and output. Points taken off if the output is useful. Judges' determination of "useful" is final.
https://app.box.com/witthoftresume
Former OTTer
Vote cellocgw for President 2020. #ScienceintheWhiteHouse http://cellocgw.wordpress.com
"The Planck length is 3.81779e-33 picas." -- keithl
" Earth weighs almost exactly π milliJupiters" -- what-if #146, note 7

ucim
Posts: 6587
Joined: Fri Sep 28, 2012 3:23 pm UTC
Location: The One True Thread

Re: 2054: "Data Pipeline"

Postby ucim » Wed Oct 03, 2018 5:06 pm UTC

cellocgw wrote: Points taken off if the output is useful.
Is it useful to win points?

Jose
Order of the Sillies, Honoris Causam - bestowed by charlie_grumbles on NP 859 * OTTscar winner: Wordsmith - bestowed by yappobiscuts and the OTT on NP 1832 * Ecclesiastical Calendar of the Order of the Holy Contradiction * Please help addams if you can. She needs all of us.

Sableagle
Ormurinn's Alt
Posts: 1935
Joined: Sat Jun 13, 2015 4:26 pm UTC
Location: The wrong side of the mirror

Re: 2054: "Data Pipeline"

Postby Sableagle » Wed Oct 03, 2018 5:18 pm UTC

rmsgrey wrote:So long as you plan for the scripts to break periodically, there's nothing wrong with the basic idea. The trick is planning for them to break and keeping the conglomeration robust rather than having everything collapse when something weird comes in...

A supermarket chain over here failed to include "check the input makes sense" lines and someone added a new product to their system with volume 1 litre and mass 350 kg. Fortunately, the part of the system that decided each of these needed assigning its own van to transport it was followed by a part of the system that would let staff combine assigned vanloads indefinitely and pile "hundreds of tonnes" into 1 van.
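
For illustration, a "check the input makes sense" line could be as small as the sketch below. The function name and the threshold are assumptions, not anything from the chain's actual system: the bound is the density of osmium, about 22.6 kg per litre, which is the densest element.

```python
# Assumed bound: osmium, the densest element, is about 22.6 kg per litre,
# so anything denser is almost certainly a data-entry error.
MAX_PLAUSIBLE_DENSITY_KG_PER_L = 22.6


def check_product(volume_l, mass_kg):
    """Return a list of problems; an empty list means the entry looks sane."""
    problems = []
    if volume_l <= 0:
        problems.append("volume must be positive")
    if mass_kg <= 0:
        problems.append("mass must be positive")
    if not problems and mass_kg / volume_l > MAX_PLAUSIBLE_DENSITY_KG_PER_L:
        problems.append("density is higher than any known element")
    return problems


print(check_product(1.0, 350.0))  # the entry from the story
print(check_product(1.0, 1.03))   # roughly a litre of milk
```

The 1 litre / 350 kg entry gets flagged, and the milk-like entry passes.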

Daggerfall had a system under which the mana cost of a spell from any school of magic got lower as the caster's skill in that school got higher. There was a check to make sure the total cost was at least +5, but by maxing out Destruction and always including a mighty firebolt in every custom spell, a player could get a super-awesome shield spell to cost (+380) + (-540) = (-160), which the check then raised to 5 mana.
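
A toy model using the numbers above (not Daggerfall's actual code; the function names are made up) shows why the placement of the check matters: a floor applied to the sum lets one heavily discounted effect subsidise another, whereas a floor applied to each effect first does not.

```python
MIN_COST = 5


def cost_floor_on_total(effect_costs):
    """Floor applied once, after summing: negative parts cancel positive ones."""
    return max(MIN_COST, sum(effect_costs))


def cost_floor_per_effect(effect_costs):
    """Floor applied to each effect before summing: no cross-subsidy."""
    return sum(max(MIN_COST, c) for c in effect_costs)


effects = [380, -540]  # shield, skill-discounted firebolt
print(cost_floor_on_total(effects))
print(cost_floor_per_effect(effects))
```

The first version prices the combined spell at 5 mana, the second at 385.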

Then there was Heartbleed, in which the server code was streamlined by not bothering to check that it had even received the correct length of string in a protocol that sent a string back and forth to make sure the connection was alright. :roll:
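
A toy version of the missing check, assuming a simplified heartbeat in which the client sends a payload plus a claimed length. This is not OpenSSL's code, and `memory` just stands in for whatever happens to sit next to the payload in the server's buffer.

```python
def heartbeat_unchecked(memory, claimed_len):
    """Echo claimed_len bytes from the buffer, trusting the client's number."""
    return memory[:claimed_len]


def heartbeat_checked(memory, payload_len, claimed_len):
    """Refuse to answer when the claimed length exceeds what was received."""
    if claimed_len > payload_len:
        return None  # drop the malformed request without replying
    return memory[:claimed_len]


payload = b"bird"
memory = payload + b"|secret key material"  # payload followed by other data

print(heartbeat_unchecked(memory, 20))
print(heartbeat_checked(memory, len(payload), 20))
print(heartbeat_checked(memory, len(payload), 4))
```

The unchecked version hands back 20 bytes, most of which the client never sent. The checked version returns nothing for the oversized request and echoes only `b"bird"` for the honest one.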
Oh, Willie McBride, it was all done in vain.

Flumble
Yes Man
Posts: 2082
Joined: Sun Aug 05, 2012 9:35 pm UTC

Re: 2054: "Data Pipeline"

Postby Flumble » Wed Oct 03, 2018 6:32 pm UTC

cellocgw wrote:Write a one-line shell command (80 char or less) using all of grep, sed, and awk that actually does something recognizable.

Prizes will be given on the basis of both style and output. Points taken off if the output is useful. Judges' determination of "useful" is final.

Code:

echo "'awk!' sed grep -e 'bit me finger!'"

It's a stylish 42 characters and it does absolutely nothing of interest. :D "Using" awk, grep and sed in the broadest sense possible.

...I have no idea how to use awk. It seems like a heavily outdated language and interpreter that should only exist today to support code that was written and checked 30 years ago.

richP
Posts: 190
Joined: Wed Aug 17, 2011 3:28 pm UTC

Re: 2054: "Data Pipeline"

Postby richP » Wed Oct 03, 2018 6:52 pm UTC

Sableagle wrote:A supermarket chain over here failed to include "check the input makes sense" lines and someone added a new product to their system with volume 1 litre and mass 350 kg.

350 kg/L? Your supermarket sells dark matter? Or liquid black holes?

NotAllThere
Posts: 140
Joined: Fri Aug 06, 2010 12:54 pm UTC

Re: 2054: "Data Pipeline"

Postby NotAllThere » Wed Oct 03, 2018 7:21 pm UTC

First comic I've laughed out loud over for a long time. And there's something I really need to address in my code tomorrow... :oops:
yangosplat wrote:So many amazing quotes, so little room in 300 characters!

Archgeek
Posts: 205
Joined: Wed May 02, 2007 6:00 am UTC
Location: Central US

Re: 2054: "Data Pipeline"

Postby Archgeek » Wed Oct 03, 2018 10:47 pm UTC

richP wrote:
Sableagle wrote:A supermarket chain over here failed to include "check the input makes sense" lines and someone added a new product to their system with volume 1 litre and mass 350 kg.

350 kg/L? Your supermarket sells dark matter? Or liquid black holes?

Nah, even common electron-degenerate matter in white dwarf stars piles in at around a million kg/L. This is still nearly 15.5 times the density of osmium, though, so I'm going to guess they're selling liter containers of very compressed dense gas or highly compressible liquid, if anything can be crushed that hard.
"That big tube down the side was officially called a "systems tunnel", which is aerospace contractor speak for "big tube down the side."

Mikeski
Posts: 1044
Joined: Sun Jan 13, 2008 7:24 am UTC
Location: Minnesota, USA

Re: 2054: "Data Pipeline"

Postby Mikeski » Wed Oct 03, 2018 10:57 pm UTC

richP wrote:
Sableagle wrote:A supermarket chain over here failed to include "check the input makes sense" lines and someone added a new product to their system with volume 1 litre and mass 350 kg.

350 kg/L? Your supermarket sells dark matter? Or liquid black holes?


That's only 18 times the density of tungsten. No unproven dark-matter physics required, just get the delivery van up to 0.998c.

A black hole of that mass would have a volume about 70 orders of magnitude smaller than 1 liter.
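
For anyone who wants to check the arithmetic, here is a rough sketch. The constants are rounded textbook values, and the van calculation assumes only the mass scales with the Lorentz factor, with length contraction ignored. The osmium ratio quoted earlier in the thread is included for comparison.

```python
import math

G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8          # speed of light, m/s
TUNGSTEN = 19.25     # kg per litre
OSMIUM = 22.59       # kg per litre
TARGET = 350.0       # kg per litre, from the supermarket entry

print(TARGET / OSMIUM)    # ratio to osmium
print(TARGET / TUNGSTEN)  # ratio to tungsten

# Speed at which the Lorentz factor equals the tungsten ratio
# (assumes only the mass scales with gamma, ignoring length contraction).
gamma = TARGET / TUNGSTEN
beta = math.sqrt(1 - 1 / gamma**2)
print(beta)

# Volume inside the Schwarzschild radius r = 2GM/c^2 of a 350 kg black hole.
r = 2 * G * TARGET / C**2
volume_litres = (4 / 3) * math.pi * r**3 * 1000  # 1 m^3 = 1000 L
print(math.log10(volume_litres))
```

The ratios come out near 15.5 and 18.2, the speed just above 0.998c, and the base-10 logarithm of the volume between -70 and -69, which fits the "about 70 orders of magnitude" figure.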

Soupspoon
You have done something you shouldn't. Or are about to.
Posts: 3721
Joined: Thu Jan 28, 2016 7:00 pm UTC
Location: 53-1

Re: 2054: "Data Pipeline"

Postby Soupspoon » Thu Oct 04, 2018 12:23 am UTC

Regardless, probably a lot more of it gets sold than needs to be delivered to the store, while every avocado mysteriously evaporates from stock.

(Half the first Google page for that search I just made pointed at Aussies doing it, I had to search down a bit to find the articles from home that I knew existed!)

freezeblade
Posts: 1311
Joined: Fri Aug 24, 2012 5:11 pm UTC
Location: Oakland

Re: 2054: "Data Pipeline"

Postby freezeblade » Thu Oct 04, 2018 12:29 am UTC

Soupspoon wrote:Regardless, probably a lot more of it gets sold than needs to be delivered to the store, while every avocado mysteriously evaporates from stock.

(Half the first Google page for that search I just made pointed at Aussies doing it, I had to search down a bit to find the articles from home that I knew existed!)


I see this commonly in the US, except rung up as "bananas" in my area, which are typically somewhere around 19-29 cents a pound.
Belial wrote:I am not even in the same country code as "the mood for this shit."

fluffysheap
Posts: 38
Joined: Sat Sep 28, 2013 7:53 am UTC

Re: 2054: "Data Pipeline"

Postby fluffysheap » Thu Oct 04, 2018 3:54 am UTC

Mikeski wrote:That's only 18 times the density of tungsten. No unproven dark-matter physics required, just get the delivery van up to 0.998c.

A black hole of that mass would have a volume about 70 orders of magnitude smaller than 1 liter.

I was actually wondering about that. Tungsten, osmium, gold, or whatever conventional heavy materials are much too light, but then exotic materials (at least exotic on Earth) are much too heavy. There doesn't seem to be any physically reasonable material with that density.

Maybe stellar core material? What would have to be fusing to get the right density?

pscottdv
Posts: 67
Joined: Fri Feb 19, 2010 4:32 pm UTC

Re: 2054: "Data Pipeline"

Postby pscottdv » Thu Oct 04, 2018 11:13 am UTC

Hiferator wrote:Image
Title text: "Is the pipeline literally running from your laptop?" "Don't be silly, my laptop disconnects far too often to host a service we rely on. It's running on my phone."

Is this a reference to some specific example?

(Created with chridd's xkcd thread formatter.)


Obviously, he has worked for my company.

pscottdv
Posts: 67
Joined: Fri Feb 19, 2010 4:32 pm UTC

Re: 2054: "Data Pipeline"

Postby pscottdv » Thu Oct 04, 2018 11:19 am UTC

Flumble wrote:
cellocgw wrote:Write a one-line shell command (80 char or less) using all of grep, sed, and awk that actually does something recognizable.

Prizes will be given on the basis of both style and output. Points taken off if the output is useful. Judges' determination of "useful" is final.

Code:

echo "'awk!' sed grep -e 'bit me finger!'"

It's a stylish 42 characters and it does absolutely nothing of interest. :D "Using" awk, grep and sed in the broadest sense possible.

...I have no idea how to use awk. It seems like a heavily outdated language and interpreter that should only exist today to support code that was written and checked 30 years ago.


I don't know. I never seem to get cut to work the way I want. awk seems to "just work"™.

Zamfir
I built a novelty castle, the irony was lost on some.
Posts: 7516
Joined: Wed Aug 27, 2008 2:43 pm UTC
Location: Nederland

Re: 2054: "Data Pipeline"

Postby Zamfir » Thu Oct 04, 2018 3:01 pm UTC

I once stumbled on the following, written by some seemingly sane and highly respected venture capital guy:
The example I often give here is of a VP of Something or Other in a big company who every month downloads data from an internal system into a CSV, imports that into Excel and makes charts, pastes the charts into PowerPoint and makes slides and bullets, and then emails the PPT to 20 people. Tell this person that they could switch to Google Docs and they’ll laugh at you; tell them that they could do it on an iPad and they’ll fall off their chair laughing. But really, that monthly PowerPoint status report should be a live SaaS dashboard that’s always up-to-date, machine learning should trigger alerts for any unexpected and important changes, and the 10 meg email should be a Slack channel. Now ask them again if they want an iPad.


And I thought NOOOOOOOOOOOOO. The VP guy has a working, robust system that he understands. You want to replace it with a black box that breaks the next time someone changes a column name in the CSV?

SuicideJunkie
Posts: 345
Joined: Sun Feb 22, 2015 2:40 pm UTC

Re: 2054: "Data Pipeline"

Postby SuicideJunkie » Thu Oct 04, 2018 3:39 pm UTC

I think the biggest problem my pipeline has had is dealing with IT changes to the network.

The fact that most of the inputs I trust aren't touched by humans helps a lot.
And of the inputs that are touched by humans, the scripts are mostly doing error checking and reporting on the problems they find.

Also, my pipeline is less of a krazy-straw and more of a 50 pack of regular straws filling the second desktop with a rainbow of status colors and countdown timers.

Flumble
Yes Man
Posts: 2082
Joined: Sun Aug 05, 2012 9:35 pm UTC

Re: 2054: "Data Pipeline"

Postby Flumble » Thu Oct 04, 2018 3:44 pm UTC

Zamfir wrote:I once stumbled on the following, written by some seemingly sane and highly respected venture capital guy:
... But really, that monthly PowerPoint status report should be a live SaaS dashboard ...


That part of the quote absolutely makes sense to me: do all the information processing on the server and export as little info as feasible. (and of course only expose that dashboard to the internal network, preferably only to the VP's account) That VP is a walking liability with a detailed CSV on his laptop that he takes to conventions and probably has the password 'welcome1!'.

Zamfir
I built a novelty castle, the irony was lost on some.
Posts: 7516
Joined: Wed Aug 27, 2008 2:43 pm UTC
Location: Nederland

Re: 2054: "Data Pipeline"

Postby Zamfir » Thu Oct 04, 2018 4:09 pm UTC

If it works, that would be fine. But it's going to break, like the comic says. And then it's a black box.

Programming people underestimate this power of Excel: regular people understand it.

cellocgw
Posts: 1954
Joined: Sat Jun 21, 2008 7:40 pm UTC

Re: 2054: "Data Pipeline"

Postby cellocgw » Thu Oct 04, 2018 5:05 pm UTC

Zamfir wrote:If it works, that would be fine. But it's going to break, like the comic says. And then it's a black box.

Programming people underestimate this power of Excel: regular people understand it.


No, regular people think they understand it. In reality, they understand just enough to produce results which look great but are often wrong.
https://app.box.com/witthoftresume
Former OTTer
Vote cellocgw for President 2020. #ScienceintheWhiteHouse http://cellocgw.wordpress.com
"The Planck length is 3.81779e-33 picas." -- keithl
" Earth weighs almost exactly π milliJupiters" -- what-if #146, note 7

Zamfir
I built a novelty castle, the irony was lost on some.
Posts: 7516
Joined: Wed Aug 27, 2008 2:43 pm UTC
Location: Nederland

Re: 2054: "Data Pipeline"

Postby Zamfir » Thu Oct 04, 2018 5:20 pm UTC

Nah, there are lots of people who make competent use of Excel, but who could not write a simple script, let alone some server-based app with all the surrounding complications.

Viqsi
Posts: 4
Joined: Fri Aug 21, 2009 4:11 am UTC

Re: 2054: "Data Pipeline"

Postby Viqsi » Thu Oct 04, 2018 5:30 pm UTC

That's so totally us. Me as the ponytail, my father with the laptop, and our supervisor with the hat. (He doesn't wear a hat, but I do wear my hair in a ponytail. And yes, my father and I are both on the same development team.)

Really, all Laptop Guy has to do is throw in something about how it's software built for folks who know what they're doing rather than for end users (probably with some sort of meretricious reference to the controls of a fighter plane) and the comparison becomes exact.

Old Bruce
Posts: 174
Joined: Tue Jun 28, 2016 2:27 pm UTC

Re: 2054: "Data Pipeline"

Postby Old Bruce » Thu Oct 04, 2018 10:40 pm UTC

SuicideJunkie wrote:... a 50 pack of regular straws filling the second desktop with a rainbow of status colors and countdown timers.

I want to work where you work and I would do crazy amounts of drugs all day.

Tub
Posts: 409
Joined: Wed Jul 27, 2011 3:13 pm UTC

Re: 2054: "Data Pipeline"

Postby Tub » Thu Oct 04, 2018 11:32 pm UTC

cellocgw wrote:Contest Time!

Write a one-line shell command (80 char or less) using all of grep, sed, and awk that actually does something recognizable.

Prizes will be given on the basis of both style and output. Points taken off if the output is useful. Judges' determination of "useful" is final.

Here's one:

Code:

cat .bash_history | grep grep | awk '/awk/' | sed -e 's/sed/sed/;t;d' | sort -u

Purpose: find candidates for submission to this contest
Usefulness: Yields no results on all logins I've tried, so the output clearly isn't useful. Even if there were output, its use is questionable.

CatCube
Posts: 38
Joined: Wed Sep 21, 2011 5:28 pm UTC

Re: 2054: "Data Pipeline"

Postby CatCube » Sat Oct 06, 2018 4:09 am UTC

cellocgw wrote:
Zamfir wrote:If it works, that would be fine. But it's going to break, like the comic says. And then it's a black box.

Programming people underestimate this power of Excel: regular people understand it.


No, regular people think they understand it. In reality, they understand just enough to produce results which look great but are often wrong.


What will implementing this as a software black box fix? If you have a programmer implement a bad model, then now you have a bad model that literally nobody can inspect, but with a shinier interface. Plus, if it becomes apparent that something needs to change, instead of the user being able to do it, now you have to have a programmer do it, who may not be involved in the problem on a day-to-day basis and will have to relearn it. Or you make your shiny interface the lord and master of your organization, and everybody else is reduced to "Computer says no."

I'm a structural engineer, and I'm always pissed off when I have to move my work from Excel to Python (usually because of size limitations), because in Excel I can follow the logic line-by-line in a convenient tabular format, whereas when I have Python chewing on it I have to struggle with the debugger and its rather limited view of the internal state of the program and data to follow what's going on while debugging. (This is separate from dealing with finite element programs, which can make things easy so long as you understand the modeling assumptions--and don't make mistakes subtle enough to lead you astray.)

Python is obviously still better than not doing the task due to Excel's limitations, but it's frustrating compared to being able to see what's going on. At the end of the day, I'm trying to use my computer as a computing machine. That is, I just want it to do the arithmetic bitchwork for me. Excel is great for that.

Zamfir
I built a novelty castle, the irony was lost on some.
Posts: 7516
Joined: Wed Aug 27, 2008 2:43 pm UTC
Location: Nederland

Re: 2054: "Data Pipeline"

Postby Zamfir » Sat Oct 06, 2018 5:28 pm UTC

I'm a structural engineer, and I'm always pissed off when I have to move my work from Excel to Python (usually because of size limitations), because in Excel I can follow the logic line-by-line in a convenient tabular format, where when I have Python chewing on it I have to struggle with the debugger and its rather limited vie

Yeah, I know exactly this problem. I partially tackle this by running Python in Spyder, a MATLAB-like IDE with a variable viewer etc. It encourages MATLAB-style programming where everything and its mother goes to the global scope, for easy inspection. But it's still Python, so it's much easier than in MATLAB to switch to proper limited scoping when needed.

The other aid for me is Pandas, a table-based data library. You can use it for "excel" style work, where you add a new column for every derived variable.

Hafting
Posts: 62
Joined: Thu Feb 24, 2011 11:23 am UTC

Re: 2054: "Data Pipeline"

Postby Hafting » Mon Oct 08, 2018 11:42 am UTC

Tub wrote:Here's one:

Code:

cat .bash_history | grep grep | awk '/awk/' | sed -e 's/sed/sed/;t;d' | sort -u

Purpose: find candidates for submission to this contest
Usefulness: Yields no results on all logins I've tried, so the output clearly isn't useful. Even if there were output, its use is questionable.


A slight modification turns up at least one example:

Code:

history|grep grep | awk '/awk/' | sed -e 's/sed/sed/;t;d' | sort -u

