Jobs Configuration
Jobs in rpeat are controlled through configuration files specified at startup. This is a quick overview of some of the most important options you have to help schedule, organize and manage your tasks, with examples and explanations. Everything is optional, so you can add complexity depending on your needs. We like to start simple, as we are simple people.
This is what a job might look like; each parameter is covered in more detail below:
[
{
// job metadata
Name: the job's name, not yours
Description: a short description about the job (optional)
Comment: an internal comment within config (optional)
Tags: an array of strings
Group: a name grouping jobs in gui
Type: set as "template" to declare jobspec as template
Inherits: specify a template to inherit from
// job specifics
Cmd: the command to run - can be multiple lines
CronStart: a cron-syntax like 0 0 * * 1-5 or @daily
CronEnd: like CronStart, but for when to end a job
CronRestart: like CronStart, but for when to restart a job
Jitter: add a random number of (maximum) seconds to CronStart
ShutdownCmd: graceful shutdown command
ShutdownSig: signal to send for shutdown
// complex dependency triggers
Dependency: trigger based on one or more other jobs
// job parameters
Env: array of "key=value" variables to use
DateEnv: like Env, but for dates
Timezone: timezone to run each job
Calendar: calendar to determine valid dates
CalendarDirs: location of optional calendar files
RequireCal: behavior if calendar doesn't cover current period
Rollback: rollback to date before holiday?
// enabling and visibility
Hold: hold a job on rpeat startup
Disabled: remove job from scheduler, but keep config
Hidden: add job to scheduler, but remove from dashboard
// exception handling
Retry: how many attempts to restart a failing job
RetryWait: duration between retry attempts
MaxDuration: maximum runtime before stopping
AlertActions: how and what should trigger an alert
// logging
TmpDir: where are temp files written
Logging: how should logs be named, created, and rotated
// permissions
User: who is owner of the job
Admin: who else has access
Permissions: what level of access do admins get.
},
...
]
Name, Description, Comment
"Name":"The most important job ever!"
This feels obvious and frankly almost necessary, but it isn't. Choose something descriptive and useful, but don't worry too much about it, as Name is for humans only and you can change it whenever you want. (Internally, everything is tracked using an id called JobUUID that is generated automatically when the job is first started. Do not change this in your config or you will lose history.)
"Description":"Maybe not the most important job _ever_, but it is important to me"
Description is also optional. It helps to convey a bit more detail in the dashboard and in your config, so it is good practice to add one.
"Comment":"FIXME"
Comments are only visible within the config file itself, much like a comment in your code. Since JSON doesn't support comments, we have included a field that does. You do use comments, right?
Type, Inherits
"Type":"template",
"Name":"TradingConfigEurope"
Now things get interesting. Type lets you tell rpeat that this job is actually a template, to be used by other jobs (or other templates) that inherit from it through Inherits. The rest of the job, umm template, is specified as if it were a job, except that when it is time to run, it doesn't get a spot in the scheduler. Multiple templates can be combined by chaining them through the Inherits field.
"Inherits":"TradingConfigEurope"
To use templates, you specify the template you want to inherit from. All fields are copied to this new job, and then anything you specify here overrides them. You can only inherit from one template, but since templates can inherit from other templates (recall they are just jobs internally), you can build very complex multi-level templates to help keep business logic distinct from configuration logic. That, or you use global environment variables. Just kidding, don't use global variables. Ever. Seriously.
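The copy-then-override behavior described above can be sketched in a few lines of Python. This is only an illustration of the idea, not rpeat's implementation: the resolve function, the dictionary layout, and the choice not to pass Type on to inheriting jobs are assumptions made for the example, and every job name other than TradingConfigEurope is hypothetical.

```python
def resolve(name, jobs, _seen=None):
    """Return the effective settings for jobs[name], following Inherits."""
    seen = _seen or set()
    if name in seen:
        raise ValueError(f"circular Inherits chain at {name!r}")
    seen = seen | {name}
    spec = jobs[name]
    parent = spec.get("Inherits")
    # Start from the template's (already resolved) fields, if there is one.
    merged = dict(resolve(parent, jobs, seen)) if parent else {}
    # Anything set on this job overrides what was inherited.
    merged.update(spec)
    # Assumption: being a template is not itself inherited by real jobs.
    if spec.get("Type") != "template":
        merged.pop("Type", None)
    return merged


# Hypothetical config: a base template, a regional template, and a job.
jobs = {
    "TradingConfigBase": {
        "Type": "template", "Name": "TradingConfigBase",
        "Timezone": "UTC", "Retry": 2,
    },
    "TradingConfigEurope": {
        "Type": "template", "Name": "TradingConfigEurope",
        "Inherits": "TradingConfigBase", "Timezone": "Europe/London",
    },
    "EuropeOpen": {
        "Name": "EuropeOpen", "Inherits": "TradingConfigEurope",
        "Cmd": "/bin/bash -c ./open.sh",
    },
}
effective = resolve("EuropeOpen", jobs)
```

Here the job picks up Retry from the base template and Timezone from the regional one, while keeping its own Name and Cmd.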
Tags, Group
<Tags>#nofilter</Tags>
<Tags>#noDAGs</Tags>
<Tags>backup</Tags>
Tags and Group are all about organizing. Or sometimes about calling out your friends. Tags are whatever you want them to be. They are used to help find jobs in your dashboard, and are specified as an array of strings.
"Group":["Servers"]
An rpeat Group is magical. Well, maybe not magical. Group lets you take jobs from many places in your config file (or files) and render them in one spot in the dashboard. The dashboard also lets you filter your view by group, so it is easy to see what is happening for a particular set of related jobs.
Hold, Disabled, Hidden
"Hold": true
"Disabled": true
"Hidden": true
Controlling visibility and how a job behaves overall is what these are for.
Hold is the ability to hold a job indefinitely. This is also controllable from the dashboard and API, but sometimes it is helpful to start the server with the job held to make sure it doesn't run until you are really ready. Defaults to false. No quotes.
Disabled is sort of like Hold, but instead removes the job from the scheduler and dashboard, while it remains in your configuration file. It's a great way to stub out a job or remove an old one without losing track of it. Like Hold, the default is false, so there is no need to add this unless you want to disable a job.
Hidden hides the job from the dashboard, but keeps it in the scheduler. This may be useful for a task that has no reason to be monitored, or because you are very insecure about the name you gave it. Generally, it is better to control visibility of jobs through the Permissions options offered by rpeat.
Cmd
"Cmd":"/bin/bash -c echo 'this is seriously so much better than pressing buttons.'"
The reason you are here! This is where the action happens. Specified exactly as you would in crontab, it is a system call that keeps track of itself. The biggest caveat is that it runs as the rpeat-server user, and it defaults to not using a shell. Most people want a shell, so generally you specify that first.
One key thing to be aware of, though: if you want logging to happen within rpeat (including rotation, visibility and permissioning), don't redirect stdout or stderr, as rpeat will then no longer be able to capture them.
ShutdownCmd, ShutdownSig
"ShutdownCmd": "/bin/bash -c /usr/bin/redis-cli -p ${REDIS_PORT} shutdown"
"ShutdownSig": "Interrupt"
"ShutdownSig": "Kill"
"ShutdownSig": "Term"
"ShutdownSig": "SIGINT"
"ShutdownSig": "SIGKILL"
"ShutdownSig": "SIGTERM"
Sometimes you have a long-running job, something like a redis server that has a proper shutdown process. This is where you specify graceful shutdown calls. Think of it as the opposite of ripping the cord out of the wall to turn off your server.
ShutdownSig lets you send a signal to your process. Sometimes this is what a process needs to gracefully exit.
Env
"Env": [
"REDIS_BIN=/usr/bin/redis-server",
"REDIS_PORT=6789"
...
]
"Cmd": "/bin/bash -c ${REDIS_BIN} --port ${REDIS_PORT}"
Another favorite of the rpeat team. Environment variables are nice, sure. But too many, recursively set/clobbered/set, are a disaster waiting to happen at 2:30 in the morning. Stop the madness by defining all your variables within the job and you will get a good night's sleep. We promise.
Because none is too few and many is never enough, Env takes an array of "key=value" strings.
(pro hint: put the variables in a reusable template to share amongst all your jobs)
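To make the "key=value" convention concrete, here is a small Python sketch that turns an Env array into a mapping and fills the ${...} placeholders of the Cmd shown above. It only illustrates the idea: env_to_dict is a hypothetical helper, and rpeat may well expand variables differently (for example by leaving it to the shell).

```python
from string import Template


def env_to_dict(env):
    """Turn ["KEY=value", ...] into {"KEY": "value", ...}."""
    result = {}
    for entry in env:
        # Split on the first "=" only, so values may contain "=" themselves.
        key, sep, value = entry.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {entry!r}")
        result[key] = value
    return result


env = env_to_dict(["REDIS_BIN=/usr/bin/redis-server", "REDIS_PORT=6789"])
# string.Template understands the same ${NAME} placeholder style.
cmd = Template("/bin/bash -c ${REDIS_BIN} --port ${REDIS_PORT}").substitute(env)
print(cmd)
```

Because substitute raises KeyError for a placeholder with no matching variable, a typo in a variable name shows up immediately rather than at 2:30 in the morning.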
DateEnv
"DateEnv": [
"TODAY=CCYY-MM-DD",
"DATEFILED=CCYY-MM-DD,-4D,MF",
"YM=YY/MM,+2D"
...
],
DateEnv takes Env one step further. These are seriously magical date variables. You can specify a date that is calculated at runtime, using your own format, timezone, calendars (oh yes!), and basic date math, and have it available to your command.
The format is "YOURVARIABLENAME=DATEFORMAT,ADJUSTMENT,CALENDAR"
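As a rough guide to how such an entry breaks down, the Python sketch below splits a DateEnv string into its four parts. It is an assumption-laden illustration: DateVar and split_date_env are hypothetical names, the adjustment and calendar are treated as optional because the examples above omit them, and nothing here interprets the format tokens or does the date math, which is rpeat's job.

```python
from typing import NamedTuple, Optional


class DateVar(NamedTuple):
    name: str
    date_format: str
    adjustment: Optional[str]  # e.g. "-4D"; None when omitted
    calendar: Optional[str]    # e.g. "MF"; None when omitted


def split_date_env(entry):
    """Split "NAME=DATEFORMAT,ADJUSTMENT,CALENDAR" into its parts."""
    name, sep, spec = entry.partition("=")
    if not sep or not name:
        raise ValueError(f"expected NAME=..., got {entry!r}")
    parts = [p.strip() for p in spec.split(",")]
    if not parts[0] or len(parts) > 3:
        raise ValueError(f"unexpected DateEnv spec: {spec!r}")
    # Pad the optional trailing parts with None.
    parts += [None] * (3 - len(parts))
    return DateVar(name, parts[0], parts[1], parts[2])


filed = split_date_env("DATEFILED=CCYY-MM-DD,-4D,MF")
today = split_date_env("TODAY=CCYY-MM-DD")
```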
AlertActions
"AlertActions": {
"OnSuccess": {
"Subject": "Success!"
},
"OnFailure": {
"To": [
"jeff@example.com"
],
"Subject": "Something went wrong!"
...
}
...
}
Alert actions are how rpeat sends alerts for failing or successful jobs. In fact, any state that can exist in rpeat can trigger an alert. AlertActions consists of alert specifications for zero or more state changes, as in the example above.
By default, if you have an account with rpeat.io, your alerts can be sent to rpeat.Alert. All accounts include a free tier of 20 email and 1000 alerts per month.
It is also easy to change the endpoint of your alert message to point to your own internal message platform.
Timezone
"Timezone":"Asia/Hong_Kong"
Timezones are job-specific, and default to UTC when not specified. There is no excuse to let the system control when your job runs. The best practice is to use the IANA zoneinfo style, such as America/Chicago or Europe/London. Abbreviations and offsets are less clear and can be ambiguous.
Calendar, CalendarDirs, Rollback, RequireCal
"Calendar":"us/NYSE",
"CalendarDirs":".rpeat/cals",
Calendars are an often overlooked aspect of scheduling, yet they are a mainstay of nearly every aspect of life. Most jobs are likely using resources that need to coordinate with special calendars (e.g. institutions, government holidays). Standard cron syntax can't accommodate this, so we added a very simplistic (and simple!) calendar feature. rpeat ships with some basics, and hosts a community effort to build new ones.
Calendar names the calendar to be used, which is located by searching the CalendarDirs. Depending on the scheduled time, you may want the next available day, or to set "Rollback":true to roll back to the day before. And in some cases your calendar may not be complete, so you can default to ignoring it when outside of its range by setting "RequireCal":false.
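The choice between "next available day" and rolling back can be pictured with a short Python sketch. Everything in it is hypothetical: adjust_to_calendar is not an rpeat function, the closed days are made up, and it neither reads rpeat calendar files nor models RequireCal.

```python
from datetime import date, timedelta


def adjust_to_calendar(day, closed_days, rollback=False):
    """Move day off any closed date: backward if rollback, else forward."""
    step = timedelta(days=-1 if rollback else 1)
    while day in closed_days:
        day += step
    return day


# Hypothetical calendar with two consecutive closed days.
closed = {date(2024, 12, 25), date(2024, 12, 26)}
next_open = adjust_to_calendar(date(2024, 12, 25), closed)
rolled_back = adjust_to_calendar(date(2024, 12, 25), closed, rollback=True)
```

With these inputs the default moves the date forward to 27 December, while rollback moves it to 24 December.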
CronStart, CronEnd, CronRestart
// [Sec] Min Hour MonDay Mon WeekDay
"CronStart": ["@weekly"]
"CronStart": ["30 59 * * * MF"] // 30s before the hour
"CronStart": ["0 */15 * * * M,F"] // run every 15m, Monday and Friday
"CronStart": ["R* 0 * * * MF"] // randomized second to start (fixed for all future runs)
"CronStart": ["@manual"] // on demand job
"CronStart": ["@every 1h"]
"CronStart": ["@at 2020123235959"] // specific moment to fire (once)
"CronEnd": ["55 23 * * SAT"]
"CronRestart": ["@midnight"]
"CronRestart": ["@eow"]
...
"Jitter": 120 // add up to 120s of jitter to start time
This is when rpeat should wake to run your job. It is a hard departure from regular schedulers, as each job maintains its own timer to know when to awaken. Generally you'll want CronStart for most jobs, and on occasion a way to either end or restart long-running jobs on some schedule. I tend to use this to restart servers or polling jobs that are only valid at certain times of the day.
All follow an extended cron-style syntax of space-separated elements representing (1) second, (2) min, (3) hour, (4) month day, (5) month, (6) day of week. If only five fields are passed in (standard cron), it is interpreted as seconds=0. If any field is prefixed with a capital R, e.g. R* or R2-10, the value for that field is randomized and then fixed for all future runs.
Setting Jitter to a positive integer will add a random number of seconds (aka "jitter"), up to that maximum, to the start. This is extremely helpful if multiple jobs start around a particular time, but need to minimize startup load on network, database or just compute.
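Two of the rules above, "five fields means seconds=0" and "Jitter adds up to N seconds", are easy to sketch in Python. Treat this as an illustration only: normalize_cron and jitter_delay are hypothetical helpers, and drawing whole seconds from 0 to Jitter inclusive is an assumption about how the delay is picked.

```python
import random


def normalize_cron(expr):
    """Return a six-field expression; five-field input gets seconds=0."""
    if expr.startswith("@"):
        return expr  # shortcuts such as @daily are left untouched
    fields = expr.split()
    if len(fields) == 5:
        fields = ["0"] + fields  # standard cron: prepend the seconds field
    if len(fields) != 6:
        raise ValueError(f"expected 5 or 6 fields, got {len(fields)}: {expr!r}")
    return " ".join(fields)


def jitter_delay(jitter, rng=random):
    """Pick a start delay in whole seconds between 0 and jitter."""
    if jitter < 0:
        raise ValueError("jitter must not be negative")
    return rng.randint(0, jitter)  # assumption: both ends inclusive


six_fields = normalize_cron("0 0 * * 1-5")
delay = jitter_delay(120)
```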
Dependency
<CronStart>@depends</CronStart>
...
<Dependency>
<Dependencies>
<JobTrigger>
<NameOrUUID>MySQL Server</NameOrUUID>
<Trigger>running</Trigger>
</JobTrigger>
<JobTrigger>
<NameOrUUID>Redis Server</NameOrUUID>
<Trigger>running</Trigger>
</JobTrigger>
<Action>start</Action>
<Condition>all</Condition>
<Delay>10s</Delay>
</Dependencies>
<Dependencies>
<JobTrigger>
<NameOrUUID>NetworkStorageDevice</NameOrUUID>
<Trigger>failed</Trigger>
</JobTrigger>
<Action>stop</Action>
<Condition>any</Condition>
<Delay>0s</Delay>
</Dependencies>
</Dependency>
Dependency is an elegant way to specify an arbitrary set of dependencies to drive an event for your job. It is possible to make a job start, stop or even restart if one or more other jobs are Running, Successful or Failed. This is ideal for starting services that need other services to be running, or for shutting down services if another becomes unavailable. All can then trigger additional alerts.
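The all/any Condition in the example above can be read as a simple check over the listed triggers. The Python sketch below shows that reading; dependency_met is a hypothetical helper, the state strings are taken from the example, and how rpeat actually tracks job states or applies Delay is not modeled.

```python
def dependency_met(triggers, condition, states):
    """triggers: list of (job, wanted_state); states: job -> current state."""
    checks = [states.get(job) == wanted for job, wanted in triggers]
    if condition == "all":
        # Assumption: an empty trigger list never fires.
        return bool(checks) and all(checks)
    if condition == "any":
        return any(checks)
    raise ValueError(f"unknown condition: {condition!r}")


triggers = [("MySQL Server", "running"), ("Redis Server", "running")]
both_up = {"MySQL Server": "running", "Redis Server": "running"}
one_down = {"MySQL Server": "running", "Redis Server": "failed"}

start_now = dependency_met(triggers, "all", both_up)
start_blocked = dependency_met(triggers, "all", one_down)
```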
Retry, RetryWait
"Retry":2
"RetryWait":"1m"
"RetryWait":"1m30s"
"RetryWait":"2h"
Retry tells the scheduler to attempt to restart a job that has failed. This is the number of attempts to try, with RetryWait being the duration between attempts. This is useful when you may have a network issue, or another server that is currently offline.
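Duration strings such as "1m30s" combine a number with a unit, possibly several times over. As a rough illustration, this Python sketch converts them to seconds; duration_seconds is a hypothetical helper, and the unit set (d, h, m, s) is only what appears in the examples on this page, so rpeat may accept more or parse them differently.

```python
import re

_UNITS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def duration_seconds(text):
    """Convert a duration like "1m30s" or "2h" into a number of seconds."""
    # Require the whole string to be number+unit pairs, nothing else.
    if not re.fullmatch(r"(?:\d+[dhms])+", text):
        raise ValueError(f"unrecognized duration: {text!r}")
    pairs = re.findall(r"(\d+)([dhms])", text)
    return sum(int(number) * _UNITS[unit] for number, unit in pairs)


wait = duration_seconds("1m30s")
```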
MaxDuration
"MaxDuration":"45m"
MaxDuration is sort of the opposite of a Retry and more like CronEnd. This sets an upper limit on how long a job should run before being stopped by rpeat. This corresponds to an OnEnd event in the Alert system.
TmpDir, Logging
"DateEnv": [
"TODAY=CCYYMMDD"
],
"TmpDir": "/home/temp",
"Logging": {
"StdoutFile": "${TODAY}.stdout.log",
"StderrFile": "${TODAY}.stderr.log",
"Append": false,
"Purge": "7d"
}
Most jobs will produce some sort of output, either as the product you are expecting, or as a by-product of the process itself in the form of standard out and standard err streams. rpeat provides access to these logging events by capturing the process's stdout and stderr while also writing to the locations you specify (or a default from the system). Logs are owned by the rpeat process user, but are permissioned for viewing using the dashboard. Logs can be set to append, and have an optional rotation schedule to make sure old logs don't continue to take up space.
User, Admin, Permissions
"User": "jeff",
"Admin": [
"sara",
"jeff"
],
"Permissions": {
"start": [
"sara"
],
"stop": [
"sara"
],
"all": [
"jeff"
]
}
Since jobs have the option of being controlled from a browser interface and API, permissions are paramount. In fact, this was the very first consideration for rpeat on day one. Not only is everything running over TLS by default, but locally we maintain granular authorization on a job-by-job basis. This even provides visibility controls so sensitive jobs or IP are easy to protect.
Admin is who has access to a job, with User always having complete control. Permissions grants specific actions to individual users, which makes it easy to give certain users access only to view and restart a stuck job, but not to stop or hold the job. You can even grant access to job details pages as well as log files by user.
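One way to read the User, Admin and Permissions example above is as a three-step check. The Python sketch below spells that reading out; it is not rpeat's authorization code. is_allowed is a hypothetical helper, and both the idea that only listed admins get any access and the treatment of "all" as "every action" are assumptions drawn from the example.

```python
def is_allowed(job, user, action):
    """Decide whether user may perform action on job (illustrative only)."""
    if user == job.get("User"):
        return True  # the owner always has complete control
    if user not in job.get("Admin", []):
        return False  # assumption: only listed admins get any access
    perms = job.get("Permissions", {})
    # Assumption: "all" grants every action; otherwise check the action's list.
    return user in perms.get("all", []) or user in perms.get(action, [])


job = {
    "User": "jeff",
    "Admin": ["sara", "jeff"],
    "Permissions": {"start": ["sara"], "stop": ["sara"], "all": ["jeff"]},
}
sara_can_start = is_allowed(job, "sara", "start")
sara_can_hold = is_allowed(job, "sara", "hold")
```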