Compare commits

...

23 Commits

Author SHA1 Message Date
601b70a35d better format how it works area. Also explain some scripts better. I forget that videoID and channelID and so many ID's get's confusing to work with. 2025-10-25 10:28:43 -04:00
5e2acc054d edit values so understand better from external view. 2025-10-25 10:27:54 -04:00
6c87737393 update formatting.. 2025-10-25 10:00:16 -04:00
46268af42d Update example input filename and output. 2025-10-25 09:58:47 -04:00
799cf76b44 Update example you will see from elasticsearch 2025-10-25 09:55:52 -04:00
939cc4c5ee fix link so shows on github correctly too. 2025-10-24 20:19:18 -04:00
3c28f1aa76 fix formatting in how it works area, and add additional comments. 2025-10-24 20:17:05 -04:00
09fcf0fe6d add link to ElasticSearch-Common-Commands.md 2025-10-24 20:11:04 -04:00
90a0f6514d initialize file, input basic data like commands used, and ways to get api key. Althought with user:pass shouldn't really need api key just wanted to know how to do it. 2025-10-24 20:00:01 -04:00
372376b445 Example Channel info from elasticsearch curl call. 2025-10-24 19:53:50 -04:00
337fd19dc7 New script, explain documentation. also renamed a couple files. 2025-10-22 20:50:35 -04:00
5793523215 add other optional data you may want to copy to all files. Or possibly adding tags pulled from another file, categories, description, etc. 2025-10-22 19:55:21 -04:00
3474918972 rename file 2025-10-22 19:46:07 -04:00
59997a8831 rename file 2025-10-22 19:45:57 -04:00
67ed4869ea move/find id to end of filename. In my situation it was between second and third " - " So splits apart and put back together. 2025-10-22 19:45:41 -04:00
e7fd10d3a5 in order to import this archive
https://archive.org/details/TerryDavisAltChannel
I had to run this script. Now new error tho
2025-10-21 22:22:48 -04:00
6bc9772f58 example .info.json file to start with that worked for me. 2025-10-21 19:46:15 -04:00
1aa0388e80 Insert the date (if available) into the sidecar JSON. 2025-10-21 19:46:00 -04:00
08ffabe0d6 Insert the cleaned title into the sidecar JSON. 2025-10-21 19:45:48 -04:00
d5bfda08c2 Populate the sidecar JSON with the video ID field. 2025-10-21 19:45:38 -04:00
6800d6f979 Create an empty .info.json file for each video filename (sidecar). 2025-10-21 19:45:27 -04:00
af5ebf21f1 Ensure the video ID appears at the end of the filename inside square brackets. 2025-10-21 19:45:17 -04:00
700f36c2c5 initialize file 2025-10-21 19:44:32 -04:00
11 changed files with 420 additions and 1 deletions

@@ -0,0 +1,86 @@
# Elasticsearch commands to get you started
Personally, I run Elasticsearch, Kibana, Metricbeat, and Filebeat in a single docker-compose stack managed with Portainer. Kibana is useful for viewing data, although I don't like that it doesn't let you edit data. I connect TubeArchivist using the `elastic` password generated by the compose stack. I also want to use Elasticsearch for other purposes and avoid running a separate instance.
In Kibana I created an API key for ta_channels so I could update the data in them. The curl commands below generate an API key without Kibana.
## Create API key scoped to specific indices (HTTP)
```
curl -s -u 'elastic:ELASTIC_PASS' \
-H 'Content-Type: application/json' \
-X POST 'http://localhost:9200/_security/api_key' \
-d '{
"name": "ta_scoped_key",
"expiration": "30d",
"role_descriptors": {
"ta_scoped_role": {
"cluster": ["monitor"],
"index": [
{ "names": ["ta_channels_*"], "privileges": ["read","write"] },
{ "names": ["ta_metadata"], "privileges": ["read","write","create_index"] }
]
}
}
}'
```
### HTTPS (with CA)
```
curl -s --cacert /path/to/chain.pem -u 'elastic:ELASTIC_PASS' \
-H 'Content-Type: application/json' \
-X POST 'https://localhost:9200/_security/api_key' \
-d '{"name":"ta_scoped_key","expiration":"30d","role_descriptors":{"ta_scoped_role":{"cluster":["monitor"],"index":[{"names":["ta_channels_*"],"privileges":["read","write"]}]}}}'
```
Save the JSON response; it contains `id`, `api_key`, and `encoded`, which you need to build the `ApiKey` header. The response looks like this:
```
{"id":"F0eWBJ0BLX_vEATxQJuu","name":"ta_scoped_key","expiration":1763932732593,"api_key":"39RandomLettersandNumbers","encoded":"60RandomNumbersandLettsasldkfjwithA=="}
```
Use the `encoded` value, not `api_key`, in the `Authorization: ApiKey` header. The header expects the Base64 encoding of `id:api_key`, which is what `encoded` already contains.
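For reference, the `encoded` value can be rebuilt from the other two fields. This is a minimal sketch, assuming `encoded` is the Base64 of `id:api_key`; the function name `encode_api_key` is made up for illustration and is not part of this repository.

```shell
# Hypothetical helper: rebuild the "encoded" value from id and api_key.
encode_api_key() {
  # Base64 of "id:api_key"; tr strips the line wrap some base64 builds add.
  printf '%s:%s' "$1" "$2" | base64 | tr -d '\n'
}
```

Usage would look like `AUTH=$(encode_api_key "$id" "$api_key")`, which should match the `encoded` field of the response.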
## Test using the API key
Set `AUTH` to the `encoded` value from the response, then run:
`curl -s -H "Authorization: ApiKey $AUTH" http://localhost:9200/_security/_authenticate`
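If the create-key response was saved to a file, the `encoded` field can be pulled out without retyping it. A rough sketch, assuming the flat single-line JSON shown earlier; the helper name `extract_encoded` and the file name `response.json` are hypothetical, and `jq -r .encoded` is the more robust choice when jq is installed.

```shell
# Hypothetical helper: print the "encoded" field of a saved create-API-key response.
# Assumes flat, single-line JSON; it does not handle escaped quotes inside values.
extract_encoded() {
  sed -n 's/.*"encoded"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1"
}
```

For example, `AUTH=$(extract_encoded response.json)` sets the variable used by the authenticate call.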
## Creating Another User
```
curl -u 'elastic:Yourhardrandompassword' \
-X POST "http://localhost:9200/_security/user/sickprodigy" \
-H 'Content-Type: application/json' \
-d '{"password":"PasswordforUser","roles":["my_readonly_role"],"full_name":"Sick Prodigy","email":"sick@sickgaming.net"}'
```
## Creating another user with full privileges (superuser)
I prefer to have a user with full privileges other than `elastic`, although TubeArchivist apparently uses `elastic` (the default superuser).
```
curl -u 'elastic:Yourhardrandompassword' -X POST "http://localhost:9200/_security/user/sickprodigy" \
-H 'Content-Type: application/json' \
-d '{
"password": "SomeHardPassword",
"roles": ["superuser"],
"full_name": "SickProdigy",
"email": "sickprodigy@sickgaming.net"
}'
```
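### Query a specific channel within ta_channel
The channel ID can be found in TubeArchivist: go to the channel page and look at the URL, e.g. "https://tubearchivist.rcs1.top/channel/UChOve2dsTRMrW8DslLKJ9eg"; the part after `channel/` is the channel ID. You can experiment with the query and see what comes back, but this usually brings back the exact channel you want. The API key must grant access to the `ta_channel` index; note that the `ta_channels_*` pattern in the scoped-key example above does not match that name.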
```
curl -X POST "http://es:9200/ta_channel/_search?pretty" \
-H "Authorization: ApiKey YourRandomAPIkey123455123123==" \
-H "Content-Type: application/json" \
-d'
{
"query": {
"query_string": {
"query": "Channel ID"
}
}
}'
```
You should get back a result like this: [Example-channel-info-elasticsearch.json](Example-channel-info-elasticsearch.json)
I'm using this as a reference for updating channels in TubeArchivist that are missing data. I wish Kibana would let me do it; maybe it can and I just haven't figured it out yet.
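One way to fill in a missing field is Elasticsearch's partial-document update endpoint (`POST /<index>/_update/<_id>`). The sketch below is an assumption about how that could be wired up, not something this repository ships: the function name `update_channel_field` and the `DRY_RUN` switch are hypothetical, `AUTH` is expected to hold the `encoded` API key, and I have not checked whether TubeArchivist overwrites such edits on its next channel refresh.

```shell
# Hypothetical helper: set one string field on a ta_channel document.
# DRY_RUN defaults to 1 so nothing is sent unless you opt in with DRY_RUN=0.
update_channel_field() {
  local es_url=$1 channel_id=$2 field=$3 value=$4 body
  # Partial-document update body; value must not contain quotes or backslashes.
  body=$(printf '{"doc":{"%s":"%s"}}' "$field" "$value")
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf 'POST %s/ta_channel/_update/%s %s\n' "$es_url" "$channel_id" "$body"
  else
    curl -s -X POST "$es_url/ta_channel/_update/$channel_id" \
      -H "Authorization: ApiKey $AUTH" \
      -H 'Content-Type: application/json' \
      -d "$body"
  fi
}
```

Calling `update_channel_field http://es:9200 UC1ORA3oNGYuQ8yQHrC7MzBg channel_name syncbricks` prints the request it would send; set `DRY_RUN=0` to actually send it.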

@@ -0,0 +1,64 @@
{
"_index" : "ta_channel",
"_id" : "UC1ORA3oNGYuQ8yQHrC7MzBg",
"_score" : 6.799427,
"_source" : {
"channel_name" : "syncbricks",
"channel_thumb_url" : "https://yt3.googleusercontent.com/ytc/AIdro_k5ySAgLYNTLWCkt7ZNBCpQLgEdh8nd2HbVf93PIYm8qQ=s900-c-k-c0x00ffffff-no-rj",
"channel_active" : true,
"channel_description" : "syncbricks offers training, reviews, tutorials and guides on core IT, Application development, Application Maintenance, Data Centre , Cloud Services, Automation, IoT, Cyber security, Infrastructure Management, Business Intelligence, Business Process Management, Project Management, IT consulting, Open source, brands, products, servers, hosting, gadgets, service providers and much more.\nWe are focused on innovation and integration. \nOn timely basis we will be uploading Training Video, Tutorials, Speeches, Tech and Business Reviews, Product Reviews, Travel and More.",
"channel_subscribed" : false,
"channel_subs" : 35600,
"channel_views" : 0,
"channel_tabs" : [
"videos",
"streams",
"shorts"
],
"channel_last_refresh" : 1746300537,
"channel_banner_url" : "https://yt3.googleusercontent.com/oOzViYjjHZb1XKBhEhAkp36062e2NUJiS37InRd6JBhmXeba-n8QXl2YZSMB9HdUvT8_5_59=w2560-fcrop64=1,00005a57ffffa5a8-k-c0xffffffff-no-nd-rj",
"channel_tvart_url" : "https://yt3.googleusercontent.com/oOzViYjjHZb1XKBhEhAkp36062e2NUJiS37InRd6JBhmXeba-n8QXl2YZSMB9HdUvT8_5_59=s0",
"channel_id" : "UC1ORA3oNGYuQ8yQHrC7MzBg",
"channel_tags" : [
"tech tutorials",
"self-hosted automation",
"n8n",
"docker",
"proxmox",
"home lab",
"network security",
"open source tools",
"IT infrastructure",
"AI automation",
"sysadmin tips",
"Linux",
"server virtualization",
"pfSense",
"OPNSense",
"SyncBricks",
"automation tools",
"Proxmox VE",
"tech setup",
"n8n",
"firewall",
"tech tutorials",
"n8n",
"docker",
"home lab",
"IT infrastructure",
"AI automation",
"server virtualization",
"pfSense",
"SyncBricks",
"Amjid Ali",
"Proxmox VE",
"docker compose",
"n8n",
"tutorial",
"linux",
"open source",
"ngw",
"firewall"
]
}
}

14
Example.info.json Normal file
@@ -0,0 +1,14 @@
{
"id": "Scripted in Video ID",
"channel_id": "Change to Youtube Username",
"uploader": "Change to Youtube Username",
"uploader_id": "Change To Channel ID",
"uploader_url": "https://www.youtube.com/channel/ChangeToChannelID-or-username",
"title": "Scripted in Title",
"description": null,
"upload_date": "ToBeScriptedInYYYYMMDD",
"categories": null,
"tags": null,
"thumbnail": null,
"view_count": null
}

@@ -1 +1,76 @@
Test
# Tube-Archivist Scripts
Small collection of Bash helpers used to prepare offline / archived YouTube videos for import into TubeArchivist. Written for Debian-like systems; should work in other Linux distributions with Bash and standard GNU utilities.
---
## Goal
Normalize filenames and create accompanying metadata (.info.json) so TubeArchivist can ingest local archives (especially those from archive.org or other offline sources).
Example input filenames:
- Example A: `20170311 (5XtCZ1Fa9ag) Terry A Davis Live Stream.mp4`
- Example B: `20131003 - 001 - 1okW1RTPZ7Q - TempleOS Hymns #1.mp4`
Resulting filename and sidecar JSON:
- Example A:
- `20170311 Terry A Davis Live Stream [5XtCZ1Fa9ag].mp4`
- `20170311 Terry A Davis Live Stream [5XtCZ1Fa9ag].info.json`
- Example B:
- `20131003 - 001 - TempleOS Hymns #1 [1okW1RTPZ7Q].mp4`
- `20131003 - 001 - TempleOS Hymns #1 [1okW1RTPZ7Q].info.json`
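The Example A transformation can be sketched as a single pure function that only computes the new name and renames nothing. This is an illustration under assumptions, not one of the repository's scripts: `normalize_name` is a made-up name, and it assumes the ID is 11 characters from `A-Za-z0-9_-` wrapped in `()` or `[]`. It does not handle the dash-separated Example B layout.

```shell
# Hypothetical helper: print the normalized name for a file whose ID sits
# in () or [] anywhere in the name. Names without such an ID are printed unchanged.
normalize_name() {
  local f=$1 base ext re before id after title
  base=${f%.*}
  ext=${f##*.}
  # Text before the ID, an 11-character ID in () or [], text after the ID.
  re='^(.*)[([]([A-Za-z0-9_-]{11})[])](.*)$'
  if [[ $base =~ $re ]]; then
    before=${BASH_REMATCH[1]}
    id=${BASH_REMATCH[2]}
    after=${BASH_REMATCH[3]}
    # Join the text around the ID, squeeze repeated spaces, trim both ends.
    title=$(printf '%s %s' "$before" "$after" | tr -s ' ' | sed -e 's/^ //' -e 's/ $//')
    printf '%s [%s].%s\n' "$title" "$id" "$ext"
  else
    printf '%s\n' "$f"
  fi
}
```

Running it on the Example A input prints the Example A result, and running it on an already normalized name prints that name unchanged.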
---
## How it works / Usage
1. Put all the scripts in the directory with your video files (scripts currently do not recurse into subdirectories).
2. Edit `Example.info.json`
- Update these lines, plus any other fields you want copied to every video that the scripts won't fill in. The `null` values are fields I didn't have data for yet.
```
"channel_id": "Change to Youtube Username",
"uploader": "Change to Youtube Username",
"uploader_id": "Change To Channel ID",
"uploader_url": "https://www.youtube.com/channel/ChangeToChannelID-or-username",
```
3. Run the scripts below in order from the directory containing your media.
Each script performs a single transformation so you can inspect results between steps.
## Scripts (order and purpose)
1a. `convert-()-to-[].bash`
- Replace parentheses containing an ID with square brackets (e.g. `(ID)` -> `[ID]`) and clean spacing.
- If the ID is already at the end of the filename, skip to 3.
1b. `move-find-id-to-end-filename.bash`
- Split the filename into parts, find the bare video ID between the second and third " - ", add brackets, and move `[id]` to the end of the filename before the extension.
- Skip 1a/2a and go straight to 3.
2a. `move-[id]-to-end-filename.bash`
- Ensure the video ID appears at the end of the filename inside square brackets.
3. `create-json-alongside-each-file.bash`
- Create an empty `.info.json` file for each video filename (sidecar).
4. `insert-id-into-json.bash`
- Populate the sidecar JSON with the video ID field.
5. `insert-title-into-json.bash`
- Insert the cleaned title into the sidecar JSON.
6. `insert-date-into-json.bash`
- Insert the date from filename (if available) into the sidecar JSON.
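The ordering rules above can be summed up in a small lookup. This is only a sketch of the order described in the list; `pipeline_steps` and its `a`/`b` arguments are hypothetical names, and the function only prints script names without running anything.

```shell
# Hypothetical helper: print the scripts to run, in order, for a filename style.
#   a             = ID in parentheses or brackets somewhere in the name
#   b             = bare ID between the second and third " - "
#   anything else = ID already in [brackets] at the end of the name
pipeline_steps() {
  case "$1" in
    a) printf '%s\n' 'convert-()-to-[].bash' 'move-[id]-to-end-filename.bash' ;;
    b) printf '%s\n' 'move-find-id-to-end-filename.bash' ;;
  esac
  # Steps 3 to 6 are the same for every filename style.
  printf '%s\n' \
    'create-json-alongside-each-file.bash' \
    'insert-id-into-json.bash' \
    'insert-title-into-json.bash' \
    'insert-date-into-json.bash'
}
```

`pipeline_steps b` prints five script names, one per line; running them one at a time keeps the option of inspecting results between steps.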
---
## Notes and tips
- Scripts do not process subdirectories. Run at the directory root for each archive.
- Always test on a copy or run a subset first to confirm behavior.
- If filenames contain unusual characters, run a quick grep for non-ASCII characters before processing.
- Add a dry-run mode to the scripts if you want safer previews.
- ElasticSearch Common Commands for updates: [ElasticSearch Common Commands](ElasticSearch-Common-Commands.md)
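The non-ASCII check and the dry-run idea from the tips above could look roughly like this. Both helpers are hypothetical (the names `has_non_ascii` and `maybe_mv` and the `DRY_RUN` variable are assumptions), so treat this as a starting point to adapt rather than something the scripts already do.

```shell
# Hypothetical helpers, not part of the repository's scripts.

# Succeeds if the name contains any byte outside printable ASCII (space through ~).
has_non_ascii() {
  printf '%s' "$1" | LC_ALL=C grep -q '[^ -~]'
}

# Rename wrapper: with DRY_RUN=1 it only prints what it would do.
maybe_mv() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf 'Would rename: %s -> %s\n' "$1" "$2"
  else
    mv -i -- "$1" "$2"
  fi
}
```

A quick scan would be `for f in *; do has_non_ascii "$f" && echo "$f"; done`, and replacing `mv` with `maybe_mv` in a script lets `DRY_RUN=1` preview the renames.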
---
## Example archive
Archive used for testing:
`https://archive.org/details/TempleOS-TheMissingVideos`
---

@@ -0,0 +1,21 @@
#!/bin/bash
template="Example.info.json"
# Verify template exists
if [ ! -f "$template" ]; then
echo "Error: $template not found."
exit 1
fi
# Collect all matching video files safely
for f in *.mp4 *.mkv *.mov *.avi; do
# Skip if no matching files
[ -e "$f" ] || continue
[ -f "$f" ] || continue
base="${f%.*}" # Remove extension
target="${base}.info.json" # Construct new name
cp -- "$template" "$target"
echo "Created: $target"
done

@@ -0,0 +1,33 @@
#!/bin/bash
for f in *.info.json; do
[ -f "$f" ] || continue
# Extract the uploader_url line safely using jq if available
if command -v jq >/dev/null 2>&1; then
uploader_url=$(jq -r '.uploader_url // empty' "$f")
else
# fallback to grep/sed if jq not present
uploader_url=$(grep -oP '(?<="uploader_url": ")[^"]+' "$f")
fi
# Skip if no uploader_url found
[ -z "$uploader_url" ] && { echo "No uploader_url in $f, skipping."; continue; }
# Extract the channel ID (everything after '/channel/')
channel_id=$(echo "$uploader_url" | sed -n 's|.*/channel/\([^/"]*\).*|\1|p')
# Skip if extraction failed
[ -z "$channel_id" ] && { echo "No channel ID found in $f, skipping."; continue; }
# Update JSON with channel_id field
if command -v jq >/dev/null 2>&1; then
tmpfile=$(mktemp)
jq --arg cid "$channel_id" '.channel_id = $cid' "$f" > "$tmpfile" && mv "$tmpfile" "$f"
echo "Inserted channel_id into $f: $channel_id"
else
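# Simple sed fallback (assumes JSON is flat).
# Replace an existing channel_id (the template has one) so the key is not duplicated; otherwise add it.
if grep -q '"channel_id"' "$f"; then
sed -i "s|\"channel_id\": *\"[^\"]*\"|\"channel_id\": \"$channel_id\"|" "$f"
else
sed -i "/\"uploader_url\"/a \ \ \"channel_id\": \"$channel_id\"," "$f"
fi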
echo "Inserted channel_id into $f (using sed): $channel_id"
fi
done

@@ -0,0 +1,29 @@
#!/bin/bash
for f in *.info.json; do
[ -f "$f" ] || continue
# Extract title without extension
title="${f%.info.json}"
# Remove trailing ID if present
title="${title% \[*\]}"
# Extract the date at the beginning of the title (YYYYMMDD)
if [[ "$title" =~ ^([0-9]{8}) ]]; then
upload_date="${BASH_REMATCH[1]}"
else
echo "No date found in $f, skipping."
continue
fi
# Update JSON upload_date
if command -v jq >/dev/null 2>&1; then
tmpfile=$(mktemp)
jq --arg date "$upload_date" '.upload_date = $date' "$f" > "$tmpfile" && mv "$tmpfile" "$f"
echo "Updated upload_date in $f: $upload_date"
else
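# Simple sed fallback (only for simple JSON); overwrites any existing string value,
# including the template placeholder, not just an empty string
sed -i "s/\"upload_date\": *\"[^\"]*\"/\"upload_date\": \"$upload_date\"/" "$f"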
echo "Updated upload_date in $f: $upload_date (using sed)"
fi
done

23
insert-id-into-json.bash Normal file
@@ -0,0 +1,23 @@
#!/bin/bash
# Loop over all .info.json files in the current directory
for f in *.info.json; do
[ -f "$f" ] || continue # skip if no files
# Extract ID from filename: match [ID] at the end (before .info.json)
id=$(echo "$f" | sed -n 's/.*\[\([^]]*\)\]\.info\.json/\1/p')
# If no ID found, skip this file
[ -z "$id" ] && continue
# Use 'jq' to safely update the "id" field
if command -v jq >/dev/null 2>&1; then
tmpfile=$(mktemp)
jq --arg newid "$id" '.id = $newid' "$f" > "$tmpfile" && mv "$tmpfile" "$f"
echo "Updated $f with id: $id"
else
# If jq is not installed, fall back to sed (assumes simple JSON format);
# overwrites any existing string value, including the template placeholder
sed -i "s/\"id\": *\"[^\"]*\"/\"id\": \"$id\"/" "$f"
echo "Updated $f with id: $id (using sed)"
fi
done

@@ -0,0 +1,20 @@
#!/bin/bash
for f in *.info.json; do
[ -f "$f" ] || continue
# Extract title: remove .info.json and remove ID in brackets at the end
title="${f%.info.json}" # Remove extension
title="${title% \[*\]}" # Remove trailing [ID]
# Update JSON title only
if command -v jq >/dev/null 2>&1; then
tmpfile=$(mktemp)
jq --arg newtitle "$title" '.title = $newtitle' "$f" > "$tmpfile" && mv "$tmpfile" "$f"
echo "Updated title in $f: $title"
else
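# Simple sed fallback: escape \ / & for sed's replacement side, then overwrite
# any existing string value. Titles containing double quotes still need jq.
esc=$(printf '%s' "$title" | sed -e 's/[\/&]/\\&/g')
sed -i "s/\"title\": *\"[^\"]*\"/\"title\": \"$esc\"/" "$f"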
echo "Updated title in $f: $title (using sed)"
fi
done

@@ -0,0 +1,28 @@
#!/bin/bash
for f in *; do
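# Skip directories, and skip the helper scripts themselves
# (e.g. move-[id]-to-end-filename.bash would otherwise be renamed too)
[ -f "$f" ] || continue
case "$f" in *.bash) continue ;; esac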
# Extract the ID inside () or []
id=$(echo "$f" | sed -n 's/.*[([]\([^])]*\)[])].*/\1/p')
# If there's no ID, skip
[ -z "$id" ] && continue
# Remove the ID portion, squeeze repeated spaces, and drop spaces left before the extension or at the end
base=$(echo "$f" | sed 's/[([][^])]*[])]//g' | tr -s ' ' | sed -e 's/ \(\.[^.]*\)$/\1/' -e 's/ *$//')
# Separate name and extension
name="${base%.*}"
ext="${base##*.}"
# Rebuild new name (handle files with and without extensions)
if [ "$name" != "$ext" ]; then
newname="${name} [${id}].${ext}"
else
newname="${base} [${id}]"
fi
# Only rename if different
[ "$f" != "$newname" ] && mv -- "$f" "$newname"
done

@@ -0,0 +1,26 @@
#!/bin/bash
shopt -s nullglob
for f in *.mp4 *.mkv *.mov *.avi; do
[ -f "$f" ] || continue
base="${f%.*}"
ext="${f##*.}"
# Match: date - number - id - rest
# Example: 20140720 - 097 - oaHzqMgnI70 - TempleOS - God for Larry Page 7_20 K
if [[ "$base" =~ ^([^[:space:]]+)[[:space:]]*-[[:space:]]*([^[:space:]]+)[[:space:]]*-[[:space:]]*([A-Za-z0-9_-]+)[[:space:]]*-[[:space:]]*(.*)$ ]]; then
datepart="${BASH_REMATCH[1]}"
numpart="${BASH_REMATCH[2]}"
id="${BASH_REMATCH[3]}"
rest="${BASH_REMATCH[4]}"
newname="${datepart} - ${numpart} - ${rest} [${id}].${ext}"
echo "Rename: $f -> $newname"
# Comment out the next line to preview without renaming
mv -i -- "$f" "$newname"
else
echo "Skipping (pattern mismatch): $f"
fi
done